If words are not normalized to a single case, the vocabulary size increases drastically, since words like Up/up, Fast/fast, or This/this are treated as distinct tokens, which is usually not the desired behaviour for an NLP task.
Sparsity is also higher when building a language model, since "the cat" is treated differently from "The cat". If we build an n-gram model, we may end up with many n-grams in the test set that never appeared in the training set.
Note: There are situations where we do not want to normalize case! For instance, in tasks such as sentiment analysis ("TERRIBLE" probably sounds worse than "terrible") or text generation, case can carry meaningful signal, and it may be best not to normalize it.