TF-IDF (term frequency-inverse document frequency) is a popular approach that can be used to identify and eliminate stop words. The technique is language independent. The intuition is that words that occur in almost all documents are stop words. On the other hand, words that occur commonly, but only in some of the documents…
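As a rough illustration of that intuition, here is a minimal sketch that flags words with a very low inverse document frequency as stop-word candidates. The function name `find_stop_words`, the `max_idf` cutoff, and the whitespace tokenization are assumptions made for this example, not part of any particular library; a language that does not separate words with spaces would need its own tokenizer.

```python
import math
from collections import Counter


def find_stop_words(documents, max_idf=0.1):
    """Return words whose IDF is at most max_idf (hypothetical cutoff).

    A word that appears in nearly every document has an IDF close to 0,
    so it is treated as a stop-word candidate.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each word once per document (document frequency, not raw count).
        # Assumption: lowercasing and whitespace splitting are adequate tokenization.
        doc_freq.update(set(doc.lower().split()))

    # Plain, unsmoothed IDF: log(N / df). A word present in every document scores 0.
    idf = {word: math.log(n_docs / df) for word, df in doc_freq.items()}
    return {word for word, score in idf.items() if score <= max_idf}


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang in the tree",
    "the fish swam in the pond",
]
stop_words = find_stop_words(docs)
print(stop_words)
```

On a corpus this small only words present in every document fall under the cutoff; with a realistic corpus the threshold would likely need tuning, and the resulting list is worth reviewing by hand before it is used for filtering.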
If you don’t have a stop-word dictionary or are working on a new language, what approach would you take to remove stop words?