“Stop words” is a term used in search engine optimization[1] (SEO) and data processing. Stop words are common function words such as ‘and’, ‘the’, and ‘in’, which are often removed from queries to save space and processing time. The concept has its roots in the creation of concordances and was developed over time by various researchers. Notably, Hans Peter Luhn is credited with coining the phrase, and C.J. Van Rijsbergen proposed the first standardized list of these words. The use of stop words has changed with advances in machine learning[2]. Although stop words were originally removed to speed up query processing, search engines like Google[3] now advise writers not to worry about them and to write naturally. Stop words are still used in specific circumstances, such as narrowing search results. Related topics include concept mining, information extraction, and query expansion.
Stop words are the words in a stop list (or stoplist or negative dictionary) which are filtered out (i.e., stopped) before or after processing of natural language data (text) because they are deemed insignificant. There is no single universal list of stop words used by all natural language processing tools, nor are there any agreed-upon rules for identifying stop words; indeed, not all tools even use such a list. Therefore, any group of words can be chosen as the stop words for a given purpose. The "general trend in [information retrieval] systems over time has been from standard use of quite large stop lists (200–300 terms) to very small stop lists (7–12 terms) to no stop list whatsoever".
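To make the filtering step concrete, the following is a minimal Python sketch of stop-word removal. The names `SMALL_STOP_LIST`, `tokenize`, and `remove_stop_words` are hypothetical, and the seven-word list is an assumption chosen only for illustration; it is not a standard list, since no universal one exists.

```python
import re

# Hypothetical stop list chosen for illustration; not a standard list.
SMALL_STOP_LIST = frozenset({"a", "an", "and", "in", "of", "the", "to"})


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens, stop_list=SMALL_STOP_LIST):
    """Return the tokens that do not appear in the stop list, in order."""
    return [token for token in tokens if token not in stop_list]


tokens = tokenize("The cat sat in the hat and purred")
print(remove_stop_words(tokens))               # ['cat', 'sat', 'hat', 'purred']
print(remove_stop_words(tokens, frozenset()))  # empty stop list keeps every token
```

Passing the stop list as a parameter reflects the point that any group of words can serve as stop words for a given purpose. Passing an empty set corresponds to using no stop list at all.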