How to remove stop words with NLTK in Python

In this tutorial we will learn how to write a program that removes stop words from text using the NLTK library in Python.

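What are stop words?

Stop words are the most commonly used words in a language, such as a, an, the and in. Because they appear in almost every text, they usually carry little information for analysis.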

First we import the stopwords corpus and the word_tokenize function. We load the stop words into a set, then split the sentence into words. Finally we loop over the words and keep only those that are not in the stop word set. This program uses the English list; NLTK provides stop word lists for other languages as well.
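Switching to another language only changes the name passed to `stopwords.words`. The sketch below is a minimal illustration, assuming NLTK is installed and the stopwords corpus has already been downloaded; the helper name `load_stop_words` is hypothetical and not part of NLTK.

```python
def load_stop_words(language):
    """Return NLTK's stop word list for `language` as a set.

    Assumes nltk.download('stopwords') has been run once beforehand.
    """
    # Imported inside the function so this file loads even without NLTK installed
    from nltk.corpus import stopwords

    # fileids() lists the languages that ship with the stopwords corpus
    if language not in stopwords.fileids():
        raise ValueError(f"No NLTK stop word list for {language!r}")
    return set(stopwords.words(language))


# Example usage (needs the stopwords corpus):
# german_stop_words = load_stop_words("german")
# print("und" in german_stop_words)
```

The rest of the program stays the same: tokenize the text and keep the words that are not in the returned set.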

Program:

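# One-time setup: run nltk.download('stopwords') and nltk.download('punkt')
# (newer NLTK releases may ask for 'punkt_tab' instead of 'punkt').
from nltk.corpus import stopwords
from nltk import word_tokenize

# Build a set so that membership tests are fast
stop_words = set(stopwords.words('english'))
print(stop_words)

# Split the sentence into word tokens
text = word_tokenize("The quick brown fox jumps over the lazy dog")

# Keep only the tokens that are not stop words.
# The comparison is case-sensitive, so "The" is kept while "the" is removed.
new_sentence = []
for w in text:
    if w not in stop_words:
        new_sentence.append(w)

print(text)
print(new_sentence)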

Output:

{'whom', "you'd", 'them', 've', "isn't", 'some', 'was', 'are', 'been', "don't", "shan't", 'myself', 'by', 'until', 'who', 'is', "needn't", "shouldn't", "wouldn't", 'won', 'just', 'did', 'themselves', 'how', 'nor', 'over', 'before', 'further', 'above', 'same', 'haven', 'or', 'of', 're', 'shan', "mustn't", 'ourselves', 'yourself', 'being', 'be', "won't", 's', 'its', 'so', 'up', 'now', 'where', 'theirs', 'do', 'more', 'too', 'here', 'should', 'herself', 'at', 'off', 'there', 'she', 'has', 'to', "hasn't", "couldn't", 'wouldn', 'ain', 'because', 'for', 'not', 'mustn', 't', 'again', 'hasn', 'itself', 'can', 'isn', 'ours', 'had', 'their', "it's", 'no', 'his', 'down', 'after', "wasn't", 'does', 'on', 'all', 'me', 'him', 'll', 'you', 'shouldn', "you're", 'once', "doesn't", 'an', 'her', 'below', 'this', 'didn', 'y', "didn't", 'each', "should've", 'weren', 'with', "hadn't", 'in', 'against', 'hers', 'doesn', 'your', 'o', 'have', 'the', 'out', 'into', 'why', "aren't", 'what', 'but', 'hadn', 'few', 'from', 'any', 'than', "haven't", 'himself', "you'll", 'own', 'he', 'very', 'as', 'ma', 'yourselves', 'those', 'about', 'we', 'our', 'needn', 'having', 'most', 'wasn', 'mightn', 'which', 'while', 'then', 'will', 'during', "weren't", 'm', 'both', 'a', 'these', 'couldn', "she's", 'that', 'doing', 'if', 'aren', 'were', 'i', 'yours', 'when', 'and', 'through', "you've", 'only', 'don', "mightn't", 'am', 'my', 'such', 'under', 'd', 'between', 'it', "that'll", 'they', 'other'}
['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
['The', 'quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
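
Notice that 'The' is still present in the filtered list: the stop word set holds 'the' in lowercase, and the membership test is case-sensitive. A minimal sketch of a case-insensitive variant, written as a list comprehension, might look like the following. The helper name `remove_stop_words` and the small hand-written stop word set are assumptions made for illustration, so that the sketch runs without NLTK data; in practice you would pass `set(stopwords.words('english'))`.

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens whose lowercase form is not in stop_words."""
    return [t for t in tokens if t.lower() not in stop_words]


# Tiny stand-in for set(stopwords.words('english')) (an assumption for this sketch)
sample_stop_words = {"the", "over", "a", "an", "in"}

tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
filtered = remove_stop_words(tokens, sample_stop_words)
print(filtered)  # ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```

The original casing of the kept words is preserved; only the comparison is done in lowercase.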

Advantages:

For tasks such as sentiment analysis of movie reviews or Twitter data, removing stop words strips out words that carry little information. This reduces noise in the text and can help in building better models.
