How to remove punctuation and stopwords in Python NLTK

In this tutorial, you will learn how to write a program that removes punctuation and stopwords in Python using the NLTK library.

How to remove punctuation in Python NLTK

We will use a regular expression with NLTK's RegexpTokenizer, which keeps only the word tokens and discards the punctuation.

from nltk.tokenize import RegexpTokenizer
# \w+ matches runs of letters, digits and underscores, so punctuation is dropped
tokenizer = RegexpTokenizer(r'\w+')
result = tokenizer.tokenize('hey! how are you ? buddy')
print(result)

Output:

['hey', 'how', 'are', 'you', 'buddy']
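RegexpTokenizer with its default settings returns every match of the pattern, so for this input it should give the same tokens as Python's built-in re.findall. The sketch below assumes only the standard library and shows that equivalence, plus one alternative (str.translate with string.punctuation) that treats contractions differently. The phrase "don't stop!" is our own illustrative input, not part of the original tutorial.

```python
import re
import string

text = "hey! how are you ? buddy"

# Same pattern as RegexpTokenizer(r'\w+'): keep runs of word characters
# (letters, digits, underscore) and skip everything else.
tokens = re.findall(r"\w+", text)
print(tokens)  # ['hey', 'how', 'are', 'you', 'buddy']

# Alternative: delete every ASCII punctuation character, then split on whitespace.
table = str.maketrans("", "", string.punctuation)
no_punct = "don't stop!".translate(table).split()
print(no_punct)  # ['dont', 'stop']

# For comparison, \w+ splits the contraction at the apostrophe instead.
split_words = re.findall(r"\w+", "don't stop!")
print(split_words)  # ['don', 't', 'stop']
```

Which behavior is preferable depends on the downstream task: \w+ yields a stray "t" token, while the translate approach merges the contraction into "dont".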

How to remove stopwords in Python NLTK

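from nltk.corpus import stopwords
from nltk import word_tokenize
# One-time setup: nltk.download('stopwords') plus the Punkt tokenizer models
# ('punkt', or 'punkt_tab' in newer NLTK releases) used by word_tokenize
stop_words = set(stopwords.words('english'))
text = word_tokenize("The quick brown fox jumps over the lazy dog")
# Keep only the tokens that are not in the stop-word list
new_sentence = []
for w in text:
    if w not in stop_words:
        new_sentence.append(w)
print(text)
print(new_sentence)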

Output:

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
['The', 'quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
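In the output above, "the" and "over" are removed but "The" is kept: the membership test is case-sensitive and NLTK's English stop-word list is lowercase. Below is a minimal sketch that lowercases each token only for the lookup and also drops punctuation with the \w+ pattern from the first section. It assumes a small hand-picked set as a stand-in for set(stopwords.words('english')) so it runs without downloading the corpus; clean_tokens is a hypothetical helper name.

```python
import re

def clean_tokens(text, stop_words):
    """Split text into word tokens (dropping punctuation) and remove stop words.

    Tokens are compared in lowercase, so "The" and "the" are both treated as
    stop words; the surviving tokens keep their original casing.
    """
    tokens = re.findall(r"\w+", text)  # same pattern as RegexpTokenizer(r'\w+')
    return [w for w in tokens if w.lower() not in stop_words]

# Hypothetical stand-in for set(stopwords.words('english')), which is
# much longer and entirely lowercase.
stop_words = {"the", "over", "a", "an", "is"}

result = clean_tokens("The quick brown fox jumps over the lazy dog!", stop_words)
print(result)  # ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```

To use the real list, pass set(stopwords.words('english')) as stop_words after downloading the corpus.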

admin