Lemmatizing In Natural Language Processing

We have already covered stemming, which reduces inflected words to a word stem (a base form). Lemmatizing is a closely related technique. The difference is that stemming can produce strings that are not real words, whereas lemmatizing returns actual words. The words produced by lemmatizing are called lemmas.

In other words, a stem might not appear in any dictionary, but a lemma is a dictionary form of the word.
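To make the contrast concrete, here is a deliberately tiny sketch. It is not the Porter stemmer and not WordNet: `toy_stem`, `toy_lemmatize`, `TOY_SUFFIXES`, and `TOY_LEMMAS` are hypothetical names invented for this illustration. The stemmer blindly chops a suffix, while the lemmatizer only ever returns an entry from its small dictionary (or the word unchanged).

```python
# Toy illustration only: real stemmers and lemmatizers are far more thorough.
TOY_SUFFIXES = ("ies", "es", "s")  # hypothetical suffix list
TOY_LEMMAS = {"ponies": "pony", "geese": "goose", "cacti": "cactus"}  # hypothetical dictionary


def toy_stem(word):
    """Strip the first matching suffix without checking that the result is a word."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    return TOY_LEMMAS.get(word, word)


for w in ("ponies", "geese", "cacti"):
    print(w, "| stem:", toy_stem(w), "| lemma:", toy_lemmatize(w))
```

Here "ponies" is stemmed to "pon", which is not a word, while the lookup returns "pony". The irregular plurals "geese" and "cacti" pass through the suffix rules untouched, but the dictionary still resolves them.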

Let's see some examples.

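    # Requires the WordNet corpus: run nltk.download("wordnet") once beforehand.
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()

    # With no pos argument, each word is lemmatized as a noun.
    print(lemmatizer.lemmatize("cats"))              # cat
    print(lemmatizer.lemmatize("cacti"))             # cactus
    print(lemmatizer.lemmatize("geese"))             # goose
    print(lemmatizer.lemmatize("rocks"))             # rock
    print(lemmatizer.lemmatize("python"))            # python
    # pos="a" lemmatizes the word as an adjective.
    print(lemmatizer.lemmatize("better", pos="a"))   # good
    print(lemmatizer.lemmatize("best", pos="a"))     # best
    print(lemmatizer.lemmatize("run"))               # run
    # pos="v" lemmatizes the word as a verb.
    print(lemmatizer.lemmatize("run", pos="v"))      # run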

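You can tell the lemmatizer which part of speech (POS) to assume through the pos parameter. If you leave it out, the word is treated as a noun by default. Keep in mind that you get the closest lemma for the POS you supply, so the same word can yield different lemmas under different POS values.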

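Here is the list of Penn Treebank POS tags, the tag set produced by taggers such as nltk.pos_tag. Note that lemmatize() does not take these tags directly; its pos parameter expects WordNet's single-letter codes: "n" (noun), "v" (verb), "a" (adjective), "r" (adverb), or "s" (satellite adjective).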

CC     coordinating conjunction
CD     cardinal number
DT     determiner
EX     existential there                       "there is" (think of it as "there exists")
FW     foreign word
IN     preposition/subordinating conjunction
JJ     adjective                               big
JJR    adjective, comparative                  bigger
JJS    adjective, superlative                  biggest
LS     list marker                             1)
MD     modal                                   could, will
NN     noun, singular                          desk
NNS    noun, plural                            desks
NNP    proper noun, singular                   Harrison
NNPS   proper noun, plural                     Americans
PDT    predeterminer                           'all' the kids
POS    possessive ending                       parent's
PRP    personal pronoun                        I, he, she
PRP$   possessive pronoun                      my, his, her
RB     adverb                                  very, silently
RBR    adverb, comparative                     better
RBS    adverb, superlative                     best
RP     particle                                give 'up'
TO     to                                      go 'to' the store
UH     interjection                            errrrrrrrm
VB     verb, base form                         take
VBD    verb, past tense                        took
VBG    verb, gerund/present participle         taking
VBN    verb, past participle                   taken
VBP    verb, non-3rd person singular present   take
VBZ    verb, 3rd person singular present       takes
WDT    wh-determiner                           which
WP     wh-pronoun                              who, what
WP$    possessive wh-pronoun                   whose
WRB    wh-adverb                               where, when
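
Because the pos parameter of lemmatize() takes WordNet's single-letter codes rather than the tags listed above, a small mapping step is usually needed between a tagger and the lemmatizer. The following is a minimal sketch of one common approach; `penn_to_wordnet` is a hypothetical helper name, and falling back to "n" for every other tag is an assumption chosen to mirror the lemmatizer's own default.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS code ('a', 'v', 'r', or 'n')."""
    if tag.startswith("JJ"):   # JJ, JJR, JJS
        return "a"
    if tag.startswith("VB"):   # VB, VBD, VBG, VBN, VBP, VBZ
        return "v"
    if tag.startswith("RB"):   # RB, RBR, RBS
        return "r"
    # Assumption: treat everything else as a noun, like lemmatize() does by default.
    return "n"


for tag in ("JJR", "VBD", "RBS", "NNS", "DT"):
    print(tag, "->", penn_to_wordnet(tag))
```

With a tagged token in hand, the result can be passed straight through, for example lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag)).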