In this article, we will see how to use Python to resolve the ambiguity of a word's meaning using the Lesk algorithm.
For example, in the sentences below, the word “bank” has different meanings based on the context of the sentence.
Text1 = 'I went to the bank to deposit my money'
Text2 = 'The river bank was full of dead fishes'
The Lesk algorithm is the seminal dictionary-based method for word sense disambiguation.
This is the definition from Wikipedia: "It is based on the hypothesis that words used together in text are related to each other and that the relation can be observed in the definitions of the words and their senses. Two (or more) words are disambiguated by finding the pair of dictionary senses with the greatest word overlap in their dictionary definitions. It searches for the shortest path between two words: the second word is iteratively searched among the definitions of every semantic variant of the first word, then among the definitions of every semantic variant of each word in the previous definitions and so on. Finally, the first word is disambiguated by selecting the semantic variant which minimizes the distance from the first to the second word."
Basically, the sense of a word is chosen from the meanings of the words nearest to it.
[Figure: simplified pictorial representation of the Lesk algorithm]
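The overlap idea can be sketched in a few lines of plain Python. This is only a toy illustration, not pywsd's implementation: the sense names, the shortened glosses in `TOY_SENSES`, the stopword list, and the function `toy_lesk` are all assumptions made up for this example.

```python
# Toy sense inventory: hypothetical, hand-written glosses (not WordNet data).
TOY_SENSES = {
    "bank": {
        "bank.financial": "a financial institution that accepts deposits of money",
        "bank.river": "sloping land beside a body of water such as a river",
    }
}

# A tiny, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "to", "my", "i", "was", "that", "such", "as"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def toy_lesk(sentence, word):
    """Return the sense whose gloss shares the most words with the sentence."""
    context = tokenize(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in TOY_SENSES[word].items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(toy_lesk("I went to the bank to deposit my money", "bank"))   # bank.financial
print(toy_lesk("The river bank was full of dead fishes", "bank"))   # bank.river
```

In the first sentence the only shared word is "money", and in the second it is "river", so a single overlapping word decides each case. A real implementation works from a full dictionary such as WordNet and normalizes words (for example, so that "deposit" and "deposits" match).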
Let's see the code to implement the Lesk algorithm in Python.
First, install pywsd, a Python implementation of Word Sense Disambiguation (WSD) techniques:
- pip install pywsd
Then disambiguate “bank” in the first sentence:
- from pywsd.lesk import simple_lesk
- sentences = ['I went to the bank to deposit my money',
-              'The river bank was full of dead fishes']
-
- print("Context-1:", sentences[0])
- answer = simple_lesk(sentences[0], 'bank')
- print("Sense:", answer)
- print("Definition:", answer.definition())
Result:
Context-1: I went to the bank to deposit my money
Sense: Synset('depository_financial_institution.n.01')
Definition: a financial institution that accepts deposits and channels the money into lending activities
- print("Context-2:", sentences[1])
- answer = simple_lesk(sentences[1], 'bank')
- print("Sense:", answer)
- print("Definition:", answer.definition())
Result:
Context-2: The river bank was full of dead fishes
Sense: Synset('bank.n.01')
Definition: sloping land (especially the slope beside a body of water)
Observe that in context-1, “bank” is a financial institution, but in context-2, “bank” is sloping land.
Here is another example, this time with the word “plant”:
- new_sentences = ['The workers at the plant were overworked',
-                  'The plant was no longer bearing flowers',
-                  'The workers at the industrial plant were overworked']
-
- print("Context-1:", new_sentences[0])
- answer = simple_lesk(new_sentences[0], 'plant')
- print("Sense:", answer)
- print("Definition:", answer.definition())
Result (not quite what we expect: “plant” is a noun here, but a verb sense is returned):
Context-1: The workers at the plant were overworked
Sense: Synset('plant.v.06')
Definition: put firmly in the mind
- print("Context-2:", new_sentences[1])
- answer = simple_lesk(new_sentences[1], 'plant')
- print("Sense:", answer)
- print("Definition:", answer.definition())
Result (closer to what we expect: the sense is botanical, although it is again a verb sense rather than the noun):
Context-2: The plant was no longer bearing flowers
Sense: Synset('plant.v.01')
Definition: put or set (seeds, seedlings, or plants) into the ground
- print("Context-3:", new_sentences[2])
- answer = simple_lesk(new_sentences[2], 'plant')
- print("Sense:", answer)
- print("Definition:", answer.definition())
Result (as expected). One extra word, “industrial”, is enough to change the outcome:
Context-3: The workers at the industrial plant were overworked
Sense: Synset('plant.n.01')
Definition: buildings for carrying on industrial labor
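To see why one extra word matters, here is a rough sketch that counts the word overlap between two of the sentences and the three definitions printed above. It is an illustration under simplifying assumptions, not a reproduction of what pywsd computes: only the bare definitions are compared, and the stopword list and the helper names `content_words` and `overlap_scores` are made up for this example.

```python
import re

# Definitions copied from the three results above, keyed by the sense names shown there.
PLANT_GLOSSES = {
    "plant.v.06": "put firmly in the mind",
    "plant.v.01": "put or set (seeds, seedlings, or plants) into the ground",
    "plant.n.01": "buildings for carrying on industrial labor",
}

# Small hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"the", "at", "were", "was", "in", "or", "on", "for", "into", "no"}


def content_words(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def overlap_scores(sentence, word, glosses):
    """Count, for each sense, how many context words also appear in its gloss."""
    context = content_words(sentence) - {word}
    return {sense: len(context & content_words(gloss))
            for sense, gloss in glosses.items()}


print(overlap_scores("The workers at the plant were overworked",
                     "plant", PLANT_GLOSSES))
# {'plant.v.06': 0, 'plant.v.01': 0, 'plant.n.01': 0}
print(overlap_scores("The workers at the industrial plant were overworked",
                     "plant", PLANT_GLOSSES))
# {'plant.v.06': 0, 'plant.v.01': 0, 'plant.n.01': 1}
```

Without “industrial”, none of the three definitions shares a content word with the sentence, so overlap alone gives no reason to prefer one sense. With it, the industrial sense gains the only match. If the part of speech is known, it may also help to restrict the candidates: as far as I recall, `simple_lesk` accepts a `pos` argument (for example `pos='n'`), but check this against the pywsd version you have installed.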
Simple Lesk sits between the other two variants: it uses more signature words than the original Lesk algorithm (Lesk, 1986) but fewer than adapted Lesk (Banerjee and Pedersen, 2002).
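One way to picture "signature words" is as the bag of words attached to each sense, which grows as more sources are added. The sketch below is a loose illustration only: the record `TOY_SENSE`, its field names, its example and related texts, and the three `toy_*` functions are hypothetical, and the exact sources that pywsd draws on for each variant may differ.

```python
# Hypothetical toy record for one sense; field names and texts are illustrative only.
TOY_SENSE = {
    "definition": "buildings for carrying on industrial labor",
    "examples": ["they built a large plant to manufacture automobiles"],
    # Stand-in for glosses of related senses (an assumption for this sketch).
    "related_definitions": ["a building complex where goods are manufactured"],
}


def words(text):
    """Split a text into a set of lowercase words."""
    return set(text.lower().split())


def toy_original_signature(sense):
    """Smallest signature: the words of the definition only."""
    return words(sense["definition"])


def toy_simple_signature(sense):
    """Medium signature: definition words plus example-sentence words."""
    signature = toy_original_signature(sense)
    for example in sense["examples"]:
        signature |= words(example)
    return signature


def toy_adapted_signature(sense):
    """Largest signature: additionally pulls in words from related senses."""
    signature = toy_simple_signature(sense)
    for gloss in sense["related_definitions"]:
        signature |= words(gloss)
    return signature


for build in (toy_original_signature, toy_simple_signature, toy_adapted_signature):
    print(build.__name__, len(build(TOY_SENSE)))
```

A larger signature gives the context more chances to overlap with the right sense, at the cost of more chances to overlap with the wrong one.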