The Porter Method - An Approach to Stemming in Information Retrieval and Text Analysis

The Porter Stemming Algorithm, or simply the “Porter Stemmer,” is a widely used technique in information retrieval and natural language processing (NLP) for reducing words to their root or base forms. Developed by Martin Porter in 1979 and published in 1980, it strips suffixes (it does not remove prefixes) from English words so that morphologically related words, such as “connect,” “connected,” and “connecting,” map to the same stem. This simplifies text processing and makes retrieval operations such as database searches more effective.

As an NLP technique, stemming reduces inflected or derived words to their stems, roots, or base forms. It mainly improves recall in search engines and text analysis tools: by treating different forms of a word as one term, it helps a system capture what a document or collection of texts is about, which makes text analytics more effective and retrieval results more relevant.

For instance, the Porter method reduces “fishing,” “fished,” and “fishes” to the stem “fish” (“fisher,” by contrast, is left unchanged, because Porter only strips “-er” when the remaining stem is long enough). Note that a stem does not have to be a valid word: the Porter algorithm reduces “argue,” “argued,” “argues,” “arguing,” and “argus” to the common stem “argu.”
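The rules behind these examples can be sketched in C#. The following is a minimal, partial sketch of three of Porter’s rule groups (steps 1a, 1b, and 5a, which handle plurals, “-ed”/“-ing,” and a final “-e”), following his published rules. The function names (Stem, Step1a, and so on) are our own, and steps 1c, 2, 3, 4, and 5b are omitted, so this is not a complete Porter stemmer. It assumes C# 9 or later (top-level statements and range syntax).

```csharp
using System;

// Print the stem of each example word from the text above.
foreach (var example in new[] { "fishing", "fished", "fishes", "argue", "argued", "argues", "arguing", "argus" })
    Console.WriteLine($"{example} -> {Stem(example)}");

// Porter's definition: a, e, i, o, u are vowels; 'y' is a consonant at the start of a
// word or after a vowel ("toy"), and a vowel after a consonant ("syzygy").
bool IsConsonant(string w, int i)
{
    char c = w[i];
    if ("aeiou".IndexOf(c) >= 0) return false;
    if (c == 'y') return i == 0 || !IsConsonant(w, i - 1);
    return true;
}

// The measure m counts vowel-to-consonant transitions: the m in [C](VC)^m[V].
int Measure(string stem)
{
    int m = 0;
    bool previousWasVowel = false;
    for (int i = 0; i < stem.Length; i++)
    {
        bool isVowel = !IsConsonant(stem, i);
        if (previousWasVowel && !isVowel) m++;
        previousWasVowel = isVowel;
    }
    return m;
}

// *v*: the stem contains a vowel.
bool ContainsVowel(string stem)
{
    for (int i = 0; i < stem.Length; i++)
        if (!IsConsonant(stem, i)) return true;
    return false;
}

// *o: the stem ends consonant-vowel-consonant and the last letter is not w, x or y.
bool EndsCvc(string w)
{
    int n = w.Length;
    return n >= 3 && IsConsonant(w, n - 3) && !IsConsonant(w, n - 2)
        && IsConsonant(w, n - 1) && "wxy".IndexOf(w[n - 1]) < 0;
}

// Step 1a: plurals (sses -> ss, ies -> i, ss -> ss, s -> removed).
string Step1a(string w)
{
    if (w.EndsWith("sses") || w.EndsWith("ies")) return w[..^2];
    if (w.EndsWith("ss")) return w;
    if (w.EndsWith("s")) return w[..^1];
    return w;
}

// Step 1b: -eed, -ed and -ing, followed by Porter's clean-up rules.
string Step1b(string w)
{
    if (w.EndsWith("eed")) return Measure(w[..^3]) > 0 ? w[..^1] : w;

    string stem;
    if (w.EndsWith("ed") && ContainsVowel(w[..^2])) stem = w[..^2];
    else if (w.EndsWith("ing") && ContainsVowel(w[..^3])) stem = w[..^3];
    else return w;

    if (stem.EndsWith("at") || stem.EndsWith("bl") || stem.EndsWith("iz")) return stem + "e";
    bool doubleConsonant = stem.Length >= 2 && stem[^1] == stem[^2] && IsConsonant(stem, stem.Length - 1);
    if (doubleConsonant && "lsz".IndexOf(stem[^1]) < 0) return stem[..^1]; // "hopping" -> "hopp" -> "hop"
    if (Measure(stem) == 1 && EndsCvc(stem)) return stem + "e";           // "hoping" -> "hop" -> "hope"
    return stem;
}

// Step 5a: drop a final -e when the remaining stem is long enough.
string Step5a(string w)
{
    if (!w.EndsWith("e")) return w;
    string stem = w[..^1];
    int m = Measure(stem);
    return m > 1 || (m == 1 && !EndsCvc(stem)) ? stem : w;
}

// Only steps 1a, 1b and 5a; steps 1c, 2, 3, 4 and 5b are omitted.
string Stem(string word)
{
    string w = word.ToLowerInvariant();
    if (w.Length <= 2) return w; // Porter's reference C code leaves 1- and 2-letter words alone.
    return Step5a(Step1b(Step1a(w)));
}
```

With these three steps alone, “fishing,” “fished,” and “fishes” already reduce to “fish,” and all five “argu” examples reduce to “argu”; the omitted steps deal with longer derivational suffixes such as “-ational” and “-ness.”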

What is the aim of NLP stemming?

Stemming in natural language processing (NLP) makes textual data easier to process and analyze in several ways. By reducing different forms of a word to a common root, it promotes uniformity, improves search results, saves storage space, speeds up text processing, and helps standardize text across documents. These benefits come at a cost: overly aggressive truncation can merge words with different meanings. Even so, stemming is a core technique in NLP, text analytics, and information retrieval, and many companies and products use it in their text processing and search algorithms.

Let’s look at each aspect in turn:

  1. Uniformity: Words change form for grammatical reasons (tense, number, and so on). Stemming maps these inflected forms of the same term to a single underlying unit.
    For example, “runs,” “running,” and “run” all reduce to the stem “run.” Irregular forms such as “ran” are not handled by suffix stripping; mapping them to “run” requires lemmatization.
  2. Improving Search Results: In information retrieval systems such as search engines, stemming both the query and the documents broadens matching: a search for “running” can also return documents that contain only “run” or “runs,” which increases the likelihood of finding relevant information.
  3. Space Efficiency: Because many forms of a word collapse to a single stem, stemming can reduce the vocabulary an index has to store, which saves storage space and can speed up processing.
  4. Improving Text Processing: Stemming is useful when building text analytics models because it reduces the vocabulary size. This simplifies the model and shifts the focus toward meaning rather than specific word forms.
  5. Text Standardization: Stemming helps standardize text. Different writers may use different forms of the same term (“connect,” “connected,” “connection”), and stemming maps them to a common representation across texts.
  6. Drawbacks: Stemming also has drawbacks, such as loss of meaning when words are truncated too aggressively (over-stemming). For example, the Porter algorithm reduces both “university” and “universe” to “univers.”
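The search, space, and vocabulary points above can be illustrated with a small inverted index. This sketch assumes a made-up three-document corpus and a deliberately tiny, hypothetical LightStem function (not Porter) that strips “-ing,” “-ed,” or “-s” when at least three letters remain. It indexes the documents twice, once with exact lowercase terms and once with stems, and then looks up the query “running” in both indexes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical three-document corpus, made up for this illustration.
string[] documents =
{
    "Sparkle runs along the enchanted path",
    "The mole was running through the valley",
    "A princess guards the castle",
};

// Build an inverted index from each term to the ids of the documents containing it.
// With stem = true, every token is stemmed before it is indexed.
Dictionary<string, SortedSet<int>> BuildIndex(bool stem)
{
    var index = new Dictionary<string, SortedSet<int>>();
    for (int id = 0; id < documents.Length; id++)
    {
        foreach (var token in documents[id].Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            string term = stem ? LightStem(token) : token.ToLowerInvariant();
            if (!index.TryGetValue(term, out var ids))
                index[term] = ids = new SortedSet<int>();
            ids.Add(id);
        }
    }
    return index;
}

// Hypothetical, deliberately tiny stemmer (not Porter): strips -ing, -ed or -s when at
// least three letters remain, never splits "ss", and undoes a doubled final consonant.
string LightStem(string word)
{
    string w = word.ToLowerInvariant();
    foreach (var suffix in new[] { "ing", "ed", "s" })
    {
        if (w.EndsWith(suffix) && !w.EndsWith("ss") && w.Length - suffix.Length >= 3)
        {
            w = w[..^suffix.Length];
            if (suffix != "s" && w[^1] == w[^2] && "aeiouls".IndexOf(w[^1]) < 0)
                w = w[..^1]; // "runn" -> "run"
            break;
        }
    }
    return w;
}

// Look up a term; an unknown term matches no documents.
IEnumerable<int> Search(Dictionary<string, SortedSet<int>> index, string term) =>
    index.TryGetValue(term, out var ids) ? ids : Enumerable.Empty<int>();

var exact = BuildIndex(stem: false);
var stemmed = BuildIndex(stem: true);

Console.WriteLine($"Exact match for \"running\": [{string.Join(", ", Search(exact, "running"))}]");
Console.WriteLine($"Stemmed match for \"running\": [{string.Join(", ", Search(stemmed, LightStem("running")))}]");
Console.WriteLine($"Distinct terms: {exact.Count} exact vs {stemmed.Count} stemmed");
```

Under these assumptions, the exact index matches only document 1, while the stemmed index returns documents 0 and 1 because “runs” and “running” share the stem “run,” and the stemmed vocabulary is one term smaller. A real system would use a proper stemmer such as Porter’s rather than this toy rule set.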

Various companies and products use stemming in their text processing and search algorithms, particularly in search engines, text analytics, natural language processing, and information retrieval. Here are some examples of companies and products that use, or are likely to use, stemming:

  1. Search Engines
    • Google: Google’s search algorithms are widely assumed to use stemming to match queries with other forms of the same words.
    • Bing (Microsoft): Like Google, Bing is believed to use stemming to improve the relevance of its search results.
  2. E-commerce Platforms
    • Amazon: Amazon’s search and recommendation algorithms likely use stemming to match products with user queries and preferences.
    • eBay: eBay’s search functionality likely uses stemming to help users find what they are looking for more quickly.
  3. Social Media Platforms
    • Twitter: Twitter may employ stemming for trend analysis and keyword categorization of messages.
    • Facebook: Facebook may use stemming for information retrieval and text analysis in features such as search and content recommendation.
  4. Content Management Systems (CMS) and Blogging Platforms
    • WordPress: Some WordPress search plugins use stemming to improve site search and SEO.
  5. Learning Management Systems (LMS)
    • Platforms like Moodle or Blackboard may employ stemming to improve search capabilities and content categorization.
  6. Frameworks and Libraries
    • NLTK (Natural Language Toolkit): A prominent Python natural language processing toolkit that includes several stemmers, such as the Porter, Lancaster, and Snowball stemmers.
    • Apache Lucene: A search library that provides stemming filters, including a Porter stemmer, and is used by many products and platforms.
    • Elasticsearch: Built on Lucene, Elasticsearch offers stemming in its full-text search through its language analyzers and stemmer token filters.
  7. SEO Tools
    • Yoast SEO: A prominent WordPress SEO tool that may employ stemming to analyze and optimize content for search engines.
    • SEMrush, Ahrefs, and Moz: SEO analysis tools that may employ stemming to analyze and recommend keywords.
  8. Customer Service and CRM Systems
    • Zendesk, HubSpot: To better categorize and respond to client requests, these platforms may use stemming in their search and automated response systems.

Note: Because the details of how proprietary systems such as those of Google and Amazon use stemming are not publicly disclosed, this list is a broad generalization based on common industry practice.

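Here is a simple stemming example in C#. Note that it does not implement the Porter algorithm: it strips suffixes from a fixed list and groups the words of a story that end up with the same stem. It uses top-level statements and a raw string literal, so it needs C# 11 or later.

using System;
using System.Collections.Generic;

string sparkle = """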
    Once in a vibrant valley dotted with flowers and rainbows, lived a magical pony named Sparkle. Sparkle wasn’t just any pony; she was blessed with wings that shimmered under the sunlight, and a mane that flowed like a waterfall. Every day, she would frolic around, leaving a trail of glitter and joy wherever she went.

    One curious morning, as the dew still clung to the petals, Sparkle discovered something unusual. "What's this?" she wondered, finding a path she had never seen before. The path was sprinkled with golden dust, and it seemed to beckon her. Filled with curiosity and excitement, she decided to follow it, embarking on an adventure she hadn’t anticipated.

    As she trotted along, Sparkle encountered Mr. Mole, who popped out of the ground. "Good morning, Sparkle!" he exclaimed. "You seem to be on a fascinating journey today. Where are you headed?" Mr. Mole asked, his eyes twinkling with curiosity.

    Sparkle, with a flick of her luxurious mane, replied, "Hello, Mr. Mole! I found this enchanting path and decided to see where it leads. Would you like to join me?" she offered, her eyes sparkling with the spirit of adventure.

    "Indeed, I would love to!" Mr. Mole accepted, clambering out of his hole. Together, they continued along the mysterious path. As they journeyed, they encountered several surprises; a waterfall that sang melodies, trees that told stories, and flowers that danced in rhythm.

    The path soon led them to a magnificent castle that stood tall and majestic against the horizon. "Wow! Who lives here?" Sparkle questioned, her eyes wide with awe and wonder.

    Just then, a soft and melodious voice answered, "Welcome, dear Sparkle and Mr. Mole! I am Princess Liana, the guardian of this magical realm." A beautiful princess appeared before them, adorned in a gown that sparkled like the night sky.

    "Princess Liana," Sparkle exclaimed, "your castle is extraordinary! But why have we never seen it before?" She asked, fluttering her wings with excitement.

    Princess Liana smiled gently, "This castle is magical, and it reveals itself only to those pure of heart and filled with curiosity. You are here because you are special, Sparkle," she explained, her voice as soft as a whisper.

    Their hearts filled with gratitude and excitement, Sparkle and Mr. Mole spent the day exploring the castle, uncovering its many secrets and marvels. The castle was a realm of endless possibilities, where every room told a different tale and every corridor led to a new adventure.

    As the day came to a close, Princess Liana shared a secret, "Sparkle, this castle holds a magical gem that grants wishes. But it must be found before the stroke of midnight," she revealed, her eyes shimmering like stars.

    Touched by the princess’s trust, Sparkle and Mr. Mole embarked on a quest to find the magical gem. Their journey was sprinkled with challenges and trials, but their hearts were steadfast, and their spirits were unyielding.

    Every room they entered presented a riddle, "Solve this puzzle, and you may proceed," echoed a mysterious voice. With determination and wisdom, Sparkle and Mr. Mole solved each riddle, moving closer to finding the magical gem.

    As the clock ticked and midnight neared, they finally reached a chamber where the gem rested, radiating a light as bright as a thousand suns. "We found it!" Sparkle exclaimed, her heart racing with joy and anticipation.

    "Make your wish, Sparkle," urged Mr. Mole, "and let the magic unfold!" His eyes were filled with hope and admiration for their incredible journey.

    With a heart full of dreams and eyes closed tight, Sparkle made a wish. A wave of magic enveloped them, and the castle shimmered with a light more magnificent than ever before.

    As they stepped outside, they realized the magic had transformed the valley, making it even more enchanting and beautiful. The flowers bloomed brighter, the rainbows shone more vividly, and a melody of joy echoed through the air.

    Princess Liana, with gratitude in her eyes, said, "Thank you, Sparkle and Mr. Mole. Your hearts have brought magic and wonder to our realm. Remember, the path of curiosity and courage always leads to enchantment," she whispered, as the castle disappeared into the soft glow of the moonlight.

    Sparkle and Mr. Mole, their hearts filled with memories and magic, returned to their valley, carrying tales of an adventure that would be told for generations. And so, the legend of Sparkle’s magical journey lived on, inspiring hearts to believe in the power of curiosity, courage, and dreams.
    """;
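// Group the words of the story by stem and print every group with more than one member.
var textProcessor = new TextProcessor();
textProcessor.ProcessAndLogSimilarWords(sparkle);

class TextProcessor
{
    // A naive suffix stripper, not the Porter algorithm: it removes the first suffix in
    // the list that the word ends with. There is no minimum stem length, so short
    // words are cut too ("as" -> "a", "is" -> "i").
    private string StemWord(string word)
    {
#if (LARGE_SUFFIXEX)
        // Extended list, compiled in when LARGE_SUFFIXEX is defined.
        // Order matters: the first entry that matches wins.
        string[] suffixes = {
            // Noun suffixes
            "ment", "ness", "tion", "sion", "ship", "age", "ance", "ence", "er", "or", "ist", "ty", "ity", "cy", "dom", "ism",

            // Verb suffixes
            "ate", "en", "ify", "fy", "ize", "ise",

            // Adjective suffixes
            "able", "ible", "al", "esque", "ful", "ic", "ical", "ious", "ous", "ish", "ive", "less", "y",

            // Adverb suffixes
            "ly", "ward", "wise",

            // Plural and verb forms
            "s", "es", "ies", "ed", "ing", "er", "est",

            // Prefixes (for completeness, even though they are not technically suffixes).
            // EndsWith only checks the end of the word, so these are stripped from
            // the end as well (e.g. "join" -> "jo").
            "un", "re", "in", "im", "non", "dis", "pre", "post", "anti", "de", "fore", "hyper", "semi", "sub", "super", "trans"
        };
#else
        // Default list of common inflectional endings.
        string[] suffixes = { "ing", "ed", "ly", "es", "s", "ment" };
#endif

        foreach (var suffix in suffixes)
        {
            if (word.EndsWith(suffix))
            {
                // word[..^n] drops the last n characters (C# 8 range syntax).
                return word[..^suffix.Length];
            }
        }
        return word;
    }

    public void ProcessAndLogSimilarWords(string text)
    {
        // Split on whitespace and basic punctuation. Quotation marks are not
        // separators, so a lone '"' can end up as a "word".
        string[] words = text.Split(new[] { '\r', '\n', ' ', '.', ',', ';', '!', '?' }, StringSplitOptions.RemoveEmptyEntries);

        // Maps each stem to the words (in their original casing) that produced it.
        Dictionary<string, List<string>> stemmedWords = new Dictionary<string, List<string>>();

        foreach (var word in words)
        {
            // Lowercase first so that "The" and "the" share a stem.
            string stemmedWord = StemWord(word.ToLower());
            if (!stemmedWords.ContainsKey(stemmedWord))
            {
                stemmedWords[stemmedWord] = new List<string>();
            }

            stemmedWords[stemmedWord].Add(word);
        }

        // Print only the stems that more than one word mapped to.
        foreach (var entry in stemmedWords)
        {
            if (entry.Value.Count > 1)
            {
                Console.WriteLine($"Similar words: {string.Join(", ", entry.Value)}");
            }
        }
    }
}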

This code outputs:

Similar words: in, in, in, in, in
Similar words: a, a, a, a, a, as, a, As, a, a, As, a, a, a, A, a, as, as, a, a, a, a, As, a, a, a, a, a, a, As, a, a, as, as, a, a, a, A, a, As, a, as
Similar words: valley, valley, valley
Similar words: with, with, with, with, with, with, with, with, with, with, with, with, With, with, with, With, with, with, with
Similar words: flowers, flowers, flowers
Similar words: and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, And, and
Similar words: rainbows, rainbows
Similar words: lived, lives, lived
Similar words: magical, magical, magical, magical, magical, magical, magical
Similar words: pony, pony
Similar words: Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle
Similar words: just, Just
Similar words: she, she, she, she, she, she, she, she, she, She, she, she, she
Similar words: was, was, was, was
Similar words: wings, wings
Similar words: that, that, that, that, that, that, that, that, that
Similar words: shimmered, shimmering, shimmered
Similar words: the, the, the, The, the, the, the, The, the, the, the, the, the, The, the, the, the, the, the, the, the, the, the, the, the, The, the, the, the, the, the, the, the, the
Similar words: mane, mane
Similar words: like, like, like, like
Similar words: waterfall, waterfall
Similar words: Every, every, every, Every
Similar words: day, day, day
Similar words: would, Would, would, would
Similar words: of, of, of, of, of, of, of, of, of, of, of, of, of, of, of, of, of
Similar words: joy, joy, joy
Similar words: morning, morning
Similar words: to, to, to, to, to, to, to, to, to, to, to, to, to, to, to, to, to
Similar words: this, this, this, this, this
Similar words: ", ", ", ", ", ", ", ", ", ", ", ", ", ", ", "
Similar words: wondered, wonder, wonder
Similar words: finding, find, finding
Similar words: path, path, path, path, path, path
Similar words: had, had
Similar words: never, never
Similar words: seen, seen
Similar words: before, before, before, before, before
Similar words: sprinkled, sprinkled
Similar words: it, it, it, it, it, its, it, it, it
Similar words: seemed, seem
Similar words: her, her, her, her, her, her, her, her, her
Similar words: Filled, filled, filled, filled, filled
Similar words: curiosity, curiosity, curiosity, curiosity, curiosity
Similar words: excitement, excitement, excitement
Similar words: decided, decided
Similar words: embarking, embarked
Similar words: on, on, only, on, on
Similar words: an, an
Similar words: adventure, adventure, adventure, adventure
Similar words: along, along
Similar words: encountered, encountered
Similar words: Mr, Mr, Mr, Mr, Mr, Mr, Mr, Mr, Mr, Mr, Mr
Similar words: Mole, Mole, Mole, Mole, Mole, Mole, Mole, Mole, Mole, Mole, Mole
Similar words: who, Who
Similar words: out, out
Similar words: exclaimed, exclaimed, exclaimed
Similar words: be, be, be
Similar words: journey, journeyed, journey, journey, journey
Similar words: Where, where, where, where
Similar words: are, are, are
Similar words: you, you, You, you, you, you
Similar words: asked, asked
Similar words: his, his, His
Similar words: eyes, eyes, eyes, eyes, eyes, eyes, eyes
Similar words: I, I, I, is, is
Similar words: found, found, found
Similar words: enchanting, enchanting, enchantment
Similar words: leads, leads
Similar words: sparkling, sparkled
Similar words: spirit, spirits
Similar words: they, they, they, they, they, they, they
Similar words: mysterious, mysterious
Similar words: told, told, told
Similar words: led, led
Similar words: them, them, them
Similar words: magnificent, magnificent
Similar words: castle, castle, castle, castle, castle, castle, castle, castle
Similar words: here, here
Similar words: soft, soft, soft
Similar words: voice, voice, voice
Similar words: Princess, princess, Princess, Princess, Princess
Similar words: Liana, Liana, Liana, Liana, Liana
Similar words: realm, realm, realm
Similar words: beautiful, beautiful
Similar words: But, But, but
Similar words: have, have
Similar words: reveals, revealed
Similar words: heart, hearts, hearts, heart, heart, hearts, hearts, hearts
Similar words: whisper, whispered
Similar words: Their, Their, their, their, their, their, their
Similar words: gratitude, gratitude
Similar words: secrets, secret
Similar words: room, room
Similar words: gem, gem, gem, gem
Similar words: wishes, wish, wish
Similar words: midnight, midnight
Similar words: were, were, were
Similar words: riddle, riddle
Similar words: echoed, echoed
Similar words: light, light
Similar words: your, Your
Similar words: magic, magic, magic, magic, magic
Similar words: for, for
Similar words: dreams, dreams
Similar words: more, more, more
Similar words: courage, courage
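The output above also shows where this naive stripper goes wrong: because there is no minimum stem length, “as” is grouped with “a,” “is” with “I,” and “only” with “on.” One common mitigation, sketched below as an assumption rather than part of the original sample, is to remove a suffix only when enough letters remain. The minStem parameter is hypothetical; setting it to 0 reproduces the behavior of the original StemWord.

```csharp
using System;

// Same short suffix list as the default StemWord above.
string[] suffixes = { "ing", "ed", "ly", "es", "s", "ment" };

// Hypothetical variant: strip the first listed suffix that leaves at least minStem
// letters; if a suffix would leave too little, try the next one, else keep the word.
string StemWord(string word, int minStem = 3)
{
    foreach (var suffix in suffixes)
    {
        if (word.EndsWith(suffix) && word.Length - suffix.Length >= minStem)
            return word[..^suffix.Length];
    }
    return word;
}

foreach (var w in new[] { "as", "is", "only", "its", "lived", "lives", "finding", "find" })
    Console.WriteLine($"{w} -> {StemWord(w)}");
```

With minStem = 3, “as,” “is,” “only,” and “its” are left intact, while “lived” and “lives” still share the stem “liv” and “finding” still reduces to “find.” The threshold is a trade-off: a higher value avoids more false groupings but also stops genuine short stems from being found.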

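Next, add this preprocessor directive at the very top of the file (C# accepts #define only before any other code, including using directives):

#define LARGE_SUFFIXEX

With LARGE_SUFFIXEX defined, StemWord uses the extended suffix list below, which adds derivational suffixes such as “-ness,” “-ful,” and “-ize,” along with some common prefixes: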

string[] suffixes = {
    // Noun suffixes
    "ment", "ness", "tion", "sion", "ship", "age", "ance", "ence", "er", "or", "ist", "ty", "ity", "cy", "dom", "ism",

    // Verb suffixes
    "ate", "en", "ify", "fy", "ize", "ise",

    // Adjective suffixes
    "able", "ible", "al", "esque", "ful", "ic", "ical", "ious", "ous", "ish", "ive", "less", "y",

    // Adverb suffixes
    "ly", "ward", "wise",

    // Plural and verb forms
    "s", "es", "ies", "ed", "ing", "er", "est",

    // Prefixes (for completeness, even though they are not technically suffixes)
    "un", "re", "in", "im", "non", "dis", "pre", "post", "anti", "de", "fore", "hyper", "semi", "sub", "super", "trans"
};

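With the extended list, the output changes. Some new groups look reasonable, such as “tale, tales” and “bright, brighter,” but the more aggressive stripping also produces false matches such as “joy, join,” “may, made,” and “ever, even”: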

Similar words: in, in, in, in, in
Similar words: a, a, a, a, a, as, a, As, a, are, a, As, a, a, a, A, a, are, are, as, as, a, a, a, a, As, a, a, a, a, a, a, As, a, a, as, as, a, a, a, A, a, As, a, as
Similar words: valley, valley, valley
Similar words: with, with, with, with, with, with, with, with, with, with, with, with, With, with, with, With, with, with, with
Similar words: flowers, flowers, flowers
Similar words: and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, And, and
Similar words: rainbows, rainbows
Similar words: lived, lived
Similar words: magical, magical, magical, magical, magical, magical, magical
Similar words: pony, pony
Similar words: Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle, Sparkle
Similar words: just, Just
Similar words: any, an, an
Similar words: she, she, she, she, she, she, she, she, she, She, she, she, she
Similar words: was, was, was, was
Similar words: wings, wings
Similar words: that, that, that, that, that, that, that, that, that
Similar words: shimmered, shimmering, shimmered
Similar words: the, the, the, The, the, the, they, the, they, they, The, the, the, the, the, the, The, the, the, the, the, they, the, the, they, the, the, the, they, they, the, the, The, the, the, the, the, the, the, the, the
Similar words: mane, mane
Similar words: like, like, like, like
Similar words: waterfall, waterfall
Similar words: Every, every, every, Every
Similar words: day, day, day
Similar words: would, Would, would, would
Similar words: of, of, of, of, of, of, of, of, of, of, of, of, of, of, of, of, of
Similar words: joy, join, joy, joy
Similar words: morning, morning
Similar words: to, to, to, to, to, to, to, to, to, to, to, to, to, to, to, to, to
Similar words: this, this, this, this, this
Similar words: ", ", ", ", ", ", ", ", ", ", ", ", ", ", ", "
Similar words: finding, find, finding
Similar words: path, path, path, path, path, path
Similar words: had, had
Similar words: never, never
Similar words: seen, seen
Similar words: before, before, before, before, before
Similar words: sprinkled, sprinkled
Similar words: it, it, it, it, it, its, it, it, it
Similar words: seemed, seem
Similar words: her, her, her, her, her, her, her, her, her
Similar words: Filled, filled, filled, filled, filled
Similar words: curiosity, curiosity, curiosity, curiosity, curiosity
Similar words: excitement, excitement, excitement
Similar words: decided, decided
Similar words: embarking, embarked
Similar words: on, on, on, on
Similar words: adventure, adventure, adventure, adventure
Similar words: along, along
Similar words: encountered, encountered
Similar words: Mr, Mr, Mr, Mr, Mr, Mr, Mr, Mr, Mr, Mr, Mr
Similar words: Mole, Mole, Mole, Mole, Mole, Mole, Mole, Mole, Mole, Mole, Mole
Similar words: who, Who
Similar words: out, out
Similar words: he, here, here
Similar words: exclaimed, exclaimed, exclaimed
Similar words: be, be, be
Similar words: journey, journey, journey, journey
Similar words: Where, where, where, where
Similar words: you, you, You, you, you, you
Similar words: asked, asked
Similar words: his, his, His
Similar words: eyes, eyes, eyes, eyes, eyes, eyes, eyes
Similar words: I, I, I, is, is
Similar words: found, found, found
Similar words: enchanting, enchanting, enchantment
Similar words: leads, leads
Similar words: sparkling, sparkled
Similar words: spirit, spirits
Similar words: mysterious, mysterious
Similar words: told, told, told
Similar words: led, led
Similar words: them, them, them
Similar words: magnificent, magnificent
Similar words: castle, castle, castle, castle, castle, castle, castle, castle
Similar words: wonder, wonder
Similar words: soft, soft, soft
Similar words: melodious, melody
Similar words: voice, voice, voice
Similar words: Princess, princess, Princess, Princess, Princess
Similar words: Liana, Liana, Liana, Liana, Liana
Similar words: realm, realm, realm
Similar words: beautiful, beautiful
Similar words: But, But, but
Similar words: have, have
Similar words: we, were, were, were
Similar words: reveals, revealed
Similar words: heart, hearts, hearts, heart, heart, hearts, hearts, hearts
Similar words: Their, Their, their, their, their, their, their
Similar words: gratitude, gratitude
Similar words: secrets, secret
Similar words: room, room
Similar words: tale, tales
Similar words: gem, gem, gem, gem
Similar words: midnight, midnight
Similar words: riddle, riddle
Similar words: may, made
Similar words: echoed, echoed
Similar words: closer, closed
Similar words: light, light
Similar words: bright, brighter
Similar words: your, Your
Similar words: wish, wish
Similar words: magic, magic, magic, magic, magic
Similar words: for, for
Similar words: dreams, dreams
Similar words: more, more, more
Similar words: ever, even
Similar words: courage, courage
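Because StemWord returns on the first suffix that matches, the order of the extended list matters: an entry can never fire if an earlier entry is itself a suffix of it. This is why “only” no longer groups with “on” in the second run: “y” is tested before “ly.” The sketch below uses a hypothetical helper, FindShadowed, to list every unreachable entry in the extended list.

```csharp
using System;
using System.Collections.Generic;

// The extended suffix list from the article, in the same order.
string[] suffixes =
{
    "ment", "ness", "tion", "sion", "ship", "age", "ance", "ence", "er", "or", "ist", "ty", "ity", "cy", "dom", "ism",
    "ate", "en", "ify", "fy", "ize", "ise",
    "able", "ible", "al", "esque", "ful", "ic", "ical", "ious", "ous", "ish", "ive", "less", "y",
    "ly", "ward", "wise",
    "s", "es", "ies", "ed", "ing", "er", "est",
    "un", "re", "in", "im", "non", "dis", "pre", "post", "anti", "de", "fore", "hyper", "semi", "sub", "super", "trans"
};

// An entry is unreachable when an earlier entry is a suffix of it (or a duplicate),
// because any word ending in the later entry also ends in the earlier one.
List<(string Entry, string ShadowedBy)> FindShadowed(string[] list)
{
    var result = new List<(string Entry, string ShadowedBy)>();
    for (int i = 0; i < list.Length; i++)
    {
        for (int j = 0; j < i; j++)
        {
            if (list[i].EndsWith(list[j], StringComparison.Ordinal))
            {
                result.Add((list[i], list[j]));
                break; // report only the first earlier entry that shadows it
            }
        }
    }
    return result;
}

foreach (var (entry, earlier) in FindShadowed(suffixes))
    Console.WriteLine($"\"{entry}\" is never reached (\"{earlier}\" matches first)");
```

For the list as given, it reports 13 unreachable entries, including “es” and “ies” (shadowed by “s”), “ity” (by “ty”), “ical” (by “al”), “ly” (by “y”), the duplicate “er,” and several of the “prefixes.” Sorting the list by length, longest first, is one simple way to make sure a longer suffix is always tried before any shorter suffix it contains.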

Conclusion

Stemming reduces words to a common root and is a core technique in text processing and analysis. It can improve search recall, shrink indexes, speed up text processing, and standardize content, but overly aggressive truncation can merge unrelated words and lose meaning, as the extended suffix list above shows. Stemming is used, or is likely used, across web search engines, e-commerce, social media, content management systems, learning management systems, NLP frameworks and libraries, SEO tools, and customer service and CRM systems. It remains a foundational method in natural language processing and information retrieval.

