FlashText In Python

Introduction

In data analytics, we often need to search a document for matching keywords or replace specific words with others. FlashText is a Python library designed specifically for searching and replacing words in a document or text.

This article covers:

  • FlashText Installation
  • Understanding the basic classes and their behaviors
  • Searching for keywords in a Text
  • Replacing a keyword with another

Let’s explore.

Installation

Installing FlashText is straightforward. This article installs it using ‘pip’:

pip install flashtext

Searching for a keyword

This section explains the important objects in FlashText and shows how to search for a specific keyword within a text.

Say we have the following text:

When an Open Data standard is created and promoted, it’s important to think why - what change is this trying to drive? What will people do with this data that they couldn’t do before?

We are going to search this text for a few words using FlashText. The first step is to import KeywordProcessor from the flashtext library and instantiate it:

from flashtext import KeywordProcessor
keywordprocessor = KeywordProcessor()

A KeywordProcessor object keeps track of the keywords to search for. To register a keyword, call the add_keyword function with the word we intend to search for:

keywordprocessor.add_keyword('Data')

To extract keywords, pass the text to the “extract_keywords” function. It returns a list of the keywords found in the text, or an empty list if none are found.

extracted_words = keywordprocessor.extract_keywords(Text)
print(extracted_words)


The full program is as follows:

from flashtext import KeywordProcessor

# Case-insensitive by default
keywordprocessor = KeywordProcessor()
keywordprocessor.add_keyword('Data')

Text = 'When an Open Data standard is created and promoted, it’s important to think why - what change is this trying to drive? What will people do with this data that they couldn’t do before?'

# Both 'Data' and 'data' match and are reported under the registered keyword
extracted_words = keywordprocessor.extract_keywords(Text)
print(extracted_words)  # ['Data', 'Data']

The program above runs in case-insensitive mode, which is the default, so it matches both ‘Data’ and ‘data’ in the Text. To run in case-sensitive mode, set ‘case_sensitive’ to True when creating the KeywordProcessor:

from flashtext import KeywordProcessor

# Match keywords only when the case is identical
keywordprocessor = KeywordProcessor(case_sensitive=True)
keywordprocessor.add_keyword('Data')

Text = 'When an Open Data standard is created and promoted, it’s important to think why - what change is this trying to drive? What will people do with this data that they couldn’t do before?'

# Only the capitalized 'Data' matches; the lowercase 'data' is skipped
extracted_words = keywordprocessor.extract_keywords(Text)
print(extracted_words)  # ['Data']


Keyword Replacing

Replacing keywords is just as straightforward as searching for them. All we need to do is pass the word we intend to replace, followed by its replacement, to the “add_keyword” function, and then call “replace_keywords” on the text.

Say we have the following text:

Functional programming languages are specially designed to handle symbolic computation and list processing applications. Functional programming is based on mathematical functions.

In the text above, let’s replace the phrase “Functional programming” with “FP”:

from flashtext import KeywordProcessor

Text = "Functional programming languages are specially designed to handle symbolic computation and list processing applications. Functional programming is based on mathematical functions."

keywordprocessor = KeywordProcessor(case_sensitive=True)
# First argument: the phrase to find; second argument: its replacement
keywordprocessor.add_keyword('Functional programming', 'FP')

# Every occurrence of 'Functional programming' is replaced with 'FP'
replaced = keywordprocessor.replace_keywords(Text)
print(replaced)


Summary

This article covered FlashText installation, the important classes and methods, and examples of searching for and replacing a keyword. The next article on FlashText will explore more advanced use cases.

