April 22, 2020
The world of data has been growing ever since we started to recognize patterns. In one of my earlier blogs I shed light on news analytics and promised a thorough walkthrough of it. It all begins with text mining: the mined text is used as input, and from that input we get a complete report of what the text holds and an in-depth view of what it means in terms of statistics and analytics. In other words, the impact of the text is laid out in front of you.

Text mining is the process by which essential data is extracted from standard text, whether it comes from text messages, documents, emails, files or news articles: anything written in natural language. Text mining is used to recognize patterns and draw insights from such data. The market built around it has grown rapidly.

Steps involved in text mining:

  1. Finding the sources – websites or data APIs
  2. Collecting the data
  3. Cleaning the data
  4. Indexing or organizing the data
  5. Finding features and mining for patterns
  6. Using the results for analysis

Finding Sources:

This step involves identifying the sources from which the text data is to be collected. It requires research, and special identification techniques may be employed to find the best available data.

Collecting Data:

The next step involves gathering data from the sources identified in the previous step. It also involves making the data usable by bringing it into a common format and storing it in some form of database or tables.
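To make the idea of a common format concrete, here is a minimal sketch that stores a few collected texts in a single table. It assumes Python's built-in sqlite3 module and an in-memory database; the sample documents, the table name `documents` and the helper `store_documents` are made up for illustration and are not part of any particular tool.

```python
import sqlite3

# Hypothetical sample texts standing in for data gathered from real sources.
RAW_DOCUMENTS = [
    ("news", "Markets rallied on Tuesday after the announcement."),
    ("email", "Hi team, the quarterly report is attached."),
    ("sms", "Running late, see you at 6!"),
]


def store_documents(documents):
    """Store (source, text) pairs in one table so every record shares a format."""
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, source TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents (source, body) VALUES (?, ?)", documents
    )
    conn.commit()
    return conn


conn = store_documents(RAW_DOCUMENTS)
rows = conn.execute("SELECT id, source, body FROM documents ORDER BY id").fetchall()
for row in rows:
    print(row)
```

Whatever the original source was, each text now sits in the same three columns, which is what the later steps rely on.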

Cleaning Data:

This step involves cleansing the data by removing unwanted parts or unnecessary text, producing the raw material that the algorithms will work on.
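What counts as "unwanted" depends on the source, so the following is only a sketch under assumptions: it treats HTML tags, URLs, punctuation and extra whitespace as noise and lowercases what is left. The function name `clean_text` is hypothetical, and only the standard `html` and `re` modules are used.

```python
import html
import re


def clean_text(raw):
    """Return a lowercased version of raw with markup, URLs and punctuation removed."""
    text = html.unescape(raw)                      # turn entities such as &amp; into characters
    text = re.sub(r"<[^>]+>", " ", text)           # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)      # drop URLs
    text = re.sub(r"[^a-zA-Z0-9\s]", " ", text)    # drop punctuation and other symbols
    return re.sub(r"\s+", " ", text).strip().lower()  # collapse whitespace, lowercase


sample = "<p>Read more at https://example.com &amp; enjoy!</p>"
print(clean_text(sample))
```

A real project would tune these rules to its data; for example, a sentiment task might want to keep exclamation marks.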


Indexing Data:

The data, though still unstructured, is now clean and ready for processing, so the next step is converting it to a list. Structuring the data is a necessary step towards processing it.
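One possible way to structure cleaned text is sketched below: each document is split into a list of words, and a simple inverted index records which documents contain each word. The inverted index is an assumption chosen for illustration, not something the steps above prescribe, and the names `tokenize` and `build_inverted_index` are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Split already-cleaned text into a list of words."""
    return text.split()


def build_inverted_index(documents):
    """Map each word to the set of document positions in which it appears."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


# Hypothetical cleaned documents.
docs = [
    "markets rallied on tuesday",
    "the quarterly report is attached",
    "markets fell on the report",
]

tokens = [tokenize(d) for d in docs]
index = build_inverted_index(docs)
print(tokens[0])
print(sorted(index["report"]))
```

With the index in place, finding every document that mentions a word is a single lookup instead of a scan over all the text.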

Finding Features:

This is a pre-processing check for the features or patterns that will be converted into insights. Once it is done, the data is finally ready for processing.
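As a rough illustration of what a feature can be, the sketch below counts how often each word occurs across the documents after skipping very common words. Word frequency is just one simple choice of feature, assumed here for the example; the stop-word list is deliberately tiny and the function names are hypothetical.

```python
from collections import Counter

# A small illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "is", "on", "a", "of", "and", "to"}


def term_frequencies(tokens):
    """Count the words in one document, ignoring stop words."""
    return Counter(t for t in tokens if t not in STOP_WORDS)


def top_features(token_lists, n=3):
    """Return the n most frequent words across all documents."""
    total = Counter()
    for tokens in token_lists:
        total.update(term_frequencies(tokens))
    return total.most_common(n)


# Hypothetical tokenized documents.
token_lists = [
    ["markets", "rallied", "on", "tuesday"],
    ["markets", "fell", "on", "the", "report"],
    ["the", "report", "is", "attached"],
]

print(top_features(token_lists, n=2))
```

Words that surface at the top of such a count are candidates for the patterns that later turn into insights.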

Where is it used?

Text mining technology is widely applied to a variety of government, research and business needs, namely:

  • Legal and law firms
  • National Security
  • Scientific Research
  • Financial
  • Biomedical
  • Computer Science and Software
  • Online Media
  • Business and Marketing
  • Sentiment Analysis
  • Literature
  • Intellectual Property
  • Digital Humanities
  • Computational Sociology

With such a wide field of application, it is clear why everyone should have a basic knowledge of text mining and its use cases. Increasing attention is being paid to multilingual text mining: the ability to gain information across many languages and then cluster the patterns found. This has made life easier for many and is an accomplishment to be cherished and put to full use.

