Even though the word “Analytics” has exploded across the business scene, the field is really still in its infancy. One problem with the word is that “Analytics” means different things to different people. For example, “Google Analytics” generally refers to web foot traffic, represented as counts, charts, frequencies, and so on. For statisticians and data miners, “Analytics” means taking data, whether financial records, customer data, or behavioral data, and building predictive models: models that are not just descriptive of past or current phenomena but that tell us about likely future behavior. The purpose is to develop a model that answers important, actionable business questions.
“Analytics” may also refer to using open-ended fields, or other textual data, to create categories that can be joined back to structured datasets through a technique known as Natural Language Processing (NLP). It is important to point out that these methods are sensitive to context. For example, if the word being examined is “football,” the algorithms applied can determine whether it is being used in a negative, positive, or neutral way: “He hates football” (negative) versus “They were excited about the football game” (positive). Throughout the process, just as with structured data, the analyst makes many important choices.
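As a toy illustration of why context matters (this is not how SPSS Modeler works internally), the sketch below scores each sentence against a small, hypothetical sentiment lexicon. The word “football” itself carries no polarity; the label comes from the words around it. The word lists and the `sentiment` function are assumptions made up for this example.

```python
import re

# Hypothetical sentiment lexicon for illustration only; production text
# analytics libraries are far larger and also handle negation and idioms.
POSITIVE = {"excited", "love", "great", "enjoyed"}
NEGATIVE = {"hates", "hate", "boring", "awful"}


def sentiment(sentence: str) -> str:
    """Label a sentence positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in ["He hates football.",
             "They were excited about the football game.",
             "The football game starts at noon."]:
    print(f"{text!r}: {sentiment(text)}")
```

The three sentences all mention football but come out negative, positive, and neutral respectively, which is the kind of context sensitivity described above, in its simplest possible form.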
One question I am frequently asked is what type of textual data can be analyzed. The answer is almost any type, and very large datasets are desirable. Examples include streaming (RSS) feeds from the web, Twitter feeds, blogs, PDF documents, and open-ended survey questions. Analyzing these sources manually can be very labor-intensive and time-consuming. We are in an age of overwhelming information, and processing and analyzing it may be difficult, non-standardized, and expensive. Text analytics (text mining) offers a more standardized, less expensive way to glean competitive intelligence and to better understand the voice of the customer. A data mining stream can be run continuously and refreshed at regular intervals to surface new and important results.
What does it take to build a text analytics model? Evans Analytics uses SPSS Modeler, which has a set of premier text analytics tools. SPSS Modeler comes with libraries already built into the software. A library is a pre-defined set of terms and algorithms that can identify and categorize words and phrases. These libraries are a great place to start a new project.
Many clients ask the analyst to take the project a step or two further. The next step is to build custom libraries, developed specifically for the industry, the company, or the project being analyzed, so that the most relevant terms are captured. These libraries can be saved and reused as needed.
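To make the idea of base and custom libraries concrete, here is a minimal sketch that treats a library as a plain mapping from terms to categories. It is an assumption-laden stand-in, not SPSS Modeler's library format: `BASE_LIBRARY`, `CUSTOM_LIBRARY`, `build_library`, and `categorize` are hypothetical names, and the automotive terms are invented for illustration. Saving to JSON stands in for "saved and reused."

```python
import json
import re

# Hypothetical general-purpose library: term -> category.
BASE_LIBRARY = {
    "price": "Cost", "expensive": "Cost", "cheap": "Cost",
    "staff": "Service", "rude": "Service", "helpful": "Service",
}

# Hypothetical custom library for an imagined automotive client.
CUSTOM_LIBRARY = {
    "minivan": "Vehicle", "mpg": "Fuel economy", "mileage": "Fuel economy",
}


def build_library(*libraries: dict) -> dict:
    """Merge libraries in order; later (more specific) libraries win on conflicts."""
    merged: dict = {}
    for lib in libraries:
        merged.update(lib)
    return merged


def categorize(text: str, library: dict) -> list:
    """Return the sorted set of categories whose terms appear in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sorted({library[w] for w in words if w in library})


library = build_library(BASE_LIBRARY, CUSTOM_LIBRARY)

# Save the merged library so it can be reloaded for a later project.
saved = json.dumps(library)
reloaded = json.loads(saved)

print(categorize("The minivan gets great mileage but the price is high", reloaded))
```

Here the sample comment is tagged with the Cost, Fuel economy, and Vehicle categories; two of those three come only from the custom library, which is the point of building one.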
Some clients just want simple counts. For example, a client may only want to know the percentage of customers who preferred product X to product Y, or whether customers made more positive than negative comments about a particular service. Other clients may want the newly created categories joined back to other structured data for predictive modeling or customer segmentation. They may want to know, for example, whether customers who preferred product X were also more likely to live in a specific region, fall in a certain age range, and drive a minivan! Text analytics becomes more powerful when combined with other data to examine whether differences occur by subgroup.
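Joining text-derived categories back to structured data can be sketched in a few lines. The example below assumes one derived category per customer ID and a small set of customer records; both datasets and the `share_by_group` helper are hypothetical. It performs an inner join on the customer ID and reports, per region, the share of customers whose comments were categorized as preferring product X.

```python
from collections import defaultdict

# Hypothetical output of a text analytics step: customer ID -> derived category.
text_categories = {
    101: "prefers X", 102: "prefers Y", 103: "prefers X", 104: "prefers X",
}

# Hypothetical structured customer records.
customers = [
    {"id": 101, "region": "North"},
    {"id": 102, "region": "South"},
    {"id": 103, "region": "North"},
    {"id": 104, "region": "South"},
    {"id": 105, "region": "North"},  # no open-ended response on file
]


def share_by_group(customers, categories, group_key, target):
    """Inner-join categories to customers, then compute the share matching target per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [matches, customers with text]
    for customer in customers:
        category = categories.get(customer["id"])
        if category is None:
            continue  # skip customers with no categorized text (inner join)
        tally = counts[customer[group_key]]
        tally[1] += 1
        if category == target:
            tally[0] += 1
    return {group: hits / total for group, (hits, total) in counts.items()}


print(share_by_group(customers, text_categories, "region", "prefers X"))
# prints {'North': 1.0, 'South': 0.5}
```

The same joined records could feed a segmentation or predictive model; the subgroup percentages are simply the quickest way to see whether differences exist.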
So, how can you leverage text analytics for your business? Do you have competitors who blog or tweet, or news and RSS feeds that hold competitive intelligence you haven’t yet gleaned? Do you have open-ended survey responses that have overwhelmed you, even though you know important information can be extracted from them? Do you have research that was previously handled through qualitative methods, but that you think would be stronger if it were analyzed and joined with your structured data? If you answered yes to any of these questions, you have a strong case for considering text analytics!
In my next installment, I will explain how to bring previously constructed categories into SPSS Modeler and reuse old qualitative research in a quantitative way.
Dawn Marie Evans is Group Owner and Manager of SACG. She is an external consultant and trainer at IBM/SPSS and Managing Partner at Evans Analytics.