Statistics & Analytics Consultants Group Blog

The Statistics & Analytics Consultants group is a network of over 9,000 members. Businesses can work directly with consulting firms and individual consultants and reduce costs. There is also a job board where you can post statistics and analytics jobs. Our members offer a variety of courses to ensure that your company can compete on analytics. Courses range from a basic applied understanding of the statistical concepts and methods involved in carrying out and interpreting research to advanced modeling and programming.

This blog is a place where featured members are invited to share their expertise and opinions. The opinions expressed are not necessarily those of SACG.

Sunday, December 11, 2011

Alter Type: It’s Not What You Think, by Steven J. Fink, Evans Analytics

As I was reviewing a colleague’s SPSS syntax code the other day, I came across a command called “Alter Type.”  It sounded like a new scary movie, a psychiatric DSM code, or an abnormal personality attribute. 

I looked up this code in the Command Syntax Reference Manual (available through the Help menu) and there it was—a very useful command which can be applied in many applications.  


In brief, it does exactly what the name implies.  It changes the Variable Type (string or numeric) or Format of variables, including the Width of string variables.  As I read the explanation, it became clear that this is a new and improved Format statement, a so-called Format on steroids!
Format statements are often used to change the width and decimals of numeric variables or the format of a date variable.  The Alter Type command changes the Variable Type of any variable in one short command—no need to write elaborate or unnecessary code…just one easy statement. 

As an example, the dataset below comprises 3 variables and 2 lines of data.

DATA LIST FREE
/Numvar (F2)     StringVar (A5)   Datevar (Adate10).

BEGIN DATA
1 1234 10/28/2007
4 5678 10/28/2007
END DATA.

To change a numeric variable to a string (alphanumeric) variable, the command is:

 Alter Type Numvar (A2).

To change a string (alphanumeric) variable to a numeric variable, the command is:

Alter Type Stringvar (F6.0).

To change a date variable to a string variable, the command is:

Alter Type Datevar (A10).

One note of caution: the Alter Type command converts the variable in place; it does not create a new version of the variable.  So you may want to save your data first or create a copy of the variable before converting it.
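For example, a minimal sketch (the copy variable name NumvarOld is mine, not part of the original example) that preserves the original numeric values before the conversion:

COMPUTE NumvarOld = Numvar.
ALTER TYPE Numvar (A2).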

So, the next time you need to perform a calculation or merge data and the variable is not in the right Format or Type, use the Alter Type command.  After all, it’s free, you are not crazy, and it’s cool!

About Steven: Steven works as a Statistics & Analytics Consultant at Evans Analytics. He has developed or analyzed over 300 surveys including customer satisfaction, work environment, needs assessment, program evaluation, and compensation surveys for public and private sector customers. He has provided SPSS instruction to more than 3,000 analysts, covering a wide variety of topics, including questionnaire design/writing, sample design, data collection strategies, multivariate analyses, and presentation of tables/graphs. He can be reached at: steven@evansanalytics.com

Tuesday, December 6, 2011

What is a Scratch Variable in IBM SPSS Statistics Syntax? by Keith McCormick


This might seem an obscure topic, but it is easily grasped and has the potential to make your SPSS Syntax more readable. Readable is good.

A Scratch Variable is a Variable with a # Symbol in front of it. It is available temporarily for an intermediate step in a Transformation calculation. Once a Procedure occurs, it is no longer available. If the distinction between Transformation and Procedure is new to you, you should put researching that on your to-do list. Start with Appendix B of the Syntax Reference Guide.

You can use the following lines to create a tiny data set.

DATA LIST /LocationName 1-50 (A) .
BEGIN DATA
Raleigh, North Carolina
Durham, North Carolina
Cary, North Carolina
END DATA.

Let’s say that you wanted to pull out just the State from the three examples in the data set. The first step would be to identify the location of the comma, because the last character of the city name always sits one position before the comma and the State always begins two characters after it. The comma’s position is not a constant value because the city names are of variable length.

This bit of code will do it:

COMPUTE CommaLocation = INDEX(LocationName,',').

This next step would complete the process, but would also create a new variable that you don’t need.

STRING State (A50).
COMPUTE State = SUBSTR(LocationName,CommaLocation+2).

Warning: you only need to run the STRING command once.

Do we actually want to create this variable? What are we going to do with it after we complete the calculation? We could use DELETE VARIABLES once we are done, but we have two better options. In this example, DELETE VARIABLES is harmless, but it would be slower on large data sets, and therefore some programmers would consider it inelegant. It is noteworthy that for decades the language got by just fine without the fairly recent addition of DELETE VARIABLES.

(Note that I have not included EXECUTE commands in any of these code examples. Curious why? You really shouldn’t use EXECUTE if there will be any procedures later in the code, and there are always procedures later in the code. That same Appendix B in the Syntax Reference Guide mentioned earlier is a good place to read more about this.)

We could put a function inside of a function:

STRING State(A50).
COMPUTE State = SUBSTR(LocationName,INDEX(LocationName,',')+2).

We could also use a Scratch variable:

COMPUTE #CommaLocation = INDEX(LocationName,',').
STRING State(A50).
COMPUTE State = SUBSTR(LocationName,#CommaLocation+2).

In an example as straightforward as this, the function inside of a function might be best. As the complexity grows, there will be opportunities to use the Scratch variable option to break up a calculation into two or more steps instead of a single very long, and potentially confusing, line of code.
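As an illustration of that idea (not from the original post; the City variable and the length argument to SUBSTR are my additions), a single scratch variable can feed two derived fields so that INDEX is computed only once:

* Assumes City and State have not already been created by the earlier examples.
COMPUTE #CommaLocation = INDEX(LocationName,',').
STRING City State (A50).
COMPUTE City = SUBSTR(LocationName,1,#CommaLocation - 1).
COMPUTE State = SUBSTR(LocationName,#CommaLocation + 2).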

And who doesn’t want more tools in their Syntax tool chest?

About Keith: Keith McCormick is an independent data mining professional who blogs at:  http://www.keithmccormick.com/

Thursday, November 10, 2011

Leveraging Text Analytics to Help Answer Your Business Questions, by Dawn Marie Evans

Even though the word “Analytics” has exploded everywhere on the business scene, this field is really still in its infancy.  One of the problems with the word is that “Analytics” means different things to different people.  For example, “Google Analytics” generally refers to web traffic, represented in counts, charts, frequencies, etc.  For statisticians and data miners, “Analytics” refers to taking data, whether financial records, customer data, behavioral data, etc., and building predictive models: models that are not just descriptive of past or current phenomena but tell us about likely future behavior.  The purpose is to develop a model that answers important and actionable business questions.

“Analytics” may also refer to using open-ended fields, or textual data, to create categories that can be joined back to structured data sets through a technique known as Natural Language Processing (NLP).  It is important to point out that these methods are sensitive to context.  For example, if the word being examined is “football,” the algorithms that are applied are able to determine whether the word is being used in a negative, positive, or even neutral way, as in “He hates football” (negative) versus “They were excited about the football game” (positive).  During the process, the analyst, just as with structured data, makes many important choices along the way.

One of the questions I am frequently asked is what type of textual data can be analyzed.  The answer is almost any type, and very large datasets are desirable.  Examples include streaming (RSS) feeds from the web, Twitter feeds, blogs, PDF documents, and open-ended questions on surveys.  Analyzing these datasets by hand can be very labor-intensive and time-consuming.  We are in an age where information has become overwhelming; processing and analyzing such information may be difficult, non-standardized, and expensive.  Text analytics/text mining is a standardized, less expensive approach to glean competitive intelligence and to acquire a better understanding of the voice of the customer.  A data mining stream can be run and refreshed continuously to find new and important results at regular intervals.

What does it take to have a text analytics model built? Evans Analytics uses SPSS Modeler, which has a set of premier text analytics tools. SPSS Modeler comes with libraries already built into the software.  A library is a pre-defined set of sensitive terms and algorithms that can identify and categorize words and phrases. These libraries are a great place to start with a new project.

Many clients will request that an analyst take the project a step or two further. The next step would be for the analyst to build custom libraries, developed specifically for the industry, the company, or the project being analyzed, so that the most relevant terms are developed.  These libraries may be saved and reused as needed.

Some clients may just want simple counts.  For example, a client may only want to know the percentage of customers who preferred product X to product Y, or whether a higher percentage of customers provided positive rather than negative comments about a particular service.  Other clients may request that newly created categories be joined back to other structured data, followed by predictive modeling or customer segmentation. They may also want to know that customers who preferred product X were also more likely to live in a specific region, be in a certain age range, and drive a minivan!  Text Analytics becomes more powerful when added to other data to examine whether differences occur by subgroup.

So, how can you leverage text analytics for your business?  Do you have competitors who are blogging or tweeting, or are there news or RSS feeds out there as competitive intelligence from which you haven’t yet gleaned the important information you should be leveraging?  Do you have open-ended survey responses that have overwhelmed you, but from which you know important information can be extracted?  Do you have research that has previously been handled through qualitative methods, but that you think would be stronger if it were analyzed and joined with your structured data?  If you have answered yes to one of these questions, you have a strong case for considering text analytics!

In my next installment, I will explain how to bring previously constructed categories into SPSS Modeler and re-use old qualitative research in a quantitative way.

Dawn Marie Evans is Group Owner and Manager of SACG. She is an external consultant and trainer at IBM/SPSS and Managing Partner at Evans Analytics.

Tuesday, November 8, 2011

Potential Explanatory Variables, Not All Qualify by David Abbott

It is tempting to think that all explanatory variables (also called covariates or independent variables) available for a given project would be valid and useful variables to include in a regression model. Well, it is not that simple. Candidate explanatory variables can prove unsuitable for regression for a number of reasons. Analysts can save themselves time and trouble by evaluating the suitability of candidate explanatory variables both prior to and during analysis. Here are a handful of ways that a candidate explanatory variable can fail to qualify…

Insufficient variation
To learn about the effect of an explanatory variable requires that the distribution of the explanatory variable in the analytic dataset not be too concentrated. For example, you can’t learn much about the effect of age if almost all the subjects you are studying are retirement age and just a handful are in their 20s, 30s, 40s, or 50s. The extreme case of this problem is a categorical variable that takes on only a single value in the analytic dataset.

Meaning inconsistent
If the meaning of an explanatory variable differs among the experimental units, high bias can result. For example, personal income taken from social security records for subjects aged 10-50 years exhibits this problem. Clearly, low income at ages 10-20 years has a very different meaning from low income at ages 30-50 years. If income is being used as a proxy for socioeconomic status, such a shift in meaning could lead the analyst to markedly overstate the effect of socioeconomic status on automobile accidents.

Excessive measurement error
Some measurement error in explanatory variables is routinely tolerated. However, an abundance of it can wash out the actual effect of the explanatory variable or, worse, introduce bias. This issue is commonly a concern when subjects self-report on emotionally charged measures, e.g., number of sexual partners during the study period. If an explanatory variable is seriously contaminated by measurement error it should either be cleaned up or not used.

Meaning not generalizable
Usually, it is important for the findings of a study to be arguably generalizable to situations other than the experiment that generated the data. So, explanatory variables that only have meaning in the context of the study are best avoided, treated as nuisance variables, or reserved for investigating quality/bias issues in the study. For example, the gender of the person administering a survey may be useful to check for surveyor induced bias, but including it as an explanatory variable in the primary regression results clearly raises questions about the generalizability of study findings.

Substantially duplicative
Each explanatory variable included in the model should measure a distinct dimension of the data. When two explanatory variables are too similar – either in their meaning or the pattern of their variation (i.e. highly correlated) – regression results are unstable and sometimes not even calculable. For example, chronological age and number of years of driving experience are highly correlated in US adults and so are substantially duplicative. Hence, when both are used in a model of accident rates the variance of both estimates is inflated and results are hard to interpret. This problem is a special case of a more general problem known as multicollinearity.

Influenced by the outcome
One assumption of regression methods is that explanatory variables influence the outcome (also called the response variable or dependent variable) but the outcome does not influence the explanatory variables. This is usually the case; for example, subject age is often used as an explanatory variable, and it is almost always preposterous to think that the outcome being analyzed influences subject age. A subject’s age is what it is regardless of the outcome. However, sometimes the value obtained for a candidate explanatory variable is strongly influenced by the outcome. Consider, for example, a study using students’ ratings of a teacher to explain students’ grades, and further assume the ratings are collected from students after the grades are known. The grade received by a student and his/her rating of the teacher are very much intertwined. It is as easy to argue that the grade influences the rating as it is to argue that the rating influences the grade. In this case, the better way to proceed is to view teacher ratings and student grades as two outcomes of the instructional process whose success is predicted by other explanatory variables, such as class size, text used, and student success in prior courses, that are not influenced by the grade received by the student or the student’s rating of the teacher. This situation is sometimes called “reverse causation,” and when it is present it distorts and dilutes regression findings and very much muddies the conceptual waters of the study.

In conclusion
Put your explanatory variables to the test using the six criteria discussed above, drop or improve the variables found lacking, and I think you will find your effort put toward explanatory variable qualification amply repaid.

David Abbott is currently a statistician at Durham Veterans Affairs Health Services Research, where he supports researchers in both medicine and public health. He has advanced degrees in Statistics from the University of North Carolina and Computer Science from Clemson University. He is a heavy user of SAS Base, SAS/STAT, and other related SAS products.

Saturday, October 22, 2011

SACG – Who We Are and What We Do

Statistics & Analytics Consultants, which was started in 2008, is a group that will very shortly exceed 10,000 members worldwide. The Statistics & Analytics Consultants Group is dedicated to providing statisticians the opportunity to network with others in their field, share ideas, and make business contacts.

Our goal is to introduce statisticians and analysts to business contacts for consulting opportunities. We also would like statisticians to start discussions to share ideas and best practices and to connect with each other. Anyone with a statistical background is welcome, and all statistical disciplines are welcome.  The group is made up of people involved in different aspects of many disciplines related to statistics and analytics, including actuarial work, academia, corporations, banking, programming, pharmacy, biostatistics, manufacturing, engineering, etc.  However, the focus is on supporting consultants and their skills in the industry in which they practice, whether those skills are statistical, technical, project management, or business development.

On LinkedIn, responding to requests from members, we recently started subgroups in different software areas: SPSS, SAS, R-Project, Excel, and Stata.  Heading these subgroups as moderators are leaders in these particular areas.  Some of the discussion topics we have had in the group have included:

·         “How to detect fraudulent behavior of sale personnel of the Company through statistical analysis of Sales Data”

·         “Checking for Falsification or Duplication of Records”

·         “Removing Multicollinearity”

·         “Is Statistical analysis a part of Data mining or Data mining is the part of Statistical analysis?”

·         “A bank has a test designed to establish the credit rating of a loan applicant. Of the persons, who default (D), 90% fail the test (F). Of the persons, who will repay the bank (ND), 5% fail the test. Furthermore, it is given that 4% of the population is not worthy of credit; i.e., P(D) = .04. Given that someone failed the test, what is the probability that he actually will default?”
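(For anyone who wants to check an answer to that last topic, a straightforward application of Bayes’ theorem to the numbers given in the question yields:)

\[
P(D \mid F) = \frac{P(F \mid D)\,P(D)}{P(F \mid D)\,P(D) + P(F \mid ND)\,P(ND)}
            = \frac{(0.90)(0.04)}{(0.90)(0.04) + (0.05)(0.96)}
            = \frac{0.036}{0.084} \approx 0.43
\]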



Our discussions are rich and varied, frequently helpful, and sometimes quite vibrant! We invite you to join us on this website as well as on LinkedIn: post questions in the forums, share code and datasets (shortly there will be a place for such), and submit a guest blog, which can be hyperlinked back to your own website or Twitter account. To submit a blog, send your submission to: info@statisticalconsultants.net. To join our group on LinkedIn, apply at this link: Statistics & Analytics Consultants.


In addition, because we are interested in which statistics, analytics, and business intelligence tools you are using, and because we want to serve you better, please take our survey: SACG Survey

Friday, October 21, 2011

Trainer Tip: Multicollinearity, by Steve Poulin, Ph.D., Trainer & Consultant, IBM/SPSS

Multicollinearity (sometimes called Collinearity)
This describes the condition of high correlations among two or more independent variables used in a multiple regression technique.  Based on my experience, it is one of the most common threats to accurately ranking the effects of the independent variables used in a regression analysis.  This condition affects any technique based on regression principles, including linear regression, binary and multinomial logistic regression, the Cox regression survival analysis technique, and an ARIMA time-series analysis that uses transfer functions to measure the effect of two or more external series.

The most direct test of multicollinearity is available in the Linear Regression procedure (Analyze > Regression > Linear) within the IBM SPSS Statistics software and the Regression node in version 14 of the IBM SPSS Modeler software.  Within IBM SPSS Statistics, clicking on the Statistics button in the Linear Regression dialog box opens a subdialog box containing a Collinearity diagnostics check box:

[Screenshot: Linear Regression Statistics subdialog box]
In version 14 of IBM SPSS Modeler, collinearity diagnostics are requested from a very similar dialog box that is invoked from the Regression node:

[Screenshot: Regression node dialog box in IBM SPSS Modeler 14]
The Collinearity diagnostics option will produce two new columns in the Coefficients table and a Collinearity Diagnostics table.  All of these diagnostics will tell a similar story, and the most commonly used diagnostic is the Tolerance statistic that appears in the Coefficients table for each independent variable.  The Tolerance statistic measures how much variance in each independent variable is NOT explained by the other independent variables.  Tolerance values below .3 (30%) are likely to indicate a problem with multicollinearity, which means that the B and Beta coefficients produced for those variables may be incorrect.

There are other methods available in IBM SPSS Statistics and Modeler for detecting multicollinearity.  Bivariate correlation tests can be run for all of the independent variables.  However, as the name implies, these tests can only check for high correlations between two variables at a time, while multicollinearity refers to the correlations between each independent variable and all of the other independent variables.  Nevertheless, if multicollinearity is occurring because of high correlations among a few variables, this method will be sufficient.  Correlation coefficients above .8 or below -.8 on a scale between -1 and 1 usually indicate multicollinearity at a level that will distort regression coefficients.
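For example, a minimal syntax sketch (the variable names x1 through x4 are placeholders, not taken from any dataset in this post) that produces the full correlation matrix for a set of candidate predictors:

CORRELATIONS
  /VARIABLES=x1 x2 x3 x4
  /PRINT=TWOTAIL NOSIG.

Any coefficient near .8 in absolute value flags a pair of predictors worth a closer look.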

Factor analysis is a better test of multicollinearity because it can detect high correlations among any number of variables.  Another advantage is that factor analysis can produce factor scores that can be used in lieu of the original independent variables.  If orthogonal (uncorrelated) factor scores are created, this method will completely remove multicollinearity (with tolerance values of 1!).  However, the coefficients associated with the factor scores used as independent variables in a regression can be difficult to interpret.
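A sketch of that approach (again with placeholder variable names, and assuming principal components extraction with a varimax rotation) that saves regression-method factor scores as new variables for use in a later regression:

FACTOR
  /VARIABLES x1 x2 x3 x4
  /EXTRACTION PC
  /ROTATION VARIMAX
  /SAVE REG(ALL).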

Linear Regression in IBM SPSS Statistics and the Regression node in Modeler 14 are the only statistical procedures that offer collinearity tests.  However, any set of independent variables can be tested in the Linear Regression procedure, regardless of the regression-based procedure that will ultimately be used.  Since the collinearity test only applies to the independent variables, any variable can be designated as the dependent variable (even the subjects’ ID numbers!).  However, to simplify the output, you should deselect the Model fit and Regression coefficients options, which are selected by default.
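In syntax, that screening run might look like the following sketch (ID, x1, x2, and x3 are placeholder names; the STATISTICS subcommand requests only the tolerance and collinearity diagnostics, which corresponds to deselecting Model fit and Regression coefficients in the dialog):

REGRESSION
  /STATISTICS TOL COLLIN
  /DEPENDENT ID
  /METHOD=ENTER x1 x2 x3.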

The simplest way to fix a multicollinearity problem is to simply pick one variable from a set of variables that are highly correlated, especially if the high correlations suggest redundancy.  Another simple solution is to use the mean of the highly correlated variables.  Regardless of the solution used, tests for multicollinearity should be run before the analyst begins his or her interpretation of the regression coefficients.

About Steve:  Steve has been an education consultant for SPSS since November of 1997.  For most of that time he worked as an external consultant, and in March of 2010 he began working for IBM SPSS full-time.  Before he began working full-time for IBM SPSS, he worked as a researcher for the Center for Mental Health Policy and Services Research at the University of Pennsylvania.  Steve received a PhD in Social Policy, Planning, and Policy Analysis from Columbia University.  He loves to travel, and is an avid backpacker with his son.





Thursday, October 6, 2011

The Data Miner’s somewhat surprising role as Honest Broker and Change Agent by Keith McCormick


They say that you can’t be a prophet in your own land. As someone who is always an outsider to the organizations that I Data Mine in, I find this to be true. I find that building a model is rarely more than 10-20% of the time I spend in front of the laptop, and fully a third of my time is not spent in front of a laptop at all. This is an explanation of what I find myself doing in all of those many hours that I am not using Data Mining software, or any software. What else is there to do?

Inspire Calm: I am often greeted with the admission that my new client’s Data Warehouse is not quite as complete, nor quite as sophisticated, as they would like. No one’s is! It is interesting that it is one of the first facts that is shared, because it implies that if only they had the perfect Data Warehouse, the Data Mining project would be easy. Well, they are never easy. Important work is hard work, and no one really has a perfect Data Warehouse because IT has a hard job to do as well. So, the experienced Data Miner is in a good position to explain that the client really isn’t so far behind everyone else.

Advocate for the Analysis Team’s time within their department:  Yes, this is a full time endeavor! It is surprising how often Data Mining is confused with ad hoc queries like “How many of X did we sell in Q1 in Region A?” I am not sure where this comes from, but new Data Miners are left wondering how they can perform all six stages of CRISP-DM in time for next Tuesday’s meeting. By the time an external consulting resource is involved this confusion is largely cleared up, but sometimes a little bit of it lingers. How can the internal members perform all of their ongoing functions, and commit to a full time multi-week effort? Of course, they can’t. A bit of realism often sinks in during the first week of a project. Much better addressed earlier than later.

Inspire loftier goals:  Data preparation is said to take 70-90% of the effort. I have experienced little to convince me that this estimate is far off. The ‘let’s do something preliminary’ thing can be inefficient if you aren’t careful because on a daily basis one is making decisions about how the inputs interact. Refreshing the model on more recent data is straightforward, but if you substantively change the recipe of the variable gumbo that you are mining, you have to repeat a lot of work, and revisit a lot of decisions. It is possible, with careful planning, to minimize the impact, but you risk increasing (albeit not doubling) the data preparation time. It is ultimately best to communicate the importance of the endeavor, knock on doors, marshal resources, and do the most complete job you can right now.

Act as a liaison with IT:  An almost universal truth is that IT has been warned that the Data Miner needs their data, but IT has not been warned that the Data Miner needs their time and attention. Of course, no one wants to be a burden to another team, but some additional burden is inevitable. The analyst about to embark on a Data Mining project is going to have unanswered questions or unfulfilled needs that will require the IT team. The external Data Mining resource will often have to explain to IT management that there is no way to completely eliminate this; that it is natural, and it is not the analysis team’s fault. Concurrent with that, the veteran Data Miner has to anticipate when the extra burden will occur, act to mitigate it, and try to schedule it as conveniently as possible.

Fight for project support (and data) from other departments: Certain players in the organization are expecting to be involved, like IT. Often the word has to get out that a successful Data Mining project is a top to bottom search for relevant data. Some will be surprised that it is a stone in their department that has been left unturned. They may not be pleased. Excited as they may be about the benefit that the entire company will derive, you are catching them at an inopportune moment as they leave for vacation, or as a critical deadline looms. Fair warning is always wise, and it should come early. Done properly, the key player in a highly visible project gets a little (not a lot of) political capital, which they should spend carefully.

Help get everyone thinking about Deployment and ROI from the start:  Far too often it is assumed that the analysts are in charge of the “insights,” and that the management team, having received the magic PowerPoint slides, will pick it up from there and ride the insights all the way to deployment and ROI. Has this ever happened? The Data Miner must coach, albeit gently, that a better plan must be in place, and that the better planning must begin the very first week of a data mining project. Let executives play their critical role, but a little coaching is good for everyone. After all, it might be everyone’s first Data Mining project.

Fade into the background:  Everyone wants credit for their hard work, but the wise Data Miner lets the project advocates and internal customers do all the talking at the valedictory meeting.  The best place to be is on hand, but quiet. Frankly, if the Data Miner is still shoulder deep in the project, the project isn’t ready for a celebration. The “final” meeting, probably the first of many final meetings, should be about passing the torch, reporting initial (or estimated) ROI, and announcing deployment details.

Keith is an independent consultant who blogs at http://www.keithmccormick.com/

From Survey Questions to Business Applications By Dawn Marie Evans & Steven J. Fink

As a manager, you have important business questions you need answered, and with the explosion of analytics, managers are expected to use the data to drive decisions.  Buzzwords like “Voice of the Customer,” “Customer Segmentation,” “Competitive Intelligence,” and “Business Intelligence” are bandied about, but how can you nail down a definitive methodology to answer your important questions?

One tool for gaining access to the voice of your customers, employees, or population of interest is a survey. How do you know when it is time to launch a survey?  The short answer: when the data you have on hand (generally within your company’s databases) fall short of answering your most pressing business questions.  Why hire an expert?  Because if the survey is not properly constructed or sampled, it will most likely yield results that tell you very little of importance, cannot be joined back to your own data with confidence, or are not representative of your population of interest.  You want to have confidence in the tool itself and in the results that it yields.

Below are two business case examples where surveys have been used to answer important business questions.  You may find these of interest within your own business context:

Customer Segmentation for an Online Company

We worked with a company whose products were sold exclusively online and that had a database of customer records on hand.  However, this information was incomplete with respect to certain attitudinal information, as well as behavioral information about how customers were shopping with competitors, both online and in-store.  Launching a survey to a large sample of customers allowed us to gain insight into customers’ attitudes and behaviors.  Using a clustering technique, customers were segmented into several key segments with very different characteristics, based on attitudes, shopping preferences, demographics, etc.

Using principal components analysis, the survey was then reduced to just a few main questions.  When future customers registered on the site and answered these few questions, along with key demographics, they were placed into one of the segments where they would receive targeted marketing messages. This survey helped to answer business questions of: Who are our customers?  What are their motivations for shopping with us?  What are their buying behaviors by segment and demographics? Who are the major competitors by segments?  From here, the marketing department was able to develop the creative messages targeted specifically to each segment.

What Does a Survey Have to Do With Your Salary?

In another key application, an association requested the administration of an annual Compensation Survey to collect data from their members about how much they earn, how much extra they receive in cash bonuses, and their deferred compensation.  Survey results may be disaggregated by level of education, position, region of the country, academic vs. non-academic, public vs. private, etc.  Associations may also examine trend data for their members over 2, 3, or 5 years.  When asking workers for such sensitive information, it is important to hire people who are skilled at constructing surveys so that respondents are likely to follow through to the end of the survey.  If you start with questions that are too sensitive, or too complex, early on, it is unlikely that those taking the survey will finish.  It is also important that this be done by evaluators external to a person’s place of business; there needs to be a buffer, a sense of safety, in answering questions that may be attitudinal with regard to their work, salary, work environment, and so forth.

Who uses this information?  Human Resources departments use this information to figure out how much to offer prospective employees or to determine whether their employees are in line with industry practices.   Similarly, prospective employees may use this information to know how much they can expect to earn.  Current employees may also use this information to compare their compensation to their peers. 

So, the next time you want to know whether you are being paid fairly, go to an association website to compare how much you could be earning.  Where did they get this information? From a survey, of course! 

If you have an important business question and your current data cannot provide all the answers, ask Evans Analytics at info@evansanalytics.com to design and analyze a survey for you.