Statistics & Analytics Consultants Group Blog

The Statistics & Analytics Consultants group is a network of over 9,000 members. Businesses have the ability to work with consulting firms and individual consultants and eliminate costs. There is also a job board where you can post statistics and analytics jobs. Our members offer a variety of courses to ensure that your company can compete on analytics. Courses range from basic applied understanding of statistical concepts and methods involved in carrying out and interpreting research to advanced modeling and programming.

This blog is a place where featured members are invited to share their expertise and opinions. Opinions are not necessarily the opinions of SACG.

Tuesday, December 6, 2011

What is a Scratch Variable in IBM SPSS Statistics Syntax? by Keith McCormick


This might seem an obscure topic, but it is easily grasped and has the potential to make your SPSS Syntax more readable. Readable is good.

A Scratch Variable is a Variable with a # Symbol in front of it. It is available temporarily for an intermediate step in a Transformation calculation. Once a Procedure occurs, it is no longer available. If the distinction between Transformation and Procedure is new to you, you should put researching that on your to-do list. Start with Appendix B of the Syntax Reference Guide.

You can use the following lines to create a tiny data set.

DATA LIST /LocationName 1-50 (A) .
BEGIN DATA
Raleigh, North Carolina
Durham, North Carolina
Cary, North Carolina
END DATA.

Let’s say that you wanted to pull out just State from the three examples in the data set. The first step would be to identify the location of the comma because the last letter of the last name is always one character before the comma. It is not a constant value because the names are of variable length.

This bit of code will do it:

COMPUTE CommaLocation = INDEX(LocationName,',').

This next step would complete the process, but would also create a new variable that you don’t need.

STRING State (A50).
COMPUTE State = substring(LocationName,CommaLocation+2).

Warning: you only need to run the STRING command once.

Do we actually want to create this variable? What are we going to do with it after we complete the calculation? We could use DELETE VARIABLES once we are done, but we have two better options. In this example, DELETE VARIABLES is harmless, but it would be slower on large data sets, and therefore some programmers would consider it inelegant. It is noteworthy that for decades the language got by just fine without the fairly recent addition of DELETE VARIABLES.

(Note that I have not included EXECUTE commands in any of these code examples. Curious Why? You really shouldn’t use EXECUTE if there will be any procedures later in the code, and there are always procedures later on in the code. That same Appendix B in the Syntax Reference Guide mentioned earlier is a good place to read more about this.)

We could put a function inside of a function:

STRING State(A50).
COMPUTE State = substring(LocationName,INDEX(LocationName,',')+2).

We could also use a Scratch variable:

COMPUTE #CommaLocation = INDEX(LocationName,',').
STRING State(A50).
COMPUTE State = substring(LocationName,#CommaLocation+2).

In an example as straightforward as this, the function inside of a function might be best. As the complexity grows, there will be opportunities to use the Scratch variable option to break up a calculation into two or more steps instead of a single very long, and potentially confusing, line of code.

And who doesn’t want more tools in their Syntax tool chest?

About Keith: Keith McCormick is an independent data mining professional who blogs at:  http://www.keithmccormick.com/

2 comments:

  1. Three additional points:
    1. Scratch variables isolate the code that uses them from the variables in your dataset. Since scratch variables are never part of the "real" data, you can be confident that you are not overwriting a permanent variable.

    2. They are automatically initialized to 0 or blank, according to type.

    3. These variables are not reset when a case is read, so they are handy for constants and accumulators as well as things like loop indexes.

    ReplyDelete
  2. Thanks Jon ... I have explicitly initialized a scratch variable, I think. I didn't know about point 2. You've just saved me a step.

    ReplyDelete