Statistics & Analytics Consultants Group Blog

The Statistics & Analytics Consultants group is a network of over 9,000 members. Businesses have the ability to work with consulting firms and individual consultants and eliminate costs. There is also a job board where you can post statistics and analytics jobs. Our members offer a variety of courses to ensure that your company can compete on analytics. Courses range from basic applied understanding of statistical concepts and methods involved in carrying out and interpreting research to advanced modeling and programming.

This blog is a place where featured members are invited to share their expertise and opinions. Opinions are not necessarily the opinions of SACG.

Tuesday, January 3, 2012

Adapting Statistical Programming Languages, by David Abbott

The fit between you and any given general purpose statistical analysis language is apt to be far from perfect, especially if you have experience with multiple programming languages. You want to do X but the tool’s language only supports Y. Bummer! But you don’t always have to just tolerate this situation; sometimes, you can make the fit better by clever use of the language’s extensibility features.

Some years ago, when using SAS Base, I found myself wanting to repeat chunks of code for each of a small collection of things (SAS datasets, SAS variables, etc.), but I knew from experience that the clone-and-edit way of coding was not the way to go; it makes code bulky and miserable to maintain. Fortunately, I recalled a useful construct in the UNIX shell language, “for kk in ds1 ds2 ds3; do ...; done”, that obviates the need to clone and edit. Perhaps I could use the SAS macro feature to achieve something similar in SAS? It seemed worth a try. Basically, I just needed to make three things work to get the intended result:

  • A way to delineate the set of statements that I wanted to be repeatedly invoked, similar to the do and done of the UNIX shell command. Well, the %macro and %mend statements of SAS could fill that bill.


  • A means to specify the list of words to successively substitute, a la the “in ds1 ds2 ds3” of the UNIX shell command. The %let statement of SAS could fulfill that function.


  • A looping construct to drive the execution through the list of words executing the desired set of statements for each word. The %do %while looping construct of SAS seemed up to the job.

I came up with a SAS macro of several lines, with the signature %wordLoop(wordList=, contentMacro=), and it did the trick for me:

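%macro wordLoop(wordList=, contentMacro=);
  /* Keep the counter and the current word local to this macro. */
  /* The content macro can still see word because it runs inside wordLoop. */
  %local cnt word;
  %let cnt = 0;
  %do %while(1 eq 1);
    %let cnt = %eval(&cnt + 1);
    /* Take the next blank-delimited word from the list. */
    %let word = %scan(&wordList, &cnt, %str( ));
    /* An empty word means the list is exhausted. */
    %if &word= %then %return;
    /* Invoke the macro whose name was passed in contentMacro. */
    %&contentMacro;
  %end;
%mend wordLoop;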

For example, here is how I used this macro to winnow several datasets to just the IDs occurring in the dataset subsetOfIdsDs:

/* Content macro: word holds the current dataset name supplied by wordLoop. */
%macro tmpMacro;
  /* Both datasets must already be sorted by id for the BY merge. */
  data &word._subset;
    merge &word subsetOfIdsDs(in=in2);
    by id;
    if in2;
  run;
%mend tmpMacro;
%let toBeSubsetted = Ds1 Ds2 DsA DsB DsC;
%wordLoop(wordList=&toBeSubsetted, contentMacro=tmpMacro);


After execution I had five new datasets (Ds1_subset, Ds2_subset, DsA_subset, DsB_subset, and DsC_subset), all of which had been restricted to the patients/subjects included in subsetOfIdsDs.

The only tricky part of implementing %wordLoop was determining how to get SAS to invoke a macro whose name was provided via the contentMacro parameter. Fortunately, the SAS macro language allows this via a simple construct: %&contentMacro. Of course, the stripped-down implementation of %wordLoop above can be improved on, for example by checking arguments for validity; contact me for my latest version if interested.
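To give a flavor of what such argument checking might look like, here is a minimal sketch. It is not my latest version; the name wordLoopChecked and the wording of the log messages are made up for illustration. It assumes SAS 9 or later, where %return and %sysmacexist are available. Note that %sysmacexist reports only on macros already compiled in the session's WORK macro catalog, so an autocall macro that has not yet been invoked would fail this particular check.

```sas
/* Sketch only: wordLoopChecked is a hypothetical name used for illustration. */
%macro wordLoopChecked(wordList=, contentMacro=);
  %local cnt word;
  /* Refuse to run without a content macro name. */
  %if %length(&contentMacro) = 0 %then %do;
    %put ERROR: wordLoopChecked requires contentMacro=.;
    %return;
  %end;
  /* SYSMACEXIST returns 0 when no compiled macro of that name is found. */
  %if %sysmacexist(&contentMacro) = 0 %then %do;
    %put ERROR: wordLoopChecked cannot find a compiled macro named &contentMacro..;
    %return;
  %end;
  /* An empty list is harmless but probably a mistake, so say so in the log. */
  %if %length(&wordList) = 0 %then %do;
    %put NOTE: wordLoopChecked received an empty wordList, nothing to do.;
    %return;
  %end;
  /* Same loop as wordLoop, with the exit test moved into the loop condition. */
  %let cnt = 1;
  %let word = %scan(&wordList, &cnt, %str( ));
  %do %while(%length(&word) > 0);
    %&contentMacro;
    %let cnt = %eval(&cnt + 1);
    %let word = %scan(&wordList, &cnt, %str( ));
  %end;
%mend wordLoopChecked;
```

It is called exactly like %wordLoop, e.g. %wordLoopChecked(wordList=Ds1 Ds2, contentMacro=tmpMacro);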

The need to run the same group of statements for multiple datasets comes up frequently for me in statistical analysis. Likewise, in data cleaning, I frequently need to apply the same statements to multiple variables. %wordLoop provides a quick and pleasing solution to both situations.
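The multiple-variable case might look something like the sketch below. The dataset survey, its variables, and the use of -99 as a missing-value code are all invented for illustration, and the code assumes the %wordLoop macro shown earlier has already been compiled in the session. Here %wordLoop is invoked inside a DATA step, so each pass through the loop generates one recoding statement.

```sas
/* Hypothetical demo data: -99 is assumed to be a missing-value code. */
data survey;
  input id age income score;
  datalines;
1 34 52000 -99
2 -99 61000 7
3 45 -99 9
;
run;

/* Content macro: recode -99 to missing for the variable named in word. */
%macro recodeOne;
  if &word = -99 then &word = .;
%mend recodeOne;

/* One generated IF statement per variable in the list. */
data survey_clean;
  set survey;
  %wordLoop(wordList=age income score, contentMacro=recodeOne);
run;

proc print data=survey_clean;
run;
```

Adding another variable to the cleanup is then just a matter of adding its name to the word list.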

So, let me suggest two take-aways from this blog entry:

  • For the SAS programmer, %wordLoop is a nifty little macro that helps you avoid clone and edit and it illustrates how you can use the SAS macro facility to extend the SAS command language.


  • For users of SPSS, R, etc., top-tier statistical programming languages provide language extension mechanisms, and with a little work and cleverness you can use them to make the language work more the way you want it to; you can improve the fit of the language to the way you like to work.

About David: David Abbott has degrees in statistics and computer science and is currently working for Veterans Affairs Health Services Research. He can be reached at david.abbott@alumni.duke.edu.
