Tuesday, June 21, 2016

SAS: option varinitchk



One thing I like about Java is that it is a statically typed language. In other words, every variable must be declared with a name and a type before it is used.

I wish SAS required declarations too.  How many times have we been burned by “variable is uninitialized” (which all too often means you were simply inconsistent in your spelling)?

Frustrated by my most recent error with time zones, I went looking for some way to prevent that from happening again.

I found option varinitchk, which lets you specify how an uninitialized variable in a DATA step should be reported.  There are four settings: nonote, note, warn, and error (error also stops the step).

It may be old to some, but it is new to me.  In the future, I will be using option varinitchk=error in all of my programs.  Personal choice.

Here’s some source code to demonstrate:

* demo option varinitchk ;

options varinitchk=note;  * the default ;
data work.step1;
b = a + 1;
run;

options varinitchk=warn;
data work.step2;
b = a + 1;
run;

options varinitchk=error;
data work.step3;
b = a + 1;
run;


And here’s the log:

1    * demo option varinitchk ;
2
3    options varinitchk=note;  * the default ;
4    data work.step1;
5    b = a + 1;
6    run;

NOTE: Variable a is uninitialized.
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 5:7
NOTE: The data set WORK.STEP1 has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.05 seconds
      cpu time            0.04 seconds


7
8    options varinitchk=warn;
9    data work.step2;
10   b = a + 1;
11   run;

WARNING: Variable a is uninitialized.
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 10:7
NOTE: The data set WORK.STEP2 has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds


12
13   options varinitchk=error;
14   data work.step3;
15   b = a + 1;
16   run;

ERROR: Variable a is uninitialized.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.STEP3 may be incomplete.  When this step was stopped there were 0
         observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.03 seconds
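
To see which setting is in effect before changing it, and to put it back afterwards, something like the following should work. This is only a sketch and not part of the original demo: the macro variable old_varinitchk, the data set work.step4, and the misspelled variable utc_offst are made up for illustration. With varinitchk=error the step should stop with the same kind of ERROR as step3 above, this time naming utc_offst.

```sas
* show the current setting in the log ;
proc options option=varinitchk;
run;

* save the current setting so it can be restored later ;
%let old_varinitchk = %sysfunc(getoption(varinitchk));
%put old_varinitchk=&old_varinitchk;

options varinitchk=error;

* utc_offst is a deliberate misspelling of utc_offset ;
data work.step4;
utc_hour = 14;
utc_offset = -5;
local_hour = utc_hour + utc_offst;
run;

* put the option back the way it was ;
options varinitchk=&old_varinitchk;
```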



Saturday, April 2, 2016

SAS: Select datetime field into a macro variable

How to select a datetime field into a macro variable, and use it later...
  1. use format=datetime20. (a date format does not work on a datetime value)
  2. enclose macro variable in double quotes, followed by dt


proc sql noprint;
select max(dttm) format=datetime20. into :max_dttm trimmed from some_table;
quit;

data work.subset;
set work.my_input (where=(dttm = "&max_dttm"dt));
* etc. ;
run;
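
A self-contained version of the same idea, for trying it out. This is only a sketch: the three rows of test data and the amount column are invented, and it selects from work.my_input in both steps. It uses format=datetime20. with the trimmed keyword (SAS 9.3 or later, as far as is known) so that the macro variable holds a clean value such as 02APR2016:17:45:00 that the dt literal can read.

```sas
* create a small test data set with a datetime column ;
data work.my_input;
format dttm datetime20.;
input dttm :datetime20. amount;
datalines;
01APR2016:08:15:00 10
02APR2016:09:30:00 20
02APR2016:17:45:00 30
;
run;

* put the latest datetime into a macro variable, without padding blanks ;
proc sql noprint;
select max(dttm) format=datetime20. into :max_dttm trimmed from work.my_input;
quit;

%put max_dttm=&max_dttm;

* keep only the row(s) with the latest datetime ;
data work.subset;
set work.my_input (where=(dttm = "&max_dttm"dt));
run;

proc print data=work.subset;
run;
```

With this test data, work.subset should contain only the row whose amount is 30.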

Tuesday, January 19, 2016

SAS: Write to Word (rtf)

It's easy to write SAS output directly to Word (more precisely, to .rtf format).

    (1)  use the ods rtf file statement at the beginning to specify path, and
    (2)  use the ods rtf close statement at the end...


libname perm "F:\sasdata" access=readonly;

ods rtf file = 'F:\inclass\ojuice.rtf';

proc print data=perm.ojuice;
title 'Listing of Perm.Ojuice';
run;

goptions reset=all border hsize=5in vsize=4in;
title 'Scatter-plot of Ojuice';
proc gplot data=perm.ojuice;
plot sweetindex * pectin;
run;
quit;

title 'Regression of Ojuice';
proc reg data=perm.ojuice plots=none;
model Sweetindex = Pectin;
run;
quit;

ods rtf close;




Tuesday, January 5, 2016

HTML: xbar, yhat, etc.



Some cool HTML codes: I use these for embedding statistical symbols in R Markdown.


x-bar   x&#772;   x̄

y-hat   y&#770;   ŷ

Wednesday, December 23, 2015

SAS: Macro to replace missing values with zeroes for all numeric fields

Came across this nice macro. Good not only for work but for demoing the use of arrays and _numeric_.

    %macro miss2zero;
    array miss {*} _numeric_;

    do i =1 to dim(miss);
       if miss{i} = . then miss{i} = 0;
    end;
    %mend miss2zero;
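
The macro only generates DATA step statements, so it has to be called inside a DATA step, after the SET statement has brought in the numeric variables. A minimal usage sketch, assuming the %miss2zero definition above has already been submitted; the data sets work.scores and work.scores_filled are invented for illustration.

```sas
* test data with a few missing numeric values ;
data work.scores;
input id math reading;
datalines;
1 90 .
2 . 85
3 70 75
;
run;

* call the macro after SET so that _numeric_ covers id, math, and reading ;
* the loop counter i is created by the macro, so drop it from the output ;
data work.scores_filled (drop=i);
set work.scores;
%miss2zero
run;

proc print data=work.scores_filled;
run;
```

Because the array is defined before i first appears, i itself is not part of the array. Note that the test (= .) only catches the ordinary missing value, not special missing values such as .A.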






Saturday, November 21, 2015

SAS: SQLOBS and creating macro variables within PROC SQL



I just came across the automatic macro variable SQLOBS, which PROC SQL sets to the number of rows processed by the most recent statement...this example is fully self-contained...just copy it into SAS and run...


* ------------------------------------------------------------ ;
* DEMO USE OF SQLOBS WHEN CREATING MACRO VARIABLES FROM DATA ;
* ------------------------------------------------------------ ;

* CREATE TEST DATASET... ;
data work.grandkids;
input name $ gender $;
datalines;
Kamina F
Raelani F
Elliott M
Calliope F
Henni F
;
run;

* SHOW TEST DEMO DATASET... ;
proc print data=work.grandkids;
run;

* CREATE MACRO VARIABLES THE OLD WAY... ;
data _null_;
set work.grandkids end=eof;
call symput("name" || trim(left(_n_)), name);
call symput("gender" || trim(left(_n_)), gender);
if eof then call symput("howmany", trim(left(_n_)));
run;

%put howmany=&howmany;

* SELECT GRANDDAUGHTERS (ONLY)... ;
data work.granddaughters (keep=name);
attrib name length=$8;
do i = 1 to &howmany;
    if (symget("gender" || trim(left(i))) = "F") then do;
        name = symget("name" || trim(left(i)));
        output;
    end;
end;
run;

* SHOW GRANDDAUGHTERS... ;
proc print data=work.granddaughters;
run;

* CREATE MACRO VARIABLES THE NEW WAY... ;
proc sql noprint;
select name, gender
into :name1 - :name999, :gender1 - :gender999
from work.grandkids;
quit;

* SHOW SQLOBS...THE WHOLE POINT OF THIS EXAMPLE... ;
%put sqlobs=&sqlobs;

* ASSIGN SQLOBS TO HOWMANY AND RUN OLD CODE AGAIN... ;
%let howmany = %eval(&sqlobs);
%put howmany=&howmany;

* SELECT GRANDDAUGHTERS (ONLY)... ;
data work.granddaughters (keep=name);
attrib name length=$8;
do i = 1 to &howmany;
    if (symget("gender" || trim(left(i))) = "F") then do;
        name = symget("name" || trim(left(i)));
        output;
    end;
end;
run;

* SHOW GRANDDAUGHTERS... ;
proc print data=work.granddaughters;
run;
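
Since SQLOBS reflects whatever PROC SQL statement ran last, the filtering can also be pushed into the query itself. A short sketch, assuming work.grandkids from above still exists; the macro variable names girl1 - girl999 and howmany_girls are made up for illustration.

```sas
* select the granddaughters directly and let SQLOBS do the counting ;
proc sql noprint;
select name
into :girl1 - :girl999
from work.grandkids
where gender = "F";
quit;

* copy SQLOBS right away...the next PROC SQL statement will change it ;
%let howmany_girls = &sqlobs;
%put howmany_girls=&howmany_girls;
```

With the five test rows above, this should report 4.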





Wednesday, November 11, 2015

Data Mining: Jaccard Similarity for bags


I was already familiar with the Jaccard Similarity index but not Jaccard Similarity for bags.

Source: http://infolab.stanford.edu/~ullman/mmds/ch3.pdf

"If ratings are 1-to-5-stars, put a movie in a customer’s set n times if they rated the movie n-stars. Then, use Jaccard similarity for bags when measuring the similarity of customers. The Jaccard similarity for bags B and C is defined by counting an element n times in the intersection if n is the minimum of the number of times the element appears in B and C. In the union, we count the element the sum of the number of times it appears in B and in C."

"Example 3.2 : The bag-similarity of bags {a, a, a, b} and {a, a, b, b, c} is 1/3. The intersection counts a twice and b once, so its size is 3. The size of the union of two bags is always the sum of the sizes of the two bags, or 9 in this case. Since the highest possible Jaccard similarity for bags is 1/2, the score of 1/3 indicates the two bags are quite similar, as should be apparent from an examination of their contents."