Wednesday, December 23, 2015

SAS: Macro to replace missing values with zeroes for all numeric fields

Came across this nice macro. Good not only for work but for demoing the use of arrays and _numeric_.

    %macro miss2zero;
    array miss {*} _numeric_;

    do i =1 to dim(miss);
       if miss{i} = . then miss{i} = 0;
    end;
    %mend miss2zero;






Saturday, November 21, 2015

SAS: SQLOBS and creating macro variables within PROC SQL



I just came across the automatic macro variable SQLOBS...this example is fully self-contained...just copy it into SAS and run...


* ------------------------------------------------------------ ;
DEMO USE OF SQLOBS WHEN CREATING MACRO VARIABLES FROM DATA
* ------------------------------------------------------------ ;

* CREATE TEST DATASET... ;
data work.grandkids;
input name $ gender $;
datalines;
Kamina F
Raelani F
Elliott M
Calliope F
Henni F
;
run;

* SHOW TEST DEMO DATASET... ;
proc print data=work.grandkids;
run;

* CREATE MACRO VARIABLES THE OLD WAY... ;
data _null_;
set work.grandkids end=eof;
call symput("name" || trim(left(_n_)), name);
call symput("gender" || trim(left(_n_)), gender);
if eof then call symput("howmany", trim(left(_n_)));
run;

%put howmany=&howmany;

* SELECT GRANDDAUGHTERS (ONLY)... ;
data work.granddaughters (keep=name);
attrib name length=$8;
do i = 1 to &howmany;
    if (symget("gender" || trim(left(i))) = "F") then do;
        name = symget("name" || trim(left(i)));
        output;
    end;
end;
run;

* SHOW GRANDDAUGHTERS... ;
proc print data=work.granddaughters;
run;

* CREATE MACRO VARIABLES THE NEW WAY... ;
proc sql noprint;
select name, gender
into :name1 - :name999, :gender1 - :gender999
from work.grandkids;
quit;
run;

* SHOW SQLOBS...THE WHOLE POINT OF THIS EXAMPLE... ;
%put sqlobs=&sqlobs;

* REASSIGN TO SQLOBS TO HOWMANY AND RUN OLD CODE AGAIN... ;
%let howmany = %eval(&sqlobs);
%put howmany=&howmany;

* SELECT GRANDDAUGHTERS (ONLY)... ;
data work.granddaughters (keep=name);
attrib name length=$8;
do i = 1 to &howmany;
    if (symget("gender" || trim(left(i))) = "F") then do;
        name = symget("name" || trim(left(i)));
        output;
    end;
end;
run;

* SHOW GRANDDAUGHTERS... ;
proc print data=work.granddaughters;
run;





Wednesday, November 11, 2015

Data Minng: Jaccard Similarity for bags


I was already familiar with the Jaccard Similarity index but not Jaccard Similarity for bags.

Source: http://infolab.stanford.edu/~ullman/mmds/ch3.pdf

"If ratings are 1-to-5-stars, put a movie in a customer’s set n times if they rated the movie n-stars. Then, use Jaccard similarity for bags when measuring the similarity of customers. The Jaccard similarity for bags B and C is defined by counting an element n times in the intersection if n is the minimum of the number of times the element appears in B and C. In the union, we count the element the sum of the number of times it appears in B and in C."

"Example 3.2 : The bag-similarity of bags {a, a, a, b} and {a, a, b, b, c} is 1/3. The intersection counts a twice and b once, so its size is 3. The size of the union of two bags is always the sum of the sizes of the two bags, or 9 in this case. Since the highest possible Jaccard similarity for bags is 1/2, the score of 1/3 indicates the two bags are quite similar, as should be apparent from an examination of their contents."



Saturday, October 24, 2015

SAS: Initialize random number seed

Initialize random number seed:


&let SEED = 1234;

data _null_;
call streaminit(&SEED);
run;


Saturday, October 17, 2015

R: Which package has this function?

Use the sos package in order to determine which package has a function you need.
library(sos)
findFn("partial residual plots")
 
 

Thursday, October 8, 2015

SAS: Drop variable if it exists



* =============================================================== ;
*         M A C R O   D R O P _ V A R _ I F _ E X I S T S
* =============================================================== ;

%macro drop_var_if_exists(SASFILE=,VAR=);

%* See also http://sasrunner.blogspot.com/2009/09/check-variable-exist-in-data-step-run.html;

data _null_;
dsid = open("&SASFILE");
call symput("var_exists", varnum(dsid, "&VAR"));
rc = close(dsid);
run;

%if (&var_exists eq 0) %then %do;
    %put Variable &VAR does not exist in &SASFILE ;
%end;
%else %do;
    proc sql noprint;
    alter table &SASFILE drop &VAR;
    quit;
    run;
    %put Variable &VAR was dropped from file &SASFILE ;
%end;

%mend drop_var_if_exists;




* =============================================================== ;
*    T E S T   M A C R O   D R O P _ V A R _ I F _ E X I S T S
* =============================================================== ;

* Create test dataset;
data work.mactest;
 set sashelp.class (obs=10);
run;

title "Before drops";
proc print data=work.mactest;
run;
title;

%drop_var_if_exists(SASFILE=work.mactest, VAR=name);
%drop_var_if_exists(SASFILE=work.mactest, VAR=xyz);
%drop_var_if_exists(SASFILE=work.mactest, VAR=age);

title "After drops";
proc print data=work.mactest;
run;
title;

* Done ;




Wednesday, October 7, 2015

SAS: Remove carriage return from string

I had a problem with a dataset with a single string field, created using datalines, and the line ended with carriage return + line feed ('0a0d'x), but code was copied to Linux where it was run, and Linux ends its lines with line feed only ('0a'x), so the field contained the carriage return.  I needed to strip the carriage return from the field. I added this line within the data step that created the file.

  my_field = TRANWRD(my_field,'0D'x,'');




SAS: Hash table lookup

SAS hash table lookup example taken from http://support.sas.com/documentation/cdl/en/lrcon/65287/HTML/default/viewer.htm#n1b4cbtmb049xtn1vh9x4waiioz4.htm and included in this bog so I can find it easily.


data match;
   length k 8;
   length s 8;
   if _N_ = 1 then do;
      /* load SMALL data set into the hash object */
     declare hash h(dataset: "work.small");
      /* define SMALL data set variable K as key and S as value */
      h.defineKey('k');
      h.defineData('s');
      h.defineDone();
      /* avoid uninitialized variable notes */
      call missing(k, s);
   end;

/* use the SET statement to iterate over the LARGE data set using */
/* keys in the LARGE data set to match keys in the hash object */
set large;
rc = h.find();
if (rc = 0) then output;
run;






Tuesday, September 15, 2015

SAS: SAS on the Mac

I have little experience with the Mac, and no experience with SAS on the Mac. Some of my students had trouble running SAS on the Mac. One of my students sent me this message. Saving it here just in case...



1. Exit the VM

2. Go to 'Terminal'

3. Type    sudo nvram boot-args=debug=0x10

4. Run the VM again and the bug will be fixed



R: Dataframe slice example

R dataframe slice example.

Note: slice = whole[rows satisfying this condition, tuple of column names]



    > head(mywhole)
      obs ew id ba x
    1   1  E  1  B 8
    2   2  E  1  A 9
    3   3  E  2  B 4
    4   4  E  2  A 3
    5   5  E  3  B 6
    6   6  E  3  A 7
    > myslice = mywhole[ba=="B", c("id","x")]
    > myslice
       id x
    1   1 8
    3   2 4
    5   3 6
    7   4 6
    9   5 5
    11  6 9
    13  7 6
    15  8 5
    17  9 7
    19 10 9





Tuesday, July 7, 2015

SAS: Using "in" within a macro %if



This was new to me…want to use an “in” within a macro “%if”?  Then you need to use the sas option minoperator.  I found a use for it as follows:

options minoperator; * required for in within a macro if ;   
%if (&RUN_TIME_ROUNDED in &GLBL_EMAIL_DASHBOARD_RECON_WHEN) %then %do;

Where

%let GLBL_EMAIL_DASHBOARD_RECON_WHEN = (21600 43200 64800);

Note “in” values are space-delimited. Comma-delimited requires yet another macro option (not system option).  Details at http://support.sas.com/kb/35/591.html




Wednesday, June 3, 2015

SAS: Create an empty file

There are two ways to create an empty SAS dataset having the same fields and formats as an existing data set. I knew about the DATA step way (shown first) but did not know about the PROC SQL way (shown second):


data work.orig;
a = 1; b = 2; c = 3; output;
a = 2; b = 3; c = 4; output;
format b 3.1 c 4.2; * to show format is copied also ;
run;

data work.copy1;
set work.orig;
delete;
stop;
run;

proc contents data=work.copy1;
run;

proc sql noprint;
create table work.copy2 like work.orig;
quit;
run;

proc contents data=work.copy2;
run;





Tuesday, June 2, 2015

SAS: PROC PWENCODE



Something that has always bothered me is that the passwords for our databases are readable to anyone who can see our SAS code.

I define all database user IDs and passwords as macro variables in a constants file which is %included in my programs.  They are used in the libname statement as follows:

libname XXXLIB odbc datasrc=oracle user=&GLBL_ORACLE_USERID password="&GLBL_ORACLE_PASSWORD" schema=XXX;


A coworker was nice enough to point me to PROC PWENCODE, which I was not previously aware of.

This proc encodes a password. The encoded password appears in the log, and is stored as macro variable _PWENCODE. 

I decided to try it with a UP file to see if it worked.  Here is my code:


* encode the real password... ;
proc pwencode in="&GLBL_ORACLE_PASSWORD";
run;

%put The encoded password is &_PWENCODE;
%let GLBL_ORACLE_PASSWORD = &_PWENCODE;

libname XXXLIB odbc datasrc=oracle user=&GLBL_ORACLE_USERID password="&GLBL_ORACLE_PASSWORD" schema=XXX;


The encoded password looks like this – the {SAS002} prefix is recognized by the SAS access engine and decoded at that time:

The encoded password is {SAS002}8083755C07C000D90BB19F7A20A761E9


So I could encode the password in a one-time program, then store that password (with the {SAS002} prefix) as GLBL_ORACLE_PASSWORD in my constants file.

Of course, it would still be plain text in the constants file, but at least it would be useable only thru SAS, and we would not be exposing the “real” password to the whole world.






Saturday, April 11, 2015

Excel: Generate random numbers following a Normal distribution


Here's how to generate random numbers following a Normal distribution using Excel:

=norminv(rand(),mean,stdev)



Thursday, March 26, 2015

SAS: FTP an external file (flat file)

I had a lot of trouble with SAS' FTP access engine. It appeared to be randomly corrupting the data.  This worked though. I relied heavily on the indicated web page, and then converted it to a macro.


* ----------------------------------------------------- ;
*              M A C R O :  F T P F I L E
* ----------------------------------------------------- ;
*  FTP the hard way...but to overcome failure of the
*  FTP access engine, which corrupted the data.
*  Source: https://communities.sas.com/message/114111
*  Bill Qualls, 20150326
* ----------------------------------------------------- ;

%macro ftpfile(folder=&RAWDATA_FOLDER, file=, ftpcom=&RAWDATA_FOLDER/ftpcom.txt);

* file to which ftp commands will be written ;
filename ftpcom "&FTPCOM" lrecl=80;

data _null_;
file ftpcom;

* ftp ession user and password ;
put "user theuser thepassword";

* disable prompting cmd ;
put "prom no";

* enable binary transfer mode cmd ;
put "bin";

* go to local folder cmd ;
put "lcd &FOLDER";

* go to remote folder cmd ;
put "cd OurFolderOnFTPServer";

* send file cmd ;
put "send &file";

run;


* set up piping into ftp commands into ftp instruction ;
filename doftp pipe "ftp -n ftp.xxxxxxxx.com < &FTPCOM";


* enable ftp session and pass it the commands file ;
data _null_;
infile doftp;
input;
put _infile_;
run;

%mend ftpfile;

Monday, March 23, 2015

Linux: Zipping files in Linux

My co-worker Ken shared this with me.



Here is how I am zipping files:

First, navigate to the directory where the files are located. Then, 

zip –mT  abc  abc*

This creates the archive abc.zip, then adds to it all files in the current directory named like abc*, then tests the archive (the T option), then if test is OK, deletes the files (effectively moving the files to the zip archive (the –m option)).