Saturday, November 21, 2015

SAS: SQLOBS and creating macro variables within PROC SQL



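I just came across the automatic macro variable SQLOBS, which PROC SQL sets to the number of rows processed by its most recent statement. After a SELECT ... INTO, that means it tells you how many numbered macro variables were actually created. This example is fully self-contained: just copy it into SAS and run it.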


* ------------------------------------------------------------ ;
* DEMO USE OF SQLOBS WHEN CREATING MACRO VARIABLES FROM DATA ;
* ------------------------------------------------------------ ;

* CREATE TEST DATASET... ;
data work.grandkids;
input name $ gender $;
datalines;
Kamina F
Raelani F
Elliott M
Calliope F
Henni F
;
run;

* SHOW TEST DATASET... ;
proc print data=work.grandkids;
run;

* CREATE MACRO VARIABLES THE OLD WAY... ;
data _null_;
set work.grandkids end=eof;
call symput("name" || trim(left(_n_)), name);
call symput("gender" || trim(left(_n_)), gender);
if eof then call symput("howmany", trim(left(_n_)));
run;

%put howmany=&howmany;

* SELECT GRANDDAUGHTERS (ONLY)... ;
data work.granddaughters (keep=name);
attrib name length=$8;
do i = 1 to &howmany;
    if (symget("gender" || trim(left(i))) = "F") then do;
        name = symget("name" || trim(left(i)));
        output;
    end;
end;
run;

* SHOW GRANDDAUGHTERS... ;
proc print data=work.granddaughters;
run;

* CREATE MACRO VARIABLES THE NEW WAY... ;
proc sql noprint;
select name, gender
into :name1 - :name999, :gender1 - :gender999
from work.grandkids;
quit;

* SHOW SQLOBS...THE WHOLE POINT OF THIS EXAMPLE... ;
%put sqlobs=&sqlobs;

* COPY SQLOBS INTO HOWMANY (THE NEXT PROC SQL QUERY WILL OVERWRITE SQLOBS) AND RUN THE OLD CODE AGAIN... ;
%let howmany = %eval(&sqlobs);
%put howmany=&howmany;

* SELECT GRANDDAUGHTERS (ONLY)... ;
data work.granddaughters (keep=name);
attrib name length=$8;
do i = 1 to &howmany;
    if (symget("gender" || trim(left(i))) = "F") then do;
        name = symget("name" || trim(left(i)));
        output;
    end;
end;
run;

* SHOW GRANDDAUGHTERS... ;
proc print data=work.granddaughters;
run;





Wednesday, November 11, 2015

Data Mining: Jaccard Similarity for bags


I was already familiar with the Jaccard similarity index for sets, but not with Jaccard similarity for bags (multisets, where an element can appear more than once).

Source: http://infolab.stanford.edu/~ullman/mmds/ch3.pdf

"If ratings are 1-to-5-stars, put a movie in a customer’s set n times if they rated the movie n-stars. Then, use Jaccard similarity for bags when measuring the similarity of customers. The Jaccard similarity for bags B and C is defined by counting an element n times in the intersection if n is the minimum of the number of times the element appears in B and C. In the union, we count the element the sum of the number of times it appears in B and in C."
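Written as a formula, the quoted definition looks like the block below. The notation is my own and not the book's: m_B(x) is the number of times element x appears in bag B, and |B| is the total number of elements in B, counting repeats.

```latex
J_{\text{bag}}(B, C)
  = \frac{\sum_{x} \min\bigl(m_B(x),\, m_C(x)\bigr)}{\sum_{x} \bigl(m_B(x) + m_C(x)\bigr)}
  = \frac{\sum_{x} \min\bigl(m_B(x),\, m_C(x)\bigr)}{|B| + |C|}
```

The numerator is the size of the bag intersection (each element counted the minimum number of times). The denominator is the size of the bag union (each element counted the sum of its occurrences), which is why it always equals |B| + |C|.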

"Example 3.2 : The bag-similarity of bags {a, a, a, b} and {a, a, b, b, c} is 1/3. The intersection counts a twice and b once, so its size is 3. The size of the union of two bags is always the sum of the sizes of the two bags, or 9 in this case. Since the highest possible Jaccard similarity for bags is 1/2, the score of 1/3 indicates the two bags are quite similar, as should be apparent from an examination of their contents."
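Working Example 3.2 through the bag definition makes the arithmetic explicit. A one-line bound also shows where the maximum of 1/2 comes from. The bound argument is my own addition and is not quoted from the source. As before, m_B(x) is the number of copies of x in bag B.

```latex
% Example 3.2: B = {a, a, a, b},  C = {a, a, b, b, c}
% m_B(a)=3, m_B(b)=1, m_B(c)=0;   m_C(a)=2, m_C(b)=2, m_C(c)=1
J_{\text{bag}}(B, C)
  = \frac{\min(3,2) + \min(1,2) + \min(0,1)}{|B| + |C|}
  = \frac{2 + 1 + 0}{4 + 5}
  = \frac{3}{9}
  = \frac{1}{3}

% Upper bound: \min(u, v) \le (u + v)/2 for any counts u, v, hence
J_{\text{bag}}(B, C)
  = \frac{\sum_x \min\bigl(m_B(x), m_C(x)\bigr)}{\sum_x \bigl(m_B(x) + m_C(x)\bigr)}
  \le \frac{\tfrac{1}{2}\sum_x \bigl(m_B(x) + m_C(x)\bigr)}{\sum_x \bigl(m_B(x) + m_C(x)\bigr)}
  = \frac{1}{2}
```

Equality holds only when m_B(x) = m_C(x) for every x, that is, when the two bags are identical. A score of 1/3 is therefore two-thirds of the maximum possible value.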