Saturday, November 21, 2015
SAS: SQLOBS and creating macro variables within PROC SQL
I just came across the automatic macro variable SQLOBS, which PROC SQL sets to the number of rows processed by the statement it just ran. This example is fully self-contained: just copy it into SAS and run it.
* ------------------------------------------------------------ ;
* DEMO USE OF SQLOBS WHEN CREATING MACRO VARIABLES FROM DATA ;
* ------------------------------------------------------------ ;
* CREATE TEST DATASET... ;
data work.grandkids;
input name $ gender $;
datalines;
Kamina F
Raelani F
Elliott M
Calliope F
Henni F
;
run;
* SHOW TEST DEMO DATASET... ;
proc print data=work.grandkids;
run;
* CREATE MACRO VARIABLES THE OLD WAY... ;
data _null_;
set work.grandkids end=eof;
call symput("name" || trim(left(_n_)), name);
call symput("gender" || trim(left(_n_)), gender);
if eof then call symput("howmany", trim(left(_n_)));
run;
%put howmany=&howmany;
* SELECT GRANDDAUGHTERS (ONLY)... ;
data work.granddaughters (keep=name);
attrib name length=$8;
do i = 1 to &howmany;
  if (symget("gender" || trim(left(i))) = "F") then do;
    name = symget("name" || trim(left(i)));
    output;
  end;
end;
run;
* SHOW GRANDDAUGHTERS... ;
proc print data=work.granddaughters;
run;
* CREATE MACRO VARIABLES THE NEW WAY... ;
proc sql noprint;
select name, gender
into :name1 - :name999, :gender1 - :gender999
from work.grandkids;
quit;
* SHOW SQLOBS...THE WHOLE POINT OF THIS EXAMPLE... ;
%put sqlobs=&sqlobs;
* ASSIGN SQLOBS TO HOWMANY AND RUN OLD CODE AGAIN... ;
%let howmany = %eval(&sqlobs);
%put howmany=&howmany;
* SELECT GRANDDAUGHTERS (ONLY)... ;
data work.granddaughters (keep=name);
attrib name length=$8;
do i = 1 to &howmany;
  if (symget("gender" || trim(left(i))) = "F") then do;
    name = symget("name" || trim(left(i)));
    output;
  end;
end;
run;
* SHOW GRANDDAUGHTERS... ;
proc print data=work.granddaughters;
run;
Wednesday, November 11, 2015
Data Mining: Jaccard Similarity for bags
I was already familiar with the Jaccard Similarity index but not Jaccard Similarity for bags.
Source: http://infolab.stanford.edu/~ullman/mmds/ch3.pdf
"If ratings are 1-to-5-stars, put a movie in a customer’s set n times if they rated the movie n-stars. Then, use Jaccard similarity for bags when measuring the similarity of customers. The Jaccard similarity for bags B and C is defined by counting an element n times in the intersection if n is the minimum of the number of times the element appears in B and C. In the union, we count the element the sum of the number of times it appears in B and in C."
"Example 3.2 : The bag-similarity of bags {a, a, a, b} and {a, a, b, b, c} is 1/3. The intersection counts a twice and b once, so its size is 3. The size of the union of two bags is always the sum of the sizes of the two bags, or 9 in this case. Since the highest possible Jaccard similarity for bags is 1/2, the score of 1/3 indicates the two bags are quite similar, as should be apparent from an examination of their contents."