Thursday, June 20, 2019

Python: Pandas read_csv encoding

I could not read a client's data files the way I normally would because of their unusual encoding.

Normally I would open the files in Notepad++ and convert the encoding there, but all but one of the files were too large for Notepad++ to open. The one file I could open was encoded as "UCS-2 LE BOM".

To read that encoding with Pandas read_csv, pass encoding="utf_16_le":

df = pd.read_csv(IMPORT_FILE, sep="\t", low_memory=False, encoding="utf_16_le")
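For files too large to open in an editor, the encoding can often be guessed from the first few bytes instead. The following is only a minimal sketch using the standard library: the function name sniff_bom is my own invention, and it assumes the file starts with a byte order mark, which is not guaranteed for every file.

```python
import codecs


def sniff_bom(path):
    """Guess a Python codec name from the file's byte order mark.

    Hypothetical helper: returns None when no known BOM is found.
    """
    # Only the first four bytes are read, so file size does not matter.
    with open(path, "rb") as f:
        head = f.read(4)
    # UTF-32 LE is checked before UTF-16 LE because its BOM (FF FE 00 00)
    # begins with the UTF-16 LE BOM (FF FE).
    candidates = [
        (codecs.BOM_UTF32_LE, "utf_32_le"),
        (codecs.BOM_UTF32_BE, "utf_32_be"),
        (codecs.BOM_UTF8, "utf_8_sig"),
        (codecs.BOM_UTF16_LE, "utf_16_le"),
        (codecs.BOM_UTF16_BE, "utf_16_be"),
    ]
    for bom, name in candidates:
        if head.startswith(bom):
            return name
    return None
```

The result could then be passed as the encoding argument, for example encoding=sniff_bom(IMPORT_FILE). One thing to be aware of: Python's "utf_16_le" codec leaves the BOM in the decoded text as the character U+FEFF, while the plain "utf_16" codec reads the BOM to pick the byte order and then drops it.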






Monday, May 20, 2019

PySpark: counts with commas

To print a row count with thousands separators, use the "," format specifier:


print("There are {:,} rows.".format(df.count()))
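The formatting is plain Python and does not depend on Spark, so it can be tried without a cluster. In this small sketch, row_count is a made-up stand-in for the integer that df.count() would return.

```python
# row_count is a hypothetical stand-in for df.count(), which returns an int.
row_count = 1234567

# str.format with the "," option inserts a comma every three digits.
message = "There are {:,} rows.".format(row_count)
print(message)

# The same format specifier works inside an f-string.
message_f = f"There are {row_count:,} rows."
print(message_f)
```

Both lines print "There are 1,234,567 rows."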



Thursday, February 21, 2019

SAS: put (_all_)(=/);



From my co-worker Ken A:


I just now discovered   put (_all_)(=/);

Makes a nice list of data step variables in the log.



Wednesday, January 16, 2019

SAS: Eliminate duplicate records without variables list




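Trick: Use noduprecs on the proc sort statement, and _ALL_ in the by statement. Sorting by every variable puts identical records next to each other, which matters because noduprecs only compares each record with the previous one written to the output.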


proc sort data=work.with_dupes out=work.without_dupes noduprecs;
by _ALL_;
run;







Wednesday, January 9, 2019

SAS: Issues with creating macro variables within a parameterized macro?





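A macro variable created with call symputx inside a macro that takes parameters is not visible after the macro ends, while the same call inside a macro without parameters leaves the variable available afterwards. In the log below, macvar1 does not resolve outside %with_args, but macvar2 does resolve outside %no_args.

The likely reason is that symputx writes to the most local symbol table that is not empty. The parameter x gives %with_args a non-empty local table, whereas %no_args has none, so macvar2 lands in the global table. Passing "G" as a third argument to symputx puts the variable in the global table in either case.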


See also http://support.sas.com/documentation/cdl/en/mcrolref/61885/HTML/default/viewer.htm#tw3514-symput.htm



SOURCE CODE FOLLOWS:


%macro with_args(x=);

     data _null_;
     call symputx("macvar1", "1");
     run;

     %put In &=macvar1;

     data _null_;
     var = "&macvar1";
     put var=;
     run;

%mend with_args;

%macro no_args;

     data _null_;
     call symputx("macvar2", "2");
     run;

     %put In &=macvar2;

     data _null_;
     var = "&macvar2";
     put var=;
     run;

%mend no_args;

%with_args(x=junk);
%no_args;

%put Out &=macvar1;
%put Out &=macvar2;






LOG FOLLOWS:


MLOGIC(WITH_ARGS):  Beginning execution.
MLOGIC(WITH_ARGS):  Parameter X has value junk
MPRINT(WITH_ARGS):   data _null_;
MPRINT(WITH_ARGS):   call symputx("macvar1", "1");
MPRINT(WITH_ARGS):   run;

NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

MLOGIC(WITH_ARGS):  %PUT In &=macvar1
SYMBOLGEN:  Macro variable MACVAR1 resolves to 1
In MACVAR1=1
MPRINT(WITH_ARGS):   data _null_;
SYMBOLGEN:  Macro variable MACVAR1 resolves to 1
MPRINT(WITH_ARGS):   var = "1";
MPRINT(WITH_ARGS):   put var=;
MPRINT(WITH_ARGS):   run;

VAR=1
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

MLOGIC(WITH_ARGS):  Ending execution.
MLOGIC(NO_ARGS):  Beginning execution.
MPRINT(NO_ARGS):   data _null_;
MPRINT(NO_ARGS):   call symputx("macvar2", "2");
MPRINT(NO_ARGS):   run;

NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

MLOGIC(NO_ARGS):  %PUT In &=macvar2
SYMBOLGEN:  Macro variable MACVAR2 resolves to 2
In MACVAR2=2
MPRINT(NO_ARGS):   data _null_;
SYMBOLGEN:  Macro variable MACVAR2 resolves to 2
MPRINT(NO_ARGS):   var = "2";
MPRINT(NO_ARGS):   put var=;
MPRINT(NO_ARGS):   run;

VAR=2
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

MLOGIC(NO_ARGS):  Ending execution.
WARNING: Apparent symbolic reference MACVAR1 not resolved.
Out macvar1
SYMBOLGEN:  Macro variable MACVAR2 resolves to 2
Out MACVAR2=2






Thursday, January 3, 2019

SAS: PROC FORMAT PICTURE example



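A picture format that writes a datetime value as text in month/day/year hour:minute AM/PM form, followed by a data step that applies it with put. Note that %0H is the hour on a 24-hour clock; %0I is the 12-hour directive if that is what should appear next to %p.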


proc format ;
 picture ExcelDate
  LOW - HIGH = '%0m/%0d/%Y %0H:%0M %p' ( DATATYPE = DATETIME )
  ;
run;


data _NULL_;
set work.some_date;
EFFECTIVE_DTTM_TEXT = put( EFFECTIVE_DTTM , ExcelDate. -L ) ;
put EFFECTIVE_DTTM_TEXT=;
run;