I was unable to read a client's data files as I normally would because of an odd encoding.
Normally I would open the files in Notepad++ to convert the encoding, but all but one of the files were too large to open in Notepad++. The encoding of the one file I could open was "UCS-2 LE BOM".
To read that with pandas, read_csv must use encoding="utf_16_le":
df = pd.read_csv(IMPORT_FILE, sep="\t", low_memory=False, encoding="utf_16_le")
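For files too large for an editor, one option is to convert the encoding in a streaming fashion with the Python standard library. The following is only a sketch: the function name convert_utf16_to_utf8 and the chunk size are my own choices, and it assumes the file really does start with a UTF-16 byte order mark, as "UCS-2 LE BOM" suggests.

```python
def convert_utf16_to_utf8(src_path, dst_path, chunk_chars=1024 * 1024):
    """Copy a UTF-16 text file to UTF-8 without loading it all into memory."""
    # The "utf-16" codec reads the BOM to pick the byte order, then drops it.
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)  # read a bounded number of characters
            if not chunk:
                break
            dst.write(chunk)
```

The converted copy can then be opened with the default UTF-8 settings of most tools.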
Thursday, June 20, 2019
Monday, May 20, 2019
PySpark: counts with commas
print("There are {:,} rows.".format(df.count()))
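The comma comes from Python's format specification mini-language, not from PySpark, so the same trick works on any integer. A small sketch (describe_row_count is a made-up helper name, and the number is just an example):

```python
def describe_row_count(n_rows):
    """Return a row-count message with a comma as the thousands separator."""
    # "{:,}" asks str.format to group the digits in threes with commas.
    return "There are {:,} rows.".format(n_rows)


print(describe_row_count(1234567))
# An f-string accepts the same format spec:
print(f"There are {1234567:,} rows.")
```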
Thursday, February 21, 2019
SAS: put (_all_)(=/);
From my co-worker Ken A:
I just now discovered put (_all_)(=/);
Makes a nice list of data step variables in the log.
Wednesday, January 16, 2019
SAS: Eliminate duplicate records without variables list
Trick: Use noduprecs on the proc sort statement, and _ALL_ in the by statement.
proc sort data=work.with_dupes out=work.without_dupes noduprecs;
by _ALL_;
run;
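For comparison, here is a rough Python analog of the same idea: sort on every column, then keep one row from each run of identical rows. This is only a sketch. It assumes the rows are tuples of mutually comparable values, and drop_duplicate_records is a hypothetical name, not anything from SAS.

```python
from itertools import groupby


def drop_duplicate_records(rows):
    """Sort rows on all columns, then keep one row per run of identical rows."""
    # Tuples compare column by column, which plays the role of "by _ALL_".
    ordered = sorted(rows)
    # After sorting, identical rows are adjacent, so groupby collapses them.
    return [row for row, _ in groupby(ordered)]


rows = [(2, "b"), (1, "a"), (2, "b")]
print(drop_duplicate_records(rows))
```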
Wednesday, January 9, 2019
SAS: Issues with creating macro variables within a parameterized macro?
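Issues with creating macro variables within a parameterized macro? A macro variable created with call symputx inside a macro that has parameters ends up local to that macro, while the same call inside a macro without parameters creates a global macro variable. CALL SYMPUTX writes to the most local nonempty symbol table, and a parameter is enough to make the local table nonempty. Passing "G" as the optional third argument forces the global symbol table.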
See also http://support.sas.com/documentation/cdl/en/mcrolref/61885/HTML/default/viewer.htm#tw3514-symput.htm
SOURCE CODE FOLLOWS:
%macro with_args(x=);
data _null_;
call symputx("macvar1", "1");
run;
%put In &=macvar1;
data _null_;
var = "&macvar1";
put var=;
run;
%mend with_args;
%macro no_args;
data _null_;
call symputx("macvar2", "2");
run;
%put In &=macvar2;
data _null_;
var = "&macvar2";
put var=;
run;
%mend no_args;
%with_args(x=junk);
%no_args;
%put Out &=macvar1;
%put Out &=macvar2;
LOG FOLLOWS:
MLOGIC(WITH_ARGS): Beginning execution.
MLOGIC(WITH_ARGS): Parameter X has value junk
MPRINT(WITH_ARGS): data _null_;
MPRINT(WITH_ARGS): call symputx("macvar1", "1");
MPRINT(WITH_ARGS): run;
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
MLOGIC(WITH_ARGS): %PUT In &=macvar1
SYMBOLGEN: Macro variable MACVAR1 resolves to 1
In MACVAR1=1
MPRINT(WITH_ARGS): data _null_;
SYMBOLGEN: Macro variable MACVAR1 resolves to 1
MPRINT(WITH_ARGS): var = "1";
MPRINT(WITH_ARGS): put var=;
MPRINT(WITH_ARGS): run;
VAR=1
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
MLOGIC(WITH_ARGS): Ending execution.
MLOGIC(NO_ARGS): Beginning execution.
MPRINT(NO_ARGS): data _null_;
MPRINT(NO_ARGS): call symputx("macvar2", "2");
MPRINT(NO_ARGS): run;
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
MLOGIC(NO_ARGS): %PUT In &=macvar2
SYMBOLGEN: Macro variable MACVAR2 resolves to 2
In MACVAR2=2
MPRINT(NO_ARGS): data _null_;
SYMBOLGEN: Macro variable MACVAR2 resolves to 2
MPRINT(NO_ARGS): var = "2";
MPRINT(NO_ARGS): put var=;
MPRINT(NO_ARGS): run;
VAR=2
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
MLOGIC(NO_ARGS): Ending execution.
WARNING: Apparent symbolic reference MACVAR1 not resolved.
Out macvar1
SYMBOLGEN: Macro variable MACVAR2 resolves to 2
Out MACVAR2=2
Thursday, January 3, 2019
SAS: PROC FORMAT PICTURE example.
proc format ;
picture ExcelDate
LOW - HIGH = '%0m/%0d/%Y %0H:%0M %p' ( DATATYPE = DATETIME )
;
run;
data _NULL_;
set work.some_date;
EFFECTIVE_DTTM_TEXT = put( EFFECTIVE_DTTM , ExcelDate. -L ) ;
put EFFECTIVE_DTTM_TEXT=;
run;
Thursday, August 23, 2018
SAS: Time each task
%put >>>>>--------------------------------------------------------------------------------------------- ;
%put >>>>> Begin PROC FREQ at %sysfunc(time(),timeampm.) on %sysfunc(date(),worddate.).;
%put >>>>>--------------------------------------------------------------------------------------------- ;
proc freq data=work.event_codes;
tables event_code;
run;
%put >>>>>--------------------------------------------------------------------------------------------- ;
%put >>>>> Job complete at %sysfunc(time(),timeampm.) on %sysfunc(date(),worddate.).;
%put >>>>>--------------------------------------------------------------------------------------------- ;
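The same banner habit carries over to Python jobs. Below is a hedged sketch of one way to do it. The names RULE, banner, and timed are my own, and the time and date format only approximates what timeampm. and worddate. produce.

```python
from contextlib import contextmanager
from datetime import datetime

RULE = ">>>>>" + "-" * 60


def banner(message, now=None):
    """Build a three-line banner with the message and a timestamp."""
    now = now or datetime.now()
    # 12-hour clock with AM/PM, then a spelled-out date.
    stamp = now.strftime("%I:%M:%S %p on %B %d, %Y")
    return "\n".join([RULE, f">>>>> {message} at {stamp}.", RULE])


@contextmanager
def timed(task):
    """Print a banner before the task starts and another when it ends."""
    print(banner(f"Begin {task}"))
    try:
        yield
    finally:
        print(banner(f"{task} complete"))


with timed("row count"):
    total = sum(range(1000))
```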