Thursday, June 20, 2019

Python: Pandas read_csv encoding

I was unable to read a client's data file as I normally would due to odd encoding.

Normally I would open the files in Notepad++ and convert the encoding there, but all but one of the files were too large for Notepad++ to open. The one file I could open was reported as "UCS-2 LE BOM".
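For files too large to open in an editor, the byte order mark can be checked by reading only the first few bytes. This is a minimal sketch, not what I actually ran: the helper name sniff_bom is made up for illustration, and it only recognizes files that start with a BOM.

```python
import codecs


def sniff_bom(path):
    """Guess a Python codec name from the file's byte order mark.

    Hypothetical helper for illustration. Returns None if no BOM is found.
    """
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM checked here is 4 bytes

    # UTF-32 marks are tested first because the UTF-32 LE mark
    # begins with the same two bytes as the UTF-16 LE mark.
    if head.startswith(codecs.BOM_UTF32_LE):
        return "utf_32_le"
    if head.startswith(codecs.BOM_UTF32_BE):
        return "utf_32_be"
    if head.startswith(codecs.BOM_UTF8):
        return "utf_8_sig"
    if head.startswith(codecs.BOM_UTF16_LE):
        return "utf_16_le"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "utf_16_be"
    return None
```

Because it reads only four bytes, the file size does not matter. A file with no BOM returns None and would need a different approach.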

To read that file with Pandas, read_csv must be given: encoding="utf_16_le"

import pandas as pd
df = pd.read_csv(IMPORT_FILE, sep="\t", low_memory=False, encoding="utf_16_le")
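One thing worth knowing, since Notepad++ reported a BOM: Python's utf_16_le codec treats the byte order mark as an ordinary character (U+FEFF), while the plain utf_16 codec uses it to pick the byte order and then drops it. The sketch below shows the difference at the codec level. The sample bytes are made up for illustration and are not the client's data.

```python
import codecs

# Made-up sample: a UTF-16 LE byte order mark followed by tab-separated text.
raw = codecs.BOM_UTF16_LE + "id\tname\n1\tAda\n".encode("utf_16_le")

# utf_16_le assumes the byte order and keeps the BOM as the character U+FEFF.
text_le = raw.decode("utf_16_le")

# utf_16 reads the BOM to choose the byte order, then discards it.
text_auto = raw.decode("utf_16")

print(repr(text_le[:3]))
print(repr(text_auto[:3]))
```

If the first column name in df ever comes back with a stray "\ufeff" in front of it, switching to encoding="utf_16" would be one thing to try. Whether Pandas strips that character on its own may depend on the version and parser engine, which I have not checked here.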