Status: Closed
Labels: IO SAS, Performance
Description
Very excited to have this new feature in Pandas. I have a few comments to share:
- pd.read_sas() doesn't read SAS date variables correctly (this is noted in the docs). Dates are read as numpy.float64. Note that SAS records dates as the number of days since 1960-01-01. It would be helpful to have some sort of argument to parse date variables correctly.
- Moreover, SAS has special missing values such as .B or .R. How are these cases treated?
- read_sas() is not nearly as fast as read_csv(). Reading a 700 MB SAS dataset takes:
CPU times: user 1min 47s, sys: 955 ms, total: 1min 48s
Wall time: 1min 48s
The same file, converted to CSV with SAS, takes:
CPU times: user 3.93 s, sys: 343 ms, total: 4.28 s
Wall time: 4.29 s
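Until read_sas() can parse dates itself, a date column that arrives as float64 can be converted after loading. The following is a minimal sketch that assumes the column holds SAS date values (days since 1960-01-01), or SAS datetime values (seconds since 1960-01-01 00:00:00). The helper names sas_dates_to_datetime and sas_datetimes_to_datetime and the column name "visit" are hypothetical, not part of pandas. The sketch relies on the real `unit` and `origin` parameters of pd.to_datetime.

```python
import numpy as np
import pandas as pd

# SAS stores date values as days, and datetime values as seconds, since this epoch.
SAS_EPOCH = pd.Timestamp("1960-01-01")


def sas_dates_to_datetime(values):
    """Convert SAS date values (float days since 1960-01-01) to datetime64.

    Hypothetical helper, not part of pandas. NaN becomes NaT.
    """
    return pd.to_datetime(values, unit="D", origin=SAS_EPOCH)


def sas_datetimes_to_datetime(values):
    """Convert SAS datetime values (float seconds since 1960-01-01 00:00:00).

    Hypothetical helper, not part of pandas. NaN becomes NaT.
    """
    return pd.to_datetime(values, unit="s", origin=SAS_EPOCH)


if __name__ == "__main__":
    # Hypothetical column as read_sas might return it: plain float64 day counts.
    raw = pd.Series([0.0, 1.0, 366.0, np.nan], name="visit")
    print(sas_dates_to_datetime(raw))
```

With a loaded frame this would be used as `df["visit"] = sas_dates_to_datetime(df["visit"])`. Note that 366.0 maps to 1961-01-01 because 1960 was a leap year. The sketch does not address how SAS special missing values such as .B or .R are represented after reading. It only guarantees that NaN becomes NaT.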