Description
In this dask/hdfs3 issue, I explained that for some I/O-bound workloads, buffering and copying can become a bottleneck. For these cases, it would be nice if buffering could be disabled. I found that setting `input.localread.default.buffersize` to a small number and disabling CRC verification results in zero memory copies; that is, data is read directly into the buffer provided by the user. That should be exactly the case when this condition is hit.
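To make the zero-copy condition concrete, here is a minimal, self-contained C sketch of the general technique: a buffered reader that bypasses its internal buffer. It is not libhdfs3's code. `struct source`, `struct buffered_reader`, `reader_read` and the `verify`/`buf_size` fields are hypothetical stand-ins for the local block reader, its CRC check, and `input.localread.default.buffersize`. In this sketch the bypass is only taken when verification is off, because the checksum would be computed over the internal buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUF_CAP 64 /* capacity of the internal buffer array */

/* Hypothetical data source standing in for the local block file. */
struct source {
    const char *data;
    size_t size;
    size_t pos;
};

/* Copy up to len bytes from the source into dst (models the raw read). */
static size_t source_read(struct source *src, char *dst, size_t len) {
    size_t avail = src->size - src->pos;
    size_t n = len < avail ? len : avail;
    memcpy(dst, src->data + src->pos, n);
    src->pos += n;
    return n;
}

struct buffered_reader {
    struct source *src;
    bool verify;       /* stand-in for CRC verification being enabled */
    size_t buf_size;   /* plays the role of the buffer-size setting, <= BUF_CAP */
    char buf[BUF_CAP];
    size_t buf_len;    /* bytes currently held in buf */
    size_t buf_pos;    /* next unread byte in buf */
    size_t copies;     /* buffer-to-user copies performed */
};

static size_t reader_read(struct buffered_reader *r, char *user, size_t len) {
    bool empty = r->buf_pos == r->buf_len;
    /* Zero-copy path: nothing buffered, nothing to verify, and the request
       is at least as large as the buffer, so read straight into user. */
    if (empty && !r->verify && len >= r->buf_size) {
        return source_read(r->src, user, len);
    }
    if (empty) {
        r->buf_len = source_read(r->src, r->buf, r->buf_size);
        r->buf_pos = 0;
        /* A real reader would verify the checksum of r->buf here. */
        if (r->buf_len == 0) {
            return 0;
        }
    }
    /* Buffered path: copy what is available from buf into user. */
    size_t avail = r->buf_len - r->buf_pos;
    size_t n = len < avail ? len : avail;
    memcpy(user, r->buf + r->buf_pos, n);
    r->buf_pos += n;
    r->copies++;
    return n;
}
```

With `verify = false` and a small `buf_size`, a 10-byte request is served by a single read into the caller's buffer and `copies` stays at 0. With `verify = true`, the same request goes through the internal buffer: one copy, and a short read of `buf_size` bytes.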
I think the best long-term way would be to compute the CRC on the user buffer somehow, but in the meantime, it would be nice if verification could be disabled from the C API.
I propose adding a new configuration property, let's say `input.read.default.verify`, that sets whether checksums are verified when a file is opened via the C API.
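A sketch of how the proposed property might be consulted when opening a file, assuming a simple key/value lookup. `input.read.default.verify` is only the name proposed above, and `struct conf_entry`, `conf_get_bool` and `open_should_verify` are hypothetical helpers standing in for libhdfs3's real configuration machinery. The default stays `true`, so existing users keep verification unless they explicitly opt out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value pair standing in for libhdfs3's configuration. */
struct conf_entry {
    const char *key;
    const char *value;
};

/* Return the boolean value of key, or dflt if it is absent or not "true"/"false". */
static bool conf_get_bool(const struct conf_entry *conf, size_t n,
                          const char *key, bool dflt) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(conf[i].key, key) == 0) {
            if (strcmp(conf[i].value, "true") == 0) return true;
            if (strcmp(conf[i].value, "false") == 0) return false;
            return dflt;
        }
    }
    return dflt;
}

/* Proposed (hypothetical) key controlling checksum verification on open. */
#define VERIFY_KEY "input.read.default.verify"

/* Decide the verify flag to pass when opening a file; defaults to true. */
static bool open_should_verify(const struct conf_entry *conf, size_t n) {
    return conf_get_bool(conf, n, VERIFY_KEY, true);
}
```

Only an explicit `"false"` turns verification off; a missing or malformed value falls back to the current behaviour.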
Now, my questions are: does this configuration apply only to this one place in libhdfs3, or to others as well? And can we simply invent our own configuration parameters, i.e. is the namespace divided such that libhdfs3 owns `input.*`?