Allow for unbuffered reading #10

@sk1p

In this dask/hdfs3 issue I detailed that, for some I/O-bound workloads, buffering and copying can become a bottleneck. For these cases it would be nice if buffering could be disabled. I found that setting input.localread.default.buffersize to a small number and disabling CRC verification results in zero memory copies; that is, data is read directly into the buffer provided by the user. That should be exactly what happens when this condition is hit.

I think the best long-term way would be to compute the CRC on the user buffer somehow, but in the meantime, it would be nice if verification could be disabled from the C API.

I propose adding a new configuration property, let's say input.read.default.verify, that sets the default for verification when a file is opened via the C API.
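
To make the proposal concrete, here is roughly how the two settings might look in a Hadoop-style client configuration file. This is only a sketch under assumptions: input.read.default.verify is the property proposed here and does not exist yet, and the buffer size value is an arbitrary placeholder for "a small number", not a tested recommendation.

```xml
<configuration>
  <!-- Existing property mentioned above; the value 1 is a placeholder for "small" -->
  <property>
    <name>input.localread.default.buffersize</name>
    <value>1</value>
  </property>
  <!-- Hypothetical property proposed in this issue; not implemented -->
  <property>
    <name>input.read.default.verify</name>
    <value>false</value>
  </property>
</configuration>
```

With both set, the intent is that reads through the C API skip CRC verification and land directly in the user's buffer.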

Now, my question is: does the configuration only apply to this one place in libhdfs3, or to others as well? And can we just invent our own configuration parameters, or is the namespace divided somehow such that libhdfs3 owns input.*?
