
Reading is slow when requesting more bytes than available #691

@mxmlnkn

Description

This may be a questionable use case, but I noticed that read(1 << 30) (attempting to read 1 GiB from a ~60 MB file) can be ~3x slower than read(-1). I would have expected both to be equally fast.

Consider this benchmark:

import time
import fsspec.implementations.sftp
import sshfs

hostname = "127.0.0.1"
port = 22
file_path = 'silesia.tar.gz'

fsspec_fs = fsspec.implementations.sftp.SFTPFileSystem(hostname, port=port)
sshfs_fs = sshfs.SSHFileSystem(hostname, port=port)

file_size = len(sshfs_fs.open(file_path).read())
print(f"Test file size: {file_size} B")

# -1 means a single unbounded read(); all other values are chunk sizes in KiB.
for chunk_size_in_KiB in [-1, 4 << 20, 2 << 20, 1 << 20, 512 * 1024, 128 * 1024, 4 * 1024, 32]:
    chunk_size = chunk_size_in_KiB * 1024 if chunk_size_in_KiB >= 0 else chunk_size_in_KiB
    print(f"Try to read {chunk_size_in_KiB} KiB")
    for open_file_name in ['fsspec_fs', 'sshfs_fs']:
        file = globals()[open_file_name].open(file_path)
        t0 = time.time()
        size = 0
        # One read() call per chunk; a single read(-1) call for the unbounded case.
        for i in range((file_size + chunk_size - 1) // chunk_size if chunk_size > 0 else 1):
            size += len(file.read(chunk_size))
        t1 = time.time()
        assert size == file_size
        file.close()
        print(
            f"Read {size / 1e6:.2f} MB in {chunk_size_in_KiB} KiB chunks with {open_file_name} "
            f"in {t1 - t0:.2f} s -> {size / (t1 - t0) / 1e6:.2f} MB/s"
        )

Output:

Test file size: 68238807 B
Try to read -1 KiB
Read 68.24 MB in -1 KiB chunks with fsspec_fs in 16.92 s -> 4.03 MB/s
Read 68.24 MB in -1 KiB chunks with sshfs_fs in 2.08 s -> 32.74 MB/s
Try to read 4194304 KiB
Read 68.24 MB in 4194304 KiB chunks with fsspec_fs in 50.17 s -> 1.36 MB/s
Read 68.24 MB in 4194304 KiB chunks with sshfs_fs in 25.15 s -> 2.71 MB/s
Try to read 2097152 KiB
Read 68.24 MB in 2097152 KiB chunks with fsspec_fs in 42.09 s -> 1.62 MB/s
Read 68.24 MB in 2097152 KiB chunks with sshfs_fs in 13.55 s -> 5.04 MB/s
Try to read 1048576 KiB
Read 68.24 MB in 1048576 KiB chunks with fsspec_fs in 42.13 s -> 1.62 MB/s
Read 68.24 MB in 1048576 KiB chunks with sshfs_fs in 7.37 s -> 9.26 MB/s
Try to read 524288 KiB
Read 68.24 MB in 524288 KiB chunks with fsspec_fs in 43.73 s -> 1.56 MB/s
Read 68.24 MB in 524288 KiB chunks with sshfs_fs in 4.67 s -> 14.62 MB/s
Try to read 131072 KiB
Read 68.24 MB in 131072 KiB chunks with fsspec_fs in 42.39 s -> 1.61 MB/s
Read 68.24 MB in 131072 KiB chunks with sshfs_fs in 2.37 s -> 28.78 MB/s
Try to read 4096 KiB
Read 68.24 MB in 4096 KiB chunks with fsspec_fs in 14.38 s -> 4.74 MB/s
Read 68.24 MB in 4096 KiB chunks with sshfs_fs in 2.03 s -> 33.63 MB/s
Try to read 32 KiB
Read 68.24 MB in 32 KiB chunks with fsspec_fs in 14.33 s -> 4.76 MB/s
Read 68.24 MB in 32 KiB chunks with sshfs_fs in 4.35 s -> 15.69 MB/s

So, trying to read in 4 GiB chunks, but only getting ~68 MB, takes ~25 s with sshfs, while it only takes ~2 s with read(-1).
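Assuming the slowdown scales with the requested size rather than the bytes actually available, clamping the request to the bytes remaining should sidestep it. A minimal sketch of that idea, with io.BytesIO standing in for the remote file (the helper name read_clamped is mine, not part of either library):

```python
import io


def read_clamped(f, requested, file_size):
    """Read at most `requested` bytes, but never ask for more than remain.

    Clamping avoids any per-requested-byte work a backend might do
    when asked for far more data than the file actually contains.
    """
    remaining = file_size - f.tell()
    return f.read(min(requested, max(remaining, 0)))


# Stand-in for a remote file; a real fsspec file also exposes .tell(),
# and the total size is available via fs.info(path)['size'].
buf = io.BytesIO(b"x" * 1000)
data = read_clamped(buf, 1 << 30, 1000)  # ask for 1 GiB, get the 1000 bytes
```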

I do not know whether this is an issue with this wrapper or with asyncssh directly, because I was unable to adapt my (synchronous) benchmark to use asyncssh directly.

My guess is that some code does O(requested size) work when it should instead iterate in chunks bounded by the buffer size. Memory usage does not spike, so at least nothing that large seems to get allocated.
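The suspected behavior can be illustrated with a local stand-in (a hypothetical sketch, not sshfs's or asyncssh's actual code): a reader that issues one request per requested block, even past EOF, versus one that stops at the first short read:

```python
import time

BLOCKSIZE = 32 * 1024


class NaiveReader:
    """Hypothetical reader that issues one request per *requested* block,
    even past EOF -- i.e. O(requested size) round trips."""

    def __init__(self, data):
        self.data, self.pos = data, 0

    def _request(self, n):
        # Stands in for one network round trip; returns b"" past EOF.
        chunk = self.data[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk

    def read(self, size):
        parts = []
        # Loop count depends on `size`, not on the bytes available.
        for _ in range((size + BLOCKSIZE - 1) // BLOCKSIZE):
            parts.append(self._request(BLOCKSIZE))
        return b"".join(parts)


class EOFAwareReader(NaiveReader):
    """Stops requesting as soon as a block comes back short."""

    def read(self, size):
        parts, got = [], 0
        while got < size:
            chunk = self._request(min(BLOCKSIZE, size - got))
            if not chunk:
                break
            parts.append(chunk)
            got += len(chunk)
        return b"".join(parts)


data = b"x" * (1 << 20)  # 1 MiB file, but we request 1 GiB
for cls in (NaiveReader, EOFAwareReader):
    t0 = time.time()
    out = cls(data).read(1 << 30)
    print(cls.__name__, len(out), f"{time.time() - t0:.3f} s")
```

Both return the same 1 MiB, but the naive variant performs 32768 requests (mostly empty) where the EOF-aware one performs 33.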
