Skip to content

flate change bit reader Bits to usize #24719

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 7, 2025
Merged

Conversation

ianic
Copy link
Contributor

@ianic ianic commented Aug 6, 2025

After #24704 32-bit platform build passes with usize Bits instead of u64. There are two tests which are testing invalid inputs and raise different error in 32 than on 64 bit platform now.

After ziglang#24704 32-bit platform build passes with usize Bits instead of
u64. There are two tests which are testing invalid inputs and raise
different error in 32 than on 64 bit platform now.
@andrewrk
Copy link
Member

andrewrk commented Aug 6, 2025

Hmm, it really should not be giving different error codes depending on target pointer size. Why does it not cause EndOfStream in the 32-bit case?

Don't see why byte returned from specialPeek needs to be shifted by
remaining_needed_bits.
I believe that decision in specialPeek should be done on the number of
the remaining bits not of the content of that bits.

Some test result are changed, but they are now consistent with the
original state as found in:
https://github.com/ziglang/zig/blame/5f790464b0d5da3c4c1a7252643e7cdd4c4b605e/lib/std/compress/flate/Decompress.zig

Changing Bits from usize to u32 or u64 now returns same results.
@@ -1032,7 +1031,7 @@ test "failing invalid-tree01" {
try testFailure(.raw, @embedFile("testdata/fuzz/invalid-tree01.input"), error.IncompleteHuffmanTree);
}
test "failing invalid-tree02" {
try testFailure(.raw, @embedFile("testdata/fuzz/invalid-tree02.input"), error.EndOfStream);
try testFailure(.raw, @embedFile("testdata/fuzz/invalid-tree02.input"), error.IncompleteHuffmanTree);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it is end of stream, isn't it? why are we pretending there are zeroes after the stream?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretending there are zeros after the stream happens to fix the two panics I found

#24695 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I pass this case through infgen:

$infgen invalid-tree02.input -dd -m
! infgen 3.5 output
!
last                    ! 1
dynamic                 ! 10
count 257 11 19         ! 1111 01010 00000
code 18 2               ! 010 000 000
code 0 2                ! 010
code 3 1                ! 001 000 000 000 000 000 000 000 000 000
zeros 128               ! 1110101 11 000 000 000 000 000
zeros 128               ! 1110101 11
lens 3                  ! 0
lens 0                  ! 01
lens 3                  ! 0
lens 0                  ! 01
lens 3                  ! 0
lens 0                  ! 01
lens 3                  ! 0
lens 3                  ! 0
lens 3                  ! 0
lens 3                  ! 0
lens 3                  ! 0
lens 3                  ! 0
! litlen 256 3
! dist 1 3
! dist 3 3
! dist 5 3
! dist 6 3
! dist 7 3
! dist 8 3
! dist 9 3
! dist 10 3
end                     ! 000
                        ! 00

It can be decoded but fails to generate tree.
But I still need to explain myself every bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slowly recollecting my understanding...

peekBits is used in two places; peekBits(u7) and peekBits(u15). It means that algorithm will need at most that amount of bits. After finding symbol it will know exact number of bits.
If it asks for 7 bits when there are less then 7 bits available peekBits should return that available bits. If there are 0 available bits then EndOfStream should be raised.

ref

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok well I think this is better than status quo, but it is still probably wrong unit tests because if the stream ends abruptly, it should not be trying to construct a huffman tree, or doing a match.

`peekBits` returns at most asked number of bits. Fails with EndOfStream
when there are no available bits. If there are less bits available than
asked still returns that available bits.
Hopefully this change better reflects intention. On first input stream
peek error we break the loop.
@andrewrk andrewrk merged commit 6de2310 into ziglang:master Aug 7, 2025
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants