@ianic ianic commented Dec 11, 2023

Closes #14310.

The Go test cases are copied to the lib/std/tar/testdata folder.

New functionality added to pass tests:

  • reading link_name from pax attribute
  • reading file size from pax attribute
  • handling sizes greater than 8GB in gnu format
  • gnu extended headers for name, link_name; L and K header types
  • calculating and checking header checksum
  • few pax attribute checks: no null in key, break value on first null, attribute ending with newline
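
To make the pax attribute checks listed above concrete, here is a minimal sketch of parsing a single pax record of the form `<length> <key>=<value>\n`. It is not the code from this PR: `parsePaxRecord`, `PaxRecord`, and `error.InvalidPaxRecord` are hypothetical names, and truncating the value at the first null byte is one reading of "break value on first null".

```zig
const std = @import("std");

// Hypothetical result type, for illustration only.
const PaxRecord = struct {
    key: []const u8,
    value: []const u8,
    len: usize, // total record length, including the length prefix itself
};

// Parses one pax record from the start of `buf`.
fn parsePaxRecord(buf: []const u8) !PaxRecord {
    // The decimal length prefix runs up to the first space.
    const space = std.mem.indexOfScalar(u8, buf, ' ') orelse return error.InvalidPaxRecord;
    const len = std.fmt.parseInt(usize, buf[0..space], 10) catch return error.InvalidPaxRecord;
    if (len <= space + 1 or len > buf.len) return error.InvalidPaxRecord;

    // Check: the attribute must end with a newline.
    if (buf[len - 1] != '\n') return error.InvalidPaxRecord;

    const body = buf[space + 1 .. len - 1];
    const eq = std.mem.indexOfScalar(u8, body, '=') orelse return error.InvalidPaxRecord;

    // Check: no null byte in the key.
    const key = body[0..eq];
    if (std.mem.indexOfScalar(u8, key, 0) != null) return error.InvalidPaxRecord;

    // Check: the value stops at the first null byte (assumed interpretation).
    var value = body[eq + 1 ..];
    if (std.mem.indexOfScalar(u8, value, 0)) |nul| value = value[0..nul];

    return .{ .key = key, .value = value, .len = len };
}
```

A caller would typically advance by `len` bytes after each record and apply keys such as `path`, `linkpath`, and `size` to the next file header.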

This passes all relevant Go tests except those connected with sparse files; that functionality is not yet implemented. We are using this for the package manager, where most of the files will be source code, so it is pretty unlikely that we will encounter a tar with sparse files there.

@ianic ianic force-pushed the tar_tests branch 2 times, most recently from 9ae3edd to bf53d4e Compare December 20, 2023 10:08
ianic added 28 commits January 13, 2024 19:37
Split reading/parsing the tar file and writing results to disk into two
separate steps, so we can later test the parsing part without needing to
write everything to disk.
Move reader into Buffer and make it BufferedReader. This doesn't
introduce any new functionality just grouping similar things.
Just adding tests, without changing functionality.
Name of symbolic link can be also found in pax attribute.
Make it more readable.
Reference:
https://www.gnu.org/software/tar/manual/html_node/Extensions.html#Extensions

If the leading byte is 0x80 (128), the non-leading bytes of the field
are concatenated in big-endian order, with the result being a positive
number expressed in binary form.
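
The rule quoted above can be sketched as a small decoder. This is an illustration under stated assumptions, not the PR's implementation: `parseBase256` and its error names are hypothetical, and only the positive (0x80) form described in the quote is handled.

```zig
const std = @import("std");

// Decodes a GNU base-256 numeric field: a leading 0x80 marker byte followed
// by the value in big-endian binary form. Hypothetical helper.
fn parseBase256(field: []const u8) !u64 {
    if (field.len == 0 or field[0] != 0x80) return error.NotBase256;
    var result: u64 = 0;
    for (field[1..]) |byte| {
        // Refuse values that would not fit into 64 bits after the next shift.
        if (result > std.math.maxInt(u64) >> 8) return error.Overflow;
        result = (result << 8) | byte;
    }
    return result;
}

test "base-256 size field holding 8 GiB" {
    // 12-byte size field: marker byte, then 0x02_00_00_00_00 in big-endian.
    const field = [_]u8{ 0x80, 0, 0, 0, 0, 0, 0, 0x02, 0, 0, 0, 0 };
    try std.testing.expect(try parseBase256(&field) == 8 * 1024 * 1024 * 1024);
}
```

An 11-digit octal size field tops out just below 8 GiB, which is why larger files need this binary form.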
That keeps name strings stable during the iteration. Otherwise string
buffers can be overwritten while reading file content.
To make it a little easier to filter from all stdlib tests.
So we have the information needed to set the executable bit on write to the file system.
Make it a little more readable.
Use explicit buffers for name, link_name instead.
It is cleaner that way.
Create std/tar/test.zig for tests which use cases from testdata,
like other tests which use testdata files (compress). That also enables
wasi testing, which was failing because of file system operations in tests.
Iterator has a `next` function and iterates over tar files. When used from
outside of the module, the `tar.` prefix makes more sense.

var iter = tar.iterator(reader, null);
while (try iter.next()) |file| {
    ...
}
Using Python testtar file (mentioned in ziglang#14310) to test diagnostic
reporting.
Added computing checksum by using both unsigned and signed header bytes
values.
Added skipping gnu extended sparse headers while reporting unsupported
header in diagnostic.

Note on testing:

wget https://github.com/python/cpython/raw/3.11/Lib/test/testtar.tar -O /tmp/testtar.tar

```
test "Python testtar.tar file" {
    const file_name = "testtar.tar";

    var file = try std.fs.cwd().openFile("/tmp/" ++ file_name, .{});
    defer file.close();

    var diag = Options.Diagnostics{ .allocator = std.testing.allocator };
    defer diag.deinit();

    var iter = iterator(file.reader(), &diag);
    while (try iter.next()) |f| {
        std.debug.print("supported: {} {s} {d}\n", .{ f.kind, f.name, f.size });
        try f.skip();
    }
    for (diag.errors.items) |e| {
        switch (e) {
            .unsupported_file_type => |u| {
                std.debug.print("unsupported: {} {s}\n", .{ u.file_type, u.file_name });
            },
            else => unreachable,
        }
    }
}
```
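
The commit above mentions computing the header checksum from both unsigned and signed byte values. A minimal sketch of that idea follows, assuming the standard ustar layout (512-byte header, 8-byte checksum field at offset 148 counted as spaces). `computeChecksums` and `Checksums` are hypothetical names, not the identifiers used in std.tar.

```zig
const std = @import("std");

// Hypothetical result type holding both checksum variants.
const Checksums = struct { unsigned: u64, signed: i64 };

// Sums all 512 header bytes, treating the checksum field itself
// (offsets 148..155) as if it were filled with ASCII spaces.
fn computeChecksums(header: []const u8) Checksums {
    std.debug.assert(header.len == 512);
    var unsigned: u64 = 0;
    var signed: i64 = 0;
    for (header, 0..) |byte, i| {
        const b: u8 = if (i >= 148 and i < 156) ' ' else byte;
        unsigned += b;
        // Some historical tar writers summed the bytes as signed chars.
        signed += @as(i8, @bitCast(b));
    }
    return .{ .unsigned = unsigned, .signed = signed };
}
```

A reader can then accept a header when the stored checksum matches either sum. The two differ only when the header contains bytes above 0x7F, such as non-ASCII file names.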
@andrewrk andrewrk merged commit d55d1e3 into ziglang:master Jan 14, 2024
@andrewrk
Member

Nice work!

Development

Successfully merging this pull request may close these issues.

std.tar: copy another project's full test suite and make them all pass