Conversation

@dasimmet dasimmet commented Oct 29, 2023

Adds GNU tar FileTypes and handles GNUTYPE_LONGNAME by reading the next chunks into the filename buffer.

test case:

$ zig fetch https://www.python.org/ftp/python/3.12.0/Python-3.12.0.tgz
1220862c0e5172fddd0095ea7d08c14beb6079c45d8750a016da45ab12745e7d876a

$ mkdir build
$ curl -L https://www.python.org/ftp/python/3.12.0/Python-3.12.0.tgz | gunzip | tar x -C build

$ diff -r ~/.cache/zig/p/1220862c0e5172fddd0095ea7d08c14beb6079c45d8750a016da45ab12745e7d876a/ build/Python-3.12.0/
$

still missing GNUTYPE_LONGLINK support
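For context, the GNU longname mechanism works like this: a header block with typeflag `L` (GNUTYPE_LONGNAME) carries the real filename as its data, padded to 512-byte blocks, and the header immediately after it describes the actual file. A minimal sketch of reading such a payload (Python for illustration only; the PR itself is Zig, and `read_gnu_longname` is a hypothetical helper name):

```python
import io

BLOCK = 512

def read_gnu_longname(reader, size):
    """Read the 512-byte data blocks that follow a GNUTYPE_LONGNAME ('L')
    header and return the NUL-terminated long filename.

    `size` is the unpadded length from the header's size field (GNU tar
    includes the trailing NUL in it); the data on the stream is padded
    to a multiple of 512 bytes."""
    padded = -(-size // BLOCK) * BLOCK  # round up to the block size
    data = reader.read(padded)
    if len(data) < padded:
        raise EOFError("truncated longname entry")
    return data[:size].rstrip(b"\x00").decode("utf-8")

# demo: a 558-byte path stored as a longname payload spanning two blocks
name = ("very/deep/" * 55 + "file.txt").encode()
size = len(name) + 1                      # + trailing NUL
padded = -(-size // BLOCK) * BLOCK        # 1024
payload = name + b"\x00" * (padded - len(name))
assert read_gnu_longname(io.BytesIO(payload), size) == name.decode()
```

The header that follows those blocks then carries the file's metadata, with its (truncated) name field serving only as a fallback for non-GNU readers.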

@dasimmet dasimmet force-pushed the dasimmet-gnu-tar branch 2 times, most recently from b395151 to 6105c91 Compare October 29, 2023 20:54
@dasimmet dasimmet changed the title from "Fetch Support GNU specific tarball entries" to "Fetch Support for tar GNUTYPE_LONGNAME chunks" Oct 29, 2023
@dasimmet dasimmet marked this pull request as ready for review October 29, 2023 21:07
}
},
.GNUTYPE_LONGNAME => {
assert(std.mem.eql(u8, header.name(), "././@LongLink"));
Contributor

@truemedian truemedian Oct 30, 2023

Where is this name guaranteed? And why is this enum field not consistent with the rest?

Contributor Author

The name is mentioned here: https://ftp.gnu.org/old-gnu/Manuals/tar-1.12/html_node/tar_117.html
But I guess this check can be removed, as the type should be enough to assume the following chunks contain a filename.

I just matched the globals in the GNU standard. By "consistent", do you mean it should be lowercased, without the "GNUTYPE_" prefix, or what do you propose?
I thought it might be useful to be able to distinguish which standards the different values come from.

Contributor

@truemedian truemedian Nov 1, 2023

The filename seems to be nothing more than a fallback for tar implementations that don't handle GNU fields; a longname header is identified uniquely by its type.

At the very least the enum fields should follow the style guide and be lowercase.

assert(std.mem.eql(u8, header.name(), "././@LongLink"));

while (true) {
const filename_chunk = try buffer.readChunk(reader, 512);
Contributor

@truemedian truemedian Oct 30, 2023

Is there a reason you don't simply read the data straight into the filename buffer?

Contributor Author

I'll try it, thanks. It's just what I came up with, thinking that not advancing the header buffer might cause issues.

Contributor Author

After looking into it, one reason to do it this way is that if we barely reach MAX_PATH_BYTES, the 512-byte padding or next chunk could overflow the filename buffer.
It is highly unlikely though: MAX_PATH_BYTES is probably always a multiple of 512, so only a path of exactly MAX_PATH_BYTES would need to read another chunk from the stream to find the null terminator and overflow the buffer.

Contributor

@truemedian truemedian Nov 1, 2023

The unpadded length of the longname is given in the header; if it's more than MAX_PATH_BYTES, throw an error.
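That bound check can be sketched like so (Python for illustration; `MAX_PATH_BYTES` is a stand-in for Zig's `std.fs.MAX_PATH_BYTES`, and 4096 is only an assumed value):

```python
import io

BLOCK = 512
MAX_PATH_BYTES = 4096  # assumed stand-in for Zig's std.fs.MAX_PATH_BYTES

def read_longname_checked(reader, size):
    # Reject oversized names up front using the unpadded length from
    # the header, so the filename buffer can never overflow.
    if size > MAX_PATH_BYTES:
        raise ValueError("longname exceeds MAX_PATH_BYTES")
    padded = -(-size // BLOCK) * BLOCK  # round up to the 512-byte block size
    data = reader.read(padded)
    if len(data) < padded:
        raise EOFError("truncated longname entry")
    return data[:size].rstrip(b"\x00")
```

Because the check uses the header's size field rather than scanning the stream for a NUL, a path of exactly MAX_PATH_BYTES never forces reading an extra block.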

@andrewrk
Member

Solved instead by #18261 which included tests.

@andrewrk andrewrk closed this Jan 15, 2024