This repository was archived by the owner on Dec 28, 2024. It is now read-only.

@emidoots emidoots commented Nov 22, 2021

  • first commit - on my M1 Mac:
    • updated to latest macOS Monterey
    • updated to latest Xcode version
    • committed the result of fetch_them_macos_headers fetch
  • second commit - on my Intel Mac:
    • updated to latest macOS Monterey
    • updated to latest Xcode version
    • committed the result of fetch_them_macos_headers fetch

I've also sent a PR to Zig here: ziglang/zig#10202 - we can keep the discussion about whether this is the right thing to do over there.

andrewrk added a commit that referenced this pull request Nov 23, 2021
@andrewrk
Member

andrewrk commented Nov 23, 2021

Thanks for this. I merged it to a different branch called more-data, where I have collected multiple versions of the macOS headers at once.

I played with the data a bit and I have a proposal to support multiple macOS versions. First, the observations that I made from exploring the data:

[nix-shell:~/dev/fetch-them-macos-headers]$ ls headers/
10  11  12

[nix-shell:~/dev/fetch-them-macos-headers]$ ls headers/{10,11,12}
headers/10:
x86_64-macos-gnu

headers/11:
aarch64-macos-gnu  x86_64-macos-gnu

headers/12:
aarch64-macos-gnu  x86_64-macos-gnu

[nix-shell:~/dev/fetch-them-macos-headers]$ du -sh headers/{10,11,12}
6.1M	headers/10   # note: this is smaller because there are no catalina headers for aarch64
13M	headers/11
13M	headers/12

If we naively shipped all 3 versions of these headers, it would come out to 32.1 MiB installation size, which is more than I am comfortable with. So what can we do?

Side note, the meld project is a really nice UI for exploring the diffs between directory trees.

Anyway my observations are:

  • The differences between versions (e.g. 11 and 12) are few, even on a per-file basis.
  • The differences between architectures (e.g. x86_64 and aarch64) are very few, even on a per-file basis.

I have not run a program to measure exactly but I think if we had this scheme, we could ship multiple versions without adding too much to the installation size:

  • Layer 1: x86_64-macos.10 x86_64-macos.11 x86_64-macos.12 aarch64-macos.11 aarch64-macos.12
  • Layer 2: x86_64-macos aarch64-macos
  • Layer 3: any-macos

I've removed the C ABI component from the triple here for macOS because it is not meaningful.

So how this works is that we have a script that takes in the data from this project, fetch-them-macos-headers, which contains an independent copy of the headers for every pair of {arch, version}. The script finds the files that are 100% identical across every pair and puts them in any-macos. Next it finds the files common to all versions of x86_64 and puts those into x86_64-macos. Likewise, it puts the files common to all versions of aarch64 into aarch64-macos. Finally, the leftover files go into directories that are specific to the {arch, version} tuple.
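The splitting step described above could look roughly like the following. This is only a sketch under stated assumptions, not the actual script in this repository: the function names (`split_into_layers`, `identical_in`) and the in-memory `trees` dictionary are made up for illustration, and a real implementation would walk the directories on disk instead.

```python
from collections import defaultdict


def identical_in(trees, keys, path):
    """True if `path` exists with byte-identical contents in every tree in `keys`."""
    first = trees[keys[0]].get(path)
    return first is not None and all(trees[k].get(path) == first for k in keys)


def split_into_layers(trees):
    """Assign every header to the most general layer that can hold it.

    `trees` maps (arch, version) -> {relative_path: file_bytes} (hypothetical
    input shape). Returns {layer_directory_name: {relative_path: file_bytes}}.
    """
    layers = defaultdict(dict)
    all_keys = list(trees)
    all_paths = set().union(*(t.keys() for t in trees.values()))
    arches = sorted({arch for arch, _ in all_keys})

    for path in sorted(all_paths):
        # Layer 3: identical for every {arch, version} pair.
        if identical_in(trees, all_keys, path):
            layers["any-macos"][path] = trees[all_keys[0]][path]
            continue
        for arch in arches:
            arch_keys = [k for k in all_keys if k[0] == arch]
            if identical_in(trees, arch_keys, path):
                # Layer 2: identical across all versions of this arch.
                layers[f"{arch}-macos"][path] = trees[arch_keys[0]][path]
            else:
                # Layer 1: leftovers, specific to one {arch, version}.
                for a, version in arch_keys:
                    if path in trees[(a, version)]:
                        layers[f"{a}-macos.{version}"][path] = trees[(a, version)][path]
    return dict(layers)
```

Each header ends up in exactly one layer for a given target, so the layers can be overlaid without one copy shadowing another.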

The Zig compiler's job, then, is to pass -isystem layer1 -isystem layer2 -isystem layer3, where layer{1,2,3} are the appropriate directories, depending on the target OS minimum version and the target architecture.

One key feature of doing the compression scheme this way is that the Zig compiler does not need to move files around on disk or involve a cache system; it is done entirely by making careful use of -isystem and overlaying include directories on top of one another.
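To make the overlay concrete, here is a small sketch of how the three -isystem arguments could be assembled for a given target. It is written in Python purely for illustration (the real logic would live in the Zig compiler), and `isystem_flags` and its parameters are hypothetical names; only the -isystem flag itself and the layer directory names come from the proposal above.

```python
def isystem_flags(include_root, arch, min_os_major):
    """Build the -isystem arguments for one target (illustrative helper only).

    The directories are listed most specific first; the C compiler searches
    -isystem directories in the order they are given on the command line.
    """
    layer_dirs = [
        f"{arch}-macos.{min_os_major}",  # layer 1: specific to {arch, version}
        f"{arch}-macos",                 # layer 2: shared by all versions of arch
        "any-macos",                     # layer 3: shared by everything
    ]
    flags = []
    for layer in layer_dirs:
        flags += ["-isystem", f"{include_root}/{layer}"]
    return flags
```

For example, `isystem_flags("lib/libc/include", "aarch64", 11)` yields the three include directories for an aarch64 target with minimum version 11, without copying any files.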


Alternative layer scheme:

  • Layer 1: x86_64-macos.10 x86_64-macos.11 x86_64-macos.12 aarch64-macos.11 aarch64-macos.12
  • Layer 2: any-macos.10 any-macos.11 any-macos.12
  • Layer 3: any-macos

The only difference here is in layer 2, which varies by version but not by architecture. It's not obvious to me which way of organizing the layers would be better; I think we should use the script to try both schemes and use the one that results in a smaller installation size. My hunch is that the alternative scheme will be better, due to observation number 2.
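One way to try both schemes is to parameterize layer 2 by a grouping function and compare the resulting totals. The sketch below assumes the headers are already loaded into a dictionary keyed by (arch, version); `layered_size` and `same_everywhere` are hypothetical names, and this is not the script from this repository.

```python
def same_everywhere(trees, keys, path):
    """True if `path` is present and byte-identical in every tree in `keys`."""
    first = trees[keys[0]].get(path)
    return first is not None and all(trees[k].get(path) == first for k in keys)


def layered_size(trees, layer2_group):
    """Total bytes stored when layer 2 groups trees by `layer2_group(key)`.

    `trees` maps (arch, version) -> {relative_path: file_bytes}.
    Pass `lambda k: k[0]` to group layer 2 by architecture, or
    `lambda k: k[1]` to group it by OS version.
    """
    keys = list(trees)
    groups = {}
    for k in keys:
        groups.setdefault(layer2_group(k), []).append(k)

    total = 0
    for path in set().union(*(t.keys() for t in trees.values())):
        if same_everywhere(trees, keys, path):
            total += len(trees[keys[0]][path])  # stored once in layer 3
            continue
        for members in groups.values():
            if same_everywhere(trees, members, path):
                total += len(trees[members[0]][path])  # stored once in layer 2
            else:
                # stored separately in each {arch, version} directory
                total += sum(len(trees[k][path]) for k in members if path in trees[k])
    return total
```

Calling it twice, once with each grouping function, gives two byte counts that can be compared directly.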

@andrewrk
Member

andrewrk commented Nov 23, 2021

I updated the script in d9d8958 and did the smallest amount I could to test out some results. I think the script is only capable of 2 layers without some more improvements, but the results are already promising:

andy@ark ~/D/z/l/l/include (master)> pwd
/home/andy/Downloads/zig/lib/libc/include
andy@ark ~/D/z/l/l/include (master)> du -sh *macos*
2.6M	aarch64-macos.11-gnu
1.1M	aarch64-macos.12-gnu
6.1M	any-macos-any
3.4M	x86_64-macos.10-gnu
2.6M	x86_64-macos.11-gnu
1.2M	x86_64-macos.12-gnu

This totals 17 MiB, which is within what I would consider to be the acceptable range. Introducing a Layer 2 should improve it more. For comparison, the status quo master branch of Zig has a 7 MiB installation size:

860K	aarch64-macos-gnu
5.4M	any-macos-any
912K	x86_64-macos-gnu

Also worth noting is that the fetch sub-command needs to be improved to detect the OS version and update the corresponding directory.
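As a rough idea of what that detection could involve, the sketch below maps a macOS version string and machine name to a directory in the headers/ layout listed earlier in this thread. It is illustrative Python, not the fetch sub-command itself; `headers_dir` and `ARCH_NAMES` are hypothetical names, and the only library calls used are `platform.mac_ver()` and `platform.machine()` from the Python standard library.

```python
import platform

# macOS reports Apple Silicon as "arm64"; the directories here use "aarch64".
ARCH_NAMES = {"arm64": "aarch64", "x86_64": "x86_64"}


def headers_dir(os_version, machine):
    """Map e.g. ("12.0.1", "arm64") to the matching directory under headers/."""
    major = os_version.split(".")[0]
    return f"headers/{major}/{ARCH_NAMES[machine]}-macos-gnu"


if __name__ == "__main__":
    # platform.mac_ver()[0] is an empty string when not running on macOS.
    version = platform.mac_ver()[0]
    if version:
        print(headers_dir(version, platform.machine()))
```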


@kubkon and @slimsag: what are your thoughts on this proposal?

@emidoots
Author

After mulling this around, I agree this is likely the best approach.

In the future when the package manager distributes .tbd framework stubs, we will likely want to do the same thing.

One thing that is unclear to me from your target triple examples above: what version does a target triple such as x86_64-macos default to? I think that choosing a sensible default is actually the most important thing here, because if we don't, we will end up with Zig libraries in the ecosystem requiring different versions (sometimes without knowing it) and thus being incompatible with one another. For example, I would personally like to make sure I am using the same version as most others in the Zig community where possible, for maximum compatibility.

@andrewrk
Member

You can find the answer to this question programmatically by using zig build-exe --show-builtin -target x86_64-macos. The generated builtin.zig file will be printed to stdout and show the version range. This logic can be adjusted if necessary. You can also pass an explicit version range in the target triple.

@kubkon
Member

kubkon commented Nov 23, 2021

@slimsag @andrewrk thanks so much for your work and input - it's very valuable! I'll take it over from here and hopefully finish the script so that it generates the desired 3-layer solution. I'll then move on to updating Zig's frontend to make use of this.

In terms of the default, this matters mainly for cross-compilation environments, since when building natively we fall back to the native sysroot (unless it's not there, e.g. when the user hasn't installed either the CLT or Xcode). In cross-compilation environments, I think by default we should target the oldest supported macOS version, since as far as I understand binaries built for it should run on every later version - for x86_64 that'd be 10.5, and for aarch64 11. The user will then have the option to opt in to any explicitly specified version, provided it's one of the supported, latest 3 macOS releases. This will also be possible when compiling natively. Does that make sense, and would you agree?

@kubkon
Member

kubkon commented Nov 24, 2021

EDIT: I made a mistake in the code; here are the corrected numbers.

Some prelim results for alternative layering:

fetch-them-macos-headers/dedup on more-data [!?]
❯ du -d1 -h
3.2M    ./any-macos.12-any
1.7M    ./any-macos.11-any
252K    ./x86_64-macos.12-gnu
3.4M    ./x86_64-macos.10-gnu
888K    ./x86_64-macos.11-gnu
196K    ./aarch64-macos.12-gnu
836K    ./aarch64-macos.11-gnu
3.5M    ./any-macos-any
 14M    .

This scheme totals 14M, and the largest subdirectory is 3.5M.

I'll now investigate alternative 1 where we create layer 2 based on architecture rather than OS version.

@emidoots
Author

SGTM

also just to reiterate, I no longer have any immediate need for this change so if you're doing any of this on my behalf.. :D

@kubkon
Member

kubkon commented Nov 24, 2021

Results for alternative 1:

fetch-them-macos-headers/dedup on more-data [!?]
❯ du -d1 -h
8.0K    ./aarch64-macos-any
896K    ./x86_64-macos-any
3.4M    ./x86_64-macos.12-gnu
3.3M    ./x86_64-macos.10-gnu
2.5M    ./x86_64-macos.11-gnu
3.4M    ./aarch64-macos.12-gnu
3.3M    ./aarch64-macos.11-gnu
2.7M    ./any-macos-any
 19M    .

Clearly, alternative 2 (layer 2 keyed by OS version rather than architecture) wins, and by a lot: 14M versus 19M.

@kubkon
Member

kubkon commented Nov 24, 2021

Merged in 46310be.
