
assafvayner

Porting the xet specification to its final location in the hub docs (from the xet-core GitHub repo).

Includes the complete specification for the xet protocol.

Member

@julien-c julien-c left a comment


I think it would maybe make sense to nest this under https://huggingface.co/docs/hub, but what do you think?

@assafvayner
Author

> i think it would maybe make sense to nest under https://huggingface.co/docs/hub, but what do you think?

That works for me. I was initially going for the path /docs/xet, but /docs/hub/xet works too (I'm not very opinionated on this).

I'm still thinking about where to link to the xet spec from the other docs (the storage-backends page 👍, but anything else is TBD).

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@assafvayner
Author

Is it at all possible to replace the left-side table-of-contents sidebar on the /xet/... pages while those pages remain part of the "hub"-scoped docs?

I don't think it makes sense to add all the new pages to the sidebar on every hub documentation page, but while reading the specification, users should be able to navigate between all of its pages.

@assafvayner assafvayner changed the title from "[WIP] xet protocol specification in hub docs" to "xet protocol specification in hub docs" on Sep 25, 2025
Contributor

@rajatarya rajatarya left a comment


Something about the parentage/tree of these pages is off. At the bottom of every page, the 'Next page' link points not to another protocol page but to the root of the HF docs. I'm not sure whether that is related to the PR preview or to the TOC construction.


This leads to a worse developer experience along with a proliferation of additional storage.

## Open Source Xet Protocol
Contributor


This should be moved up to BEFORE the Backwards Compatibility with LFS section; putting it at the bottom of the page hurts its discoverability. The flow of the page would then be 'overview, usage, protocol, backwards compat, security, legacy'.

The primary limitation of Git LFS is its file-centric approach to deduplication. Any change to a file, irrespective of how large or small that change is, means the entire file is versioned, incurring significant overhead in file transfers because the entire file is uploaded (when committing to a repository) or downloaded (when pulling the latest version to your machine).

This leads to a worse developer experience along with a proliferation of additional storage.

Contributor


Earlier section (not sure how to attach a comment to an area that isn't in the diff, sigh), Security Model: I think you can deep-link into the protocol to point to the parts that cover privacy-preserving global dedup.

Author

@assafvayner Sep 25, 2025


There are 2 things to link to regarding the "security model":

A. In deduplication.md, the very short section titled `#### HMAC Security Mechanism` is relevant to what you described.
B. The auth.md file as a whole, if we want to show that the Hub is the source of truth on auth for CAS.

Contributor

@jsulz jsulz left a comment


Sweet! Just did a first pass mostly focused on little nits/inconsistencies. Planning on doing another pass tomorrow.

I agree that navigation is something to address - @rajatarya's approach in #1956 seems promising.

Do we agree that "Xet" should be uppercased in all instances?
