[Bots] Web Bot Auth docs #23099

Open: wants to merge 5 commits into base: production

Conversation


@Oxyjun Oxyjun commented Jun 18, 2025

Summary

We're now introducing "Web Bot Auth" (WBA), which is a more secure authentication method for verifying bots. Previously, verifying bots was only possible through two flavours of IP validation: a public IP list and reverse DNS.

Web Bot Auth may become the new IETF standard in the near future, and paves the way for better bot detection across the Internet.

This PR restructures and adds information for WBA. Specifically:

  1. Creates a new chapter, Verified bots requirements, which explains what's involved for a bot to be verified.
  2. Cuts down the existing Verified bots policy chapter to only talk about the policy.
  3. Creates a new chapter, Verification methods, which:
    • Guides users through verifying their bot via WBA
    • Guides users through verifying their bot via IP validation (text taken from existing Policy chapter)
    • Talks about generic user agents (text taken from existing Policy chapter)
    • Points users to other useful resources under "Additional resources" header.

Things that we need to improve:

  1. Link to blog
  2. Possible restructure such that WBA is its own page (and how this affects rest of IA)
  3. Consider absorbing Categories chapter into Verification methods (or somewhere else). Update: decided against after discussion.
  4. Consider defining the three main required headers for WBA into its own subheadings (such that they can be hyperlinked). This relates to point (2).
  5. Work out where the FAQs should go. Possibly a new Reference > FAQs > "WBA FAQs"?

Screenshots (optional)

Documentation checklist

  • The documentation style guide has been adhered to.
  • If a larger change, such as adding a new page, an issue has been opened in relation to any incorrect or out of date information that this PR fixes.
  • Files which have changed name or location have been allocated redirects.

@Oxyjun Oxyjun requested review from patriciasantaana and a team as code owners June 18, 2025 16:39
@github-actions github-actions bot added the product:bots Related to Bots product label Jun 18, 2025

hyperlint-ai bot commented Jun 18, 2025

Howdy and thanks for contributing to our repo. The Cloudflare team reviews new, external PRs within two (2) weeks. If it's been two weeks or longer without any movement, please tag the PR Assignees in a comment.

We review internal PRs within 1 week. If it's something urgent or has been sitting without a comment, start a thread in the Developer Docs space internally.


PR Change Summary

Introduced Web Bot Auth (WBA) documentation, enhancing bot verification methods.

  • Created a new chapter on Verified Bots Requirements detailing verification criteria.
  • Restructured the Verified Bots Policy chapter to focus solely on policy aspects.
  • Added a new chapter on Verification Methods, outlining WBA and IP validation processes.

Modified Files

  • src/content/docs/bots/concepts/bot/verified-bots/categories.mdx
  • src/content/docs/bots/concepts/bot/verified-bots/policy.mdx

Added Files

  • src/content/docs/bots/concepts/bot/verified-bots/requirements.mdx
  • src/content/docs/bots/concepts/bot/verified-bots/verification.mdx

How can I customize these reviews?

Check out the Hyperlint AI Reviewer docs for more information on how to customize the review.

If you just want to ignore it on this PR, you can add the hyperlint-ignore label to the PR. Future changes won't trigger a Hyperlint review.

Note specifically for link checks, we only check the first 30 links in a file and we cache the results for several hours (for instance, if you just added a page, you might experience this). Our recommendation is to add hyperlint-ignore to the PR to ignore the link check for this PR.


github-actions bot commented Jun 18, 2025

This pull request requires reviews from CODEOWNERS as it changes files that match the following patterns:

Pattern Owners
/public/__redirects @GregBrimble, @KianNH, @pedrosousa, @WalshyDev, @cloudflare/pcx-technical-writing
/src/content/docs/bots/ @patriciasantaana, @cloudflare/pcx-technical-writing


github-actions bot commented Jun 18, 2025


github-actions bot commented Jun 19, 2025

This PR changes current filenames or deletes current files. Make sure you have redirects set up to cover the following paths:

  • /bots/troubleshooting/frequently-asked-questions/

You need to host a key directory which creates a way for Cloudflare to authenticate your bot's requests.

<Steps>
1. Host a key directory at a well known message signatures directory. The key directory should serve a JSON Web Key Set (JWKS) including the public key derived from your signing key.

well known

This term here refers to being under /.well-known, and only certain paths are allowed, I believe.

The example below suggests you could put it under any other name, e.g. /.well-known/http-message-signatures-directory/foo or /.well-known/foo.

That's incorrect - users can only host their signature directory on /.well-known/http-message-signatures-directory. Our tooling will flag anything else as invalid.

I think we should simply say host it on /.well-known/http-message-signatures-directory


Also note we require Content-Type: application/http-message-signatures-directory+json today in the response. We might be open to others in the future, but this is mandatory right now.
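
To make the two constraints in this thread concrete, here is a minimal sketch of what a key directory response could look like. The path and media type are taken from the comments above. The function name, return shape, and placeholder key are hypothetical and not part of the PR.

```python
import json

# Path and media type as stated in the review comments above.
DIRECTORY_PATH = "/.well-known/http-message-signatures-directory"
DIRECTORY_CONTENT_TYPE = "application/http-message-signatures-directory+json"


def directory_response(public_jwks):
    """Return (path, headers, body) for a key directory response.

    Hypothetical helper: public_jwks is an iterable of public JWK dicts.
    """
    # A JWKS is a JSON object with a "keys" array (RFC 7517).
    body = json.dumps({"keys": list(public_jwks)}).encode("utf-8")
    headers = {
        "Content-Type": DIRECTORY_CONTENT_TYPE,
        "Content-Length": str(len(body)),
    }
    return DIRECTORY_PATH, headers, body


if __name__ == "__main__":
    # Placeholder key; substitute the JWK generated from your own public key.
    jwk = {"kty": "OKP", "crv": "Ed25519", "x": "<base64url public key>"}
    path, headers, body = directory_response([jwk])
    print(path, headers["Content-Type"])
```

Whatever web server is used, the point is that the body is served at exactly this path with exactly this `Content-Type`.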

:::note[Use components with only ASCII values]
Cloudflare currently does not support `bs` or `sf` parameter designed to serialize non-ASCII values into ASCII equivalents.
:::
- Add a `Content-Digest` header if you wish to sign your [message content](https://www.rfc-editor.org/rfc/rfc9421#name-message-content), then specify `Content-Digest` as a component to sign.

We should note we don't actually validate their Content-Digest header 😢

What this means is: anyone can give us a random Content-Digest header and sign it. We don't actually check the Content-Digest represents the hash of the message body - we only check if the signature over that hash was valid. Anyone can slap a Content-Digest header on.

What this means is that there's no guarantee a Content-Digest came from the message it was signed on, and that makes it a security concern.

I think we should recommend people only use this option if there's no risk of a message being altered on the way to us - like if the message was proxied unencrypted to us.

Or we don't talk about Content-Digest at all. It's not something we have first class support for anyway.

I'll let you decide @Oxyjun , but I don't feel comfortable not calling out the caveats of our support if people want to do this.
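
For readers wondering what the missing check would look like, here is a minimal sketch of comparing a `Content-Digest` header against the body. Per the comment above, this is the step Cloudflare does not currently perform. It assumes a single `sha-256` entry in the RFC 9530 format (`sha-256=:<base64>:`); the function names are hypothetical.

```python
import base64
import hashlib


def content_digest(body: bytes) -> str:
    """Build a Content-Digest header value with one sha-256 entry."""
    digest = hashlib.sha256(body).digest()
    return "sha-256=:" + base64.b64encode(digest).decode("ascii") + ":"


def digest_matches_body(header_value: str, body: bytes) -> bool:
    """Return True only if the header is the sha-256 digest of this body.

    Simplification: assumes exactly one sha-256 entry and no other
    algorithms in the header.
    """
    return header_value.strip() == content_digest(body)
```

A signature that covers `Content-Digest` only proves the header value was signed. Without a comparison like `digest_matches_body`, it says nothing about whether the body still matches that value.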


Construct a [`Signature-Agent` header](https://www.ietf.org/archive/id/draft-meunier-http-message-signatures-directory-00.html#name-header-field-definition) that points to your key directory. Note that Cloudflare will fail to verify a message if:
- The message includes a `Signature-Agent` header that is not an `https://`.
- The message includes a valid URI but do not enclose it in double quotes.

Typo: does, not do


The following derived components are not supported, and we will fail to verify a message if they are included:

- `@query-params`: Cloudflare recommends signing the whole query instead of an individual parameter.

suggestion: signing the whole query using the @query component
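
As an illustration of signing the whole query, here is a minimal sketch of the signature base line for the `@query` derived component. In RFC 9421 the component value is the full query string including the leading `?`, or a bare `?` when the query is absent. The function name is hypothetical, and no query normalization is attempted.

```python
from urllib.parse import urlsplit


def query_component_line(target_url: str) -> str:
    """Return the signature base line for the @query derived component."""
    # urlsplit().query excludes the leading "?", so add it back.
    query = urlsplit(target_url).query
    return f'"@query": ?{query}'
```

For example, `query_component_line("https://example.com/path?param=value&foo=bar")` covers both parameters in a single line, so nothing needs to be signed per parameter.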


### How do I know my JSON Web Key set directory will be accepted?

Cloudflare uses [`http-signature-directory` tool](https://crates.io/crates/http-signature-directory) to validate your directory. Please your this works before submitting a verification request.

typo + suggestion: Please ensure this works against your directory before registering with us.

(submitting a verification request is ambiguous - does it refer to registration or to sending us a signed request?)
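
Before running the `http-signature-directory` tool, a local sanity check can catch obvious mistakes. The sketch below does not replace that tool and makes no claim about what it checks. It only applies assumptions drawn from this thread: a JWKS with a `keys` array containing public Ed25519 keys. The function name is hypothetical.

```python
import base64
import json


def check_directory(jwks_text: str) -> list:
    """Return a list of problems found in a JWKS document (empty if none)."""
    try:
        doc = json.loads(jwks_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    keys = doc.get("keys") if isinstance(doc, dict) else None
    if not isinstance(keys, list) or not keys:
        return ['expected a JSON object with a non-empty "keys" array']
    problems = []
    for i, key in enumerate(keys):
        if not isinstance(key, dict):
            problems.append(f"key {i}: not a JSON object")
            continue
        if key.get("kty") != "OKP" or key.get("crv") != "Ed25519":
            problems.append(f"key {i}: not an Ed25519 (OKP) key")
            continue
        if "d" in key:
            problems.append(f"key {i}: contains the private parameter 'd'")
        x = key.get("x", "")
        try:
            # Restore the padding that base64url JWK values omit.
            raw = base64.urlsafe_b64decode(x + "=" * (-len(x) % 4))
        except (ValueError, TypeError):
            raw = b""
        if len(raw) != 32:
            problems.append(f"key {i}: 'x' is not a 32-byte public key")
    return problems
```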


### My message is failing validation. What could be the cause?

- Ensure you have a [`Signature-Agent` header](/bots/concepts/bot/verified-bots/web-bot-auth/#signature-agent-header), and that its value in double-quotes.

typo: value is in

Cloudflare accepts all valid Ed25519 keys found in your key directory. In the event a key already exists in Cloudflare's registered database, Cloudflare will work with you to supply a new key, or rotate your existing key.

:::note[Estimated review time]
The estimated review time is approximately one week.

We should let them know how to track their verification process. Unfortunately, the only way to do so is to ask support for the status and have them escalate to us. Should we mention this?


### What key algorithms does Cloudflare support?

Cloudflare does not support key algorithms other than Ed25519.

Suggested change
Cloudflare does not support key algorithms other than Ed25519.
Cloudflare supports Ed25519 key algorithm.

be in the affirmative, not negative


---

### What `web-bot-auth` features from the spec are not supported?

Suggested change
### What `web-bot-auth` features from the spec are not supported?
### What `web-bot-auth` features from the IETF draft are not supported?

Comment on lines +14 to +37
## 1. Generate a valid signing key

You need to generate a signing key which will be used to authenticate your bot's requests.

{/* prettier-ignore */}
<Steps>
1. Generate a unique [Ed25519](https://ed25519.cr.yp.to/) private key to sign your requests. This example uses the [OpenSSL](https://openssl-library.org/) `genpkey` command:

```sh
openssl genpkey -algorithm ed25519 -out private-key.pem
```
2. Extract your public key.

```sh
openssl pkey -in private-key.pem -pubout -out public-key.pem
```
3. Convert the public key to JSON Web Key (JWK) using a tool of your choice. This example uses [`jwker`](https://github.com/jphastings/jwker) command line application.
```sh
go install github.com/jphastings/jwker/cmd/jwker@latest
jwker public-key.pem public-key.jwk
```
</Steps>

By following these steps, you have generated a private key and a public key, then converted the public key to a JWK.

we could also point to JavaScript key generation using the WebCrypto API

https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/generateKey

this would be directly in the right JWK format

most of the existing JWK libraries or providers should be able to do that as well https://jwt.io/libraries
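
As another alternative to a dedicated converter, the PEM-to-JWK step for Ed25519 is small enough to sketch with a standard library alone. The sketch assumes the PEM is the `public-key.pem` written by `openssl pkey -pubout`, that is, an X.509 SubjectPublicKeyInfo whose DER encoding is a fixed 12-byte prefix followed by the 32-byte key. The function name is hypothetical.

```python
import base64

# DER prefix of an Ed25519 SubjectPublicKeyInfo (RFC 8410); 32 key bytes follow.
ED25519_SPKI_PREFIX = bytes.fromhex("302a300506032b6570032100")


def ed25519_pem_to_jwk(pem: str) -> dict:
    """Convert an Ed25519 public key in PEM (SPKI) form to a public JWK."""
    lines = [line.strip() for line in pem.strip().splitlines()]
    # Drop the BEGIN/END armor lines and join the base64 payload.
    b64 = "".join(line for line in lines if line and not line.startswith("-----"))
    der = base64.b64decode(b64)
    if len(der) != 44 or not der.startswith(ED25519_SPKI_PREFIX):
        raise ValueError("not an Ed25519 SubjectPublicKeyInfo public key")
    # JWK uses unpadded base64url for the raw public key (RFC 8037).
    x = base64.urlsafe_b64encode(der[12:]).rstrip(b"=").decode("ascii")
    return {"kty": "OKP", "crv": "Ed25519", "x": x}
```

The result has the same shape as the keys that go into the key directory.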


import { GlossaryTooltip, Steps } from "~/components"

Web Bot Auth is an authentication method that leverages cryptographic signatures in HTTP messages to verify that a request comes from an automated bot.

Suggested change
Web Bot Auth is an authentication method that leverages cryptographic signatures in HTTP messages to verify that a request comes from an automated bot.
Web Bot Auth is an authentication method that leverages cryptographic signatures in HTTP messages to verify that a request comes from an automated bot.
It relies on two active IETF drafts: a [directory draft](https://datatracker.ietf.org/doc/html/draft-meunier-http-message-signatures-directory) allowing the crawler to share their public keys, and a [protocol draft](https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture) defining how these keys should be used to attach crawler's identity to HTTP requests.
This documentation goes over specific integration within Cloudflare.


## 2. Host a key directory

You need to host a key directory which creates a way for Cloudflare to authenticate your bot's requests.

changing the sentence to have bots as the actor rather than cloudflare.

In addition, clearly reference the IETF draft, given this is where the format is defined.

Suggested change
You need to host a key directory which creates a way for Cloudflare to authenticate your bot's requests.
You need to host a key directory which creates a way for your bot to authenticate its requests to Cloudflare.
This directory should follow the definition from the active IETF draft [draft-meunier-http-message-signatures-directory-01](https://datatracker.ietf.org/doc/html/draft-meunier-http-message-signatures-directory-01).


## 4. (After verification) Sign your requests

After your bot has been successfully verified, you need to sign your bot's requests.

slightly updating the wording.

clearly reference the revision of the IETF draft we support. this is going to be valuable in the future as the draft evolves

Suggested change
After your bot has been successfully verified, you need to sign your bot's requests.
After your bot has been successfully verified, your bot is ready to sign its requests. The signature protocol is defined in [draft-meunier-web-bot-auth-architecture-02](https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture-02)

| Required component parameter | Requirement |
| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `tag` | This should be equal to `web-bot-auth`. |
| `alg` | This should be equal to `ed25519`. |

the draft suggests that using alg should be avoided. not prohibited, but I don't think we should have it in our docs @AkshatM

Suggested change
| `alg` | This should be equal to `ed25519`. |

@AkshatM AkshatM Jun 20, 2025

It will need to be changed after LDW. Today, we require this in the implementation, and things will not verify otherwise, so it needs to be in the docs right now. I'll raise a ticket.

@AkshatM AkshatM Jun 20, 2025

This requires some pretty intense changes to the whole keyring concept in web-bot-auth crate, hence the slowness on my end eradicating the need for alg.


I updated web-bot-auth crate to not need alg anymore in v0.3.0.

However, this documentation should not be changed - we still need to update upstream dependencies to use web-bot-auth v0.3.0 and that's unlikely to happen until after LDW. Until then, customers will be forced to send alg.
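
To show where `tag` and `alg` sit in practice, here is a minimal sketch that assembles a `Signature-Input` value. Only `tag="web-bot-auth"` and `alg="ed25519"` come from the table under review. The label `sig1`, the covered components, and the `created`, `expires`, and `keyid` parameters are not stated in this thread and are assumptions for illustration, as is the function name.

```python
import time


def signature_input(keyid, components=("@authority", "signature-agent"),
                    label="sig1", ttl_seconds=300, now=None):
    """Build a Signature-Input header value (RFC 9421 inner-list syntax).

    Assumed for illustration: the label, the default components, and the
    created/expires/keyid parameters.
    """
    created = int(time.time()) if now is None else int(now)
    expires = created + ttl_seconds
    covered = " ".join(f'"{name}"' for name in components)
    params = (
        f";created={created};expires={expires}"
        f';keyid="{keyid}"'
        # Per the thread, alg is still required by the current implementation.
        ';alg="ed25519";tag="web-bot-auth"'
    )
    return f"{label}=({covered}){params}"
```

If the `alg` requirement is dropped later, only the last parameter string would need to change.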


### 4.2. Calculate the JWK thumbprint

[Calculate the base64 URL-encoded JWK thumbprint](https://www.rfc-editor.org/rfc/rfc8037.html#appendix-A.3) associated with your Ed25519 public key registered with Cloudflare.

Suggested change
[Calculate the base64 URL-encoded JWK thumbprint](https://www.rfc-editor.org/rfc/rfc8037.html#appendix-A.3) associated with your Ed25519 public key registered with Cloudflare.
[Calculate the base64 URL-encoded JWK thumbprint](https://www.rfc-editor.org/rfc/rfc8037.html#appendix-A.3) from the public key you registered with Cloudflare.
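
For reference, the thumbprint calculation linked above can be sketched in a few lines. It follows RFC 7638 as applied to OKP keys in RFC 8037: hash the JSON object containing only `crv`, `kty`, and `x`, with keys in lexicographic order and no whitespace, then base64url-encode the SHA-256 digest without padding. The function name is hypothetical.

```python
import base64
import hashlib
import json


def jwk_thumbprint(jwk: dict) -> str:
    """Return the base64url-encoded SHA-256 JWK thumbprint of an OKP key."""
    # Only the required members take part in the hash for OKP keys.
    required = {"crv": jwk["crv"], "kty": jwk["kty"], "x": jwk["x"]}
    canonical = json.dumps(required, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

Extra members such as `kid` or `use` in the JWK do not affect the result, because they are excluded before hashing.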


#### `Signature-Agent` header

Construct a [`Signature-Agent` header](https://www.ietf.org/archive/id/draft-meunier-http-message-signatures-directory-00.html#name-header-field-definition) that points to your key directory. Note that Cloudflare will fail to verify a message if:

Suggested change
Construct a [`Signature-Agent` header](https://www.ietf.org/archive/id/draft-meunier-http-message-signatures-directory-00.html#name-header-field-definition) that points to your key directory. Note that Cloudflare will fail to verify a message if:
Construct a [`Signature-Agent` header](https://www.ietf.org/archive/id/draft-meunier-http-message-signatures-directory-01.html#name-header-field-definition) that points to your key directory. Note that Cloudflare will fail to verify a message if:


Construct a [`Signature-Agent` header](https://www.ietf.org/archive/id/draft-meunier-http-message-signatures-directory-00.html#name-header-field-definition) that points to your key directory. Note that Cloudflare will fail to verify a message if:
- The message includes a `Signature-Agent` header that is not an `https://`.
- The message includes a valid URI but do not enclose it in double quotes.

Suggested change
- The message includes a valid URI but do not enclose it in double quotes.
- The message includes a valid URI but does not enclose it in double quotes. This is due to Signature-Agent being a structured field.
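
The two failure conditions above can be expressed as a small pre-flight check on the bot side. This is a minimal sketch under the assumptions stated in the thread (an `https://` URL, serialized as a double-quoted structured-field string). The function names are hypothetical, and the check is not Cloudflare's verifier.

```python
from urllib.parse import urlsplit


def signature_agent_value(directory_url: str) -> str:
    """Serialize the key directory URL as a double-quoted string."""
    parts = urlsplit(directory_url)
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("Signature-Agent must be an https:// URL")
    if '"' in directory_url or "\\" in directory_url:
        raise ValueError("URL contains characters that would need escaping")
    return f'"{directory_url}"'


def looks_verifiable(header_value: str) -> bool:
    """Return True if the header value is a quoted https:// URL."""
    value = header_value.strip()
    if len(value) < 2 or not (value.startswith('"') and value.endswith('"')):
        return False
    try:
        parts = urlsplit(value[1:-1])
    except ValueError:
        return False
    return parts.scheme == "https" and bool(parts.netloc)
```

For example, `looks_verifiable('https://bot.example.com')` is false because the quotes are missing, which matches the second bullet.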

Labels
product:bots Related to Bots product size/m
5 participants