diff --git a/README.md b/README.md
index b8b0464..be9488a 100644
--- a/README.md
+++ b/README.md
@@ -287,23 +287,9 @@ After a developer has a `Translator` or `LanguageDetector` object, further calls
 
 This design means that the implementation must have all information about the capabilities of its translation and language detection models available beforehand, i.e. "shipped with the browser". (Either as part of the browser binary, or through some out-of-band update mechanism that eagerly pushes updates.)
 
-## Privacy considerations
+## Privacy and security considerations
 
-This proposal as-is has privacy issues, which we are actively thinking about how to address. They are all centered around how sites that use this API might be able to uniquely fingerprint the user.
-
-The most obvious identifier in the current API design is the list of supported languages, and especially their availability status (`"unavailable"`, `"downloadable"`, `"downloading"`, and `"available"`). For example, as of the time of this writing [Firefox supports 9 languages](https://www.mozilla.org/firefox/features/translate/), which can each be [independently downloaded](https://support.mozilla.org/kb/website-translation#w_configure-installed-languages). With a naive implementation, this gives 9 bits of identifying information, which various sites can all correlate.
-
-Some sort of mitigation may be necessary here. We believe this is adjacent to other areas that have seen similar mitigation, such as the [Local Font Access API](https://github.com/WICG/local-font-access/blob/main/README.md). Possible techniques are:
-
-* Grouping language packs to reduce the number of bits, so that downloading one language also downloads others in its group.
-* Partitioning download status by top-level site, introducing a fake download (which takes time but does not actually download anything) for the second-onward site to download a language pack.
-* Only exposing a fixed set of languages to this API, e.g. based on the user's locale or the document's main language.
-
-As a first step, we require that detecting the availability of translation/detection be done via individual calls to `Translator.availability()` and `LanguageDetector.availability()`. This allows browsers to implement possible mitigation techniques, such as detecting excessive calls to these methods and starting to return `"unavailable"`.
-
-Another way in which this API might enhance the web's fingerprinting surface is if translation and language detection models are updated separately from browser versions. In that case, differing results from different versions of the model provide additional fingerprinting bits beyond those already provided by the browser's major version number. Mandating that older browser versions not receive updates or be able to download models from too far into the future might be a possible remediation for this.
-
-Finally, we intend to prohibit (in the specification) any use of user-specific information in producing the results. For example, it would not be permissible to fine-tune the translation model based on information the user has entered into the browser in the past.
+Please see [the Writing Assistance APIs specification](https://webmachinelearning.github.io/writing-assistance-apis/#privacy), where we have centralized the normative privacy and security considerations that apply to these APIs as well as the writing assistance APIs.
 
 ### Permissions policy, iframes, and workers
 
diff --git a/index.bs b/index.bs
index 716a377..0bddd95 100644
--- a/index.bs
+++ b/index.bs
@@ -46,7 +46,7 @@ This specification depends on the Infra Standard. [[!INFRA]]
 
 As with the rest of the web platform, human languages are identified in these APIs by BCP 47 language tags, such as "`ja`", "`en-US`", "`sr-Cyrl`", or "`de-CH-1901-x-phonebk-extended`". The specific algorithms used for validation, canonicalization, and language tag matching are those from the ECMAScript Internationalization API Specification, which in turn defers some of its processing to Unicode Locale Data Markup Language (LDML). [[BCP47]] [[!ECMA-402]] [[UTS35]].
 
-These APIs are part of a family of APIs expected to be powered by machine learning models, which share common API surface idioms and specification patterns. Currently, the specification text for these shared parts lives in [[WRITING-ASSISTANCE-APIS#supporting]]. Implementing these APIs requires implementing that shared infrastructure, but does not require implementing or exposing the actual writing assistance APIs. [[!WRITING-ASSISTANCE-APIS]]
+These APIs are part of a family of APIs expected to be powered by machine learning models, which share common API surface idioms and specification patterns. Currently, the specification text for these shared parts lives in [[WRITING-ASSISTANCE-APIS#supporting]], and the common privacy and security considerations are discussed in [[WRITING-ASSISTANCE-APIS#privacy]] and [[WRITING-ASSISTANCE-APIS#security]]. Implementing these APIs requires implementing that shared infrastructure, and conforming to those privacy and security considerations. But it does not require implementing or exposing the actual writing assistance APIs. [[!WRITING-ASSISTANCE-APIS]]
-All other scenarios, or if the user agent would prefer not to disclose the failure reason.
+All other scenarios, including if the user agent believes it cannot translate and also meet the requirements given in [[WRITING-ASSISTANCE-APIS#privacy]] and [[WRITING-ASSISTANCE-APIS#security]]. Or, if the user agent would prefer not to disclose the failure reason.
This table does not give the complete list of exceptions that can be surfaced by the translator API. It only contains those which can come from certain [=implementation-defined=] steps.
@@ -584,7 +586,7 @@ dictionary LanguageDetectionResult {
 
 1. [=Assert=]: this algorithm is running [=in parallel=].
 
- 1. If there is some error attempting to determine what languages the user agent supports detecting, which the user agent believes to be transient (such that re-querying could stop producing such an error), then return null.
+ 1. If there is some error attempting to determine what language detection capabilities the user agent [=model availability/can support=], which the user agent believes to be transient (such that re-querying could stop producing such an error), then return null.
 
 1. Let |partition| be the result of [=getting the language availabilities partition=] given the purpose of detecting text written in that language.
 
@@ -710,6 +712,8 @@ The inputQuota getter steps are to r
 
 If an error occurred during language detection, then return an [=error information=] according to the guidance in [[#language-detector-errors]].
 
+ The detection process must conform to the guidance given in [[WRITING-ASSISTANCE-APIS#privacy]] and [[WRITING-ASSISTANCE-APIS#security]], notably including (but not limited to) [[WRITING-ASSISTANCE-APIS#privacy-user-input]] and [[WRITING-ASSISTANCE-APIS#security-runtime]].
+
 1. [=map/Sort in descending order=] |rawResult| with a less than algorithm which given [=map/entries=] |a| and |b|, returns true if |a|'s [=map/value=] is less than |b|'s [=map/value=].
 
 1. Let |results| be an empty [=list=].
@@ -784,7 +788,7 @@ When language detection fails, the following possible reasons may be surfaced to
-All other scenarios, or if the user agent would prefer not to disclose the failure reason.
+All other scenarios, including if the user agent believes it cannot detect and also meet the requirements given in [[WRITING-ASSISTANCE-APIS#privacy]] and [[WRITING-ASSISTANCE-APIS#security]]. Or, if the user agent would prefer not to disclose the failure reason.
This table does not give the complete list of exceptions that can be surfaced by the language detector API. It only contains those which can come from certain [=implementation-defined=] steps.
@@ -792,3 +796,11 @@ When language detection fails, the following possible reasons may be surfaced to
[=default allowlist/'self'=].
+
+