# New principle: Sourcing models and compute for inference #551
I found these comments from Mozilla to be constructive (full disclosure: I contributed to them). There were also some useful insights in the discussion we've had so far about speech recognition. It was observed that it would be reasonable to have a user's own browser recognize their speech, but perhaps unreasonable (and inefficient) to have it recognize speech that comes from elsewhere. That limits the number of languages that a browser might need to source for that particular use case.
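To illustrate that asymmetry at the API level, here is a minimal sketch. The `processLocally` flag reflects the shape of the on-device proposals discussed around this thread and should be treated as illustrative rather than settled; the rest is the standard Web Speech API.

```js
// Sketch: recognizing the user's own speech with an on-device model.
// `processLocally` follows the on-device Web Speech proposals and may not
// match what any engine ships.
const SpeechRecognition =
  globalThis.SpeechRecognition || globalThis.webkitSpeechRecognition;

const recognition = new SpeechRecognition();
recognition.lang = navigator.language; // only the user's own language needs a local model
recognition.processLocally = true; // proposed: keep the audio (and the model) on-device

recognition.onresult = (event) => {
  console.log('Heard:', event.results[0][0].transcript);
};
recognition.start();
```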
@tomayac's https://github.com/tomayac/cross-origin-storage is an early, related proposal. In some ways, it's the low-level proposal matching the high-level "let the browser download the model" proposals.
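The gist of that proposal, as I read the explainer: large files such as model weights are addressed by content hash, so multiple origins can share one stored copy, gated on a user permission. A rough sketch follows; the method and option names are taken loosely from an early draft of the explainer and are likely to drift.

```js
// Sketch of the cross-origin-storage idea: ask for a file by its hash; if
// another origin already stored it, reuse that copy instead of re-downloading.
// Names are from an early draft of the explainer and may have changed.
const hash = { algorithm: 'SHA-256', value: /* hash of the model file */ '…' };

const [handle] = await navigator.crossOriginStorage.requestFileHandles([hash], {
  create: true, // allow storing the file for later cross-origin reuse
});
const file = await handle.getFile();
console.log(`Got ${file.size} bytes without a per-origin download.`);
```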
In both cases, we have browsers asking people questions that have hidden consequences. The fact that the question is asked at all implies that there are consequences, but those consequences are not at all obvious. The question as asked is banal enough to trigger some people to think "what are they really saying?" Given that we can't expect people to engage in that sort of higher-level game theory, I would have to say that this ends up being a dark pattern: I would only ask a question like that if my purpose were to deceive and obtain "consent". That hiding of consequences is something I see similar designs fail at as well. For example, people might understand that letting a site do WebUSB things to a USB device has consequences for the device, but they won't realize that it has implications for their entire computer. Personally, to be convinced by an approach like that, I'd need to be shown something that conveys an intuitive understanding of the risk.
This was brought up in our discussions of w3ctag/design-reviews#991, w3ctag/design-reviews#1038, translation, and a few other new APIs.
The common theme here is that these capabilities tend to rely on the availability of large (or at least relatively large) ML models. There are essentially three states for availability:

- the model is not available and cannot be obtained;
- the model could be made available, but only after a potentially costly download; or
- the model is already present and ready for immediate use.
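To make those states concrete, here is roughly how the translation proposal under review in w3ctag/design-reviews#991 exposes them. This is a sketch against a recent draft of that API; names have moved between drafts, and the draft splits the second state into `downloadable` plus a transitional `downloading`.

```js
// Sketch: probing model availability before committing the user to a download.
const availability = await Translator.availability({
  sourceLanguage: 'en',
  targetLanguage: 'ja',
});
// one of: 'unavailable' | 'downloadable' | 'downloading' | 'available'

if (availability !== 'unavailable') {
  const translator = await Translator.create({
    sourceLanguage: 'en',
    targetLanguage: 'ja',
    monitor(m) {
      // This is where the cost of the middle state becomes visible to the page.
      m.addEventListener('downloadprogress', (e) => {
        console.log(`Model download: ${Math.round(e.loaded * 100)}%`);
      });
    },
  });
  console.log(await translator.translate('Hello'));
}
```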
Then there is the question of what responsibility users have for providing compute resources for inference. Historically, we haven't done much to limit access to bandwidth and compute resources in user agents, but some newer ML applications are expensive enough that they might make us reconsider that.
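As a strawman for what limiting access could mean in practice, a site (or a user agent applying policy on its behalf) might decline expensive inference or downloads under constrained conditions. The signals below (Network Information, Device Memory, user activation) all exist in at least some engines, but the policy itself is purely illustrative.

```js
// Sketch: an illustrative gate in front of large model downloads or inference.
function mayUseExpensiveModel() {
  // Respect an explicit data-saving preference (Network Information API).
  if (navigator.connection?.saveData) return false;
  // Skip low-memory devices (Device Memory API reports coarse, bucketed values).
  if (navigator.deviceMemory !== undefined && navigator.deviceMemory < 4) return false;
  // Require a recent user gesture so pages cannot burn resources silently.
  return navigator.userActivation?.isActive === true;
}

if (mayUseExpensiveModel()) {
  // ...kick off the download or inference here.
}
```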
How these APIs might manage the choices that user agents make about sourcing models and compute is worth some discussion, to see if we can arrive at some general advice.