Conversation
@xenova xenova commented Mar 5, 2025

LiteASR is a compression scheme for automatic speech recognition (ASR) models that leverages the low-rank properties of activation values. Our method can compress OpenAI's Whisper encoder by up to ~50%.
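
For intuition, a rough sketch of the underlying idea (the notation here is ours, not from the PR): if the activations x entering a linear layer y = Wx lie close to a rank-k subspace spanned by principal components U_k, the layer can be split into two thin matrix multiplies:

$$ y = Wx \approx W U_k U_k^\top x = (W U_k)\,(U_k^\top x), $$

replacing a d_out × d_in weight with a (d_out × k, k × d_in) pair, which saves parameters and compute whenever k is well below half the layer dimension.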

Supported models: https://huggingface.co/models?library=transformers.js&other=lite-whisper&sort=trending


Example usage:

import { pipeline, read_audio } from "@huggingface/transformers";

// Load the ASR pipeline with a LiteASR-compressed Whisper checkpoint
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/lite-whisper-large-v3-turbo-acc-ONNX",
  { dtype: { encoder_model: "fp32", decoder_model_merged: "q4" } },
);

// Fetch the audio and resample it to the model's expected sampling rate
const audio = await read_audio(
  "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav",
  transcriber.processor.feature_extractor.config.sampling_rate,
);

const output = await transcriber(audio);
console.log(output);
// { text: ' And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.' }


@xenova xenova merged commit 31dfd43 into main Mar 6, 2025
4 checks passed
@xenova xenova deleted the add-lite-whisper branch March 6, 2025 11:27
@decoder-sh-david

@xenova Which browser did you test this in? I can't get your code sample working in Chrome: while loading the model, the pipeline throws an error that is just the number 4273819248.

@xenova xenova commented Mar 29, 2025

@decoder-sh-david Be sure to specify device: "webgpu" to run in-browser with the current configuration.

Alternatively, you can set dtype: "q8" if you'd like to run on CPU.

The sample code above was run with Node.js (CPU).
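
For concreteness, a minimal sketch combining both suggestions (the checkpoint id and dtype map are taken from the example above; treat the exact option combination as an assumption, not a tested configuration):

// In-browser: run the fp32/q4 configuration on WebGPU
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/lite-whisper-large-v3-turbo-acc-ONNX",
  {
    device: "webgpu", // required for this dtype configuration in the browser
    dtype: { encoder_model: "fp32", decoder_model_merged: "q4" },
  },
);

// CPU alternative: quantize all modules to q8
// const transcriber = await pipeline(
//   "automatic-speech-recognition",
//   "onnx-community/lite-whisper-large-v3-turbo-acc-ONNX",
//   { dtype: "q8" },
// );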

@decoder-sh-david

> @decoder-sh-david Be sure to specify device: "webgpu" to run in-browser with the current configuration.
>
> Alternatively, you can set dtype: "q8" if you'd like to run on CPU.
>
> The sample code above was run with Node.js (CPU).

That was the missing piece, thank you! Do you have any guidance on which quantizations work? I recall that for some ONNX exports of Whisper, certain quantizations simply don't work.

@decoder-sh-david

Additionally, I don't suppose that this model supports word-level timestamps, does it?
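
For reference, word-level timestamps are requested like this for standard Whisper exports in Transformers.js; whether the lite-whisper export preserves what's needed for this is exactly the open question (the option is a real Transformers.js parameter, but the output values below are illustrative):

// Ask the pipeline for per-word timestamps
const output = await transcriber(audio, { return_timestamps: "word" });
console.log(output.chunks);
// e.g. [{ text: " And", timestamp: [0.0, 0.64] }, ...] (illustrative)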

@decoder-sh-david

@xenova does this PR also support converting this model type using the script?

I'm trying to convert it to include timestamps in the ONNX export, but I keep running into the error ValueError: Unrecognized configuration class <class 'transformers_modules.efficient-speech.lite-whisper-large-v3-turbo.6697ac2a887e3256da5defc9e8472f76a2b0f16e.configuration_lite_whisper.LiteWhisperConfig'> to build an AutoTokenizer. which suggests that the Python transformers library may not support this model type?
