Qwen3 Reranker CUDA Fix in FlashQwen3.rs #1

sigridjineth · 2025-08-16T13:30:34Z

Problem

When running Qwen3-Reranker on CUDA with flash attention, all documents get nearly identical scores (~0.891), while on MPS they get properly differentiated scores.

Root Cause

Two issues were found in backends/candle/src/models/flash_qwen3.rs:

Wrong pooling method: classification models were built with Pool::Cls instead of Pool::LastToken (line 313).
Incorrect index selection: when selecting pooled embeddings, the code used batch indices to select from token indices.
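
A plausible reason the first issue produces near-identical scores, offered as an assumption rather than something verified against the model code: under causal attention the first position can only attend to itself, so if every reranker prompt starts with the same token, the hidden state at that position is the same for every document. The toy below is a sketch in plain Rust (scalar tokens, a single head, no candle); `causal_attention` is a hypothetical helper, not a function from the repository.

```rust
/// Toy single-head causal attention over scalar token values (q = k = v = x).
/// Position i only sees positions 0..=i.
fn causal_attention(x: &[f32]) -> Vec<f32> {
    let mut out = Vec::with_capacity(x.len());
    for i in 0..x.len() {
        // Attention scores over the visible prefix.
        let scores: Vec<f32> = (0..=i).map(|j| x[i] * x[j]).collect();
        // Numerically stable softmax.
        let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
        let sum: f32 = exps.iter().sum();
        // Weighted sum of the visible values.
        let pooled: f32 = exps.iter().zip(&x[..=i]).map(|(w, v)| w / sum * v).sum();
        out.push(pooled);
    }
    out
}

fn main() {
    // Two "documents" that share only their first token.
    let doc_a = [1.0, 0.5, -0.3, 2.0];
    let doc_b = [1.0, -1.2, 0.7, 0.1];
    let a = causal_attention(&doc_a);
    let b = causal_attention(&doc_b);
    println!("first position: {} vs {}", a[0], b[0]);
    println!("last position:  {} vs {}", a[3], b[3]);
}
```

The first-position outputs match while the last-position outputs differ, which is consistent with Pool::LastToken being the pooling that lets a classification head see the whole input.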

Fixes Applied

Fix 1: Change pooling method (line 313)

// Before:
(Pool::Cls, classification_head)
// After:
(Pool::LastToken, classification_head)
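
To make the difference between the two pooling modes concrete, here is a minimal sketch of which token each one picks in a flattened (variable-length) batch. It assumes the batch is described by cumulative sequence lengths, as is common with flash attention; the name `cu_seqlens` and both helper functions are illustrative, not taken from flash_qwen3.rs.

```rust
/// Index of the first (CLS-style) token of each sequence.
/// Sequence i occupies tokens cu_seqlens[i]..cu_seqlens[i + 1] in the flattened batch.
fn first_token_indices(cu_seqlens: &[u32]) -> Vec<u32> {
    cu_seqlens[..cu_seqlens.len().saturating_sub(1)].to_vec()
}

/// Index of the last token of each sequence (assumes no sequence is empty).
fn last_token_indices(cu_seqlens: &[u32]) -> Vec<u32> {
    cu_seqlens.iter().skip(1).map(|end| end - 1).collect()
}

fn main() {
    // Three sequences of lengths 3, 4 and 5, packed back to back.
    let cu_seqlens = [0, 3, 7, 12];
    println!("cls:  {:?}", first_token_indices(&cu_seqlens));
    println!("last: {:?}", last_token_indices(&cu_seqlens));
}
```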

Fix 2: Correct index selection (lines 405-435)

The pooling logic mishandled indices when a batch contained both pooled and raw requests. It now selects the last-token indices of the requested sequences.
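
The class of bug described here can be sketched as follows. This is an illustration under assumptions, not the actual diff: `cu_seqlens`, `pooled_indices`, and both helpers are hypothetical names, and hidden states are modelled as plain vectors instead of candle tensors.

```rust
/// Map the batch indices of pooled requests to the token index of each
/// sequence's last token in the flattened hidden-state matrix.
fn last_token_indices_for(cu_seqlens: &[u32], pooled_indices: &[u32]) -> Vec<u32> {
    pooled_indices
        .iter()
        .map(|&b| cu_seqlens[b as usize + 1] - 1)
        .collect()
}

/// Gather rows of the flattened hidden states by token index.
fn select_rows(hidden: &[Vec<f32>], token_indices: &[u32]) -> Vec<Vec<f32>> {
    token_indices
        .iter()
        .map(|&t| hidden[t as usize].clone())
        .collect()
}

fn main() {
    // Three sequences of lengths 3, 4 and 5; only sequences 0 and 2 want pooling.
    let cu_seqlens = [0, 3, 7, 12];
    let pooled_indices = [0, 2];
    // One hidden row per token; row t holds the value t so picks are easy to read.
    let hidden: Vec<Vec<f32>> = (0..12).map(|t| vec![t as f32]).collect();

    // Buggy pattern: batch indices used directly as token indices.
    let wrong = select_rows(&hidden, &pooled_indices);
    // Intended pattern: translate batch indices to last-token indices first.
    let right = select_rows(&hidden, &last_token_indices_for(&cu_seqlens, &pooled_indices));

    println!("wrong: {:?}", wrong);
    println!("right: {:?}", right);
}
```

In the buggy pattern both selected rows come from inside sequence 0, so the head never sees sequence 2 at all.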

sigridjineth · 2025-08-16T13:31:24Z

ata '{
    "query": "털사 대학교에서 2003년부터 2006년까지 감독을 맡았던 사람이 누구야?",
    "texts": [
        "존 맥널티(1968년 5월 29일 출생)는 미국 프로 미식축구 리그(NFL) 로스앤젤레스 차저스의 타이트 엔드 코치인 미식축구 코치입니다. 그는 1990년 펜실베이니아 대학교를 졸업했습니다. 2012년까지 애리조나 카디널스에서 코치로 활동했으며, 2009년 초 와이드 리시버 코치로 시작하여 2012년 켄 위젠헌트 감독이 경질 될 때까지 쿼터백 코치로 활동했습니다. 이후 2013년에는 그렉 샤이아노 감독이 경질될 때까지 탬파베이 버커니어스의 쿼터백 코치로 활동했습니다. 그리고 2014년에는 테네시 타이탄스의 쿼터백  코치로 합류하여 마이크 뮬라키 감독이 켄 위젠헌트 감독을 경질 한 2015년까지 그곳에서 일했습니다. 현재 그는 로스앤젤레스 차 저스의 타이트 엔드 코치로 일하고 있습니다.",
        "브라이언 쇼튼하이머(1973년 10월 16일 출생)는 미국 프로 미식축구 리그(NFL) 인디애나폴리스 콜츠의 쿼터백 코치인 미 국 미식축구 코치입니다. 그는 이전에 NFL의 워싱턴 레드스킨스와 샌디에이고 차저스에서 쿼터백 코치로 활약했으며, NFL의 뉴욕  제츠, 세인트루이스 램스, 조지아 대학교의 조지아 불독스 미식축구 팀의 공격 코디네이터를 역임했습니다. 그의 아버지인 마티 쇼튼하이머는 전 캔자스시티 치프스 감독이었고 그의 삼촌인 커트  }'  "truncate": false렀고 콘퍼런스 USA 서부 지구에서 경쟁했습
[{"index":5,"score":0.95565146},{"index":8,"score":0.0001313518},{"index":3,"score":0.00011147904},{"index":9,"score":0.00005967352},{"index":6,"score":0.00003705297},{"index":7,"score":0.0000059551394},{"index":4,"score":0.000005338156},{"index":0,"score":0.000005255396},{"index":2,"score":0.00000283508},{"index":1,"score":0.0000013081766}]

fix: wrong pooling method for CUDA

79d8da9

sigridjineth changed the title ~~Qwen3 Reranker CUDA Fix~~ Qwen3 Reranker CUDA Fix in FlashQwen3.rs Aug 16, 2025

sigridjineth mentioned this pull request Aug 16, 2025

[Feature] Add support for Qwen3 Reranker with Sequence Classifier head huggingface/text-embeddings-inference#698

Open
