@XuehaoSun
Contributor

Type of Change

Fix the Azure UT-Basic artifact error

Description

Fix the Azure UT-Basic artifact error:
set the condition to succeededOrFailed()
another fix

Expected Behavior & Potential Risk

The report is generated even when coverage fails.

How has this PR been tested?

UT-Basic succeeds,
and the report is generated when coverage fails.

Dependency Change?

no

Signed-off-by: Sun, Xuehao <[email protected]>
Signed-off-by: Sun, Xuehao <[email protected]>
Signed-off-by: Sun, Xuehao <[email protected]>
@XuehaoSun
Contributor Author

XuehaoSun commented Dec 9, 2022

@ftian1
Contributor

ftian1 commented Dec 9, 2022

don't include your name in PR title

@XuehaoSun XuehaoSun changed the title from "Xuehao/azure ut basic fix" to "azure ut basic fix" Dec 9, 2022
Signed-off-by: Sun, Xuehao <[email protected]>
Signed-off-by: Sun, Xuehao <[email protected]>
@chensuyue chensuyue merged commit c61be34 into master Dec 9, 2022
@chensuyue chensuyue deleted the xuehao/azure-UT-Basic-fix branch December 9, 2022 14:26
zehao-intel pushed a commit that referenced this pull request Dec 20, 2022
Signed-off-by: Sun, Xuehao <[email protected]>
Signed-off-by: zehao-intel <[email protected]>
yiliu30 pushed a commit that referenced this pull request Oct 27, 2025
In the input, we use zero tokens for padding. After the linear layer, we set the corresponding positions (from the padding) to -inf, so that the softmax outputs values close to epsilon.

When using the FSDPA optimization, to improve performance, we avoid copying the -inf values to the softmax and instead set those positions directly to zero. As a result, the softmax output becomes exactly zero (as opposed to a small epsilon value without the FSDPA optimization).

When computing the dynamic scale for the out_proj, this leads to a division-by-zero issue.

The fix we're implementing is to use max(epsilon, scale) during the scale calculation.

This fix aligns the non-CGUID code so that it behaves the same as the CGUID flow.
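
As a rough illustration of the max(epsilon, scale) clamp described above, here is a minimal pure-Python sketch. It is not the code from the commit: the names `dynamic_scale`, `quantize`, `QMAX`, and `EPS`, and the values chosen for the two constants, are assumptions made up for this example.

```python
# Hypothetical constants for illustration only; neither value comes from the commit.
QMAX = 240.0   # assumed largest magnitude representable by the quantized type
EPS = 1e-12    # assumed lower bound for the scale


def dynamic_scale(values, qmax=QMAX, eps=EPS):
    """Compute a per-tensor dynamic scale, clamped so it is never zero."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax
    # The clamp: an all-zero input would otherwise yield scale == 0.0.
    return max(eps, scale)


def quantize(values, scale):
    """Divide by the scale; this would raise ZeroDivisionError if scale were 0.0."""
    return [v / scale for v in values]


# An all-zero input, such as a softmax output that is exactly zero at padded positions.
padded = [0.0, 0.0, 0.0]
scale = dynamic_scale(padded)
quantized = quantize(padded, scale)
```

Without the clamp, `dynamic_scale(padded)` would return 0.0 and the division in `quantize` would fail. With it, the scale falls back to `EPS` and the zero inputs simply stay zero.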
4 participants