-
Notifications
You must be signed in to change notification settings - Fork 66
Add LanguageReward for training models to think in target language #515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
casteryh
wants to merge
40
commits into
main
Choose a base branch
from
language-reward-feature
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
40 commits
Select commit
Hold shift + click to select a range
033fa60
Add LanguageReward for training models to think in target language
casteryh b12ed15
Update system prompt to instruct model to think in Japanese
casteryh b15f171
Add fallback reward for correct language without thinking blocks
casteryh a2c0237
Add debug logging and troubleshooting guide for LanguageReward
casteryh afca75c
Add debug printing to LanguageReward and strengthen system prompt
casteryh 2625b28
Refactor to use configurable Japanese tags <思考> instead of English <t…
casteryh 1a4d5fb
Remove old debug code from main.py
casteryh 4e87a4d
Weaken system prompt to rely more on RL rewards
casteryh abb653e
Remove sandbox config and reference apps/grpo configs instead
casteryh 7b4829c
Simplify LanguageReward logic to focus on language detection only
casteryh 0ed798c
Add langid to dev dependencies for CI
casteryh 5a3193e
Remove debug script
casteryh 93a65b2
Clarify why English training won't work in TROUBLESHOOTING
casteryh f72be7f
Add unit test for ThinkingReward custom tag
casteryh 6186f9f
Bump LanguageReward match_reward to 2.0
casteryh c640d37
Set KL divergence coefficient to zero in loss function
casteryh 7fde86d
Change KL divergence coefficient to 1e-3
casteryh ffb6c43
Change KL divergence coefficient to 1e-4
casteryh 7ffa20e
Enable multi-epoch training in sandbox/grpo_language app
casteryh 1bf3cca
Fix recursive endpoint call - use while loop instead
casteryh 7758b48
Simplify multi-epoch fix - use return next() instead of while loop
casteryh f71dbb6
fix
casteryh 735af9a
change logging
casteryh ef39e46
git mv
casteryh 2e2981a
Update src/forge/data/rewards.py
casteryh 869c76d
remove README.md & TROUBLESHOOTING.md
casteryh 5f6a788
truncate at 1000 instead
casteryh 04182d4
merge into main app & fix test
casteryh dc67ea2
merge into main app
casteryh d02b866
Merge branch 'main' into language-reward-feature
casteryh 866dff7
fix endpoint
casteryh 20d8644
Merge branch 'main' into language-reward-feature
casteryh ddefe22
update config for 1.7b
casteryh 22891bf
remove unnecessary tests
casteryh bf3dbf0
simplify sampling, add todo
casteryh 4318d83
increase context len, decrease bsz for 8b (working)
casteryh 2cc6c6b
kl coeff = 1e-5
casteryh e7d1bfc
fix rewards
casteryh 5d051bd
fix debug samples
casteryh 71cf182
fix test
casteryh File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -47,6 +47,7 @@ dev = [ | |
| "anyio", | ||
| "pytest-asyncio", | ||
| "multiprocess", | ||
| "langid", | ||
| ] | ||
| docs = [ | ||
| "sphinx==7.2.6", | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you want to surface import error that late into the program? If this is not a "default" app we want everyone to install, is there better option to manage the module dependencies? I think we'll get to this point soon to support different training backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know a better place to check import error other than the initializer.
One option is to add this as an optional dependency in
pyproject.toml. But I am not sure if we should do this in this particular case.In this case, I think optional dependency is the way to go. So the use can run
pip install forge[huggingface]to enable the hugging face trainer for example.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @joecummings what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the rationale -- is it possible to do something like
pip install forge[grpo]for this dependency? We'll need to turn the app into integration tests that run on CI and we need a way to specify the dependency statically.