
Conversation


@CYHSM (Contributor) commented Aug 18, 2025

What does this PR do?

This PR makes the inference output look a bit nicer and also adds:

  • A loop so the user can test more prompts after running the first one
  • An option to specify one or several temperatures (comma-separated, e.g. 0, 0.4, 0.8), which then evaluates the prompt for each temperature
  • An optional system prompt, which can be read from a .txt file whose path is specified in the YAML config (see the composition sketch after the config below):
text_inference_component:
  component_key: inference_component
  variant_key: text
  config:
    device: ${settings.device}
    model:
      instance_key: checkpointed_model
      pass_type: BY_REFERENCE
    tokenizer:
      component_key: tokenizer
      variant_key: pretrained_sp_tokenizer
      config:
        tokenizer_model_file: /raid/s3/opengptx/mfrey/3.73T-Tokens/tokenizer/eurolingua_tokenizer.model
    sequence_length: ${settings.sequence_length}
    eod_token: <|endoftext|>
    prompt_template: "{prompt_input}" # "<instruction> Du bist Moody, ein LLM welches Menschen helfen soll. user: {prompt_input}"
    system_prompt_path: "/home/markus_frey/Github/modalities/tutorials/instruct_teuken/configs/system_prompt.txt"
    chat_template: "System:\n{system_prompt}\nUser:{user_prompt}\nAssistant:\n"
    temperature: 1
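
For illustration, here is a minimal sketch of how these pieces compose into the final prompt at inference time. The system_prompt.txt path and contents are hypothetical; prompt_template and chat_template are the values from the config above:

from pathlib import Path

# Read the optional system prompt from the file named by system_prompt_path
# (hypothetical path and contents, e.g. "You are a helpful assistant.").
system_prompt = Path("configs/system_prompt.txt").read_text().strip()

prompt_template = "{prompt_input}"
chat_template = "System:\n{system_prompt}\nUser:{user_prompt}\nAssistant:\n"

# The user's raw input is first wrapped by prompt_template, then the chat
# template stitches the system prompt and user prompt together.
user_prompt = prompt_template.format(prompt_input="What is the capital of France?")
full_prompt = chat_template.format(system_prompt=system_prompt, user_prompt=user_prompt)
print(full_prompt)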

I have found no existing inference tests, so I am not sure whether there are none or I am simply not seeing them. If someone points me towards an existing test, I can extend it for this PR or create a full inference test.

Breaking Changes

  • The system prompt is optional, so there should be no breaking changes.

Checklist before submitting final PR

  • My PR is minimal and addresses one issue in isolation
  • I have merged the latest version of the target branch into this feature branch
  • I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
  • I have run a sample config for model training
  • I have checked that all tests run through (python tests/tests.py)
    Not all tests run through, but the failures appear to be unrelated to this change (sh scripts/run_checkpoint_conversion.sh)
  • I have updated the internal changelog (CHANGELOG_DEV.md)

@behzadshomali

Thanks for your commit, it was really helpful. As a quick-to-implement suggestion: in the run() function, you can add another try/except statement that lets the user interrupt the model's generation while the loop keeps running. This comes in handy when sequence_length has been set to a high value and the model goes off track, so you don't have to wait for it to finish generating nonsense text:

def run(self):
    print("\n" + "🚀 Modalities Chat Interface ".center(60, "="))
    print("=" * 60)

    while True:
        try:
            user_prompt = self._get_prompt(self.prompt_template)
            full_prompt = self.chat_template.format(system_prompt=self.system_prompt, user_prompt=user_prompt)

            temp_input = input("\n🌡️  Enter temperatures (comma-separated) or press Enter for default [0.8]: ")

            if not temp_input.strip():
                temperatures = [0.8]
                print("Using default temperature: 0.8")
            else:
                try:
                    temperatures = [float(t.strip()) for t in temp_input.split(",")]
                    if not temperatures:
                        raise ValueError("No temperatures provided.")
                except ValueError:
                    print("\n❌ Invalid input. Please enter comma-separated numbers or press Enter for default.\n")
                    continue

            for i, temp in enumerate(temperatures):
                # Build the banner first so that .center() applies to the whole string.
                if len(temperatures) > 1:
                    header = f"🎯 GENERATION {i + 1} (Temperature: {temp})"
                else:
                    header = f"🎯 GENERATING (Temperature: {temp})"
                print(f"\n\n{header.center(60, '=')}")
                try:
                    self.temperature = temp
                    self.generate_tokens(context=full_prompt)
                except KeyboardInterrupt:
                    # Ctrl+C here aborts only the current generation and moves on.
                    print("\n⚠️  Generation interrupted.")
                    continue

            print("\n\n" + "🏁 ALL GENERATIONS COMPLETE".center(60, "="))
            print("=" * 60)
        except KeyboardInterrupt:
            # Ctrl+C at an input prompt exits the app.
            print("\n\n👋 Closing app... Goodbye!")
            break
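
With this structure the interrupt is scoped: Ctrl+C while generate_tokens is running aborts only the current generation and moves on to the next temperature, while Ctrl+C at one of the input prompts reaches the outer handler and closes the app.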

@CYHSM (Contributor, Author) commented Sep 1, 2025

Thanks @behzadshomali, I integrated your suggestion and also added tests for general inference (an illustrative sketch of the first test follows the list):

  • Checks that greedy decoding produces the same output across runs
  • Checks that different temperatures produce different outputs
  • Checks the new run method with and without a system prompt
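
For orientation, a minimal sketch of what the greedy-decoding check could look like. The fixture is hypothetical and skips by default, and the sketch assumes generate_tokens prints the generated text, as the run() loop above suggests; if it returns text instead, compare return values directly:

import pytest

@pytest.fixture
def inference_component():
    # Hypothetical fixture: construct the text inference component from a
    # test config here. Skipped because the wiring is repo-specific.
    pytest.skip("construct the inference component from a test config")

def test_greedy_decoding_is_deterministic(inference_component, capsys):
    # With temperature 0 (greedy decoding), two runs on the same prompt
    # should print identical output.
    inference_component.temperature = 0
    inference_component.generate_tokens(context="Hello")
    first = capsys.readouterr().out
    inference_component.generate_tokens(context="Hello")
    second = capsys.readouterr().out
    assert first == second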

@rrutmann this is ready from my side

@rrutmann self-requested a review September 4, 2025 12:25

@rrutmann (Collaborator) left a comment


At first glance the code looks good. I want to run it locally to check that everything works. Could you please add an inference config that uses the newly added system_prompt_path variable, as well as an example file for such a system prompt, to the repo? I would suggest putting them into config_files/text_inference.
