.. currentmodule:: torchrl

.. _ref_llms:

LLM interface
=============

TorchRL offers a set of tools for LLM post-training, as well as some examples for training and setup.

Collectors
----------

TorchRL offers a specialized collector class (:class:`~torchrl.collectors.llm.LLMCollector`) that is tailored for LLM
use cases. We also provide dedicated weight updaters for some inference engines.

.. currentmodule:: torchrl.collectors.llm

.. autosummary::
    :toctree: generated/
    :template: rl_template.rst

    vLLMUpdater
    LLMCollector

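As a sketch of how these pieces fit together, a collector pairs a policy with an environment and yields batches of
dialog turns. The keyword arguments below are illustrative assumptions, not the definitive API; check the class
signature before use.

.. code-block:: python

    # Hypothetical sketch: argument names are assumptions.
    from torchrl.collectors.llm import LLMCollector

    collector = LLMCollector(
        env,                        # an LLM environment, e.g. a chat env
        policy=policy,              # a wrapped LLM (see the Modules section)
        dialog_turns_per_batch=16,  # dialog turns to collect per batch
        total_dialog_turns=1_000,   # stop after this many turns overall
    )
    for batch in collector:
        ...  # feed the batch to a replay buffer or a loss module
    collector.shutdown()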

Data structures
---------------

To handle text-based data such as conversations, we offer a few data structures dedicated to carrying data for LLM
post-training.

.. currentmodule:: torchrl.data.llm

.. autosummary::
    :toctree: generated/
    :template: rl_template.rst

    History
    LLMData

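As a rough sketch of how a conversation might be represented with :class:`~torchrl.data.llm.History` (the field names
below mirror common chat formats and are assumptions, not the definitive API):

.. code-block:: python

    # Hypothetical sketch: field names are assumptions.
    from torchrl.data.llm import History

    history = History(
        role=["system", "user"],
        content=["You are a helpful assistant.", "What is 2 + 2?"],
        batch_size=(2,),
    )
    # A History behaves like a tensorclass: it can be indexed, stacked
    # and stored in replay buffers alongside tensors.
    last_turn = history[-1]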
Environments
------------

When fine-tuning an LLM using TorchRL, the environment is a crucial component of the inference pipeline, alongside the
policy and collector. Environments manage operations that are not handled by the LLM itself, such as interacting with
tools, loading prompts from datasets, computing rewards (when necessary), and formatting data.

The design of environments in TorchRL allows for flexibility and modularity. By framing tasks as environments, users can
easily extend or modify existing environments using transforms. This approach enables the isolation of individual
components within specific :class:`~torchrl.envs.EnvBase` or :class:`~torchrl.envs.Transform` subclasses, making it
simpler to augment or alter the environment logic.

Available Environment Classes and Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TorchRL provides various environment classes and utilities for working with LLMs, including:

- Environment classes (:class:`~torchrl.envs.llm.ChatEnv`, :class:`~torchrl.envs.llm.DatasetChatEnv`,
  :class:`~torchrl.envs.llm.GSM8KEnv`, etc.)
- Utility functions (:func:`~torchrl.envs.make_gsm8k_env`, :func:`~torchrl.envs.make_mlgym`, etc.)
- Transforms and other supporting classes (:class:`~torchrl.envs.KLRewardTransform`,
  :class:`~torchrl.envs.TemplateTransform`, :class:`~torchrl.envs.Tokenizer`, etc.)

These components can be used to create customized environments tailored to specific use cases and requirements.
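For illustration, a minimal sketch of composing an environment with a transform, keeping the reward logic isolated from
the chat logic itself. The keyword arguments are assumptions, and ``tokenizer`` and ``ref_model`` stand in for objects
built elsewhere; consult each class for its actual signature.

.. code-block:: python

    # Hypothetical sketch: keyword arguments are assumptions.
    from torchrl.envs import TransformedEnv
    from torchrl.envs.llm import ChatEnv, KLRewardTransform

    env = ChatEnv(tokenizer=tokenizer)
    # Wrapping the env in a transform augments its logic without
    # modifying the base environment class.
    env = TransformedEnv(env, KLRewardTransform(ref_model, coef=0.1))
    reset_data = env.reset()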

.. currentmodule:: torchrl.envs.llm

.. autosummary::
    :toctree: generated/
    :template: rl_template.rst

    ChatEnv
    DatasetChatEnv
    GSM8KEnv
    make_gsm8k_env
    GSM8KPrepareQuestion
    IFEvalEnv
    IfEvalScorer
    IFEvalScoreData
    LLMEnv
    LLMHashingEnv
    make_mlgym
    MLGymWrapper
    GSM8KRewardParser
    as_nested_tensor
    as_padded_tensor
    DataLoadingPrimer
    KLRewardTransform
    TemplateTransform
    Tokenizer

Modules
-------

The :mod:`~torchrl.modules.llm` module provides a set of wrappers and utility functions for popular training and
inference backends. The main goals of these primitives are to:

- Unify the input / output data format across training and inference pipelines;
- Unify the input / output data format across backends (to be able to use different backends across losses and
  collectors, for instance);
- Provide appropriate tooling to construct these objects in typical RL settings (resource allocation, async execution,
  weight updates, etc.).

Wrappers
~~~~~~~~

.. currentmodule:: torchrl.modules.llm

.. autosummary::
    :toctree: generated/
    :template: rl_template.rst

    TransformersWrapper
    vLLMWrapper

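The wrappers expose different backends through a common TensorDict-based interface, so the same policy object can be
handed to collectors and loss modules alike. A sketch for the Hugging Face backend follows; the model name and keyword
arguments are illustrative assumptions, not the definitive API.

.. code-block:: python

    # Hypothetical sketch: keyword arguments are assumptions.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from torchrl.modules.llm import TransformersWrapper

    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")

    # The wrapper turns the model into a TensorDict module usable as a
    # policy by LLMCollector and by the LLM losses.
    policy = TransformersWrapper(model, tokenizer=tokenizer, generate=True)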
Utils
~~~~~

.. currentmodule:: torchrl.modules.llm

.. autosummary::
    :toctree: generated/
    :template: rl_template.rst

    CategoricalSequential
    LLMOnDevice
    make_vllm_worker
    stateless_init_process_group
    vLLMWorker

Objectives
----------

LLM post-training requires appropriate versions of the losses implemented in TorchRL.

GRPO
~~~~

.. currentmodule:: torchrl.objectives.llm

.. autosummary::
    :toctree: generated/
    :template: rl_template.rst

    GRPOLoss
    GRPOLossOutput
    MCAdvantage
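
A sketch of how the loss might be used in a training step, with batches produced by a collector. The constructor
arguments are assumptions, not the definitive API; check the :class:`~torchrl.objectives.llm.GRPOLoss` signature
before use.

.. code-block:: python

    # Hypothetical sketch: constructor arguments are assumptions.
    from torchrl.objectives.llm import GRPOLoss

    loss_fn = GRPOLoss(actor_network=policy, clip_epsilon=0.2)

    loss_vals = loss_fn(batch)  # batch collected by an LLMCollector
    loss_vals["loss_objective"].backward()
    optimizer.step()
    optimizer.zero_grad()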