
Commit 8dbc2cc

remove svg
1 parent 67ec7c3 commit 8dbc2cc

File tree

2 files changed: +0 additions, -2 deletions


optillm-sequence-diagram.svg

Lines changed: 0 additions & 1 deletion
This file was deleted.

test_results.json

Lines changed: 0 additions & 1 deletion
@@ -8,7 +8,6 @@
   "results": [
     {
       "approach": "bon",
-      "result": "Building a reinforcement learning (RL) model from scratch is quite complex, especially when using only NumPy. The following example illustrates a simple RL setup where we define a Q-learning algorithm to perform actions related to reciting text from a given position. This is a basic demonstration and may not be practically suited for intricate needs.\n\nWe'll simulate an environment where the agent needs to learn to recite a text based on the index provided by the user. The goal is to implement a simple state-action model without relying on external libraries.\n\nHere's an example of how you could structure this:\n\n```python\nimport numpy as np\n\nclass TextReciterEnv:\n    def __init__(self, text):\n        self.text = text\n        self.state = 0 # The starting position in the text\n        self.done = False\n\n    def reset(self):\n        self.state = 0\n        self.done = False\n        return self.state\n\n    def step(self, action):\n        if action < 0 or action >= len(self.text):\n            raise ValueError(\"Action must be a valid index within the text.\")\n\n        self.state = action\n        reward = 1 if self.state == len(self.text) - 1 else 0 # Give reward for reaching the end\n        self.done = (self.state == len(self.text) - 1) # End of episode when we reach the last character\n\n        return self.state, reward, self.done\n\n    def get_text(self):\n        return self.text[self.state]\n\nclass QLearningAgent:\n    def __init__(self, action_size, learning_rate=0.1, discount_factor=0.95, exploration_rate=1.0):\n        self.q_table = np.zeros((len(text), action_size))\n        self.learning_rate = learning_rate\n        self.discount_factor = discount_factor\n        self.exploration_rate = exploration_rate\n        self.action_size = action_size\n\n    def choose_action(self, state):\n        if np.random.rand() < self.exploration_rate:\n            return np.random.randint(0, self.action_size) # Explore\n        else:\n            return np.argmax(self.q_table[state]) # Exploit\n\n    def learn(self, state, action, reward, next_state):\n        best_next_action = np.argmax(self.q_table[next_state])\n        td_target = reward + self.discount_factor * self.q_table[next_state][best_next_action]\n        td_delta = td_target - self.q_table[state][action]\n        self.q_table[state][action] += self.learning_rate * td_delta\n\n# Define the text and environment\ntext = \"Learning Reinforcement Learning is fun!\"\nenv = TextReciterEnv(text)\nagent = QLearningAgent(action_size=len(text))\n\n# Hyperparameters\nnum_episodes = 1000\nmax_steps = len(text) - 1\n\n# Training process\nfor episode in range(num_episodes):\n    state = env.reset()\n\n    for step in range(max_steps):\n        action = agent.choose_action(state)\n        next_state, reward, done = env.step(action)\n\n        agent.learn(state, action, reward, next_state)\n\n        if done:\n            break\n\n        state = next_state\n\n    # Reduce exploration over time\n    if agent.exploration_rate > 0.01:\n        agent.exploration_rate *= 0.99\n\n# User interaction to recite the text\nwhile True:\n    user_input = input(\"Enter a position to recite text (0 to {} or 'exit' to quit): \".format(len(text) - 1))\n\n    if user_input.lower() == 'exit':\n        break\n\n    try:\n        position = int(user_input)\n        if 0 <= position < len(text):\n            action = position # The user would like to recite from this position\n            env.state = action # Set environment state directly for demonstration\n            print(\"Reciting:\", env.get_text())\n        else:\n            print(\"Invalid position! Please enter a number between 0 and {}.\".format(len(text) - 1))\n    except ValueError:\n        print(\"Invalid input! Please enter a number or 'exit'.\")\n\n```\n\n### Explanation:\n1. **TextReciterEnv Class**: This class is responsible for simulating the environment in which the agent operates. It has methods to reset the environment and step through the states.\n2. **QLearningAgent Class**: This class implements a simple Q-learning agent that learns optimal actions based on the rewards received for taking actions from given states.\n3. **Training Loop**: The loop manages training where the agent explores the environment and updates its Q-values based on the actions taken.\n4. **User Interaction**: After training, users can input a text position, and the program will display the character at that position.\n\n### Note:\nThis implementation is quite simplistic and serves as an educational demonstration only. For actual reinforcement learning applications, consider using frameworks like TensorFlow or PyTorch, which provide a more comprehensive set of tools for building and training RL models.",
       "time": 14.10278606414795,
       "status": "success"
     },
