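RLHF: From the Basics to an Exploration of Its Application in Claude
Reinforcement learning fundamentals: Markov Decision Processes (MDPs). A routine review, but not the core content. Put simply, an MDP is a loop in which an agent takes actions that change its state, receives rewards, and interacts with the environment. The policy of an MDP depends entirely on the current state (Only pre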
https://www.zwn2001.space/posts/Graduate-Works/RL/RLHF-%E4%BB%8E%E5%9F%BA%E7%A1%80%E5%88%B0Claude%E4%B8%AD%E7%9A%84%E5%BA%94%E7%94%A8%E6%8E%A2%E7%B4%A2/
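As a rough illustration of the agent-environment loop summarized in the excerpt, here is a minimal Python sketch. The two-state environment, the `TRANSITIONS` table, and the names `policy` and `run_episode` are all hypothetical and do not come from the linked post; the only point is that the action is chosen from the current state alone.

```python
import random

# Hypothetical two-state MDP (an assumption for illustration only):
# TRANSITIONS[state][action] -> (next_state, reward)
TRANSITIONS = {
    "start": {"stay": ("start", 0.0), "move": ("goal", 1.0)},
    "goal": {"stay": ("goal", 0.0), "move": ("start", 0.0)},
}


def policy(state, rng):
    """Pick an action using only the current state (no history is consulted)."""
    return rng.choice(sorted(TRANSITIONS[state]))


def run_episode(steps, seed=0):
    """Run the act -> transition -> reward loop for a fixed number of steps."""
    rng = random.Random(seed)
    state = "start"
    total_reward = 0.0
    trajectory = []
    for _ in range(steps):
        action = policy(state, rng)                      # agent acts
        next_state, reward = TRANSITIONS[state][action]  # environment responds
        trajectory.append((state, action, reward, next_state))
        total_reward += reward
        state = next_state                               # state is updated
    return trajectory, total_reward
```

Calling `run_episode(5)` returns the list of `(state, action, reward, next_state)` tuples together with the accumulated reward; a learned policy would replace the uniform random choice in `policy`.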