Thread: The A.I. Thread
Old 01-28-2025, 01:08 PM   #543
Fuzz
Franchise Player
 
 
Join Date: Mar 2015
Location: Pickle Jar Lake

That would obviously be bad, but it sounds like it's all handled with code.




Quote:
  1. Start with a smart normal model, like DeepSeek-V3, and perform the following reinforcement-learning loop
  2. Ask that model to solve a mathematical problem, with a prompt that pushes it to think step-by-step
  3. Verify the answer in code (i.e. not with a model, but by directly parsing the answer and checking it)
  4. If correct, reward the model; if wrong, punish the model
  5. Repeat for a long time
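
For what it's worth, here is a rough toy sketch of what that loop could look like in Python. This is not DeepSeek's actual code: `ToyModel`, its two "strategies", and the multiply-the-weight update are all made up for illustration, and a real setup would update a neural network with a proper RL algorithm. The only part meant to mirror the quote closely is step 3, where the answer is parsed and checked in plain code with no model involved.

```python
import random
import re


def parse_answer(text):
    """Pull the integer after 'Answer:' out of the model's output, or None."""
    match = re.search(r"Answer:\s*(-?\d+)", text)
    return int(match.group(1)) if match else None


def verify(output, expected):
    """Step 3: check the answer in code by parsing it, not by asking a model."""
    return parse_answer(output) == expected


class ToyModel:
    """Hypothetical stand-in for an LLM: samples one of two fixed strategies."""

    def __init__(self):
        # Both strategies start out equally likely.
        self.weights = {"add": 1.0, "subtract": 1.0}

    def solve(self, a, b, rng):
        """Step 2: 'think step by step' and emit a final answer."""
        names = list(self.weights)
        strategy = rng.choices(names, weights=[self.weights[n] for n in names])[0]
        result = a + b if strategy == "add" else a - b
        output = f"Thinking step by step about {a} and {b}. Answer: {result}"
        return strategy, output

    def update(self, strategy, reward):
        """Step 4: nudge the sampled strategy's weight up (reward) or down (punish)."""
        factor = 1.5 if reward > 0 else 0.5
        self.weights[strategy] = min(100.0, max(0.01, self.weights[strategy] * factor))


def train(steps=200, seed=0):
    """Steps 1-5: repeat ask -> verify -> reward/punish for a fixed number of steps."""
    rng = random.Random(seed)
    model = ToyModel()
    for _ in range(steps):
        # Operands are at least 1, so adding and subtracting never agree.
        a, b = rng.randint(1, 9), rng.randint(1, 9)
        strategy, output = model.solve(a, b, rng)
        reward = 1 if verify(output, a + b) else -1
        model.update(strategy, reward)
    return model
```

Calling `train()` and looking at `model.weights` should show the "add" strategy pulling ahead of "subtract", since only correct answers ever get rewarded. The point is just that the reward signal comes from a dumb, deterministic checker, so nobody has to grade the outputs by hand.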
The part where they ask the model is probably more manual, since they'd need to put together a list of problems, though I suspect a lot of those already exist and are just grabbed from elsewhere.