Thread: The A.I. Thread
01-27-2025, 01:33 PM   #533
Fuzz

Quote:
Originally Posted by Shazam
There is nothing novel about DS.

It is cheaper because there is no fine tuning done by humans.

https://www.seangoedecke.com/deepseek-r1
Quote:
In short, this is a reinforcement learning approach, not a fine-tuning approach. There’s no need to generate a huge body of chain-of-thought data ahead of time, and there’s no need to run an expensive answer-checking model. Instead, the model generates its own chains-of-thought as it goes. There are other points made in the DeepSeek-R1 paper, but I think this is by far the most important.


Quote:
Addendum: this is a relatively straightforward approach that others must have thought of. Why did it happen now and not a year ago? The most compelling answer is probably this: open-source base models had to get good enough at reasoning that they could be RL-ed into becoming reasoning models. It’s plausible that a year ago that wasn’t the case. A less compelling answer: the quality of reasoning-based benchmarks is much higher now than it was. For this approach to work, you need to be able to feed the model a ton of problems that require reasoning to solve (otherwise it’ll jump straight to the solution). Maybe those problems have only recently become available.

Thanks for the link. Sounds like it is somewhat novel after all: they used a different feedback mechanism (reinforcement learning instead of human fine-tuning), and it doesn't appear to have caused any major issues. That should save a lot of training time and money, and I'd expect it to get refined further.
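
For anyone wondering what "no expensive answer-checking model" could look like in practice, here's a rough Python sketch. This is my own toy illustration, not DeepSeek's code: the <think>/<answer> tag layout, the function names, and the reward values (1.0, 0.1, 0.0) are all assumptions I picked for the example. The idea is that the model writes its own chain-of-thought, and a cheap rule scores only the final answer against a known solution.

```python
import re

def extract_answer(completion: str):
    """Return the text inside the last <answer>...</answer> pair, or None."""
    # Tag names are an assumption for this sketch.
    matches = re.findall(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    return matches[-1].strip() if matches else None

def rule_based_reward(completion: str, expected: str) -> float:
    """Score a completion with a plain rule instead of a learned checker.

    Hypothetical values: 1.0 if the final answer matches, 0.1 if the
    format is right but the answer is wrong, 0.0 if there is no answer tag.
    """
    answer = extract_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer == expected.strip() else 0.1

# Three made-up completions for the problem "17 + 25", known solution "42".
samples = [
    "<think>17 + 25 = 42</think><answer>42</answer>",
    "<think>17 + 25 = 41</think><answer>41</answer>",
    "the answer is 42",
]
rewards = [rule_based_reward(s, "42") for s in samples]
print(rewards)  # [1.0, 0.1, 0.0]
```

In an actual RL setup these scores would feed a policy update that pushes the model toward the higher-reward completions. That part is left out here, since it's where the real work (and compute) goes.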