The DeepSeek-R1 paper explores training a reasoning model with reinforcement learning (RL), without supervised fine-tuning (SFT) as a preliminary step.
DeepSeek-R1-Zero, the initial model trained without SFT, has some limitations, such as poor readability and language mixing. To address these issues, the authors introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL.
DeepSeek-R1 achieves performance comparable to OpenAI o1 on reasoning tasks.
https://arxiv.org/pdf/2501.12948
#deepseek #MachineLearning
Heretical_i •
"Donald Trump’s AI tsar has claimed there’s ‘substantial evidence’ that DeepSeek leaned on OpenAI’s models to develop its own technology." https://www.scmp.com/news/world/united-states-canada/article/3296667/microsoft-openai-investigate-chinas-deepseek-over-data-breach
At first I thought it said, perhaps correctly, 'learned on'😎
Re #TechFascists... a logical political progression from US #Libertarians, which describes most techies for the last couple of decades, or more. https://kafeneio.social/@heretical_i/113897993841616105
Microsoft, OpenAI investigate China’s DeepSeek over data breach
Bloomberg (South China Morning Post)