DeepSeek R1 LLM
A DeepSeek R1 LLM is a DeepSeek reasoning LLM (designed to perform automated-reasoning tasks through a mixture-of-experts (MoE) architecture).
- Context:
- It can (typically) perform mathematical reasoning through step-by-step solution processing.
- It can (typically) handle coding challenges through algorithmic problem solving.
- It can (typically) process long documents through an extended context window.
- It can (typically) optimize computational resources through sparse activation.
- It can (typically) maintain model efficiency through parameter sharing.
- ...
- It can (often) support multilingual processing through language understanding.
- It can (often) enable local deployment through distilled versions.
- It can (often) provide cost advantages through low token pricing.
- It can (often) allow commercial use through its MIT license.
- It can (often) handle heavy workloads, with some performance limitations at scale.
- ...
- It can range from being a Base R1 Model to being a Distilled R1 Model, depending on its parameter count.
- It can range from being a Small Scale R1 to being a Full Scale R1, depending on its deployment requirements.
- ...
- It can achieve benchmark performance with:
- an AIME 2024 score of 79.8% accuracy
- a MATH benchmark score of 91.6% accuracy
- a Codeforces ranking in the 96.3rd percentile
- ...
- It can be accessed through the DeepSeek API.
- ...
- Example(s):
- DeepSeek-R1-671B requiring multi-GPU systems.
- R1-Distill-Qwen-32B for high-end GPUs.
- R1-Distill-Qwen-14B for mid-range GPUs.
- R1-Distill-Qwen-7B for consumer GPUs.
- ...
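The deployment tiers above can be sanity-checked with simple arithmetic: FP16/BF16 weights take about 2 bytes per parameter. The helper below is an illustrative sketch (not part of any DeepSeek tooling) and ignores activation and KV-cache overhead, so treat its results as lower bounds.

```python
# Rough GPU-memory estimate for running R1 variants, assuming
# 2 bytes per parameter (FP16/BF16 weights only).

def weight_memory_gb(num_params_billion: float, bytes_per_param: int = 2) -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return num_params_billion * 1e9 * bytes_per_param / 1e9

variants = {
    "R1-Distill-Qwen-7B": 7,
    "R1-Distill-Qwen-14B": 14,
    "R1-Distill-Qwen-32B": 32,
    "DeepSeek-R1 (671B total)": 671,
}

for name, size_b in variants.items():
    print(f"{name}: ~{weight_memory_gb(size_b):.0f} GB for FP16 weights")
```

The ~14 GB figure for the 7B distill explains why it fits on a single consumer GPU only with quantization, while the full 671B model (~1.3 TB of weights) requires a multi-GPU system.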
- Counter-Example(s):
- OpenAI o1 LLM, which uses a different architecture and is proprietary.
- DeepSeek Coder, which focuses solely on code generation.
- DeepSeek MoE, which lacks specialized reasoning capability.
- See: DeepSeek Model Family, Reasoning LLM, MoE Architecture, Open Source LLM.
References
2025-01
- https://github.com/deepseek-ai/DeepSeek-R1
- NOTES:
- Training and Architecture:
- It can apply Large Scale Reinforcement Learning to a model with 671B total parameters and 37B activated parameters
- It can utilize a MoE Architecture with a 128K context length
- It can demonstrate RL-only training (without supervised fine-tuning) through the DeepSeek-R1-Zero variant
- It can integrate Cold Start Data for improved performance in DeepSeek-R1
- Performance Benchmarks:
- It can achieve 90.8% accuracy on MMLU Testing
- It can score 97.3% pass rate on MATH-500 Assessment
- It can obtain 2029 rating on Codeforces Challenges
- It can demonstrate 65.9% pass rate on LiveCodeBench Evaluation
- It can perform effectively on Chinese Language Tasks with 91.8% C-Eval score
- Model Variants:
- It can provide Base Models including DeepSeek-R1-Zero and DeepSeek-R1
- It can support Distilled Models ranging from 1.5B to 70B parameters
- It can maintain performance in smaller architectures through knowledge distillation
- It can achieve State Of The Art results with DeepSeek-R1-Distill-Qwen-32B
- Implementation Features:
- It can operate with Temperature Settings between 0.5 and 0.7
- It can perform optimally using only User Prompts, with no system prompt
- It can support Commercial Use under MIT License
- It can provide access through chat.deepseek.com interface
- It can offer OpenAI Compatible API integration
- It can demonstrate Self Verification and reflection capabilities
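The implementation features above can be combined into a single request. The endpoint URL and model name below follow DeepSeek's published OpenAI-compatible API, but the payload-building helper itself is an illustrative sketch, not an official client.

```python
# Sketch of a request to DeepSeek's OpenAI-compatible chat endpoint.
# Per the usage notes above: the whole task goes in one user message
# (no system prompt), and temperature stays in the 0.5-0.7 range.

API_URL = "https://api.deepseek.com/chat/completions"

def build_r1_request(task: str, temperature: float = 0.6) -> dict:
    """Build a chat-completion payload for DeepSeek-R1."""
    if not 0.5 <= temperature <= 0.7:
        raise ValueError("recommended temperature range is 0.5-0.7")
    return {
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": task}],
        "temperature": temperature,
    }

payload = build_r1_request("Prove that sqrt(2) is irrational.")
# Send with any HTTP client, e.g.:
#   requests.post(API_URL, json=payload,
#                 headers={"Authorization": "Bearer <API_KEY>"})
```

Because the API is OpenAI-compatible, the same payload also works through the standard OpenAI SDK by pointing its `base_url` at `https://api.deepseek.com`.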
2025-01-24
- Perplexity.ai
- Question: What are the key differences between DeepSeek-R1-Zero and DeepSeek-R1 in terms of their training approach, performance, and limitations?
- Answer: The DeepSeek-R1-Zero employs pure reinforcement learning without supervised fine-tuning, while the DeepSeek-R1 uses a hybrid approach combining both SFT and RL with cold-start data.
- Training Approaches:
- DeepSeek-R1-Zero Training:
- Uses pure reinforcement learning approach.
- Excludes supervised fine-tuning phase.
- DeepSeek-R1 Training:
- Implements hybrid training methodology.
- Incorporates cold-start data before RL phase.
- Performance Characteristics:
- DeepSeek-R1-Zero Performance:
- Achieves 71.0% Pass@1 accuracy on AIME 2024 Test.
- Reaches 86.7% accuracy with majority voting.
- Attains 95.9% accuracy on MATH 500 Test.
- Scores 73.3% on GPQA Diamond Test.
- DeepSeek-R1 Performance:
- Matches OpenAI-o1 across math, code, and reasoning tasks.
- Demonstrates superior readability and language consistency.
- Shows enhanced natural language coherence.
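The jump from 71.0% (Pass@1) to 86.7% with majority voting reflects a standard consensus evaluation: sample several answers per problem and keep the most common final answer. The sketch below illustrates the mechanism; it is not DeepSeek's evaluation code.

```python
# Majority voting (consensus) over sampled answers, as used in the
# AIME figures above: generate N candidates, keep the most frequent.
from collections import Counter

def majority_vote(candidate_answers: list[str]) -> str:
    """Return the most frequent final answer among sampled candidates."""
    counts = Counter(candidate_answers)
    return counts.most_common(1)[0][0]

# Five sampled chains of thought, three agreeing on "42":
samples = ["42", "41", "42", "40", "42"]
print(majority_vote(samples))  # -> 42
```

Voting helps because independent samples rarely agree on the same wrong answer, so consensus filters out sporadic reasoning errors.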
- Technical Specifications:
- Common Characteristics:
- Total Parameters: 671B.
- Activated Parameters: 37B.
- Context Length: 128K.
- Model Limitations:
- DeepSeek-R1-Zero Limitations:
- Exhibits endless repetition issues.
- Shows poor readability.
- Experiences language mixing problems.
- DeepSeek-R1 Limitations:
- Shows function calling constraints.
- Faces multi-turn conversation challenges.
- Struggles with complex role-playing.
- Demonstrates prompt sensitivity.
- Exhibits suboptimal software engineering performance.
- Model Architecture:
- Both models utilize Mixture of Experts with selective parameter activation guided by sophisticated routing.
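The selective activation described above means only a small fraction of the 671B parameters runs per token (37B, about 5.5%). The toy sketch below shows the core routing idea, top-k expert selection per token; expert counts and scores here are made up for readability and do not reflect DeepSeek's actual router.

```python
# Toy illustration of sparse MoE routing: a router scores every expert
# for a token and only the top-k experts are activated.
import random

def top_k_experts(router_logits: list[float], k: int) -> list[int]:
    """Indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    return sorted(ranked[:k])

random.seed(0)
num_experts, k = 16, 2
logits = [random.gauss(0.0, 1.0) for _ in range(num_experts)]
active = top_k_experts(logits, k)
print(f"active experts: {active} ({k}/{num_experts} per token)")
print(f"R1 activated fraction: {37 / 671:.1%}")
```
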
- Citations:
[1] https://deepinfra.com/deepseek-ai/DeepSeek-R1
[2] https://www.prompthub.us/blog/deepseek-r-1-model-overview-and-how-it-ranks-against-openais-o1
[3] https://www.gocodeo.com/post/deepseek-r1-and-deepseek-r1-zero
[4] https://arxiv.org/html/2501.12948v1
[5] https://workos.com/blog/deepseek-r1-pushes-local-and-open-ai-forward
[6] https://www.reddit.com/r/LocalLLaMA/comments/1i5jh1u/deepseek_r1_r1_zero/
[7] https://www.datacamp.com/blog/deepseek-r1
[8] https://www.reddit.com/r/LocalLLaMA/comments/1i8rujw/notes_on_deepseek_r1_just_how_good_it_is_compared/
2025-01-24
- Perplexity.ai
- Question: How does the DeepSeek R1 represent its thought process? Please explain its tag structure, reasoning characteristics, and format sensitivity.
- Answer: The DeepSeek R1 represents its thought process using a structured XML-like format with specific tags, encapsulating its reasoning within `<think>` and `</think>` tags, followed by responses in `<answer>` tags.
- Tag Structure:
- Reasoning Format:
- Think tags (`<think>...</think>`) encapsulate the reasoning process.
- Answer tags (`<answer>...</answer>`) contain the final response.
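Because the reasoning trace is delimited by explicit tags, it can be separated from the final response with a simple regular expression. The sketch below is illustrative (not official tooling) and assumes a single well-formed `<think>...</think>` block, as the model normally emits.

```python
# Split an R1-style completion into its reasoning trace and final answer.
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from an R1 completion."""
    match = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    if match is None:
        return "", completion.strip()          # no reasoning block found
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()  # text after the think block
    return reasoning, answer

text = "<think>2 + 2: add the units digits.</think>The answer is 4."
thought, answer = split_reasoning(text)
print(answer)  # -> The answer is 4.
```

Extracting the trace this way is useful for logging, for hiding the chain of thought from end users, or for passing reasoning between models.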
- Reasoning Characteristics:
- Chain-of-Thought Process:
- Self-verification of answers.
- Ability to reevaluate flawed logic.
- Solution validation within chain of thought.
- Emergent Behaviors:
- Exhibits "Aha moment" phenomenon for reasoning correction.
- Manifests through phrases like "Wait a minute" or "Wait, but...".
- Format Sensitivity:
- Zero-Shot Performance:
- Optimal with direct problem description.
- Requires specified output format.
- Few-Shot Limitations:
- Few-shot prompting can degrade performance; zero-shot prompting is recommended.
- Language Considerations:
- Language Processing:
- Excels at English and Chinese reasoning.
- May encounter language mixing issues.
- Defaults to English for reasoning process.
- Citations:
[1] https://patmcguinness.substack.com/p/deepseek-releases-r1-and-opens-up
[2] https://www.prompthub.us/blog/deepseek-r-1-model-overview-and-how-it-ranks-against-openais-o1
[3] https://adasci.org/mastering-llms-reasoning-capability-with-deepseek-r1/
[4] https://c3.unu.edu/blog/deepseek-r1-pioneering-open-source-thinking-model-and-its-impact-on-the-llm-landscape
[5] https://www.ignorance.ai/p/r1-is-reasoning-for-the-masses
[6] https://arxiv.org/html/2501.12948v1
[7] https://www.reddit.com/r/LocalLLaMA/comments/1i73x81/you_can_extract_reasoning_from_r1_and_pass_it/
[8] https://www.datacamp.com/blog/deepseek-r1