DeepSeek R1 LLM
A DeepSeek R1 LLM is a DeepSeek reasoning LLM (designed to perform automated-reasoning tasks through a mixture-of-experts (MoE) architecture).
- Context:
- It can (typically) perform mathematical reasoning through step-by-step solution processing.
- It can (typically) handle coding challenges through algorithmic problem solving.
- It can (typically) process long documents through an extended context window.
- It can (typically) optimize computational resources through sparse activation.
- It can (typically) maintain model efficiency through parameter sharing.
- ...
- It can (often) support multilingual processing through language understanding.
- It can (often) enable local deployment through distilled versions.
- It can (often) provide cost advantages through low token pricing.
- It can (often) allow commercial use through its MIT license.
- It can (often) handle heavy workloads, with some performance limitations at scale.
- ...
- It can range from being a Base R1 Model to being a Distilled R1 Model, depending on its parameter count.
- It can range from being a Small Scale R1 to being a Full Scale R1, depending on its deployment requirements.
- ...
- It can achieve benchmark performance with:
- an AIME 2024 score of 79.8% accuracy
- a MATH benchmark score of 91.6% accuracy
- a Codeforces ranking in the 96.3rd percentile
- ...
- It can be accessed through the DeepSeek API.
- ...
- Example(s):
- DeepSeek-R1-671B requiring multi-GPU systems.
- R1-Distill-Qwen-32B for high-end GPUs.
- R1-Distill-Qwen-14B for mid-range GPUs.
- R1-Distill-Qwen-7B for consumer GPUs.
- ...
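The deployment tiers above can be sanity-checked with simple arithmetic: FP16/BF16 weights take about 2 bytes per parameter. The helper below is an illustrative sketch (not part of any DeepSeek tooling) and ignores activation and KV-cache overhead, so treat its results as lower bounds.

```python
# Rough GPU-memory estimate for running R1 variants, assuming
# 2 bytes per parameter (FP16/BF16 weights only).

def weight_memory_gb(num_params_billion: float, bytes_per_param: int = 2) -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return num_params_billion * 1e9 * bytes_per_param / 1e9

variants = {
    "R1-Distill-Qwen-7B": 7,
    "R1-Distill-Qwen-14B": 14,
    "R1-Distill-Qwen-32B": 32,
    "DeepSeek-R1 (671B total)": 671,
}

for name, size_b in variants.items():
    print(f"{name}: ~{weight_memory_gb(size_b):.0f} GB for FP16 weights")
```

The ~14 GB figure for the 7B distill explains why it fits on a single consumer GPU only with quantization, while the full 671B model (~1.3 TB of weights) requires a multi-GPU system.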
- Counter-Example(s):
- OpenAI o1 LLM, which uses a different architecture and is proprietary.
- DeepSeek Coder, which focuses solely on code generation.
- DeepSeek MoE, which lacks specialized reasoning capability.
- See: DeepSeek Model Family, Reasoning LLM, MoE Architecture, Open Source LLM.
References
2025-01
- https://github.com/deepseek-ai/DeepSeek-R1
- NOTES:
- Training and Architecture:
- It can apply Large Scale Reinforcement Learning to a model with 671B total parameters and 37B activated parameters
- It can utilize a MoE Architecture with a 128K context length
- It can demonstrate RL-only training (without supervised fine-tuning) through the DeepSeek-R1-Zero variant
- It can integrate Cold Start Data for improved performance in DeepSeek-R1
- Performance Benchmarks:
- It can achieve 90.8% accuracy on MMLU Testing
- It can score 97.3% pass rate on MATH-500 Assessment
- It can obtain 2029 rating on Codeforces Challenges
- It can demonstrate 65.9% pass rate on LiveCodeBench Evaluation
- It can perform effectively on Chinese Language Tasks with 91.8% C-Eval score
- Model Variants:
- It can provide Base Models including DeepSeek-R1-Zero and DeepSeek-R1
- It can support Distilled Models ranging from 1.5B to 70B parameters
- It can maintain performance in smaller architectures through knowledge distillation
- It can achieve State Of The Art results with DeepSeek-R1-Distill-Qwen-32B
- Implementation Features:
- It can operate with Temperature Settings between 0.5 and 0.7
- It can perform optimally using only User Prompts, with no system prompt
- It can support Commercial Use under MIT License
- It can provide access through chat.deepseek.com interface
- It can offer OpenAI Compatible API integration
- It can demonstrate Self Verification and reflection capabilities
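The implementation features above can be combined into a single request. The endpoint URL and model name below follow DeepSeek's published OpenAI-compatible API, but the payload-building helper itself is an illustrative sketch, not an official client.

```python
# Sketch of a request to DeepSeek's OpenAI-compatible chat endpoint.
# Per the usage notes above: the whole task goes in one user message
# (no system prompt), and temperature stays in the 0.5-0.7 range.

API_URL = "https://api.deepseek.com/chat/completions"

def build_r1_request(task: str, temperature: float = 0.6) -> dict:
    """Build a chat-completion payload for DeepSeek-R1."""
    if not 0.5 <= temperature <= 0.7:
        raise ValueError("recommended temperature range is 0.5-0.7")
    return {
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": task}],
        "temperature": temperature,
    }

payload = build_r1_request("Prove that sqrt(2) is irrational.")
# Send with any HTTP client, e.g.:
#   requests.post(API_URL, json=payload,
#                 headers={"Authorization": "Bearer <API_KEY>"})
```

Because the API is OpenAI-compatible, the same payload also works through the standard OpenAI SDK by pointing its `base_url` at `https://api.deepseek.com`.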
2025-01-24
- Perplexity.ai
- Question: What are the key differences between DeepSeek-R1-Zero and DeepSeek-R1 in terms of their training approach, performance, and limitations?
- Answer: The DeepSeek-R1-Zero employs pure reinforcement learning without supervised fine-tuning, while the DeepSeek-R1 uses a hybrid approach combining both SFT and RL with cold-start data.
- Training Approaches:
- DeepSeek-R1-Zero Training:
- Uses pure reinforcement learning approach.
- Excludes supervised fine-tuning phase.
- DeepSeek-R1 Training:
- Implements hybrid training methodology.
- Incorporates cold-start data before RL phase.
- Performance Characteristics:
- DeepSeek-R1-Zero Performance:
- Achieves 71.0% Pass@1 accuracy on AIME 2024 Test.
- Reaches 86.7% accuracy with majority voting.
- Attains 95.9% accuracy on MATH 500 Test.
- Scores 73.3% on GPQA Diamond Test.
- DeepSeek-R1 Performance:
- Matches OpenAI-o1 across math, code, and reasoning tasks.
- Demonstrates superior readability and language consistency.
- Shows enhanced natural language coherence.
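The jump from 71.0% (Pass@1) to 86.7% with majority voting reflects a standard consensus evaluation: sample several answers per problem and keep the most common final answer. The sketch below illustrates the mechanism; it is not DeepSeek's evaluation code.

```python
# Majority voting (consensus) over sampled answers, as used in the
# AIME figures above: generate N candidates, keep the most frequent.
from collections import Counter

def majority_vote(candidate_answers: list[str]) -> str:
    """Return the most frequent final answer among sampled candidates."""
    counts = Counter(candidate_answers)
    return counts.most_common(1)[0][0]

# Five sampled chains of thought, three agreeing on "42":
samples = ["42", "41", "42", "40", "42"]
print(majority_vote(samples))  # -> 42
```

Voting helps because independent samples rarely agree on the same wrong answer, so consensus filters out sporadic reasoning errors.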
- Technical Specifications:
- Common Characteristics:
- Total Parameters: 671B.
- Activated Parameters: 37B.
- Context Length: 128K.
- Model Limitations:
- DeepSeek-R1-Zero Limitations:
- Exhibits endless repetition issues.
- Shows poor readability.
- Experiences language mixing problems.
- DeepSeek-R1 Limitations:
- Shows function calling constraints.
- Faces multi-turn conversation challenges.
- Struggles with complex role-playing.
- Demonstrates prompt sensitivity.
- Exhibits suboptimal software engineering performance.
- Model Architecture:
- Both models utilize Mixture of Experts with selective parameter activation guided by sophisticated routing.
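The selective activation described above means only a small fraction of the 671B parameters runs per token (37B, about 5.5%). The toy sketch below shows the core routing idea, top-k expert selection per token; expert counts and scores here are made up for readability and do not reflect DeepSeek's actual router.

```python
# Toy illustration of sparse MoE routing: a router scores every expert
# for a token and only the top-k experts are activated.
import random

def top_k_experts(router_logits: list[float], k: int) -> list[int]:
    """Indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    return sorted(ranked[:k])

random.seed(0)
num_experts, k = 16, 2
logits = [random.gauss(0.0, 1.0) for _ in range(num_experts)]
active = top_k_experts(logits, k)
print(f"active experts: {active} ({k}/{num_experts} per token)")
print(f"R1 activated fraction: {37 / 671:.1%}")
```
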
- Citations:
[1] https://deepinfra.com/deepseek-ai/DeepSeek-R1
[2] https://www.prompthub.us/blog/deepseek-r-1-model-overview-and-how-it-ranks-against-openais-o1
[3] https://www.gocodeo.com/post/deepseek-r1-and-deepseek-r1-zero
[4] https://arxiv.org/html/2501.12948v1
[5] https://workos.com/blog/deepseek-r1-pushes-local-and-open-ai-forward
[6] https://www.reddit.com/r/LocalLLaMA/comments/1i5jh1u/deepseek_r1_r1_zero/
[7] https://www.datacamp.com/blog/deepseek-r1
[8] https://www.reddit.com/r/LocalLLaMA/comments/1i8rujw/notes_on_deepseek_r1_just_how_good_it_is_compared/
2025-01-24
- Perplexity.ai
- Question: How does the DeepSeek R1 represent its thought process? Please explain its tag structure, reasoning characteristics, and format sensitivity.
- Answer: The DeepSeek R1 represents its thought process using a structured XML-like format with specific tags, encapsulating its reasoning within `<think>` and `</think>` tags, followed by responses in `<answer>` tags.
- Tag Structure:
- Reasoning Format:
- Think tags (`<think>...</think>`) encapsulate the reasoning process.
- Answer tags (`<answer>...</answer>`) contain the final response.
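Because the reasoning trace is delimited by explicit tags, it can be separated from the final response with a simple regular expression. The sketch below is illustrative (not official tooling) and assumes a single well-formed `<think>...</think>` block, as the model normally emits.

```python
# Split an R1-style completion into its reasoning trace and final answer.
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from an R1 completion."""
    match = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    if match is None:
        return "", completion.strip()          # no reasoning block found
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()  # text after the think block
    return reasoning, answer

text = "<think>2 + 2: add the units digits.</think>The answer is 4."
thought, answer = split_reasoning(text)
print(answer)  # -> The answer is 4.
```

Extracting the trace this way is useful for logging, for hiding the chain of thought from end users, or for passing reasoning between models.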
- Reasoning Characteristics:
- Chain-of-Thought Process:
- Self-verification of answers.
- Ability to reevaluate flawed logic.
- Solution validation within chain of thought.
- Emergent Behaviors:
- Exhibits "Aha moment" phenomenon for reasoning correction.
- Manifests through phrases like "Wait a minute" or "Wait, but...".
- Format Sensitivity:
- Zero-Shot Performance:
- Optimal with direct problem description.
- Requires specified output format.
- Few-Shot Limitations:
- Few-shot prompting can degrade performance; zero-shot prompting is recommended.
- Language Considerations:
- Language Processing:
- Excels at English and Chinese reasoning.
- May encounter language mixing issues.
- Defaults to English for reasoning process.
- Citations:
[1] https://patmcguinness.substack.com/p/deepseek-releases-r1-and-opens-up
[2] https://www.prompthub.us/blog/deepseek-r-1-model-overview-and-how-it-ranks-against-openais-o1
[3] https://adasci.org/mastering-llms-reasoning-capability-with-deepseek-r1/
[4] https://c3.unu.edu/blog/deepseek-r1-pioneering-open-source-thinking-model-and-its-impact-on-the-llm-landscape
[5] https://www.ignorance.ai/p/r1-is-reasoning-for-the-masses
[6] https://arxiv.org/html/2501.12948v1
[7] https://www.reddit.com/r/LocalLLaMA/comments/1i73x81/you_can_extract_reasoning_from_r1_and_pass_it/
[8] https://www.datacamp.com/blog/deepseek-r1