OpenAI o1 LLM
An OpenAI o1 LLM is an OpenAI large language model (LLM), released on 2024-09-12, designed for complex reasoning.
- Context:
- It can (typically) spend more time thinking before providing responses.
- ...
- It can process text inputs (and, in later versions, images) to handle complex reasoning tasks.
- It can produce a detailed internal chain of thought before responding to users.
- ...
- Example(s):
- o1-preview, an early version of the model.
- ...
- Counter-Example(s):
- a GPT-4o LLM, which o1 outperforms on reasoning-intensive benchmarks.
- See: OpenAI LLM Model, Foundation Neural Model, GPT-4 Turbo
References
2024
- (OpenAI, 2024) ⇒ OpenAI. (2024). "Introducing OpenAI o1 LLM."
- NOTES:
- It is a multimodal large language model designed for advanced reasoning in challenging domains such as physics and computational tasks.
2024
- (OpenAI, 2024) ⇒ OpenAI. (2024). "Learning to Reason with LLMs." https://openai.com/index/learning-to-reason-with-llms/
- NOTES:
- o1 is a new large language model trained with reinforcement learning to perform complex reasoning, producing a detailed internal chain of thought before responding to users.
- The model significantly outperforms GPT-4o on challenging reasoning benchmarks across math, coding, and science exams, demonstrating advanced problem-solving capabilities.
- On the 2024 AIME exams:
- o1 averaged 74% with a single sample per problem.
- With consensus among 64 samples, it scored 83%, versus 12% for GPT-4o.
- In competitive programming, o1:
- Ranks in the 89th percentile on Codeforces.
- Achieves an Elo rating of 1807, outperforming 93% of human competitors.
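The Elo figure above can be read probabilistically. Below is a minimal sketch of the standard Elo expected-score formula (Codeforces ratings are Elo-like; the 1400 opponent rating is an illustrative assumption, not a figure from the source):

```python
def elo_expected_score(r_a: float, r_b: float) -> float:
    """Probability that a player rated r_a beats a player rated r_b
    under the standard Elo model (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# An 1807-rated player vs. a hypothetical 1400-rated competitor:
p = elo_expected_score(1807, 1400)  # ≈ 0.91
```

Under this model, a 1-point rating gap corresponds to near-even odds, while the 400-point scale makes a 1807-vs-1400 matchup roughly a 10-to-1 favorite.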
- o1 exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA), surpassing the performance of human experts in these domains.
- Through reinforcement learning, o1:
- Refines its chain of thought.
- Learns to recognize and correct mistakes.
- Breaks down complex steps.
- Adopts different approaches when necessary.
- The model's performance improves with:
- Increased reinforcement learning (train-time compute).
- More time allocated for reasoning (test-time compute), showing scalability with computational resources.
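The test-time-compute point above can be illustrated with a toy self-consistency simulation (a hypothetical sketch, not the o1 inference procedure): drawing more samples and taking a majority vote raises accuracy whenever the correct answer is the single most likely response. The sampler and its parameters below are invented for illustration.

```python
import random
from collections import Counter

def sample_answer(correct: str, p_correct: float) -> str:
    """Toy stand-in for one model sample (hypothetical, not an API call).
    Returns the correct answer with probability p_correct, else one of
    three equally likely wrong answers."""
    if random.random() < p_correct:
        return correct
    return random.choice(["wrong_a", "wrong_b", "wrong_c"])

def consensus_answer(correct: str, p_correct: float, n_samples: int) -> str:
    """Majority vote over n_samples independent samples (self-consistency)."""
    votes = Counter(sample_answer(correct, p_correct) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

def accuracy(n_samples: int, trials: int = 2000, p_correct: float = 0.4) -> float:
    """Fraction of trials where the majority vote picks the correct answer."""
    random.seed(0)  # fixed seed for reproducibility
    hits = sum(consensus_answer("42", p_correct, n_samples) == "42"
               for _ in range(trials))
    return hits / trials
```

With a 40% per-sample success rate split against three distractors, the correct answer is the plurality mode, so `accuracy(15)` noticeably exceeds `accuracy(1)`: spending more samples (test-time compute) buys accuracy.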
- An early version, o1-preview, is available in ChatGPT and to trusted API users.
- Human preference evaluations show:
- o1-preview is strongly preferred over GPT-4o in reasoning-intensive tasks like data analysis, coding, and math.
- It is less preferred in some natural language tasks.
- Chain of thought reasoning enhances safety and alignment by:
- Enabling the model to reason about safety rules internally.
- Making it more robust to unexpected scenarios.
- To balance user experience and safety:
- OpenAI provides a model-generated summary of the chain of thought, rather than revealing the raw reasoning process to users.