OpenAI o1 LLM


An OpenAI o1 LLM is an OpenAI reasoning LLM (designed to perform automated-reasoning tasks).
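
As an illustrative sketch (not taken from the cited sources), such a reasoning model can be invoked through OpenAI's chat completions API; the model name "o1-preview" and the use of the OPENAI_API_KEY environment variable are assumptions here:

    # Minimal sketch of calling an o1-family model via the OpenAI Python SDK.
    # Assumes OPENAI_API_KEY is set in the environment and that the account
    # has access to the "o1-preview" model (an assumption).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="o1-preview",
        messages=[
            # o1 models reason internally before answering, so a plain user
            # prompt suffices; early o1 releases did not accept system
            # messages or sampling parameters such as temperature.
            {"role": "user", "content": "How many r's are in 'strawberry'?"}
        ],
    )

    print(response.choices[0].message.content)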



References

2024

  • (Valmeekam et al., 2024) ⇒ Karthik Valmeekam, Kaya Stechly, and Subbarao Kambhampati. (2024). "LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench." In: arXiv preprint. DOI: 10.48550/arXiv.2409.13373
    • ABSTRACT: The paper investigates whether large language models (LLMs) possess the ability to plan, a critical capability of intelligent agents. Using PlanBench, a benchmark introduced in 2022, the authors evaluate the performance of various LLMs and OpenAI's new Large Reasoning Model (LRM) o1 (Strawberry). While o1 shows significant improvement, it still falls well short of saturating the benchmark.
    • NOTES:
      • o1 is a Large Reasoning Model (LRM), designed by OpenAI to go beyond the capabilities of traditional autoregressive Large Language Models (LLMs), with a focus on reasoning and planning tasks.
      • o1 incorporates reinforcement learning into its training, allowing it to generate and evaluate chains of reasoning (Chain-of-Thought) to improve performance on complex tasks like planning.
      • o1 shows substantial improvements in planning benchmarks, achieving 97.8% accuracy on simple PlanBench Blocksworld tasks, far surpassing previous LLMs, but its performance degrades significantly on larger, more complex problems.
      • While o1 is more expensive and offers no correctness guarantees, it dynamically adjusts its inference process using adaptive reasoning tokens (see the sketch after this list), though it still struggles with unsolvable problems and complex, obfuscated tasks.
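
The following is a minimal sketch of inspecting that adaptive reasoning-token usage via the OpenAI Python SDK; the model name and the presence of the completion_tokens_details.reasoning_tokens usage field are assumptions here:

    # Sketch: checking how many hidden reasoning tokens an o1 response consumed.
    # The completion_tokens_details.reasoning_tokens field is assumed to be
    # present on the response's usage object (guarded with getattr below).
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="o1-preview",  # assumed model name
        messages=[{"role": "user", "content": "Plan: stack block A on B, and B on C."}],
    )

    usage = response.usage
    details = getattr(usage, "completion_tokens_details", None)
    reasoning = getattr(details, "reasoning_tokens", None) if details else None

    # Reasoning tokens are billed as output tokens but are not returned as
    # text, which is why longer chains of thought make o1 calls more expensive.
    print(f"completion tokens: {usage.completion_tokens}, reasoning tokens: {reasoning}")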
