OpenAI Codex LLM Model

From GM-RKB

An OpenAI Codex LLM Model is a text-to-code model.



References

2023

2022

  • https://platform.openai.com/docs/models/codex
    • QUOTE: The Codex models are descendants of our GPT-3 models that can understand and generate code. Their training data contains both natural language and billions of lines of public code from GitHub.

      They’re most capable in Python and proficient in over a dozen languages including JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, SQL, and even Shell.

    • We currently offer two Codex models:
      • code-davinci-002
        • Description: Most capable Codex model. Particularly good at translating natural language to code. In addition to completing code, also supports inserting completions within code.
        • Max request: 8,000 tokens
        • Training data: Up to Jun 2021
      • code-cushman-001
        • Description: Almost as capable as Davinci Codex, but slightly faster. This speed advantage may make it preferable for real-time applications.
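
The "inserting completions within code" capability quoted above can be illustrated with a minimal sketch. It only assembles a request body and sends nothing over the network. The helper name `build_insert_request` is hypothetical, and the field names (`prompt`, `suffix`, `max_tokens`, `temperature`) are an assumption modelled on the legacy completions request format, not something the quoted page states.

```python
import json


def build_insert_request(prefix: str, suffix: str, max_tokens: int = 64) -> dict:
    """Assemble a hypothetical insert-mode request body.

    The model would be asked to generate the code that belongs between
    `prefix` and `suffix`. Field names are assumptions for illustration.
    """
    return {
        "model": "code-davinci-002",  # model name taken from the quoted table
        "prompt": prefix,             # code that precedes the gap
        "suffix": suffix,             # code that follows the gap
        "max_tokens": max_tokens,     # upper bound on the generated insertion
        "temperature": 0,             # prefer deterministic output
    }


# The gap to fill is the body that computes `total`.
prefix = 'def mean(values):\n    """Return the arithmetic mean of a non-empty list."""\n'
suffix = "\n    return total / len(values)\n"

request_body = build_insert_request(prefix, suffix)
print(json.dumps(request_body, indent=2))
```

Splitting the source into a prefix and a suffix is what distinguishes insertion from plain completion, where only the preceding text is supplied.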

2022

  • (Finnie-Ansley et al., 2022) ⇒ James Finnie-Ansley, Paul Denny, Brett A. Becker, Andrew Luxton-Reilly, and James Prather. (2022). “The Robots Are Coming: Exploring the Implications of OpenAI Codex on Introductory Programming.” In: Australasian Computing Education Conference, pp. 10-19.
    • ABSTRACT: Recent advances in artificial intelligence have been driven by an exponential growth in digitised data. Natural language processing, in particular, has been transformed by machine learning models such as OpenAI’s GPT-3 which generates human-like text so realistic that its developers have warned of the dangers of its misuse. In recent months OpenAI released Codex, a new deep learning model trained on Python code from more than 50 million GitHub repositories. Provided with a natural language description of a programming problem as input, Codex generates solution code as output. It can also explain (in English) input code, translate code between programming languages, and more. In this work, we explore how Codex performs on typical introductory programming problems. We report its performance on real questions taken from introductory programming exams and compare it to results from students who took these same exams under normal conditions, demonstrating that Codex outscores most students. We then explore how Codex handles subtle variations in problem wording using several published variants of the well-known “Rainfall Problem” along with one unpublished variant we have used in our teaching. We find the model passes many test cases for all variants. We also explore how much variation there is in the Codex generated solutions, observing that an identical input prompt frequently leads to very different solutions in terms of algorithmic approach and code length. Finally, we discuss the implications that such technology will have for computing education as it continues to evolve, including both challenges and opportunities.
    • KEYWORDS: academic integrity; AI; artificial intelligence; code generation; code writing; Codex; copilot; CS1; deep learning; introductory programming; GitHub; GPT-3; machine learning; neural networks; novice programming; OpenAI
    • QUOTES:
      • ... In 2021, OpenAI released Codex, a descendent of GPT-3 that was trained on an additional 159GB of Python code from >50M GitHub repositories [5]. Codex is “proficient” in over a dozen programming languages including JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, and Shell, but is “most capable” in Python [42] ...
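
The abstract above describes scoring generated solutions by how many test cases they pass. A minimal sketch of that kind of check follows, using one assumed variant of the Rainfall Problem: average the non-negative readings that appear before a sentinel value, and return 0.0 when there are none. The sentinel `99999`, the function names, and the test cases are illustrative assumptions, not the paper's actual materials.

```python
from typing import Callable, List, Sequence, Tuple


def rainfall(readings: Sequence[float], sentinel: float = 99999) -> float:
    """Average the non-negative readings that occur before the sentinel."""
    valid = []
    for reading in readings:
        if reading == sentinel:
            break  # ignore everything after the sentinel
        if reading >= 0:
            valid.append(reading)  # negative readings are skipped
    return sum(valid) / len(valid) if valid else 0.0


def pass_rate(
    candidate: Callable[[Sequence[float]], float],
    cases: List[Tuple[Sequence[float], float]],
) -> float:
    """Return the fraction of test cases the candidate solution passes."""
    passed = 0
    for readings, expected in cases:
        try:
            if abs(candidate(readings) - expected) < 1e-9:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed test case
    return passed / len(cases)


# Hypothetical test cases: (input readings, expected average).
CASES = [
    ([1, 2, 3, 99999], 2.0),
    ([5, -1, 15, 99999, 100], 10.0),
    ([99999], 0.0),
    ([-3, -4, 99999], 0.0),
]

score = pass_rate(rainfall, CASES)
print(f"passed {score:.0%} of test cases")
```

Any candidate function, including model-generated code, can be passed to `pass_rate` in place of `rainfall`, which allows different solutions to the same prompt to be compared on an equal footing.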

2021