OpenAI GPT-4 Multimodal Language Model

From GM-RKB
(Redirected from OpenAI GPT-4 language model)
Jump to navigation Jump to search

An OpenAI GPT-4 Multimodal Language Model is an multimodal language model within an OpenAI GPT-4 LLM Family (LLM family).

See: Mixture of Experts (MoE), Azure OpenAI Service, Auto-GPT Framework, DSPy Framework.



References

2024-12-13

[1] https://platform.openai.com/docs/models/gp
[2] https://research.aimultiple.com/gpt4/
[3] https://lingarogroup.com/blog/whats-new-with-gpt-4-features-and-limitations
[4] https://www.techtarget.com/whatis/feature/GPT-4o-explained-Everything-you-need-to-know
[5] https://openai.com/index/gpt-4-research/
[6] https://www.restack.io/p/gpt-4-training-workshops-answer-training-data-size-cat-ai
[7] https://www.version1.com/blog/openai-gpt-4-a-complete-review/
[8] https://en.wikipedia.org/wiki/GPT-4
[9] https://www.datacamp.com/blog/what-we-know-gpt4
[10] https://pmc.ncbi.nlm.nih.gov/articles/PMC10795998/

2023

  • https://platform.openai.com/docs/models/gpt-4
    • QUOTE: GPT-4 is a large multimodal model (accepting text inputs and emitting text outputs today, with image inputs coming in the future) that can solve difficult problems with greater accuracy than any of our previous models, thanks to its broader general knowledge and advanced reasoning capabilities. Like gpt-3.5-turbo, GPT-4 is optimized for chat but works well for traditional completions tasks using the Chat completions API. Learn how to use GPT-4 in our GPT guide.
Latest model	Description	Max tokens	Training data
gpt-4	More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat. Will be updated with our latest model iteration 2 weeks after it is released.	8,192 tokens	Up to Sep 2021
gpt-4-0613	Snapshot of gpt-4 from June 13th 2023 with function calling data. Unlike gpt-4, this model will not receive updates, and will be deprecated 3 months after a new version is released.	8,192 tokens	Up to Sep 2021
gpt-4-32k	Same capabilities as the standard gpt-4 mode but with 4x the context length. Will be updated with our latest model iteration.	32,768 tokens	Up to Sep 2021
gpt-4-32k-0613	Snapshot of gpt-4-32 from June 13th 2023. Unlike gpt-4-32k, this model will not receive updates, and will be deprecated 3 months after a new version is released.	32,768 tokens	Up to Sep 2021
gpt-4-0314 (Legacy)	Snapshot of gpt-4 from March 14th 2023 with function calling data. Unlike gpt-4, this model will not receive updates, and will be deprecated on June 13th 2024 at the earliest.	8,192 tokens	Up to Sep 2021
gpt-4-32k-0314 (Legacy)	Snapshot of gpt-4-32 from March 14th 2023. Unlike gpt-4-32k, this model will not receive updates, and will be deprecated on June 13th 2024 at the earliest.	32,768 tokens

2023

  • https://the-decoder.com/gpt-4-architecture-datasets-costs-and-more-leaked/
    • GPT-4's Scale: GPT-4 has ~1.8 trillion parameters across 120 layers, which is over 10 times larger than GPT-3.
    • Mixture Of Experts (MoE): OpenAI utilizes 16 experts within their model, each with ~111B parameters for MLP. Two of these experts are routed per forward pass, which contributes to keeping costs manageable.
    • Dataset: GPT-4 is trained on ~13T tokens, including both text-based and code-based data, with some fine-tuning data from ScaleAI and internally.
    • Dataset Mixture: The training data included CommonCrawl & RefinedWeb, totaling 13T tokens. Speculation suggests additional sources like Twitter, Reddit, YouTube, and a large collection of textbooks.
    • Training Cost: The training costs for GPT-4 was around $63 million, taking into account the computational power required and the time of training.
    • Inference Cost: GPT-4 costs 3 times more than the 175B parameter Davinci, due to the larger clusters required and lower utilization rates.
    • Inference Architecture: The inference runs on a cluster of 128 GPUs, using 8-way tensor parallelism and 16-way pipeline parallelism.
    • Vision Multi-Modal: GPT-4 includes a vision encoder for autonomous agents to read web pages and transcribe images and videos. The architecture is similar to Flamingo. This adds more parameters on top and it is fine-tuned with another ~2 trillion tokens.

2023

  • https://openai.com/research/gpt-4
    • We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks. For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%. We’ve spent 6 months iteratively aligning GPT-4 using lessons from our adversarial testing program as well as ChatGPT, resulting in our best-ever results (though far from perfect) on factuality, steerability, and refusing to go outside of guardrails. ...

      ...

    • Steerability:

      We’ve been working on each aspect of the plan outlined in our post about defining the behavior of AIs, including steerability. Rather than the classic ChatGPT personality with a fixed verbosity, tone, and style, developers (and soon ChatGPT users) can now prescribe their AI’s style and task by describing those directions in the “system” message. System messages allow API users to significantly customize their users’ experience within bounds. ...

      ...

    • Like previous GPT models, the GPT-4 base model was trained to predict the next word in a document, and was trained using publicly available data (such as internet data) as well as data we’ve licensed. The data is a web-scale corpus of data including correct and incorrect solutions to math problems, weak and strong reasoning, self-contradictory and consistent statements, and representing a great variety of ideologies and ideas. So when prompted with a question, the base model can respond in a wide variety of ways that might be far from a user’s intent. To align it with the user’s intent within guardrails, we fine-tune the model’s behavior using reinforcement learning with human feedback (RLHF). Note that the model’s capabilities seem to come primarily from the pre-training process—RLHF does not improve exam performance (without active effort, it actually degrades it). But steering of the model comes from the post-training process—the base model requires prompt engineering to even know that it should answer the questions.
    • ...
    • Once you have access, you can make text-only requests to the gpt-4 model (image inputs are still in limited alpha), which we will automatically update to our recommended stable model as we make new versions over time (you can pin the current version by calling gpt-4-0314, which we’ll support until June 14). Pricing is $0.03 per 1k prompt tokens and $0.06 per 1k completion tokens. Default rate limits are 40k tokens per minute and 200 requests per minute.

       gpt-4 has a context length of 8,192 tokens. We are also providing limited access to our 32,768–context (about 50 pages of text) version, gpt-4-32k, which will also be updated automatically over time (current version gpt-4-32k-0314, also supported until June 14). Pricing is $0.06 per 1K prompt tokens and $0.12 per 1k completion tokens. We are still improving model quality for long context and would love feedback on how it performs for your use-case. We are processing requests for the 8K and 32K engines at different rates based on capacity, so you may receive access to them at different times.