Open-Source Large Language Model (LLM)
An Open-Source Large Language Model (LLM) is a large language model that is an open-source AI model (one that can be freely accessed, modified, and distributed).
- Context:
- It can (typically) encourage contributions from the wider research community and industry practitioners.
- It can (often) rely on publicly available training datasets for its model training.
- It can (often) be used as a starting point for more specialized or customized language models.
- …
- Example(s):
- …
- Counter-Example(s):
- A Proprietary LLM, such as GPT-4.
- A Closed-Source ML Model.
- See: Natural Language Processing, Machine Learning Model, Artificial Intelligence, Open-Source Software.
References
2023
- (Mitra et al., 2023) ⇒ Arindam Mitra, Luciano Del Corro, Shweti Mahajan, Andres Codas, Clarisse Simoes, Sahaj Agrawal, Xuxi Chen, Anastasia Razdaibiedina, Erik Jones, Kriti Aggarwal, Hamid Palangi, Guoqing Zheng, Corby Rosset, Hamed Khanpour, and Ahmed Awadallah. (2023). “Orca 2: Teaching Small Language Models How to Reason.” In: arXiv preprint arXiv:2311.11045. DOI:10.48550/arXiv.2311.11045
- It introduces Orca 2, a smaller language model with enhanced reasoning abilities that achieves performance comparable to, or better than, models 5–10 times larger on complex reasoning tasks.
- Orca 2 is created by fine-tuning the LLAMA 2 base models using tailored, high-quality synthetic data that teaches various reasoning techniques.
- Evaluation using 15 diverse benchmarks shows Orca 2 matches or surpasses the performance of larger models. It has limitations common to language models but shows potential for reasoning improvements in smaller models.
- The key insight is that tailored synthetic training data and training smaller models on diverse reasoning strategies allow them to attain capabilities typically seen only in much larger models.
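The "tailored synthetic data" described above pairs each question with a teacher-written answer produced under an explicitly chosen reasoning strategy. A hedged sketch of what one such training record might look like as a JSONL line; the field names and strategy texts are assumptions, not the actual Orca 2 format:

```python
import json

# Hypothetical record structure for Orca 2-style synthetic training data:
# a system message selects a reasoning strategy, and the target answer
# follows that strategy. All field names here are illustrative assumptions.

STRATEGIES = {
    "step_by_step": "Solve the problem step by step, then state the answer.",
    "direct": "Answer directly without explanation.",
}

def make_record(question, answer, strategy):
    """Bundle one synthetic example with its strategy-selecting system prompt."""
    return {
        "system": STRATEGIES[strategy],
        "user": question,
        "assistant": answer,
    }

rec = make_record("What is 12 * 7?", "12 * 7 = 84. The answer is 84.", "step_by_step")
line = json.dumps(rec)  # one JSONL line of synthetic training data
```

Varying the strategy per example is what lets the smaller model learn to pick an appropriate reasoning approach rather than imitate a single style.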