Synthetic LLM Training Dataset
A Synthetic LLM Training Dataset is an LLM training dataset that is a synthetic training dataset (one whose examples are generated by a model or program rather than collected from human-authored sources).
- See: Training Dataset, Orca 2 LLM.
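As a rough illustration of the definition above, the following minimal sketch builds a tiny synthetic dataset of prompt/completion pairs. Everything in it is an assumption made for illustration: the function names `make_synthetic_examples` and `to_jsonl`, the record fields `prompt`, `completion`, and `source`, and the use of a simple arithmetic template as the generator. In practice the generator is often another LLM rather than a template; the sketch only shows the general shape of programmatically produced training records.

```python
import json
import random


def make_synthetic_examples(n, seed=0):
    """Generate n prompt/completion pairs from a program, not from human text.

    Hypothetical helper: the arithmetic template stands in for whatever
    generator (often a teacher LLM) a real pipeline would use.
    """
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    examples = []
    for _ in range(n):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        examples.append({
            "prompt": f"What is {a} + {b}?",
            "completion": str(a + b),
            "source": "synthetic",  # assumed provenance tag
        })
    return examples


def to_jsonl(examples):
    """Serialize the records as JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(e) for e in examples)


if __name__ == "__main__":
    print(to_jsonl(make_synthetic_examples(3)))
```

Tagging each record with its provenance, as the assumed `source` field does here, is one way to keep synthetic examples distinguishable from human-written ones if the two are later mixed.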
References
2023
- (Lightman et al., 2023) ⇒ Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. (2023). “Let's Verify Step by Step.” In: arXiv preprint arXiv:2305.20050. doi:10.48550/arXiv.2305.20050.
- (Mitra et al., 2023) ⇒ Arindam Mitra, Luciano Del Corro, Shweti Mahajan, Andres Codas, Clarisse Simoes, Sahaj Agrawal, Xuxi Chen, Anastasia Razdaibiedina, Erik Jones, Kriti Aggarwal, Hamid Palangi, Guoqing Zheng, Corby Rosset, Hamed Khanpour, and Ahmed Awadallah. (2023). “Orca 2: Teaching Small Language Models How to Reason.” In: arXiv preprint arXiv:2311.11045. doi:10.48550/arXiv.2311.11045.