2023 LoftQLoRAFineTuningAwareQuantiz
- (Li, Yu et al., 2023) ⇒ Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, and Tuo Zhao. (2023). “LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models.” doi:10.48550/arXiv.2310.08659
Subject Headings: LoRA Fine-Tuning.
Notes
- LoftQ (LoRA-Fine-Tuning-aware Quantization) is a quantization framework for large language models (LLMs) tailored to pre-trained models that require both quantization and LoRA fine-tuning. It combines low-rank approximation with quantization so that the two jointly approximate the original high-precision pre-trained weights.
- LoftQ addresses the following challenges:
- Quantization discrepancy: Quantization can cause a performance gap between the quantized model and the full-precision model.
- Imbalance between quantization and adaptation: Existing quantization methods for LLMs often suffer from an imbalance between the degrees of freedom of quantization and adaptation. This imbalance can lead to large quantization errors and performance degradation.
- To address these challenges, LoftQ:
- Simultaneously quantizes the LLM and finds a proper low-rank initialization for LoRA fine-tuning: This ensures that the quantized model is initialized in a way that is compatible with LoRA fine-tuning.
- Balances the degrees of freedom between quantization and adaptation: LoftQ uses group-wise operations to balance the degrees of freedom of quantization against those of the low-rank adapters. This lets the LLM weights be quantized to low-bit integers during fine-tuning, reducing memory and computation without sacrificing performance (see the illustrative sketch after this list).
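- The following is a minimal sketch of the alternating "quantize, then refit a low-rank term" idea described above, not the authors' released implementation. The function name loftq_init, the simple uniform min-max quantizer, and all parameter defaults are illustrative assumptions; the paper uses NF4/NF2-style quantizers and per-group scaling.

```python
import torch

def loftq_init(W: torch.Tensor, num_bits: int = 4, rank: int = 16, num_iters: int = 5):
    """Hypothetical sketch of a LoftQ-style initialization.

    Alternates between (1) quantizing the residual W - A @ B^T and
    (2) refitting a rank-r factorization of W - Q via truncated SVD,
    so that Q + A @ B^T approximates the full-precision weight W.
    """
    A = torch.zeros(W.shape[0], rank)
    B = torch.zeros(W.shape[1], rank)
    for _ in range(num_iters):
        # Step 1: quantize the current residual with a simple uniform
        # (min-max) quantizer; this stands in for the paper's quantizer.
        R = W - A @ B.T
        scale = (R.max() - R.min()) / (2 ** num_bits - 1)
        Q = torch.round((R - R.min()) / scale) * scale + R.min()
        # Step 2: best rank-r approximation of the remaining quantization error.
        U, S, Vh = torch.linalg.svd(W - Q, full_matrices=False)
        A = U[:, :rank] * S[:rank]
        B = Vh[:rank, :].T
    # Q is kept frozen at fine-tuning time; A and B initialize the LoRA adapters.
    return Q, A, B
```

- Under these assumptions, the returned Q would replace the frozen quantized backbone weight and (A, B) would seed the LoRA adapters, so that fine-tuning starts from Q + A @ B^T ≈ W rather than from a zero-initialized adapter on top of a lossy Q.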
Cited By
Quotes
Abstract
Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning. In this work we focus on the scenario where quantization and LoRA fine-tuning are applied together on a pre-trained model. In such cases it is common to observe a consistent gap in the performance on downstream tasks between full fine-tuning and quantization plus LoRA fine-tuning approach. In response, we propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework that simultaneously quantizes an LLM and finds a proper low-rank initialization for LoRA fine-tuning. Such an initialization alleviates the discrepancy between the quantized and full-precision model and significantly improves the generalization in downstream tasks. We evaluate our method on natural language understanding, question answering, summarization, and natural language generation tasks. Experiments show that our method is highly effective and outperforms existing quantization methods, especially in the challenging 2-bit and 2/4-bit mixed precision regimes. We will release our code.
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2023 LoftQLoRAFineTuningAwareQuantiz | Weizhu Chen; Chen Liang; Pengcheng He; Yixiao Li; Yifan Yu; Nikos Karampatziakis; Tuo Zhao | | 2023 | LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models | | | | 10.48550/arXiv.2310.08659 | | 2023 |