Fine-Grained Reward Method
A Fine-Grained Reward Method is a model training methodology that uses detailed, component-level feedback to refine the learning process of machine learning models, particularly large language models (LLMs).
- Context:
- It can (typically) target individual components of a task (offering precise guidance to improve performance).
- It can (often) involve analyzing outputs at a granular level, such as the sentence or paragraph level, to provide context-specific feedback.
- It can be applied to various domains such as natural language processing, computer vision, and reinforcement learning to enhance model accuracy and reliability.
- It can improve the quality of model-generated content, including the generation of relevant citations and reducing hallucinations.
- It can require the development of sophisticated reward functions that accurately measure the quality of complex outputs (see the sketch after this list).
- It can lead to models that better understand and replicate human-like decision-making and content creation processes.
- ...
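The contrast between a single holistic reward and fine-grained, per-segment rewards can be illustrated with a minimal Python sketch. The function names and toy scoring functions below are hypothetical stand-ins for learned reward models, not part of any published implementation.

```python
# Minimal sketch (hypothetical names): a holistic reward scores the whole
# response once, while fine-grained rewards score each sentence separately,
# optionally summing several reward signals per sentence.

from typing import Callable, List


def split_into_sentences(text: str) -> List[str]:
    """Naive sentence splitter; a real system would use a proper tokenizer."""
    return [s.strip() for s in text.split(".") if s.strip()]


def holistic_reward(response: str, score_fn: Callable[[str], float]) -> float:
    """One scalar for the entire response (conventional preference-style reward)."""
    return score_fn(response)


def fine_grained_rewards(
    response: str, score_fns: List[Callable[[str], float]]
) -> List[float]:
    """One reward per sentence, summing the scores of several reward functions
    (e.g., relevance, factuality, citation support)."""
    sentences = split_into_sentences(response)
    return [sum(fn(sentence) for fn in score_fns) for sentence in sentences]


if __name__ == "__main__":
    # Toy scoring functions standing in for learned reward models.
    relevance = lambda s: 1.0 if "citation" in s.lower() else 0.0
    brevity = lambda s: 1.0 if len(s.split()) < 12 else -0.5

    text = "Fine-grained rewards score each sentence. This one lacks a citation."
    print(holistic_reward(text, relevance))              # single scalar
    print(fine_grained_rewards(text, [relevance, brevity]))  # one value per sentence
```

The per-sentence output makes it explicit which part of the response earned or lost reward, which is the core advantage over a single holistic score.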
- See: Reinforcement Learning, Natural Language Processing, Machine Learning Model Evaluation.
References
2024
- (Huang et al., 2024) ⇒ Chengyu Huang, Zeqiu Wu, Yushi Hu, and Wenya Wang. (2024). “Training Language Models to Generate Text with Citations via Fine-grained Rewards.” doi:10.48550/arXiv.2402.04315
- QUOTE: ... In this work, we propose an effective training framework using fine-grained rewards to teach LLMs to generate highly supportive and relevant citations, while ensuring the correctness of their responses. We also conduct a systematic analysis of applying these fine-grained rewards to common LLM training strategies, demonstrating its advantage over conventional practices. We conduct extensive experiments on Question Answering (QA) datasets taken from the ALCE benchmark and validate the model's generalizability using EXPERTQA. On LLaMA-2-7B, the incorporation of fine-grained rewards achieves the best performance among the baselines, even surpassing that of GPT-3.5-turbo. ...
- NOTE:
- It employs fine-grained rewards during training to encourage the production of text with relevant citations.
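The paper's actual reward design is not reproduced here; the following is a hedged toy sketch of a per-sentence citation reward, using a simple lexical-overlap heuristic where a real system would use a learned attribution or entailment model. All names are illustrative assumptions.

```python
# Hedged sketch (not the paper's implementation): a toy per-sentence citation
# reward that checks whether any cited source passage lexically supports the
# sentence. A learned attribution/entailment model would replace the heuristic.

from typing import List


def citation_reward(sentence: str, cited_passages: List[str]) -> float:
    """Return +1.0 if some cited passage overlaps strongly with the sentence,
    -1.0 if citations exist but none overlap, 0.0 if nothing is cited."""
    if not cited_passages:
        return 0.0
    words = {w.lower().strip(".,") for w in sentence.split() if len(w) > 3}
    for passage in cited_passages:
        overlap = words & {w.lower().strip(".,") for w in passage.split()}
        if words and len(overlap) / len(words) > 0.5:
            return 1.0
    return -1.0


if __name__ == "__main__":
    sent = "Fine-grained rewards improve citation quality [1]."
    sources = ["Fine-grained rewards were shown to improve citation quality."]
    print(citation_reward(sent, sources))  # 1.0: the cited passage supports the claim
```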
- (Wu et al., 2024) ⇒ Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, and Hannaneh Hajishirzi. (2024). “Fine-grained Human Feedback Gives Better Rewards for Language Model Training.” Advances in Neural Information Processing Systems 36
- ABSTRACT: Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs. Reinforcement learning from human feedback (RLHF)---where human preference judgments on LM outputs are transformed into a learning signal---has recently shown promise in addressing these issues. However, such holistic feedback conveys limited information on long text outputs; it does not indicate which aspects of the outputs influenced user preference; e.g., which parts contain what type(s) of errors. In this paper, we use fine-grained human feedback (e.g., which sentence is false, which sub-sentence is irrelevant) as an explicit training signal. We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects: (1) density, providing a reward after every segment (e.g., a sentence) is generated; and (2) incorporating multiple reward models associated with different feedback types (e.g., factual incorrectness, irrelevance, and information incompleteness). We conduct experiments on detoxification and long-form question answering to illustrate how learning with this reward function leads to improved performance, supported by both automatic and human evaluation. Additionally, we show that LM behaviors can be customized using different combinations of fine-grained reward models. We release all data, collected human feedback, and codes at https://FineGrainedRLHF.github.io.
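The two axes described in the abstract (per-segment density and multiple feedback-type reward models) can be sketched as follows. This is an illustrative Python sketch, not the released FineGrainedRLHF code; the class and model names are assumptions made for the example.

```python
# Illustrative sketch (not the released FineGrainedRLHF code): assign a dense
# reward to every segment by combining several feedback-type reward models
# with adjustable weights; re-weighting the types customizes model behavior.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SegmentReward:
    segment: str
    per_type: Dict[str, float]   # score from each feedback-type reward model
    total: float                 # weighted sum used as the training signal


def score_segments(
    segments: List[str],
    reward_models: Dict[str, Callable[[str], float]],
    weights: Dict[str, float],
) -> List[SegmentReward]:
    """Score every segment with every reward model, then combine with weights."""
    results = []
    for seg in segments:
        per_type = {name: model(seg) for name, model in reward_models.items()}
        total = sum(weights.get(name, 1.0) * r for name, r in per_type.items())
        results.append(SegmentReward(seg, per_type, total))
    return results


if __name__ == "__main__":
    # Toy reward models standing in for learned classifiers.
    models = {
        "relevance": lambda s: 1.0 if "reward" in s.lower() else -1.0,
        "brevity": lambda s: 0.5 if len(s.split()) <= 8 else -0.5,
    }
    # Changing these weights changes which behaviors the policy is pushed toward.
    weights = {"relevance": 1.0, "brevity": 0.3}
    segs = ["Fine-grained rewards are dense.", "This sentence drifts off topic."]
    for r in score_segments(segs, models, weights):
        print(r.segment, r.per_type, round(r.total, 2))
```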