Tuesday Jul 09, 2024
arXiv preprint - Evaluating Human Alignment and Model Faithfulness of LLM Rationale
In this episode, we discuss Evaluating Human Alignment and Model Faithfulness of LLM Rationale by Mohsen Fayyaz, Fan Yin, Jiao Sun, and Nanyun Peng. The paper investigates how effectively large language models (LLMs) can explain their decisions through rationales, i.e., sets of tokens extracted from the input text. It compares two types of rationale extraction methods, attribution-based and prompting-based, and finds that prompting-based rationales align better with human-annotated rationales. The study also explores the faithfulness limitations of prompting-based methods and shows that fine-tuning models on specific datasets can improve the faithfulness of both rationale extraction approaches.
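To make "alignment with human-annotated rationales" concrete, here is a minimal sketch, not the paper's actual protocol. It assumes an attribution-based rationale is formed by keeping the top-k tokens by per-token attribution score, and that alignment is measured as token-level F1 against the human rationale. The function names, the toy sentence, and all scores are hypothetical.

```python
from typing import Sequence, Set


def top_k_rationale(scores: Sequence[float], k: int) -> Set[int]:
    """Return the indices of the k tokens with the highest attribution scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def rationale_f1(predicted: Set[int], human: Set[int]) -> float:
    """Token-level F1 between a predicted and a human-annotated rationale."""
    if not predicted and not human:
        return 1.0  # both empty: treat as perfect agreement
    overlap = len(predicted & human)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(human)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tokens = ["the", "movie", "was", "painfully", "boring"]
    # Hypothetical per-token attribution scores (e.g. derived from attention or gradients).
    attribution_scores = [0.05, 0.40, 0.02, 0.45, 0.08]
    # Hypothetical human-annotated rationale: "painfully boring".
    human_rationale = {3, 4}

    predicted = top_k_rationale(attribution_scores, k=2)
    print([tokens[i] for i in sorted(predicted)], rationale_f1(predicted, human_rationale))
```

A prompting-based rationale, expressed as the set of token indices the model quotes from the input, could be scored with the same `rationale_f1` function, which makes the two extraction methods directly comparable under this assumed metric.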