mohamed ouaicha

Bssayla

AI & ML interests

None yet

Recent Activity

liked a model about 1 month ago
tencent/HunyuanVideo

Organizations

MLX Community, Moroccan Data Scientists, ThinkAI

Bssayla's activity

liked a Space 7 months ago
reacted to gsarti's post with ❤️ 11 months ago
🔍 Today's pick in Interpretability & Analysis of LMs: Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models by C. Agarwal, S.H. Tanneru and H. Lakkaraju

This work discusses the dichotomy between faithfulness and plausibility in LLMs’ self-explanations (SEs) in natural language (CoT, counterfactual reasoning, and token importance). These explanations tend to be reasonable according to human understanding (plausible) but are not always aligned with the reasoning processes of the LLMs (unfaithful).

The authors remark that the increase in plausibility, driven by the demand for friendly conversational interfaces, might come at the expense of faithfulness. Given the faithfulness requirements of many high-stakes real-world settings, the authors suggest that these requirements be considered when designing and evaluating new explanation methodologies.

Finally, the authors call for a community effort to 1) develop reliable metrics to characterize the faithfulness of explanations and 2) pioneer novel strategies to generate more faithful SEs.

📄 Paper: Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models (2402.04614)

🔍 All daily picks in LM interpretability: gsarti/daily-picks-in-interpretability-and-analysis-of-lms-65ae3339949c5675d25de2f9