An information-theoretic study of lying in LLMs
Published in ICML 2024 Workshop on LLMs and Cognition, 2024
Using information theory and the logit lens, we investigate how the predictive distribution evolves across the layers of LLMs instructed either to lie or to tell the truth.
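To illustrate the kind of measurement this involves, here is a minimal sketch of a logit-lens readout followed by an entropy calculation at each layer. It is not the paper's code. The function names, the toy hidden states, and the toy unembedding vectors are all hypothetical, and choosing Shannon entropy as the information-theoretic quantity is an assumption, since the summary does not say which measures are used. A real logit lens usually also applies the model's final layer normalization before unembedding, which this sketch omits.

```python
import math

def softmax(logits):
    """Turn a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def logit_lens_entropies(hidden_states, unembedding):
    """Project each layer's hidden state onto the vocabulary and
    return the entropy of the resulting predictive distribution.

    hidden_states: one vector (length d_model) per layer.
    unembedding:   one vector (length d_model) per vocabulary token.
    """
    entropies = []
    for h in hidden_states:
        # Logit for each token = dot product of hidden state and token vector.
        logits = [sum(h_i * w_i for h_i, w_i in zip(h, token_vec))
                  for token_vec in unembedding]
        entropies.append(entropy_bits(softmax(logits)))
    return entropies

# Toy data (assumed, not from the paper): 3 layers, d_model = 2, 4 tokens.
toy_unembedding = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
toy_hidden_states = [
    [0.0, 0.0],   # early layer: no preference, uniform prediction
    [1.0, 0.0],   # middle layer: mild preference for token 0
    [10.0, 0.0],  # late layer: strong preference for token 0
]

entropies = logit_lens_entropies(toy_hidden_states, toy_unembedding)
for layer, h in enumerate(entropies):
    print(f"layer {layer}: entropy = {h:.3f} bits")
```

In this toy setup the entropy falls from layer to layer as the prediction sharpens. Running the same readout on a real model under a lying prompt and under a truthful prompt would give two per-layer curves to compare.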
