Sparse Autoencoders Find Highly Interpretable Features in Language Models
Paper • 2309.08600
A collection of papers I found useful for learning how sparse autoencoders can be used to find interpretable features in language models.