The Building Blocks of Interpretability

Interpretability techniques are normally studied in isolation. We explore the powerful interfaces that arise when you combine them — and the rich structure of this combinatorial space.

Wow, this is a massive new study from Google and CMU on what is potentially the hottest topic in AI today. The authors put a tremendous amount of work into dynamic, interactive content that lets you explore and attempt to explain the behavior of a neural network. Very cool work, although it's more of an exploration than an announcement of specific findings. Here's the conclusion:

There is a rich design space for interacting with enumerative algorithms, and we believe an equally rich space exists for interacting with neural networks. We have a lot of work left ahead of us to build powerful and trustworthy interfaces for interpretability. But, if we succeed, interpretability promises to be a powerful tool in enabling meaningful human oversight and in building fair, safe, and aligned AI systems.

It's long, but even if you don't read the whole thing, you should still click through and play with the interactives.
