Wednesday, October 15, 2025

Review: Human Compatible

I took my one and only AI class from Stuart Russell, who wrote Human Compatible. Written in 2019, the book predates OpenAI's ChatGPT and the LLM revolution, but it nevertheless anticipated many modern concerns about the rise of AI. It addresses worries like the paperclip apocalypse by critiquing the way current AI systems are designed to solve problems.

Fundamentally, Russell's critique of the current AI approach is that systems are designed with an explicit goal and 100% certainty about that goal. This is fine as long as the AI system is incompetent, but it leads to bad outcomes once the system is more intelligent than humans and can prevent humans from interfering with its goal --- for instance, by refusing to let itself be turned off.

The solution, Russell claims, is for the AI system's goal to be assisting humans with *their* goals, and to infer those goals from humans' statements and behavior. The inherent uncertainty about human goals forces the AI system to ask questions, and to allow itself to be turned off if necessary. There's excellent analysis of why this works and why it remains rational even in the case of super-intelligence.

The solution is elegant, interesting, and obviously unenforceable --- all it takes is one bad actor with super-intelligence deviating from this principle and we're back at the paperclip apocalypse. On the other hand, we're probably very far from being able to encode this sort of thinking into an AI system, so the proposal has no direct impact on current research.

Nevertheless, it's a great book: well written, with an intelligent solution to a widely perceived problem. Recommended.
