Cooperatively Learning Human Values
We, humans, have a notoriously difficult time specifying what we actually want, and the AI systems we build suffer from it. With AI, this warning actually becomes “Be careful what you reward!”. When we design and deploy an AI agent for some application, we need to specify what we want it to do, and this typically takes the form of a reward function: a function that tells the agent which state and action combinations are good. A car reaching its destination is good, and a car crashing into another car is not so good.