From the course: Security Risks in AI and Machine Learning: Categorizing Attacks and Failure Modes

Reward hacking

- [Instructor] Many years ago, a dog heard a child's cry from the banks of the River Seine in Paris. The dog jumped into the water and saved the drowning child's life, bringing the child safely to shore. As you can imagine, the dog received a lot of positive attention that day. The locals showered him with affection and gave him a beefsteak as a thank-you. A few days later, a similar thing happened. Once again, the dog saved a drowning child, and once again, the dog got a steak. Then a pattern started to develop: more and more frequently, children were being rescued by the dog. Growing suspicious, the town even set up a dedicated neighborhood watch to catch whoever was behind the incidents in the act. The truth soon surfaced. The dog was pushing children into the water because he knew a rescue would lead to a great reward. He engineered the circumstances to make it happen. This is a classic example of what we now call reward hacking, and it's something an AI can learn…