From the course: Security Risks in AI and Machine Learning: Categorizing Attacks and Failure Modes

Dataset threat model

- [Instructor] If you've ever accidentally taught your autocorrect system that a typo is an actual word and then had to spend months correcting that typo every time your system tried to insert it, you know how frustrating it can be when machines learn the wrong thing. If data used to train an AI or provide context isn't accurate, the outcomes won't be either. That's why it's important to vet datasets and implement dataset hygiene policies. If your usage involves training, either training a new model, retraining an existing one, or fine-tuning a foundation model, thinking through data hygiene, integrity, and protection controls is extremely important because biased data leads to biased AI. But bias isn't always intentional. Consider an automated faucet that is programmed to turn on when the computer vision recognizes human hands in front of the faucet. If it is trained on large adult human hands of a particular skin color,…
