From the course: MLOps Essentials: Model Development and Integration
Automated data validation
- [Instructor] Automated data validation should be a key feature of any data pipeline. Typically, the data-processing logic is designed around the initial dataset that data scientists work with. The assumptions built into that logic carry over to the first model that data scientists build and deploy to production. After the model is deployed, automated data pipelines continuously acquire and process new data, and in some cases AutoML is also used to create new models. It then becomes imperative to validate the new data extensively to ensure that it does not deviate from the patterns in the initial training set. So what should be validated in a data pipeline? To begin, basic feature validation should check for missing or erroneous data. Data formats and value ranges should also be checked. Even machine-generated data, such as telemetry, can contain errors if there are problems at the source. Next comes data distribution…
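As a minimal sketch of the basic feature checks described above (missing values, erroneous formats, and out-of-range values), the Python below validates incoming records against a hand-written schema using only the standard library. Everything here is hypothetical and chosen for illustration: the `FieldRule` class, the `validate_record` and `validate_batch` helpers, the telemetry-style field names `device_id`, `temperature_c`, and `timestamp`, and the temperature bounds. It covers only basic feature validation, not distribution checks.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class FieldRule:
    """Expected type(s) and optional numeric bounds for one feature (hypothetical)."""
    types: tuple
    low: Optional[float] = None
    high: Optional[float] = None


# Hypothetical telemetry schema, for illustration only.
SCHEMA = {
    "device_id": FieldRule(types=(str,)),
    "temperature_c": FieldRule(types=(int, float), low=-40.0, high=125.0),
    "timestamp": FieldRule(types=(str,)),  # must also parse as ISO 8601
}


def validate_record(record: dict) -> list:
    """Return a list of problems found in one record (empty if it passes)."""
    errors = []
    for field, rule in SCHEMA.items():
        value = record.get(field)
        # Missing data: key absent or explicitly None.
        if value is None:
            errors.append(f"{field}: missing")
            continue
        # Erroneous format: wrong type (bool is rejected even though it subclasses int).
        if isinstance(value, bool) or not isinstance(value, rule.types):
            errors.append(f"{field}: unexpected type {type(value).__name__}")
            continue
        # Range: only checked where the rule declares bounds.
        if rule.low is not None and value < rule.low:
            errors.append(f"{field}: {value} below {rule.low}")
        if rule.high is not None and value > rule.high:
            errors.append(f"{field}: {value} above {rule.high}")
    # Format check specific to timestamps.
    ts = record.get("timestamp")
    if isinstance(ts, str):
        try:
            datetime.fromisoformat(ts)
        except ValueError:
            errors.append(f"timestamp: {ts!r} is not ISO 8601")
    return errors


def validate_batch(records: list) -> dict:
    """Map the index of each failing record to its list of problems."""
    report = {}
    for i, record in enumerate(records):
        problems = validate_record(record)
        if problems:
            report[i] = problems
    return report


if __name__ == "__main__":
    batch = [
        {"device_id": "d1", "temperature_c": 21.5, "timestamp": "2024-05-01T12:00:00"},
        {"device_id": "d2", "temperature_c": 900, "timestamp": "2024-05-01T12:00:05"},
        {"device_id": None, "temperature_c": "hot", "timestamp": "yesterday"},
    ]
    print(validate_batch(batch))
```

In this example the first record passes, the second is flagged for an out-of-range temperature, and the third is flagged for a missing device ID, a wrongly typed temperature, and an unparseable timestamp. Returning a per-record report rather than raising on the first error lets a pipeline decide whether to quarantine bad rows or halt the run.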