From the course: MLOps Essentials: Model Development and Integration

Automated data validation

- [Instructor] Automated data validation should be a key feature of any data pipeline. Typically, the data-processing logic is decided based on the initial set of data used by data scientists. Those assumptions carry over to the first model that the data scientists build and deploy to production. After the model is deployed, new data is continuously acquired and processed by automated data pipelines. In some cases, AutoML is also used to create new models. It then becomes imperative to validate new data extensively to ensure that it does not deviate from the patterns in the initial training set.

What should be validated in a data pipeline? To begin, basic feature validation should be done, including checks for missing or erroneous data. Data formats and ranges should also be checked. Even with machine-generated data such as telemetry, errors can happen if there are issues with the source. Next comes data distribution…
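The basic feature checks mentioned above (missing values, formats, and ranges) can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's implementation: the field names, the temperature limits, the timestamp format, and the `validate_record` helper are all assumptions made up for the example.

```python
import math
from datetime import datetime

# Hypothetical schema for a telemetry record. Field names, limits, and the
# timestamp format are assumptions chosen for illustration only.
SCHEMA = {
    "device_id": {"type": str},
    "temperature_c": {"type": float, "min": -40.0, "max": 125.0},
    "recorded_at": {"type": str, "format": "%Y-%m-%dT%H:%M:%S"},
}


def validate_record(record):
    """Return a list of problems found in one record (empty if it is clean)."""
    errors = []
    for field, rule in SCHEMA.items():
        value = record.get(field)

        # Missing data: absent key, None, empty string, or NaN.
        is_nan = isinstance(value, float) and math.isnan(value)
        if value is None or value == "" or is_nan:
            errors.append(f"{field}: missing")
            continue

        # Erroneous data: wrong type for this field.
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
            continue

        # Range checks for numeric fields.
        if "min" in rule and value < rule["min"]:
            errors.append(f"{field}: below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{field}: above maximum {rule['max']}")

        # Format check for string fields that must parse as a timestamp.
        if "format" in rule:
            try:
                datetime.strptime(value, rule["format"])
            except ValueError:
                errors.append(f"{field}: bad format")
    return errors


good = {"device_id": "a1", "temperature_c": 21.5,
        "recorded_at": "2024-01-01T00:00:00"}
bad = {"device_id": "", "temperature_c": 500.0, "recorded_at": "01/01/2024"}

good_errors = validate_record(good)
bad_errors = validate_record(bad)
```

In a pipeline, records that return a non-empty error list could be quarantined or logged rather than passed on to training; which action fits depends on the pipeline.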
