From the course: MLOps Essentials: Model Development and Integration

Automated data validation

- [Instructor] Automated data validation should be a key feature of any data pipeline. Typically, the data-processing logic is decided based on the initial set of data used by data scientists. The assumptions behind that logic carry over to the first model that the data scientists build and deploy to production. After the model is deployed, new data is continuously acquired and processed by automated data pipelines. In some cases, AutoML is also used to create new models. It then becomes imperative to perform extensive validation on new data to ensure that it does not deviate from the patterns in the initial training set. What should be validated in a data pipeline? To begin, basic feature validation should be done, including checks for missing or erroneous data. Data formats and ranges should also be checked. Even with machine-generated data such as telemetry, errors can happen if there are issues with the source. Next comes data distribution…
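The basic feature checks the instructor lists (missing data, formats, ranges) could be sketched roughly as follows. This is a minimal illustration, not the course's implementation: `FeatureRule`, `validate_record`, the feature names, and the bounds are all hypothetical, and a real pipeline would more likely derive its rules from the initial training set or use a dedicated validation library.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Optional


@dataclass
class FeatureRule:
    """Hypothetical rule describing a valid value for one feature."""
    name: str
    parse: Callable[[Any], Any]      # format check: raises ValueError/TypeError on bad input
    min_value: Optional[Any] = None  # inclusive lower bound, if any
    max_value: Optional[Any] = None  # inclusive upper bound, if any


def validate_record(record: dict, rules: list[FeatureRule]) -> list[str]:
    """Return a list of problems found; an empty list means the record passed."""
    problems = []
    for rule in rules:
        raw = record.get(rule.name)
        # Missing-data check: absent key, None, or blank string.
        if raw is None or (isinstance(raw, str) and raw.strip() == ""):
            problems.append(f"{rule.name}: missing")
            continue
        # Format check: the value must parse into the expected type.
        try:
            value = rule.parse(raw)
        except (ValueError, TypeError):
            problems.append(f"{rule.name}: bad format {raw!r}")
            continue
        # Range check against the configured bounds.
        if rule.min_value is not None and value < rule.min_value:
            problems.append(f"{rule.name}: {value} below {rule.min_value}")
        if rule.max_value is not None and value > rule.max_value:
            problems.append(f"{rule.name}: {value} above {rule.max_value}")
    return problems


# Illustrative rules for a telemetry feed; names and bounds are made up.
RULES = [
    FeatureRule("temperature_c", float, min_value=-40.0, max_value=125.0),
    FeatureRule("recorded_at", datetime.fromisoformat),
]

if __name__ == "__main__":
    sample = {"temperature_c": "300", "recorded_at": ""}
    for problem in validate_record(sample, RULES):
        print(problem)
```

Returning a list of problems rather than raising on the first failure lets the pipeline decide what to do with a bad record: drop it, quarantine it, or raise an alert once the failure rate crosses a threshold.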
