Detecting and Quantifying ML Model Degradation in Production

Detecting and Quantifying ML Model Degradation

Question

You have a forecasting ML model in production.

After several months of live operation, you notice that the model has been degrading in terms of predictive accuracy.

You have to introduce tools to detect and quantify the problem.

What is the best way to do that?

Answers

Explanations

Click on the arrows to vote for the correct answer

A. B. C. D.

Answer: C.

Option A is incorrect because so that you can use Azure's data drift monitoring capabilities, registered datasets must be used, i.e.

the direct load functionality of the ImportData module doesn't fit for the solution.

Option B is incorrect because DataDriftDetector needs two registered datasets which it can use to compare from time to time, in order to determine any significant change in the profile of the data.

Option C is CORRECT because in order to provide the early detection of drifting data, you need two registered datasets, a baseline and a target which can then be regularly compared and evaluated by using the Azure ML DataDriftDetector class.

Option D is incorrect because you don't need to write your own script.

Use Azure ML DataDriftMonitor instead, which is specifically designed for this purpose.

Reference:

The best way to detect and quantify the problem of a degrading forecasting ML model in production is to use data drift detection techniques. Data drift refers to the change in the underlying distribution of the data over time, which can cause the model's performance to degrade.

Option A, which suggests collecting new data and adding a new ImportData step to the pipeline, does not address the problem of detecting data drift. It only provides the model with more data to work with, which may or may not improve its performance.

Option B is a better choice because it involves collecting new data and adding it to the training dataset to retrain the model. This step is important because it allows the model to learn from the new data and adjust its parameters accordingly. Additionally, configuring a DataDriftDetector will enable you to monitor changes in the data distribution and alert you to any potential problems with the model's performance. By comparing the performance of the original model with the retrained model, you can assess whether the new data has affected the model's accuracy and adjust accordingly.

Option C is similar to Option B, but instead of adding the new data to the training dataset, it suggests using the training dataset as a baseline and registering the new data as a target dataset. This approach can also help identify changes in the data distribution, but it may not be as effective at improving the model's performance as retraining the model with the new data.

Option D involves collecting new data as a new version of the training dataset, profiling and comparing them using a Python script, retraining the model if necessary, and publishing the new model. While this approach can be effective at detecting and addressing data drift, it is more complex and time-consuming than Options B and C.

In summary, the best approach to detect and quantify the problem of a degrading forecasting ML model in production is to use data drift detection techniques, such as configuring a DataDriftDetector, and retraining the model with the new data. Option B is the best choice because it addresses these key steps in a simple and effective way.