Predictive Model for Estimating Delay Times in Public Transportation Routes

End-to-End Architecture for Predictive Model in Public Transportation

Question

You work for a public transportation company and need to build a model to estimate delay times for multiple transportation routes.

Predictions are served directly to users in an app in real time.

Because seasonal variation and population growth affect the relevance of the data, you will retrain the model every month.

You want to follow Google-recommended best practices.

How should you configure the end-to-end architecture of the predictive model?

Answers

Explanations



Correct answer: A.

The problem at hand is to build a model that can estimate delay times for multiple transportation routes and serve predictions in real-time to users via a mobile app. As the data is subject to change with different seasons and population increases, the model needs to be retrained every month.

To follow Google-recommended best practices for configuring the end-to-end architecture of the predictive model, we need to consider the following aspects:

  1. Scalability: The architecture should be able to handle large amounts of data and user requests without compromising on performance.
  2. Flexibility: The architecture should allow for easy modifications and retraining of the model.
  3. Automation: The architecture should automate the process of retraining and deploying the model.

With these considerations in mind, let's evaluate each of the given options:

A. Configure Kubeflow Pipelines to schedule your multi-step workflow from training to deploying your model.

Kubeflow Pipelines is a platform for building, deploying, and managing end-to-end ML workflows. It allows users to create and schedule multi-step workflows for training, testing, and deployment. Using Kubeflow Pipelines, we can automate the entire process of retraining and deploying the model every month. This option is scalable and flexible, and it provides a high degree of automation.
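To make the multi-step structure concrete, here is a minimal plain-Python sketch of the kind of train-to-deploy workflow that Kubeflow Pipelines would orchestrate on a monthly schedule. The step functions, data, and names (`load_data`, `train_model`, and so on) are illustrative placeholders, not the Kubeflow Pipelines SDK; in a real pipeline each step would be a containerized component and the chain would be expressed as a pipeline DAG with a recurring run.

```python
# Illustrative sketch only: a linear load -> train -> evaluate -> deploy
# workflow of the shape Kubeflow Pipelines would schedule and orchestrate.

def load_data():
    # Placeholder for pulling the latest route/delay records.
    return [(10, 2.0), (20, 4.1), (30, 5.9)]  # (route_length_km, delay_min)

def train_model(rows):
    # Toy "model": average delay per km, standing in for a real trainer step.
    slope = sum(delay / length for length, delay in rows) / len(rows)
    return {"slope": slope}

def evaluate_model(model, rows):
    # Mean absolute error of the toy model on the training rows.
    errors = [abs(model["slope"] * length - delay) for length, delay in rows]
    return sum(errors) / len(errors)

def deploy_model(model, mae, max_mae=1.0):
    # Gate deployment on the evaluation metric, as a pipeline step would.
    return {"deployed": mae <= max_mae, "model": model}

def run_pipeline():
    rows = load_data()
    model = train_model(rows)
    mae = evaluate_model(model, rows)
    return deploy_model(model, mae)
```

The value of expressing this as a pipeline rather than a single script is that each step can be retried, monitored, and versioned independently, and the whole chain can be re-run on a monthly recurring schedule without manual intervention.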

B. Use a model trained and deployed on BigQuery ML, and trigger retraining with the scheduled query feature in BigQuery.

BigQuery ML is a machine learning service that lets users build and train ML models using SQL, and its scheduled query feature can re-run the training statement every month. However, this option is less flexible than Kubeflow Pipelines: it restricts the choice of model types and preprocessing techniques to what BigQuery ML supports, and BigQuery is oriented toward batch analytics, so serving low-latency predictions directly to users of a real-time app is awkward without additional infrastructure.
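For concreteness, the retraining in this option is just a SQL statement that a scheduled query re-runs every month. The sketch below assembles such a statement as a Python string; the dataset, table, and column names (`transit.delay_model`, `transit.trips`, `delay_minutes`) are hypothetical.

```python
# Sketch: the CREATE OR REPLACE MODEL statement that a monthly scheduled
# query in BigQuery would execute to retrain a BigQuery ML regression model.
# All dataset/table/column names here are made up for illustration.

def build_retraining_sql(dataset="transit", model="delay_model",
                         source_table="trips", label="delay_minutes"):
    return f"""
CREATE OR REPLACE MODEL `{dataset}.{model}`
OPTIONS (
  model_type = 'linear_reg',
  input_label_cols = ['{label}']
) AS
SELECT
  route_id,
  hour_of_day,
  day_of_week,
  {label}
FROM `{dataset}.{source_table}`
WHERE trip_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 12 MONTH)
""".strip()
```

Because `CREATE OR REPLACE MODEL` overwrites the model in place, re-running this statement on a schedule is all the "retraining automation" BigQuery ML needs, which is also why the option is simple but constrained.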

C. Write a Cloud Functions script that launches a training and deploying job on AI Platform that is triggered by Cloud Scheduler.

Cloud Functions is a serverless platform that allows users to write and deploy code in response to events. Using Cloud Functions, we can write a script that launches a training and deployment job on AI Platform, triggered by Cloud Scheduler, a fully managed cron service for scheduling jobs in the cloud. This option is scalable and flexible, but it stitches the workflow together with custom glue code, so it may require more effort to set up and maintain than Kubeflow Pipelines.
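As a sketch of what such a handler might do, the function below assembles the request body for an AI Platform training job when the monthly Cloud Scheduler trigger fires. The field names follow the general shape of the AI Platform training-job API, but the bucket, module, and region values are hypothetical, and the actual API call is stubbed out.

```python
import datetime

# Sketch of a Cloud Functions handler triggered monthly by Cloud Scheduler.
# It builds the request body for an AI Platform training job; the bucket and
# module names are hypothetical, and submit_job() stands in for the real
# API client call (projects.jobs.create).

def build_training_job(now=None):
    now = now or datetime.datetime.utcnow()
    job_id = f"delay_model_{now:%Y%m}"  # one uniquely named job per month
    return {
        "jobId": job_id,
        "trainingInput": {
            "scaleTier": "BASIC",
            "region": "us-central1",
            "packageUris": ["gs://example-bucket/trainer/trainer.tar.gz"],
            "pythonModule": "trainer.task",
        },
    }

def submit_job(body):
    # Stub: the real function would submit the body via the AI Platform API.
    return {"state": "QUEUED", "jobId": body["jobId"]}

def handler(event, context=None):
    # Entry point invoked by the Cloud Scheduler trigger.
    return submit_job(build_training_job())
```

The per-month job ID illustrates the main maintenance burden of this option: uniqueness, retries, and deployment after training all have to be handled in this glue code rather than by an orchestrator.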

D. Use Cloud Composer to programmatically schedule a Dataflow job that executes the workflow from training to deploying your model.

Cloud Composer is a fully managed workflow orchestration service that allows users to author, schedule, and monitor workflows. Using Cloud Composer, we can programmatically schedule a Dataflow job that executes the workflow from training to deploying the model. This option is scalable and flexible, but it may also require more effort to set up and maintain than Kubeflow Pipelines.
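A Composer-managed workflow is ultimately an Airflow DAG with a schedule, and a monthly retraining cadence is typically expressed as a cron string such as "0 2 1 * *" (02:00 on the first of each month). The sketch below checks a cron expression of that form against a timestamp using only the standard library; it is a deliberately simplified matcher for illustration (numeric fields and "*" only), not Airflow's scheduler.

```python
import datetime

# Simplified cron matcher for illustration: supports only numeric fields
# and "*". "0 2 1 * *" means 02:00 on day 1 of every month -- the kind of
# schedule a Cloud Composer (Airflow) DAG would use for monthly retraining.
# Note: real cron counts weekday 0 as Sunday; Python's weekday() counts
# Monday as 0. With "*" in the weekday field the difference does not matter.

def cron_matches(expr, when):
    minute, hour, day, month, weekday = expr.split()
    fields = [
        (minute, when.minute),
        (hour, when.hour),
        (day, when.day),
        (month, when.month),
        (weekday, when.weekday()),
    ]
    return all(f == "*" or int(f) == value for f, value in fields)
```

The schedule itself is the easy part; as the explanation above notes, the effort in this option goes into authoring and maintaining the DAG and the Dataflow job it launches.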

Overall, option A, i.e., using Kubeflow Pipelines, seems to be the best choice as it provides scalability, flexibility, and automation, while also being relatively easy to set up and maintain. However, the choice ultimately depends on the specific requirements and constraints of the project.