TerramEarth manufactures heavy equipment for the mining and agricultural industries.
About 80% of their business is from mining and 20% from agriculture.
They currently have over 500 dealers and service centers in 100 countries.
Their mission is to build products that make their customers more productive.
Solution Concept - There are 20 million TerramEarth vehicles in operation that collect 120 fields of data per second.
Data is stored locally on the vehicle and can be accessed for analysis when a vehicle is serviced.
The data is downloaded via a maintenance port.
This same port can be used to adjust operational parameters, allowing the vehicles to be upgraded in the field with new computing modules.
Approximately 200,000 vehicles are connected to a cellular network, allowing TerramEarth to collect data directly.
At a rate of 120 fields of data per second, with 22 hours of operation per day, TerramEarth collects a total of about 9 TB/day from these connected vehicles.
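As a rough sanity check on that figure, the daily volume can be reconstructed with a quick back-of-the-envelope calculation; the bytes-per-field value below is an assumption, since the case study only states the per-second field rate, the operating hours, the vehicle count, and the daily total.

```python
# Rough check of the ~9 TB/day figure for the 200,000 connected vehicles.
# bytes_per_field is an assumed average; it is not given in the case study.
connected_vehicles = 200_000
fields_per_second = 120
hours_per_day = 22
bytes_per_field = 5  # assumption: ~5 bytes per encoded field

fields_per_day = connected_vehicles * fields_per_second * hours_per_day * 3600
bytes_per_day = fields_per_day * bytes_per_field

print(f"{bytes_per_day / 1e12:.1f} TB/day")  # ~9.5 TB/day under these assumptions
```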
Existing Technical Environment - TerramEarth's existing architecture is composed of Linux and Windows-based systems that reside in a single U.S. west coast-based data center.
These systems gzip CSV files from the field, upload them via FTP, and place the data in the data warehouse.
Because this process takes time, aggregated reports are based on data that is 3 weeks old.
With this data, TerramEarth has been able to preemptively stock replacement parts and reduce unplanned downtime of their vehicles by 60%.
However, because the data is stale, some customers are without their vehicles for up to 4 weeks while they wait for replacement parts.
Business Requirements -
- Decrease unplanned vehicle downtime to less than 1 week.
- Support the dealer network with more data on how their customers use their equipment to better position new products and services.
- Have the ability to partner with different companies, especially with seed and fertilizer suppliers in the fast-growing agricultural business, to create compelling joint offerings for their customers.

Technical Requirements -
- Expand beyond a single datacenter to decrease latency to the American midwest and east coast.
- Create a backup strategy.
- Increase security of data transfer from equipment to the datacenter.
- Improve data in the data warehouse.
- Use customer and equipment data to anticipate customer needs.

Application 1: Data ingest - A custom Python application reads uploaded datafiles from a single server and writes them to the data warehouse.
Compute: Windows Server 2008 R2 - 16 CPUs - 128 GB of RAM - 10 TB local HDD storage

Application 2: Reporting - An off-the-shelf application that business analysts use to run a daily report to see what equipment needs repair.
Only 2 analysts of a team of 10 (5 west coast, 5 east coast) can connect to the reporting application at a time.
Compute: Off-the-shelf application. License tied to number of physical CPUs - Windows Server 2008 R2 - 16 CPUs - 32 GB of RAM - 500 GB HDD

Data warehouse: A single PostgreSQL server - RedHat Linux - 64 CPUs - 128 GB of RAM - 4x 6 TB HDD in RAID 0

Executive Statement -

What should you do?
Suggested answer: D
The business requirements state that TerramEarth needs to decrease unplanned vehicle downtime to less than 1 week, support their dealer network with more data on how their customers use their equipment to better position new products and services, and have the ability to partner with different companies to create compelling joint offerings for their customers. The technical requirements state that they need to expand beyond a single data center to decrease latency to the American midwest and east coast, create a backup strategy, increase security of data transfer from equipment to the data center, and improve data in the data warehouse. The executive statement does not give any specific requirements or constraints, but it is safe to assume that TerramEarth is looking for a solution that is cost-effective, scalable, and reliable.
Option A suggests setting up a streaming Cloud Dataflow job to receive data from the ingestion process, and then cleaning the data in a Cloud Dataflow pipeline. Cloud Dataflow is a fully-managed service for transforming and enriching data in stream (real-time) and batch (historical) modes. This option seems appropriate for TerramEarth's use case as they are dealing with a large volume of data from connected vehicles, and need to process the data in near real-time to reduce downtime. However, this option does not address the other technical requirements such as expanding beyond a single data center and increasing security of data transfer.
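For concreteness, here is a minimal sketch of what Option A could look like as a streaming Apache Beam (Dataflow) pipeline. The Pub/Sub topic, BigQuery table, and cleaning rules are illustrative assumptions, not details given in the question.

```python
# Hypothetical Option A sketch: a streaming Dataflow (Apache Beam) pipeline that
# reads raw vehicle records from Pub/Sub, cleans them, and appends them to BigQuery.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_and_clean(message: bytes):
    """Parse one JSON record and drop rows without a vehicle ID (assumed cleaning rule)."""
    record = json.loads(message.decode("utf-8"))
    if not record.get("vehicle_id"):
        return []  # skip unusable rows
    record["engine_temp_c"] = float(record.get("engine_temp_c", 0.0))
    return [record]


def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/vehicle-telemetry")  # assumed topic
            | "Clean" >> beam.FlatMap(parse_and_clean)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "my-project:telemetry.clean_vehicle_data",  # assumed existing table
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )


if __name__ == "__main__":
    run()
```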
Option B suggests creating a Cloud Function that reads data from BigQuery and cleans it, and triggering the Cloud Function from a Compute Engine instance. Cloud Functions is a serverless platform for running event-driven functions, and BigQuery is a fully-managed data warehouse for storing and querying large datasets. This option seems appropriate for TerramEarth's use case as it allows them to process and clean data in an efficient and scalable manner. However, this option does not address the other technical requirements such as expanding beyond a single data center and increasing security of data transfer.
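A minimal sketch of Option B, assuming an HTTP-triggered Cloud Function and illustrative project, dataset, and table names (none of these are given in the question):

```python
# Hypothetical Option B sketch: a Cloud Function that reads raw rows from BigQuery,
# cleans them with a query, and writes the result to a "clean" table.
from google.cloud import bigquery

RAW_TABLE = "my-project.telemetry.raw_vehicle_data"      # assumed source table
CLEAN_TABLE = "my-project.telemetry.clean_vehicle_data"  # assumed destination table


def clean_telemetry(request):
    """HTTP entry point; a cron job on a Compute Engine instance could call this URL daily."""
    client = bigquery.Client()

    # Push the cleaning logic into a query so the function itself stays small.
    query = f"""
        SELECT
            vehicle_id,
            SAFE_CAST(engine_temp_c AS FLOAT64) AS engine_temp_c,
            reading_time
        FROM `{RAW_TABLE}`
        WHERE vehicle_id IS NOT NULL
    """
    job_config = bigquery.QueryJobConfig(
        destination=CLEAN_TABLE,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    client.query(query, job_config=job_config).result()  # wait for the job to finish
    return "cleaned", 200
```

The Compute Engine trigger could be as simple as a cron entry on the instance that calls the function's HTTPS URL once a day.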
Option C suggests creating a SQL statement on the data in BigQuery, saving it as a view, and running the view daily to save the result to a new table. This option seems appropriate for TerramEarth's use case as it allows them to process and clean data in a simple and efficient manner. However, this option does not address the business requirements of supporting the dealer network with more data on how their customers use their equipment, and does not address the other technical requirements such as expanding beyond a single data center and increasing security of data transfer.
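A minimal sketch of Option C using the BigQuery Python client; the view definition and table names are assumptions, and the daily run could equally be configured as a BigQuery scheduled query instead of client code:

```python
# Hypothetical Option C sketch: save the cleaning SQL as a view, then run the view
# daily and materialize the result into a new table.
from google.cloud import bigquery

client = bigquery.Client()

# One-time setup: store the cleaning SELECT as a view (names are assumed).
view = bigquery.Table("my-project.telemetry.clean_vehicle_data_view")
view.view_query = """
    SELECT
        vehicle_id,
        SAFE_CAST(engine_temp_c AS FLOAT64) AS engine_temp_c
    FROM `my-project.telemetry.raw_vehicle_data`
    WHERE vehicle_id IS NOT NULL
"""
client.create_table(view, exists_ok=True)

# Daily job: query the view and save the result to a destination table.
job_config = bigquery.QueryJobConfig(
    destination="my-project.telemetry.clean_vehicle_data_daily",
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query(
    "SELECT * FROM `my-project.telemetry.clean_vehicle_data_view`",
    job_config=job_config,
).result()
```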
Option D suggests using Cloud Dataprep to configure the BigQuery tables as the source, and scheduling a daily job to clean the data. Cloud Dataprep is a serverless data preparation service that allows users to visually explore, clean, and transform data in BigQuery. This option seems appropriate for TerramEarth's use case as it allows them to process and clean data in a simple and efficient manner. Additionally, Cloud Dataprep can be integrated with other GCP services such as Cloud Composer and Cloud Functions to automate the data preparation process. Because the data is processed in Google Cloud rather than funneled back to the single on-premises data center, this option is also consistent with the technical requirements of expanding beyond a single data center and increasing the security of data transfer. However, it does not address the business requirement of supporting the dealer network with more data on how their customers use their equipment.
Based on the given information, option D seems to be the most appropriate solution as it addresses most of the technical requirements, and provides an efficient and scalable way to clean the data. However, TerramEarth may need to consider additional solutions to address the business requirements of supporting the dealer network with more data on how their customers use their equipment.