TerramEarth: Cloud Architecture for Data Management and Analysis


Question

TerramEarth manufactures heavy equipment for the mining and agricultural industries.

About 80% of their business is from mining and 20% from agriculture.

They currently have over 500 dealers and service centers in 100 countries.

Their mission is to build products that make their customers more productive.

Solution Concept - There are 20 million TerramEarth vehicles in operation that collect 120 fields of data per second.

Data is stored locally on the vehicle and can be accessed for analysis when a vehicle is serviced.

The data is downloaded via a maintenance port.

This same port can be used to adjust operational parameters, allowing the vehicles to be upgraded in the field with new computing modules.

Approximately 200,000 vehicles are connected to a cellular network, allowing TerramEarth to collect data directly.

At a rate of 120 fields of data per second, with 22 hours of operation per day, TerramEarth collects a total of about 9 TB/day from these connected vehicles.
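The 9 TB/day figure can be sanity-checked with quick arithmetic. The record size is an assumption (the case study gives 120 fields per second but not their size); roughly 5 bytes per field, i.e. about 600 bytes per vehicle-second, reproduces the stated total:

```python
# Back-of-the-envelope check of the ~9 TB/day ingest figure.
# ASSUMPTION: ~600 bytes per vehicle per second (~5 bytes x 120 fields);
# the case study does not state field sizes.
connected_vehicles = 200_000
hours_per_day = 22
bytes_per_vehicle_second = 600  # assumed

daily_bytes = connected_vehicles * hours_per_day * 3600 * bytes_per_vehicle_second
print(f"{daily_bytes / 1e12:.1f} TB/day")  # -> 9.5 TB/day
```

Under these assumptions the connected fleet produces roughly 9.5 TB/day, consistent with the "about 9 TB/day" stated above.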

Existing Technical Environment - TerramEarth's existing architecture is composed of Linux and Windows-based systems that reside in a single U.S. west coast-based data center.

These systems gzip CSV files from the field, upload them via FTP, and place the data in their data warehouse.

Because this process takes time, aggregated reports are based on data that is 3 weeks old.

With this data, TerramEarth has been able to preemptively stock replacement parts and reduce unplanned downtime of their vehicles by 60%.

However, because the data is stale, some customers are without their vehicles for up to 4 weeks while they wait for replacement parts.

Business Requirements
- Decrease unplanned vehicle downtime to less than 1 week
- Support the dealer network with more data on how their customers use their equipment to better position new products and services
- Have the ability to partner with different companies - especially with seed and fertilizer suppliers in the fast-growing agricultural business - to create compelling joint offerings for their customers

Technical Requirements
- Expand beyond a single datacenter to decrease latency to the American midwest and east coast
- Create a backup strategy
- Increase security of data transfer from equipment to the datacenter
- Improve data in the data warehouse
- Use customer and equipment data to anticipate customer needs

Application 1: Data ingest - A custom Python application reads uploaded datafiles from a single server and writes to the data warehouse.

Compute:
- Windows Server 2008 R2
- 16 CPUs
- 128 GB of RAM
- 10 TB local HDD storage

Application 2: Reporting - An off-the-shelf application that business analysts use to run a daily report to see what equipment needs repair. Only 2 analysts of a team of 10 (5 west coast, 5 east coast) can connect to the reporting application at a time.

Compute:
- Off-the-shelf application; license tied to number of physical CPUs
- Windows Server 2008 R2
- 16 CPUs
- 32 GB of RAM
- 500 GB HDD

Data warehouse:
- A single PostgreSQL server
- RedHat Linux
- 64 CPUs
- 128 GB of RAM
- 4x 6 TB HDD in RAID 0

Executive Statement - Data for European customers must be deleted after a period of 36 months when it contains personal data.

In the new architecture, this data will be stored in both Cloud Storage and BigQuery.

What should you do?

Answers

Explanations


A. Create a BigQuery table for the European data, and set the table retention period to 36 months. For Cloud Storage, enable lifecycle management using a DELETE action with an Age condition of 36 months.

B. Create a BigQuery table for the European data, and set the table retention period to 36 months. For Cloud Storage, use a SetStorageClass to NONE action with an Age condition of 36 months.

C. Create a BigQuery time-partitioned table for the European data, and set the partition expiration period to 36 months. For Cloud Storage, enable lifecycle management using a DELETE action with an Age condition of 36 months.

D. Create a BigQuery time-partitioned table for the European data, and set the partition expiration period to 36 months. For Cloud Storage, use a SetStorageClass to NONE action with an Age condition of 36 months.

C.

The scenario: TerramEarth manufactures heavy equipment and collects a large amount of data from its vehicles. The current architecture funnels field data into a single U.S. data center, and because that pipeline is slow, aggregated reports are based on data that is 3 weeks old. The company wants to decrease unplanned vehicle downtime to less than 1 week, give the dealer network more data to better position new products and services, improve the security of data transfer, expand beyond a single data center to decrease latency to the American midwest and east coast, and create a backup strategy.

In addition, the company must delete data for European customers after a period of 36 months when it contains personal data. In the new architecture this data will be stored in both Cloud Storage and BigQuery, so the 36-month deletion must be enforced in both systems.

The solution must therefore enforce a rolling 36-month retention window in both stores: in BigQuery, by using a time-partitioned table with a partition expiration period of 36 months, so each partition is deleted as it ages out; and in Cloud Storage, by enabling lifecycle management with a DELETE action and an Age condition of 36 months, so each object is deleted 36 months after its creation.
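As a sketch, the Cloud Storage side of this rule can be expressed as a lifecycle configuration (the Age condition is specified in days, so 36 months is approximated here as 1095 days) and applied to a bucket with `gsutil lifecycle set lifecycle.json gs://BUCKET`:

```json
{
  "rule": [
    {
      "action": { "type": "Delete" },
      "condition": { "age": 1095 }
    }
  ]
}
```

Once set, each object in the bucket is deleted automatically 36 months after its creation time.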

Option A creates a non-partitioned BigQuery table and sets a table-level retention (expiration) period of 36 months. A table expiration applies to the table as a whole: the entire table is deleted 36 months after it is created, regardless of how old individual records are. Recently ingested data would be destroyed along with the stale data, and records older than 36 months would survive until the table itself expires. The Cloud Storage half (a lifecycle DELETE action with an Age condition of 36 months) is correct, but the BigQuery half does not meet the requirement.

Option B has the same table-level retention problem as option A, and its Cloud Storage half is also wrong: a SetStorageClass action merely transitions objects to a different (cheaper) storage class, it does not delete them. Personal data would remain stored past 36 months, violating the requirement.

Option C creates a time-partitioned BigQuery table and sets the partition expiration period to 36 months. With partition expiration, each partition is deleted automatically once it reaches 36 months of age, producing a rolling retention window that removes exactly the data that has aged out while leaving newer data intact. For Cloud Storage, it enables lifecycle management using a DELETE action with an Age condition of 36 months, which deletes each object 36 months after creation. Both halves enforce the required deletion, so this is the correct answer.
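A minimal sketch of the BigQuery half in DDL (the dataset, table, and column names are hypothetical, and partition expiration is expressed in days, so 36 months is approximated as 1095 days):

```sql
-- Hypothetical names; each daily partition is dropped automatically
-- once it is 1095 days (~36 months) old.
CREATE TABLE telemetry_eu.vehicle_data (
  vehicle_id STRING,
  event_ts TIMESTAMP,
  payload STRING
)
PARTITION BY DATE(event_ts)
OPTIONS (partition_expiration_days = 1095);
```

Partition expiration is evaluated per partition, independent of when the table was created, which is what gives the rolling 36-month window.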

Option D pairs the correct BigQuery approach (a time-partitioned table with partition expiration) with the incorrect Cloud Storage approach: as in option B, SetStorageClass only changes the storage class and does not delete the data, so personal data in Cloud Storage would be retained beyond 36 months.

In summary, option C is the correct answer: a time-partitioned BigQuery table with a 36-month partition expiration, combined with a Cloud Storage lifecycle DELETE rule with a 36-month Age condition, guarantees that European personal data is removed from both stores once it is 36 months old.