Marqueguard Analytics: Text Search and Analytics Solution | AWS Certified Big Data Specialty Exam

Question

Marqueguard is a social media monitoring company headquartered in Brighton, England.

Marqueguard sells three different products: Analytics, Audiences, and Insights.

Marqueguard Analytics is a "self-serve application" or software as a service, which archives social media data in order to provide companies with information and the means to track specific segments to analyze their brands' online presence. The tool's coverage includes blogs, news sites, forums, videos, reviews, images and social networks such as Twitter and Facebook.

Users can search data by using Text and Image Search, and use charting, categorization, sentiment analysis and other features to provide further information and analysis.

Marqueguard has access to over 80 million sources. It is looking for a managed service that makes it easy to deploy, operate, and scale “Search Service” clusters in the AWS Cloud. The service must support an open-source text search and analytics engine for use cases such as log analytics, real-time application monitoring, and clickstream analysis, and it must integrate seamlessly with web applications. Which service would let Marqueguard perform text search and analytics? Select 1 option.

Answers

Explanations

A. Amazon EMR

B. Amazon Elasticsearch Service

C. Amazon DynamoDB

D. Amazon Redshift

Answer: B.

Option A is incorrect: Amazon EMR does not provide search as a managed service.

Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data.

By using these frameworks and related open-source projects, such as Apache Hive and Apache Pig, you can process data for analytics purposes and business intelligence workloads.

Additionally, you can use Amazon EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html

Option B is correct: Amazon Elasticsearch Service (Amazon ES) is a managed service that makes it easy to deploy, operate, and scale Elasticsearch clusters in the AWS Cloud.

Elasticsearch is a popular open-source search and analytics engine for use cases such as log analytics, real-time application monitoring, and clickstream analysis.

With Amazon ES, you get direct access to the Elasticsearch APIs; existing code and applications work seamlessly with the service.

https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/what-is-amazon-elasticsearch-service.html
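Because the service exposes the Elasticsearch APIs directly, a text search is an HTTP request against the index's `_search` endpoint. The following is a minimal sketch of how such a request could be assembled. The domain endpoint, the index name `mentions`, and the field name `text` are hypothetical and do not come from the question.

```python
import json

# Hypothetical values: the endpoint, index name, and field name are
# assumptions for illustration, not taken from the question.
ENDPOINT = "https://search-example-domain.eu-west-2.es.amazonaws.com"
INDEX = "mentions"


def build_search_request(phrase, size=10):
    """Return (url, body) for a full-text `match` query on the _search API."""
    url = f"{ENDPOINT}/{INDEX}/_search"
    body = {
        "size": size,                          # maximum number of hits to return
        "query": {"match": {"text": phrase}},  # analyzed full-text match
    }
    return url, json.dumps(body)


url, body = build_search_request("brand launch")
print(url)
print(body)
```

Sending the request is left out on purpose, since how a request must be authenticated depends on the access policy configured for the domain.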

Option C is incorrect: DynamoDB is a key-value and document database, not a search engine.

DynamoDB supports key-based queries and table scans, but it does not provide full-text search.

Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.

DynamoDB lets you offload the administrative burdens of operating and scaling a distributed database, so that you don't have to worry about hardware provisioning, setup and configuration, replication, software patching, or cluster scaling.

Also, DynamoDB offers encryption at rest, which eliminates the operational burden and complexity involved in protecting sensitive data.

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html

Option D is incorrect: Amazon Redshift offers a data warehouse as a service, not a text search engine.

The Amazon Redshift service manages all of the work of setting up, operating, and scaling a data warehouse.

These tasks include provisioning capacity, monitoring and backing up the cluster, and applying patches and upgrades to the Amazon Redshift engine.

https://docs.aws.amazon.com/redshift/latest/mgmt/overview.html

The best option for Marqueguard's requirements is Amazon Elasticsearch Service (Amazon ES), option B.

Elasticsearch is an open-source search and analytics engine that makes it easy to search, analyze, and visualize large volumes of data in near real time. Amazon ES is a fully managed service that simplifies the deployment, operation, and scaling of Elasticsearch clusters in the AWS Cloud. A domain can scale to petabytes of data and to high query volumes by adding nodes to the cluster.

Amazon ES is well suited to use cases such as log analytics, real-time application monitoring, and clickstream analysis because it provides near real-time search and analytics capabilities. Elasticsearch is designed to handle both structured and unstructured data, which makes it a good fit for Marqueguard's social media data.
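For log or clickstream data, documents are usually loaded in batches through the Elasticsearch `_bulk` API, which expects newline-delimited JSON: one action line followed by one document line, with a trailing newline at the end. The sketch below only formats such a body. The index name `clickstream` and the event fields are hypothetical, and the action line omits `_type`, which assumes Elasticsearch 7 or later.

```python
import json


def to_bulk_body(index, docs):
    """Format documents as an NDJSON body for the Elasticsearch _bulk API."""
    lines = []
    for doc in docs:
        # Action line: index this document into the given index.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical clickstream events, for illustration only.
events = [
    {"user": "u1", "page": "/pricing", "ts": "2021-01-01T10:00:00Z"},
    {"user": "u2", "page": "/signup", "ts": "2021-01-01T10:00:05Z"},
]
bulk_body = to_bulk_body("clickstream", events)
print(bulk_body)
```

The body would be sent with the `Content-Type: application/x-ndjson` header to the domain's `_bulk` endpoint.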

Amazon ES also integrates with web applications through the standard Elasticsearch REST APIs, enabling Marqueguard to incorporate search and analytics capabilities into its software as a service offering. Additionally, Elasticsearch provides a rich set of features for text search and analytics, including full-text search, faceted search, geospatial search, and more.
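Faceted search in Elasticsearch is typically expressed as a `terms` aggregation alongside the query. As a rough sketch, the request body below counts matching documents per source type. The field names `text` and `source_type` are hypothetical, and the aggregation assumes `source_type` is mapped as a `keyword` field.

```python
import json


def build_facet_query(phrase, facet_field="source_type", buckets=5):
    """Build a search body that returns facet counts instead of hits."""
    return {
        "size": 0,  # skip the hits, return only the aggregation
        "query": {"match": {"text": phrase}},
        "aggs": {
            "by_facet": {
                # Bucket matching documents by the value of facet_field.
                "terms": {"field": facet_field, "size": buckets}
            }
        },
    }


facet_body = build_facet_query("brand launch")
print(json.dumps(facet_body, indent=2))
```

In a response, each bucket under `by_facet` would carry a key (for example a source type such as blog or forum) and a document count, which a web application can render as filter options.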

Amazon EMR, option A, is a managed big data platform that enables users to process vast amounts of data using open-source tools such as Apache Hadoop, Spark, and Hive. While frameworks running on EMR can process text for analytics, EMR is designed primarily for batch processing and does not offer the managed, near real-time search that Marqueguard requires.

Amazon DynamoDB, option C, is a fully managed NoSQL database service that is designed for fast and predictable performance at any scale. While DynamoDB can be used to store and retrieve text data by key, it does not provide full-text search or search analytics capabilities out of the box.
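The difference between key-based retrieval and text search can be illustrated with a toy example. This is a simplified in-memory sketch, not how DynamoDB or Elasticsearch are implemented: a dictionary stands in for a key-value store, and a small inverted index stands in for a search engine. The sample posts are made up.

```python
from collections import defaultdict

# Made-up sample documents keyed by id, standing in for a key-value store.
posts = {
    "p1": "Great launch event for the new phone",
    "p2": "The phone battery is disappointing",
    "p3": "Launch delayed again",
}

# Key-value access works only when the caller already knows the key.
print(posts["p2"])


def build_inverted_index(docs):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, *terms):
    """Return the sorted ids of documents that contain every term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []


index = build_inverted_index(posts)
print(search(index, "phone"))            # ['p1', 'p2']
print(search(index, "launch", "phone"))  # ['p1']
```

A search engine maintains this kind of index, along with analysis and relevance scoring, so that documents can be found by their content rather than by their key.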

Amazon Redshift, option D, is a fully managed data warehouse service that is designed for petabyte-scale data. While Redshift can be used for analytics, it is not designed for text search and may not provide the real-time search capabilities that Marqueguard requires.

In conclusion, Amazon Elasticsearch Service is the best option for Marqueguard's requirements, as it provides text search and analytics with near real-time capabilities and integrates seamlessly with web applications.