AWS Certified Machine Learning - Specialty | Data Types for the willRespondToCampaign Attribute


Question

You are a machine learning expert working for a marketing firm.

You are supporting a team of data scientists and marketing managers who are running a marketing campaign.

Your data scientists and marketing managers need to answer the question, “Will this user subscribe to my campaign?” You have been given a dataset in the form of a CSV file with the following columns:

UserId, jobId, jobDescription, educationLevel, campaign, duration, willRespondToCampaign

When you perform feature engineering on this dataset, which of the following data types would you use to define the willRespondToCampaign attribute?

Answers

A. CATEGORICAL

B. TEXT

C. BINARY

D. NUMERIC

Explanations

Answer: C.

Option A is incorrect because you choose the CATEGORICAL data type for an attribute that takes on a limited number of unique string values.

For example, a user name, a region, and a product code are categorical values.

The willRespondToCampaign attribute takes only the values "yes" or "no", so it is binary rather than categorical.

Option B is incorrect because, for each user observation, you are trying to answer “Will this user subscribe to my campaign?” That is a "yes" or "no" answer, which calls for a binary data type, not a text data type.

Option C is correct because you choose the BINARY data type for an attribute that only has two possible values, such as yes or no, or true or false.

The attribute willRespondToCampaign has only two possible answers: yes or no.
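
As a minimal sketch of what treating the attribute as binary can look like in practice, the Python snippet below maps the yes/no labels to 1/0 using only the standard library. The sample rows, the encode_binary helper, and the choice of 1 for "yes" are illustrative assumptions; only the column names come from the question.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the question.
SAMPLE_CSV = """UserId,jobId,jobDescription,educationLevel,campaign,duration,willRespondToCampaign
1001,12,technician,secondary,3,261,no
1002,7,manager,tertiary,1,1042,yes
1003,4,student,secondary,2,98,no
"""

# Assumed encoding: "yes" -> 1, "no" -> 0.
LABEL_MAP = {"yes": 1, "no": 0}


def encode_binary(value: str) -> int:
    """Map a yes/no label to 1/0, rejecting anything outside the two allowed values."""
    key = value.strip().lower()
    if key not in LABEL_MAP:
        raise ValueError(f"not a binary label: {value!r}")
    return LABEL_MAP[key]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
targets = [encode_binary(row["willRespondToCampaign"]) for row in rows]
print(targets)  # [0, 1, 0]
```

Raising an error on any value other than "yes" or "no" is a deliberate choice in this sketch: if a third value such as "maybe" ever appeared, the attribute would no longer be binary and the type decision would need to be revisited.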

Option D is incorrect because the willRespondToCampaign feature holds a "yes" or "no" value, so you should define it as a binary data type, not a numeric data type.

Reference:

Please see the Machine Learning Mastery article titled "Discover Feature Engineering, How to Engineer Features and How to Get Good at It" (https://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/) for a complete description of the schema attributes.

In this scenario, "willRespondToCampaign" is the target (dependent) variable, that is, the value we want to predict. The data type used to define it depends on the values it can contain.

The four options are:

A. Categorical: A categorical variable takes on a limited number of values or categories. If "willRespondToCampaign" had more than two predefined response options, such as "yes," "no," and "maybe," it would be defined as a categorical variable.

B. Text: Text data refers to unstructured data that consists of sentences, paragraphs, or larger bodies of text. This data type would be appropriate if the "willRespondToCampaign" attribute contains a free text response that is not predefined.

C. Binary: A binary variable can take on only two possible values, usually represented as 0 or 1. Because "willRespondToCampaign" holds a "yes/no" response, it should be defined as a binary variable.

D. Numeric: Numeric data refers to data that can be represented as numbers, including integers and real numbers. This data type would be appropriate if the "willRespondToCampaign" attribute contains numerical values, such as scores or ratings.

Categorical and binary are therefore the only plausible candidates, and the choice between them depends on the number of response options. With three or more predefined options, the categorical data type would be appropriate. Because the question asks for a "yes" or "no" answer, the attribute has exactly two possible values, so the binary data type (option C) is the correct choice.
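
The distinctions among the four options can be sketched as a simple heuristic that guesses a data type from a column's observed values. This is an illustrative sketch only: the infer_attribute_type function, the max_categories and max_words thresholds, and the sample values are assumptions, not a rule from the exam or from any AWS service.

```python
def _is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_attribute_type(values, max_categories=20, max_words=3):
    """Guess BINARY, NUMERIC, CATEGORICAL, or TEXT from observed string values.

    The thresholds are arbitrary assumptions chosen for illustration.
    """
    cleaned = [v.strip() for v in values if v.strip()]
    if not cleaned:
        raise ValueError("no non-empty values to inspect")
    distinct = {v.lower() for v in cleaned}

    # Exactly two distinct values (yes/no, true/false, 0/1): binary.
    if len(distinct) == 2:
        return "BINARY"

    # Every value parses as a number: numeric.
    if all(_is_number(v) for v in cleaned):
        return "NUMERIC"

    # A limited set of short string values: categorical.
    if len(distinct) <= max_categories and all(
        len(v.split()) <= max_words for v in distinct
    ):
        return "CATEGORICAL"

    # Anything else is treated as free text.
    return "TEXT"


print(infer_attribute_type(["yes", "no", "no", "yes"]))  # BINARY
print(infer_attribute_type(["yes", "no", "maybe"]))      # CATEGORICAL
print(infer_attribute_type(["261", "1042", "98"]))       # NUMERIC
```

The binary check runs first on purpose, so that a two-valued column coded as 0/1 is reported as binary rather than numeric, which mirrors the reasoning for rejecting option D.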