You work as a machine learning specialist for a marketing consulting firm.
Your firm has an online retailer as a client that wants to apply different marketing strategies per segment of their customer base.
They have decided that the best way to segment their customers is by their purchase history.
You have all of the online retailer's purchase histories from the last 5 years to use for your machine learning model. Which type of machine learning algorithm would segment customers by purchase history in the most expeditious manner?
A. K-Nearest Neighbors (KNN)
B. K-Means
C. Semantic Segmentation
D. Neural Topic Model (NTM)
Answer: B.
Option A is incorrect.
The k-nearest neighbor algorithm is used to find items that are similar to each other.
This may find purchases that are similar to each other, but not customers that have similar purchase history.
You would have to do additional modeling to use this algorithm.
Option B is correct.
The K-Means algorithm is used to find groups within data where the group members are similar to each other but different from members of other groups.
This is exactly what you are trying to solve: find groups of customers with similar purchase history.
Option C is incorrect.
The semantic segmentation algorithm is used to develop computer vision applications.
You are trying to solve a clustering problem on purchase data, not an image problem, so this algorithm would not work here.
Option D is incorrect.
The Neural Topic Model algorithm is used to group documents into topics using the statistical distribution of words within the documents.
You are trying to cluster customers by purchase history, not group text documents by topic, so this algorithm would not work here.
Reference:
Please see the Amazon SageMaker developer guide titled Using Amazon SageMaker Built-in Algorithms, and the article titled The 5 Clustering Algorithms Data Scientists Need to Know.
The most appropriate algorithm to segment customers based on purchase history in the most expeditious manner would be K-Means clustering.
K-Means clustering is an unsupervised learning algorithm that is particularly useful for segmenting and clustering data. The algorithm groups the data points into K clusters based on their similarity, which is measured by the distance between the data points.
In this case, the online retailer's purchase history can be used as input data. Each customer's purchase history can be represented as a feature vector, where each feature represents the purchase of a particular item. The K-Means algorithm can then be applied to this input data to cluster similar customers together.
The K-Means algorithm is computationally efficient and can handle large datasets. Note that standard K-Means minimizes squared Euclidean distance to the cluster centroids rather than an arbitrary distance metric, so the features should be on comparable scales before clustering.
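The clustering step described above can be sketched in a few lines of plain Python. This is a minimal illustration of the standard K-Means iteration (Lloyd's algorithm), not the SageMaker built-in implementation. The customer data, the three product categories, and the names `k_means`, `purchase_vectors`, and `segments` are all hypothetical and chosen only for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Minimal K-Means (Lloyd's algorithm) returning (assignments, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # No point changed cluster, so the algorithm has converged.
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical data: one row per customer, one column per product category
# (electronics, groceries, apparel), holding purchase counts.
purchase_vectors = [
    [9, 1, 0],
    [8, 0, 1],
    [10, 2, 0],
    [0, 9, 1],
    [1, 8, 0],
    [0, 10, 2],
]

segments, centroids = k_means(purchase_vectors, k=2)
for customer, segment in zip(purchase_vectors, segments):
    print(customer, "-> segment", segment)
```

The cluster numbers themselves carry no meaning; only the grouping does. On real purchase data the feature vectors would usually be scaled first, and K would be chosen by comparing results for several values.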
The other options, K-Nearest Neighbors (KNN), Semantic Segmentation, and Neural Topic Model (NTM), are not suitable for this task.
KNN is a supervised learning algorithm that requires labeled data to predict the class of a new data point based on the similarity with the closest labeled data points. This algorithm is not suitable for segmentation or clustering tasks.
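To make the contrast concrete, here is a small sketch of a KNN classifier in plain Python. It is an illustration under assumed data, not a library API: the function `knn_predict`, the sample vectors, and the segment labels are hypothetical. The point to notice is that every training row must already carry a label, which the retailer's customers do not have.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(labeled_points, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest labeled points."""
    nearest = sorted(
        labeled_points, key=lambda item: squared_distance(item[0], query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: (purchase vector, known segment label).
# KNN cannot run without these labels, which is why it does not fit
# an unlabeled segmentation problem.
labeled_customers = [
    ([9, 1, 0], "electronics-heavy"),
    ([8, 0, 1], "electronics-heavy"),
    ([10, 2, 0], "electronics-heavy"),
    ([0, 9, 1], "grocery-heavy"),
    ([1, 8, 0], "grocery-heavy"),
    ([0, 10, 2], "grocery-heavy"),
]

predicted = knn_predict(labeled_customers, [7, 1, 1], k=3)
print(predicted)
```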
Semantic Segmentation is a computer vision algorithm that is used to segment images into different regions based on semantic meaning. It is not suitable for this task as it is designed for image processing tasks and not customer segmentation.
NTM is a topic modeling algorithm used to extract topics from text data. While purchase history data may contain textual information, NTM is not the most appropriate algorithm for grouping customers by what they bought.