Distribution of Consultants and Billing Hours | Data Analysis for Machine Learning | AWS Certified Exam

Distribution of Consultants and Billing Hours

Question

You work as a machine learning specialist for a consulting firm where you analyze data about the consultants who work there in preparation for using the data in your machine learning models.

The features in your data include employee ID, specialty, practice, job description, billing hours, and principal.

The principal attribute is represented as ‘yes’ or ‘no’, indicating whether the consultant has reached principal level.

For your initial analysis, you need to identify the distribution of consultants and their billing hours for the given period.

What visualization best describes this relationship?

Answers

Explanations

A. Scatter chart
B. Histogram
C. Line chart
D. Box plot
E. Bubble chart

Answer: B.

Option A is incorrect.

You are looking for the distribution of a single dimension: the consultants' billing hours.

From the Amazon QuickSight User Guide titled Working with Visual Types in Amazon QuickSight, “A scatter chart shows multiple distributions, i.e., two or three measures for a dimension.”

Option B is correct.

You are looking for the distribution of a single dimension: the consultants' billing hours.

From the Wikipedia article titled Histogram, “A histogram is an accurate representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable.” The continuous variable in this question is billing hours, binned into ranges on the x-axis. The frequency on the y-axis is the number of consultants whose billing hours fall in each range.

Option C is incorrect.

From the Amazon QuickSight User Guide titled Working with Visual Types in Amazon QuickSight, “Use line charts to compare changes in measured values over a period of time.” You are looking for a distribution, not a comparison of changes over a period of time.

Option D is incorrect.

From the Statistics How To article titled Types of Graphs Used in Math and Statistics, “A boxplot, also called a box and whisker plot, is a way to show the spread and centers of a data set. Measures of spread include the interquartile range and the mean of the data set. Measures of the center include the mean or average and median (the middle of a data set).” A box plot summarizes a distribution with a handful of statistics (median, quartiles, and extremes), but it does not show how many consultants fall into each billing-hour range.

You are looking for the full distribution of a single dimension, which a histogram shows in more detail than a box plot's summary.

Option E is incorrect.

From the Wikipedia article titled Bubble Chart, “A bubble chart is a type of chart that displays three dimensions of data.

Each entity with its triplet (v1, v2, v3) of associated data is plotted as a disk that expresses two of the vi values through the disk's xy location and the third through its size.” Once again, you are looking for a distribution of a single dimension, not a distribution on three dimensions.

Reference:

Please see the Amazon QuickSight user guide titled Working with Amazon QuickSight Visuals and the Statistics How To article titled Types of Graphs Used in Math and Statistics.

The best visualization for identifying the distribution of consultants and their billing hours for the given period is a Histogram.

A histogram is a graphical representation of a frequency distribution. It consists of a set of rectangles, where the area of each rectangle corresponds to the frequency or proportion of observations in a particular interval. The horizontal axis of a histogram represents the range of values for the variable being plotted, and the vertical axis represents the frequency or count of observations falling within each range.

In this case, the histogram can be plotted with the billing hours on the horizontal axis and the frequency or count of consultants on the vertical axis. The histogram will provide a visual representation of the distribution of billing hours among the consultants, highlighting patterns such as the most common billing hours, the range of billing hours, and any outliers in the data.
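The binning behind such a histogram can be sketched in a few lines of plain Python. This is only an illustration: the billing-hour values, the 20-hour bin width, and the helper name histogram_counts are assumptions made for the example, not part of the question.

```python
from collections import Counter

# Hypothetical billing hours for ten consultants (illustrative data only).
billing_hours = [112, 125, 131, 138, 140, 144, 147, 152, 158, 171]

BIN_WIDTH = 20  # assumed bin width, in hours


def histogram_counts(values, bin_width):
    """Map each bin's lower edge to the number of values falling in that bin."""
    counts = Counter((int(v) // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


counts = histogram_counts(billing_hours, BIN_WIDTH)

# Text rendering: billing-hour range on the left, one '#' per consultant.
for lower, n in counts.items():
    print(f"{lower:>3}-{lower + BIN_WIDTH - 1:<3} | {'#' * n} ({n})")
```

Each printed row corresponds to one bar: the range is the x-axis bin and the count of consultants is the bar height.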

A scatter plot is not suitable for this analysis, as it is used to show the relationship between two variables, and in this case, we are only interested in the distribution of one variable.

A line chart is not appropriate because it shows how a measured value changes over time or along another ordered variable, whereas here we want to see how many consultants fall into each range of billing hours.

A box plot can be used to show the distribution of a variable, but it does not provide as much detail as a histogram.
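A rough way to see the difference in detail is to compare the five numbers a box plot draws with the per-bin counts a histogram draws. The sketch below uses made-up, deliberately two-humped billing hours. The data, the 20-hour bin width, and the helper names are assumptions for illustration, and statistics.quantiles is used with its default method, so other quartile conventions may give slightly different Q1 and Q3 values.

```python
import statistics

# Hypothetical, deliberately two-humped billing hours (illustrative only).
billing_hours = [80, 82, 85, 88, 90, 91, 160, 162, 165, 168, 170, 171]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max): the values a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


def bin_counts(values, bin_width):
    """Count values per bin, keyed by each bin's lower edge."""
    counts = {}
    for v in values:
        lower = (int(v) // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


summary = five_number_summary(billing_hours)
counts = bin_counts(billing_hours, 20)

print("box plot summary:", summary)
print("histogram counts:", counts)
```

In this made-up sample the median lands between 91 and 160, a range in which no consultant actually sits. The bin counts make the two separate groups and the empty range between them visible, while the five-number summary alone does not.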

A bubble chart is used to display three dimensions of data, which is not required for this analysis.

Therefore, the best visualization for identifying the distribution of consultants and their billing hours for the given period is a histogram.