A machine learning (ML) specialist is using Amazon SageMaker hyperparameter optimization (HPO) to improve a model's
accuracy. The learning rate parameter is specified in the following HPO configuration:
During the results analysis, the ML specialist determines that most of the training jobs had a learning rate between 0.01 and
0.1. The best result had a learning rate of less than 0.01. Training jobs need to run regularly over a changing dataset. The
ML specialist needs to find a tuning mechanism that uses different learning rates more evenly from the provided range
between MinValue and MaxValue.
Which solution provides the MOST accurate result?
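As an illustration of why the jobs clustered between 0.01 and 0.1, the sketch below compares drawing a learning rate uniformly on a linear scale with drawing it uniformly on a logarithmic scale. The HPO configuration is not reproduced in this excerpt, so the bounds `LO` and `HI` are hypothetical stand-ins for MinValue and MaxValue, and plain random sampling stands in for whatever search strategy the tuner uses. SageMaker parameter ranges accept a ScalingType setting (for example Logarithmic); the code does not call SageMaker and only shows the effect of the scale.

```python
import math
import random


def sample_linear(rng, lo, hi):
    """Draw one value uniformly on a linear scale between lo and hi."""
    return rng.uniform(lo, hi)


def sample_log(rng, lo, hi):
    """Draw one value uniformly in log space, so each decade is equally likely."""
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))


def fraction_below(samples, threshold):
    """Share of samples that are strictly below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)


# Hypothetical bounds; the real MinValue and MaxValue are not shown in the question.
LO, HI = 0.0001, 0.1

rng = random.Random(0)  # fixed seed so the run is repeatable
linear_samples = [sample_linear(rng, LO, HI) for _ in range(10_000)]
log_samples = [sample_log(rng, LO, HI) for _ in range(10_000)]

linear_frac = fraction_below(linear_samples, 0.01)
log_frac = fraction_below(log_samples, 0.01)

print(f"linear scale, share below 0.01: {linear_frac:.2f}")
print(f"log scale, share below 0.01:    {log_frac:.2f}")
```

With these assumed bounds, the linear scale puts roughly one sample in ten below 0.01, while the logarithmic scale puts roughly two in three there, because two of the three decades in the range lie below 0.01.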
A Marketing Manager at a pet insurance company plans to launch a targeted marketing campaign on social media to acquire
new customers. Currently, the company has the following data in Amazon Aurora:
Profiles for all past and existing customers
Profiles for all past and existing insured pets
Premiums received
Claims paid
What steps should be taken to implement a machine learning model to identify potential new customers on social media?
A retail company intends to use machine learning to categorize new products. A labeled dataset of current products was
provided to the Data Science team. The dataset includes 1,200 products. The labeled dataset has 15 features for each
product such as title, dimensions, weight, and price. Each product is labeled as belonging to one of six categories such as
books, games, electronics, and movies.
Which model should be used for categorizing new products using the provided dataset for training?
A Data Scientist is developing a machine learning model to classify whether a financial transaction is fraudulent. The labeled
data available for training consists of 100,000 non-fraudulent observations and 1,000 fraudulent observations.
The Data Scientist applies the XGBoost algorithm to the data, resulting in the following confusion matrix when the trained
model is applied to a previously unseen validation dataset. The accuracy of the model is 99.1%, but the Data Scientist needs
to reduce the number of false negatives.
Which combination of steps should the Data Scientist take to reduce the number of false negative predictions by the model?
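The class counts in the question allow a short arithmetic check, sketched below. The confusion matrix itself is not reproduced in this excerpt, so the scores and labels in the threshold example are hypothetical. The ratio of negative to positive examples is the value the XGBoost documentation suggests as a typical starting point for `scale_pos_weight`; whether that setting is part of the intended answer is not stated here.

```python
NEGATIVES = 100_000  # non-fraudulent observations, from the question
POSITIVES = 1_000    # fraudulent observations, from the question


def majority_baseline_accuracy(neg, pos):
    """Accuracy of a model that always predicts the larger class."""
    return max(neg, pos) / (neg + pos)


def imbalance_ratio(neg, pos):
    """Negative-to-positive ratio, a common starting value for class weighting."""
    return neg / pos


def false_negatives(scores, labels, threshold):
    """Count positives (label 1) whose score falls below the decision threshold."""
    return sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)


baseline = majority_baseline_accuracy(NEGATIVES, POSITIVES)
ratio = imbalance_ratio(NEGATIVES, POSITIVES)

# Hypothetical predicted fraud probabilities and true labels, for illustration only.
scores = [0.9, 0.4, 0.2, 0.7, 0.1]
labels = [1, 1, 1, 0, 0]
fn_default = false_negatives(scores, labels, 0.5)
fn_lowered = false_negatives(scores, labels, 0.3)

print(f"always-negative baseline accuracy: {baseline:.3f}")
print(f"negative/positive ratio: {ratio:.0f}")
print(f"false negatives at 0.5: {fn_default}, at 0.3: {fn_lowered}")
```

A model that never predicts fraud already reaches about 99.0% accuracy on this class split, so the reported 99.1% says little about how many fraudulent transactions are caught. Lowering the decision threshold can only keep or reduce the false-negative count, at the cost of more false positives.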
A data engineer at a bank is evaluating a new tabular dataset that includes customer data. The data engineer will use the
customer data to create a new model to predict customer behavior. After creating a correlation matrix for the variables, the
data engineer notices that many of the 100 features are highly correlated with each other.
Which steps should the data engineer take to address this issue? (Choose two.)
Reference: https://royalsocietypublishing.org/doi/10.1098/rsta.2015.0202 https://scikit-
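One common way to deal with highly correlated features is to drop one feature from each strongly correlated pair; principal component analysis is another. The sketch below shows only the first idea, on a tiny hypothetical dataset. The column names, the values, and the 0.9 threshold are assumptions made for illustration and do not come from the question.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists.

    Assumes neither list is constant; a constant column would divide by zero.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep a column only if it is not strongly correlated with any kept column."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical features: the second column is an exact multiple of the first.
columns = {
    "balance": [1.0, 2.0, 3.0, 4.0, 5.0],
    "balance_x2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "visits": [3.0, 1.0, 4.0, 1.0, 5.0],
}

kept = drop_correlated(columns)
print(kept)
```

The greedy pass depends on column order: the first column of a correlated group is the one that survives.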
A machine learning (ML) specialist is administering a production Amazon SageMaker endpoint with model monitoring
configured. Amazon SageMaker Model Monitor detects violations on the SageMaker endpoint, so the ML specialist retrains
the model with the latest dataset. This dataset is statistically representative of the current production traffic. The ML
specialist notices that even after deploying the new SageMaker model and running the first monitoring job, the SageMaker
endpoint still has violations.
What should the ML specialist do to resolve the violations?
An online reseller has a large, multi-column dataset with one column missing 30% of its data. A Machine Learning Specialist
believes that certain columns in the dataset could be used to reconstruct the missing data.
Which reconstruction approach should the Specialist use to preserve the integrity of the dataset?
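The idea of reconstructing a column from other columns can be sketched as fitting a model on the rows where the value is present and predicting it where it is missing. The example below is a deliberately simplified, hypothetical case with one predictor column and a least-squares line. The data, and the choice of a single linear predictor, are assumptions made for illustration and not the dataset or method from the question.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def impute(predictor, target):
    """Fill None entries in target using a line fitted on the rows that are present."""
    known = [(x, y) for x, y in zip(predictor, target) if y is not None]
    slope, intercept = fit_line([x for x, _ in known], [y for _, y in known])
    return [
        y if y is not None else intercept + slope * x
        for x, y in zip(predictor, target)
    ]


# Hypothetical columns: target is missing in two of five rows.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [2.0, None, 6.0, 8.0, None]

filled = impute(predictor, target)
print(filled)
```

Observed values are left untouched; only the missing entries are replaced by predictions, which keeps the relationship between columns instead of flattening it the way a single mean or median fill would.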
A company wants to create a data repository in the AWS Cloud for machine learning (ML) projects. The company wants to
use AWS to perform complete ML lifecycles and wants to use Amazon S3 for the data storage. All of the company's data
currently resides on premises and is 40 in size.
The company wants a solution that can transfer and automatically update data between the on-premises object storage and
Amazon S3. The solution must support encryption, scheduling, monitoring, and data integrity validation.
Which solution meets these requirements?
Configure DataSync to make an initial copy of your entire dataset, and schedule subsequent incremental transfers of
changing data until the final cut-over from on-premises to AWS. Reference: https://aws.amazon.com/datasync/faqs/
A manufacturing company asks its machine learning specialist to develop a model that classifies defective parts into one of
eight defect types. The company has provided roughly 100,000 images per defect type for training. During the initial training
of the image classification model, the specialist notices that the validation accuracy is 80%, while the training accuracy is
90%. It is known that human-level performance for this type of image classification is around 90%.
What should the specialist consider to fix this issue?
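The three accuracy figures in the question can be compared with a small calculation, sketched below. The terms "avoidable bias" (human-level minus training accuracy) and "variance" (training minus validation accuracy) follow a common diagnostic convention that the question itself does not name, so treat the labels as an assumption.

```python
def diagnose(train_acc, val_acc, human_acc):
    """Return (avoidable_bias, variance) in percentage points.

    avoidable_bias: gap between human-level and training accuracy.
    variance: gap between training and validation accuracy.
    """
    avoidable_bias = human_acc - train_acc
    variance = train_acc - val_acc
    return avoidable_bias, variance


# Figures from the question, as whole percentages.
TRAIN_ACC, VAL_ACC, HUMAN_ACC = 90, 80, 90

avoidable_bias, variance = diagnose(TRAIN_ACC, VAL_ACC, HUMAN_ACC)
dominant = "variance" if variance > avoidable_bias else "bias"

print(f"avoidable bias: {avoidable_bias} points")
print(f"variance: {variance} points")
print(f"larger gap: {dominant}")
```

Training accuracy already matches the stated human-level figure, so the 10-point gap to validation accuracy is the larger of the two gaps.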
The displayed graph is from a forecasting model for testing a time series.
Considering the graph only, which conclusion should a Machine Learning Specialist make about the behavior of the model?