Amazon AWS Certified Data Analytics - Specialty online exam

What students need to know about the AWS Certified Data Analytics - Specialty (DAS-C01) exam

  • Total 164 Questions & Answers

Question 1

A manufacturing company wants to create an operational analytics dashboard to visualize metrics from equipment in near-
real time. The company uses Amazon Kinesis Data Streams to stream the data to other applications. The dashboard must
automatically refresh every 5 seconds. A data analytics specialist must design a solution that requires the least possible
implementation effort.
Which solution meets these requirements?

  • A. Use Amazon Kinesis Data Firehose to store the data in Amazon S3. Use Amazon QuickSight to build the dashboard.
  • B. Use Apache Spark Streaming on Amazon EMR to read the data in near-real time. Develop a custom application for the dashboard by using D3.js.
  • C. Use Amazon Kinesis Data Firehose to push the data into an Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster. Visualize the data by using OpenSearch Dashboards (Kibana).
  • D. Use AWS Glue streaming ETL to store the data in Amazon S3. Use Amazon QuickSight to build the dashboard.
Answer:

C

Explanation:
Reference: https://aws.amazon.com/blogs/big-data/analyze-a-time-series-in-real-time-with-aws-lambda-amazon-kinesis-and-amazon-dynamodb-streams/
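For context, here is a minimal boto3 sketch of option C: Kinesis Data Firehose reads from the existing data stream and delivers into an Amazon OpenSearch Service domain, which an OpenSearch Dashboards (Kibana) visualization can auto-refresh against. All ARNs, names, and buffer values below are illustrative assumptions, not values from the question.

```python
import boto3

firehose = boto3.client("firehose")

# Delivery stream: existing Kinesis data stream -> OpenSearch Service domain.
firehose.create_delivery_stream(
    DeliveryStreamName="equipment-metrics-to-opensearch",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": "arn:aws:kinesis:us-east-1:111122223333:stream/equipment-metrics",
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-read-stream",
    },
    AmazonopensearchserviceDestinationConfiguration={
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-write-domain",
        "DomainARN": "arn:aws:es:us-east-1:111122223333:domain/equipment-metrics",
        "IndexName": "equipment-metrics",
        # Small buffers so documents reach the index quickly; the 5-second
        # auto-refresh itself is configured in OpenSearch Dashboards.
        "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 1},
        "S3BackupMode": "FailedDocumentsOnly",
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::111122223333:role/firehose-write-s3",
            "BucketARN": "arn:aws:s3:::equipment-metrics-delivery-errors",
        },
    },
)
```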


Question 2

An IoT company wants to release a new device that will collect data to track sleep overnight on an intelligent mattress.
Sensors will send data that will be uploaded to an Amazon S3 bucket. About 2 MB of data is generated each night for each
bed. Data must be processed and summarized for each user, and the results need to be available as soon as possible. Part
of the process consists of time windowing and other functions. Based on tests with a Python script, every run will require
about 1 GB of memory and will complete within a couple of minutes.
Which solution will run the script in the MOST cost-effective way?

  • A. AWS Lambda with a Python script
  • B. AWS Glue with a Scala job
  • C. Amazon EMR with an Apache Spark script
  • D. AWS Glue with a PySpark job
Answer:

A
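A minimal sketch of option A follows: an S3-triggered Lambda handler (Python) that reads one night's upload, buckets the readings into fixed time windows, and writes a per-user summary back to S3. The object layout, field names, and 5-minute window size are assumptions made for illustration.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
WINDOW_SECONDS = 300  # tumbling 5-minute windows (assumed)


def handler(event, context):
    # Standard S3 event notification shape: one record per uploaded object.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

    # Bucket each JSON-lines reading into its time window.
    windows = defaultdict(list)
    for line in body.splitlines():
        if not line.strip():
            continue
        reading = json.loads(line)  # e.g. {"ts": 1700000000, "heart_rate": 58}
        window_start = int(reading["ts"]) // WINDOW_SECONDS * WINDOW_SECONDS
        windows[window_start].append(reading["heart_rate"])

    summary = {
        datetime.fromtimestamp(start, tz=timezone.utc).isoformat(): {
            "avg_heart_rate": sum(values) / len(values),
            "samples": len(values),
        }
        for start, values in windows.items()
    }

    # Assumes the S3 trigger is filtered to the raw-data prefix, so this
    # write under summaries/ does not re-invoke the function.
    s3.put_object(
        Bucket=bucket,
        Key=f"summaries/{key.rsplit('/', 1)[-1]}.summary.json",
        Body=json.dumps(summary).encode("utf-8"),
    )
    return {"windows": len(summary)}
```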


Question 3

A retail company has 15 stores across 6 cities in the United States. Once a month, the sales team requests a visualization in
Amazon QuickSight that provides the ability to easily identify revenue trends across cities and stores. The visualization also
helps identify outliers that need to be examined with further analysis.
Which visual type in QuickSight meets the sales team's requirements?

  • A. Geospatial chart
  • B. Line chart
  • C. Heat map
  • D. Tree map
Answer:

A

Explanation:
Reference: https://docs.aws.amazon.com/quicksight/latest/user/geospatial-charts.html


Question 4

A gaming company is building a serverless data lake. The company is ingesting streaming data into Amazon Kinesis Data
Streams and is writing the data to Amazon S3 through Amazon Kinesis Data Firehose. The company is using 10 MB as the
S3 buffer size and is using 90 seconds as the buffer interval. The company runs an AWS Glue ETL job to merge and
transform the data to a different format before writing the data back to Amazon S3.
Recently, the company has experienced substantial growth in its data volume. The AWS Glue ETL jobs are frequently
showing an OutOfMemoryError error.
Which solutions will resolve this issue without incurring additional costs? (Choose two.)

  • A. Place the small files into one S3 folder. Define one single table for the small S3 files in AWS Glue Data Catalog. Rerun the AWS Glue ETL jobs against this AWS Glue table.
  • B. Create an AWS Lambda function to merge small S3 files, and invoke it periodically. Run the AWS Glue ETL jobs after successful completion of the Lambda function.
  • C. Run the S3DistCp utility in Amazon EMR to merge a large number of small S3 files before running the AWS Glue ETL jobs.
  • D. Use the groupFiles setting in the AWS Glue ETL job to merge small S3 files and rerun AWS Glue ETL jobs.
  • E. Update the Kinesis Data Firehose S3 buffer size to 128 MB. Update the buffer interval to 900 seconds.
Answer:

A D

Explanation:
Reference: https://docs.aws.amazon.com/glue/latest/dg/grouping-input-files.html
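The groupFiles behavior referenced above can be expressed directly in the job script. A minimal PySpark sketch follows; the paths, input format, and 128 MB group size are illustrative assumptions rather than values from the question.

```python
# PySpark sketch of option D: group many small S3 objects into larger read units.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# groupFiles/groupSize make Glue coalesce small files into ~128 MB read groups,
# which keeps per-executor memory bounded as the file count grows.
events = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://example-game-events/raw/"],
        "recurse": True,
        "groupFiles": "inPartition",
        "groupSize": str(128 * 1024 * 1024),
    },
    format="json",
)

glue_context.write_dynamic_frame.from_options(
    frame=events,
    connection_type="s3",
    connection_options={"path": "s3://example-game-events/merged/"},
    format="parquet",
)
```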


Question 5

A company using Amazon QuickSight Enterprise edition has thousands of dashboards, analyses, and datasets. The
company struggles to manage and assign permissions for granting users access to various items within QuickSight. The
company wants to make it easier to implement sharing and permissions management.
Which solution should the company implement to simplify permissions management?

  • A. Use QuickSight folders to organize dashboards, analyses, and datasets. Assign individual users permissions to these folders.
  • B. Use QuickSight folders to organize dashboards, analyses, and datasets. Assign group permissions by using these folders.
  • C. Use AWS IAM resource-based policies to assign group permissions to QuickSight items.
  • D. Use QuickSight user management APIs to provision group permissions based on dashboard naming conventions.
Answer:

B

Explanation:
Reference: https://awscli.amazonaws.com/v2/documentation/api/latest/reference/quicksight/update-folder-permissions.html
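A brief boto3 sketch of option B is shown below, assuming a shared folder and an existing QuickSight group. The account ID, folder and dashboard names, and the specific permission actions are illustrative assumptions; the update-folder-permissions reference above documents the full action set.

```python
import boto3

quicksight = boto3.client("quicksight")
ACCOUNT_ID = "111122223333"  # placeholder

# Create a shared folder and place an existing dashboard inside it.
quicksight.create_folder(
    AwsAccountId=ACCOUNT_ID,
    FolderId="sales-reporting",
    Name="Sales Reporting",
    FolderType="SHARED",
)
quicksight.create_folder_membership(
    AwsAccountId=ACCOUNT_ID,
    FolderId="sales-reporting",
    MemberId="revenue-dashboard-id",  # existing dashboard ID (placeholder)
    MemberType="DASHBOARD",
)

# Grant the whole group access to everything in the folder in one call,
# instead of managing permissions asset by asset.
quicksight.update_folder_permissions(
    AwsAccountId=ACCOUNT_ID,
    FolderId="sales-reporting",
    GrantPermissions=[
        {
            "Principal": f"arn:aws:quicksight:us-east-1:{ACCOUNT_ID}:group/default/sales-team",
            "Actions": ["quicksight:DescribeFolder"],  # read-level grant (assumed minimal set)
        }
    ],
)
```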


Question 6

A manufacturing company uses Amazon Connect to manage its contact center and Salesforce to manage its customer
relationship management (CRM) data. The data engineering team must build a pipeline to ingest data from the contact
center and CRM system into a data lake that is built on Amazon S3.
What is the MOST efficient way to collect data in the data lake with the LEAST operational overhead?

  • A. Use Amazon Kinesis Data Streams to ingest Amazon Connect data and Amazon AppFlow to ingest Salesforce data.
  • B. Use Amazon Kinesis Data Firehose to ingest Amazon Connect data and Amazon Kinesis Data Streams to ingest Salesforce data.
  • C. Use Amazon Kinesis Data Firehose to ingest Amazon Connect data and Amazon AppFlow to ingest Salesforce data.
  • D. Use Amazon AppFlow to ingest Amazon Connect data and Amazon Kinesis Data Firehose to ingest Salesforce data.
Answer:

C

Explanation:
Reference: https://aws.amazon.com/kinesis/data-firehose/?kinesis-blogs.sort-by=item.additionalFields.createdDate&kinesis-blogs.sort-order=desc
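As a sketch of the Salesforce half of option C, the snippet below defines an Amazon AppFlow flow from an existing Salesforce connection into the S3 data lake. The connection name, Salesforce object, bucket, and the simplified task and trigger shapes are assumptions; the exact flow definition should be checked against the AppFlow API reference.

```python
import boto3

appflow = boto3.client("appflow")

appflow.create_flow(
    flowName="salesforce-contacts-to-datalake",
    triggerConfig={"triggerType": "OnDemand"},  # a Scheduled trigger suits recurring pulls
    sourceFlowConfig={
        "connectorType": "Salesforce",
        "connectorProfileName": "salesforce-prod",  # existing AppFlow connection (placeholder)
        "sourceConnectorProperties": {"Salesforce": {"object": "Contact"}},
    },
    destinationFlowConfigList=[
        {
            "connectorType": "S3",
            "destinationConnectorProperties": {
                "S3": {
                    "bucketName": "example-data-lake",
                    "bucketPrefix": "crm/salesforce",
                    "s3OutputFormatConfig": {"fileType": "PARQUET"},
                }
            },
        }
    ],
    tasks=[
        {
            # Map_all copies every source field through unchanged (assumed sufficient here).
            "taskType": "Map_all",
            "sourceFields": [],
            "connectorOperator": {"Salesforce": "NO_OP"},
            "taskProperties": {},
        }
    ],
)

appflow.start_flow(flowName="salesforce-contacts-to-datalake")
```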


Question 7

A company has a data warehouse in Amazon Redshift that is approximately 500 TB in size. New data is imported every few
hours and read-only queries are run throughout the day and evening. There is a
particularly heavy load with no writes for several hours each morning on business days. During those hours, some queries
are queued and take a long time to execute. The company needs to optimize query execution and avoid any downtime.
What is the MOST cost-effective solution?

  • A. Enable concurrency scaling in the workload management (WLM) queue.
  • B. Add more nodes using the AWS Management Console during peak hours. Set the distribution style to ALL.
  • C. Use elastic resize to quickly add nodes during peak times. Remove the nodes when they are not needed.
  • D. Use a snapshot, restore, and resize operation. Switch to the new target cluster.
Answer:

A
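A brief sketch of option A follows, enabling concurrency scaling on a WLM queue through the cluster parameter group via boto3. The parameter group name and the simplified WLM JSON are assumptions for illustration; a real change would mirror the cluster's existing queue definitions.

```python
import json

import boto3

redshift = boto3.client("redshift")

# Illustrative WLM configuration: one queue with concurrency scaling enabled,
# plus the short query queue. Shape simplified for the example.
wlm_config = [
    {
        "query_group": [],
        "user_group": [],
        "query_concurrency": 5,
        "concurrency_scaling": "auto",
    },
    {"short_query_queue": True},
]

redshift.modify_cluster_parameter_group(
    ParameterGroupName="analytics-cluster-params",  # placeholder
    Parameters=[
        {
            "ParameterName": "wlm_json_configuration",
            "ParameterValue": json.dumps(wlm_config),
            "ApplyType": "dynamic",
        },
        # Cap how many transient scaling clusters Redshift may add during peaks.
        {
            "ParameterName": "max_concurrency_scaling_clusters",
            "ParameterValue": "2",
            "ApplyType": "dynamic",
        },
    ],
)
```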


Question 8

A company analyzes historical data and needs to query data that is stored in Amazon S3. New data is generated daily as
.csv files that are stored in Amazon S3. The company's analysts are using Amazon Athena to perform SQL queries against a
recent subset of the overall data. The amount of data that is ingested into Amazon S3 has increased substantially over time,
and the query latency also has increased.
Which solutions could the company implement to improve query performance? (Choose two.)

  • A. Use MySQL Workbench on an Amazon EC2 instance, and connect to Athena by using a JDBC or ODBC connector. Run the query from MySQL Workbench instead of Athena directly.
  • B. Use Athena to extract the data and store it in Apache Parquet format on a daily basis. Query the extracted data.
  • C. Run a daily AWS Glue ETL job to convert the data files to Apache Parquet and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data on a daily basis.
  • D. Run a daily AWS Glue ETL job to compress the data files by using the .gzip format. Query the compressed data.
  • E. Run a daily AWS Glue ETL job to compress the data files by using the .lzo format. Query the compressed data.
Answer:

B C

Explanation:
Reference: https://www.upsolver.com/blog/apache-parquet-why-use
https://aws.amazon.com/blogs/big-data/work-with-partitioned-data-in-aws-glue/
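A minimal boto3 sketch of option B is shown below: an Athena CTAS statement rewrites the recent .csv data as partitioned Parquet, which subsequent queries can scan far more cheaply. The database, table, column, and bucket names are illustrative assumptions.

```python
import boto3

athena = boto3.client("athena")

# CTAS: convert the raw CSV table to Parquet, partitioned by sale_date.
# Partition columns must come last in the SELECT list.
ctas = """
CREATE TABLE analytics.sales_parquet
WITH (
    format = 'PARQUET',
    external_location = 's3://example-analytics-bucket/sales_parquet/',
    partitioned_by = ARRAY['sale_date']
) AS
SELECT order_id, store_id, amount, sale_date
FROM analytics.sales_raw_csv
WHERE sale_date >= date_add('day', -30, current_date)
"""

athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-analytics-bucket/athena-results/"},
)
```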


Question 9

A company has a marketing department and a finance department. The departments are storing data in Amazon S3 in their
own AWS accounts in AWS Organizations. Both departments use AWS Lake Formation to catalog and secure their data.
The departments have some databases and tables that share common names.
The marketing department needs to securely access some tables from the finance department.
Which two steps are required for this process? (Choose two.)

  • A. The finance department grants Lake Formation permissions for the tables to the external account for the marketing department.
  • B. The finance department creates cross-account IAM permissions to the table for the marketing department role.
  • C. The marketing department creates an IAM role that has permissions to the Lake Formation tables.
Answer:

A B

Explanation:
Granting Lake Formation Permissions
Creating an IAM role (AWS CLI)
Reference: https://docs.aws.amazon.com/lake-formation/latest/dg/lake-formation-permissions.html
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html
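To make the Lake Formation grant concrete, here is a hedged boto3 sketch run in the finance account, granting the marketing account SELECT on one cataloged table. Account IDs, database, and table names are placeholders; the marketing account's data lake admin would then make the shared table available to its own principals.

```python
import boto3

lakeformation = boto3.client("lakeformation")

FINANCE_ACCOUNT = "111122223333"    # placeholder catalog owner
MARKETING_ACCOUNT = "444455556666"  # placeholder external account

# Cross-account grant: the external account (not an individual user) is the principal.
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": MARKETING_ACCOUNT},
    Resource={
        "Table": {
            "CatalogId": FINANCE_ACCOUNT,
            "DatabaseName": "finance",
            "Name": "gl_transactions",
        }
    },
    Permissions=["SELECT"],
    PermissionsWithGrantOption=[],
)
```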


Question 10

A company developed a new elections reporting website that uses Amazon Kinesis Data Firehose to deliver full logs from
AWS WAF to an Amazon S3 bucket. The company is now seeking a low-cost option to perform this infrequent data analysis
with visualizations of logs in a way that requires minimal development effort.
Which solution meets these requirements?

  • A. Use an AWS Glue crawler to create and update a table in the AWS Glue Data Catalog from the logs. Use Athena to perform ad-hoc analyses and use Amazon QuickSight to develop data visualizations.
  • B. Create a second Kinesis Data Firehose delivery stream to deliver the log files to Amazon OpenSearch Service (Amazon Elasticsearch Service). Use Amazon ES to perform text-based searches of the logs for ad-hoc analyses and use OpenSearch Dashboards (Kibana) for data visualizations.
  • C. Create an AWS Lambda function to convert the logs into .csv format. Then add the function to the Kinesis Data Firehose transformation configuration. Use Amazon Redshift to perform ad-hoc analyses of the logs using SQL queries and use Amazon QuickSight to develop data visualizations.
  • D. Create an Amazon EMR cluster and use Amazon S3 as the data source. Create an Apache Spark job to perform ad-hoc analyses and use Amazon QuickSight to develop data visualizations.
Answer:

A
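A short boto3 sketch of option A follows: a scheduled Glue crawler keeps a Data Catalog table in sync with the WAF logs prefix, after which Athena (with QuickSight on top) can query the logs ad hoc. The crawler role, database, schedule, table name, and S3 paths are assumptions for illustration.

```python
import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="waf-logs-crawler",
    Role="arn:aws:iam::111122223333:role/glue-waf-logs-crawler",  # placeholder
    DatabaseName="security_logs",
    Targets={"S3Targets": [{"Path": "s3://example-waf-logs-bucket/firehose/"}]},
    # Daily schedule is plenty for infrequent, ad hoc analysis.
    Schedule="cron(0 1 * * ? *)",
)
glue.start_crawler(Name="waf-logs-crawler")

# Once the crawler has created the table, analysts query it with Athena
# and build QuickSight visuals on the Athena dataset.
athena = boto3.client("athena")
athena.start_query_execution(
    QueryString="SELECT action, count(*) AS requests FROM security_logs.firehose GROUP BY action",
    ResultConfiguration={"OutputLocation": "s3://example-waf-logs-bucket/athena-results/"},
)
```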
