Your neural network model is taking days to train. You want to increase the training speed. What can you do?
Your company is using WILDCARD tables to query data across multiple tables with similar names. The SQL statement is
currently failing with the following error:
Which table name will make the SQL statement work correctly?
You work for a shipping company that has distribution centers where packages move on delivery lines to route them
properly. The company wants to add cameras to the delivery lines to detect and track any visual damage to the packages in
transit. You need to create a way to automate the detection of damaged packages and flag them for human review in real
time while the packages are in transit. Which solution should you choose?
You launched a new gaming app almost three years ago. You have been uploading log files from the previous day to a
separate Google BigQuery table with the table name format LOGS_yyyymmdd. You have been using table wildcard
functions to generate daily and monthly reports for all time ranges. Recently, you discovered that some queries that cover
long date ranges are exceeding the limit of 1,000 tables and failing. How can you resolve this issue?
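A quick sanity check on the numbers in the scenario above, as a plain-Python sketch: the launch date and query date here are hypothetical stand-ins, chosen only to show that roughly three years of daily `LOGS_yyyymmdd` shards is already more than 1,000 tables for a single wildcard query to cover.

```python
from datetime import date

# Hypothetical dates for illustration: the app launched ~3 years ago.
launch = date(2015, 1, 1)
today = date(2017, 12, 31)

# One sharded table per day (LOGS_yyyymmdd) means one table per elapsed day.
num_daily_tables = (today - launch).days + 1
print(num_daily_tables)  # 1096 -- over the 1,000-table wildcard limit

# Consolidating the shards into a single date-partitioned table keeps the
# same per-day granularity: the former table suffix becomes a partition
# filter value instead of a wildcard match.
partition_key = launch.strftime("%Y%m%d")
print(partition_key)  # "20150101"
```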
Your company uses a proprietary system to send inventory data every 6 hours to a data ingestion service in the cloud.
Transmitted data includes a payload of several fields and the timestamp of the transmission. If there are any concerns about
a transmission, the system re-transmits the data. How should you deduplicate the data most efficiently?
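Because a re-transmission carries the same payload fields and the same original transmission timestamp, that pair can serve as a natural deduplication key. A minimal plain-Python sketch of the idea (field names and values are hypothetical; in a cloud pipeline the same keying would typically be done inside the ingestion/transform step):

```python
def deduplicate(records):
    """Keep the first occurrence of each (payload, timestamp) pair.

    Re-transmitted records repeat both the payload and the original
    transmission timestamp, so the pair uniquely identifies a transmission.
    """
    seen = set()
    unique = []
    for record in records:
        # Sort payload items so field order does not affect the key.
        key = (tuple(sorted(record["payload"].items())), record["timestamp"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"payload": {"sku": "A1", "qty": 5}, "timestamp": "2024-01-01T00:00:00Z"},
    {"payload": {"sku": "A1", "qty": 5}, "timestamp": "2024-01-01T00:00:00Z"},  # re-transmission
    {"payload": {"sku": "A1", "qty": 5}, "timestamp": "2024-01-01T06:00:00Z"},  # next 6-hour cycle
]
unique = deduplicate(records)
print(len(unique))  # 2 -- the re-transmission is dropped
```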
You are designing an Apache Beam pipeline to enrich data from Cloud Pub/Sub with static reference data from BigQuery.
The reference data is small enough to fit in memory on a single worker. The pipeline should write enriched results to
BigQuery for analysis. Which job type and transforms should this pipeline use?
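The enrichment pattern in this scenario, where a small reference dataset is loaded into memory once and consulted for every streaming element, can be sketched in plain Python. In an actual Beam pipeline the reference table would typically reach the workers as a side input (e.g. via `beam.pvalue.AsDict`) consumed by a `ParDo`; here the side input is simulated by an ordinary dict, and all names and values are hypothetical.

```python
# Reference data small enough to fit in a single worker's memory,
# loaded once and made available to every element -- the role a
# Beam side input plays in the real pipeline.
reference = {"sku-1": "widgets", "sku-2": "gadgets"}  # hypothetical lookup table


def enrich(event, ref):
    """Join one streaming event against the in-memory reference data."""
    enriched = dict(event)
    enriched["category"] = ref.get(event["sku"], "unknown")
    return enriched


events = [{"sku": "sku-1", "qty": 3}, {"sku": "sku-9", "qty": 1}]
results = [enrich(e, reference) for e in events]
print(results[0]["category"])  # "widgets"
print(results[1]["category"])  # "unknown" -- no reference row for sku-9
```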
You are managing a Cloud Dataproc cluster. You need to make a job run faster while minimizing costs, without losing work
in progress on your clusters. What should you do?
You currently have a single on-premises Kafka cluster in a data center in the us-east region that is responsible for ingesting
messages from IoT devices globally. Because large parts of the globe have poor internet connectivity, messages sometimes
batch at the edge, come in all at once, and cause a spike in load on your Kafka cluster. This is becoming difficult to manage
and prohibitively expensive. What is the Google-recommended cloud native architecture for this scenario?
You want to rebuild your batch pipeline for structured data on Google Cloud. You are using PySpark to conduct data
transformations at scale, but your pipelines are taking over twelve hours to run. To expedite development and pipeline run
time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How
should you build the pipeline on Google Cloud while meeting speed and processing requirements?
You want to analyze hundreds of thousands of social media posts daily at the lowest cost and with the fewest steps.
You have the following requirements:
You will batch-load the posts once per day and run them through the Cloud Natural Language API.
You will extract topics and sentiment from the posts.
You must store the raw posts for archiving and reprocessing.
You will create dashboards to be shared with people both inside and outside your organization.
You need to store both the data extracted from the API for analysis and the raw social media posts for
historical archiving. What should you do?