Troubleshooting and Solving Data Join Pitfalls
BigQuery is Google's fully managed, NoOps, low-cost analytics database. With BigQuery you can query terabytes of data without managing any infrastructure or needing a database administrator. BigQuery uses SQL and offers a pay-as-you-go pricing model, so you can focus on analyzing data to find meaningful insights.
Joining data tables can provide meaningful insight into your dataset. However, when you join your data, there are common pitfalls that could corrupt your results. This lab focuses on avoiding those pitfalls.
Types of joins:
- Cross join: combines each row of the first dataset with each row of the second dataset, so every possible combination appears in the output.
- Inner join: requires that key values exist in both tables. A record appears in the results only if its key value has a match in both tables.
- Left join: each row in the left table appears in the results, regardless of whether there are matches in the right table. Columns from the right table are NULL where there is no match.
- Right join: the reverse of a left join. Each row in the right table appears in the results, regardless of whether there are matches in the left table.
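The four join types above can be compared on a pair of tiny tables. The following is a minimal sketch, not part of the lab itself: it uses Python's built-in sqlite3 module rather than BigQuery, and the `products` and `inventory` tables, their columns, and their rows are hypothetical. Because older SQLite versions do not support RIGHT JOIN, the right join is emulated here by swapping the table order in a LEFT JOIN, which returns the same rows.

```python
import sqlite3

# Hypothetical toy tables, not taken from the lab's ecommerce dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products  (sku TEXT, name TEXT);
    CREATE TABLE inventory (sku TEXT, stock INTEGER);
    INSERT INTO products  VALUES ('A1', 'Mug'), ('B2', 'Pen'), ('C3', 'Cap');
    INSERT INTO inventory VALUES ('A1', 10), ('B2', 0), ('D4', 5);
""")


def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Cross join: every product paired with every inventory row (3 x 3 rows).
cross_rows = run(
    "SELECT p.sku, i.sku FROM products AS p CROSS JOIN inventory AS i"
)

# Inner join: only SKUs present in both tables.
inner_rows = run(
    "SELECT p.sku, i.stock FROM products AS p "
    "INNER JOIN inventory AS i ON p.sku = i.sku ORDER BY p.sku"
)

# Left join: every product; stock is NULL (None) when there is no match.
left_rows = run(
    "SELECT p.sku, i.stock FROM products AS p "
    "LEFT JOIN inventory AS i ON p.sku = i.sku ORDER BY p.sku"
)

# Right join, emulated: every inventory row; name is NULL when unmatched.
right_rows = run(
    "SELECT i.sku, p.name FROM inventory AS i "
    "LEFT JOIN products AS p ON p.sku = i.sku ORDER BY i.sku"
)

print(len(cross_rows))  # 9
print(inner_rows)       # [('A1', 10), ('B2', 0)]
print(left_rows)        # [('A1', 10), ('B2', 0), ('C3', None)]
print(right_rows)       # [('A1', 'Mug'), ('B2', 'Pen'), ('D4', None)]
```

Note how SKU 'C3' survives only the left join and SKU 'D4' only the right join, while the inner join drops both.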
For more information about joins, see Join Page.
The dataset you'll use is an ecommerce dataset that has millions of Google Analytics records for the Google Merchandise Store loaded into BigQuery. You have a copy of that dataset for this lab and will explore the available fields and rows for insights.
For syntax information to help you follow and update the queries, see Standard SQL Query Syntax.
What you'll do
In this lab, you perform these tasks:
- Use BigQuery to explore a dataset
- Troubleshoot duplicate rows in a dataset
- Create joins between data tables
- Understand each join type
Create a new dataset
Identify a key field in your ecommerce dataset
Pitfall: non-unique key
Join pitfall solution