Create the real-time Google Dataflow stream processing job
Process stream data using Cloud Dataflow
Prepare your data in BigQuery
Visualize Real Time Geospatial Data with Google Data Studio
This lab demonstrates how to use Google Dataflow to process streaming data replayed in real time from a real-world historical data set, store the results in BigQuery, and then use Google Data Studio to visualize real-time geospatial data.
Cloud Dataflow is a fully managed service for transforming and enriching data in stream (real-time) and batch (historical) modes, using the Java and Python APIs of the Apache Beam SDK. Cloud Dataflow provides a serverless architecture that can shard and process very large batch data sets, or high-volume live data streams, in parallel.
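A core idea in stream mode is grouping an unbounded stream of events into time windows before aggregating. The sketch below is a plain-Python illustration of fixed (tumbling) event-time windows, not Dataflow or Beam code. The event shape (event time in seconds, airport code, departure delay in minutes) and the function name are hypothetical. In an actual Beam pipeline, the same step would roughly correspond to `beam.WindowInto(beam.window.FixedWindows(60))` followed by a per-key mean such as `beam.combiners.Mean.PerKey()`. This sketch ignores watermarks and late data, which a real streaming runner has to handle.

```python
from collections import defaultdict


def fixed_window_mean_delay(events, window_seconds):
    """Hypothetical helper: average delay per (window_start, airport).

    Each event is assumed to be (event_time_seconds, airport, delay_minutes).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for event_time, airport, delay in events:
        # A fixed window covers [start, start + window_seconds), aligned to time 0.
        window_start = event_time - (event_time % window_seconds)
        key = (window_start, airport)
        sums[key] += delay
        counts[key] += 1
    # Every key in sums has a count of at least 1, so the division is safe.
    return {key: sums[key] / counts[key] for key in sums}


events = [(0, "JFK", 10.0), (30, "JFK", 20.0), (65, "JFK", 5.0), (10, "ORD", 0.0)]
print(fixed_window_mean_delay(events, 60))
# {(0, 'JFK'): 15.0, (60, 'JFK'): 5.0, (0, 'ORD'): 0.0}
```

Because each window is aggregated independently, a service like Dataflow can distribute keys and windows across many workers, which is what makes parallel processing of high-volume streams possible.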
BigQuery is a RESTful web service that enables interactive analysis of massive datasets and works in conjunction with Google Cloud Storage.
The data set used here provides historical information about domestic flights in the United States, retrieved from the US Bureau of Transportation Statistics website. This data set can be used to demonstrate a wide range of data science concepts and techniques, and it is used in all of the other labs in the Data Science on Google Cloud Quest.