
Tracking Cryptocurrency Exchange Trades with Google Cloud Platform in Real-Time


GSP603

Google Cloud Self-Paced Labs

Overview

Today’s financial world is complex, and the old technology used for constructing financial data pipelines isn’t keeping up. With multiple financial exchanges operating around the world and global user demand, these data pipelines have to be fast, reliable and scalable.

Currently, using an econometric approach—applying models to financial data to forecast future trends—doesn’t work for real-time financial predictions. And data that’s old, inaccurate or from a single source doesn’t translate into dependable data for financial institutions to use. But building pipelines with Google Cloud Platform (GCP) can solve some of these key challenges. In this post, we’ll describe how to build a pipeline to predict financial trends in microseconds. We’ll walk through how to set up and configure a pipeline for ingesting real-time, time-series data from various financial exchanges and how to design a suitable data model, which facilitates querying and graphing at scale.

You’ll find a tutorial below on setting up and deploying the proposed architecture using GCP, particularly these products: Cloud Dataflow, Cloud Bigtable, Compute Engine, and Cloud Storage.

The tutorial will explain how to establish a connection to multiple exchanges, subscribe to their trade feeds, and extract and transform these trades into a flexible format to be stored in Cloud Bigtable, where they are available to be graphed and analyzed.

This will also lay the foundation for online machine learning predictions at scale. You’ll see how to graph the trades, the volume, and the time delta from trade execution until the trade reaches our system (an indicator of how close to real time we can get the data). You can find more details on GitHub too.


Requirements / Solutions

[Image: Requirements / Solutions table]

Architectural overview

The typical requirement for trading systems is low-latency data ingestion; for this lab, it is extended with near real-time data storage and querying at scale. You will learn the following from this lab:

  1. Ingest real-time trading data with low latency from globally scattered data sources / exchanges, with the ability to adapt the location of the data-ingest worker pipeline and to easily add trading pairs / exchanges. Solution: Dataflow + XChange Stream reactive WebSocket framework

  2. Demonstrate an unbounded streaming source that is runnable with multiple runners (see the sketch after this list). Solution: Apache Beam

  3. Strong consistency + linear scalability + very low latency for querying the trading data. Solution: Cloud Bigtable

  4. Querying and visualization — execute time-series queries on Bigtable and visualize the results on a web page. Solution: Python Flask + Vis.js + Google Bigtable Python client
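To make requirement 2 concrete, here is a minimal sketch of pointing a Beam pipeline at the Dataflow runner purely through its options; the project ID and bucket are placeholders, not values from this lab:

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class PipelineSetup {
      public static void main(String[] args) {
        // The same Beam code can target any supported runner; here we pick Dataflow.
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
        options.setRunner(DataflowRunner.class);
        options.setProject("your-gcp-project");          // placeholder project ID
        options.setTempLocation("gs://your-bucket/tmp"); // placeholder staging bucket
        options.setStreaming(true);                      // unbounded (streaming) mode

        Pipeline p = Pipeline.create(options);
        // ... apply the ingest/transform/write steps described below, then:
        p.run();
      }
    }

Switching to another runner (for example, Flink) is an options change only; the pipeline code itself is untouched.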

Architecture/How it works

The source code is written in Java 8, Python 2.7, and JavaScript, with Maven and pip for dependency and build management.

The code can be divided into five main framework units:

  1. Data ingestion — XChange Stream framework (Github link), a Java library providing a simple and consistent streaming API for interacting with Bitcoin and other cryptocurrency exchanges via the WebSocket protocol. XChange provides new interfaces for the streaming API, and users can subscribe to live updates via RxJava reactive streams. We use this Java 8 framework to connect to and configure several exchanges (Bitfinex, Poloniex, Bitstamp, OKCoin, Gemini, HitBTC, Binance...); a subscription sketch follows this list. Link to the exchange / trading pair configuration code
  2. Parallel processing — Apache Beam (Github link), an open source unified programming model for defining and executing data processing pipelines, including ETL, batch, and stream (continuous) processing. Supported runners: Apache Apex, Apache Flink, Apache Gearpump, Apache Samza, Apache Spark, and Google Cloud Dataflow. You will learn how to create an unbounded streaming source/reader and manage basic watermarking, checkpointing, and record IDs for data ingestion. Link to the bridge between Beam and the XChange Stream framework
  3. Bigtable sink — Cloud Bigtable with Beam, using the HBase API (Github link): the connector and writer to Bigtable. You will see how to create a row key and a Bigtable mutation function prior to writing to Bigtable. Link to the Bigtable key creation / mutation function
  4. Real-time API endpoint — a Flask web server on port 5000 plus the Bigtable client (GitHub link), used to query Bigtable and serve as the API endpoint. Link to the Bigtable query builder + results retrieval and sampling
  5. JavaScript visualization — a Vis.js Flask template that queries the real-time API endpoint every 500 ms. Link to the HTML template file
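As referenced in item 1, here is a minimal sketch of subscribing to a live trade feed with XChange Stream; the exchange class and Trade accessors follow the library's documented pattern, but treat the exact names as assumptions rather than code from this lab:

    import info.bitrich.xchangestream.bitfinex.BitfinexStreamingExchange;
    import info.bitrich.xchangestream.core.StreamingExchange;
    import info.bitrich.xchangestream.core.StreamingExchangeFactory;
    import org.knowm.xchange.currency.CurrencyPair;

    public class TradeFeed {
      public static void main(String[] args) {
        // Create and connect a streaming exchange (Bitfinex in this sketch).
        StreamingExchange exchange = StreamingExchangeFactory.INSTANCE
            .createExchange(BitfinexStreamingExchange.class.getName());
        exchange.connect().blockingAwait();

        // Subscribe to live BTC/USD trades as an RxJava stream.
        exchange.getStreamingMarketDataService()
            .getTrades(CurrencyPair.BTC_USD)
            .subscribe(trade -> System.out.println(
                "Trade: " + trade.getPrice() + " @ " + trade.getTimestamp()));
      }
    }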


The Flask web server runs in a GCP VM instance.

Pipeline definition

For every exchange + trading pair, a separate pipeline instance is created. The pipeline consists of three steps (a condensed sketch follows this list):

  1. UnboundedStreamingSource that contains an 'Unbounded Streaming Source Reader' (bitStamp2)
  2. Bigtable pre-write mutation and key definition (ETH-USD Mut2)
  3. Bigtable write step (ETH-USD2)
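The condensed sketch below shows how the three steps chain together with the Cloud Bigtable Beam connector; ExchangeUnboundedSource and TradeToMutationFn are hypothetical stand-ins for the lab's own source and DoFn classes, and the project, instance, and table IDs are placeholders:

    import com.google.cloud.bigtable.beam.CloudBigtableIO;
    import com.google.cloud.bigtable.beam.CloudBigtableTableConfiguration;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.Read;
    import org.apache.beam.sdk.transforms.ParDo;

    // Inside main(), after building `options` as in the earlier runner sketch:
    CloudBigtableTableConfiguration config = new CloudBigtableTableConfiguration.Builder()
        .withProjectId("your-gcp-project") // placeholder
        .withInstanceId("cryptorealtime")  // placeholder
        .withTableId("trades")             // placeholder
        .build();

    Pipeline p = Pipeline.create(options);
    p.apply("bitStamp2", Read.from(new ExchangeUnboundedSource(exchange, pair))) // 1. unbounded source reader
     .apply("ETH-USD Mut2", ParDo.of(new TradeToMutationFn()))                   // 2. row key + mutation
     .apply("ETH-USD2", CloudBigtableIO.writeToTable(config));                   // 3. Bigtable write
    p.run();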

[Image: Dataflow pipeline graph showing the three steps]

Bigtable row key design decisions

The DTO for this lab looks like this:

[Image: TradeLoad DTO schema]

The row key structure is formulated in the following way:

TradingCurrency#Exchange#SystemTimestampEpoch#SystemNanosTime

For example, a row key might look like BTC/USD#Bitfinex#1546547940918#63187358085

BTC/USD — Trading Pair

Bitfinex — Exchange

1546547940918 — epoch timestamp in milliseconds (more info)

63187358085 — system nano time (more info)
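A minimal sketch of assembling this key in Java, with illustrative values for the trading pair and exchange:

    // Row key: TradingCurrency#Exchange#SystemTimestampEpoch#SystemNanosTime
    String rowKey = String.format("%s#%s#%d#%d",
        "BTC/USD",                  // trading pair
        "Bitfinex",                 // exchange
        System.currentTimeMillis(), // epoch timestamp in milliseconds
        System.nanoTime());         // nano time, disambiguates same-millisecond trades
    // e.g. "BTC/USD#Bitfinex#1546547940918#63187358085"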

Why is nanotime added at the key end?

Nanotime is used to avoid multiple versions per row for different trades: two DoFn mutations might execute in the same epoch millisecond when there is a streaming sequence of TradeLoad DTOs. Appending nanotime subdivides each millisecond into up to a million additional distinct values.

In your own environment, if this is not enough, you can hash the volume / price ratio and append the hash to the end of the row key, as sketched below.
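For example, a hypothetical hash suffix could be derived as follows (the volume and price values are illustrative only):

    double volume = 0.52;    // illustrative trade volume
    double price  = 3750.25; // illustrative trade price
    String suffix = Integer.toHexString(Double.hashCode(volume / price));
    String rowKey = "BTC/USD#Bitfinex#1546547940918#63187358085#" + suffix;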

Row cells will contain an exact schema replica of the exchange TradeLoad DTO (see the schema above). This choice lets you go from the specific (trading pair, exchange) to the less specific (timestamp, nanotime) and avoids hotspots when you query the data.
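In practice, this means a query can pin the most specific parts of the key (trading pair and exchange) and scan across time with a simple prefix. Below is a minimal sketch using the Bigtable HBase client; the project, instance, and table IDs are placeholders:

    import com.google.cloud.bigtable.hbase.BigtableConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TradeScan {
      public static void main(String[] args) throws Exception {
        try (Connection conn =
                BigtableConfiguration.connect("your-gcp-project", "cryptorealtime")) {
          Table table = conn.getTable(TableName.valueOf("trades")); // placeholder table
          // All Bitfinex BTC/USD trades, ordered by timestamp within the prefix.
          Scan scan = new Scan().setRowPrefixFilter(Bytes.toBytes("BTC/USD#Bitfinex#"));
          try (ResultScanner rows = table.getScanner(scan)) {
            for (Result row : rows) {
              System.out.println(Bytes.toString(row.getRow()));
            }
          }
        }
      }
    }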

Score

— / 100

Create a virtual machine to build the pipeline and serve the website. (Run step: / 20)

Install Java, Git, Maven, pip, Python 2.7, and the Cloud Bigtable command-line tool. (Run step: / 20)

Create the Bigtable instance. (Run step: / 20)

Create a Google Cloud Storage bucket. (Run step: / 10)

Run the Dataflow pipeline. (Run step: / 20)

Create a firewall rule to allow tcp:5000 for visualization. (Run step: / 10)