KX Performance Dashboard

dbaker

Created on October 17, 2022

High-frequency data benchmarking

Imperial College London

The following benchmarks were independently undertaken by Imperial College London. They define a database scenario for storing and querying data from stock and cryptocurrency exchanges. Benchmarks are divided into the following categories:

  1. Trade data, including averages and volume-weighted average price (VWAP)
  2. Order book, including bid/ask, spread, and national best bid and offer (NBBO)
  3. Complex queries, including execution volatility and mid-quote returns
  4. Writing efficiency
  5. Storage efficiency
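As an illustration of the trade data and order book categories above, the following minimal Python sketch computes a volume-weighted average price (VWAP) and a bid/ask spread. The `Trade` class, the function names, and the sample values are hypothetical and are not taken from the benchmark itself; the source does not show the queries it ran.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    """One executed trade (hypothetical record layout for illustration)."""
    price: float
    size: float


def vwap(trades):
    """Volume-weighted average price: sum(price * size) / sum(size)."""
    total_volume = sum(t.size for t in trades)
    if total_volume == 0:
        raise ValueError("VWAP is undefined when total traded volume is zero")
    return sum(t.price * t.size for t in trades) / total_volume


def spread(best_bid, best_ask):
    """Bid/ask spread: best ask price minus best bid price."""
    return best_ask - best_bid


# Made-up sample data, not benchmark data.
trades = [Trade(100.0, 10), Trade(101.0, 30), Trade(99.0, 10)]
print(vwap(trades))
print(spread(99.5, 100.5))
```

A database would typically express the same calculation as a grouped aggregate over a trades table; the sketch only shows the arithmetic.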

Databases compared

  • kdb+ is a commercial in-memory time series database developed to support fast ingestion and immediate query of data.
  • InfluxDB is an open-source database optimized for time series data, horizontally partitioned into shards.
  • TigerData* is an open-source time series database extension of PostgreSQL.
  • ClickHouse is a column-oriented open-source database system that supports an extended version of SQL for large-scale OLAP applications.

*Previously known as TimescaleDB

Test criteria

Test hardware

  • Intel i7-1065G7 CPU, 16 GB RAM.
  • All databases save data to SSD.
  • Only one database tested at a time.
  • Average obtained from 10 executions.
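The averaging over 10 executions listed above could be sketched as follows. This is a minimal Python illustration, not the harness used in the study: `average_runtime_ms` and `sample_query` are hypothetical names, and the stand-in workload does not touch a database.

```python
import statistics
import time


def average_runtime_ms(query, runs=10):
    """Run `query` `runs` times and return the mean wall-clock time in ms."""
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        query()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings_ms)


def sample_query():
    """Stand-in workload; a real benchmark would issue a database query here."""
    return sum(range(100_000))


print(f"{average_runtime_ms(sample_query):.3f} ms")
```

Testing one database at a time, as the criteria state, keeps the measured runs from competing with each other for CPU and disk.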

Implementation

The dataset is imported into each database via CSV files. The database schema and data types are maintained consistently across all databases.

Ingestion and storage

Data ingestion and storage performance results are evaluated from the storage and write efficiency benchmarks. Because kdb+ has an unfair advantage in in-memory performance, tests are performed on disk to remove bias.

Conclusion: kdb+ outperforms all other tested solutions for overall performance in storage and ingestion efficiency.

[Chart: Data ingestion throughput (MB/s). Databases: kdb+ on-disk, TigerData, InfluxDB, ClickHouse. Data labels in transcript order: 47.35 MB/s; 30.19 MB/s; 4.94 MB/s; 2.1 MB/s.]

[Chart: Data write efficiency (ms). Databases: kdb+ on-disk, TigerData, InfluxDB, ClickHouse. Data labels in transcript order: 33,889 ms; 324,854 ms; 53,150 ms; 765,000 ms.]

Read queries

Performance of read queries in milliseconds

The read query performance results are evaluated from the following benchmarks:

  • Avg trading volumes (Day)
  • Avg trading volumes (Month)
  • Order book (Week)

Benchmarks are classified as high read with light computational intensity.

Conclusion: kdb+ demonstrates the best overall performance for disk and memory queries.

[Chart: Read queries (ms). Series: Avg trading volumes (day), Avg trading volumes (week), Order books (week). Databases: kdb+ on-disk, InfluxDB, TigerData, ClickHouse. Data labels in transcript order: 81; 740; 94; 93; 273; 146; 469; 3700; 3938; 272; 1991; 962 ms.]

Computationally intensive queries

Performance of computationally intensive queries in milliseconds

The computationally intensive query performance results are evaluated from the following benchmarks:

  • Weighted avg price (TWAP)
  • Market depth avg (Day)
  • Market depth avg (Week)

Benchmarks are classified as high read with heavy computational intensity.

Conclusion: kdb+ demonstrates the best overall performance in computationally intensive queries.

[Chart: Computationally intensive queries (ms). Series: Weighted avg price, Market depth avg (day), Market depth avg (week). Databases: kdb+ on-disk, InfluxDB, TigerData, ClickHouse. Data labels in transcript order: 75; 533; 4375; 12,716; 373; 4420; 334; 4699; 17,092; 202; 1626; 10,180 ms.]
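To make the weighted average price benchmark more concrete, here is a minimal Python sketch that assumes TWAP means time-weighted average price, which the source does not spell out. It treats each price as holding until the next observation; the `twap` function, its parameters, and the sample values are hypothetical.

```python
def twap(times, prices, end_time):
    """Time-weighted average price for a piecewise-constant price series.

    times[i] is when prices[i] becomes effective; the last price holds
    until end_time. Times are plain numbers (for example, seconds).
    """
    if not times or len(times) != len(prices):
        raise ValueError("times and prices must be non-empty and equal length")
    duration = end_time - times[0]
    if duration <= 0:
        raise ValueError("end_time must be later than the first timestamp")
    weighted_sum = 0.0
    for i, (t, p) in enumerate(zip(times, prices)):
        next_t = times[i + 1] if i + 1 < len(times) else end_time
        weighted_sum += p * (next_t - t)  # price weighted by how long it held
    return weighted_sum / duration


# Made-up sample: 100 for 10 s, 102 for 20 s, 101 for 10 s.
print(twap([0, 10, 30], [100.0, 102.0, 101.0], end_time=40))
```

Other TWAP conventions exist, such as averaging prices sampled at fixed intervals; the benchmark may use a different one.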

Complex queries

Performance of complex queries in milliseconds

The complex query performance results are evaluated from the following benchmarks:

  • Mid quote returns
  • Execution volatility

Conclusion: kdb+ demonstrates the best overall performance for complex queries.

[Chart: Complex queries (ms). Series: Mid quote returns, Execution volatility. Databases: kdb+ on-mem, kdb+ on-disk, InfluxDB, TigerData, ClickHouse. Data labels in transcript order: 64; 41; 113; 51; 99; 2009; 1614; 324; 401; 190 ms.]
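The two complex-query benchmarks, mid-quote returns and execution volatility, can be illustrated with a minimal Python sketch. The definitions used here are assumptions, since the source does not give the benchmark's exact formulas: the mid-quote is taken as the average of bid and ask, returns as log returns, and volatility as the sample standard deviation of log returns on execution prices. All function names and sample values are hypothetical.

```python
import math
import statistics


def mid_quotes(bids, asks):
    """Mid-quote per observation: (bid + ask) / 2."""
    return [(b + a) / 2.0 for b, a in zip(bids, asks)]


def log_returns(prices):
    """Log return between consecutive prices: ln(p_t / p_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def volatility(prices):
    """Sample standard deviation of log returns (needs at least 3 prices)."""
    return statistics.stdev(log_returns(prices))


# Made-up sample data, not benchmark data.
bids = [99.0, 100.0, 100.5]
asks = [101.0, 102.0, 101.5]
mids = mid_quotes(bids, asks)
print(mids)
print(log_returns(mids))
print(volatility([100.0, 101.0, 100.5, 102.0]))
```

Both calculations depend on the previous row in time order, which is one reason such queries are harder on a database than simple aggregates.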

"It can be concluded that kdb+ is the most suitable database for financial analysis applications of time series data. kdb+ can quickly ingest data in bulk, with a good compression ratio for storage efficiency. kdb+ has stable low query latency for all benchmarks, including read queries, computational intensive queries, and complex queries."

Ruijie Xiong - Department of Computing

DBOps benchmarks

DBOps is a public benchmark that compares the performance of various open-source database tools and technologies. Tests include mathematical and statistical calculations, group-by operations, and joins across a range of datasets to demonstrate performance under various conditions.

Conclusion:

  • kdb+ came first in 17 out of 18 categories
  • kdb+ performed up to 30x faster than pandas in all categories
  • kdb+ came first in 66% of all queries

[Chart: Basic group-by (1bn rows), execution time (ms). Tools: ClickHouse, data.table, juliadf, Polars, pydatatable, Spark, kdb+. Data labels in transcript order: 350; 150; 180; 120; 980; 700; 80 ms.]

[Chart: Average speed rate to kdb+ (speed rate multiplier). Example: A speed rate of 5 means kdb+ is 5x faster. Tools: ClickHouse, data.table, juliadf, Polars, pydatatable, Spark, DuckDB, pandas. Data labels in transcript order: 13; 25; 19; 32; 29; 40.]
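The speed rate in the chart can be read as a ratio of execution times. The following minimal Python sketch assumes the rate for one query is the other tool's time divided by the kdb+ time, and that the average is an arithmetic mean over queries; the source does not state how the averaging was done. The function names and timings are hypothetical.

```python
import statistics


def speed_rate(other_ms, kdb_ms):
    """How many times faster kdb+ is: other tool's time / kdb+ time."""
    return other_ms / kdb_ms


def average_speed_rate(other_timings_ms, kdb_timings_ms):
    """Arithmetic mean of per-query speed rates (an assumed convention)."""
    rates = [speed_rate(o, k) for o, k in zip(other_timings_ms, kdb_timings_ms)]
    return statistics.mean(rates)


# Made-up timings for two queries, not benchmark data.
print(speed_rate(400.0, 80.0))
print(average_speed_rate([400.0, 900.0], [80.0, 100.0]))
```

A geometric mean is another common way to average ratios and would give a different figure for the same timings.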

STAC benchmarks

  • Provides the fastest execution for 98% of STAC-M3 queries
  • Outperforms all previously publicly disclosed results in 10 out of 17 Antuco tests
  • Outperforms all publicly disclosed results in 9 out of 10 Kanaga tests

STAC™ is the finance industry standard for independent testing of time-series data processing. The STAC-M3™ benchmark suite is designed for testing solutions that enable high-speed analytics on time series data, such as tick-by-tick market data. It is highly focussed on the business of high-speed data analytics for historical in-memory data, as used by investment banks, hedge funds and similar trading entities.

  • Antuco: Uses a limited dataset size of 4.5 TB and simulates performance that would be obtained with a real-world dataset residing mostly on nonvolatile media.
  • Kanaga: Uses a dataset size of 33–897 TB and simulates performance on large datasets with large numbers of concurrent requests.

“STAC” and all STAC names are trademarks or registered trademarks of the Securities Technology Analysis Center, LLC.

What makes kdb+ so fast?

  • Columnar storage reduces memory bandwidth and CPU cycles
  • Minimal codebase reduces instruction latency
  • In-memory processing enables sub-millisecond analytics
  • Intelligent tiered storage eliminates I/O overhead
  • Vector processing replaces slow row-by-row operations
  • Functional programming in q simplifies parallelization and multicore performance
  • Optimized time-series functions deliver fast, native support for time-based analytics
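The columnar storage point in the list above can be illustrated with a small Python sketch. It contrasts a row-oriented layout with a column-oriented one, where an aggregate over one field reads a single contiguous array instead of visiting every record. This is a layout illustration only, with hypothetical names and data; it does not reflect kdb+ internals or its performance.

```python
from array import array

# Row-oriented layout: each record holds every field.
rows = [
    {"sym": "AAA", "price": 100.0, "size": 10},
    {"sym": "BBB", "price": 50.5, "size": 20},
    {"sym": "AAA", "price": 100.5, "size": 30},
]

# Column-oriented layout: one typed, contiguous array per field.
columns = {
    "sym": [r["sym"] for r in rows],
    "price": array("d", (r["price"] for r in rows)),  # C doubles
    "size": array("q", (r["size"] for r in rows)),    # C 64-bit ints
}


def total_size_rows(records):
    """Row layout: every record is visited to read one field."""
    return sum(r["size"] for r in records)


def total_size_columns(cols):
    """Column layout: only the 'size' array is read."""
    return sum(cols["size"])


print(total_size_rows(rows))
print(total_size_columns(columns))
```

Both functions return the same total; the difference is how much unrelated data must be read to compute it.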

Get started today

Start your journey today with our free Personal Edition. Join developers and data engineers building apps for the world’s most demanding data environments.
