High-frequency data benchmarking
Imperial College London
The following benchmarks were independently undertaken by Imperial College London. They define a database scenario for storing and querying data from stock and cryptocurrency exchanges. Benchmarks are divided into the following categories:
- Trade data, including averages and volume-weighted average price (VWAP)
- Order book, including bid/ask, spread, and national best bid and offer (NBBO)
- Complex queries, including volatility computation for execution prices and mid-quote returns
- Writing efficiency
- Storage efficiency
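As a rough illustration of the trade-data and order-book quantities named above, the sketch below computes a volume-weighted average price (VWAP), a bid/ask spread, and a mid quote in plain Python. The function names and sample values are hypothetical and are not taken from the benchmark; the benchmark's own queries are not reproduced here.

```python
def vwap(prices, volumes):
    """Volume-weighted average price: sum(price * volume) / sum(volume)."""
    total_volume = sum(volumes)
    if total_volume == 0:
        raise ValueError("total volume must be non-zero")
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume


def spread_and_mid(bid, ask):
    """Return (spread, mid quote) for a single best bid and best ask."""
    return ask - bid, (bid + ask) / 2


# Hypothetical sample trades, for illustration only.
prices = [100.0, 101.0, 102.0]
volumes = [10, 20, 10]

print(vwap(prices, volumes))        # 101.0
print(spread_and_mid(99.5, 100.5))  # (1.0, 100.0)
```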
Databases compared
- kdb+ is a commercial in-memory time series database developed to support fast ingestion and immediate query of data.
- InfluxDB is an open-source database optimized for time series data horizontally partitioned into shards.
- TigerData* is an open-source time series database extension of PostgreSQL.
- ClickHouse is a column-oriented open-source database system that supports an extended version of SQL for large-scale OLAP applications.
*Previously known as TimescaleDB
Test criteria
Test hardware
- Intel i7-1065G7 CPU, 16 GB RAM.
- All databases save data to SSD.
- Only one database tested at a time.
- Average obtained from 10 executions.
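A minimal sketch of how an "average of 10 executions" measurement could be taken is shown below. It is an assumption about the general approach, not the harness used in the benchmark; `average_runtime_ms` and `example_query` are hypothetical names, and the stand-in workload is not a database query.

```python
import statistics
import time


def average_runtime_ms(query, runs=10):
    """Run `query` `runs` times and return the mean wall-clock time in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)


def example_query():
    # Hypothetical stand-in for a database call.
    return sum(range(100_000))


print(f"{average_runtime_ms(example_query):.3f} ms")
```

Averaging over several runs smooths out one-off effects such as cold caches, which is presumably why the test criteria specify it.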
Implementation
The dataset is imported into each database via CSV files. The database schema and data types are maintained consistently across all databases.
Ingestion and storage
Data ingestion and storage performance results are evaluated from the storage and write efficiency benchmarks. Because kdb+ has an advantage when running in memory, tests are performed on disk to remove bias.
Conclusion: kdb+ outperforms all other tested solutions for overall performance in storage and ingestion efficiency.
[Chart: Data ingestion throughput (MB/s), axis 0 to 50. Databases: kdb+ on-disk, TigerData, InfluxDB, ClickHouse. Bar labels: 47.35, 30.19, 4.94, 2.1 MB/s.]
[Chart: Data write efficiency (ms), axis 0 to 800. Databases: kdb+ on-disk, TigerData, InfluxDB, ClickHouse. Bar labels: 33,889, 324,854, 53,150, 765,000 ms.]
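Throughput and write time are related by the amount of data loaded. The sketch below shows that arithmetic, assuming that the 47.35 MB/s and 33,889 ms figures both describe the same kdb+ on-disk load. The dataset size is not stated in the source, so the value computed here is only what those two figures would imply under that assumption.

```python
def implied_size_mb(throughput_mb_per_s, elapsed_ms):
    """Data volume implied by a throughput and an elapsed time."""
    return throughput_mb_per_s * (elapsed_ms / 1000.0)


def throughput_mb_per_s(size_mb, elapsed_ms):
    """Throughput in MB/s for a given data volume and elapsed time."""
    return size_mb / (elapsed_ms / 1000.0)


# Assumption: both figures refer to the same kdb+ on-disk ingestion run.
size = implied_size_mb(47.35, 33_889)
print(round(size, 1))                            # implied size in MB
print(round(throughput_mb_per_s(size, 33_889), 2))  # recovers 47.35
```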
Read queries
Performance of read queries in milliseconds
The read query performance results are evaluated from the following benchmarks:
- Avg trading volumes (Day)
- Avg trading volumes (Month)
- Order book (Week)
Benchmarks are classified as high read with light computational intensity.
Conclusion: kdb+ demonstrates the best overall performance for disk and memory queries.
[Chart: Read queries (ms), axis 0 to 4000. Series: Avg trading volumes (day), Avg trading volumes (week), Order books (week). Databases: kdb+ on-disk, InfluxDB, TigerData, ClickHouse. Bar labels: 81, 740, 94, 93, 273, 146, 469, 3700, 3938, 272, 1991, 962 ms.]
Computationally intensive queries
Performance of computationally intensive queries in milliseconds
The computationally intensive query performance results are evaluated from the following benchmarks:
- Weighted avg price (TWAP)
- Market depth avg (Day)
- Market depth avg (Week)
Benchmarks are classified as high read with heavy computational intensity.
Conclusion: kdb+ demonstrates the best overall performance in computationally intensive queries.
[Chart: Computationally intensive queries (ms), axis 0 to 20000. Series: Weighted avg price, Market depth avg (day), Market depth avg (week). Databases: kdb+ on-disk, InfluxDB, TigerData, ClickHouse. Bar labels: 75, 533, 4375, 12,716, 373, 4420, 334, 4699, 17,092, 202, 1626, 10,180 ms.]
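For readers unfamiliar with the weighted average price benchmark, the sketch below assumes that TWAP carries its usual meaning of time-weighted average price, where each price is weighted by how long it remained in effect. The source does not give the benchmark's exact definition, and the function name, arguments, and sample data are hypothetical.

```python
def twap(timestamps, prices, end_time):
    """Time-weighted average price.

    Each price is weighted by the time until the next observation;
    the last price is held until `end_time`.
    """
    if len(timestamps) != len(prices) or not prices:
        raise ValueError("timestamps and prices must be non-empty and equal length")
    boundaries = list(timestamps[1:]) + [end_time]
    weighted_sum = 0.0
    total_time = 0.0
    for start, stop, price in zip(timestamps, boundaries, prices):
        duration = stop - start
        if duration < 0:
            raise ValueError("timestamps must be non-decreasing")
        weighted_sum += price * duration
        total_time += duration
    if total_time == 0:
        raise ValueError("total duration must be positive")
    return weighted_sum / total_time


# Hypothetical quotes observed at t = 0, 10, 30 seconds; window ends at t = 40.
print(twap([0, 10, 30], [100.0, 102.0, 101.0], 40))  # 101.25
```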
Complex queries
Performance of complex queries in milliseconds
The complex query performance results are evaluated from the following benchmarks:
- Mid quote returns
- Execution volatility
Conclusion: kdb+ demonstrates the best overall performance for complex queries.
[Chart: Complex queries (ms), axis 0 to 2500. Series: Mid quote returns, Execution volatility. Databases: kdb+ on-mem, kdb+ on-disk, InfluxDB, TigerData, ClickHouse. Bar labels: 64, 41, 113, 51, 99, 2009, 1614, 324, 401, 190 ms.]
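The two complex queries can be sketched in a few lines of Python. This is an illustration under assumed definitions: mid quote as the average of best bid and best ask, returns as log returns between consecutive observations, and volatility as the sample standard deviation of those returns. The benchmark's exact formulas are not given in the source, and all names and sample values below are hypothetical.

```python
import math
import statistics


def mid_quotes(bids, asks):
    """Mid quote for each (bid, ask) pair."""
    return [(bid + ask) / 2 for bid, ask in zip(bids, asks)]


def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def volatility(returns):
    """Sample standard deviation of returns (needs at least two returns)."""
    return statistics.stdev(returns)


# Hypothetical quotes, for illustration only.
bids = [99.0, 100.0, 101.0, 100.0]
asks = [101.0, 102.0, 103.0, 102.0]

mids = mid_quotes(bids, asks)
returns = log_returns(mids)
print(mids)
print(volatility(returns))
```

The same `log_returns` and `volatility` helpers could be applied to execution prices instead of mid quotes to approximate the execution volatility query.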
"It can be concluded that kdb+ is the most suitable database for financial analysis applications of time series data. kdb+ can quickly ingest data in bulk, with a good compression ratio for storage efficiency. kdb+ has stable low query latency for all benchmarks, including read queries, computational intensive queries, and complex queries."
Ruijie Xiong - Department of computing
DBOps benchmarks
DBOps is a public benchmark that compares the performance of various open-source database tools and technologies. Tests include mathematical and statistical calculations, group-by operations, and joins across a range of datasets to demonstrate performance under various conditions.
Conclusion:
- kdb+ came first in 17 out of 18 categories
- kdb+ performed up to 30x faster than Pandas in all categories
- kdb+ comes first in 66% of all queries
[Chart: Basic group-by (1bn rows), execution time (ms), axis 0 to 1000. Tools: ClickHouse, data.table, juliadf, Polars, pydatatable, Spark, kdb+. Bar labels: 350, 150, 180, 120, 980, 700, 80 ms.]
Average speed rate to kdb+
Example: A speed rate of 5 means kdb+ is 5x faster
[Chart: Speed rate multiplier, axis 0 to 40. Tools: ClickHouse, data.table, juliadf, Polars, pydatatable, Spark, DuckDB, pandas. Bar labels: 13, 25, 19, 32, 29, 40.]
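The speed rate described above is a ratio of execution times. The sketch below assumes the average is taken as the mean of per-query ratios; the source does not say whether DBOps averages ratios or divides total times, so treat that choice as an assumption. The timings are hypothetical and are not benchmark results.

```python
def speed_rate(other_ms, kdb_ms):
    """How many times faster kdb+ is than another tool on one query."""
    if kdb_ms <= 0:
        raise ValueError("kdb+ time must be positive")
    return other_ms / kdb_ms


def average_speed_rate(other_timings_ms, kdb_timings_ms):
    """Mean of per-query speed rates (assumed aggregation method)."""
    rates = [speed_rate(o, k) for o, k in zip(other_timings_ms, kdb_timings_ms)]
    if not rates:
        raise ValueError("at least one query timing is required")
    return sum(rates) / len(rates)


# Hypothetical per-query timings in milliseconds.
other_tool = [400.0, 250.0, 900.0]
kdb_plus = [80.0, 50.0, 100.0]

print(speed_rate(400.0, 80.0))  # 5.0, i.e. kdb+ is 5x faster on this query
print(average_speed_rate(other_tool, kdb_plus))
```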
STAC benchmarks
- Provides the fastest execution for 98% of STAC-M3 queries
- Outperforms all previously publicly disclosed results in 10 out of 17 Antuco tests
- Outperforms all publicly disclosed results in 9 out of 10 Kanaga tests
STAC™ is the finance industry standard for independent testing of time-series data processing. The STAC-M3™ benchmark suite is designed for testing solutions that enable high-speed analytics on time series data, such as tick-by-tick market data. It is highly focussed on the business of high-speed data analytics for historical in-memory data, as used by investment banks, hedge funds and similar trading entities.
- Antuco: Uses a limited dataset size of 4.5 TB and simulates performance that would be obtained with a real-world dataset residing mostly on nonvolatile media.
- Kanaga: Uses a dataset size of 33–897 TB and simulates performance on large datasets with large numbers of concurrent requests.
“STAC” and all STAC names are trademarks or registered trademarks of the Securities Technology Analysis Center, LLC.
What makes kdb so fast?
- Columnar storage reduces memory bandwidth and CPU cycles
- Minimal codebase reduces instruction latency
- In-memory processing enables sub-millisecond analytics
- Intelligent tiered storage eliminates I/O overhead
- Vector processing replaces slow row-by-row operations
- Functional programming in q simplifies parallelization and multicore performance
- Optimized time-series functions deliver fast, native support for time-based analytics
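To make the columnar storage point concrete, the sketch below contrasts a row-oriented layout with a column-oriented one in plain Python. It illustrates the general idea only and says nothing about kdb+ internals; the table contents and function names are hypothetical.

```python
from array import array

# Row-oriented layout: each record carries every field.
rows = [
    {"sym": "AAA", "price": 100.0, "size": 10},
    {"sym": "BBB", "price": 50.5, "size": 20},
    {"sym": "AAA", "price": 100.5, "size": 30},
]

# Column-oriented layout: one contiguous typed array per numeric field.
columns = {
    "sym": [r["sym"] for r in rows],
    "price": array("d", (r["price"] for r in rows)),
    "size": array("q", (r["size"] for r in rows)),
}


def total_size_rows(table):
    """Row layout: every record is visited to read a single field."""
    return sum(record["size"] for record in table)


def total_size_columns(table):
    """Column layout: only the 'size' array is read."""
    return sum(table["size"])


print(total_size_rows(rows), total_size_columns(columns))
```

Both functions return the same total, but the columnar version reads only the one array it needs, which is the sense in which a columnar layout can reduce memory traffic for single-column aggregations.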
Get started today
Start your journey today with our free Personal Edition. Join developers and data engineers building apps for the world’s most demanding data environments.
Documentation
Community
kdb Personal Edition
KX Performance Dashboard
dbaker
Created on October 17, 2022