Trading Strategy has published datasets for all historical Uniswap v2 and v3 DEX trades across multiple blockchains.
This is the largest publicly available dataset of individual on-chain trades. Unlike earlier datasets, it is multichain and extends beyond Ethereum mainnet.
What is included in the DEX trade dataset?
The dataset contains all realised historical trades for
- Uniswap v2 and compatible DEXes
- Uniswap v3 and compatible DEXes
Each trade record includes
- Trading pair
- Chain, block and transaction related to the trade
- Timestamp (when the block was produced)
- Assets paid
- Assets received
- Transaction originator address (sender)
- Approximate dollar exchange rate for fiat currency value conversion (if available)
The data covers the period from 2020-05 to 2024-01.
Currently, failed (reverted) trades are not included. The dataset is sorted for maximum compression, not by timestamp, so you need to re-sort the data before you use it.
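Because the rows are not in chronological order, any time-series analysis has to start with a sort. The following is a minimal sketch using pandas on a subset small enough to fit in memory. The column names (`timestamp`, `chain_id`, `block_number`) are assumptions made for illustration and are not taken from the published schema, so check the actual Parquet schema first.

```python
import pandas as pd

# Hypothetical column names; verify them against the real Parquet schema.
TIMESTAMP_COL = "timestamp"
CHAIN_COL = "chain_id"
BLOCK_COL = "block_number"


def sort_trades(trades: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the trades ordered by time, then by chain and block."""
    return trades.sort_values([TIMESTAMP_COL, CHAIN_COL, BLOCK_COL]).reset_index(drop=True)


# Tiny made-up sample that is deliberately not in chronological order.
sample = pd.DataFrame(
    {
        "chain_id": [1, 1, 137],
        "block_number": [15_000_002, 15_000_001, 30_000_000],
        "timestamp": pd.to_datetime(
            ["2022-06-01 00:00:24", "2022-06-01 00:00:12", "2022-06-01 00:00:14"]
        ),
    }
)

ordered = sort_trades(sample)
print(ordered)
```

Sorting returns a new frame and leaves the input untouched, so the original subset stays available for other processing.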
Data collection includes the following blockchains
- Polygon
- Arbitrum
- Ethereum mainnet
- BNB Smart Chain
- Avalanche C-Chain
The current dataset is a Zstd-compressed Parquet file with 2 billion rows, 80 GB in size.
How can I use this data?
We believe this dataset is useful for:
- Designing new kinds of trading strategies
- Analysing issues with MEV, frontrunning and sandwich attacks
- Analysing trading behaviour and blockchain usage patterns, e.g. adoption
The data is mainly for statistical purposes. The dataset may not directly translate to price action data such as market mid-price or OHLCV, because every trade includes price impact (average price per token vs. top bid). Some individual trades or transactions might be missing due to blockchain node issues and other technical problems.
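To illustrate why a price derived from a single trade differs from the mid-price, here is a small worked example. It assumes a Uniswap v2 style constant-product pool with a 0.3% swap fee. The reserves and trade size are made-up numbers, and Uniswap v3 pools price trades differently, so treat this as a sketch of the effect rather than a model of the dataset.

```python
def v2_swap_output(
    amount_in: float, reserve_in: float, reserve_out: float, fee: float = 0.003
) -> float:
    """Tokens received from a constant-product (x * y = k) pool after the swap fee."""
    amount_in_after_fee = amount_in * (1.0 - fee)
    return amount_in_after_fee * reserve_out / (reserve_in + amount_in_after_fee)


# Made-up pool state: 2,000,000 USDC against 1,000 ETH.
reserve_usdc = 2_000_000.0
reserve_eth = 1_000.0
mid_price = reserve_usdc / reserve_eth  # USDC per ETH before the trade

# A trade as it would appear in a record: assets paid and assets received.
assets_paid = 100_000.0  # USDC
assets_received = v2_swap_output(assets_paid, reserve_usdc, reserve_eth)  # ETH

# The average price per token implied by the trade itself.
executed_price = assets_paid / assets_received

# Relative gap between the trade price and the mid-price (fee plus price impact).
price_gap = executed_price / mid_price - 1.0

print(f"mid price:      {mid_price:.2f} USDC/ETH")
print(f"executed price: {executed_price:.2f} USDC/ETH")
print(f"gap:            {price_gap:.2%}")
```

The larger the trade relative to the pool reserves, the wider this gap becomes, which is why aggregating raw trade prices into candles can overstate price moves.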
You can download the Parquet file from the Trading Strategy Backtesting datasets page. At the moment, it is free and only requires a newsletter subscription. The easiest way to use the data is with the Trading Strategy data SDK.
We believe the single-click download is easier to use than APIs. Parquet is a standard container format for data science, and it is easy to read in Python or any other programming language or tool.
The dataset is too large to process entirely in memory. We recommend saving a local copy of the file, processing it as a stream, and reducing it to a smaller dataset that fits your purposes.
Want to see your own data here?
Don't hesitate to contact us if you want your DEX or blockchain to be included in the dataset. Most of the relevant source code is available in the MIT-licensed web3-ethereum-defi package, and we are happy to take sponsorships to expand the coverage.