Hello everyone! I’m considering releasing a dataset of authentic human trading data from the DeFi (decentralized finance) world, and I’d love to gauge the community’s interest. The dataset is built from real on-chain data and covers multiple years of activity on decentralized exchanges (DEXs).
Why Could This Be Valuable?
- Authentic Behavior Patterns: The data represents how real users (and bots) trade in decentralized markets, which could reveal patterns not evident in synthetic or traditional finance datasets. Because DeFi is peer-to-peer, we can observe organic market behavior without intermediaries, which makes it a rich source of open financial data (nature.com).
- Microstructure Research: For those interested in market microstructure, this dataset lets you analyze the “guts” of each trade. You could study how trades are executed (e.g. multi-hop swaps, liquidity sources), how different participants behave, or the impact of certain strategies in a DeFi context. Traditional market microstructure studies are often limited by data access, but here we have granular on-chain details available to dig into.
- Modeling & Simulation: Machine learning practitioners can use this data to train models on real trading sequences, for example to predict trader actions, detect anomalies, or simulate agent behavior in a realistic DeFi environment. Researchers have already analyzed tens of millions of DEX transactions and found such data highly useful for advanced ML research (nature.com).
- Behavioral and Time-Series Analysis: If you’re into behavioral finance or time-series analysis, this dataset can help examine how traders react to different market events (such as sharp price swings, new token launches, or yield farming incentives). The fine-grained, timestamped data allows for studying everything from intraday trading rhythms to long-term adoption trends in the DeFi ecosystem.
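As a rough illustration of the intraday-rhythm idea, here is a minimal Python sketch that buckets trades by UTC hour of day. It assumes each trade carries a Unix timestamp (for example, its block time); the function name and the sample timestamps are made up for illustration and are not part of the dataset.

```python
from collections import Counter
from datetime import datetime, timezone


def trades_per_utc_hour(timestamps):
    """Count trades in each UTC hour of the day (0-23) from Unix timestamps."""
    counts = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps
    )
    # Return a fixed 24-slot profile so empty hours show up as zero.
    return [counts.get(h, 0) for h in range(24)]


# Made-up timestamps (seconds since the Unix epoch), not real data.
sample = [0, 1800, 3600, 3700, 86400 + 3599, 43200]
profile = trades_per_utc_hour(sample)
print(profile[0], profile[1], profile[12])  # 3 2 1
```

Run over the full dataset, a profile like this (or one split by weekday or by trader) is a simple starting point for comparing human and bot activity patterns.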
What’s in the Dataset?
- Real Human Trading Activity: It consists of actual transaction data from DeFi trading (not simulated), reflecting genuine user behavior on decentralized exchanges. Because these trades occur on-chain, the data is inherently open and rich in detail.
- Microstructure Details: Each trade is broken down into its component on-chain transfers at every step. For example, you’ll see every token movement involved in a swap or multi-step trade, providing a step-by-step view of how each transaction unfolds.
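To make the transfer-level view concrete, here is a minimal sketch of how a trade broken into token transfers could be read back as a route and a net position change. The `Transfer` fields, the `token_path` and `net_flows` helpers, and the sample addresses and amounts are all hypothetical, not the dataset's actual schema. It also assumes a simple chain of transfers ordered by log index; real transactions with fees, split routes, or token-wrapping steps would need more careful handling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    # Hypothetical schema: one token movement inside a transaction.
    log_index: int   # position of the transfer event within the transaction
    token: str       # token symbol (real data would use a contract address)
    sender: str
    recipient: str
    amount: float    # real data would use integer base units


def token_path(transfers):
    """Order transfers by log index and collapse consecutive repeats into a route."""
    path = []
    for t in sorted(transfers, key=lambda t: t.log_index):
        if not path or path[-1] != t.token:
            path.append(t.token)
    return path


def net_flows(transfers, address):
    """Net change in each token balance for one address across the trade."""
    flows = defaultdict(float)
    for t in transfers:
        if t.recipient == address:
            flows[t.token] += t.amount
        if t.sender == address:
            flows[t.token] -= t.amount
    return dict(flows)


# Illustrative two-hop swap WETH -> USDC -> DAI (made-up addresses and amounts).
trade = [
    Transfer(0, "WETH", "trader", "pool_a", 1.0),
    Transfer(1, "USDC", "pool_a", "pool_b", 3000.0),
    Transfer(2, "DAI", "pool_b", "trader", 2995.0),
]

route = token_path(trade)
print(route, "hops:", len(route) - 1)   # ['WETH', 'USDC', 'DAI'] hops: 2
print(net_flows(trade, "trader"))       # {'WETH': -1.0, 'DAI': 2995.0}
```

The net flows show the trader's effective trade (here, 1 WETH for 2,995 DAI) regardless of how many pools the route passed through, while the token path exposes the multi-hop structure that microstructure studies would care about.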
Looking for Feedback and Collaboration
I’m posting here to see if others would be interested in exploring this dataset or collaborating on projects that utilize it. If there’s enough interest, I can proceed with cleaning it up and uploading it to the Hugging Face Hub (or another accessible repository).
Questions for the community:
- Would you find this kind of dataset useful for your work or research?
- What would you want to build or investigate with it?
- Any suggestions on the format or additional features that would make it more useful (e.g. aggregated statistics, labels for certain events, etc.)?
I’m excited about the potential of this data and would love to hear your thoughts. Let me know if you’re interested or have ideas on how we could use it. Maybe we can collaborate and turn this raw data into some awesome models or insights!
Looking forward to your feedback and seeing if we can get a little DeFi dataset project going together.