Hello, I wanted to share some results from a recent paper I submitted to arXiv (it should be available early next week).
I used the Mapper algorithm, a topological data analysis (TDA) tool, to identify six statistically distinct MEV behaviors within an Olympus MEV dataset curated from on-chain data (you can read more here).
The Mapper algorithm lets us transform the continuous space of a dataset into a discrete graph structure while preserving topological and geometric information. I wrote a preview post on Mirror which contains charts and an overview of the paper.
The initial findings are very interesting because Mapper can group MEV addresses that perform the same type of actions (atomic dex-to-dex arbitrage and liquidations) and characterize distinct behavior from a statistical point of view. Some questions that can initially be answered are:
- Is the MEV bot performing dex-to-dex arbitrage in reaction to human trading volume?
- Is the MEV bot performing dex-to-dex arbitrage in reaction to liquidations?
- Is the MEV bot performing dex-to-dex arbitrage in reaction to broader market conditions?
The TDA toolset contains very general tools; it's relatively straightforward to adapt the code to any MEV (or non-MEV) dataset, and it can be integrated into existing scikit-learn ML pipelines. Based on these initial findings, I expect to build a more general end-to-end machine learning pipeline from these results.
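To make the Mapper idea above concrete, here is a minimal toy sketch of the algorithm itself (overlapping cover of a filter's range, local clustering of each preimage, then connecting clusters that share points). This is purely illustrative and hypothetical, not the paper's pipeline; a real pipeline would use a library such as giotto-tda, and the simple single-linkage step below stands in for a proper clusterer like DBSCAN.

```python
# Toy Mapper sketch -- illustrative only, not the paper's code.
import numpy as np

def mapper_graph(points, filter_values, n_intervals=4, overlap=0.3, eps=0.5):
    """Build a Mapper graph: cover the filter range with overlapping
    intervals, cluster each preimage, and connect clusters that share points."""
    lo, hi = filter_values.min(), filter_values.max()
    length = (hi - lo) / n_intervals
    nodes = []  # each node is a set of point indices (one local cluster)
    for i in range(n_intervals):
        a = lo + i * length - overlap * length
        b = lo + (i + 1) * length + overlap * length
        idx = np.where((filter_values >= a) & (filter_values <= b))[0]
        if idx.size == 0:
            continue
        # local clustering: single-linkage connected components at scale eps
        # (a stand-in for the DBSCAN step mentioned in the thread)
        remaining = set(idx.tolist())
        while remaining:
            seed = remaining.pop()
            cluster, frontier = {seed}, [seed]
            while frontier:
                p = frontier.pop()
                near = [q for q in remaining
                        if np.linalg.norm(points[p] - points[q]) <= eps]
                for q in near:
                    remaining.discard(q)
                    cluster.add(q)
                    frontier.append(q)
            nodes.append(cluster)
    # nerve: connect two clusters whenever they share a data point
    edges = {(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))
             if nodes[i] & nodes[j]}
    return nodes, edges

# toy data: 60 points on a circle, filtered by their x-coordinate
theta = np.linspace(0, 2 * np.pi, 60, endpoint=False)
pts = np.c_[np.cos(theta), np.sin(theta)]
nodes, edges = mapper_graph(pts, pts[:, 0])
print(len(nodes), len(edges))  # 6 nodes joined in a cycle
```

On this toy circle the output graph is itself a 6-node cycle, which is the sense in which Mapper preserves the topology of the underlying space.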
Very interesting, and thanks for sharing! I've read about TDA, but haven't read through your paper yet. My question is: how is it different from other clustering algorithms such as SVM, K-means, or DBSCAN, or from simply running PCA on the raw data and doing a scatter plot of each pair of PCs? That would give similar graphs with classification. Is TDA more accurate, quicker, or easier to train? What's the advantage of using it?
Cool to see someone applying TDA to blockchain (and to MEV!). I tried something similar using persistent (co)homology to detect algorithmic rug pulls (a continuation of the work in https://arxiv.org/abs/2201.072209). One of the fundamental problems I had with it is the computational complexity of computing the homology groups of large datasets. Did you use any natural sub-sampling method?
I haven't put the paper out yet. As a first-time arXiv author, I'm not sure how long the submission process will take. In the meantime, I think I will put a draft up on GitHub.
To answer your question: I actually use DBSCAN inside the Mapper algorithm as a "local clustering" step. You could similarly use any of the clustering algorithms you mentioned in the Mapper TDA pipeline.
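The reason the local clustering step is swappable is that Mapper only needs a clusterer exposing scikit-learn's `fit_predict` interface. Here's a hypothetical sketch of that idea (the `ThresholdClusterer` below is a made-up stand-in, not DBSCAN or the paper's code; in practice you would pass `sklearn.cluster.DBSCAN` or `KMeans` directly):

```python
# Sketch of the pluggable local-clustering step -- hypothetical stand-in code.
import numpy as np

class ThresholdClusterer:
    """Single-linkage clustering at a fixed distance scale (DBSCAN-like toy)."""
    def __init__(self, eps):
        self.eps = eps

    def fit_predict(self, X):
        # label connected components of the "distance <= eps" graph
        n = len(X)
        labels = -np.ones(n, dtype=int)
        current = 0
        for i in range(n):
            if labels[i] >= 0:
                continue
            stack, labels[i] = [i], current
            while stack:
                p = stack.pop()
                for q in range(n):
                    if labels[q] < 0 and np.linalg.norm(X[p] - X[q]) <= self.eps:
                        labels[q] = current
                        stack.append(q)
            current += 1
        return labels

def local_cluster(preimage, clusterer):
    """The Mapper step: run any fit_predict-style clusterer on one cover preimage."""
    return clusterer.fit_predict(preimage)

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = local_cluster(X, ThresholdClusterer(eps=0.5))
print(labels)  # two clusters: [0, 0, 1, 1]
```

Any object with the same `fit_predict` signature drops into `local_cluster` unchanged, which is why K-means or DBSCAN can be used interchangeably here.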
Put shortly, TDA is theoretically superior to using SVM, K-means, or DBSCAN by itself because you get topological and geometric guarantees from pure mathematics (algebraic topology). TDA is independent of the choice of measure and robust to noise.
There are a lot of interesting applications currently showing that adding a TDA Mapper / persistent-homology step to an existing data-model pipeline can increase the accuracy and performance of existing ML and deep learning models, as well as help explain the "black box" of a neural network model.
Hey, I tried to click your arXiv link, but it says "Article identifier '2201.072209' not recognized".
Computational complexity of course depends on what you consider "large". I didn't use a natural sub-sampling method because my dataset only contained 528 daily data points. The library I used,
giotto-tda, offers C++ parallelization features. That might give you enough of a performance gain that you won't have to worry about computational complexity anymore. The library came out less than two years ago and is in active development! They are also working on a giotto-deep library for topological deep learning right now!
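For larger datasets than mine, one natural sub-sampling approach (not something I used, just a common option) is farthest-point ("maxmin") landmark sampling: greedily pick points that spread over the cloud, then run persistent homology on the much smaller landmark set. A numpy-only sketch, with hypothetical names:

```python
# Farthest-point (maxmin) landmark sub-sampling -- a common pre-processing
# step before persistent homology on large point clouds. Sketch only.
import numpy as np

def maxmin_landmarks(X, n_landmarks, seed=0):
    """Greedily pick points, each maximizing its distance to those already chosen."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(X)))]
    # distance from every point to its nearest chosen landmark so far
    dist = np.linalg.norm(X - X[chosen[0]], axis=1)
    while len(chosen) < n_landmarks:
        nxt = int(dist.argmax())  # farthest point from the current landmark set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return X[chosen]

# e.g. reduce 5000 noisy circle points to 50 landmarks before feeding the
# reduced cloud to a persistence computation (such as giotto-tda's
# Vietoris-Rips transformer)
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 5000)
cloud = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(scale=0.05, size=(5000, 2))
landmarks = maxmin_landmarks(cloud, 50)
print(landmarks.shape)  # (50, 2)
```

Since Vietoris-Rips complexity grows rapidly with the number of points, shrinking 5000 points to 50 well-spread landmarks makes the homology computation far cheaper while typically preserving the large-scale features.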
Releasing a draft of the paper on GitHub, as arXiv is taking longer than expected to process it.