Decentralized Exchange Classification Dataset: AlphaCore

Dataset information

This dataset was provided by Cuneyt Akcora (University of Manitoba), Friedhelm Victor (TU Berlin), and Murat Kantarcioglu and Yulia Gel (UT Dallas).

The Ethereum blockchain records the transactions executed between roughly 200M account addresses. Ethereum has two types of accounts: externally owned accounts and smart contract accounts. Externally owned accounts (EOAs) are controlled by private keys held by real-world entities. Some of these entities are ordinary users, whereas others are organizations such as blockchain exchanges. Exchanges come in two types. Centralized exchanges (CEX), also known as custodial exchanges, manage users' funds on their behalf through multiple, centrally controlled EOAs. Decentralized exchanges (DEX), in contrast, typically do not require users to place funds in the custody of a single entity, and are usually implemented as smart contract accounts. Because exchanges play a major role in blockchain transaction networks, and DEX have gained significant popularity with the advent of Decentralized Finance (DeFi), understanding these types of addresses and their transactions has become an important task.

This dataset consists of weighted, directed graphs with partially available node labels. Specifically, it contains token (asset) transaction networks extracted from the Ethereum blockchain between Oct-16-2018 and May-04-2020, which were among the largest token networks during that period. It covers the ERC20 assets TUSD, BAT, MANA, MGC, BNT, HEX, AMB, LINK, DAI, HT, AZ, LAMB, SAI, EGT, MXM, USDP, MKR, USDC, NPXS, STORJ, BNB, EBK, WETH, KICK, OMG, KNC, ZRX and ENJ; these asset names correspond to the token address field in the data. The data is not anonymized, so addresses can be looked up in online block explorers and linked to external information. The networks can be used individually or jointly, since some nodes appear in multiple networks. Node labels were obtained in May 2020 from a prominent Ethereum block explorer that curates and maintains address labels. In total, 296 publicly listed addresses belonging to 149 centralized and decentralized exchanges are included; these addresses are likely to be used frequently. The dataset also provides address labels (label, address, name, asset) for addresses in the 0.1-depth AlphaCore of the stablecoin network.
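
As a minimal sketch of how one token network can be held in memory, the Python below sums individual transfers into weighted, directed edges and computes each address's in- and out-strength (total tokens received and sent). It assumes a hypothetical record layout of (from_address, to_address, value); the actual file columns may differ. The helper `shared_addresses` is also hypothetical and shows one way to use several networks jointly, by finding addresses that occur in all of them.

```python
from collections import defaultdict


def build_token_graph(transfers):
    """Sum parallel transfers into one weighted, directed edge per (sender, receiver) pair.

    `transfers` is an iterable of (from_address, to_address, value) tuples,
    a hypothetical layout; adapt it to the actual file columns.
    """
    weights = defaultdict(float)
    for src, dst, value in transfers:
        weights[(src, dst)] += float(value)
    return dict(weights)


def node_strengths(graph):
    """Return {address: (in_strength, out_strength)}, i.e. total tokens received and sent."""
    strength = defaultdict(lambda: [0.0, 0.0])
    for (src, dst), weight in graph.items():
        strength[dst][0] += weight  # incoming volume
        strength[src][1] += weight  # outgoing volume
    return {node: tuple(s) for node, s in strength.items()}


def shared_addresses(*graphs):
    """Hypothetical helper: addresses that appear in every given token network."""
    node_sets = [{node for edge in graph for node in edge} for graph in graphs]
    return set.intersection(*node_sets) if node_sets else set()
```

For example, two transfers of 5 and 3 tokens from the same sender to the same receiver collapse into a single edge of weight 8.0, which matches the "weighted, directed graph" view described above.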

Number of graphs: 28
Directed: Yes
Node features: No
Edge features: Yes
Graph labels: No
Node labels: Partially
Temporal: Yes
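
Because the graphs are temporal, a common first step is to cut a network into snapshots. The sketch below groups transfers into consecutive 7-day windows. The record layout (from_address, to_address, value, unix_timestamp), the function name `weekly_snapshots`, and the UTC interpretation of the collection start date are assumptions for illustration, not part of the dataset's documented format.

```python
from collections import defaultdict
from datetime import datetime, timezone

WEEK_SECONDS = 7 * 24 * 3600


def weekly_snapshots(transfers, start):
    """Group timestamped transfers into consecutive 7-day windows beginning at `start`.

    `transfers` holds hypothetical (from_address, to_address, value, unix_timestamp)
    tuples. Returns {window_index: {(src, dst): total_value}}, ordered by window;
    transfers dated before `start` are skipped.
    """
    origin = int(start.timestamp())
    snapshots = defaultdict(lambda: defaultdict(float))
    for src, dst, value, ts in transfers:
        if ts < origin:
            continue
        window = (ts - origin) // WEEK_SECONDS
        snapshots[window][(src, dst)] += float(value)
    return {w: dict(edges) for w, edges in sorted(snapshots.items())}


# Windows counted from the start of the collection period (Oct-16-2018, UTC assumed).
DATASET_START = datetime(2018, 10, 16, tzinfo=timezone.utc)
```

Weekly windows are only one choice; daily or monthly windows follow by changing `WEEK_SECONDS`.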

Possible tasks
Classification: Given a token transaction network and a list of centralized (CEX) and decentralized (DEX) exchange addresses, predict which other Ethereum addresses belong to an exchange.
Core decomposition: Given a token transaction network, identify its cores using node features derived from the graph (for example, weighted in- and out-degrees), since the dataset itself provides no node features. Use the list of CEX and DEX addresses as ground truth, assuming that exchange addresses appear in the highest core of the network (see the AlphaCore article cited below for a justification of this assumption).
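
For the classification task, a naive baseline (not the method from the AlphaCore paper) is to describe each address by simple structural features and rank unlabeled addresses by their distance to the centroid of the known exchange addresses. The sketch assumes an edge dictionary of the form {(src, dst): weight} and uses log-scaled in/out degree and in/out strength as features; both choices, and the function names, are illustrative assumptions.

```python
import math
from collections import defaultdict


def degree_features(edges):
    """Map each address to log1p of (in-degree, out-degree, in-strength, out-strength)."""
    stats = defaultdict(lambda: [0, 0, 0.0, 0.0])
    for (src, dst), w in edges.items():
        stats[dst][0] += 1  # in-degree
        stats[src][1] += 1  # out-degree
        stats[dst][2] += w  # in-strength
        stats[src][3] += w  # out-strength
    return {n: tuple(math.log1p(v) for v in s) for n, s in stats.items()}


def rank_exchange_candidates(edges, known_exchanges, top_k=10):
    """Nearest-centroid ranking: unlabeled addresses closest to the known exchanges come first."""
    feats = degree_features(edges)
    labeled = [feats[a] for a in known_exchanges if a in feats]
    if not labeled:
        return []
    dim = len(labeled[0])
    centroid = [sum(f[i] for f in labeled) / len(labeled) for i in range(dim)]
    candidates = [n for n in feats if n not in known_exchanges]
    candidates.sort(key=lambda n: math.dist(feats[n], centroid))
    return candidates[:top_k]
```

Because the data is not anonymized, top-ranked candidates can be checked manually in a block explorer; a held-out subset of the labeled exchange addresses gives a simple way to measure the baseline.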

Paper: AlphaCore: Data Depth based Core Decomposition, by F. Victor, C. G. Akcora, Y. R. Gel, and M. Kantarcioglu, ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2021
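
The core decomposition task can be illustrated with a simplified, depth-based peeling inspired by the AlphaCore idea; it is not the algorithm from the paper. The sketch describes each node by log-scaled in- and out-strength, computes a plain Mahalanobis depth 1 / (1 + d^2) relative to the nodes still present, and repeatedly removes nodes that are not outlying on the high-volume side. The feature choice, the above-the-mean direction filter, the small ridge term, and the default `alpha=0.1` (chosen to echo the "0.1 depth" mentioned above) are all assumptions.

```python
import math


def _features(edges, nodes):
    """(log1p in-strength, log1p out-strength) per node, using only edges among `nodes`."""
    strength = {n: [0.0, 0.0] for n in nodes}
    for (src, dst), w in edges.items():
        if src in nodes and dst in nodes:
            strength[dst][0] += w
            strength[src][1] += w
    return {n: (math.log1p(i), math.log1p(o)) for n, (i, o) in strength.items()}


def mahalanobis_depth(points, ridge=1e-6):
    """Mahalanobis depth 1 / (1 + d^2) of each 2-D point w.r.t. the sample mean and covariance.

    A small ridge on the diagonal keeps the covariance invertible for degenerate data.
    """
    vals = list(points.values())
    n = len(vals)
    mx = sum(x for x, _ in vals) / n
    my = sum(y for _, y in vals) / n
    sxx = sum((x - mx) ** 2 for x, _ in vals) / n + ridge
    syy = sum((y - my) ** 2 for _, y in vals) / n + ridge
    sxy = sum((x - mx) * (y - my) for x, y in vals) / n
    det = sxx * syy - sxy * sxy
    depth = {}
    for node, (x, y) in points.items():
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] Sigma^{-1} [dx dy]^T via the closed-form 2x2 inverse
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        depth[node] = 1.0 / (1.0 + d2)
    return depth


def depth_peel(edges, alpha=0.1):
    """Repeatedly keep only low-depth, above-average nodes; return the innermost set reached."""
    nodes = {n for edge in edges for n in edge}
    while len(nodes) > 2:
        feats = _features(edges, nodes)
        depth = mahalanobis_depth(feats)
        mx = sum(x for x, _ in feats.values()) / len(feats)
        my = sum(y for _, y in feats.values()) / len(feats)
        # Outlying (depth <= alpha) and above the mean volume in at least one direction.
        keep = {n for n, d in depth.items()
                if d <= alpha and (feats[n][0] > mx or feats[n][1] > my)}
        if not keep or keep == nodes:
            break  # nothing more to peel: the current set is the deepest level
        nodes = keep
    return nodes
```

On a star where 20 addresses each send tokens to one hub, only the hub survives, which mirrors the assumption that exchange addresses sit in the highest core.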

Source (citation)

File Description
