How airdrop teams use clustering tools (Nansen, Arkham, Bubblemaps)
Every major airdrop in the last three years has run some form of clustering analysis before final token distribution. Optimism did it before OP wave 2. Arbitrum filtered addresses quietly before the ARB snapshot. LayerZero turned it into a public spectacle, inviting the community to submit sybil reports for bounties before ZRO launched in mid-2024. The tooling has gotten sharper, the analysts running it have gotten better, and the cost of getting caught has gone from “miss a few hundred dollars” to “lose five figures in a single snapshot.”
If you are still operating under the assumption that spreading activity across multiple wallets with random delays is enough to evade detection, you are behind. The gap between what farming communities discuss publicly and what foundation analytics teams actually run has widened significantly. This article is about that gap. I am going to walk through how Nansen, Arkham Intelligence, and Bubblemaps are used operationally by protocol teams, what signals each tool surfaces, and where each one fails. This is not a beginner explainer. I am assuming you already understand what an airdrop snapshot is, what a sybil wallet means, and why protocols want to filter them.
The goal is not to help you evade detection. The goal is to give you an accurate mental model of what you are actually up against, so you can make informed decisions about where to put your time and capital. An operator who understands the other side’s tooling makes better bets.
background and prior art
On-chain clustering predates modern airdrop analytics by years. The earliest systematic work came from blockchain forensics firms like Chainalysis and Elliptic, which built address-clustering heuristics for law enforcement use cases starting around 2014-2015. Their core insight, which still underlies most consumer-facing tools today, is that addresses can be linked through common-input-ownership heuristics in UTXO chains, and through behavioral and temporal correlation on account-based chains like Ethereum.
The shift toward airdrop-specific clustering happened roughly in 2021-2022, when the scale of farming activity became large enough that protocols noticed their token distributions were heavily skewed. Uniswap’s UNI airdrop in September 2020 was relatively clean partly because no one had optimized for it yet. By the time ENS launched its token in November 2021, organized farming was already a known phenomenon. By the time Optimism and Arbitrum launched their tokens in 2022 and 2023 respectively, foundation teams were building in explicit sybil-filtering steps. The tools available to do that filtering have since matured into a small ecosystem with distinct strengths.
It is worth noting that clustering analysis for airdrop compliance sits in a different legal and ethical frame than law enforcement forensics. Protocol teams are making internal eligibility decisions, not filing criminal referrals. That said, the underlying graph theory and heuristics are directly borrowed from the forensics world. If you want to understand the academic foundation, Friedhelm Victor’s paper “Address Clustering Heuristics for Ethereum” (Financial Cryptography and Data Security 2020) remains one of the cleaner write-ups of deposit address reuse as a clustering heuristic on Ethereum specifically.
the core mechanism
Clustering tools, regardless of vendor, are building a graph where nodes are addresses and edges are relationships. The relationships can be:
- funding path edges: address A sent ETH to address B, directly or through an intermediate
- behavioral edges: addresses A and B interacted with the same contracts in the same block or same session window
- temporal edges: addresses A and B were funded within N minutes of each other from the same source
- fingerprint edges: addresses A and B share metadata signals like gas price strategy, nonce patterns, or token approval sequences
The output is a cluster, which is a set of addresses that are statistically likely to be controlled by the same operator. Different tools weight these edges differently, and that is where the practical differences between Nansen, Arkham, and Bubblemaps become meaningful.
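To make the graph framing concrete, here is a minimal sketch of edge-weighted clustering with a union-find structure. The edge weights, the `min_score` cutoff, and the function name are assumptions I made up for illustration; none of the three vendors publishes its actual weighting.

```python
from collections import defaultdict

# Hypothetical weights per edge type. Real tools do not publish theirs.
EDGE_WEIGHTS = {"funding": 0.6, "temporal": 0.3, "behavioral": 0.4, "fingerprint": 0.2}


def cluster_addresses(edges, min_score=0.5):
    """Group addresses whose combined edge score reaches min_score.

    edges: iterable of (addr_a, addr_b, edge_type) tuples.
    Returns a list of sets, one per cluster with more than one member.
    """
    # Sum the weights of every edge observed between each unordered pair.
    pair_score = defaultdict(float)
    for a, b, kind in edges:
        pair_score[frozenset((a, b))] += EDGE_WEIGHTS[kind]

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge every pair whose combined score clears the cutoff.
    for pair, score in pair_score.items():
        if len(pair) == 2 and score >= min_score:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for addr in list(parent):
        groups[find(addr)].add(addr)
    return [g for g in groups.values() if len(g) > 1]


edges = [
    ("0xA", "0xB", "funding"),
    ("0xB", "0xC", "temporal"),
    ("0xB", "0xC", "behavioral"),
    ("0xD", "0xE", "fingerprint"),
]
clusters = cluster_addresses(edges)
```

In this toy input, 0xB and 0xC have no funding link, but their temporal and behavioral edges together clear the cutoff, so they join 0xA's cluster. The lone fingerprint edge between 0xD and 0xE is too weak on its own. Changing the weights changes which addresses merge, which is the practical difference between vendors.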
Nansen (nansen.ai) is primarily a labeled-wallet database with a clustering layer on top. Their core product is the Smart Money label set, which tags wallets by behavior category. For airdrop analysis, what matters more is their entity clustering feature and their wallet profiler. When a foundation team queries a wallet on Nansen, they can see funding sources, first transactions, and which other labeled entities the wallet has interacted with. The clustering logic Nansen applies is mostly behavioral, built on pattern matching against their labeled wallet database. If your wallet was funded from an exchange withdrawal address that has also funded 300 other addresses that all went on to farm the same protocol in the same week, that is a signal. Nansen’s annual subscription for the tier that includes entity clustering was around $3,000-$4,000/year as of early 2025 for individual plans, with custom enterprise pricing for protocols that want API access to run bulk queries.
Arkham Intelligence (arkhamintelligence.com) takes a different approach. Their core differentiator is entity-level identity attribution rather than behavioral clustering. Arkham’s AI system, which they call “Ultra”, tries to attach real-world identity labels to addresses, not just behavioral categories. They also built the Intel Exchange, which launched with the ARKM token in July 2023 and allows researchers and bounty hunters to sell attribution data to buyers. For airdrop teams, the relevant Arkham features are: entity dashboards that aggregate multiple addresses under one operator identity, the “demix” tool for tracing mixed or layered fund flows, and the public intelligence database that anyone can query. Arkham’s basic tier is free, which is significant because it means small protocol teams without large analytics budgets can still run preliminary checks.
Bubblemaps (bubblemaps.io) is the most visually intuitive of the three and the most narrowly focused. It does one thing: it takes a token’s holder list and renders the relationship graph between top holders as an interactive bubble chart. Clusters of wallets that share funding sources or have transferred tokens between each other appear as visually connected bubbles. It is particularly effective for catching naive farming setups where the operator has been moving tokens between wallets without thinking about the graph it creates. Bubblemaps is free for basic use and has been widely used for due diligence on new token launches rather than strictly airdrop filtering, but the use case overlaps significantly.
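The holder-graph idea is simple enough to sketch. This is not Bubblemaps’ implementation, just a rough approximation of the concept: link holders that have transferred the token to each other, then report each connected group with its combined balance. The function name and input shapes are assumptions.

```python
from collections import defaultdict, deque


def holder_clusters(balances, transfers):
    """Return (members, combined_balance) for each group of linked holders.

    balances: dict mapping holder address -> token balance.
    transfers: iterable of (sender, receiver) pairs for the same token.
    Only transfers where both sides appear in `balances` create a link.
    """
    neighbors = defaultdict(set)
    for sender, receiver in transfers:
        if sender in balances and receiver in balances and sender != receiver:
            neighbors[sender].add(receiver)
            neighbors[receiver].add(sender)

    seen, result = set(), []
    for start in balances:
        if start in seen or start not in neighbors:
            continue
        # Breadth-first search collects one connected component.
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        result.append((component, sum(balances[a] for a in component)))
    # Largest combined balance first.
    return sorted(result, key=lambda item: item[1], reverse=True)


balances = {"0x1": 500, "0x2": 300, "0x3": 200, "0x4": 900}
transfers = [("0x1", "0x2"), ("0x2", "0x3"), ("0x9", "0x4")]
linked = holder_clusters(balances, transfers)
```

Here 0x1, 0x2, and 0x3 form one group holding 1,000 tokens between them, while 0x4 stays unlinked because its only counterparty is not a tracked holder.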
In practice, foundation teams do not use just one of these. The typical workflow I have seen discussed across research discords and post-mortems looks like this:
1. Pull snapshot addresses (full list, pre-filter)
2. Run batch funding-source analysis (often Arkham or custom script against on-chain data)
3. Cluster by funding source + timing windows (Nansen API or proprietary tooling)
4. Visual inspection of suspicious clusters (Bubblemaps or Nansen graph view)
5. Apply confidence threshold, mark clusters above threshold as ineligible
6. Human review of edge cases near threshold
7. Final distribution excludes flagged clusters
Step 6 is where appeals processes come from. Step 5, the threshold-setting step, is where protocols make a policy decision about false positive tolerance. Setting the threshold too aggressively (a low confidence bar for exclusion) catches more farmers but also incorrectly flags legitimate users who happen to share infrastructure (corporate users behind NAT, people who used the same centralized faucet service, etc.).
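Steps 5 and 6 of the workflow above can be sketched as a simple triage function. The cluster scores, the 0.8 threshold, and the 0.1 review margin are hypothetical values chosen for illustration, and I am assuming near-threshold clusters go to human review before any exclusion is applied.

```python
def triage(cluster_scores, threshold=0.8, review_margin=0.1):
    """Split scored clusters into excluded, review, and eligible buckets.

    cluster_scores: dict mapping cluster id -> confidence in [0, 1] that
    the cluster is controlled by a single operator.
    """
    decisions = {"excluded": [], "review": [], "eligible": []}
    for cluster_id, score in sorted(cluster_scores.items()):
        if abs(score - threshold) <= review_margin:
            # Too close to the line for an automated call (step 6).
            decisions["review"].append(cluster_id)
        elif score > threshold:
            # Clearly above the threshold: mark ineligible (step 5).
            decisions["excluded"].append(cluster_id)
        else:
            decisions["eligible"].append(cluster_id)
    return decisions


scores = {"c1": 0.95, "c2": 0.85, "c3": 0.75, "c4": 0.40}
decisions = triage(scores)
```

With these inputs, c1 is excluded outright, c2 and c3 land in human review, and c4 stays eligible. Widening `review_margin` trades analyst hours for fewer automated mistakes.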
worked examples
Optimism OP airdrop 1 and 2 (2022-2023)
Optimism’s first OP airdrop in May 2022 used a relatively simple set of eligibility criteria: bridging activity, governance participation, Gitcoin donor history, and similar. What got less attention was the filtering step. Before finalizing the list, the Optimism team ran sybil analysis and excluded a material number of addresses. The exact methodology was not published in detail, but addresses with clear common-funding-source patterns and near-identical transaction histories on L1 were excluded. The second airdrop, distributed in February 2023, was explicitly designed with sybil resistance in mind, rewarding governance delegation and gas fees spent on Optimism rather than just raw transaction counts.
The lesson from Optimism is that even relatively early, well-funded protocols were doing this filtering. If you were farming OP with a batch of wallets funded from the same CEX withdrawal in the same 30-minute window, you were at risk in 2022, not just 2024.
LayerZero ZRO airdrop (2024)
This is the most documented example of an aggressive public sybil hunt. LayerZero’s approach before the ZRO distribution was to run an open bounty program where external researchers could submit evidence of sybil clusters and receive a share of the tokens reclaimed from flagged addresses. The LayerZero sybil reporting process was documented on their official channels and attracted significant participation from on-chain analysts.
What made LayerZero’s process particularly thorough was the combination of their internal analysis with crowdsourced forensics. External contributors were motivated by direct financial incentive to find clusters, which meant the analysis covered more ground than any internal team could. Bubblemaps was widely used by community researchers during this period because it made the token transfer relationships visually obvious. An address that had bridged tokens to 40 other addresses, all of which had identical LayerZero interaction patterns, showed up immediately as a connected bubble cluster.
The outcome: LayerZero flagged a very large number of addresses. Exact figures shifted during the process, but reports at the time indicated hundreds of thousands of addresses were under review or excluded. Operators who had taken care to route funding through multiple intermediaries and introduce behavioral variance fared better. Operators who had batch-funded from a single CEX hot wallet lost everything.
Arbitrum ARB airdrop (2023)
Arbitrum did not publicize their filtering methodology in detail, but several post-distribution analyses by on-chain researchers, including threads circulated on crypto Twitter in March-April 2023, identified patterns suggesting that same-day funding + same-protocol interaction clusters were removed from the final eligible set. The ARB airdrop was notable because it was large enough (1.16 billion ARB allocated to users) that even modest filtering impacted significant token value. Researchers using Nansen’s entity view identified groups of 10-50 wallets that had near-identical transaction histories and were funded within hours of each other, and confirmed post-distribution that most of those groups received zero allocation.
The Arbitrum case is a good illustration of how Nansen’s behavioral clustering works in practice. The common-input heuristic used for UTXO chains does not apply to Ethereum, so Nansen relies more on behavioral timing and interaction pattern matching. Two wallets that were both funded on January 15, 2023, both approved USDC for the Arbitrum bridge on January 16, both made a GMX trade on January 17, and both made a Camelot LP deposit on January 20 are going to cluster together regardless of whether they share a direct funding source. The probability that two independent organic users followed that exact sequence in that time window is low enough that the pair gets flagged as a cluster.
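A rough way to express that kind of sequence matching in code is below. This is my own simplification, not Nansen’s algorithm: it counts how many (action, date) events in one wallet have a same-action counterpart in the other wallet within a configurable number of days. The action labels and the `max_day_gap` parameter are assumptions.

```python
from datetime import date


def behavior_similarity(events_a, events_b, max_day_gap=0):
    """Share of events that two wallets have in common, from 0.0 to 1.0.

    events_*: lists of (action, date) tuples. An event in A matches an
    unused event in B when the action is equal and the dates differ by at
    most max_day_gap days.
    """
    if not events_a or not events_b:
        return 0.0
    unmatched_b = list(events_b)
    matched = 0
    for action, day in events_a:
        for i, (other_action, other_day) in enumerate(unmatched_b):
            if action == other_action and abs((day - other_day).days) <= max_day_gap:
                matched += 1
                del unmatched_b[i]  # each event in B can match only once
                break
    return matched / max(len(events_a), len(events_b))


wallet_a = [
    ("funded", date(2023, 1, 15)),
    ("approve_usdc_bridge", date(2023, 1, 16)),
    ("gmx_trade", date(2023, 1, 17)),
    ("camelot_lp_deposit", date(2023, 1, 20)),
]
wallet_b = list(wallet_a)  # identical sequence, as in the example above
wallet_c = [("funded", date(2023, 1, 3)), ("gmx_trade", date(2023, 2, 9))]

same = behavior_similarity(wallet_a, wallet_b)
different = behavior_similarity(wallet_a, wallet_c)
```

The first pair scores 1.0 and the second 0.0. A production system would presumably weight rare actions more heavily than common ones, since many organic users bridge and trade in the same week.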
edge cases and failure modes
False positives from shared infrastructure
The most common failure mode in clustering analysis is flagging wallets that are legitimately independent but share infrastructure. A team at a crypto fund might have 10 employees who all use the same company Ethereum node for gas estimation, causing their transactions to land in similar block positions. A group of friends who read the same alpha channel and act on tips simultaneously will create behavioral clusters. Users of certain smart wallet products or account abstraction implementations share nonce patterns or deployment signatures that can look like a cluster signature.
This is not a hypothetical. Post-Arbitrum, there were credible reports of protocol teams or DAOs whose members were collectively filtered because their multi-sig coordination created correlated on-chain behavior. Appeals processes exist partly because of this failure mode.
Threshold calibration errors
Setting the cluster-exclusion confidence threshold is a policy decision, not a technical one, and teams get it wrong in both directions. Set too low, and you exclude a large portion of real users. Set too high, and organized farmers with decent operational security pass through. The problem is that teams often do not know their error rate until after distribution, when they see complaints or post-distribution analysis. There is no ground truth to calibrate against during the analysis.
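The trade-off is easy to see with a toy sweep. The sketch below assumes a small hand-labeled sample exists, which, as noted, teams usually do not have at analysis time. The scores and labels are invented.

```python
def sweep(scored, thresholds):
    """Count errors at each exclusion threshold.

    scored: list of (confidence, is_sybil) pairs from a labeled sample.
    Returns (threshold, false_positives, false_negatives) tuples, where a
    wallet is excluded when confidence >= threshold.
    """
    rows = []
    for t in thresholds:
        false_pos = sum(1 for s, sybil in scored if s >= t and not sybil)
        false_neg = sum(1 for s, sybil in scored if s < t and sybil)
        rows.append((t, false_pos, false_neg))
    return rows


# Hypothetical labeled sample: (cluster confidence, actually a sybil?)
sample = [
    (0.95, True),
    (0.85, True),
    (0.70, False),
    (0.65, True),
    (0.40, False),
    (0.30, False),
]
rows = sweep(sample, [0.5, 0.8])
```

At 0.5, one real user is wrongly excluded and no farmer slips through. At 0.8, no real user is excluded but one farmer passes. Neither setting is correct in a technical sense. Which error costs more is the policy call.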
Mixing services and privacy tools
Tornado Cash (sanctioned by OFAC in August 2022 and delisted in March 2025), various cross-chain privacy routers, and even just routing through multiple CEX accounts can break funding-source graphs. Arkham’s demix tool is specifically designed to trace through one layer of mixing, but multi-hop mixing with intermediate holding periods stretching weeks or months significantly degrades clustering confidence. This is not a novel insight, and protocol teams know it. The response has been to weight behavioral clustering more heavily where funding-source clustering is ambiguous. You cannot fully break behavioral correlation with mixing, only funding-source correlation.
The timing window problem
Most temporal clustering heuristics use a configurable time window: wallets funded within N hours of each other from the same source are clustered. The choice of N is arbitrary and creates exploitable edges. A batch of wallets funded within 5 minutes almost certainly clusters. The same batch funded across 72 hours might not, depending on the tool’s threshold. This is why naive batch-funding detection is relatively easy to evade by adding delays, but more sophisticated analysts are aware of this and extend their time windows or layer in behavioral signals that are harder to delay-proof.
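From the analyst’s side, the sensitivity to N looks like this. It is a minimal sketch under my own assumptions: funding times are expressed as hours since an arbitrary start, and wallets from the same source are chained together whenever consecutive fundings are at most `window_hours` apart.

```python
from collections import defaultdict


def temporal_clusters(fundings, window_hours):
    """Group wallets funded from the same source close together in time.

    fundings: list of (wallet, source, funded_at_hour) tuples.
    Consecutive fundings from one source join the same cluster when they
    are at most window_hours apart. Singletons are dropped.
    """
    by_source = defaultdict(list)
    for wallet, source, hour in fundings:
        by_source[source].append((hour, wallet))

    clusters = []
    for items in by_source.values():
        items.sort()
        current = [items[0][1]]
        for (prev_hour, _), (hour, wallet) in zip(items, items[1:]):
            if hour - prev_hour <= window_hours:
                current.append(wallet)
            else:
                if len(current) > 1:
                    clusters.append(current)
                current = [wallet]
        if len(current) > 1:
            clusters.append(current)
    return clusters


fundings = [
    ("w1", "cex_hot_1", 0.0), ("w2", "cex_hot_1", 0.05), ("w3", "cex_hot_1", 0.08),
    ("w4", "cex_hot_2", 0.0), ("w5", "cex_hot_2", 24.0),
    ("w6", "cex_hot_2", 48.0), ("w7", "cex_hot_2", 72.0),
]
narrow = temporal_clusters(fundings, window_hours=1)
wide = temporal_clusters(fundings, window_hours=24)
```

The one-hour window only catches the batch funded within minutes. Extending it to 24 hours also picks up the batch spread across three days. A wider window pulls in more unrelated wallets too, which is why timing is typically combined with behavioral signals rather than used alone.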
Post-snapshot analysis catching pre-snapshot behavior
A point that is underappreciated: some airdrop teams do not finalize their filtering until weeks or months after the snapshot. LayerZero’s sybil process ran for over a month after the initial announcement. This means that behavior you engaged in before the snapshot can be analyzed using data that became available after, including Arkham entity attributions that were added to the database between your activity date and the analysis date. A wallet you funded from a CEX two years ago might have that CEX withdrawal address labeled in Arkham today even if it was not labeled at the time.
Over-reliance on single-tool analysis
Foundation teams that rely on a single tool get the blind spots of that tool. Bubblemaps misses clusters that did not transfer tokens between wallets. Nansen’s behavioral clustering misses wallets with genuinely diverse activity profiles that happen to share a funding source. Arkham’s entity attribution has gaps in its coverage, particularly for newer addresses and non-Ethereum chains. The most rigorous processes layer multiple tools and include manual review. Farmers who study only one tool’s methodology are optimizing against an incomplete model of the actual analysis being run.
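Layering can be as simple as counting how many independent signals flag each wallet and ordering manual review by that count. The signal names below are placeholders for whatever a team actually runs, not vendor API fields.

```python
def combine_flags(flags_by_signal):
    """Rank wallets by how many independent signals flagged them.

    flags_by_signal: dict mapping signal name -> set of flagged wallets.
    Returns (wallet, flag_count) pairs, most-flagged first.
    """
    counts = {}
    for wallets in flags_by_signal.values():
        for wallet in wallets:
            counts[wallet] = counts.get(wallet, 0) + 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


flags = {
    "transfer_graph": {"0xA", "0xB"},   # linked by token transfers
    "behavioral": {"0xB", "0xC"},       # near-identical activity sequence
    "attribution": {"0xB"},             # shared labeled funding entity
}
review_queue = combine_flags(flags)
```

Here 0xB is flagged by all three signals and goes to the top of the queue, while 0xA and 0xC each show up under only one signal and would be missed entirely by a team running only one of the other two.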
For anyone thinking about the multi-account operations side of this more broadly, the multiaccountops.com blog has useful material on the infrastructure patterns that reduce correlation risk at the wallet management layer.
what we learned in production
Running operations across multiple protocols over 2022-2024, the clearest takeaway is that clustering tools have gotten better faster than evasion techniques have. The delta between what is publicly discussed in Telegram groups and what foundation analytics teams actually run is at least 12-18 months. By the time a particular evasion technique becomes common knowledge in farming communities, the analysts running airdrop filters have already seen it in their data and adjusted for it.
The second thing I would note is that the human review step matters more than the automated step. I have seen cases where wallets with genuinely organic activity were caught in automated clusters because of infrastructure overlap, and those wallets recovered their eligibility through appeals. I have also seen sophisticated automated setups that passed the initial filter get flagged in human review because the reviewer noticed a behavioral pattern the algorithm missed. Protocols that run both steps are harder to fool than protocols that run only automated analysis. The presence or absence of an appeals process is actually a signal about how seriously a team is taking the filtering problem.
One more operational note: Nansen’s wallet labels and Arkham’s entity database are living systems. They are updated continuously. An address that was unlabeled at snapshot time might get labeled six months later when someone publishes an Arkham bounty claiming they identified it. That retroactive labeling matters if the protocol is still in distribution or has future airdrops. Operating on the assumption that the label state at snapshot time is the only relevant state is a mistake.
For practitioners who also operate in the browser fingerprinting and multi-account infrastructure space, the antidetectreview.org blog covers the analogous problems on the web2 side, which has some instructive parallels for thinking about signal correlation at the infrastructure layer.
The core message from all of this is not that clustering tools are infallible. They are not. They have real failure modes, calibration problems, and coverage gaps. The message is that they are good enough that casual or naive farming setups do not survive them, and that the sophistication required to reliably evade them has increased substantially. Whether that sophistication is worth acquiring for a given opportunity is a risk-adjusted calculation that every operator has to make for themselves.
references and further reading
- Financial Cryptography and Data Security 2020: Address Clustering Heuristics for Ethereum - Friedhelm Victor’s foundational paper on deposit address reuse as a clustering signal on Ethereum. The academic basis for most of what commercial tools do.
- Nansen Documentation - Official Nansen product documentation covering wallet labels, entity clustering, and API access. Useful for understanding exactly what data is in their labeled wallet set.
- Arkham Intelligence Platform - Arkham’s main platform. The Intel Exchange section shows active entity attribution bounties, which gives a real-time view of what kinds of attribution are being actively researched.
- Bubblemaps - Direct access to Bubblemaps’ token visualization tool. Running any token you are interacting with through this before committing capital is basic due diligence at this point.
- LayerZero Official Blog - LayerZero’s published communications around the ZRO distribution process, including their sybil hunting methodology and the community reporting program.
For related analysis on this site, the airdropfarming.org blog index has additional deep-dives. The Optimism OP airdrop retrospective covers the specific filtering decisions made across OP waves 1-4 with more detail on eligibility criteria design. The wallet hygiene for airdrop operators piece covers the infrastructure side of reducing clustering risk. And the LayerZero sybil hunt: what the data showed goes deeper into the specific heuristics used during the ZRO process.
Written by Xavier Fok
disclosure: this article may contain affiliate links. if you buy through them we may earn a commission at no extra cost to you. verdicts are independent of payouts. last reviewed by Xavier Fok on 2026-05-19.