Why your sybil wallets get clustered by Nansen and Arkham
most operators running multiple wallets across airdrop campaigns think the threat model is simple: don’t send tokens from wallet A to wallet B. keep the wallets separated. use different seed phrases. that mental model is about ten years out of date.
Nansen and Arkham Intelligence have both built graph-theoretic clustering engines that don’t care about your seed phrase rotation. they care about behavioral fingerprints, gas funding patterns, transaction timing, and on-chain interaction sequences. the result is that a well-funded team can cluster hundreds of your wallets from a single afternoon of on-chain data with no off-chain information at all. protocols hire these firms. Hop Protocol used Nansen data to filter sybils before distributing HOP in 2022. LayerZero ran its own community sybil hunting process in 2024 with Nansen entity labels as supporting evidence. Optimism’s team has publicly discussed using clustering heuristics in their allocation filtering.
the stakes are not hypothetical. if your wallets cluster, you lose the allocation entirely, you may get your addresses published on a community sybil list, and subsequent campaigns will pick up those labeled addresses from the analytics layer before you even submit a claim. this article is about how the clustering actually works at a technical level, where operators make mistakes that are invisible to them, and what the realistic counter-strategies look like.
background and prior art
the academic foundation goes back to a 2013 paper by Meiklejohn, Pomarole, Jordan, Levchenko, McCoy, Voelker, and Savage titled “A Fistful of Bitcoins: Characterizing Payments Among Men with No Names”, presented at IMC. the core insight was that Bitcoin transactions with multiple inputs almost always imply common control: if a transaction spends outputs from address X and address Y together, both addresses are presumed to be controlled by the same entity, because the spender needs the private key for every input. this is called the common-input-ownership (CIO) heuristic. it is a heuristic rather than a proof (collaborative transactions such as CoinJoin deliberately break it), but it remains the foundation of most commercial clustering tools on UTXO chains.
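to make the heuristic concrete, here is a minimal sketch of CIO clustering with a union-find structure. the input format (a list of input-address lists) and the function names are assumptions for illustration, not the paper's or any vendor's implementation.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Walk to the root, shortening the path as we go.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def cluster_by_common_inputs(transactions):
    """Group addresses that ever appear together as inputs of one transaction.

    `transactions` is a list of input-address lists (hypothetical format).
    """
    uf = UnionFind()
    for inputs in transactions:
        for addr in inputs:
            uf.find(addr)  # register single-input addresses too
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)
    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# X and Y are co-spent, then Y and Z: all three collapse into one cluster.
txs = [["X", "Y"], ["Y", "Z"], ["Q"]]
clusters = cluster_by_common_inputs(txs)
print(clusters)  # [['Q'], ['X', 'Y', 'Z']]
```

the point to notice is transitivity: X and Z never appear in the same transaction, but they end up in the same cluster through Y.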
EVM chains changed the mechanics but not the underlying problem. Ethereum’s account model has no multi-input transactions, so CIO does not apply directly, but equivalent signals are everywhere: shared gas funders, correlated interaction timing, identical contract call sequences, and deposit address reuse at centralized exchanges. Chainalysis, Nansen, Arkham, and Elliptic each built on top of these signals and layered in proprietary entity tagging, off-chain intelligence, and machine learning models that infer clusters probabilistically rather than deterministically. the result is that by 2024 the bar to cluster an EVM wallet farm is dramatically lower than most operators assume.
the core mechanism
let me walk through the actual signals in order of how easy they are to detect.
gas funding topology
this is the single largest mistake i see. you generate 50 wallets. you fund them all for gas from a single intermediate wallet, or worse, directly from a Binance withdrawal. even if you use a “burner” intermediate, that intermediate wallet’s funding pattern is visible on-chain. the resulting graph looks like a star: one hub, 50 spokes. Nansen’s entity graph picks this up automatically. the hub gets labeled. every spoke inherits the cluster.
the correct mental model is that a gas funder is a hard link between every wallet it touches. if wallet G funds wallets 1 through 50 with ETH, those 50 wallets are trivially clustered via G, even if G itself is not labeled. the analyst doesn’t need to know who G is. they just need to see that G is the common ancestor.
        [CEX withdrawal]
               |
           [funder]
          /    |    \
      [w1]   [w2]   [w3] ... [w50]
this pattern is detectable in a single query against an indexed chain database. the fix, partial as it is, is to introduce multiple layers of intermediary with actual delay and volume noise between each hop. but even that doesn’t solve the problem if the behavioral signals downstream are correlated.
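as a rough illustration of how little work that query is, here is a sketch that finds star-shaped funding hubs from a flat list of transfers. the tuple layout, the `min_spokes` threshold, and the address names are assumptions made up for the example; this is not Nansen's actual query.

```python
from collections import defaultdict


def first_funder_clusters(transfers, min_spokes=3):
    """Map each funder to the wallets whose first inbound ETH came from it.

    `transfers` is an iterable of (timestamp, sender, recipient) tuples
    (hypothetical format). Only hubs with at least `min_spokes` wallets
    are returned.
    """
    first_funder = {}
    for _, sender, recipient in sorted(transfers):
        first_funder.setdefault(recipient, sender)  # keep the earliest sender only
    spokes = defaultdict(list)
    for wallet, funder in first_funder.items():
        spokes[funder].append(wallet)
    return {f: sorted(w) for f, w in spokes.items() if len(w) >= min_spokes}


transfers = [
    (100, "cex_hot_wallet", "G"),
    (200, "G", "w1"),
    (210, "G", "w2"),
    (220, "G", "w3"),
    (300, "someone_else", "w1"),  # later top-up, not the first funder
    (400, "friend", "organic"),
]
hubs = first_funder_clusters(transfers)
print(hubs)  # {'G': ['w1', 'w2', 'w3']}
```

note that G never needs a label for this to work: the grouping falls out of the topology alone.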
timing correlation
if your 50 wallets all interact with the same protocol within a 20-minute window, that window is itself a clustering signal. Arkham runs temporal correlation analysis: wallets that repeatedly appear together in the same protocol interaction windows, even if never directly linked, accumulate a soft cluster score. this is probabilistic, not deterministic, but the probability adds up fast if you’re running batch jobs.
the mechanism is similar to traffic analysis in network security. you don’t need to see the content of the packets, just that packets from 50 sources cluster around the same timestamps. automated farming scripts almost always produce these timing signatures because they process wallets sequentially or in small thread pools, generating activity bursts with predictable inter-wallet timing gaps.
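a minimal sketch of that kind of soft score, assuming events arrive as (timestamp, wallet, protocol) tuples and using fixed 20-minute buckets. the bucket size and data shape are illustrative assumptions, and Arkham's real scoring is not public.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_occurrence_scores(events, window_seconds=1200):
    """Count how often each wallet pair hits the same protocol in the same time bucket.

    `events` is an iterable of (timestamp, wallet, protocol) tuples
    (hypothetical format).
    """
    buckets = defaultdict(set)
    for ts, wallet, protocol in events:
        buckets[(protocol, ts // window_seconds)].add(wallet)
    scores = Counter()
    for wallets in buckets.values():
        for pair in combinations(sorted(wallets), 2):
            scores[pair] += 1
    return scores


events = [
    (0, "a", "dex"), (300, "b", "dex"), (50_000, "c", "dex"),
    (86_400, "a", "bridge"), (86_700, "b", "bridge"),
    (172_800, "a", "lend"), (173_000, "b", "lend"), (173_100, "c", "lend"),
]
scores = co_occurrence_scores(events)
print(scores[("a", "b")], scores[("a", "c")])  # 3 1
```

a single shared window means little, since organic users also pile into popular protocols at the same time. the signal is the repeat count: wallets a and b show up together in every window. fixed buckets also miss pairs that straddle a bucket boundary, so a sliding window would be the more careful choice.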
behavioral fingerprinting
this is the one that kills sophisticated operators who think they’ve solved the funding and timing problems. behavioral fingerprinting is about the sequence and selection of on-chain interactions.
consider a hypothetical Layer 2 campaign. a genuine user might bridge ETH to Arbitrum, swap on Uniswap, provide liquidity on Camelot, use GMX once, then bridge back. another genuine user might bridge in, buy an NFT on Treasure, use Radiant, and swap a few times. the distributions of actions across genuine users are high-variance and path-dependent.
an automated farm running the same script produces 50 wallets that bridged on the same day, swapped the same pairs in the same order, with transaction amounts that are multiples of each other. even if the timing is staggered and the gas funders are clean, the action graph is identical across the cluster.
Nansen builds interaction fingerprints per wallet and compares them across their labeled universe. if your 50 wallets all look like each other and unlike the organic user distribution, they cluster. the math here is not exotic: cosine similarity on interaction vectors is enough to surface tight clusters.
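for reference, this is roughly what cosine similarity on interaction vectors looks like. the action names and counts are invented for the example, and a real pipeline would likely use far wider feature vectors.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical per-wallet action counts.
farm_1 = Counter({"bridge": 1, "uniswap_swap": 3, "gmx_open": 1})
farm_2 = Counter({"bridge": 1, "uniswap_swap": 3, "gmx_open": 1})
organic = Counter({"bridge": 2, "treasure_buy": 1, "radiant_deposit": 4})

sim_farm = cosine_similarity(farm_1, farm_2)
sim_organic = cosine_similarity(farm_1, organic)
print(round(sim_farm, 3), round(sim_organic, 3))  # 1.0 0.132
```

two scripted wallets score 1.0 against each other, while the scripted wallet and the organic one barely overlap. staggering timing does nothing to this number, because the vector ignores time entirely.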
CEX deposit address reuse
this one is under-appreciated. many operators eventually move tokens out via a centralized exchange deposit. if wallet A and wallet B both deposit tokens to the same CEX deposit address (even at different times), they are linked. most major exchanges generate a unique deposit address per user, meaning two wallets sending to the same deposit address are almost certainly controlled by the same person.
Nansen’s documentation explicitly describes their entity labeling for exchange deposit addresses. Arkham’s entity clustering on their platform works similarly. both tools have scraped and labeled exchange deposit addresses at scale. this means that even if your wallets are never directly connected on-chain, the moment two of them deposit to your personal Binance address, they’re linked in the analytics layer.
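the linking step itself is a simple inversion of the transfer list, sketched below. the set of labeled deposit addresses is assumed as an input here; building that set is the hard part and is where the vendors' labeling work goes.

```python
from collections import defaultdict


def link_by_deposit_address(transfers, deposit_addresses):
    """Return deposit address -> sorted senders, for addresses with 2+ distinct senders.

    `transfers` is an iterable of (sender, recipient) pairs and
    `deposit_addresses` is a set of addresses already labeled as
    exchange deposit addresses (both hypothetical inputs).
    """
    senders = defaultdict(set)
    for sender, recipient in transfers:
        if recipient in deposit_addresses:
            senders[recipient].add(sender)
    return {d: sorted(s) for d, s in senders.items() if len(s) > 1}


labeled = {"dep_1", "dep_2"}
transfers = [("A", "dep_1"), ("B", "dep_1"), ("C", "dep_2"), ("A", "uniswap_router")]
links = link_by_deposit_address(transfers, labeled)
print(links)  # {'dep_1': ['A', 'B']}
```

there is no timestamp in the input at all, which is why depositing at different times does not help.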
RPC and IP leakage
this is off-chain but matters. if you’re submitting transactions for multiple wallets through the same RPC endpoint (Alchemy, Infura, QuickNode), your IP and request patterns are visible to that provider, which is in a position to log source IPs against the from addresses of the transactions it relays. the safe assumption is that it does. this data doesn’t appear directly in Nansen or Arkham, but it can surface through law enforcement requests, data breaches, or provider partnerships.
the more immediate problem is that some frontends fingerprint browser sessions. a MetaMask session connected to a dApp can reveal that multiple wallets were managed from the same browser. this is not directly Nansen or Arkham territory, but protocol teams doing internal filtering have used session data alongside on-chain analytics.
worked examples
example 1: Hop Protocol sybil filter (2022)
Hop Protocol distributed HOP tokens in June 2022. before the snapshot was finalized, the team hired Nansen to help identify sybil clusters. the filter used was based on: wallets that had bridged fewer than a certain number of times, wallets that were newly created in the weeks before the snapshot, and wallets that shared common funding sources.
a significant number of addresses were identified as sybils and removed. some community members disputed specific exclusions, but the methodology was publicly discussed and the Nansen entity graph was cited as a primary input. operators who had funded their wallets from a single exchange withdrawal and then done minimal on-chain activity beyond the bridge itself were the primary victims. the timing correlation was also a factor: many wallets had been created and used in a cluster around the same dates.
example 2: Optimism sybil reduction across multiple rounds
Optimism’s airdrop rounds, particularly OP1 and OP3, included explicit sybil filtering. the OP team has discussed publicly that they used on-chain graph analysis to identify wallets that shared funding sources or showed correlated interaction patterns with the Optimism bridge and major L2 DeFi protocols.
one pattern surfaced repeatedly in community post-mortems: operators who batch-funded wallets from Binance, had those wallets bridge to Optimism on the same day, interact with Velodrome and Synthetix in sequence, then return to mainnet. the sequence matching across wallets in the same cluster was the primary signal. amounts also stepped by a predictable increment (e.g., 0.1 ETH, 0.2 ETH, 0.3 ETH across the cluster), which is consistent with a script parameterizing position sizes linearly.
example 3: LayerZero sybil hunt (2024)
the LayerZero airdrop in 2024 included a community sybil hunting process that was unprecedented in scale. the project posted a public address list and invited the community to flag sybils in exchange for a share of the reclaimed allocation. the LayerZero team used Nansen entity labels and Arkham entity data as part of their validation layer for submitted reports.
the clustering signals that appeared most frequently in validated sybil reports: shared gas funders (one address funding 10-200 OFT interaction wallets), identical cross-chain messaging sequences (same source chain, destination chain, token, and amount across the cluster), and CEX deposit address convergence (multiple wallets ultimately sending ZRO to the same destination). operators who had used multiaccounting tools without proper gas isolation were heavily represented. for more on the tooling side of this kind of isolation, the team at multiaccountops.com has covered the operational stack in detail.
edge cases and failure modes
failure mode 1: “clean” intermediaries that aren’t
you route your gas funding through three hops before it reaches your farming wallets. the problem is that if all three hops happen within 24 hours with no other activity, the chain of custody is obvious. a real user who happens to receive ETH from a stranger and then passes it on doesn’t do so in a linear chain with no other interactions and no delay. the intermediary structure itself becomes a fingerprint. time delays of days to weeks, combined with the intermediary actually using the ETH for other organic activity, break this pattern more effectively than additional hops.
failure mode 2: amount correlation
operators often fund wallets with the same ETH amount or simple multiples. 50 wallets all funded with 0.15 ETH, or funded with 0.1, 0.2, 0.3 ETH in ascending order, are trivially clustered by amount alone as a secondary signal. organic wallets receive ETH in irregular amounts from irregular sources. the counter-strategy is to vary amounts with genuine noise (not just rounding differently) and to let wallets accumulate ETH from multiple small on-chain interactions over time before using them for campaign activity.
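to show how cheap this check is, here is a sketch that classifies a set of funding amounts as identical, evenly stepped, or irregular. amounts are integers in wei to avoid float rounding. the function name and the exact-match rule are assumptions; a real filter would presumably allow some tolerance.

```python
def amount_pattern(amounts_wei):
    """Classify funding amounts (integers, in wei) as 'identical',
    'arithmetic' (constant step between sorted values), or 'irregular'."""
    values = sorted(amounts_wei)
    if len(values) < 3:
        return "irregular"  # too few points to call anything a pattern
    steps = {b - a for a, b in zip(values, values[1:])}
    if steps == {0}:
        return "identical"
    if len(steps) == 1:
        return "arithmetic"
    return "irregular"


ETH = 10**18
same = [15 * ETH // 100] * 5                                   # 0.15 ETH x 5
ladder = [ETH // 10, 2 * ETH // 10, 3 * ETH // 10, 4 * ETH // 10]  # 0.1 .. 0.4 ETH
noisy = [137_400_000_000_000_000, 92_100_000_000_000_000, 310_000_000_000_000_000]
print(amount_pattern(same), amount_pattern(ladder), amount_pattern(noisy))
# identical arithmetic irregular
```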
failure mode 3: the “fresh wallet” problem
a wallet created two days before an airdrop snapshot and then used only for the qualifying action is a strong sybil signal regardless of funding source. protocol teams use wallet age as a first-pass filter. what’s less appreciated is that wallet age alone isn’t sufficient cover: a two-year-old wallet that has been dormant for 23 months and then springs into activity exactly when a farming opportunity appears reads as a purchased or reactivated wallet, which has its own cluster risk.
genuine age requires genuine history. wallets that have interacted with protocols across multiple market cycles, received gas from diverse sources, and shown organic participation patterns over time are expensive to farm at scale because you’re essentially buying or building history, not just generating addresses. see our wallet rotation strategy deep-dive for how to think about building wallet age programmatically.
failure mode 4: cross-chain tracing
many operators think that bridging from Ethereum to an L2 and using a different address on the L2 breaks the link. it does not, if the bridge transaction maps the L2 address to the L1 funding source. most canonical bridges (Arbitrum’s bridge, Optimism’s bridge) create a direct on-chain link between the L1 deposit address and the L2 recipient address. if your L1 address is linked to a cluster, your L2 address inherits that cluster label.
third-party bridges like Across, Hop, and Stargate have less direct address-to-address mapping in some configurations, but the token amounts and timing still provide correlation signals. the counter-strategy is to use canonical bridges with different L1 source addresses per L2 destination, and to ensure those L1 source addresses are themselves not clustered. this is covered in more depth in our gas wallet management guide.
failure mode 5: contract interaction ordering
this is subtle. if your farming script calls contracts in a fixed order (approve, then swap, then bridge, then claim) with fixed parameters, the transaction sequence is machine-readable as a fingerprint. even without timing correlation, the fact that 50 wallets have identical contract call sequences with identical function selectors in identical order is enough to flag them as a cluster.
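a sketch of what "machine-readable" means here: hash each wallet's ordered list of (contract, selector) pairs and group wallets by hash. `0x095ea7b3` is the standard ERC-20 `approve(address,uint256)` selector; the other selectors and all addresses are placeholders, and exact-match hashing is an assumption chosen for simplicity.

```python
import hashlib
from collections import defaultdict


def sequence_fingerprint(calls):
    """Hash an ordered list of (contract_address, function_selector) pairs."""
    joined = "|".join(f"{addr.lower()}:{selector.lower()}" for addr, selector in calls)
    return hashlib.sha256(joined.encode()).hexdigest()


def group_by_sequence(wallet_calls):
    """Return groups of wallets whose full call sequences are identical."""
    groups = defaultdict(list)
    for wallet, calls in wallet_calls.items():
        groups[sequence_fingerprint(calls)].append(wallet)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


TOKEN, ROUTER, BRIDGE = "0xToken", "0xRouter", "0xBridge"  # placeholder addresses
script = [
    (TOKEN, "0x095ea7b3"),   # ERC-20 approve
    (ROUTER, "0xaaaaaaaa"),  # placeholder swap selector
    (BRIDGE, "0xbbbbbbbb"),  # placeholder bridge selector
]
wallet_calls = {
    "w1": script,
    "w2": script,
    "w3": [(ROUTER, "0xaaaaaaaa"), (TOKEN, "0x095ea7b3")],
}
groups = group_by_sequence(wallet_calls)
print(groups)  # [['w1', 'w2']]
```

exact matching is the crudest version of this. an edit-distance or n-gram comparison over the same sequences would also catch scripts that only shuffle one or two steps.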
the counter-strategy requires varying interaction sequences genuinely, which means either manual intervention or a script that randomizes protocol choice, ordering, timing, and amounts within a realistic distribution. the distribution itself matters: if you randomize amounts uniformly between 0.05 and 0.5 ETH, that uniform distribution is itself a fingerprint relative to the roughly log-normal distribution of organic users. for a deeper look at how antidetect tooling interacts with on-chain fingerprinting, antidetectreview.org has relevant coverage of the browser-layer side of the stack.
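to see why a uniform draw stands out, here is a small sketch that measures the Kolmogorov-Smirnov distance between a set of amounts and a uniform distribution over the same range. the log-normal parameters, the sample size, and the seed are arbitrary assumptions for the demo, not measured values for any real user population.

```python
import random


def ks_vs_uniform(samples):
    """Kolmogorov-Smirnov distance between the samples' empirical CDF and a
    uniform distribution stretched over the samples' own min..max range."""
    xs = sorted(samples)
    lo, hi = xs[0], xs[-1]
    if hi == lo:
        return 1.0  # all values identical: nothing like a uniform spread
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        cdf = (x - lo) / (hi - lo)
        # Compare against the empirical CDF just before and just after the step at x.
        worst = max(worst, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return worst


rng = random.Random(7)  # fixed seed so the demo is repeatable
uniform_amounts = [rng.uniform(0.05, 0.5) for _ in range(500)]
lognormal_amounts = [rng.lognormvariate(-2.0, 1.0) for _ in range(500)]  # assumed "organic" shape

d_uniform = ks_vs_uniform(uniform_amounts)
d_lognormal = ks_vs_uniform(lognormal_amounts)
print(d_uniform < d_lognormal)  # True
```

a small distance means the amounts look like a flat random draw. the skewed sample, with many small values and a long tail of large ones, sits far from uniform, which is the gap an analyst can measure.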
what we learned in production
running wallet farms across the 2021-2024 cycle, the single most expensive lesson was that on-chain hygiene is not additive, it’s multiplicative. fixing the gas funder problem while leaving timing correlation intact reduces your risk only marginally, because analysts use multiple signals in combination. a wallet that fails three soft criteria gets flagged even if no single criterion is a hard disqualifier.
the second lesson is that entity label propagation is persistent and fast. once Nansen or Arkham labels a funding address as a “sybil funder” or associates it with a known farm, every downstream wallet inherits that label. these labels don’t expire quickly, and in some cases they propagate to future airdrops years later. i’ve seen wallet clusters flagged in 2024 based on funding patterns from 2022. the practical implication is that addresses you’ve burned in prior campaigns should be treated as permanently labeled, not reusable with different downstream wallets.
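the inheritance described above is plain graph reachability. a minimal sketch, assuming a list of (funder, recipient) edges and one labeled root; the names are hypothetical and this is not either vendor's propagation logic.

```python
from collections import defaultdict, deque


def propagate_label(funding_edges, labeled_root):
    """Return every address reachable downstream of `labeled_root`.

    `funding_edges` is an iterable of (funder, recipient) pairs
    (hypothetical format).
    """
    children = defaultdict(set)
    for funder, recipient in funding_edges:
        children[funder].add(recipient)
    seen = {labeled_root}
    queue = deque([labeled_root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(labeled_root)
    return sorted(seen)


# The last edge sends funds back to F; the `seen` set keeps the cycle from looping.
edges = [("F", "hop1"), ("hop1", "w1"), ("hop1", "w2"), ("other", "w9"), ("w2", "F")]
tainted = propagate_label(edges, "F")
print(tainted)  # ['hop1', 'w1', 'w2']
```

a production system would have to stop propagation at exchanges, bridges, and popular contracts, otherwise the label spreads to half the chain. the sketch leaves that out.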
the third lesson, which is uncomfortable to admit, is that the analytical tools have outpaced most farming operations. Nansen’s pricing starts around $150/month for their pro tier and gives sophisticated users access to entity graphs that would have cost hundreds of thousands of dollars in custom analytics work five years ago. Arkham’s intelligence marketplace means that labeled address data is crowd-sourced and continuously updated. protocol teams that want to filter sybils don’t need to build custom tooling anymore. they pay for a Nansen subscription, export entity clusters, and cross-reference against their snapshot. the cost to the protocol of filtering a farm is lower than the cost to the operator of maintaining a clean one. that asymmetry is the fundamental problem. for more on how the sybil landscape has evolved specifically around the LayerZero campaign, see our LayerZero sybil filter analysis.
for newer operators, the realistic path is not to try to beat the analytics tools directly. it’s to run fewer wallets with higher organic activity per wallet, ensure each wallet has genuine protocol history across multiple projects and time periods, and be selective about which campaigns you participate in relative to the sophistication of their sybil filtering. campaigns with large treasuries and public commitments to sybil filtering will invest in Nansen and Arkham. campaigns with smaller teams and tighter timelines may rely on simpler heuristics that are easier to avoid.
the analytics layer is not perfect. probabilistic clustering produces false positives. appeal processes exist at some protocols. but the direction of travel is toward more sophisticated filtering, not less, and operators who plan around the tools as they were in 2021 will continue to lose allocations to filters that have been updated since then. understanding the mechanism is the minimum required to make informed decisions about risk.
references and further reading
- Meiklejohn et al., “A Fistful of Bitcoins: Characterizing Payments Among Men with No Names” (IMC 2013), the foundational academic paper on Bitcoin address clustering heuristics.
- Nansen Documentation, covering their entity labeling methodology, smart money classification, and wallet profiling tools.
- Arkham Intelligence, their entity clustering platform and intel marketplace where labeled address data is crowd-sourced and traded.
- Chainalysis Blog, covering address clustering methodology, exchange deposit address heuristics, and cross-chain tracing techniques across multiple detailed posts.
- proxyscraping.org blog, covering RPC rotation and IP hygiene as it relates to on-chain transaction submission, relevant for the network-layer side of wallet isolation.
Written by Xavier Fok
last reviewed by Xavier Fok on 2026-05-19.