Behavioural fingerprints chain analysis uses to cluster wallets

There is a common misunderstanding in the farming community that wallets are anonymous by default. They are pseudonymous at best, and the gap between those two words matters enormously when a project’s anti-sybil team has access to Chainalysis Reactor, TRM Labs, or even just a decent graph database and a free afternoon. Chain analysis has evolved far beyond simple address reuse. The clustering methods used today are built on behavioural fingerprints, meaning patterns in how wallets act, when they act, and who they act with. Those patterns are often more durable than any attempt to separate wallets by using different IP addresses or fresh seed phrases.

The stakes here are concrete. Projects that ran sybil detection post-snapshot in 2023 and 2024, including Arbitrum, zkSync, and LayerZero, disqualified hundreds of thousands of addresses. The LayerZero sybil bounty campaign that ran through May and June 2024 ended with roughly 800,000 addresses accepted as sybils across community submissions and internal analysis. That is not a rounding error. It represents real money left on the table by operators who understood wallet separation at a surface level but did not understand how deeply on-chain behaviour leaks identity. This article is about those leaks: what they are, how they are detected, and where the genuine edge cases and failure modes sit.

I am going to be direct about something before we go further. Understanding these fingerprinting techniques is not about circumventing legitimate compliance processes. It is about building a realistic mental model of what chain analysis can and cannot see, which is useful both for operators structuring legitimate multi-wallet setups and for anyone who wants to understand how the underlying detection machinery works. The analysis methods described here are published in academic literature and vendor documentation. Nothing here is secret.

background and prior art

The foundational paper for address clustering on Bitcoin is Meiklejohn et al., “A Fistful of Bitcoins” (IMC 2013). The core insight they formalized is the common input ownership heuristic: if two addresses appear together as inputs to the same transaction, whoever built that transaction had to produce a valid signature for each input, so the addresses are presumed to be controlled by the same wallet or entity. The heuristic is an inference rather than a proof, since collaborative transactions such as CoinJoin deliberately combine inputs from unrelated users. It is nonetheless simple, powerful, and still in use today as a baseline layer in every serious chain analysis platform.

The EVM ecosystem complicated things in interesting ways. Ethereum uses an account model rather than UTXOs, so a transaction has a single sender and the common input heuristic does not apply directly. But the EVM introduced new behavioural surfaces: gas funding patterns, smart contract interaction sequences, token approval patterns, NFT minting timing, bridge usage, and cross-chain behaviour. Academic and industry work from 2020 to 2024 built up a substantial body of clustering methods adapted to this environment. Firms like Chainalysis, TRM Labs, Elliptic, and Nansen operationalised these methods at scale, and projects began licensing them for airdrop eligibility reviews. The FATF guidance on virtual assets also pushed exchanges, and increasingly token issuers, toward stricter identity clustering as a compliance norm, which created commercial demand for these tools at exactly the moment airdrop farming became a full-time occupation for a non-trivial number of people.

the core mechanism

Behavioural fingerprinting for wallet clustering works by extracting features from on-chain activity and then running graph analysis, statistical similarity, or machine learning across those features to group addresses that behave identically or near-identically. Here is how the main signal types work in practice.

gas funding topology. This is the single most reliable clustering signal on EVM chains. When you fund twenty wallets from a single source address, that source becomes a hub node in a directed funding graph. Even if you use an intermediate exchange withdrawal address, clustering algorithms look for shared funding ancestry within N hops. Platforms like Nansen have public “smart money” and “fund flow” graph visualisations that already expose this. The problem is not just direct funding. It is the tree shape. If wallet A funded wallets B, C, and D, and those in turn each funded three more wallets, the shape of that tree is distinctive and can be matched against other similar tree structures to identify operators running the same playbook.
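The hub idea can be sketched as a breadth-first walk over a directed funding graph. This is a minimal illustration, not any vendor's method: the function names, the `(sender, receiver)` transfer format, and the `max_hops` and `min_descendants` parameters are all assumptions made for the example.

```python
from collections import defaultdict, deque

def build_funding_graph(transfers):
    """Map each funder to the set of wallets it sent gas to."""
    funded_by = defaultdict(set)
    for sender, receiver in transfers:
        funded_by[sender].add(receiver)
    return funded_by

def descendants_within(graph, source, max_hops):
    """Wallets reachable from source in at most max_hops funding transfers."""
    seen = {source}
    reached = set()
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                reached.add(child)
                queue.append((child, depth + 1))
    return reached

def find_hubs(transfers, max_hops=2, min_descendants=5):
    """Return {hub: downstream wallets} for sources that funded many wallets."""
    graph = build_funding_graph(transfers)
    hubs = {}
    for source in list(graph):
        reached = descendants_within(graph, source, max_hops)
        if len(reached) >= min_descendants:
            hubs[source] = reached
    return hubs

# Toy data (hypothetical): A funds B, C, D; each of those funds two more wallets.
transfers = [("A", "B"), ("A", "C"), ("A", "D"),
             ("B", "B1"), ("B", "B2"), ("C", "C1"), ("C", "C2"),
             ("D", "D1"), ("D", "D2")]
hubs = find_hubs(transfers, max_hops=2, min_descendants=5)
```

On the toy data only `A` qualifies as a hub, because it reaches nine wallets within two hops while `B`, `C`, and `D` reach two each. Matching tree shapes across operators would need an additional comparison step that this sketch does not attempt.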

timing correlation. Wallets that consistently act within a short time window of each other, across multiple transactions, across multiple protocols, create a strong timing signal. This is not just same-block or same-minute activity. Chainalysis and TRM both look at session-level timing, meaning a cluster of wallets that all become active within minutes of each other and go quiet at the same time, day after day, looks like an automated operation running on a schedule. Human wallets are temporally noisy. Automated wallets are not.
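One way to make session-level timing concrete is to split each wallet's transactions into sessions and measure how often two wallets start sessions close together. The sketch below is an assumption-heavy illustration: the one-hour session gap, the fifteen-minute matching window, and the function names are all chosen for the example and do not describe how Chainalysis or TRM compute this.

```python
def session_starts(timestamps, gap_seconds=3600):
    """Split unix timestamps into sessions; return the start time of each session."""
    starts = []
    previous = None
    for ts in sorted(timestamps):
        # A gap longer than gap_seconds begins a new session.
        if previous is None or ts - previous > gap_seconds:
            starts.append(ts)
        previous = ts
    return starts

def session_overlap(ts_a, ts_b, window_seconds=900, gap_seconds=3600):
    """Share of sessions, across both wallets, that start within
    window_seconds of a session start in the other wallet."""
    starts_a = session_starts(ts_a, gap_seconds)
    starts_b = session_starts(ts_b, gap_seconds)
    if not starts_a or not starts_b:
        return 0.0
    matched_a = sum(any(abs(a - b) <= window_seconds for b in starts_b) for a in starts_a)
    matched_b = sum(any(abs(a - b) <= window_seconds for a in starts_a) for b in starts_b)
    return (matched_a + matched_b) / (len(starts_a) + len(starts_b))

DAY = 86400
# Hypothetical wallets: a and b wake up together every day, c does not.
wallet_a = [0, 120, DAY, DAY + 300, 2 * DAY]
wallet_b = [200, 500, DAY + 400, 2 * DAY + 600]
wallet_c = [40000, DAY + 70000]

overlap_ab = session_overlap(wallet_a, wallet_b)
overlap_ac = session_overlap(wallet_a, wallet_c)
```

A score near 1.0 sustained over many days is the pattern described above; a single shared session proves little, so a real pipeline would presumably also require a minimum number of sessions before scoring a pair.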

interaction sequence similarity. If twenty wallets all interact with the same set of contracts in the same order over the same week, the probability that this is coincidental is extremely low. This is analogous to browser fingerprinting via installed plugins: any single signal is weak, but the combination of contract A, then contract B, then contract C, then bridge to chain X, then contract D creates a sequence fingerprint. Projects running retroactive analysis can compute the edit distance between interaction sequences across all eligible wallets and flag those with near-zero edit distance as likely sybils.
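The edit distance comparison can be sketched with a standard Levenshtein computation over lists of contract labels. The wallet names, contract labels, and the 0.25 threshold below are illustrative assumptions, and the distance is normalised by the longer sequence so that wallets with different activity levels can be compared.

```python
from itertools import combinations

def edit_distance(seq_a, seq_b):
    """Levenshtein distance between two sequences of contract labels."""
    previous = list(range(len(seq_b) + 1))
    for i, item_a in enumerate(seq_a, start=1):
        current = [i]
        for j, item_b in enumerate(seq_b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def normalised_distance(seq_a, seq_b):
    """Edit distance scaled to [0, 1] by the longer sequence length."""
    longest = max(len(seq_a), len(seq_b))
    return edit_distance(seq_a, seq_b) / longest if longest else 0.0

def flag_similar(sequences, max_distance=0.25):
    """Return wallet pairs whose interaction sequences are near-identical."""
    flagged = []
    for a, b in combinations(sorted(sequences), 2):
        if normalised_distance(sequences[a], sequences[b]) <= max_distance:
            flagged.append((a, b))
    return flagged

# Hypothetical interaction histories.
sequences = {
    "w1": ["uniswap", "aave", "stargate", "uniswap"],
    "w2": ["uniswap", "aave", "stargate", "curve"],
    "w3": ["gmx", "curve"],
}
flagged = flag_similar(sequences)
```

Here `w1` and `w2` differ by one substitution in four steps and are flagged, while `w3` is not. The pairwise loop is quadratic in the number of wallets, which is one reason this kind of check may be reserved for a pre-filtered candidate set.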

token and NFT distribution. When a single wallet distributes tokens or NFTs to many other wallets, those receiving wallets are immediately suspicious if they then all perform the same subsequent action. The funding graph and the distribution graph are both analysed. If wallet A mints an NFT and sends it to wallets B through Z, and all of B through Z then bridge to the same chain the next day, that is a behavioural cluster.
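The distribution pattern above can be checked with a simple count: given the wallets that received from one distributor, how many took the same first action afterwards? The sketch assumes a hypothetical `first_action_after` mapping from wallet to an `(action, target, days_after_receipt)` tuple, and an arbitrary 80% threshold.

```python
from collections import Counter

def shared_followup(recipients, first_action_after, min_share=0.8):
    """Return (action, wallets) if at least min_share of the recipients
    took the same first action after the distribution, else None."""
    actions = {w: first_action_after[w] for w in recipients if w in first_action_after}
    if not actions:
        return None
    action, count = Counter(actions.values()).most_common(1)[0]
    if count / len(recipients) < min_share:
        return None
    return action, sorted(w for w, a in actions.items() if a == action)

# Hypothetical data: wallet A sent an NFT to B..F.
recipients = ["B", "C", "D", "E", "F"]
first_action_after = {
    "B": ("bridge", "chain_x", 1),
    "C": ("bridge", "chain_x", 1),
    "D": ("bridge", "chain_x", 1),
    "E": ("bridge", "chain_x", 1),
    "F": ("swap", "dex_y", 4),
}
result = shared_followup(recipients, first_action_after)
```

Four of the five recipients bridged to the same chain the next day, so they are returned as a candidate cluster. Popular legitimate drops also produce correlated follow-up behaviour, so this is better treated as one input among several than as a verdict.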

gas price and nonce patterns. Automated wallets often use identical gas settings because they are run from the same script with the same configuration. Nonce sequences can also leak information about how wallets are managed. If two wallets always use exactly the same base fee multiplier and the same priority fee, and this pattern holds across dozens of transactions, it becomes a fingerprint for the underlying tooling.
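A tooling fingerprint of this kind can be approximated by checking whether a wallet uses one gas configuration for nearly all of its transactions, then grouping wallets that share that configuration. The field names `priority_fee_gwei` and `max_fee_multiplier`, and both thresholds, are assumptions for this sketch.

```python
from collections import Counter, defaultdict

def dominant_gas_setting(txs, min_share=0.9):
    """Return the (priority fee, fee multiplier) pair used in at least
    min_share of a wallet's transactions, or None if there is none."""
    if not txs:
        return None
    counts = Counter((tx["priority_fee_gwei"], tx["max_fee_multiplier"]) for tx in txs)
    setting, count = counts.most_common(1)[0]
    return setting if count / len(txs) >= min_share else None

def group_by_gas_setting(wallet_txs, min_share=0.9, min_txs=10):
    """Group wallets that share a dominant gas setting across enough transactions."""
    groups = defaultdict(list)
    for wallet, txs in wallet_txs.items():
        if len(txs) < min_txs:
            continue  # too little history to call it a pattern
        setting = dominant_gas_setting(txs, min_share)
        if setting is not None:
            groups[setting].append(wallet)
    return {s: sorted(w) for s, w in groups.items() if len(w) > 1}

# Hypothetical data: w1 and w2 run the same script, w3 varies its settings.
script_tx = {"priority_fee_gwei": 0.01, "max_fee_multiplier": 1.25}
wallet_txs = {
    "w1": [script_tx] * 12,
    "w2": [script_tx] * 15,
    "w3": [{"priority_fee_gwei": i % 4 + 1, "max_fee_multiplier": 1.5} for i in range(12)],
}
groups = group_by_gas_setting(wallet_txs)
```

The default settings of a popular wallet application would also produce a large group here, so on its own this is a weak signal that only becomes meaningful alongside funding or timing evidence.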

contract creation and bytecode. If an operator deploys smart contracts as part of their farming setup, those contracts may share bytecode or deployment patterns. Chain analysis tools can identify wallets that deployed contracts with identical or near-identical bytecode and cluster them as controlled by the same operator.
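For the exact-match case, grouping deployers by a hash of the deployed bytecode is enough. This sketch uses SHA-256 from the standard library purely as a grouping key (on-chain tooling would more likely use keccak256), and it does not handle near-identical bytecode, which would need some normalisation step first. The deployer addresses and bytecode strings are made up.

```python
import hashlib
from collections import defaultdict

def bytecode_fingerprint(bytecode_hex):
    """Stable fingerprint of a contract's bytecode given as a hex string."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    # bytes.fromhex accepts upper and lower case, so casing does not matter.
    return hashlib.sha256(bytes.fromhex(bytecode_hex)).hexdigest()

def group_deployers(deployments):
    """Return groups of deployers that deployed byte-identical contracts."""
    by_fingerprint = defaultdict(set)
    for deployer, bytecode_hex in deployments:
        by_fingerprint[bytecode_fingerprint(bytecode_hex)].add(deployer)
    return sorted(sorted(d) for d in by_fingerprint.values() if len(d) > 1)

# Hypothetical deployments: the first two differ only in prefix and casing.
deployments = [
    ("0xdeployer1", "0x6080604052600a"),
    ("0xdeployer2", "6080604052600A"),
    ("0xdeployer3", "0x60806040526014"),
]
clusters = group_deployers(deployments)
```

Widely copied templates and factory-deployed contracts share bytecode across unrelated users, so a match is more informative for bespoke helper contracts than for standard ones.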

Here is a simplified sketch of how timing and sequence features might be extracted for a single wallet before wallets are compared with each other:

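# sketch: feature extraction for a wallet
# the get_* helpers are placeholders for an indexer or node client, not a real library
import hashlib
from statistics import mean, pstdev

def extract_features(wallet_address, chain="arbitrum"):
    txs = get_transactions(wallet_address, chain)
    gas_prices = [tx.gas_price for tx in txs]
    # funding ancestry up to 3 hops, serialised to a string before hashing
    funding_path = get_funding_source(wallet_address, hops=3)
    features = {
        "active_hours": get_active_hour_distribution(txs),
        "contract_sequence": get_contract_interaction_sequence(txs),
        "gas_price_mean": mean(gas_prices) if gas_prices else 0.0,
        "gas_price_std": pstdev(gas_prices) if gas_prices else 0.0,
        # hashlib rather than built-in hash(), which is salted per process for strings
        "funding_source_hash": hashlib.sha256(str(funding_path).encode()).hexdigest(),
        "bridge_sequence": get_bridge_usage_sequence(txs),
        "session_gaps": get_inter_session_gap_distribution(txs),
    }
    return features

# clustering: pairwise cosine similarity over the numeric features of all
# eligible wallets (sequence features need a sequence distance instead)
# wallets with similarity > threshold flagged as same-entity cluster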

The actual implementations at Chainalysis and TRM are considerably more sophisticated, using graph neural networks and probabilistic entity resolution, but the conceptual pipeline is the same: extract features, compute similarity, cluster.
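The "compute similarity, cluster" step can be sketched with cosine similarity over numeric feature vectors and a union-find to merge wallets that exceed a threshold. This is a toy version of the conceptual pipeline, not the graph neural network approach mentioned above; the four-dimensional vectors and the 0.99 threshold are assumptions for illustration.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def cluster_wallets(vectors, threshold=0.99):
    """Merge wallets whose pairwise similarity meets the threshold."""
    parent = {w: w for w in vectors}

    def find(w):
        # Follow parent links to the cluster root, compressing the path.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(vectors, 2):
        if cosine_similarity(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for w in vectors:
        clusters.setdefault(find(w), []).append(w)
    return [sorted(members) for members in clusters.values() if len(members) > 1]

# Hypothetical activity histograms (e.g. transaction counts per time bucket).
vectors = {
    "w1": [10, 0, 0, 1],
    "w2": [9, 0, 0, 1],
    "w3": [10, 0, 1, 1],
    "w4": [0, 5, 5, 0],
    "w5": [1, 0, 7, 0],
}
clusters = cluster_wallets(vectors)
```

The three wallets with nearly identical histograms end up in one cluster and the other two stay unclustered. Comparing every pair is quadratic in the number of wallets, so at the scale of millions of addresses some form of pre-filtering or approximate nearest-neighbour search would be needed.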

worked examples

the LayerZero sybil hunt, 2024. LayerZero ran a community sybil bounty from May to June 2024, with submitters earning 10% of the tokens that would have gone to addresses they successfully flagged. The primary detection method used by top bounty hunters was funding graph analysis combined with interaction sequence matching. One publicly discussed approach involved building a graph of all addresses that had used LayerZero, then identifying all gas funding sources, then walking up the funding tree to find hub addresses that had funded many downstream wallets. Wallets that shared a funding hub within two hops and had interaction sequence similarity above a threshold were submitted as clusters. The final count of self-reported and externally flagged sybil addresses was approximately 2.05 million, with around 800,000 ultimately accepted as valid sybil clusters by the LayerZero team. The bounty mechanism itself was controversial, but the underlying technical detection was sound.

zkSync airdrop, June 2024. The zkSync Era airdrop in June 2024 distributed ZK tokens after a sybil filtering pass that was applied before distribution. The filtering criteria published by Matter Labs included explicit flags for wallets that received their initial ETH from a common source, wallets with identical or near-identical transaction count and volume patterns, and wallets that used the same bridges in the same order. One public postmortem from a farming group that got largely wiped out described a cluster of 340 wallets that shared a common Binance withdrawal address as funding source within three hops. All 340 were excluded. The lesson is that exchange withdrawals are not a safe harbour: they are tracked, and the hop distance from a shared source is measured.

Arbitrum airdrop, March 2023. The Arbitrum airdrop used a points-based eligibility system and ran sybil filtering that included a check for wallets with identical transaction timing patterns. Several farming groups reported that their wallets, which had been run through an automated script that executed transactions at fixed intervals (e.g., every Tuesday and Thursday between 09:00 and 10:00 UTC), were flagged because the timing signature was machine-readable. The fix in subsequent campaigns was to introduce randomised delays, but by then the Arbitrum snapshot had already been taken. The estimated number of addresses excluded for sybil-adjacent patterns was not formally published, but community estimates at the time put it in the range of tens of thousands of addresses.

edge cases and failure modes

hop distance is not a fixed standard. Different projects and different chain analysis tools use different hop depths when tracing funding ancestry. Some use two hops, some use five. An operator who tested their separation against a two-hop analysis and concluded they were clean may be visible under a five-hop analysis. There is no public documentation for what hop depth specific projects use, which means assuming you are clean after a surface-level check is risky. The TRM Labs risk intelligence documentation discusses indirect exposure concepts in their financial crime context, and the same logic applies here.
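The sensitivity to hop depth is easy to demonstrate. The sketch below assumes a simplified model in which each wallet has a single first funder, stored in a hypothetical `funder_of` mapping, and asks whether two wallets share an ancestor within a given number of hops.

```python
def ancestors(funder_of, wallet, max_hops):
    """Walk up the funding chain; return {address: hops from wallet},
    including the wallet itself at zero hops."""
    found = {wallet: 0}
    current = wallet
    for hops in range(1, max_hops + 1):
        current = funder_of.get(current)
        if current is None or current in found:
            break  # reached the end of known history, or a cycle
        found[current] = hops
    return found

def shares_ancestor(funder_of, wallet_a, wallet_b, max_hops):
    """True if the two wallets have a common funder within max_hops."""
    anc_a = ancestors(funder_of, wallet_a, max_hops)
    anc_b = ancestors(funder_of, wallet_b, max_hops)
    return bool(anc_a.keys() & anc_b.keys())

# Hypothetical setup: one hub funds two separate three-hop chains.
funder_of = {
    "w1": "m1", "m1": "n1", "n1": "hub",
    "w2": "m2", "m2": "n2", "n2": "hub",
}
linked_at_2 = shares_ancestor(funder_of, "w1", "w2", max_hops=2)
linked_at_5 = shares_ancestor(funder_of, "w1", "w2", max_hops=5)
```

The same pair of wallets looks unrelated at two hops and linked at five, which is the gap described above. Real funding histories have multiple inbound transfers per wallet, so a production version would walk a graph rather than a single chain.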

timing randomisation helps but is not sufficient. Adding random delays to automated scripts reduces timing correlation but does not eliminate it. If your twenty wallets all start their weekly activity within a thirty-minute window despite random per-transaction delays, the session-level timing is still clustered. The randomisation needs to operate at the session level, not just the transaction level. Wallets that are genuinely independent humans do not all wake up in the same thirty-minute window.

the interaction sequence fingerprint survives gas price randomisation. A common operator response to fingerprinting concerns is to randomise gas prices. This addresses the gas price signal but does nothing for interaction sequence similarity. If your wallets all do Uniswap swap, then Aave deposit, then Stargate bridge, then repeat, the sequence fingerprint is intact regardless of what gas price multiplier you used.

shared infrastructure leaks. If you run twenty wallets through the same RPC provider (e.g., a custom Alchemy endpoint), the same proxy, or the same VPN exit node, that is an off-chain signal that chain analysis firms can access when they work with partners who hold that data. On-chain clustering is the primary method, but projects with compliance partnerships can supplement it with IP and infrastructure data. The compliance backdrop is the FATF guidance on virtual asset service provider obligations, including Recommendation 16 (the travel rule), which governs the originator and beneficiary information that providers must collect and pass along with transfers. The practical implication is that on-chain separation alone is not a complete operational model if you are also leaving shared infrastructure traces.

counter-strategy: organic-looking behavioural diversity. The most effective counter to behavioural fingerprinting is genuine behavioural diversity, meaning wallets that interact with different protocols in different orders at genuinely different times with different gas configurations and funded through different paths. This is expensive in time and money to achieve at scale, and most operators cannot sustain it across hundreds of wallets. The realistic position is that tighter separation (fewer wallets per funding source, longer time horizons, more diverse activity) reduces clustering risk without eliminating it. There is no method that makes wallets with shared economic origin completely invisible to a determined analyst with graph access. See also the discussion of anti-detect browser limitations at antidetectreview.org/blog/ for the analogous off-chain side of this problem.

what we learned in production

Running wallet operations across multiple campaigns, the clearest lesson is that the clustering risk is not uniform across projects. Projects with large airdrop budgets and reputational stakes in appearing to have broad genuine user bases (zkSync, LayerZero, Arbitrum, Starknet) have real incentive to run serious sybil detection and have either built or licensed tooling to do it. Projects with smaller budgets running through a simpler snapshot process may not be doing graph analysis at all and may rely only on basic filters like minimum transaction count or minimum volume. Calibrating your operational model to the specific project’s detection capability matters more than building a universal maximum-separation setup for everything.

The second lesson is about time horizons. Most of the flagged wallets in the campaigns I have reviewed were flagged on funding ancestry and timing, not on interaction sequence. This suggests that the interaction sequence clustering, while technically sound, is computationally expensive to run at the scale of millions of wallets and may only be applied to wallets that already look suspicious on simpler metrics. Wallets that pass the basic funding and timing checks may not receive the deeper sequence analysis. That is not a guarantee, but it is a practical observation about where detection effort gets concentrated. For more on how multi-wallet operational strategy intersects with these detection concerns, the multiaccountops.com/blog/ community has published several practical frameworks worth reviewing.

The third lesson is that detection is improving faster than the evasion toolkit. Graph neural network approaches to wallet clustering, which appear to be in pilot at Chainalysis and TRM at least, judging by their published research, hiring, and product announcements, are substantially harder to evade than heuristic clustering because they learn from labelled examples of known sybil clusters and generalise to previously unseen patterns. An operator who built a playbook based on evading 2022-era heuristics may not be aware that the underlying detection model has been retrained on examples that include that exact playbook. The pace of improvement in detection tooling is one of the better arguments for keeping farming operations small and genuinely diverse rather than scaling up a templated approach.

For operators who want to go deeper on proxy and infrastructure separation in conjunction with on-chain strategy, proxyscraping.org/blog/ covers the residential proxy and datacenter proxy landscape in useful operational detail.

references and further reading

  1. Meiklejohn et al., “A Fistful of Bitcoins: Characterizing Payments Among Men with No Names,” IMC 2013. The foundational academic paper on address clustering heuristics. Still required reading.

  2. FATF, “Updated Guidance for a Risk-Based Approach to Virtual Assets and Virtual Asset Service Providers,” 2021. Sets the compliance context that drives commercial demand for clustering tools.

  3. Chainalysis, “The Chainalysis 2024 Crypto Crime Report”. The annual Chainalysis crime report (available on their blog) describes entity clustering methodology at a high level and is one of the few public windows into how commercial tools approach attribution.

  4. TRM Labs, “Understanding Blockchain Intelligence”. TRM’s resources section includes white papers on indirect exposure and entity resolution that are directly relevant to understanding multi-hop funding analysis.

  5. Buterin, “Proof of Humanity and Sybil Resistance,” ethereum.org community research. Background on why sybil resistance is a fundamental problem in decentralised systems and why on-chain behavioural analysis is the primary practical tool for it.

Written by Xavier Fok

disclosure: this article may contain affiliate links. if you buy through them we may earn a commission at no extra cost to you. verdicts are independent of payouts. last reviewed by Xavier Fok on 2026-05-19.
