Centralized Indexers: Blockchain’s Biggest Attack Surface

[2025-12-31] | Shinzō Team

You've audited your smart contracts, maybe twice. You've run fuzzing campaigns, tested edge cases, argued with your team about reentrancy guards that are probably overkill. Your frontend has been stress-tested. Your RPC failover logic has been reviewed. You've thought carefully about key management, wallet integration, transaction simulation.

Then you needed blockchain data. Historical transactions, token balances, event logs across thousands of blocks. The kind of queries that would take minutes against a raw node, if your node could even handle them.

So you did what everyone does. You signed up for an indexer, copied an API key from a dashboard, and moved on. Not because you didn't care about the infrastructure. Because there wasn't another option that shipped.

That's not a failure of diligence. That's a failure of the ecosystem.

The blockchain industry spent a decade obsessing over consensus mechanisms, validator economics, and on-chain governance. We built elaborate systems to ensure no single party can manipulate transaction history or censor writes. Then we handed the read layer to a handful of centralized companies and called it a day.

This is how that happened, and why it's now the single biggest vulnerability in your stack.

How We Got Here

Blockchains aren't databases. They're consensus machines. Every piece of data lives in a structure optimized for verification, not retrieval.

Your token balance isn't sitting in a table waiting to be queried. It's the net of every transfer since genesis, reconstructed on demand from event logs. Historical DEX volume? Millions of events across thousands of blocks. Running these queries against a raw node is computationally brutal, painfully slow, and operationally expensive. Most teams can't justify the infrastructure. Most users won't tolerate the latency.
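
To make that concrete, here is a rough sketch of reconstructing one wallet's ERC-20 balance against a plain JSON-RPC node: two log scans over the whole chain history, chunked because most nodes cap how many blocks a single eth_getLogs call may cover. The endpoint, addresses, and chunk size are placeholders; only the Transfer event signature hash is standard.

```python
import requests

RPC_URL = "http://localhost:8545"   # your own node; placeholder
TOKEN   = "0xTokenContractAddress"  # placeholder
WALLET  = "0xWalletAddress"         # placeholder
# keccak256("Transfer(address,address,uint256)") -- the standard ERC-20 Transfer topic
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"
CHUNK = 5_000                       # many nodes cap the block range of one eth_getLogs call

def rpc(method, params):
    resp = requests.post(RPC_URL, json={"jsonrpc": "2.0", "id": 1,
                                        "method": method, "params": params})
    resp.raise_for_status()
    return resp.json()["result"]

def as_topic(address):
    # Indexed address arguments appear as 32-byte, left-padded topics.
    return "0x" + address.lower().removeprefix("0x").rjust(64, "0")

def sum_transfers(topic_index, latest):
    """Sum Transfer values where WALLET appears at the given topic position."""
    total = 0
    for start in range(0, latest + 1, CHUNK):
        end = min(start + CHUNK - 1, latest)
        topics = [TRANSFER_TOPIC, None, None]
        topics[topic_index] = as_topic(WALLET)
        logs = rpc("eth_getLogs", [{"fromBlock": hex(start), "toBlock": hex(end),
                                    "address": TOKEN, "topics": topics}])
        total += sum(int(log["data"], 16) for log in logs)
    return total

latest = int(rpc("eth_blockNumber", []), 16)
received = sum_transfers(2, latest)  # topics[2] = indexed `to`
sent     = sum_transfers(1, latest)  # topics[1] = indexed `from`
print("balance:", received - sent)   # ignores non-standard tokens (rebasing, fee-on-transfer)
```

Against mainnet, that loop means thousands of sequential requests to answer one question about one wallet.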

Indexers filled the gap. Infura and Alchemy became load-bearing infrastructure for Ethereum. Covalent, QuickNode, and Chainstack followed. They run full nodes, process every block, store the results in queryable databases, and expose APIs that turn complex on-chain archaeology into simple REST calls.

The pitch worked because the pain was real. Why spend months building custom data pipelines when you can plug into an API and ship next week?

But that convenience came with a cost nobody put on the invoice.

The Trust Problem

When your application queries an indexer, you're trusting that they've correctly processed every relevant transaction. That they haven't omitted data through bugs or negligence or something worse. That their infrastructure hasn't been compromised. That they'll still be operating next year, with the same API, at a price you can afford.

None of this is verified cryptographically. You can't prove any of it. You're just trusting.

The industry that built trustless money now trusts Infura to tell it what happened.

Defenders point to reputation and economic incentives. But reputation is not cryptography. Incentives are not proofs. We don't accept "they seem trustworthy" as a security model for consensus. Why do we accept it for data access?

Why You Can't Verify

The trust problem isn't just about provider behavior. It's architectural.

Indexers store blockchain data in conventional databases: relational stores, document databases, proprietary data systems built for centralized applications with controlled access. These systems assume a trusted environment. They provide no mechanism for cryptographic verification of data integrity. They can store data. They cannot prove that data is accurate.

Blockchain data is different. Every transaction is signed. Every block is hashed. State roots enable Merkle proofs. The entire architecture exists so you can verify without trusting.
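
The verification step itself is cheap. As a simplified illustration (a binary Merkle tree hashed with sha256 rather than Ethereum's keccak-256 Merkle-Patricia trie, which eth_getProof exposes), checking that a record belongs under a trusted root takes a few hashes:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_proof(leaf: bytes, proof: list, root: bytes) -> bool:
    """Walk from the leaf to the root, hashing in each sibling on its stated side."""
    node = h(leaf)
    for sibling, side in proof:               # side is "left" or "right"
        node = h(sibling + node) if side == "left" else h(node + sibling)
    return node == root

# Tiny 4-leaf tree: the root is the only thing the client needs to trust.
leaves = [b"balance:alice=10", b"balance:bob=25", b"balance:carol=7", b"balance:dave=0"]
l0, l1, l2, l3 = (h(x) for x in leaves)
root = h(h(l0 + l1) + h(l2 + l3))

# Prove bob's entry without trusting whoever served it.
proof = [(l0, "left"), (h(l2 + l3), "right")]
assert verify_proof(b"balance:bob=25", proof, root)
```

All a client needs is a root from a block header it already trusts; the data and the proof can come from anyone.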

Indexers take this verifiable data, strip the verification, and store it in systems that can't prove anything about their own contents. The response you get from Alchemy's API could be accurate or fabricated. The bytes look identical either way. The cryptographic chain of trust ends at their infrastructure.

This isn't about whether providers are honest. Their systems cannot offer proof even if they wanted to. You can't verify because the infrastructure wasn't built for verification. The best you can do is trust and hope.

When It Breaks

Infura outages have repeatedly taken down major portions of the Ethereum ecosystem. MetaMask goes dark. dApps show stale data or fail to load entirely. Users can't access funds, not because the blockchain is down, but because the API sitting in front of it is unreachable.

Applications that marketed themselves as decentralized discovered they had a single point of failure they'd never seriously examined. The blockchain kept producing blocks. The validators kept validating. And users sat there staring at loading spinners because one company in one data center had a bad day.

This is what "decentralized" looks like in practice: decentralized writes, centralized reads.

Availability is only the obvious problem. Any infrastructure that can go down can also be compelled to deny service. Any API that returns accurate data can also return manipulated data. The attack surface exists whether or not anyone has exploited it yet.

The Operational Grind

Beyond the architectural issues, there's the daily reality of building on this stack.

Rate limits that throttle your application right when traffic spikes. API deprecations that break production with 30 days' notice. Response format inconsistencies across providers that force you to write adapter layers for what should be standardized data. The cold sweat of realizing your primary indexer is having issues and your "fallback" provider has a different data model.

Migration between providers isn't a config change. It's a project. Different endpoints, different authentication, different rate limit structures, different WebSocket implementations. Teams that chose Alchemy two years ago and now want to switch discover their entire backend is coupled to provider-specific quirks they didn't realize they were depending on.
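
The workaround most teams reach for is a thin failover layer over the JSON-RPC subset every provider shares, something like this sketch (the provider URLs, header names, and retry policy are illustrative assumptions, not any vendor's actual API):

```python
import time
import requests

class JsonRpcProvider:
    def __init__(self, url, headers=None):
        self.url, self.headers = url, headers or {}

    def call(self, method, params):
        resp = requests.post(self.url, headers=self.headers, timeout=10,
                             json={"jsonrpc": "2.0", "id": 1,
                                   "method": method, "params": params})
        if resp.status_code == 429:            # rate-limited: let the caller fail over
            raise RuntimeError("rate limited")
        resp.raise_for_status()
        body = resp.json()
        if "error" in body:
            raise RuntimeError(str(body["error"]))
        return body["result"]

class FailoverClient:
    """Try each provider in order; a throttled or unreachable primary shouldn't take you down."""
    def __init__(self, providers, backoff=0.5):
        self.providers, self.backoff = providers, backoff

    def call(self, method, params):
        last_error = None
        for provider in self.providers:
            try:
                return provider.call(method, params)
            except Exception as exc:            # network error, 429, provider-side error
                last_error = exc
                time.sleep(self.backoff)
        raise RuntimeError(f"all providers failed: {last_error}")

client = FailoverClient([
    JsonRpcProvider("https://primary.example/v1", {"Authorization": "Bearer KEY_A"}),
    JsonRpcProvider("https://fallback.example/v1", {"x-api-key": "KEY_B"}),
])
latest = client.call("eth_blockNumber", [])
```

That papers over outages and rate limits, but only for the methods every provider shares. The enriched, indexed endpoints that made the provider attractive in the first place have no portable equivalent.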

This is vendor lock-in dressed up as developer experience.

The Economics of Extraction

Indexers position themselves as infrastructure providers. Functionally, they're toll collectors.

They sit between blockchain data, which is public, and applications that need it, and they charge for access. The data itself is free. Processing it is expensive. Indexers built businesses capturing the margin.

This creates predictable incentives. Indexers benefit from complexity. They benefit from proprietary APIs that make switching painful. They benefit from being difficult to replace. Open standards and portable data formats would commoditize their business, so don't expect them to lead that charge.

Ecosystems pay twice: once for validators to secure the network, and again for indexers to make that data usable. The indexers don't contribute to network security. They don't stake. They don't validate. They extract value from infrastructure someone else is paying to operate.

For blockchain foundations and ecosystem funds, this is a slow bleed. The cost is distributed across hundreds of applications, buried in operations budgets, rarely totaled up. But a well-funded Series A startup can easily spend six figures annually on indexing before it has processed a single fee-generating transaction.

Cross-Chain Multiplies the Problem

Multi-chain makes all of this worse.

Every blockchain has different data structures, event formats, and state models. Applications that need data from Ethereum, Arbitrum, Solana, and Cosmos integrate with multiple indexers, each with its own API patterns and trust assumptions. Or they use aggregation services that query multiple indexers under the hood, adding another layer of indirection and another party to trust.

The multi-chain future everyone keeps promising depends on reliable cross-chain data access. Right now that access runs through a patchwork of centralized providers with no unified trust model, no shared verification standard, and no way to prove that data from different chains is even consistent with itself.

We haven't solved this problem. We've just agreed to stop talking about it.

The Core Contradiction

The blockchain industry tells users their transactions are trustless, their assets secured by cryptographic proof. The applications delivering that experience depend on infrastructure that offers neither.

Your smart contract might be immutable. Your transaction might be finalized by thousands of validators. But the data populating your UI comes from a server you can't verify, operated by a company you can't audit, returning results you can't prove. The trustlessness of the underlying chain becomes irrelevant to your actual experience.

This is the gap between what blockchain promises and what it delivers. It exists because we treated indexing as someone else's problem instead of as core infrastructure deserving the same rigor we apply to consensus.

What Comes Next

Validators already process every transaction. Cryptographic primitives for verifiable computation exist. Peer-to-peer networking stacks are mature. The technical pieces are there. What's missing is the will to use them.

The blockchain industry got comfortable with centralized convenience because it shipped faster. We kept pretending the read layer didn't matter because fixing it was hard. The result is an industry that sells trustlessness while routing every query through infrastructure that can't prove anything about itself.

The infrastructure you build on becomes the infrastructure you're stuck with. The question is whether you'll build on foundations designed for verification or keep trusting that Alchemy had a good day.




What if you didn't have to trust? We're building that.
X · Telegram · Discord · GitHub · shinzo.network