Your Blockchain Node Doesn't Index Itself

[2026-02-06] | Shinzō Team


The conversation follows a predictable pattern. A developer complains about indexer downtime, rate limits, or data accuracy issues. Someone responds with what sounds like wisdom from blockchain's early days: "Just run your own node."

The suggestion carries an appealing logic. Blockchains are supposed to be decentralized. If you're depending on a centralized indexer, you've reintroduced the trust problem that blockchains were meant to solve. Running your own node sounds like reclaiming that original vision of self-sovereignty.

Except it doesn't work. Not because running nodes is bad, but because the advice misunderstands what indexers actually do. A node gives you raw blockchain access. An indexer gives you queryable, structured data derived from that raw access. These are different things, and solving for one doesn't automatically solve for the other.

What Running a Node Actually Gets You

A full node participates in the blockchain network. It downloads blocks, validates transactions, and maintains the current state. Depending on configuration, it may also retain historical state going back to the genesis block, which is what an archive node does.

This gives you direct, trustless access to several things: the current state of any account or contract, the ability to broadcast transactions to the network, raw block data as it arrives, and basic queries against the current state tree. For certain use cases, this is exactly what you need. Validators obviously need nodes. Some enterprise compliance requirements mandate direct blockchain access. Specific security contexts might require eliminating all intermediaries.

But notice what's missing from that list. There's no indexed historical data, no aggregated views across multiple contracts, no way to efficiently query event logs at scale. You can't ask "What were all the transfers of this token in the last month?" without scanning millions of blocks yourself. You can't get "total trading volume across these liquidity pools" without building the aggregation pipeline from scratch.

A node gives you the raw stream. Your application needs answers.
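To make the transfer question concrete, here is a minimal sketch of what asking your own Ethereum node looks like. The standard JSON-RPC method eth_getLogs filters logs by contract address, topic, and block range. Many nodes and providers cap how large a range one call may cover, so a month of history becomes many calls. The helper name, the placeholder token address, and the max_range value are hypothetical. The 0xddf252ad... constant is the well-known Keccak-256 hash of the ERC-20 Transfer(address,address,uint256) event signature.

```python
# ERC-20 Transfer(address,address,uint256) event signature hash (topic0).
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def build_transfer_log_requests(token: str, from_block: int, to_block: int,
                                max_range: int) -> list[dict]:
    """Split an inclusive block range into eth_getLogs JSON-RPC request bodies.

    max_range is a hypothetical per-request block limit; real limits depend
    on the node client or provider you are talking to.
    """
    requests = []
    start = from_block
    while start <= to_block:
        end = min(start + max_range - 1, to_block)
        requests.append({
            "jsonrpc": "2.0",
            "id": len(requests) + 1,
            "method": "eth_getLogs",
            "params": [{
                "address": token,
                "topics": [TRANSFER_TOPIC],
                "fromBlock": hex(start),  # JSON-RPC block numbers are hex quantities
                "toBlock": hex(end),
            }],
        })
        start = end + 1
    return requests


# Roughly one month of Ethereum blocks at 12-second slots: 30 * 24 * 3600 / 12 = 216,000.
token = "0x" + "11" * 20  # placeholder address, not a real token
reqs = build_transfer_log_requests(token, 0, 215_999, max_range=2_000)
print(len(reqs))  # 108
```

And that only fetches the raw logs. Each one still has to be decoded (the transfer amount sits in the log's data field), stored, and summed on your side before you have an answer.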

The Indexing Gap

Suppose you run your own node and want to answer a simple question: What's the current token balance for a specific address? For native tokens, the node can answer this directly. For ERC-20 tokens, you need to query the token contract's state. Still straightforward.

Now try a slightly harder question: What's the complete transaction history for this address? The node stores transactions organized by block, not by address. To answer this question, you need to scan every block since the address first appeared, check every transaction, and collect the relevant ones. For an active address on a chain with millions of blocks, this can take hours or days.
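The scan described above can be sketched in a few lines. The Tx and Block types and the toy address strings are hypothetical stand-ins for decoded block data, not any client's real types. The point is the cost: the number of transactions inspected grows with the entire chain, however little the address has done. The sketch also only checks top-level senders and recipients. Token transfers to an address live in event logs and would need a separate pass.

```python
from typing import NamedTuple, Optional


class Tx(NamedTuple):
    tx_hash: str
    sender: str
    recipient: Optional[str]  # None for contract-creation transactions


class Block(NamedTuple):
    number: int
    txs: list


def history_by_scan(blocks, address: str):
    """Walk every block and collect transactions sent from or to `address`.

    Returns the matches plus how many transactions had to be inspected,
    which grows with the whole chain, not with the address's activity.
    """
    address = address.lower()
    matches, inspected = [], 0
    for block in blocks:
        for tx in block.txs:
            inspected += 1
            if tx.sender.lower() == address or (tx.recipient or "").lower() == address:
                matches.append((block.number, tx.tx_hash))
    return matches, inspected


chain = [
    Block(1, [Tx("0xa1", "0xAlice", "0xBob"), Tx("0xa2", "0xCarol", "0xDave")]),
    Block(2, [Tx("0xb1", "0xBob", None)]),
    Block(3, [Tx("0xc1", "0xDave", "0xCarol")]),
]
print(history_by_scan(chain, "0xbob"))  # ([(1, '0xa1'), (2, '0xb1')], 4)
```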

This is what indexers do. They pre-process blockchain data into structures optimized for application queries. They maintain secondary indices so that address lookups are instant. They aggregate events across contracts. They compute derived metrics. They turn an append-only log into something resembling a traditional database.
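A secondary index flips that cost around. Here is a minimal sketch, assuming the same kind of hypothetical decoded transactions as a toy input. Each new block is folded into an address-keyed map once, at ingestion time. After that, a history lookup is a dictionary access whose cost depends on the address's own activity.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Tx(NamedTuple):
    tx_hash: str
    sender: str
    recipient: Optional[str]  # None for contract-creation transactions


class AddressIndex:
    """Secondary index from address -> [(block_number, tx_hash), ...]."""

    def __init__(self):
        self._by_address = defaultdict(list)

    def index_block(self, number: int, txs: list) -> None:
        # Called once per new block, so the scanning cost is paid at ingestion.
        for tx in txs:
            parties = {tx.sender.lower()}
            if tx.recipient:
                parties.add(tx.recipient.lower())
            for party in parties:  # a set, so self-transfers are recorded once
                self._by_address[party].append((number, tx.tx_hash))

    def history(self, address: str) -> list:
        # Dictionary lookup: cost depends on this address, not on chain length.
        return list(self._by_address.get(address.lower(), []))


idx = AddressIndex()
idx.index_block(1, [Tx("0xa1", "0xAlice", "0xBob"), Tx("0xa2", "0xCarol", "0xDave")])
idx.index_block(2, [Tx("0xb1", "0xBob", None)])
print(idx.history("0xBOB"))  # [(1, '0xa1'), (2, '0xb1')]
```

A production indexer would also need to undo entries when the chain reorganizes and to persist the index durably. Those responsibilities are part of what you inherit when you build this yourself.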

Running your own node doesn't give you an indexer. It gives you the raw input that an indexer needs. You've replaced a third-party indexer dependency with the need to build a first-party indexer. All the complexity remains. All the code still needs to be written. Now it's entirely your responsibility to build, deploy, debug, and maintain.

The Operational Reality

Even setting aside the indexing gap, running blockchain infrastructure is operationally demanding. An Ethereum archive node requires multiple terabytes of storage, with that number growing by hundreds of gigabytes annually. Initial sync takes days or weeks depending on your hardware and network connection. The hardware requirements are substantial: fast NVMe drives, significant RAM, and consistent bandwidth. These requirements increase over time as chains accumulate more history.

Then there's ongoing maintenance. Node software needs regular updates, sometimes urgently when critical vulnerabilities are discovered. Chains hard fork and require coordinated upgrades within specific time windows. Miss an upgrade deadline and your node falls out of consensus. Monitoring and alerting are essential, because a node that silently falls behind is worse than no node at all: it keeps serving stale data without signaling that anything is wrong. Incident response becomes your team's problem, including the 2 AM pages when something breaks.
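As one small example of the monitoring you now own, here is a sketch of a staleness check. It asks the node for its latest block through the standard eth_getBlockByNumber JSON-RPC method and compares the block's hex-encoded Unix timestamp with the current time. The helper names and the lag threshold are assumptions to tune per chain. The check also only catches a node that still answers; a crashed node needs a separate liveness check.

```python
import json
import time


def head_request() -> str:
    """JSON-RPC body for the latest block (False = tx hashes, not full tx objects)."""
    return json.dumps({"jsonrpc": "2.0", "id": 1,
                       "method": "eth_getBlockByNumber", "params": ["latest", False]})


def is_stale(block: dict, max_lag_seconds: int, now=None) -> bool:
    """Flag a node whose latest block is older than max_lag_seconds.

    `block` is the "result" object of eth_getBlockByNumber; its "timestamp"
    is a hex-encoded Unix time. The threshold is an assumed, tunable value.
    """
    now = time.time() if now is None else now
    head_time = int(block["timestamp"], 16)
    return now - head_time > max_lag_seconds


result = {"number": "0x10", "timestamp": "0x65"}  # 0x65 == 101
print(is_stale(result, max_lag_seconds=60, now=200))   # True: 99 s behind
print(is_stale(result, max_lag_seconds=120, now=200))  # False
```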

If you've ever been through this, you know the frustration. You didn't get into blockchain development to babysit infrastructure. Every hour spent debugging sync issues or upgrading node software is an hour not spent on your actual product. And yet here you are, because someone told you this was the path to sovereignty.

One node is also a single point of failure. For production reliability, you need redundancy. That means multiple nodes, load balancing, geographic distribution, and failover logic. You're now building the same infrastructure that the indexer providers built, except they've amortized that cost across thousands of customers.

For teams that support multiple chains, multiply these requirements accordingly. A node for Ethereum. A node for Polygon. A node for Arbitrum. Each with its own software, sync requirements, and maintenance burden. The operational complexity scales faster than linearly.

The Cost Calculation

Sometimes the math works out. Large organizations with dedicated infrastructure teams might find that running their own nodes costs less than paying indexer fees. The break-even point depends on query volume, number of chains, and how you value engineering time.

For most teams, the math doesn't work. Indexer services exist precisely because running this infrastructure is expensive and specialized. The services aggregate demand across many customers, achieve economies of scale, and offer their expertise at a fraction of what it would cost each customer to build independently.

But cost isn't the only consideration. Even when the direct financial comparison favors self-hosting, there's opportunity cost to account for. Engineering hours spent maintaining blockchain infrastructure are hours not spent on product development. Attention directed toward node operations is attention diverted from core business problems. The true cost includes everything your team doesn't build while they're keeping nodes running.

The Verification Problem Remains

This is where the "run your own node" advice most clearly misses the point. The fundamental problem with centralized indexers isn't that someone else operates them. It's that their output can't be cryptographically verified.

When an indexer tells you a token balance or transaction count, you have no way to verify that answer without re-doing all the computation yourself. You trust the indexer because of reputation, not cryptographic proof. This trust requirement exists regardless of who operates the indexer.

Think about what verification means elsewhere in blockchain. When you receive a transaction, you can confirm its inclusion in a block against that block's transactions root. When you query account state with a Merkle proof, you can mathematically confirm it against the state root. These guarantees exist because the blockchain provides cryptographic evidence alongside its data. Indexers provide no such guarantees. They return JSON responses that you either believe or don't.
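To show what "mathematically confirm inclusion" means, here is a minimal Merkle inclusion proof. It is a simplified binary SHA-256 tree that duplicates the last node on odd levels, which is one common convention. It is not Ethereum's Keccak-based Merkle Patricia trie, and it omits the leaf/node domain separation that production trees use. The verifier needs only the leaf, a short list of sibling hashes, and a root it already trusts.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list) -> list:
    """All tree levels, leaf hashes first and [root] last."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [h(leaf) for leaf in leaves]
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2 == 1:
            level = level + [level[-1]]  # duplicate the last node on odd levels
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def prove(levels: list, index: int) -> list:
    """Sibling hashes from leaf to root, each tagged with whether it sits on the left."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    # Recompute the path to the root; no other data from the tree is needed.
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


txs = [b"tx0", b"tx1", b"tx2"]
levels = build_levels(txs)
root = levels[-1][0]
print(verify(b"tx2", prove(levels, 2), root))     # True
print(verify(b"forged", prove(levels, 2), root))  # False
```

Nothing comparable exists for a typical indexer response. There is no root to check a balance or a transaction count against, so the only fallback is recomputing it yourself.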

Running your own node and building your own indexer doesn't eliminate this problem. It moves the trust from a third party to your own infrastructure. You've reduced counterparty risk, which has value, but you haven't created a trustless system. Your users still can't verify your indexer's output. Your own team can't verify that the indexer processed everything correctly without manual audits. Bugs in your indexing code will produce incorrect data that looks identical to correct data.

Self-hosted centralized infrastructure is still centralized infrastructure. You've changed who you trust, not whether trust is required.

Toward Actual Solutions

The instinct behind "run your own node" is correct. Blockchain applications shouldn't depend on trusted intermediaries for data access. The entire point of building on blockchains is that trust can be replaced with verification. But running your own centralized infrastructure doesn't create decentralization. It just changes who controls the centralized infrastructure.

What's actually needed is infrastructure where cryptographic proofs make trust unnecessary. Data pipelines that produce verifiable outputs. Indexing systems where the derivation from blockchain state can be mathematically confirmed.

The path forward starts with validators. They already process every transaction and maintain blockchain state. They're the most authoritative source of blockchain data. Embedding indexing within validator infrastructure eliminates the need for separate parties to reconstruct the same information. The computational work is already being done. Indexing becomes an extension of validation rather than a separate, redundant process.

From there, indexed data can be stored with cryptographic proofs of its derivation from on-chain state. These proofs allow any party to verify accuracy without trusting the indexer. Peer-to-peer distribution ensures that no single party controls access. Content addressing provides integrity guarantees at every layer. The architecture mirrors the blockchains it indexes rather than contradicting them.
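Content addressing is the simplest of these layers to illustrate. Here is a toy sketch in which each record's key is the SHA-256 hash of its canonical bytes, so any reader can detect tampering no matter which peer served the data. The "sha256-" key format and the ContentStore class are hypothetical; this is not the IPFS CID format. Note what it does and doesn't prove. Content addressing shows the bytes are the ones that were addressed. It does not show they were derived correctly from chain state, which is the job of the derivation proofs.

```python
import hashlib
import json


def encode(record: dict) -> bytes:
    # Canonical JSON so the same record always produces the same bytes.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def address_of(blob: bytes) -> str:
    return "sha256-" + hashlib.sha256(blob).hexdigest()


class ContentStore:
    """Toy store where every blob's key is the hash of its contents."""

    def __init__(self):
        self._blobs = {}

    def put(self, record: dict) -> str:
        blob = encode(record)
        key = address_of(blob)
        self._blobs[key] = blob
        return key

    def get(self, key: str) -> dict:
        blob = self._blobs[key]
        # Whoever served the bytes, the reader re-hashes and compares.
        if address_of(blob) != key:
            raise ValueError("content does not match its address")
        return json.loads(blob)


store = ContentStore()
key = store.put({"address": "0xbob", "balance": 42})
print(store.get(key))  # {'address': '0xbob', 'balance': 42}
```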

This is what Shinzo is building. Not another centralized indexer. Not tools to help you run your own centralized indexer. Infrastructure that makes the trust problem disappear by making verification possible from validator to application.

The next time someone suggests running your own node as the solution to indexer problems, you'll know what they're actually proposing: substantial operational overhead, persistent complexity, and a trust model that's only marginally better than what you had before. The instinct toward self-sovereignty is correct. The implementation path just leads somewhere else.


What if you didn't have to choose between running your own infrastructure and trusting someone else's? We're building that. Come build with us: X · Telegram · Discord · GitHub