6 months · 5 phases · by Rohit (@rohitdevsol)

Solana Infra Deep Mastery

Geyser → Turbine → sBPF → Gossip → Jito · 6 Months · Pure Rust · No AI-Assisted Code

Five production projects that go deeper than any public Solana course: a Geyser-backed custom indexer, a raw Turbine shred consumer, a hand-written sBPF interpreter, a validator health monitor with RPC routing, and an atomic Jito liquidation engine. Every concept in Rust, from source.

Phase 1 · Months 1–2

Geyser Plugin + Custom Indexer

Build a production-grade Geyser plugin in Rust that hooks into a local validator (or devnet via Yellowstone gRPC as fallback), streams account and transaction updates for a target program, writes them to Postgres with slot-consistency guarantees, and exposes a minimal RPC surface (getProgramAccounts, getSignaturesForAddress) served entirely from your own index. No third-party RPC dependency for the indexed program. The GeyserPlugin trait is small enough to fit in your head in a week — but the validator's data model (AccountsDB, BankForks, slot pipeline, rooted vs confirmed vs processed) is exactly what you need to understand before touching sBPF.

Week-by-Week Plan

Week 1

Geyser Plugin Trait + Validator Data Model

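Read the GeyserPlugin trait in agave/geyser-plugin-interface/src/geyser_plugin_interface.rs: every method signature and doc comment. Understand the notification lifecycle: on_load → update_account for startup accounts from the snapshot (is_startup = true) → notify_end_of_startup → live update_account and notify_transaction calls per slot → update_slot_status as each slot progresses. Account and transaction notifications only arrive if account_data_notifications_enabled / transaction_notifications_enabled return true. Critical: slot statuses. Processed (replayed by this node, possibly on a minority fork), Confirmed (optimistically confirmed by a supermajority of stake), Rooted (finalized, cannot be rolled back). Write a skeleton plugin that logs every notification type with slot number, status, and pubkey. Load it into solana-test-validator via --geyser-plugin-config. Checkpoint: see rooted slots advancing in your logs.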

Week 2

AccountsDB + BankForks Internals

AccountsDB stores accounts in append-only storage files (AppendVec), one per slot; AccountsIndex maps each pubkey to a slot list of (slot, storage location). Once slots are rooted, older versions become eligible for background cleaning and shrinking. BankForks: the validator maintains a tree of Bank objects, one per slot, across competing forks. The replay stage sets a new root as votes accumulate (Tower BFT lockouts) and prunes forks that do not descend from it. Your plugin sees updates from all forks, so only emit to Postgres when a slot's status becomes Rooted (or Confirmed if you accept the small reorg risk). Read agave/accounts-db/src/accounts_db.rs and agave/runtime/src/bank_forks.rs. Build a fork-aware buffer: hold account updates in memory keyed by (slot, pubkey), keep the highest write_version when a pubkey changes more than once within a slot, and flush to Postgres only on the root notification.
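A minimal sketch of that buffer, assuming roots are notified in ascending slot order, so anything still buffered below a newly rooted slot belongs to an abandoned fork. `Pubkey` and `AccountData` are simplified, hypothetical stand-ins, not the Geyser interface's `ReplicaAccountInfo` types. It nests a per-slot `HashMap` inside a `BTreeMap` instead of a flat `HashMap<(Slot, Pubkey), _>`, so dropping every slot below a root is a single `split_off`.

```rust
use std::collections::{BTreeMap, HashMap};

type Slot = u64;
/// Raw 32-byte address; hypothetical stand-in for a real Pubkey type.
type Pubkey = [u8; 32];

/// Hypothetical subset of what an account notification carries.
#[derive(Clone, Debug, PartialEq)]
struct AccountData {
    lamports: u64,
    owner: Pubkey,
    data: Vec<u8>,
    write_version: u64,
}

/// Buffers account updates per slot until that slot is rooted.
#[derive(Default)]
struct ForkBuffer {
    by_slot: BTreeMap<Slot, HashMap<Pubkey, AccountData>>,
}

impl ForkBuffer {
    /// Record an update; within one slot only the highest write_version is kept.
    fn insert(&mut self, slot: Slot, pubkey: Pubkey, account: AccountData) {
        let slot_map = self.by_slot.entry(slot).or_default();
        let keep_existing = matches!(
            slot_map.get(&pubkey),
            Some(e) if e.write_version >= account.write_version
        );
        if !keep_existing {
            slot_map.insert(pubkey, account);
        }
    }

    /// Handle a Rooted status for `slot`: return the updates to write to
    /// Postgres, plus how many updates were discarded from slots below it
    /// (assumed to be dead forks, given ascending root notifications).
    fn on_root(&mut self, slot: Slot) -> (Vec<(Pubkey, AccountData)>, usize) {
        let flushed: Vec<(Pubkey, AccountData)> = match self.by_slot.remove(&slot) {
            Some(m) => m.into_iter().collect(),
            None => Vec::new(),
        };
        // split_off keeps keys < slot in self and returns keys >= slot.
        let newer = self.by_slot.split_off(&slot);
        let discarded: usize = self.by_slot.values().map(|m| m.len()).sum();
        self.by_slot = newer;
        (flushed, discarded)
    }

    fn pending_slots(&self) -> usize {
        self.by_slot.len()
    }
}
```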

Week 3

Postgres Integration with Slot-Consistency Guarantees

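Schema: accounts(pubkey, slot, lamports, owner, data, write_version), transactions(signature, slot, block_time, meta, message), slot_status(slot, status, parent). Use a Postgres upsert (INSERT ... ON CONFLICT DO UPDATE) keyed on (pubkey, slot) for accounts. Run a Rust worker thread per table, fed by channels from the plugin callbacks. Callbacks must return fast and never block on the DB, so use a non-blocking send (for example tokio::sync::mpsc::UnboundedSender::send, which can be called from synchronous code). Implement a compaction job: after a slot roots, delete older versions of each pubkey it updated (and, if you flush at Confirmed, rows from slots that never rooted). getProgramAccounts must return the latest rooted version of each account, filtering by owner only after that version is chosen: SELECT * FROM (SELECT DISTINCT ON (pubkey) * FROM accounts WHERE slot <= (SELECT MAX(slot) FROM slot_status WHERE status = 'rooted') ORDER BY pubkey, slot DESC, write_version DESC) latest WHERE owner = $1 AND lamports > 0. Filtering on slot = MAX(rooted slot) instead would return only the accounts written in that single slot.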
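The callback-to-writer handoff can be sketched with `std::sync::mpsc` standing in for `tokio::sync::mpsc` (on an unbounded channel, both offer a send that never blocks). `AccountRow`, `notify`, and `run_writer` are hypothetical names, and the `flush` closure is where one multi-row `INSERT ... ON CONFLICT` per batch would go. The cost of an unbounded channel is memory growth whenever Postgres falls behind.

```rust
use std::sync::mpsc::{channel, Receiver, Sender, TryRecvError};

/// Hypothetical row handed from a Geyser callback to the writer thread.
#[derive(Debug, Clone, PartialEq)]
struct AccountRow {
    pubkey: [u8; 32],
    slot: u64,
    write_version: u64,
}

/// Callback side: send on an unbounded channel returns immediately.
/// Returns false only if the writer thread has exited.
fn notify(tx: &Sender<AccountRow>, row: AccountRow) -> bool {
    tx.send(row).is_ok()
}

/// Writer side: block for one row, then drain whatever else is already
/// queued (up to `batch_size`) so each flush becomes one batched upsert.
fn run_writer<F: FnMut(&[AccountRow])>(rx: Receiver<AccountRow>, batch_size: usize, mut flush: F) {
    let mut batch = Vec::with_capacity(batch_size);
    while let Ok(first) = rx.recv() {
        batch.push(first);
        while batch.len() < batch_size {
            match rx.try_recv() {
                Ok(row) => batch.push(row),
                Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
            }
        }
        flush(&batch);
        batch.clear();
    }
    // recv() fails once every sender is dropped: the plugin is unloading.
}
```

In the plugin, `run_writer` would run on its own `std::thread`, with the `Sender` stored in the plugin struct.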

Week 4

RPC Surface + Yellowstone gRPC Fallback

Expose getProgramAccounts and getSignaturesForAddress as JSON-RPC methods on a minimal axum HTTP server. Build the Yellowstone gRPC fallback path: when running without a local validator, connect to a Yellowstone endpoint (Helius or Triton) via tonic, subscribe to the same program's account and transaction updates, and feed them through the same flush pipeline. This teaches you the gRPC streaming model and protobuf deserialization. Write integration tests against devnet: verify that your index matches RPC getAccountInfo for 10 known pubkeys, compared at the same slot and commitment. Measure write throughput (rows per second) and profile the hot path with cargo flamegraph. Deliverable: tested, benchmarked, README written.
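For the devnet integration test (and the exit criterion below), a small diff helper makes "zero discrepancies" checkable: it reports accounts missing on either side as well as mismatched contents. `Snapshot`, `Discrepancy`, and `diff` are hypothetical; the sketch assumes both maps were captured at the same slot and commitment, otherwise legitimate updates show up as mismatches.

```rust
use std::collections::BTreeMap;

/// Hypothetical normalized view of one account from either source.
#[derive(Debug, Clone, PartialEq)]
struct Snapshot {
    lamports: u64,
    owner: String, // base58 owner pubkey
    data: Vec<u8>,
}

#[derive(Debug, PartialEq)]
enum Discrepancy {
    MissingInIndex(String),
    MissingInRpc(String),
    Mismatch(String),
}

/// Compare pubkey -> Snapshot maps; output is ordered by pubkey per pass.
fn diff(index: &BTreeMap<String, Snapshot>, rpc: &BTreeMap<String, Snapshot>) -> Vec<Discrepancy> {
    let mut out = Vec::new();
    for (key, rpc_acct) in rpc {
        match index.get(key) {
            None => out.push(Discrepancy::MissingInIndex(key.clone())),
            Some(ix) if ix != rpc_acct => out.push(Discrepancy::Mismatch(key.clone())),
            Some(_) => {}
        }
    }
    for key in index.keys() {
        if !rpc.contains_key(key) {
            out.push(Discrepancy::MissingInRpc(key.clone()));
        }
    }
    out
}
```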

Build Projects

  • Custom Geyser Indexer: Rust cdylib implementing GeyserPlugin; fork-aware account buffer (HashMap<(Slot, Pubkey), AccountData>); Postgres writer threads via mpsc, so plugin callbacks never block; slot_status tracking drives flush vs discard; getProgramAccounts + getSignaturesForAddress via axum; Yellowstone gRPC client as devnet fallback via tonic+protobuf. Integration-tested against devnet with 10 known pubkeys verified against RPC. Write-throughput benchmark, with the write path profiled via cargo flamegraph.
Phase Exit Criteria

Your index's getProgramAccounts output matches Helius RPC for the target program over a 1-hour window with zero discrepancies.

Phase 2 · Months 2–3

Turbine Shred Consumer

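Build a standalone Rust binary that receives Turbine shreds over UDP, collects data and coding shreds, recovers missing ones with Reed-Solomon erasure coding, deserializes entries into transactions, and writes them to disk or a channel, with no RPC involved: raw block data from the wire. Turbine only delivers shreds to nodes in the retransmit tree, so the binary must either join gossip and advertise a TVU address (unstaked nodes sit toward the bottom of the tree) or receive shreds forwarded by a validator you control (Jito's shredstream-proxy is one existing tool in this space). Nobody in the Indian Solana ecosystem has published a working standalone shred reconstructor. Your C/ELF/binary background maps directly: shreds are packed binary structs with precise field offsets, and the Agave source is your spec.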

Week-by-Week Plan

Week 5

Shred Format + UDP Receive Loop

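Shred anatomy: each shred fits in one UDP packet (legacy shreds and Merkle coding shreds are 1228 bytes, Merkle data shreds 1203 bytes), and shreds are grouped into FEC sets of data and coding shreds. Current clusters emit Merkle shreds (the newest variants are chained Merkle shreds, plus a resigned variant for the last FEC set of a slot); legacy non-Merkle shreds are deprecated but keep their own variant values. Common header (83 bytes, little-endian): signature (64), shred_variant (1), slot (8), index (4), version (2), fec_set_index (4). The shred_variant byte at offset 64 distinguishes Legacy vs Merkle and data vs coding; for Merkle variants its low nibble is the proof size. Data header (5 bytes): parent_offset (2), flags (1; the low 6 bits are reference_tick, the upper bits mark DATA_COMPLETE and LAST_IN_SLOT), size (2). Coding header (6 bytes): num_data_shreds (2), num_coding_shreds (2), position (2). Bind a UDP socket on the TVU port you advertise (or that a forwarding validator sends to). Each node's TVU port comes from its gossip ContactInfo; 8001 is conventionally the gossip port (e.g. on the devnet entrypoint), not the Turbine port. Parse at least 100 shreds correctly before moving to RS reconstruction.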
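A header-parsing sketch, assuming the Agave layout: 64-byte signature, variant byte at offset 64, then slot, index, version, fec_set_index (83 bytes total), followed by the 5-byte data header or the 6-byte coding header. The variant values (0xa5/0x5a for legacy data/coding; high nibbles 0x8/0x9/0xb for Merkle data and 0x4/0x6/0x7 for Merkle coding, proof size in the low nibble) are my reading of Agave's ledger/src/shred code, so check them against the version you target. All type names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum ShredKind {
    LegacyData,
    LegacyCode,
    MerkleData { proof_size: u8, chained: bool, resigned: bool },
    MerkleCode { proof_size: u8, chained: bool, resigned: bool },
}

#[derive(Debug, PartialEq)]
enum Body {
    Data { parent_offset: u16, reference_tick: u8, data_complete: bool, last_in_slot: bool, size: u16 },
    Code { num_data: u16, num_coding: u16, position: u16 },
}

#[derive(Debug, PartialEq)]
struct ShredHeader { kind: ShredKind, slot: u64, index: u32, version: u16, fec_set_index: u32, body: Body }

const COMMON_HEADER_LEN: usize = 83;

fn le_u16(b: &[u8], at: usize) -> u16 { u16::from_le_bytes([b[at], b[at + 1]]) }
fn le_u32(b: &[u8], at: usize) -> u32 { u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]]) }
fn le_u64(b: &[u8], at: usize) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[at..at + 8]);
    u64::from_le_bytes(a)
}

/// Decode the variant byte (assumed encoding, see lead-in).
fn parse_variant(b: u8) -> Option<ShredKind> {
    let proof_size = b & 0x0f;
    match b {
        0xa5 => Some(ShredKind::LegacyData),
        0x5a => Some(ShredKind::LegacyCode),
        _ => match b >> 4 {
            0x4 => Some(ShredKind::MerkleCode { proof_size, chained: false, resigned: false }),
            0x6 => Some(ShredKind::MerkleCode { proof_size, chained: true, resigned: false }),
            0x7 => Some(ShredKind::MerkleCode { proof_size, chained: true, resigned: true }),
            0x8 => Some(ShredKind::MerkleData { proof_size, chained: false, resigned: false }),
            0x9 => Some(ShredKind::MerkleData { proof_size, chained: true, resigned: false }),
            0xb => Some(ShredKind::MerkleData { proof_size, chained: true, resigned: true }),
            _ => None,
        },
    }
}

/// Parse common + type-specific headers; None if too short or unknown variant.
fn parse_shred(buf: &[u8]) -> Option<ShredHeader> {
    if buf.len() < COMMON_HEADER_LEN + 6 {
        return None;
    }
    let kind = parse_variant(buf[64])?;
    let is_data = matches!(kind, ShredKind::LegacyData | ShredKind::MerkleData { .. });
    let body = if is_data {
        let flags = buf[85];
        Body::Data {
            parent_offset: le_u16(buf, 83),
            reference_tick: flags & 0x3f,
            data_complete: flags & 0x40 != 0,
            last_in_slot: flags & 0xc0 == 0xc0,
            size: le_u16(buf, 86),
        }
    } else {
        Body::Code { num_data: le_u16(buf, 83), num_coding: le_u16(buf, 85), position: le_u16(buf, 87) }
    };
    Some(ShredHeader {
        kind,
        slot: le_u64(buf, 65),
        index: le_u32(buf, 73),
        version: le_u16(buf, 77),
        fec_set_index: le_u32(buf, 79),
        body,
    })
}
```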

Week 6

Reed-Solomon FEC Set Reconstruction

Each FEC set contains n_data data shreds + n_coding coding shreds, and you can reconstruct the whole set from any n_data of them. Use the reed-solomon-erasure crate; don't implement RS from scratch. Buffer shreds by (slot, fec_set_index). Erasure positions: data shreds occupy 0..n_data (shred index minus fec_set_index), coding shreds occupy n_data..n_data+n_coding (n_data + position). n_data and n_coding are only known from a coding shred header, so a set is recoverable once you have seen at least one coding shred and hold any n_data shards; then call ReedSolomon::reconstruct_data. Verify the Merkle proof: each Merkle shred carries a proof path from its leaf hash up to the root, and the root is signed by the slot leader's identity keypair. Recompute the root from the proof and verify that signature with ed25519-dalek. Common failure: shard ordering. Verify FEC set boundaries rigorously.
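A sketch of the per-(slot, fec_set_index) buffer, assuming a hypothetical `ShredMeta` produced by the header parser. Data shreds go to erasure position index - fec_set_index, coding shreds to num_data + position, and a set is only reported as recoverable once a coding shred has revealed n_data. `ordered_shards` returns `Vec<Option<Vec<u8>>>` (None = missing), the shard shape the reed-solomon-erasure crate's examples pass to `reconstruct`; the RS call itself is left out to keep the sketch dependency-free.

```rust
use std::collections::HashMap;

type Slot = u64;
type FecSetIndex = u32;

/// Hypothetical: the fields the buffer needs from a parsed shred header.
enum ShredMeta {
    Data { index: u32 },
    Code { num_data: u16, num_coding: u16, position: u16 },
}

#[derive(Default)]
struct ShredSet {
    shards: HashMap<usize, Vec<u8>>, // erasure position -> shard bytes
    num_data: Option<usize>,
    num_coding: Option<usize>,
}

impl ShredSet {
    /// Erasure config known and at least n_data shards present.
    fn ready(&self) -> bool {
        matches!(self.num_data, Some(n) if self.shards.len() >= n)
    }
    /// Shards in RS order: data 0..n_data, then coding.
    fn ordered_shards(&self) -> Option<Vec<Option<Vec<u8>>>> {
        let total = self.num_data? + self.num_coding?;
        Some((0..total).map(|i| self.shards.get(&i).cloned()).collect())
    }
}

#[derive(Default)]
struct FecBuffer {
    sets: HashMap<(Slot, FecSetIndex), ShredSet>,
}

impl FecBuffer {
    /// Insert one shred; true exactly when its FEC set just became recoverable.
    fn insert(&mut self, slot: Slot, fec_set_index: FecSetIndex, meta: ShredMeta, shard: Vec<u8>) -> bool {
        let set = self.sets.entry((slot, fec_set_index)).or_default();
        let was_ready = set.ready();
        let position = match meta {
            ShredMeta::Data { index } => match index.checked_sub(fec_set_index) {
                Some(p) => p as usize,
                None => return false, // index below its own FEC set: malformed
            },
            ShredMeta::Code { num_data, num_coding, position } => {
                set.num_data = Some(num_data as usize);
                set.num_coding = Some(num_coding as usize);
                num_data as usize + position as usize
            }
        };
        set.shards.entry(position).or_insert(shard); // ignore duplicates
        !was_ready && set.ready()
    }
}
```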

Week 7

Entry Deserialization + Output Pipeline

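After FEC recovery, concatenate the data shred payloads (each data shred's size field tells you where its data ends) to get raw entry bytes; a DATA_COMPLETE flag marks the end of one serialized batch. Each batch is a bincode-serialized Vec<Entry>, where an Entry is num_hashes (u64, the number of PoH hashes since the previous entry) + hash (32 bytes) + a vec of transactions (u64 length prefix). Inside each transaction, Solana uses compact-u16 (short_vec) lengths: signatures vec + message (header + account keys + recent blockhash + instructions; v0 messages start with a 0x80 version prefix). Deserialize with bincode::deserialize::<Vec<Entry>>, using the Entry type from the solana-entry crate so the short_vec fields decode correctly. Write entries to a tokio::sync::broadcast channel so downstream consumers can subscribe. Add a file sink that writes entries as newline-delimited JSON for inspection. Checkpoint: run solana-ledger-tool on a ledger that contains the same slots (snapshots hold account state, not entries) and diff its entries against your reconstructed ones. Build a simple slot latency metric: time from first shred received to entry deserialized.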
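Solana transactions do not use bincode's u64 length prefix for their inner vectors (signatures, account keys, instructions); they use a compact-u16 "short_vec" encoding: 7 bits per byte, least significant group first, high bit as the continuation flag, at most 3 bytes. A sketch of a decoder and encoder, assuming the same strictness Agave's short_vec applies (reject truncated, overlong, out-of-range, and non-canonical encodings such as a trailing zero byte). The function names are hypothetical.

```rust
/// Decode a compact-u16 length prefix: Some((value, bytes_consumed)).
fn decode_compact_u16(bytes: &[u8]) -> Option<(u16, usize)> {
    let mut value: u32 = 0;
    for i in 0..3 {
        let byte = *bytes.get(i)?; // truncated input -> None
        // A zero continuation byte means a shorter encoding existed.
        if i > 0 && byte == 0 {
            return None;
        }
        value |= u32::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return if value > u16::MAX as u32 { None } else { Some((value as u16, i + 1)) };
        }
    }
    None // still continuing after 3 bytes: too long for a u16
}

/// Encode a u16 in the same format (useful for round-trip tests).
fn encode_compact_u16(mut v: u16) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let mut byte = (v & 0x7f) as u8;
        v >>= 7;
        if v != 0 {
            byte |= 0x80;
        }
        out.push(byte);
        if v == 0 {
            return out;
        }
    }
}
```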

Build Projects

  • Turbine Shred Consumer: tokio UDP socket on Turbine port; parse ShredVariant byte, common header, Merkle data/coding headers; FEC set buffer (HashMap<(Slot, FecSetIndex), ShredSet>); RS reconstruction via reed-solomon-erasure when n_data threshold met; Merkle proof verification via ed25519-dalek; bincode Entry deserialization; broadcast channel for downstream consumers; JSON file sink; slot latency metric. Diffed against solana-ledger-tool output for same slot range.
Phase Exit Criteria

Reconstruct 10 consecutive slots from devnet UDP without using any RPC call. Your reconstructed entries, including their PoH hashes, must match the entries solana-ledger-tool reports for the same slots.

Phase 3 · Months 3–4

sBPF Interpreter — Mini SVM

Build a minimal sBPF virtual machine in Rust: ELF loader for SBF object files, full sBPF ISA decode (ALU64, ALU32, JMP, LD/ST, CALL, EXIT), a register file (r0–r10 + pc), a stack, a heap, and a syscall dispatch table covering sol_log_, sol_log_64_, sol_alloc_free_, and sol_invoke_signed_rust. Goal: load a compiled Anchor program's .so, execute a known instruction, and verify that account mutations match what the real validator produces. This is the capstone skill that separates infra engineers from program developers: understanding how the SVM executes programs at the VM level makes every other Solana problem easier. Your C/Linux/ELF background maps directly: sBPF ELF loading is simpler than native ELF, and the ISA is derived from eBPF (same instruction encoding, but with Solana-specific differences such as fixed-size stack frames, hashed syscalls, and no maps).

Week-by-Week Plan

Week 8

ELF Loader + sBPF ISA Decode

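sBPF ELF format: Solana programs are ELF shared objects (ET_DYN) that link against no external libraries, but the loader must still apply the dynamic relocations in .rel.dyn: R_BPF_64_64 (absolute addresses loaded by LDDW), R_BPF_64_RELATIVE, and R_BPF_64_32 (call targets; for syscalls the immediate becomes a murmur3 hash of the symbol name). The .text section contains sBPF instructions. Use the goblin crate to parse the ELF header and locate .text. sBPF instructions are 8 bytes: opcode (1) + regs (1; dst in the low nibble, src in the high nibble) + offset (2, signed) + imm (4, signed). The exception is LDDW, which spans two 8-byte slots to load a 64-bit immediate. Implement a decode table for every opcode the ISA defines. ALU64 class (and its ALU32 counterpart): ADD, SUB, MUL, DIV, OR, AND, LSH, RSH, NEG, MOD, XOR, MOV, ARSH, plus LE/BE byte swaps, in register and immediate variants. JMP class: JA, JEQ, JNE, JLT, JLE, JGT, JGE, JSET, JSLT, JSLE, JSGT, JSGE, CALL, CALLX, EXIT. LD/ST class: LDXB, LDXH, LDXW, LDXDW, STB, STH, STW, STDW (store immediate), STXB, STXH, STXW, STXDW. Print disassembly of a hello-world .so as a milestone.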
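A decode-and-disassemble sketch for single-slot instructions, assuming the eBPF opcode layout that sBPF inherits: class in the low 3 bits, source flag 0x08 (register vs immediate), operation in the high 4 bits. LDDW (two slots) and the load/store classes fall through to "unknown" here, and the mnemonic formatting is my own, not rbpf's disassembler output. `Insn`, `decode`, and `disasm` are hypothetical names.

```rust
/// One decoded 8-byte instruction slot.
#[derive(Debug, PartialEq)]
struct Insn { opc: u8, dst: u8, src: u8, off: i16, imm: i32 }

fn decode(slot: &[u8; 8]) -> Insn {
    Insn {
        opc: slot[0],
        dst: slot[1] & 0x0f, // low nibble
        src: slot[1] >> 4,   // high nibble
        off: i16::from_le_bytes([slot[2], slot[3]]),
        imm: i32::from_le_bytes([slot[4], slot[5], slot[6], slot[7]]),
    }
}

/// Instruction class = low 3 bits of the opcode.
fn class_name(opc: u8) -> &'static str {
    match opc & 0x07 {
        0x00 => "LD", 0x01 => "LDX", 0x02 => "ST", 0x03 => "STX",
        0x04 => "ALU32", 0x05 => "JMP", 0x07 => "ALU64", _ => "other",
    }
}

/// Disassemble ALU64 and JMP-class instructions only.
fn disasm(i: &Insn) -> String {
    const ALU: [&str; 14] = ["add", "sub", "mul", "div", "or", "and", "lsh", "rsh", "neg", "mod", "xor", "mov", "arsh", "end"];
    const JMP: [&str; 14] = ["ja", "jeq", "jgt", "jge", "jset", "jne", "jsgt", "jsge", "call", "exit", "jlt", "jle", "jslt", "jsle"];
    let op = (i.opc >> 4) as usize;
    let operand = if i.opc & 0x08 != 0 { format!("r{}", i.src) } else { format!("{}", i.imm) };
    match i.opc & 0x07 {
        0x07 if op < ALU.len() => format!("{}64 r{}, {}", ALU[op], i.dst, operand),
        0x05 if i.opc == 0x05 => format!("ja {:+}", i.off),
        0x05 if i.opc == 0x85 => format!("call {:#x}", i.imm as u32), // syscall hash or target
        0x05 if i.opc == 0x95 => "exit".to_string(),
        0x05 if op < JMP.len() => format!("{} r{}, {}, {:+}", JMP[op], i.dst, operand, i.off),
        _ => format!("unknown opcode {:#04x}", i.opc),
    }
}
```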

Week 9

Register File, Stack, Heap + Execution Loop

Register file: 11 64-bit registers r0–r10. r1–r5 carry arguments for CALLs and r0 holds the return value. r10 is the read-only frame pointer. Stack: 4 KiB per stack frame (not eBPF's 512-byte stack), with a maximum call depth of 64. On CALL, save the callee-saved registers r6–r9, the frame pointer, and the return address, then move r10 to the next frame; on EXIT, restore them (EXIT from the outermost frame ends the program). Memory map: program at 0x1_0000_0000, stack at 0x2_0000_0000, heap at 0x3_0000_0000 (32 KiB by default), input buffer at 0x4_0000_0000. Heap: sol_alloc_free_ manages a bump allocator that grows upward from the heap start and never frees individual allocations. The execution loop: fetch 8 bytes at pc, decode, execute, advance pc. Implement bounds checking on every memory access, since sBPF programs are untrusted. Verify your loop against the test programs in solana-rbpf/tests/. Milestone: execute a simple counter increment program and verify r0 holds the expected return value.
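A bounds-checked memory sketch. Solana's VM uses 4 KiB stack frames (call depth 64) and a 32 KiB default heap, and maps regions at fixed virtual addresses: program 0x1_0000_0000, stack 0x2_0000_0000, heap 0x3_0000_0000, input 0x4_0000_0000. The sketch relies on that layout so the upper 32 bits of an address pick the region. It simplifies the real VM (no stack frame gaps, one writable flag for the whole input region instead of per account); `Memory`, `Region`, and `AccessError` are hypothetical names.

```rust
const MM_PROGRAM_START: u64 = 0x1_0000_0000;
const MM_STACK_START: u64 = 0x2_0000_0000;
const MM_HEAP_START: u64 = 0x3_0000_0000;
const MM_INPUT_START: u64 = 0x4_0000_0000;

struct Region { vm_start: u64, host: Vec<u8>, writable: bool }

#[derive(Debug, PartialEq)]
enum AccessError { Unmapped(u64), ReadOnly(u64) }

struct Memory { regions: Vec<Region> }

impl Memory {
    /// Regions are stored in address order: program, stack, heap, input.
    fn new(program: Vec<u8>, stack_len: usize, heap_len: usize, input: Vec<u8>) -> Self {
        Memory { regions: vec![
            Region { vm_start: MM_PROGRAM_START, host: program, writable: false },
            Region { vm_start: MM_STACK_START, host: vec![0; stack_len], writable: true },
            Region { vm_start: MM_HEAP_START, host: vec![0; heap_len], writable: true },
            Region { vm_start: MM_INPUT_START, host: input, writable: true },
        ]}
    }

    /// Map [vaddr, vaddr + len) to (region index, host offset), or None.
    fn region_for(&self, vaddr: u64, len: u64) -> Option<(usize, usize)> {
        let idx = ((vaddr >> 32) as usize).checked_sub(1)?; // 0x1_xxxx_xxxx -> 0
        let region = self.regions.get(idx)?;
        let offset = vaddr.checked_sub(region.vm_start)?;
        let end = offset.checked_add(len)?;
        if end > region.host.len() as u64 {
            return None; // access runs past the end of the region
        }
        Some((idx, offset as usize))
    }

    fn load_u64(&self, vaddr: u64) -> Result<u64, AccessError> {
        let (i, off) = self.region_for(vaddr, 8).ok_or(AccessError::Unmapped(vaddr))?;
        let mut b = [0u8; 8];
        b.copy_from_slice(&self.regions[i].host[off..off + 8]);
        Ok(u64::from_le_bytes(b))
    }

    fn store_u64(&mut self, vaddr: u64, value: u64) -> Result<(), AccessError> {
        let (i, off) = self.region_for(vaddr, 8).ok_or(AccessError::Unmapped(vaddr))?;
        let region = &mut self.regions[i];
        if !region.writable {
            return Err(AccessError::ReadOnly(vaddr));
        }
        region.host[off..off + 8].copy_from_slice(&value.to_le_bytes());
        Ok(())
    }
}
```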

Week 10

Syscall Dispatch Table + Account Mutation Verification

Syscalls are CALL instructions whose immediate is a murmur3-32 hash of the syscall name rather than a function offset. sol_log_ (hash 0x207559bd): r1 is a pointer and r2 a length of a UTF-8 message (not NUL-terminated); print it. sol_log_64_ (hash 0x5c2a3178): log the five u64 values in r1–r5. sol_alloc_free_ (hash 0x83f00e8f): if r2 (the address to free) is 0, bump-allocate r1 bytes and return the pointer in r0 (0 on exhaustion); otherwise it is a free, which is a no-op returning 0. sol_invoke_signed_rust (hash 0xd7449092; the C-ABI variant sol_invoke_signed_c is 0xa22b9c85): the cross-program invocation entry point. Decode the instruction, AccountInfo slice, and signer seeds from r1–r5, find the callee program, and recurse into a new VM instance. Load a real compiled Anchor counter program (.so), serialize its expected accounts and instruction data into the loader's input buffer (the program's entrypoint rebuilds its AccountInfo slice from that buffer), execute the increment instruction, and verify that the account data mutation matches the real validator's output from getAccountInfo. Read solana-rbpf AFTER you've struggled; focus on vm.rs and memory_region.rs.

Build Projects

  • Mini SVM: goblin-based ELF loader for .so files (SBF target) with relocation handling; full sBPF opcode decode table; register file r0-r10, 4 KiB stack frames with push/pop up to call depth 64, bump-allocator heap; memory bounds checker on every LD/ST; syscall dispatch table (sol_log_, sol_log_64_, sol_alloc_free_, sol_invoke_signed_rust); CPI recursion via nested VM instances. Executes a compiled Anchor counter .so, account mutations verified against real validator output.
Phase Exit Criteria

Load any public devnet program's .so, execute one instruction, and have the resulting account data match the validator's post-execution state for the same inputs (for example, the accounts returned by simulateTransaction).

Phase 4 · Months 4–5

Validator Health Monitor + RPC Router

Build a Rust daemon that (1) monitors a set of validators via gossip or RPC for vote credits, skip rate, delinquency, and slot lag; (2) exposes a Prometheus endpoint with per-validator metrics; (3) acts as a smart RPC proxy that routes incoming JSON-RPC requests to the healthiest available validator with automatic failover. This is the tooling that every professional Solana operation needs and almost nobody has published — it goes deep into the Agave gossip protocol (CRDS), slot commitment tracking, and hysteresis-based routing.

Week-by-Week Plan

Week 11

Gossip Protocol + CRDS Parsing

Solana's gossip protocol is built on CRDS (Cluster Replicated Data Store), replicated over a push/pull gossip network. Each node maintains a local CRDS table of signed CrdsValue entries. Key entry types: ContactInfo (node's IP, ports, pubkey, shred version; LegacyContactInfo on older nodes), Vote (vote transactions), NodeInstance (a random token and start timestamp used to detect duplicate running instances of the same identity), SnapshotHashes. Connect to the devnet entrypoint (entrypoint.devnet.solana.com:8001) via UDP. Implement the gossip ping/pong handshake (signed, bincode-serialized packets); peers only answer pull requests from addresses that have passed it. Request a CRDS pull from a known peer: a pull request carries a CrdsFilter (a bloom filter over values you already hold) plus your own signed ContactInfo, which must advertise the cluster's shred version. Parse ContactInfo values to discover validator IPs and TPU/TVU/RPC ports. Deliverable: binary that discovers all devnet validators via gossip without using getVoteAccounts or any RPC call. Read agave/gossip/src/crds.rs and crds_value.rs.

Week 12

Metrics Collection + Skip Rate Computation

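Skip rate: fraction of a validator's leader slots in which it failed to produce a block. Fetch assignments from getLeaderSchedule and produced blocks from getBlocksWithLimit (getBlockProduction reports both per identity and makes a useful cross-check). skip_rate = (assigned_slots - produced_slots) / assigned_slots. Track it over a rolling window of the validator's most recent 100 leader slots; a window of 100 cluster slots would hold only a handful of any one validator's slots, since leaders get 4 consecutive slots at a time. Vote credits per epoch: from getVoteAccounts, each epochCredits entry is [epoch, credits, previousCredits], so credits earned in that epoch = credits - previousCredits. Slot lag: (cluster_slot - validator_slot). Expose all metrics on a /metrics endpoint in Prometheus text format using the prometheus crate. Implement delinquency detection: a validator is delinquent if its last vote is more than N slots behind the cluster tip (the default delinquent slot distance, e.g. getVoteAccounts' delinquentSlotDistance, is 128). Run solana-watchtower in parallel to validate that your delinquency alerts match.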
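A sketch of the two per-validator calculations, with hypothetical names. The window counts the validator's own leader slots (true = block produced) rather than cluster slots, and `credits_earned` assumes the `[epoch, credits, previousCredits]` shape that getVoteAccounts returns for each `epochCredits` entry.

```rust
use std::collections::VecDeque;

/// Rolling skip rate over a validator's most recent `capacity` leader slots.
struct SkipRateWindow {
    capacity: usize,
    outcomes: VecDeque<bool>, // true = block produced
}

impl SkipRateWindow {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "window must hold at least one slot");
        Self { capacity, outcomes: VecDeque::with_capacity(capacity) }
    }

    /// Record one of the validator's leader slots once its outcome is final.
    fn record(&mut self, produced: bool) {
        if self.outcomes.len() == self.capacity {
            self.outcomes.pop_front(); // evict the oldest leader slot
        }
        self.outcomes.push_back(produced);
    }

    /// (assigned - produced) / assigned; None until a leader slot is recorded.
    fn skip_rate(&self) -> Option<f64> {
        if self.outcomes.is_empty() {
            return None;
        }
        let produced = self.outcomes.iter().filter(|&&p| p).count();
        Some((self.outcomes.len() - produced) as f64 / self.outcomes.len() as f64)
    }
}

/// Credits earned in one epoch from an `epochCredits` entry.
fn credits_earned(entry: [u64; 3]) -> u64 {
    entry[1].saturating_sub(entry[2])
}
```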

Week 13

Smart RPC Proxy with Health-Aware Failover

Build an axum HTTP proxy that accepts JSON-RPC requests and forwards them to the best available backend. Health score: (1 - skip_rate) * 0.4 + vote_credits_rate * 0.4 + (1 - min(slot_lag / max_lag, 1)) * 0.2, where vote_credits_rate is normalized to [0, 1] (e.g. credits earned divided by the maximum achievable over the same window) so the score stays in [0, 1]. Use hysteresis: once a backend is marked unhealthy (score < 0.3), it stays removed until its score exceeds 0.6, which prevents flapping. Implement sticky sessions for getAccountInfo/getSignaturesForAddress: route all requests for the same pubkey to the same backend within a session to avoid reading stale state across forks. Handle backend errors with exponential backoff retry to the next best backend. Track P50/P99 RPC latency per backend via a sliding window. Deliverable: the proxy handles 1000 req/s while adding no more than 5ms median latency.
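The score and the hysteresis band can be sketched as plain functions. Clamping each term to [0, 1] and treating `vote_credit_ratio` as credits earned over the maximum achievable are assumptions the formula leaves open; the thresholds are the 0.3/0.6 pair from the text, and `Backend` is a hypothetical type.

```rust
const MARK_UNHEALTHY_BELOW: f64 = 0.3;
const RECOVER_ABOVE: f64 = 0.6;

/// Weighted composite score; every input is clamped to [0, 1].
fn health_score(skip_rate: f64, vote_credit_ratio: f64, slot_lag: u64, max_lag: u64) -> f64 {
    let c = |x: f64| x.clamp(0.0, 1.0);
    let lag_ratio = slot_lag as f64 / max_lag.max(1) as f64; // avoid division by zero
    (1.0 - c(skip_rate)) * 0.4 + c(vote_credit_ratio) * 0.4 + (1.0 - c(lag_ratio)) * 0.2
}

/// Routing eligibility with hysteresis: scores between the two thresholds
/// keep whatever state the backend was already in.
struct Backend {
    healthy: bool,
}

impl Backend {
    fn update(&mut self, score: f64) -> bool {
        if self.healthy && score < MARK_UNHEALTHY_BELOW {
            self.healthy = false;
        } else if !self.healthy && score > RECOVER_ABOVE {
            self.healthy = true;
        }
        self.healthy
    }
}
```

The gap between 0.3 and 0.6 is what stops a backend hovering around a single threshold from being added and removed on every health refresh.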

Build Projects

  • Validator Health Daemon: UDP gossip client parsing CRDS ContactInfo without RPC; skip rate computation over a rolling window of each validator's recent leader slots; vote credits delta tracking per epoch; slot lag monitor; Prometheus /metrics endpoint (prometheus crate); axum JSON-RPC proxy with composite health score routing; hysteresis-based failover (unhealthy < 0.3, recover at > 0.6); sticky session routing for account-read methods; P50/P99 latency histograms per backend. Handles 1000 req/s with <5ms added median latency.
Phase Exit Criteria

Proxy correctly routes away from a validator you manually pause (kill -STOP) within 3 seconds, and routes back within 30 seconds of resume.

Phase 5 · Months 5–6

Atomic Liquidation Engine

Build a Rust engine that (1) uses your Phase 1 Geyser plugin to watch positions in a lending protocol (Marginfi or a mock perp program) approaching liquidation; (2) fetches price directly from Pyth oracle accounts via the plugin stream (no HTTP polling); (3) constructs a liquidation transaction with a Jito tip; (4) submits as an atomic Jito bundle. The full pipeline runs in Rust from account update to bundle submission with a sub-100ms latency budget. This phase combines everything: Geyser streaming, account deserialization, transaction construction, and the Jito MEV supply chain.

Week-by-Week Plan

Week 14

Pyth Oracle Parsing + Position Monitoring

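Parse Pyth price accounts by hand: do not use the pyth-sdk-solana or pyth-solana-receiver-sdk crates; parse the struct offsets directly. Pyth on Solana now publishes through its pull oracle, as PriceUpdateV2 accounts written by the Pyth Solana Receiver program. Layout (Borsh, little-endian): Anchor discriminator (8), write_authority (32), verification_level (a 1-byte enum tag: 0 = Partial followed by num_signatures u8, 1 = Full, so later offsets shift by one byte), then price_message: feed_id (32), price (i64), conf (u64), exponent (i32), publish_time (i64), prev_publish_time (i64), ema_price (i64), ema_conf (u64), and finally posted_slot (u64). The real price is price × 10^exponent. The legacy push-oracle PriceAccount (magic 0xa1b2c3d4, with an aggregate PriceInfo of price i64 + conf u64 + status u32 + corp_act u32 + pub_slot u64) is a different, deprecated layout that matters only for historical data. Subscribe to SOL/USD, ETH/USD, BTC/USD feed accounts via your Phase 1 Geyser plugin. Verify price updates against Pyth's Hermes API. Write a Marginfi position deserializer: read the Marginfi IDL and manually decode account data without using the Marginfi SDK.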
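A sketch of a hand-written `PriceUpdateV2` parser. The layout assumed here (8-byte Anchor discriminator, write_authority, a Borsh enum `verification_level` that takes 2 bytes for Partial and 1 for Full, the price message fields, then posted_slot) is my reading of the Pyth Solana Receiver SDK, not the legacy push-oracle PriceAccount with magic 0xa1b2c3d4; confirm it against the SDK source before trusting the offsets. The discriminator is skipped rather than checked, and all type names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Verification { Partial { num_signatures: u8 }, Full }

#[derive(Debug, PartialEq)]
struct PriceUpdate {
    verification: Verification,
    feed_id: [u8; 32],
    price: i64,
    conf: u64,
    exponent: i32,
    publish_time: i64,
    posted_slot: u64,
}

impl PriceUpdate {
    /// price × 10^exponent as a float (display only; keep integers for math).
    fn ui_price(&self) -> f64 {
        self.price as f64 * 10f64.powi(self.exponent)
    }
}

/// Minimal little-endian cursor; every read fails cleanly on short input.
struct Cursor<'a> { buf: &'a [u8], pos: usize }

impl<'a> Cursor<'a> {
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let end = self.pos.checked_add(n)?;
        let s = self.buf.get(self.pos..end)?;
        self.pos = end;
        Some(s)
    }
    fn u8(&mut self) -> Option<u8> { Some(self.take(1)?[0]) }
    fn u64(&mut self) -> Option<u64> {
        let mut a = [0u8; 8];
        a.copy_from_slice(self.take(8)?);
        Some(u64::from_le_bytes(a))
    }
    fn i64(&mut self) -> Option<i64> { Some(self.u64()? as i64) }
    fn i32(&mut self) -> Option<i32> {
        let mut a = [0u8; 4];
        a.copy_from_slice(self.take(4)?);
        Some(i32::from_le_bytes(a))
    }
}

fn parse_price_update_v2(data: &[u8]) -> Option<PriceUpdate> {
    let mut c = Cursor { buf: data, pos: 0 };
    c.take(8)?; // Anchor discriminator (not verified in this sketch)
    c.take(32)?; // write_authority
    let verification = match c.u8()? {
        0 => Verification::Partial { num_signatures: c.u8()? },
        1 => Verification::Full,
        _ => return None,
    };
    let mut feed_id = [0u8; 32];
    feed_id.copy_from_slice(c.take(32)?);
    let price = c.i64()?;
    let conf = c.u64()?;
    let exponent = c.i32()?;
    let publish_time = c.i64()?;
    let _prev_publish_time = c.i64()?;
    let _ema_price = c.i64()?;
    let _ema_conf = c.u64()?;
    let posted_slot = c.u64()?;
    Some(PriceUpdate { verification, feed_id, price, conf, exponent, publish_time, posted_slot })
}
```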

Week 15

Liquidation Detection + Transaction Construction

Health factor computation: for each Marginfi position, health = sum(collateral_value × asset_weight) / sum(liability_value × liability_weight), using the banks' maintenance weights (asset weights are at most 1 and liability weights at least 1, so both sides are made more conservative). When health < 1.0 (equivalently, weighted assets minus weighted liabilities < 0), the position is liquidatable. Build a position registry that tracks all open positions by subscribing via Geyser to the Marginfi program's account updates (MarginfiAccount positions and Bank accounts). Maintain a min-heap of positions ordered by health factor, so the positions closest to 1.0 sit at the top. When health drops below 1.02 (early warning), pre-construct the liquidation transaction: (1) flash-loan the liability asset if needed, (2) repay the liability via CPI into Marginfi, (3) seize collateral. Use Address Lookup Tables to fit all accounts in one transaction. Simulate the transaction via simulateTransaction before submitting and verify the expected collateral seized and profit.
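A sketch of the ranking side: a ratio-form health check (liabilities are multiplied by their weight, since liability weights are at least 1) and a heap that pops the least healthy position first. Marginfi's on-chain math uses fixed-point numbers and per-bank maintenance weights; the `f64` values and the `Balance`/`Entry` types here are hypothetical and meant only for off-chain prioritization.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

const EARLY_WARNING: f64 = 1.02;

/// One balance, already converted to USD at the oracle price.
struct Balance { value_usd: f64, weight: f64 }

/// sum(collateral × asset_weight) / sum(liability × liability_weight).
/// No liabilities means the position cannot be liquidated.
fn health_ratio(collateral: &[Balance], liabilities: &[Balance]) -> f64 {
    let assets: f64 = collateral.iter().map(|b| b.value_usd * b.weight).sum();
    let liabs: f64 = liabilities.iter().map(|b| b.value_usd * b.weight).sum();
    if liabs == 0.0 { f64::INFINITY } else { assets / liabs }
}

/// Heap entry ordered so the *lowest* health is the heap maximum.
#[derive(Debug)]
struct Entry { health: f64, account: [u8; 32] }

impl Ord for Entry {
    fn cmp(&self, other: &Self) -> Ordering {
        other.health.total_cmp(&self.health) // reversed: BinaryHeap is a max-heap
            .then_with(|| self.account.cmp(&other.account))
    }
}
impl PartialOrd for Entry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> { Some(self.cmp(other)) }
}
impl PartialEq for Entry {
    fn eq(&self, other: &Self) -> bool { self.cmp(other) == Ordering::Equal }
}
impl Eq for Entry {}

/// Pop every position below the early-warning threshold, least healthy first.
fn at_risk(heap: &mut BinaryHeap<Entry>) -> Vec<Entry> {
    let mut out = Vec::new();
    while heap.peek().map_or(false, |e| e.health < EARLY_WARNING) {
        out.push(heap.pop().unwrap());
    }
    out
}
```

In the real registry, an account update would push a fresh entry (or rebuild the heap) rather than mutate an entry in place, since `BinaryHeap` does not support decrease-key.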

Week 16

Jito Bundle Submission + Sub-100ms Pipeline

Jito bundle mechanics: 1–5 transactions executed sequentially and atomically; either all of them land, in order and in the same slot, or none do. Pay the tip as a SOL transfer to one of the 8 tip accounts of Jito's tip payment program (in its own transaction or as an instruction inside the last one). Start with a tip of ~50% of expected profit, then run experiments to find the minimum tip that lands. Use the jito-searcher-client crate (gRPC) to submit bundles to the Jito Block Engine. Implement the full timing budget: Geyser callback → position update → health check → tx construct + sign → gRPC submit should complete in under 100ms. Time each stage with tokio::time::Instant (or std::time::Instant). Tune with: signing keypairs loaded in memory up front, blockhash pre-fetched every 400ms, pre-built instruction templates. Track bundle landing rate and compare tip efficiency curves. Stretch: detect competing liquidation bots and outbid them. Jito shut down its public mempool stream in 2024, so infer competition from landed liquidations of the positions you were tracking (e.g. via your Geyser stream) and adjust tips accordingly.
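A per-stage timer is enough to see where the 100ms goes. This sketch uses `std::time::Instant` (tokio's `Instant` wraps it); `StageTimer` and the stage labels are hypothetical, and each `mark` closes the stage that began at the previous mark.

```rust
use std::time::{Duration, Instant};

/// Records how long each pipeline stage took against a fixed latency budget.
struct StageTimer {
    budget: Duration,
    start: Instant,
    last: Instant,
    stages: Vec<(&'static str, Duration)>,
}

impl StageTimer {
    fn start(budget: Duration) -> Self {
        let now = Instant::now();
        Self { budget, start: now, last: now, stages: Vec::new() }
    }

    /// Close the current stage under `name` and start timing the next one.
    fn mark(&mut self, name: &'static str) {
        let now = Instant::now();
        self.stages.push((name, now - self.last));
        self.last = now;
    }

    fn total(&self) -> Duration {
        self.last - self.start
    }

    fn within_budget(&self) -> bool {
        self.total() <= self.budget
    }

    /// e.g. "geyser_callback=12us health_check=3us ..."
    fn report(&self) -> String {
        self.stages
            .iter()
            .map(|(n, d)| format!("{}={}us", n, d.as_micros()))
            .collect::<Vec<_>>()
            .join(" ")
    }
}
```

Logging `report()` only for bundles that miss the budget keeps the hot path quiet while still showing which stage regressed.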

Build Projects

  • Atomic Liquidation Engine: Geyser subscriber watching Marginfi program accounts (MarginfiAccount positions and Banks); Pyth oracle account deserializer (hand-parsed struct offsets, no SDK); position registry with min-heap ordered by health factor; liquidation tx template with ALT pre-built; health factor re-computation on every account update; Jito bundle: [liquidation_tx, tip_tx] via jito-searcher-client gRPC; blockhash pre-fetched every 400ms; full pipeline latency <100ms from Geyser callback to bundle submission. Bundle landing rate and tip efficiency tracked over 100 liquidations.
Phase Exit Criteria

Successfully liquidate at least 5 real undercollateralised positions on devnet/mainnet-fork within the 100ms latency budget. Bundle landing rate >60%.