Chitral Patil

Discipline under uncertainty.

Riyaz (practice) · Pranali (system) · Swar (note)

AI infrastructure, open-source systems, and Hindustani classical music.

I study and build the systems behind modern AI — especially the messy economics of serving LLMs in production. I'm also a Hindustani classical vocalist trained since childhood, which probably explains the obsession with structure, timing, and improvisation.

  • AI Infra
  • LLM Economics
  • vLLM
  • Open Source
  • Hindustani Classical
  • Product
live · vllm-cost-meter · streaming
λ · C_eff · TTFT · TPOT · KV

≈ USD 1.90

~68% utilization

Taal (rhythmic cycle)

illustrative · cost/1M output tokens

latency is tempo · telemetry is drone

Currently

What I'm in the middle of

A snapshot, not a CV. Updated when the work moves.

  • Building: vllm-cost-meter (live, objective cost telemetry for self-hosted LLM inference).
  • Researching: How concurrency, latency SLOs, and hardware change the real cost of serving open-weight models.
  • Writing: Field notes on why per-token pricing hides the actual economics of inference.
  • Practicing: Hindustani classical vocals. Riyaz is the same discipline that shapes the rest.

Research

The real cost of serving LLMs

An independent, concurrency-aware look at what inference actually costs — measured, not assumed.

Featured paper · arXiv

Beyond Per-Token Pricing: A Concurrency-Aware Methodology for LLM Infrastructure Cost Estimation

Token price is not serving cost.

Public LLM cost calculators often reduce serving economics to a static token price or assumed utilization. This work studies how request rate, concurrency, latency SLOs, hardware, model architecture, and quantization interact to change the real effective cost of self-hosted LLM inference.

  • 36.3× underutilization penalty: effective cost near idle vs. saturated, same hardware
  • 0.21 → 15.25 USD / 1M output tokens: effective cost range on identical H100 hardware
  • 42 benchmark validation runs: across load and concurrency regimes
  • H100 / A100 hardware validated: two GPU classes, measured not assumed
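
One way to read these figures: a GPU costs about the same per hour whether it is busy or idle, so the cost per token is the hourly price divided by the tokens actually served. The sketch below is a minimal illustration of that arithmetic, not the paper's methodology, which may define effective cost differently. The hourly price and both throughput values are hypothetical and do not come from the paper.

```python
def effective_cost_per_million(gpu_usd_per_hour: float,
                               output_tokens_per_sec: float) -> float:
    """USD per 1M output tokens for a fixed-price GPU at a sustained throughput."""
    if output_tokens_per_sec <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = output_tokens_per_sec * 3600
    return gpu_usd_per_hour / tokens_per_hour * 1_000_000


# Hypothetical inputs, for illustration only.
GPU_USD_PER_HOUR = 3.00
for label, tokens_per_sec in [("near idle", 100.0), ("saturated", 3000.0)]:
    cost = effective_cost_per_million(GPU_USD_PER_HOUR, tokens_per_sec)
    print(f"{label:>10}: {tokens_per_sec:7.1f} tok/s -> USD {cost:.2f} per 1M output tokens")
```

Because the hourly price is fixed, the ratio between the two costs equals the ratio between the two throughputs. That is why the same hardware can look cheap or expensive depending on load.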

Open source

Build the meter, not another calculator

A read-only observer that tells operators the truth about serving cost under their own traffic.

Flagship · open source

vllm-cost-meter

Objective live telemetry + effective cost-per-million-token meter for vLLM servers.

A read-only observer for running vLLM servers that ingests Prometheus metrics and surfaces live effective LLM serving cost against the operator's actual traffic.

  • Reads vLLM Prometheus metrics
  • Tracks throughput, request rate, TTFT, TPOT, E2E latency, prompt / generation lengths, batch state, and KV cache
  • Computes live effective cost-per-million-token visibility
  • Ships benchmark reference curves from the paper
  • Independent, reproducible research artifact
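
The idea of turning Prometheus metrics into a live cost figure can be sketched in a few lines. This is not the vllm-cost-meter implementation. It assumes the server exposes a generated-token counter named `vllm:generation_tokens_total` (check your vLLM version's `/metrics` output for the exact name), and it uses a hypothetical GPU hourly price and two hand-written scrape samples in place of real HTTP scrapes.

```python
import re

# Assumed counter name; verify it against your server's /metrics output.
GENERATION_COUNTER = "vllm:generation_tokens_total"

# Matches "name{labels} value" or "name value" in the Prometheus text format.
# Simplification: label values containing "}" are not handled.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def counter_total(metrics_text: str, name: str) -> float:
    """Sum every sample of one counter across all label sets."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        match = _SAMPLE.match(line)
        if match and match.group(1) == name:
            total += float(match.group(3))
    return total


def live_cost_per_million(before: str, after: str,
                          interval_s: float, gpu_usd_per_hour: float) -> float:
    """Effective USD per 1M output tokens between two scrapes of the same server."""
    delta = counter_total(after, GENERATION_COUNTER) - counter_total(before, GENERATION_COUNTER)
    if delta <= 0 or interval_s <= 0:
        # Covers idle intervals and counter resets; a real meter would handle both.
        raise ValueError("no generated tokens observed in the interval")
    tokens_per_sec = delta / interval_s
    return gpu_usd_per_hour / (tokens_per_sec * 3600) * 1_000_000


# Hand-written sample scrapes taken 15 seconds apart (hypothetical values).
scrape_t0 = (
    "# TYPE vllm:generation_tokens_total counter\n"
    'vllm:generation_tokens_total{model_name="demo"} 120000.0\n'
)
scrape_t1 = (
    "# TYPE vllm:generation_tokens_total counter\n"
    'vllm:generation_tokens_total{model_name="demo"} 150000.0\n'
)

cost = live_cost_per_million(scrape_t0, scrape_t1, interval_s=15.0, gpu_usd_per_hour=3.00)
print(f"USD {cost:.2f} per 1M output tokens")
```

Working from counter deltas rather than instantaneous gauges keeps the observer read-only and makes the result independent of how often the endpoint is scraped.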

Systems + Sound

Two halves, one discipline

Before AI, there was riyaz. The habits transfer more than people expect.

Sound

Before AI, there was riyaz.

I started training in Hindustani classical music when I was around five. That training shaped how I think: patience, structure, improvisation, attention, and the ability to stay with complexity for a long time.

Systems

  • rhythm → concurrency
  • tempo → latency
  • raag → structured systems
  • tanpura drone → telemetry

Improvisation inside structure, repetition without boredom, deep listening for the thing that is slightly off. The same instincts run both columns.

Arc

How the throughline runs

Not a résumé timeline — the same instinct showing up in different rooms.

  1. Age ~5 · Riyaz (practice)

    Riyaz begins

    Hindustani classical vocal training starts. Years of repetition, structure, and listening before any code.

  2. Earlier work · Sanket (signal)

    Data & systems

    Scrapers, ETL over SEC filings, and software systems — learning to model messy real-world data and ship.

  3. Product · Abhyas (practice)

    Building things people use

    Propael and other product experiments. Taste for the painful real problem, not the demo.

  4. Now · Pranali (system)

    LLM inference economics

    A concurrency-aware cost methodology and vllm-cost-meter — measuring what serving actually costs.

Discipline under uncertainty.

If it serves tokens, it has a cost. Let’s measure it.

For LLM serving economics, open-source telemetry, research collaboration — or just to talk systems and sound.