Discipline under uncertainty.

Riyaz (practice) · Pranali (system) · Swar (note)

Chitral Patil

AI infrastructure, open-source systems, and Hindustani classical music.

I study and build the systems behind modern AI — especially the messy economics of serving LLMs in production. I'm also a Hindustani classical vocalist trained since childhood, which probably explains the obsession with structure, timing, and improvisation.

AI Infra
LLM Economics
vLLM
Open Source
Hindustani Classical
Product

Live · vllm-cost-meter · streaming

≈ USD 1.90 per 1M output tokens (illustrative) · ~68% utilization

Taal (rhythm): latency is tempo · telemetry is drone

Currently

What I'm in the middle of

A snapshot, not a CV. Updated when the work moves.

Building: vllm-cost-meter (live, objective cost telemetry for self-hosted LLM inference).
Researching: How concurrency, latency SLOs, and hardware change the real cost of serving open-weight models.
Writing: Field notes on why per-token pricing hides the actual economics of inference.
Practicing: Hindustani classical vocals (riyaz), the same discipline that shapes the rest.

Research

The real cost of serving LLMs

An independent, concurrency-aware look at what inference actually costs — measured, not assumed.

Beyond Per-Token Pricing: A Concurrency-Aware Methodology for LLM Infrastructure Cost Estimation

Token price is not serving cost.

Public LLM cost calculators often reduce serving economics to a static token price or assumed utilization. This work studies how request rate, concurrency, latency SLOs, hardware, model architecture, and quantization interact to change the real effective cost of self-hosted LLM inference.
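A minimal sketch of the underlying arithmetic, assuming a fixed hourly GPU price and a measured output-token throughput. The function name, the USD 2.50/hour price, and both throughput figures are hypothetical illustrations, not values from the paper.

```python
def effective_cost_per_million(gpu_hourly_usd: float, output_tokens_per_sec: float) -> float:
    """Effective USD per 1M output tokens for hardware billed by the hour.

    The hardware costs the same whether it is busy or idle, so the cost per
    token depends entirely on how many tokens are actually served.
    """
    if output_tokens_per_sec <= 0:
        raise ValueError("throughput must be positive to compute a per-token cost")
    tokens_per_hour = output_tokens_per_sec * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs: same GPU, same hourly price, two load regimes.
GPU_HOURLY_USD = 2.50
saturated = effective_cost_per_million(GPU_HOURLY_USD, 2500.0)  # busy server
near_idle = effective_cost_per_million(GPU_HOURLY_USD, 50.0)    # mostly idle server

print(f"saturated: USD {saturated:.2f} per 1M output tokens")
print(f"near idle: USD {near_idle:.2f} per 1M output tokens")
print(f"penalty:   {near_idle / saturated:.1f}x")
```

The hardware bill is identical in both cases; only the denominator moves. That is why a static token price or an assumed utilization can be far from the cost an operator actually pays.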

36.3× · Underutilization penalty: effective cost near idle vs. saturated, same hardware

0.21 → 15.25 · USD / 1M output tokens: effective cost range on identical H100 hardware

42 · Benchmark validation runs: across load and concurrency regimes

H100 / A100 · Hardware validated: two GPU classes, measured not assumed

Open source

Build the meter, not another calculator

A read-only observer that tells operators the truth about serving cost under their own traffic.

Flagship · open source

vllm-cost-meter

Objective live telemetry + effective cost-per-million-token meter for vLLM servers.

A read-only observer for running vLLM servers that ingests Prometheus metrics and surfaces live effective LLM serving cost against the operator's actual traffic.

Reads vLLM Prometheus metrics
Tracks throughput, request rate, TTFT, TPOT, E2E latency, prompt / generation lengths, batch state, and KV cache
Computes live effective cost per million tokens
Ships benchmark reference curves from the paper
Independent, reproducible research artifact
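The read-only idea can be sketched in a few lines: scrape the Prometheus text twice, take the delta of a token counter, and turn that into an effective cost. This is an illustration, not vllm-cost-meter's actual code. The function names, the sample scrapes, and the USD 2.50/hour price are hypothetical, and the counter name `vllm:generation_tokens_total` is an assumption to check against the `/metrics` output of your vLLM version.

```python
from typing import Optional, Tuple


def parse_counter(exposition: str, metric: str) -> float:
    """Sum every sample of `metric` in Prometheus text exposition format.

    Assumes samples carry no trailing timestamp, so the value is the last
    whitespace-separated field on the line.
    """
    total = 0.0
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        if name == metric:
            total += float(value)
    return total


def meter(prev: str, curr: str, interval_s: float, gpu_hourly_usd: float,
          metric: str = "vllm:generation_tokens_total") -> Tuple[float, Optional[float]]:
    """Return (output tokens/sec, USD per 1M output tokens) between two scrapes.

    Cost is None when no tokens were generated, or the counter reset,
    because a per-token cost is undefined for that window.
    """
    delta = parse_counter(curr, metric) - parse_counter(prev, metric)
    if delta <= 0:
        return 0.0, None
    tokens_per_sec = delta / interval_s
    cost = gpu_hourly_usd / (tokens_per_sec * 3600) * 1_000_000
    return tokens_per_sec, cost


# Hypothetical scrapes taken 30 seconds apart.
SCRAPE_T0 = """# TYPE vllm:generation_tokens_total counter
vllm:generation_tokens_total{model_name="example-model"} 120000.0
"""
SCRAPE_T1 = """# TYPE vllm:generation_tokens_total counter
vllm:generation_tokens_total{model_name="example-model"} 150000.0
"""

tps, cost = meter(SCRAPE_T0, SCRAPE_T1, interval_s=30.0, gpu_hourly_usd=2.50)
print(f"{tps:.0f} tok/s, USD {cost:.2f} per 1M output tokens")
```

Working from counter deltas keeps the observer read-only. It never touches the request path, and the cost it reports reflects the traffic the server actually saw during the window.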

Systems + Sound

Two halves, one discipline

Before AI, there was riyaz. The habits transfer more than people expect.

Sound

Before AI, there was riyaz.

I started training in Hindustani classical music when I was around five. That training shaped how I think: patience, structure, improvisation, attention, and the ability to stay with complexity for a long time.

Systems

rhythm → concurrency
tempo → latency
raag → structured systems
tanpura drone → telemetry

Improvisation inside structure, repetition without boredom, deep listening for the thing that is slightly off. The same instincts run both columns.

Writing

Field notes

Short, sharp pieces on inference economics, tools, and practice.

May 2026 · LLM economics

OpenAI-compatible is not cost-compatible

A matching API surface tells you nothing about what a token actually costs to serve. Two endpoints can speak the same protocol and live in different economic universes.

Apr 2026 · telemetry

Meter beats calculator

Calculators predict a best case you will rarely hit. A meter reads the truth off live telemetry. Why I built the meter instead of another spreadsheet.

Mar 2026 · music

Riyaz and research

What years of Hindustani classical practice taught me about systems: improvisation inside constraints, repetition without boredom, and staying with complexity.

Arc

How the throughline runs

Not a résumé timeline — the same instinct showing up in different rooms.

Age ~5 · Riyaz (practice)
Riyaz begins
Hindustani classical vocal training starts. Years of repetition, structure, and listening before any code.

Earlier work · Sanket (signal)
Data & systems
Scrapers, ETL over SEC filings, and software systems: learning to model messy real-world data and ship.

Product · Abhyas (practice)
Building things people use
Propael and other product experiments. A taste for the painful real problem, not the demo.

Now · Pranali (system)
LLM inference economics
A concurrency-aware cost methodology and vllm-cost-meter: measuring what serving actually costs.

Discipline under uncertainty.

If it serves tokens, it has a cost. Let’s measure it.

For LLM serving economics, open-source telemetry, research collaboration — or just to talk systems and sound.
