OpenAI-compatible is not cost-compatible
A matching API surface tells you nothing about what a token actually costs to serve. Two endpoints can speak the same protocol and live in different economic universes.
Discipline under uncertainty.
Riyaz · Pranali · Swar (practice · system · note)
AI infrastructure, open-source systems, and Hindustani classical music.
I study and build the systems behind modern AI — especially the messy economics of serving LLMs in production. I'm also a Hindustani classical vocalist trained since childhood, which probably explains the obsession with structure, timing, and improvisation.
Illustrative meter: ≈ USD 1.90 per 1M output tokens at ~68% utilization.
Taal (rhythmic cycle): latency is tempo · telemetry is drone
Currently
A snapshot, not a CV. Updated when the work moves.
Research
An independent, concurrency-aware look at what inference actually costs — measured, not assumed.
Featured paper · arXiv
Token price is not serving cost.
Public LLM cost calculators often reduce serving economics to a static token price or assumed utilization. This work studies how request rate, concurrency, latency SLOs, hardware, model architecture, and quantization interact to change the real effective cost of self-hosted LLM inference.
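To make the gap between a static price and an effective cost concrete, here is a minimal sketch of the arithmetic, not the paper's methodology. It assumes a single GPU billed hourly and treats "utilization" as the fraction of peak output throughput the server actually sustains; the function name, the USD 2.00/hour rate, and the 1,000 tokens/s peak are all hypothetical.

```python
def effective_cost_per_million(gpu_hourly_usd: float,
                               peak_tokens_per_s: float,
                               utilization: float) -> float:
    """Effective USD per 1M output tokens for one hourly-billed GPU.

    Assumption: utilization is the fraction of peak throughput that
    real traffic sustains, so idle capacity is still paid for.
    """
    if peak_tokens_per_s <= 0:
        raise ValueError("peak_tokens_per_s must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = peak_tokens_per_s * utilization * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs: a USD 2.00/hour GPU with a 1,000 tokens/s peak.
best_case = effective_cost_per_million(2.00, 1000, 1.00)
at_68_pct = effective_cost_per_million(2.00, 1000, 0.68)
print(f"best case: {best_case:.4f} USD, at 68%: {at_68_pct:.4f} USD")
```

Even in this toy model the same hardware costs about 1/0.68 times more per token at 68% utilization than at the best case. Latency SLOs, concurrency limits, and quantization would each move `peak_tokens_per_s` or `utilization`, which is why a single static number is misleading.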
Open source
A read-only observer that tells operators the truth about serving cost under their own traffic.
Objective live telemetry + effective cost-per-million-token meter for vLLM servers.
A read-only observer for running vLLM servers that ingests Prometheus metrics and surfaces live effective LLM serving cost against the operator's actual traffic.
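As a rough illustration of the idea, and not the tool's actual implementation, the sketch below derives an effective cost from two Prometheus text scrapes taken a known interval apart. The counter name `vllm:generation_tokens_total` is assumed here and should be checked against your server's `/metrics` output; the helper names, the sample scrapes, and the USD 2.00/hour rate are hypothetical. The parser also assumes sample lines carry no trailing timestamp.

```python
from typing import Optional

COUNTER = "vllm:generation_tokens_total"  # assumed metric name


def read_counter(metrics_text: str, name: str) -> float:
    """Sum every sample of `name` in a Prometheus text exposition."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        metric, _, value = line.rpartition(" ")
        if metric.split("{", 1)[0] == name:
            total += float(value)
    return total


def cost_per_million_output_tokens(before: str, after: str,
                                   window_s: float,
                                   gpu_hourly_usd: float) -> Optional[float]:
    """USD per 1M generated tokens over the window between two scrapes."""
    produced = read_counter(after, COUNTER) - read_counter(before, COUNTER)
    if produced <= 0:
        return None  # idle window or counter reset: no meaningful rate
    window_cost = gpu_hourly_usd * window_s / 3600
    return window_cost / produced * 1_000_000


# Hypothetical scrapes taken 60 seconds apart.
scrape_t0 = """# TYPE vllm:generation_tokens_total counter
vllm:generation_tokens_total{model_name="demo"} 120000.0
"""
scrape_t1 = """# TYPE vllm:generation_tokens_total counter
vllm:generation_tokens_total{model_name="demo"} 180000.0
"""
live_cost = cost_per_million_output_tokens(scrape_t0, scrape_t1, 60, 2.00)
idle_cost = cost_per_million_output_tokens(scrape_t0, scrape_t0, 60, 2.00)
print(live_cost, idle_cost)
```

Working from counter deltas keeps the observer read-only: it only needs to scrape the metrics endpoint, and the cost it reports reflects whatever traffic the server actually handled during the window.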
Systems + Sound
Before AI, there was riyaz. The habits transfer more than people expect.
Sound
Before AI, there was riyaz.
I started training in Hindustani classical music when I was around five. That training shaped how I think: patience, structure, improvisation, attention, and the ability to stay with complexity for a long time.
Systems
Improvisation inside structure, repetition without boredom, deep listening for the thing that is slightly off. The same instincts run both columns.
Writing
Short, sharp pieces on inference economics, tools, and practice.
A matching API surface tells you nothing about what a token actually costs to serve. Two endpoints can speak the same protocol and live in different economic universes.
Calculators predict a best case you will rarely hit. A meter reads the truth off live telemetry. Why I built the meter instead of another spreadsheet.
What years of Hindustani classical practice taught me about systems: improvisation inside constraints, repetition without boredom, and staying with complexity.
Arc
Not a résumé timeline — the same instinct showing up in different rooms.
Hindustani classical vocal training starts. Years of repetition, structure, and listening before any code.
Scrapers, ETL over SEC filings, and software systems — learning to model messy real-world data and ship.
Propael and other product experiments. Taste for the painful real problem, not the demo.
A concurrency-aware cost methodology and vllm-cost-meter — measuring what serving actually costs.
Discipline under uncertainty.
For LLM serving economics, open-source telemetry, research collaboration — or just to talk systems and sound.