Oquilia: India's Financial Intelligence Platform

A Miami Startup Says It Slashed LLM Costs. Should India Care?

Subquadratic claims its sparse-attention model runs 56x faster and cuts long-context inference from $2,600 to $8. The receipts are now trickling out, and the stakes are highest where compute is dearest.

Oquilia Newsroom
Financial news desk covering SEBI, RBI, IRDAI, and Budget-related developments.
3 min read · 740 words
Verified Sources | Last reviewed: 19 June 2026
The News

Subquadratic, a Miami-based startup that emerged from stealth in May 2026, says it has cracked one of the oldest cost problems in modern AI: the way transformer models slow to a crawl as the text they process grows longer. After weeks of thin detail and loud scepticism, the company has begun publishing benchmarks to back its claim.

The firm, led by chief executive Justin Dangel and chief technology officer Alex Whedon, is targeting what engineers call quadratic attention scaling. In a standard transformer, every token is weighed against every other token, so doubling the input roughly quadruples the computation. A 10,000-word passage already demands around 50 million token-to-token multiplications. Subquadratic's answer, which it calls dynamic sparse attention, computes only the token pairs it judges relevant rather than every possible combination.
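The scaling argument can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes one word is roughly one token and counts each unordered token pair once, which is one plausible reading of the 50 million figure. The budget of 64 partners per token is a hypothetical number chosen for the example, not a parameter Subquadratic has published, and the code counts pairs rather than implementing any attention mechanism.

```python
def dense_pairs(n_tokens: int) -> int:
    """Token pairs a dense attention layer scores, counting each unordered pair once."""
    return n_tokens * (n_tokens - 1) // 2


def sparse_pairs(n_tokens: int, kept_per_token: int) -> int:
    """Upper bound on pairs scored if every token keeps at most `kept_per_token` partners.

    `kept_per_token` is a hypothetical budget for illustration, not a published figure.
    """
    return n_tokens * min(kept_per_token, n_tokens - 1)


if __name__ == "__main__":
    # Compare growth as the input doubles, then doubles again.
    for n in (10_000, 20_000, 40_000):
        print(f"{n:>6} tokens: dense={dense_pairs(n):>13,}  sparse={sparse_pairs(n, 64):>10,}")
```

At 10,000 tokens the dense count is 49,995,000, close to the 50 million quoted, and doubling the length to 20,000 raises it about fourfold. Under the assumed fixed budget, the sparse count merely doubles.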

The headline figures are striking. The company reports its model is 56 times faster than systems built on FlashAttention, scores 89.7% on the LiveCodeBench coding test, and handles context windows of up to 12 million tokens against roughly one million for leading rivals. On a long-context retrieval task, it cites 98% accuracy at the six-to-twelve-million-token range. Most arresting of all is the price: $8 to run the RULER 128 test, versus $2,600 for Anthropic's Opus 4.6.
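To put the price claim in proportion, here is a minimal sketch of the arithmetic. The two dollar figures are the ones reported above, which come from the company and are not independently verified. The function names are illustrative.

```python
OPUS_COST_USD = 2_600  # reported cost of the test run on Anthropic's Opus 4.6
SUBQ_COST_USD = 8      # cost Subquadratic reports for its own model


def cost_ratio(baseline: float, candidate: float) -> float:
    """How many times cheaper the candidate is than the baseline."""
    return baseline / candidate


def saving_pct(baseline: float, candidate: float) -> float:
    """Percentage of the baseline cost that the candidate saves."""
    return 100 * (baseline - candidate) / baseline


if __name__ == "__main__":
    print(f"ratio:  {cost_ratio(OPUS_COST_USD, SUBQ_COST_USD):.0f}x")
    print(f"saving: {saving_pct(OPUS_COST_USD, SUBQ_COST_USD):.1f}%")
```

On these figures the claimed run is 325 times cheaper, a saving of about 99.7%. That is a different comparison from the 56x speed claim, which is measured against FlashAttention-based systems rather than against Opus 4.6, so the two multiples should not be read as the same measurement.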

Why It Matters

If the numbers hold, the implication is not a smarter model but a far cheaper one. Inference cost, the price of actually running a model after training, has quietly become the binding constraint on whether AI products make money.

The caveats are real. Independent researcher Will Depue likened the claim to "running a four-minute mile": not impossible, but extraordinary. Jeanine Sinanan-Singh of the data firm Appen, which validated some results, noted the awkward truth that shocking numbers are hard to credit when the company reports them itself. And Subquadratic reused weights from Qwen, an open-source Chinese model, rather than training from scratch, which dents its story of a clean architectural reinvention.

The pattern is familiar. When efficiency breakthroughs land, the field tends to absorb the idea rather than the company. FlashAttention itself began as an academic technique before becoming an industry default.

Indian Angle

Nowhere does cheaper inference matter more than in markets that price in rupees. Indian AI startups serve users who will not pay developed-world subscription fees, which makes the gap between a $2,600 run and an $8 run less a curiosity than a survival question. Long context windows also suit Indian workloads directly: parsing lengthy legal contracts, multilingual government documents and dense regional-language corpora is exactly where token counts balloon.

Domestic model builders such as Sarvam and Krutrim have leaned on efficiency rather than raw scale, partly out of necessity given limited access to top-tier GPUs. A proven sparse-attention method would be a gift to that strategy, lowering the compute bill for firms that cannot match American capital. It would also strengthen the case behind the IndiaAI Mission, the government's roughly Rs 10,372 crore programme, which is betting that India can compete on cleverness and cost rather than brute spend.

There is a cautionary note too. Subquadratic's reuse of Qwen weights mirrors a live Indian debate: how much of a "sovereign" model is genuinely homegrown, and how much is fine-tuning atop someone else's open weights. For regulators at MeitY weighing what counts as indigenous AI, the distinction is about to get harder to police, not easier.

FAQ

What exactly did Subquadratic claim to solve?

The quadratic attention bottleneck, where computation grows roughly fourfold each time input length doubles. Its dynamic sparse attention processes only token pairs it deems relevant, which the company says cuts cost and lets context windows stretch to 12 million tokens.

Are the benchmark numbers independently confirmed?

Partly. The data firm Appen validated some results, and the company has published figures such as 89.7% on LiveCodeBench. But researchers remain cautious, and the model is not yet broadly available for outside testing despite long waitlists.

Why does this matter for Indian startups?

Inference cost is the main barrier to profitable AI products in price-sensitive markets. If a method delivers comparable quality far more cheaply, it directly improves the unit economics for firms like Sarvam and Krutrim and for the wider IndiaAI Mission.

Where can I read the original reporting?

MIT Technology Review published the detailed account of Subquadratic's claims, benchmarks and the scepticism around them. The link to the full piece is below.

This story was reported by MIT Technology Review. Read the full original coverage at MIT Technology Review.

Sources & Citations

  1. A startup claims it broke through a bottleneck that's holding back LLMs — MIT Technology Review

This article was last reviewed on 19 June 2026 by Oquilia's editorial team. Every claim is sourced from primary regulatory materials (CBDT, IRDAI, RBI, SEBI, Indian Kanoon).

Reviewed by Subodh Bajpai, Senior Partner & MBA Finance (XLRI)


© 2026 Oquilia. Not a licensed financial advisor. All third-party logos and trademarks belong to their respective owners.
