Tokens, in a blink

TileRT runs frontier LLMs in real time.

Partners

GLM · MindLab · Xiaomi MiMo · vLLM · InferAct

Millisecond Intelligence at Frontier Scale

TileRT, in motion.

A prompt, streaming live

Watch one Q&A unfold at real TileRT speed — no comparison, just the feel of it.

[Chart: single-user TPS (0 to 600) against context length (1K to 200K)]

Sustained at scale

Single-user TPS on GLM-5 from 1K to 200K context. Holds steady where most engines collapse. More model results landing soon.

Speed is what comes after intelligence.

AI is going autonomous. In that world, what differentiates isn't intelligence — it's speed.

That's the gap TileRT is built for.

[Diagram: model value and compute cost across the Chat, Agentic, and Autonomous eras]

2023 — 2024 · Chat Era

ChatGPT

Economics: value < cost
Demand: model quality

2025 — Now · Agentic Era

Cursor · Claude Code · Codex

Economics: value ≈ cost
Demand: token throughput

Next two years · Autonomous Era

factories · agents · finance · vehicles

Economics: value »» cost
Demand: SPEED

New

Blog & News

Production 2026-05-22

TileRT in production — powering GLM-5.1 on Z.ai MaaS

GLM-5.1-highspeed is now live on Z.ai, powered by TileRT — from experimental prototype to real production.

Blog 2026-05-21

Speed as the Next Scaling Law

Inside TileRT and production-scale GLM-5.1 inference — persistent kernels, tile pipelines, and heterogeneous workers.

Release 2026-02-14

v0.1.3 — GLM-5 available in TileRT

Supports full-size GLM-5-FP8 in TileRT, reaching 500+ user decode TPS.

Release 2026-01-26

v0.1.2 — Multi-token prediction (MTP)

Multi-Token Prediction (MTP) is now enabled in TileRT, reaching 600+ user TPS for DeepSeek-V3.2.

Release 2025-12-23

v0.1.1 — Performance optimization

Achieved a further 1.35x speedup (3x to 4x over baseline), reaching 250+ user decode TPS for DeepSeek-V3.2-Exp.

Release 2025-11-20

v0.1.0 — Initial public release

Initial public release, supporting DeepSeek-V3.2-Exp and achieving the fastest inference speed among all available baselines.

Ecosystem

TileRT is part of a growing ecosystem of tile-based AI computing.

Language

TileLang

Tile-based Pythonic language for programming AI computing.

Operators

TileOPs

High-performance LLM operator library built on TileLang.

Framework

TileScale

Distributed framework for AI computing across all scales.

Runtime · you are here

TileRT

Ultra-low-latency runtime for LLM inference.