
TileRT

Ultra-Low-Latency LLM Inference

A revolutionary tile-level runtime engine that unlocks new levels of inference speed for state-of-the-art AI models.

$ pip install tilert

Latest News

Release 2026-02-14

v0.1.3 — GLM-5 available in TileRT

Supports the full-size GLM-5-FP8 model in TileRT, reaching 500+ user decode TPS.

Release 2026-01-26

v0.1.2 — Multi-token prediction (MTP)

Multi-token prediction (MTP) is now enabled in TileRT, reaching 600+ user TPS for DeepSeek-V3.2.

Release 2025-12-23

v0.1.1 — Performance optimization

Achieved a further 1.35x speedup (3-4x over baseline), reaching 250+ user decode TPS for DeepSeek-V3.2-Exp.

Release 2025-11-20

v0.1.0 — Initial public release

Initial public release, supporting DeepSeek-V3.2-Exp and achieving the fastest inference speed among all available baselines.

Why TileRT?

Designed from the ground up for latency-critical LLM serving and powered by advanced compiler techniques.

Ultra-Low Latency

Prioritizes responsiveness over throughput, achieving millisecond-level time per output token (TPOT) for models with hundreds of billions of parameters; see the worked example after this feature list.

First Tile-Level Runtime

Decomposes LLM models into fine-grained tile-level tasks and dynamically reschedules computation, I/O, and communication across multiple devices.

Compiler-Driven Technology

Leverages advanced compiler techniques to automatically minimize idle time and maximize hardware utilization through highly overlapped execution across GPUs.

No Compromise on Accuracy

Preserves full model quality and accuracy without any lossy optimizations such as quantization or distillation.

Scalable for Multi-GPU

Built for multi-device deployment, with dynamic scheduling that overlaps computation and communication.

High-Value Scenarios

TileRT is designed to empower high-value AI scenarios such as agents, vibe coding, trading, and real-time decision-making.
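
To make the latency figures above concrete: per-user decode TPS and TPOT are reciprocals, so the throughput numbers in the release notes map directly to per-token latencies. A minimal sketch in Python (the helper name is illustrative, not part of the TileRT API):

def tpot_ms(user_decode_tps: float) -> float:
    """Time per output token (TPOT) in milliseconds for a given per-user decode TPS."""
    return 1000.0 / user_decode_tps

# TPS figures taken from the release notes above:
print(tpot_ms(250))  # DeepSeek-V3.2-Exp (v0.1.1): 4.0 ms/token
print(tpot_ms(600))  # DeepSeek-V3.2 with MTP (v0.1.2): ~1.67 ms/token
print(tpot_ms(500))  # GLM-5-FP8 (v0.1.3): 2.0 ms/token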

Performance Benchmark

TileRT demonstrates substantial speedups in token generation over baselines.

*Note: TileRT and SGLang use MTP=3; vLLM uses MTP=1, as it fails with MTP=3.
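
For context on the MTP settings: multi-token prediction drafts several candidate tokens per decode step, so per-user TPS scales with how many drafts are accepted. A rough, illustrative model in Python; the step rate and acceptance rate below are hypothetical, not measured TileRT figures:

def effective_tps(steps_per_sec: float, mtp: int, acceptance_rate: float) -> float:
    """Rough speculative-decoding model: each decode step yields one verified
    token plus, on average, mtp * acceptance_rate accepted draft tokens."""
    return steps_per_sec * (1 + mtp * acceptance_rate)

# Hypothetical example: 150 decode steps/s at MTP=3 with 75% draft
# acceptance gives ~487 tokens/s per user.
print(effective_tps(150, 3, 0.75))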

Quick Start

Get TileRT up and running in minutes with Docker and pip.

1. Pull the Docker Image

$ docker pull tileai/tilert:v0.1.0

2. Launch the Container

# Path to the workspace you want to mount
$ WORKSPACE_PATH="xxx"
$ docker run --gpus all -it -v "$WORKSPACE_PATH":/workspace/ tileai/tilert:v0.1.0

3. Install TileRT

$ pip install tilert

Note: For the most reliable experience, we strongly recommend using the provided Docker image. See the GitHub repository for full documentation.
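
Optionally, verify the installation. This is a generic sanity check, assuming the importable module shares the PyPI package name; consult the documentation if the import fails:

$ pip show tilert
$ python -c "import tilert"  # module name is an assumption, not confirmed by the docs above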

Ecosystem

TileRT is part of a growing ecosystem of compiler and runtime tools for AI workloads.

TileLang

Tile-level programming language for AI kernel development.

TileScale

Programming framework for distributed AI workloads.

TileRT

Tile-level runtime engine for ultra-low-latency LLM inference.