Ultra-Low-Latency LLM Inference
A tile-level runtime engine that unlocks ultra-low-latency inference for state-of-the-art AI models.
$ pip install tilert
Designed from the ground up for latency-critical LLM serving, and powered by advanced compiler techniques.
Prioritizes responsiveness over throughput. Achieves millisecond-level time per output token (TPOT) for models with hundreds of billions of parameters.
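For reference, TPOT is conventionally measured as the steady-state decode latency per generated token, excluding the first token produced during prefill. A minimal sketch of the arithmetic (illustrative only; the function and timestamps below are hypothetical, not part of TileRT):

def time_per_output_token(t_first_token, t_last_token, num_output_tokens):
    # Mean decode latency per token, excluding the prefill/first token.
    if num_output_tokens < 2:
        raise ValueError("TPOT needs at least two output tokens")
    return (t_last_token - t_first_token) / (num_output_tokens - 1)

# Example: 255 tokens, first at t=0.12 s, last at t=1.39 s
# -> (1.39 - 0.12) / 254 = 0.005 s, i.e. 5 ms per output token.
print(time_per_output_token(0.12, 1.39, 255))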
Decomposes LLMs into fine-grained tile-level tasks and dynamically reschedules computation, I/O, and communication across multiple devices (see the illustrative sketch after this feature list).
Leverages advanced compiler techniques to automatically minimize idle time and maximize hardware utilization through highly overlapped execution across GPUs.
Preserves full model quality and accuracy without any lossy optimizations such as quantization or distillation.
Built for multi-device deployment, with dynamic scheduling that overlaps computation and communication.
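To make "overlapped execution" concrete, the sketch below uses two plain PyTorch CUDA streams to run a matmul while a host-to-device copy is in flight. This is a hand-written toy, not TileRT's API or implementation; TileRT derives this kind of schedule automatically, at tile granularity, via its compiler.

import torch

assert torch.cuda.is_available()

compute_stream = torch.cuda.Stream()
copy_stream = torch.cuda.Stream()

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
# Pinned host memory enables truly asynchronous host-to-device copies.
host_weights = torch.randn(4096, 4096, pin_memory=True)
torch.cuda.synchronize()  # make the inputs visible to both streams

with torch.cuda.stream(copy_stream):
    # Prefetch the "next" weights while the current tile computes.
    next_weights = host_weights.to("cuda", non_blocking=True)

with torch.cuda.stream(compute_stream):
    # Runs concurrently with the copy above, because the GPU's compute
    # and copy engines are independent hardware units.
    out = a @ b

torch.cuda.synchronize()  # join both streams before consuming results

In a real engine, the hard part is deciding, per tile, which stream and device each task runs on and in what order; that scheduling problem is what TileRT's compiler automates.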
TileRT is designed to empower high-value AI scenarios, such as agents, vibe coding, trading, and real-time decision-making.
TileRT demonstrates substantial speedups in token generation over baseline inference engines.
Get TileRT up and running in minutes with Docker and pip.
# Path to the workspace you want to mount
WORKSPACE_PATH="xxx"
$ docker run --gpus all -it -v $WORKSPACE_PATH:/workspace/ tileai/tilert:v0.1.0
Note: For the most reliable experience, we strongly recommend using the provided Docker image. See the GitHub repository for full documentation.
TileRT is part of a growing ecosystem of compiler and runtime tools for AI workloads.