vLLM Hong Kong Meetup

Agenda

10:00 – 10:30 (30 mins)
Registration

10:30 – 10:40 (10 mins)
Welcome Speech
Jiaju Zhang — Chief Architect, Red Hat

10:40 – 11:10 (30 mins)
Getting Started with vLLM: The Leading Open-Source LLM Inference Engine for Private AI
Christopher Nuland — Principal Technical Marketing Engineer, AI BU, Red Hat
Peter Ho — Senior Solution Architect, Red Hat

11:10 – 11:40 (30 mins)
Multi-Modal Inference and Deployment Using vLLM
Cyrus Leung — Multi-Modality Co-Lead, vLLM Team

11:40 – 12:10 (30 mins)
LLM Inference on AMD GPUs: A Technical Deep Dive
Haichen Zhang — Senior PM, AI Engineering, AMD

12:10 – 12:15 (5 mins)
Group Photo

12:15 – 13:15 (60 mins)
AMD Hands-On Workshop: MiniMax M2 Agent Tutorial
Haichen Zhang — Senior PM, AI Engineering, AMD

13:15 – 14:15 (60 mins)
Lunch & Networking

14:15 – 14:45 (30 mins)
From Offline to Online Inference: Why Serving Is Hard—and How vLLM Helps
Henry Wong — Data Scientist, Python User Group HK

14:45 – 15:15 (30 mins)
Deep Adaptation and Engineering Practice of vLLM on MetaX GPUs
William Chan — Software Expert, AI Solutions, MetaX

15:15 – 15:45 (30 mins)
KVCache Practices at MiniMax for Agentic Workloads: From Traffic Characteristics to Architectural Insights
Zebin Li — Software Engineer, MiniMax

15:45 – 16:15 (30 mins)
vLLM-Omni: Easy, Fast, and Cheap Omni-Modality Model Serving
Han Gao — vLLM-Omni Core Maintainer

16:15 – 17:00 (45 mins)
Networking

17:00
Event End

Agenda subject to change.