Preface
Setting Up the Environment
1. Week 1: From Matmul to Text
1.1. Attention and Multi-Head Attention
1.2. Positional Encodings and RoPE
1.3. Grouped/Multi Query Attention
1.4. RMSNorm and MLP
1.5. The Qwen2 Model
1.6. Generating the Response
1.7. Sampling and Preparing for Week 2
2. Week 2: Tiny vLLM
2.1. Key-Value Cache
2.2. Quantized Matmul (2 Days)
2.3. Flash Attention (2 Days)
2.4. Chunked Prefill
2.5. Continuous Batching
3. Week 3: Serving
Glossary Index