1. Preface
  2. Setting Up the Environment
  3. Week 1: From Matmul to Text
    1. Attention and Multi-Head Attention
    2. Positional Encodings and RoPE
    3. Grouped/Multi-Query Attention
    4. RMSNorm and MLP
    5. The Qwen2 Model
    6. Generating the Response
    7. Sampling and Preparing for Week 2
  4. Week 2: Tiny vLLM
    1. Key-Value Cache
    2. Quantized Matmul (2 Days)
    3. Flash Attention (2 Days)
    4. Chunked Prefill
    5. Continuous Batching
  5. Week 3: Serving
  6. Glossary Index