Preface
Setting Up the Environment
1. Week 1: From Matmul to Text
1.1. Attention and Multi-Head Attention
1.2. Positional Embeddings and RoPE
1.3. Grouped/Multi-Query Attention
1.4. Multilayer Perceptron Layer and Transformer
1.5. Wiring the Qwen2 Model
1.6. Loading the Model
1.7. Generating the Response
2. Week 2: Optimizing
3. Week 3: Serving
Glossary Index