rasbt-LLMs-from-scratch/ch02
01_main-chapter-code Use correct ch02 title (#551) 2025-02-28 10:16:21 -06:00
02_bonus_bytepair-encoder add GPT2TokenizerFast to BPE comparison (#498) 2025-01-22 09:26:44 -06:00
03_bonus_embedding-vs-matmul minor spelling fix 2024-09-08 15:35:36 -05:00
04_bonus_dataloader-intuition fixed num_workers (#229) 2024-06-19 17:36:46 -05:00
05_bpe-from-scratch Improve BPE vocabulary saving and pair frequency handling (#539) 2025-02-19 09:51:04 -06:00
README.md Add BPE from scratch link (#550) 2025-02-28 09:57:41 -06:00

Chapter 2: Working with Text Data

Main Chapter Code

Bonus Materials

  • 02_bonus_bytepair-encoder contains optional code to benchmark different byte pair encoder implementations.

  • 03_bonus_embedding-vs-matmul contains optional code showing that an embedding layer is equivalent to a fully connected layer applied to one-hot encoded vectors.

  • 04_bonus_dataloader-intuition contains optional code that explains the data loader more intuitively, using simple numbers rather than text.

  • 05_bpe-from-scratch contains bonus code that implements and trains a GPT-2 BPE tokenizer from scratch.
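
The equivalence behind 03_bonus_embedding-vs-matmul can be sketched without any deep learning library. The snippet below is a minimal illustration, not the code from that folder: the weight matrix, the token IDs, and the helper names `one_hot`, `matmul_row`, and `embedding_lookup` are all made up for this example. It shows that multiplying a one-hot vector by a weight matrix selects the same row that an index lookup returns.

```python
# Hypothetical toy setup: a 4-token vocabulary with 3-dimensional embeddings.
weights = [
    [0.1, 0.2, 0.3],
    [1.0, 1.1, 1.2],
    [2.0, 2.1, 2.2],
    [3.0, 3.1, 3.2],
]


def one_hot(index, size):
    """Return a list of length `size` with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def matmul_row(vec, matrix):
    """Multiply a row vector by a matrix, as a bias-free linear layer would."""
    n_cols = len(matrix[0])
    return [
        sum(vec[r] * matrix[r][c] for r in range(len(matrix)))
        for c in range(n_cols)
    ]


def embedding_lookup(index, matrix):
    """Return a copy of the row at `index`, as an embedding layer would."""
    return list(matrix[index])


token_ids = [2, 0, 3]  # illustrative token IDs
via_lookup = [embedding_lookup(t, weights) for t in token_ids]
via_matmul = [matmul_row(one_hot(t, len(weights)), weights) for t in token_ids]

print(via_lookup == via_matmul)
```

The lookup avoids building the one-hot vectors and multiplying by zeros, which is why an embedding layer is the more efficient of the two formulations.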
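
To give a flavor of the "simple numbers rather than text" idea in 04_bonus_dataloader-intuition, here is a rough sketch that assumes the data loader slices a token sequence into fixed-length input windows whose targets are the same windows shifted by one position. The function name `sliding_windows` and the parameters `max_length` and `stride` are chosen for this illustration and may not match the notebook.

```python
def sliding_windows(token_ids, max_length, stride):
    """Split a sequence into (input, target) pairs.

    Each target is the input window shifted one position to the right,
    which is the next-token prediction setup assumed in this sketch.
    """
    pairs = []
    for start in range(0, len(token_ids) - max_length, stride):
        inputs = token_ids[start:start + max_length]
        targets = token_ids[start + 1:start + max_length + 1]
        pairs.append((inputs, targets))
    return pairs


# Plain integers stand in for token IDs so the windows are easy to read.
numbers = list(range(10))
pairs = sliding_windows(numbers, max_length=4, stride=4)

for inputs, targets in pairs:
    print(inputs, "->", targets)
```

With `stride` equal to `max_length` the input windows do not overlap; a smaller stride would produce overlapping windows and more training pairs from the same sequence.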
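
The core training loop of byte pair encoding, the subject of 05_bpe-from-scratch, repeatedly finds the most frequent adjacent pair of token IDs and replaces it with a new ID. The sketch below shows a single such step on raw bytes. It is a simplified illustration rather than the implementation in that folder: it skips GPT-2 specifics such as pre-tokenization, and the helper names and the new ID 256 are assumptions made for this example.

```python
from collections import Counter


def most_frequent_pair(ids):
    """Return the most common adjacent pair of IDs, or None if there is none."""
    counts = Counter(zip(ids, ids[1:]))
    return max(counts, key=counts.get) if counts else None


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    merged = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            merged.append(new_id)
            i += 2  # skip both members of the merged pair
        else:
            merged.append(ids[i])
            i += 1
    return merged


# Start from UTF-8 bytes, so the initial IDs fall in the range 0..255.
ids = list("aaabdaaabac".encode("utf-8"))
pair = most_frequent_pair(ids)
new_ids = merge_pair(ids, pair, new_id=256)  # 256: first ID past the byte range

print(pair, len(ids), len(new_ids))
```

Training a full vocabulary would repeat this step, recording each merge, until the desired vocabulary size is reached.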