rasbt-LLMs-from-scratch/ch05
Sebastian Raschka 4bfbcd069d
Auto download DPO dataset if not already available in path (#479)
* Auto download DPO dataset if not already available in path

* update tests to account for latest HF transformers release in unit tests

* pep 8
2025-01-12 12:27:28 -06:00
01_main-chapter-code Add backup URL for gpt2 weights (#469) 2025-01-05 11:28:09 -06:00
02_alternative_weight_loading fixed num_workers (#229) 2024-06-19 17:36:46 -05:00
03_bonus_pretraining_on_gutenberg Update README.md 2024-08-10 07:54:51 -05:00
04_learning_rate_schedulers Add and link bonus material (#84) 2024-03-23 07:27:43 -05:00
05_bonus_hparam_tuning total training iters may equal to warmup_iters (#301) 2024-08-06 07:10:05 -05:00
06_user_interface Add user interface to ch06 and ch07 (#366) 2024-09-21 20:33:00 -05:00
07_gpt_to_llama Auto download DPO dataset if not already available in path (#479) 2025-01-12 12:27:28 -06:00
08_memory_efficient_weight_loading update mmap section 2024-10-14 14:27:19 -05:00
README.md Memory efficient weight loading (#401) 2024-10-14 10:30:25 -05:00

Chapter 5: Pretraining on Unlabeled Data

Main Chapter Code

  • 01_main-chapter-code contains the main chapter code

Bonus Materials

  • 02_alternative_weight_loading contains code to load the GPT model weights from alternative sources in case the model weights become unavailable from OpenAI
  • 03_bonus_pretraining_on_gutenberg contains code to pretrain the LLM for longer on the whole corpus of books from Project Gutenberg
  • 04_learning_rate_schedulers contains code implementing a more sophisticated training function including learning rate schedulers and gradient clipping
  • 05_bonus_hparam_tuning contains an optional hyperparameter tuning script
  • 06_user_interface implements a user interface for interacting with the pretrained LLM
  • 07_gpt_to_llama contains a step-by-step guide for converting a GPT architecture implementation to Llama 3.2 and loading the pretrained weights from Meta AI
  • 08_memory_efficient_weight_loading contains a bonus notebook showing how to load model weights more efficiently via PyTorch's load_state_dict method
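The learning rate scheduling and gradient clipping mentioned for 04_learning_rate_schedulers can be sketched in plain Python to show the arithmetic involved. This is a minimal sketch, assuming a linear warmup followed by cosine decay and clipping by the global L2 norm of all gradients; the functions lr_at_step and clip_by_global_norm are hypothetical names chosen for illustration, not code from the bonus notebook, which operates on PyTorch tensors.

```python
import math


def lr_at_step(step, peak_lr, warmup_steps, total_steps, min_lr=0.0):
    """Assumed schedule: linear warmup, then cosine decay down to min_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warmup: ramps from peak_lr / warmup_steps up to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    decay_steps = total_steps - warmup_steps
    if decay_steps <= 0:
        # Edge case: total steps equal the warmup steps, so hold the peak rate
        return peak_lr
    # Fraction of the decay phase completed, capped at 1.0 past total_steps
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients so their combined L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        # Already within the limit: leave the gradients unchanged
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


if __name__ == "__main__":
    lrs = [lr_at_step(s, peak_lr=1e-3, warmup_steps=10, total_steps=100)
           for s in range(100)]
    print(f"first={lrs[0]:.2e}, peak={max(lrs):.2e}, last={lrs[-1]:.2e}")
    print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))
```

In a PyTorch training loop, the same roles are usually played by writing the scheduled value into each entry of the optimizer's `param_groups` (under the `"lr"` key) before `optimizer.step()`, and by calling `torch.nn.utils.clip_grad_norm_` after `loss.backward()`. Note the guard for the case where the total number of steps equals the warmup steps, which would otherwise divide by zero.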