mirror of https://github.com/rasbt/LLMs-from-scratch.git synced 2026-04-11 14:21:41 +08:00

Chapter 5: Pretraining on Unlabeled Data

Main Chapter Code

02_alternative_weight_loading contains code to load the GPT model weights from alternative sources in case the weights become unavailable from OpenAI
03_bonus_pretraining_on_gutenberg contains code to pretrain the LLM for longer on the entire corpus of books from Project Gutenberg
04_learning_rate_schedulers contains code implementing a more sophisticated training function including learning rate schedulers and gradient clipping
05_bonus_hparam_tuning contains an optional hyperparameter tuning script
06_user_interface implements an interactive user interface for the pretrained LLM
07_gpt_to_llama contains a step-by-step guide for converting a GPT architecture implementation to Llama 3.2 and loading pretrained weights from Meta AI
08_memory_efficient_weight_loading contains a bonus notebook showing how to load model weights more memory-efficiently via PyTorch's load_state_dict method
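
The learning rate scheduling and gradient clipping mentioned for 04_learning_rate_schedulers can be illustrated with a minimal, dependency-free sketch. It is not the code from that folder: the schedule shape (linear warmup followed by cosine decay), the function names `lr_at_step` and `clip_by_global_norm`, and all parameter names are assumptions chosen for illustration.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, peak_lr,
               initial_lr=0.0, min_lr=0.0):
    """Hypothetical schedule: linear warmup to peak_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Linear ramp from initial_lr up to peak_lr during warmup.
        return initial_lr + (peak_lr - initial_lr) * step / warmup_steps
    # Guard against total_steps == warmup_steps (avoids division by zero).
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine factor goes from 1 (start of decay) to 0 (end of training).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm or total_norm == 0.0:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


if __name__ == "__main__":
    for step in (0, 5, 10, 55, 100):
        print(step, lr_at_step(step, total_steps=100, warmup_steps=10, peak_lr=1e-3))
    print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))
```

In a PyTorch training loop, the computed value would typically be written into each optimizer parameter group's `"lr"` entry before the optimizer step, and `torch.nn.utils.clip_grad_norm_` performs the same global-norm rescaling on model parameters.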