rasbt-LLMs-from-scratch

mirror of https://github.com/rasbt/LLMs-from-scratch.git synced 2026-04-11 02:11:40 +08:00

casinca 9c4be478f8 Optional weight tying for Qwen3 and Llama3.2 pretraining (#949): optional weight tying for Qwen3 and Llama3.2; typo	2026-01-14 09:07:04 -06:00
__init__.py	Llama 3 KV Cache (#685)	2025-06-21 10:55:20 -05:00
generate.py	Add defensive context trimming for multiturn (#815)	2025-09-09 20:19:00 -05:00
gpt2.py	remove redundant next_cache (#817)	2025-09-11 15:16:08 -05:00
llama3.py	Optional weight tying for Qwen3 and Llama3.2 pretraining (#949)	2026-01-14 09:07:04 -06:00
qwen3.py	Improve MoE implementation (#841)	2025-09-22 15:21:06 -05:00
utils.py	Improve KV cache code for torch.compile (#705)	2025-06-23 18:08:49 -05:00