rasbt-LLMs-from-scratch/pkg/llms_from_scratch/kv_cache
casinca 9c4be478f8
Optional weight tying for Qwen3 and Llama3.2 pretraining (#949)
* optional weight tying for Qwen3 and Llama3.2

* typo
2026-01-14 09:07:04 -06:00
__init__.py  Llama 3 KV Cache (#685)                                          2025-06-21 10:55:20 -05:00
generate.py  Add defensive context trimming for multiturn (#815)              2025-09-09 20:19:00 -05:00
gpt2.py      remove redundant next_cache (#817)                               2025-09-11 15:16:08 -05:00
llama3.py    Optional weight tying for Qwen3 and Llama3.2 pretraining (#949)  2026-01-14 09:07:04 -06:00
qwen3.py     Improve MoE implementation (#841)                                2025-09-22 15:21:06 -05:00
utils.py     Improve KV cache code for torch.compile (#705)                   2025-06-23 18:08:49 -05:00