
Qwen3.5 0.8B From Scratch

This folder contains a from-scratch style implementation of Qwen/Qwen3.5-0.8B.

Qwen3.5 is based on the Qwen3-Next architecture, which I described in more detail in section 2, "(Linear) Attention Hybrids," of my "Beyond Standard LLMs" article.

Note that Qwen3.5 alternates `linear_attention` and `full_attention` layers.
The notebooks keep the full model flow readable while reusing the linear-attention building blocks from `qwen3_5_transformers.py`, which contains the linear-attention code from Hugging Face, licensed under the Apache License 2.0.
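To make the alternation concrete, here is a minimal sketch of how a hybrid decoder stack can be assembled from a per-layer type list. All names here (`layer_types`, the block classes, the 3:1 linear-to-full ratio borrowed from Qwen3-Next) are illustrative assumptions, not the actual `qwen3_5_transformers.py` API:

```python
# Hypothetical sketch: build a hybrid layer stack from a per-layer type list.
# The class names and the 3:1 ratio are illustrative, not the real model config.

class LinearAttentionBlock:
    """Stands in for a linear-attention (Qwen3-Next-style) layer."""

class FullAttentionBlock:
    """Stands in for a standard softmax-attention layer."""

BLOCKS = {
    "linear_attention": LinearAttentionBlock,
    "full_attention": FullAttentionBlock,
}

def build_layers(layer_types):
    """Instantiate one block per entry in layer_types, preserving order."""
    return [BLOCKS[t]() for t in layer_types]

# Example pattern: three linear-attention layers per full-attention layer,
# repeated twice (an assumed ratio, shown only to illustrate the alternation).
layer_types = (["linear_attention"] * 3 + ["full_attention"]) * 2
layers = build_layers(layer_types)
print([type(layer).__name__ for layer in layers])
```

Driving the stack from a flat `layer_types` list mirrors how Hugging Face configs typically encode hybrid architectures, which keeps the forward pass a simple loop over heterogeneous blocks.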


Files