Commit Graph

  • 332f26328d
    Merge b6fff728aa into 8447d70b18 Amit Choubey 2026-04-09 22:28:38 +0100
  • b6fff728aa Fix BPE empty-pair edge case in find_freq_pair AmitChaubey 2026-04-09 22:26:45 +0100
  • cb8fe8c5f6 Clarify GELU input example by using randn AmitChaubey 2026-04-08 17:29:45 +0100
  • 8447d70b18
    Some gemma 3 improvements (#1000) main Sebastian Raschka 2026-04-05 22:05:05 -0400
  • b1df951a71
    update url rasbt 2026-04-05 20:56:40 -0500
  • fc8455cda7
    some gemma 3 improvements rasbt 2026-04-05 20:46:14 -0500
  • afc6a3da07
    Remove 7 days requirement to improve windows compatibility (#999) Sebastian Raschka 2026-04-04 13:31:43 -0400
  • 0967695ce2
    Remove 7 days requirement to improve windows compatibility rasbt 2026-04-04 12:30:02 -0500
  • fd11713ed9
    exclude torchvission from nightly debug-exclude-newer rasbt 2026-04-04 12:17:50 -0500
  • 7489500d90
    Make change in a code file rasbt 2026-04-04 12:05:25 -0500
  • f1668af932
    uv packages must be 7 days old (#994) Sebastian Raschka 2026-03-31 20:10:15 -0400
  • cd10e149a2
    uv packages must be 7 days old rasbt 2026-03-31 17:30:19 -0500
  • d977841fad
    Swap urllib.request with requests (#993) Sebastian Raschka 2026-03-30 22:03:13 -0400
  • 0c5bcaf511
    Swap urllib.request with requests rasbt 2026-03-30 19:48:39 -0500
  • e32a2a72bd
    Add GitHub Actions workflow to sync fork vijaychandar186 2026-03-30 02:47:34 -0400
  • 771ec0f4c1
    Delete .github/workflows/sync.yml vijaychandar186 2026-03-30 02:46:07 -0400
  • b5ed58c2c5
    Merge pull request #9 from rasbt/main vijaychandar186 2026-03-30 02:44:23 -0400
  • 8f18ff256a
    Update sync.yml vijaychandar186 2026-03-30 02:41:05 -0400
  • c56d15e83d
    Update sync.yml to include permissions vijaychandar186 2026-03-30 02:38:47 -0400
  • b546483d7d Fix the ssl issue for MacOS M1 when downloading gpt model related resources. Roc Marshal 2026-03-29 19:32:20 +0800
  • 6b9502056f
    fix: pin 1 unpinned action(s) (#987) dagecko 2026-03-26 12:49:44 -0400
  • f0086f5fba fix: pin 1 unpinned action(s) Chris Nyhuis 2026-03-26 12:18:12 -0400
  • ab9eb68434
    Merge pull request #8 from rasbt/main github-actions[bot] 2026-03-21 04:22:20 +0000
  • 9320a5e252
    fix: added KVcache in generate_text_basic_stream (#981) casinca 2026-03-21 01:47:52 +0100
  • 3ce2916a14 fix: added KVcache in generate_text_basic_stream casinca 2026-03-17 15:00:19 +0100
  • 7f810ace0f
    Merge pull request #7 from rasbt/main github-actions[bot] 2026-03-08 04:27:57 +0000
  • a1463c3ea5 feat: finished adding torch comments ff 2026-03-08 11:57:47 +0800
  • 130cc1f63c
    harded the link checker rasbt 2026-03-07 17:05:41 -0600
  • 9ab6e894ac
    Minor typo fix (#974) Sebastian Raschka 2026-03-07 17:31:40 -0500
  • 052c2dea4f
    Bpe whitespace fixes (#975) Sebastian Raschka 2026-03-07 14:56:25 -0500
  • 60770c2b5f
    Bpe whitespace fixes rasbt 2026-03-07 13:46:45 -0600
  • a68e98caff
    Minor typo fix rasbt 2026-03-07 13:35:30 -0600
  • 27f39bcb3c
    Merge pull request #6 from rasbt/main github-actions[bot] 2026-03-05 04:28:50 +0000
  • 3a7b98a36a
    Add more analysis to qwen3.5 image Sebastian Raschka 2026-03-04 08:47:06 -0600
  • 8dc00b7299
    Merge pull request #5 from rasbt/main github-actions[bot] 2026-03-04 04:24:50 +0000
  • ae8eebf0d7
    Use full HF url Sebastian Raschka 2026-03-03 16:38:05 -0600
  • 7892ec9435
    Qwen3.5 from scratch (#969) Sebastian Raschka 2026-03-03 17:31:16 -0500
  • 8c0c0e3998
    update rasbt 2026-03-03 16:21:14 -0600
  • 497d5c50f1
    update rasbt 2026-03-03 16:15:18 -0600
  • 4188140465
    Qwen3.5 from scratch rasbt 2026-03-03 16:02:36 -0600
  • e8711b5af1
    Merge pull request #4 from rasbt/main github-actions[bot] 2026-03-02 04:30:47 +0000
  • 4612d20fa8
    User argpars utils to show default args on command line rasbt 2026-03-01 20:15:21 -0600
  • d838c1737b
    Merge pull request #3 from rasbt/main github-actions[bot] 2026-02-28 04:11:53 +0000
  • c079904491
    Jupyter scrolling glitch tips (#965) Sebastian Raschka 2026-02-27 18:33:39 -0500
  • 0bfefe642c
    Jupyter scrolling glitch tips rasbt 2026-02-27 17:19:18 -0600
  • fc833179cc
    Merge pull request #2 from rasbt/main github-actions[bot] 2026-02-20 04:30:46 +0000
  • ec78de32dc
    image size rasbt 2026-02-19 16:42:19 -0600
  • 10bffd62b7
    image size rasbt 2026-02-19 16:41:43 -0600
  • c745ded43d
    formatting fix rasbt 2026-02-19 16:40:28 -0600
  • 62f0356e0d
    Add Tiny Aya from scratch (#962) Sebastian Raschka 2026-02-19 17:33:22 -0500
  • 50be26871e
    Add Tiny Aya from scratch rasbt 2026-02-19 16:19:25 -0600
  • c195785faa
    Merge pull request #1 from rasbt/main github-actions[bot] 2026-02-19 04:46:51 +0000
  • 1ed48c2450
    remove redundant assignment (#961) Sebastian Raschka 2026-02-18 23:03:49 -0500
  • 28f4c831f3
    remove redundant assignment rasbt 2026-02-18 21:48:28 -0600
  • 2d600ccb5b
    Use correct input in layernorm example (#960) Sebastian Raschka 2026-02-18 22:35:57 -0500
  • f464faf609
    update rasbt 2026-02-18 21:26:01 -0600
  • 883f6d0702
    Use correct example in layernorm section rasbt 2026-02-18 21:24:20 -0600
  • 0383056139
    Add GitHub Actions workflow for syncing fork vijaychandar186 2026-02-18 13:41:25 -0500
  • 1ba7bd5c04
    Update CI rasbt 2026-01-27 15:26:02 -0600
  • be5e2a3331
    Readability and code quality improvements (#959) Sebastian Raschka 2026-02-17 19:44:56 -0500
  • 49610eb17a
    consistent section headers rasbt 2026-02-17 18:26:42 -0600
  • 9d7ca2c4ba
    Consistent dataset naming rasbt 2026-02-17 16:37:51 -0600
  • 318a42ef41 add training notebook Shahar Dickstein 2026-02-15 20:39:52 +0200
  • 6da0ba1e24 fix double printing Shahar Dickstein 2026-02-15 16:48:46 +0200
  • e720805d57 Allow safe imports (math, random, etc) in sandbox execution Shahar Dickstein 2026-02-15 16:46:08 +0200
  • f6b237af89 Add auto-run feature: detect function name and offer 'c' key to print output Shahar Dickstein 2026-02-15 16:24:33 +0200
  • 110878aac3 Update inference script to support cli args (model_size, max_length) Shahar Dickstein 2026-02-15 15:28:17 +0200
  • 2db8594df8 Fix positional embedding sizing mismatch when max_length < 1024 Shahar Dickstein 2026-02-15 15:06:28 +0200
  • 2ad253570b Add argparse for hyperparameters and support larger models Shahar Dickstein 2026-02-15 14:16:48 +0200
  • 4c5f9fadbb Optimize training: add gradient accumulation & increase batch size Shahar Dickstein 2026-02-15 14:10:16 +0200
  • 9e47e507dd Fix dataset_prep: add collation and padding for variable lengths Shahar Dickstein 2026-02-15 13:54:40 +0200
  • 4bd224ce1e Merge branch 'feature/tool-calling-experiment' of https://github.com/shaharprojs/LLMs-from-scratch-sd into feature/tool-calling-experiment Shahar Dickstein 2026-02-15 13:47:24 +0200
  • fa333cbfcc Fix config: set qkv_bias=True for pretrained checkpoint compatibility Shahar Dickstein 2026-02-15 13:46:29 +0200
  • b972161b79
    Merge branch 'rasbt:main' into feature/tool-calling-experiment shaharprojs 2026-02-15 13:24:17 +0200
  • 4af5986e8c updating train_colab Shahar Dickstein 2026-02-15 13:22:15 +0200
  • 1749b7fef7 Add tool calling experiment files Shahar Dickstein 2026-02-15 13:09:34 +0200
  • 7b1f740f74
    Fix flex attention in PyTorch 2.10 (#957) Sebastian Raschka 2026-02-09 15:12:40 -0500
  • e7ed5259ff
    Fix flex attention in PyTorch 2.10 rasbt 2026-02-09 13:51:32 -0600
  • 86b9818861
    Merge branch 'main' into download Tony 2026-02-01 12:55:31 -0500
  • 8f7fcfcc29 a Tony 2026-02-01 17:54:27 +0000
  • 4aa274bb96 a Tony 2026-02-01 15:18:21 +0000
  • 82010e2c77
    Fix docstring parameter names in compute_dpo_loss function (#953) Dawid Woźniak 2026-01-29 23:51:17 +0100
  • 41da1a4e47 Fix docstring parameter names in compute_dpo_loss function wozniakos10 2026-01-29 20:39:23 +0100
  • e155d1b02c
    Update unit tests for CI (#952) Sebastian Raschka 2026-01-27 17:44:55 -0600
  • ea8cf1fbc3
    update rasbt 2026-01-27 16:04:03 -0600
  • 341ff65beb
    update rasbt 2026-01-27 15:47:43 -0600
  • df5b809c7f
    Update rasbt 2026-01-27 15:42:42 -0600
  • 4cef442a3b
    Merge branch 'main' into transformer-5.0.0 Sebastian Raschka 2026-01-27 15:31:45 -0600
  • c172b838e8
    Revert submodule pointer update rasbt 2026-01-27 15:31:05 -0600
  • b0b48ff681
    Update CI rasbt 2026-01-27 15:26:02 -0600
  • 59d9262047
    chore: Update outdated GitHub Actions versions (#951) Pádraic Slattery 2026-01-19 19:22:29 +0100
  • efd64e061b chore: Update outdated GitHub Actions versions Padraic Slattery 2026-01-19 18:39:24 +0100
  • 47cfc61800
    link GRPO notebook (#950) Sebastian Raschka 2026-01-18 11:42:03 -0600
  • 0f6e9d9f22
    link GRPO notebook rasbt 2026-01-18 11:25:55 -0600
  • 9c4be478f8
    Optional weight tying for Qwen3 and Llama3.2 pretraining (#949) casinca 2026-01-14 16:07:04 +0100
  • 78f4492ab7 typo casinca 2026-01-14 09:59:05 +0100
  • 34891ed7c4 optional weight tying for Qwen3 and Llama3.2 casinca 2026-01-14 09:51:03 +0100
  • e0dbec3331
    Fix encoding of multiple preceding spaces in BPE tokenizer. (#945) Maxwell De Jong 2026-01-10 11:27:23 -0500
  • e81e3f2142
    Add test rasbt 2026-01-10 10:14:01 -0600
  • c5e6b543c5 Fix encoding of multiple preceding spaces in BPE tokenizer. MaxwellDeJong 2026-01-09 21:14:03 -0500