# Chapter 7: Finetuning to Follow Instructions

This folder contains utility code that can be used for model evaluation.


## Evaluating Instruction Responses Using the OpenAI API

- The `llm-instruction-eval-openai.ipynb` notebook uses OpenAI's GPT-4 to evaluate responses generated by instruction-finetuned models. It works with a JSON file where each entry has the following format:

```json
{
    "instruction": "What is the atomic number of helium?",
    "input": "",
    "output": "The atomic number of helium is 2.",               # <-- The target given in the test set
    "model 1 response": "\nThe atomic number of helium is 2.0.", # <-- Response by an LLM
    "model 2 response": "\nThe atomic number of helium is 3."    # <-- Response by a 2nd LLM
},
```


## Evaluating Instruction Responses Locally Using Ollama
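The `llm-instruction-eval-ollama.ipynb` notebook evaluates responses with a model served by a locally running Ollama instance, queried over its REST API via the `requests` library. The sketch below shows the general shape of such a query against Ollama's `/api/chat` endpoint; the helper names and the `"llama3"` model name are illustrative assumptions (the model must already be pulled, e.g. via `ollama pull llama3`, and `ollama serve` must be running for the request to succeed).

```python
import requests


def build_chat_payload(prompt, model="llama3"):
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one complete JSON response instead of a stream
        "options": {"seed": 123, "temperature": 0},  # make scoring repeatable
    }


def query_ollama(prompt, model="llama3", url="http://localhost:11434/api/chat"):
    """POST a prompt to a local Ollama server and return the reply text."""
    response = requests.post(url, json=build_chat_payload(prompt, model), timeout=120)
    response.raise_for_status()
    return response.json()["message"]["content"]
```

Setting a fixed seed and zero temperature makes the judge's scores deterministic, which helps when comparing two models on the same test set.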