✨ Evaluating Large Language Models with Educational Knowledge Graphs: Challenges with Prerequisite Relationships and Multi-Hop Reasoning ✨


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Project Website

This repo maintains and updates a benchmark for evaluating LLMs with Educational KGs, with a focus on prerequisite relationships. 😄

Installation

Download the whole repository, or clone it:

$> git clone https://github.com/ai-for-edu/Evaluating-Large-Language-Models-with-Educational-Knowledge-Graphs-on-Prerequisite-Relationships

How to benchmark

After cloning or downloading the repository, change into the /benchmark/ folder:

$> cd benchmark/

1. Install requirements

$> pip install -r requirements.txt

2. Generate question queries

To generate questions on all of the tasks and on all of the datasets:

$> python generate_question_query.py

Feel free to play around with the code to customize the query generation.

3. Set API connection

Please fill in 'API_KEY' in /benchmark/Edu_KG_Eval/global_config.py. In addition, modify the following connection details in the function generate_answer of the class ApiFoxAnswer:

  • HTTPS path in 'conn = http.client.HTTPSConnection()'
  • 'User-Agent' in dictionary 'headers'
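The two details above can be sketched as follows. This is a hypothetical illustration, not the repo's actual generate_answer implementation: the host name, endpoint path, model name, and User-Agent string below are all placeholders you would replace with your provider's values.

```python
import http.client
import json


def build_headers(api_key: str) -> dict:
    """Request headers for the LLM API; 'User-Agent' is a placeholder."""
    return {
        "Authorization": f"Bearer {api_key}",
        "User-Agent": "my-benchmark-client/1.0",  # placeholder: fill in yours
        "Content-Type": "application/json",
    }


def generate_answer(prompt: str, api_key: str) -> str:
    # Placeholder host: fill in the HTTPS path of your API provider here
    conn = http.client.HTTPSConnection("api.example.com")
    payload = json.dumps({
        "model": "example-model",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
    })
    conn.request("POST", "/v1/chat/completions", payload, build_headers(api_key))
    resp = conn.getresponse()
    return json.loads(resp.read())["choices"][0]["message"]["content"]
```

The response parsing assumes an OpenAI-style chat-completions payload; adjust it to whatever JSON shape your endpoint returns.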

4. Get answers from LLMs

To get the answers on all of the queries generated in the last step:

$> python obtain_llm_answers.py

5. Evaluate LLM answers

As this step may require manual checking, we provide methods that may be helpful for calculating accuracy, precision, recall, AUROC, and AUPRC in the script 'auto_eval_test.py'.
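As an illustration of the metrics involved (this is a sketch, not the repo's auto_eval_test.py), the five scores can be computed with scikit-learn once the LLM answers have been parsed into binary predictions and confidence scores:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             roc_auc_score, average_precision_score)


def evaluate(y_true, y_pred, scores):
    """Score a binary prerequisite-prediction task.

    y_true: gold labels (1 = prerequisite holds), y_pred: parsed LLM answers,
    scores: model confidences in [0, 1] used for the ranking metrics.
    """
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "auroc": roc_auc_score(y_true, scores),
        "auprc": average_precision_score(y_true, scores),
    }


# Toy example: four prerequisite judgments
metrics = evaluate([1, 0, 1, 0], [1, 0, 0, 0], [0.9, 0.2, 0.4, 0.1])
```

AUROC and AUPRC need confidence scores rather than hard labels, so they only apply when the LLM answers can be mapped to a graded score.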

Dataset

The KGs with KCs and prerequisite relationships are in the /data folder, with each subfolder holding one GraphML file for one KG. You can also download all of them at once from the /data/wrapup/ folder, which contains all GraphML files and corresponding JSON files.

The Croissant Metadata is at Link to File.

A duplicate of the GraphML dataset can also be found on Hugging Face: Link to Data.
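GraphML files like these can be loaded with networkx. A minimal sketch, using hypothetical KC names (round-tripped through an in-memory buffer here so it is self-contained; nx.read_graphml works the same way on any file under /data/):

```python
import io

import networkx as nx

# Build a tiny prerequisite KG: each node is a knowledge component (KC),
# and a directed edge A -> B means A is a prerequisite of B.
kg = nx.DiGraph()
kg.add_edge("Arithmetic", "Algebra")
kg.add_edge("Algebra", "Calculus")

# Round-trip through GraphML, the format the /data files are stored in.
buf = io.BytesIO()
nx.write_graphml(kg, buf)
buf.seek(0)
loaded = nx.read_graphml(buf)  # the same call accepts a .graphml file path
```

Once loaded, standard networkx traversals (e.g. nx.descendants for multi-hop prerequisite chains) apply directly.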

Citation

TBA

Contact

Authors:

Aoran Wang: aoran.wang@uni.lu, Chaoli Zhang: chaolizcl@zjnu.edu.cn, Jun Pang: jun.pang@uni.lu, Qingsong Wen: qingsongedu@gmail.com
