See the demo here.
This application is follow-up work to my master's thesis at UCLouvain:
"Automatic Evaluation of the Pedagogical Effectiveness of Open-Domain Chatbots in a Language Learning Game."
The repository can be cloned with `git clone git@github.com:C-bianc/ChatEvaluationDemo.git`.
The model is not publicly shared yet, so you will not be able to run the app unless the model is provided to you.
- Uses a unified model to evaluate user input and generate responses
- Computes the SEED (Scoring Educational Effectiveness of Dialogue) metric to evaluate conversational effectiveness
- Adapts the bot's responses according to these evaluations
- The SEED metric evaluates bot responses based on the communicative intent, output elicitation, and interactional support (helpfulness) labels predicted by our evaluator
- The unified model is a multi-head BERT for sequence classification (see the sketch after this list)
- In addition to SEED, we implemented adaptive rules triggered by our evaluations to refine the bot's responses
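As a reference for the evaluator architecture, here is a minimal sketch of a multi-head BERT for sequence classification: one shared encoder with a separate classification head per evaluation dimension. The class name, head names, and label counts (`n_intent`, `n_output`, `n_support`) are illustrative assumptions, not the exact implementation used in the thesis.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class MultiHeadBertEvaluator(nn.Module):
    """Shared BERT encoder with one classification head per evaluation dimension.

    Head names and label counts are illustrative assumptions.
    """

    def __init__(self, model_name="bert-base-uncased",
                 n_intent=5, n_output=2, n_support=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.intent_head = nn.Linear(hidden, n_intent)    # communicative intent
        self.output_head = nn.Linear(hidden, n_output)    # output elicitation
        self.support_head = nn.Linear(hidden, n_support)  # interactional support

    def forward(self, input_ids, attention_mask):
        # Use the [CLS] token representation as the sequence embedding.
        encoded = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = encoded.last_hidden_state[:, 0]
        return {
            "intent": self.intent_head(cls),
            "output": self.output_head(cls),
            "support": self.support_head(cls),
        }


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = MultiHeadBertEvaluator()
    batch = tokenizer(["What did you do last weekend?"],
                      return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        logits = model(batch["input_ids"], batch["attention_mask"])
    labels = {dim: out.argmax(dim=-1).item() for dim, out in logits.items()}
    print(labels)  # one predicted label per evaluation dimension
```

The per-head predictions are what would then feed the SEED computation and the adaptive response rules.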
Our research benchmarked several transformer-based models. The BERT model achieved the highest overall performance, outperforming all other architectures across the three evaluation dimensions:
| Model | Communicative Intent (F1) | Output Elicitation (F1) | Interactional Support (F1) | Overall (F1) |
|---|---|---|---|---|
| BERT | 0.839 | 0.977 | 0.815 | 0.877 |
| RoBERTa | 0.807 | 0.961 | 0.810 | 0.860 |
| DistilBERT | 0.791 | 0.962 | 0.801 | 0.851 |
| mBERT | 0.790 | 0.966 | 0.781 | 0.846 |
| DistilmBERT | 0.784 | 0.964 | 0.778 | 0.842 |
BERT stood out particularly on the Interactional Support dimension, where it achieved the highest F1 score and was the only model to exceed 80% recall. While RoBERTa ranked second overall, it showed greater variation across tasks.
The distilled models performed well on Output Elicitation, where all architectures scored highly across metrics with only minor differences. Performance on Communicative Intent and Interactional Support varied more, especially in recall.
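For completeness, here is a small sketch of how scores like those in the table could be computed, assuming macro-averaged F1 per dimension; the gold and predicted labels below are hypothetical placeholders. The Overall column above corresponds to the unweighted mean of the three per-dimension F1 scores.

```python
from sklearn.metrics import f1_score

# Hypothetical gold and predicted labels for each evaluation dimension.
y_true = {
    "intent":  [0, 1, 2, 1, 0, 2],
    "output":  [1, 0, 1, 1, 0, 1],
    "support": [2, 2, 0, 1, 1, 0],
}
y_pred = {
    "intent":  [0, 1, 2, 0, 0, 2],
    "output":  [1, 0, 1, 1, 1, 1],
    "support": [2, 1, 0, 1, 1, 0],
}

# Macro-averaged F1 per dimension (assumption: macro averaging is used).
per_dim = {dim: f1_score(y_true[dim], y_pred[dim], average="macro") for dim in y_true}

# Overall score as the unweighted mean of the three per-dimension F1 scores,
# which matches how the Overall column relates to the other columns in the table.
overall = sum(per_dim.values()) / len(per_dim)
print(per_dim, overall)
```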
Bianca Ciobanica, 2025