Pip compatible CodeBLEU metric implementation available for linux/macos/win
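A minimal usage sketch, assuming this is the `codebleu` package on PyPI (the repo's actual API may differ): compute CodeBLEU between a reference and a candidate Python snippet.

```python
from codebleu import calc_codebleu  # assumption: pip install codebleu

# Compare a candidate implementation against a reference; CodeBLEU blends
# n-gram, weighted n-gram, AST, and data-flow match scores.
result = calc_codebleu(
    references=["def add(a, b):\n    return a + b"],
    predictions=["def add(a, b):\n    return b + a"],
    lang="python",
    weights=(0.25, 0.25, 0.25, 0.25),  # one weight per component score
)
print(result["codebleu"])
```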
Industrial-level evaluation benchmarks for coding LLMs across the full life cycle of AI-native software development (an enterprise-grade code LLM evaluation suite, with more benchmarks being released on an ongoing basis)
Backend for automated evaluation of programming tasks in higher education
The SF Code Evaluator
An open-source Python library for code encryption, decryption, and safe evaluation built on Python's ast module, with support for allowlisted functions, variables, and imports, execution timeouts, and blocked attribute access.
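A minimal sketch of the technique this description names (allowlist-based safe evaluation via the ast module); the names `ALLOWED_NODES`, `ALLOWED_NAMES`, and `safe_eval` are illustrative, not this library's API, and timeouts are omitted.

```python
import ast

# Only these node types may appear in the parsed expression; anything
# else (e.g. ast.Attribute, so attribute access is blocked) is rejected.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Call,
    ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
)
# Names resolve only against this allowlist, never real builtins.
ALLOWED_NAMES = {"min": min, "max": max, "abs": abs}

def safe_eval(expr: str):
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            raise ValueError(f"disallowed name: {node.id}")
    # Empty __builtins__ prevents reaching eval, __import__, etc.
    return eval(compile(tree, "<safe>", "eval"), {"__builtins__": {}}, ALLOWED_NAMES)

print(safe_eval("max(1, 2) + abs(-3)"))  # 5
```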
Gowlin: an open-source, secure autograder for LLM agent development and evaluation
Python library to interact synchronously and asynchronously with tio.run
SocratiQ AI uses the Socratic method of teaching to guide users through learning, asking questions that prompt critical thinking and problem-solving rather than providing direct answers.
Artha is a code evaluation system developed with Django and Django REST Framework that uses Judge0 as the code execution engine.
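For context, a hedged sketch of how a Judge0-backed evaluator submits code using the public Judge0 CE REST API; Artha's own endpoints, models, and authentication are not shown, and the instance URL below is an assumption.

```python
import requests

JUDGE0_URL = "https://ce.judge0.com"  # assumption: a public or self-hosted instance

payload = {
    "source_code": "print(sum(map(int, input().split())))",
    "language_id": 71,  # Python 3 in Judge0 CE's language table
    "stdin": "2 3",
}

# wait=true blocks until the sandboxed run finishes and returns the result
resp = requests.post(
    f"{JUDGE0_URL}/submissions?base64_encoded=false&wait=true",
    json=payload,
    timeout=30,
)
result = resp.json()
print(result.get("stdout"), result.get("status"))
```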
Python toolkit for automated evaluation and benchmarking of code efficiency, performance, and resource usage. Easily analyze, compare, and score scripts or code snippets in a fast, modular CLI workflow.
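A generic sketch of the kind of measurement such a toolkit performs, timing via timeit and peak memory via tracemalloc; the `benchmark` helper is hypothetical, not this project's CLI.

```python
import timeit
import tracemalloc

def benchmark(snippet: str, setup: str = "pass", runs: int = 1000) -> dict:
    # Average wall-clock time over `runs` executions.
    elapsed = timeit.timeit(snippet, setup=setup, number=runs)
    # Peak memory allocated during a single execution, with the setup
    # code run first in a shared namespace.
    ns: dict = {}
    exec(setup, ns)
    tracemalloc.start()
    exec(snippet, ns)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"avg_seconds": elapsed / runs, "peak_bytes": peak}

print(benchmark("sorted(range(1000), reverse=True)"))
```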