- CyberPlaybookLLM is a large language model (LLM) fine-tuned on a dataset of "cyber playbooks."
- By "cyber playbooks," we refer to structured documents that outline the steps and procedures to be followed in response to specific cybersecurity incidents or scenarios.
- By combining the capabilities of LLMs with the structured approach of playbooks, it aims to enhance the efficiency and effectiveness of incident response and security operations.
- It is designed to assist cybersecurity professionals in security operations centers (SOCs) in creating, managing, and executing playbooks for various security incidents and scenarios.
- Currently, CACAO and MITRE ATT&CK are the main sources of playbooks and incidents (an abridged CACAO-style example is sketched after this list):
- CACAO
- MITRE ATT&CK
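To make the notion of a structured playbook concrete, the snippet below sketches an abridged CACAO-style playbook as a Python dictionary. It is an illustration only: the field names follow the CACAO v2.0 specification as we understand it, the identifiers and commands are made up, and the object is not a complete or validated CACAO document.

```python
# Abridged, illustrative CACAO-style playbook represented as a Python dict.
# Identifiers and the command are made up; consult the CACAO specification
# for the full set of required properties.
example_playbook = {
    "type": "playbook",
    "spec_version": "cacao-2.0",
    "id": "playbook--11111111-1111-4111-8111-111111111111",
    "name": "Contain suspicious lateral movement",
    "description": "Isolate the affected host and collect volatile evidence.",
    "workflow_start": "start--aaaaaaaa-aaaa-4aaa-8aaa-aaaaaaaaaaaa",
    "workflow": {
        "start--aaaaaaaa-aaaa-4aaa-8aaa-aaaaaaaaaaaa": {
            "type": "start",
            "on_completion": "action--bbbbbbbb-bbbb-4bbb-8bbb-bbbbbbbbbbbb",
        },
        "action--bbbbbbbb-bbbb-4bbb-8bbb-bbbbbbbbbbbb": {
            "type": "action",
            "name": "Isolate host from the network",
            "commands": [{"type": "bash", "command": "sudo iptables -A OUTPUT -j DROP"}],
            "on_completion": "end--cccccccc-cccc-4ccc-8ccc-cccccccccccc",
        },
        "end--cccccccc-cccc-4ccc-8ccc-cccccccccccc": {"type": "end"},
    },
}
```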
- Mitigation planning: Generate detailed mitigation plans for specific cybersecurity incidents.
- Playbook Generation: Generate playbooks for specific cybersecurity incidents or scenarios.
- Synthetic data generation: Generate synthetic data for training and testing purposes using the scripts in the `Dataset` folder (an illustrative sketch follows this list).
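As an illustration of how such synthetic samples can be produced, the sketch below prompts a GPT model through the OpenAI Python SDK to draft a structured sample for a given MITRE ATT&CK technique. This is not the repository's generation script; the prompt wording, model name, and output handling are assumptions.

```python
# Illustrative synthetic-sample generation (not the repo's script).
# Assumes the OpenAI Python SDK (>=1.0) and an OPENAI_API_KEY in the environment.
import json
from openai import OpenAI

client = OpenAI()

def generate_sample(technique_id: str, technique_name: str) -> dict:
    """Ask a GPT model to draft a structured mitigation/playbook sample for one technique."""
    prompt = (
        f"You are a SOC analyst. For MITRE ATT&CK technique {technique_id} "
        f"({technique_name}), produce a JSON object with fields 'incident', "
        f"'mitigations' (list of steps), and 'playbook_outline'."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; swap for whichever GPT model you use
        messages=[{"role": "user", "content": prompt}],
    )
    return {"technique": technique_id, "raw_output": response.choices[0].message.content}

if __name__ == "__main__":
    sample = generate_sample("T1021", "Remote Services")
    print(json.dumps(sample, indent=2))
```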
- The dataset used for fine-tuning the model is located in the `Dataset` folder. It contains playbooks and incident data in a structured format.
- The subfolder `Dataset/Samples` contains samples generated by different (GPT-based) models and ground truth data from Atomic Red Team (https://github.com/redcanaryco/atomic-red-team).
- `Dataset/MITRE_utils.py` contains functions to extract data from the MITRE ATT&CK framework and convert it into a structured format suitable for training the model, including optional filtering. Note that, as described in the paper, we filtered for incidents related to distributed systems.
- The main incidents dataset is located in `Dataset/Main`. It contains:
  - Aggregated ground-truth samples from Atomic Red Team (`dataset_humain_in_the_loop.json`)
  - Merged GPT-generated samples (`dataset_synthetic.json`)
  - The aggregated, final dataset used for training and evaluation (`dataset_merged.json`)
- The script `Dataset/dataset_analyzer.py` can be used to analyze the dataset and generate statistics, merge samples, and filter them based on specific criteria (a simplified example follows this list).
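As a rough illustration of the kind of analysis `dataset_analyzer.py` performs, the sketch below loads the merged dataset and prints simple statistics. The record layout (a list of objects with a `mitigations` field) is an assumption for illustration; see the actual files in `Dataset/Main` for the real schema.

```python
# Illustrative dataset statistics (not the repo's dataset_analyzer.py).
# The record layout below is an assumption; check Dataset/Main for the real schema.
import json
from collections import Counter
from pathlib import Path

def summarize(path: Path) -> None:
    records = json.loads(path.read_text(encoding="utf-8"))
    print(f"{path.name}: {len(records)} records")

    # Count how many mitigations each record carries (assumed 'mitigations' field).
    mitigation_counts = Counter(len(r.get("mitigations", [])) for r in records)
    for n_mitigations, n_records in sorted(mitigation_counts.items()):
        print(f"  {n_records} records with {n_mitigations} mitigations")

if __name__ == "__main__":
    summarize(Path("Dataset/Main/dataset_merged.json"))
```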
```
project/
│
├── model.py            # Model + LoRA + optimizer setup
├── data.py             # Dataset + dataloaders
├── train_low_level.py  # Training loop only
├── TrainingState.py    # State handling
└── utils/              # Logging, checkpoint, metrics
```
- The fine-tuning scripts and configurations are located in the `Training` folder. The training process involves fine-tuning the LLaMA model on the playbook dataset using PyTorch and Hugging Face Transformers.
- Training uses the supervised fine-tuning (SFT) module from Hugging Face on the dataset described above.
- The training process is divided into three stages:
- Mitigation training: The model is first trained on a dataset of mitigations, which are specific actions or steps taken to address security incidents.
- Playbook training: The model is then trained on a dataset of playbooks, which are structured documents outlining the steps to be followed in response to specific incidents.
- Incident training: Finally, the model is trained on a dataset of incidents, which are real-world cybersecurity events and scenarios.
- The main script for training is `train_low_level.py`, which handles the training loop and model updates. The script uses the `TrainingState` class to manage the training state, including loading and saving checkpoints, logging metrics, and handling early stopping.
- The curriculum learning approach is implemented in the `TrainingState` class, which manages the training process and allows for easy switching between the training stages. The class also handles loading and saving of model checkpoints, logging of training metrics, and early stopping based on validation performance.
- Use `curriculum_trainer.py` to train the model with curriculum learning. The script takes care of loading the dataset, initializing the model, and managing the training process (an illustrative sketch follows).
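For orientation, the sketch below shows how a three-stage curriculum (mitigations, then playbooks, then incidents) could be run with Hugging Face TRL's `SFTTrainer` and a PEFT LoRA configuration. It is a simplified stand-in for `curriculum_trainer.py` and `train_low_level.py`: the stage file names, base model identifier, hyperparameters, and the default `text` field are assumptions, and the real scripts manage stage transitions through the `TrainingState` class.

```python
# Simplified curriculum SFT sketch (not the repo's curriculum_trainer.py).
# Assumes recent versions of trl, peft, and datasets, and that each stage file
# holds records with a "text" field already formatted for supervised fine-tuning.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

STAGES = [  # curriculum order: narrower tasks first
    ("mitigation", "Dataset/Main/stage_mitigations.json"),  # assumed file name
    ("playbook", "Dataset/Main/stage_playbooks.json"),      # assumed file name
    ("incident", "Dataset/Main/stage_incidents.json"),      # assumed file name
]

lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

for stage_name, stage_file in STAGES:
    dataset = load_dataset("json", data_files=stage_file, split="train")
    args = SFTConfig(
        output_dir=f"checkpoints/{stage_name}",
        num_train_epochs=1,
        per_device_train_batch_size=2,
    )
    trainer = SFTTrainer(
        model="meta-llama/Llama-2-7b-hf",  # assumed base model; the real loop carries
        train_dataset=dataset,             # weights and optimizer state across stages
        args=args,                         # via the TrainingState class
        peft_config=lora_config,
    )
    trainer.train()
```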
- The evaluation scripts are located in the `Evaluation` folder. The evaluation process involves testing the fine-tuned model on a separate test set of playbooks and incidents to assess its performance and accuracy.
- The technical paper describes the custom metrics we adopted to evaluate the model's performance.
- In principle, we evaluate how the model performs in terms of:
- Playbook generation: The model's ability to generate accurate and relevant playbooks for specific incidents (recall of mitigations, nodes, and edges; see the sketch after this list).
- Mitigation planning: The model's ability to generate detailed and actionable mitigation plans for specific incidents.
- Synthetic data generation: The model's ability to generate realistic and useful synthetic data for training and testing purposes.
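To illustrate the recall-style metrics, the sketch below compares the nodes and edges of a generated playbook graph against ground truth. The graph representation (sets of node labels and directed edges) is a simplification for illustration; the exact metric definitions are in the technical paper and the `Evaluation` scripts.

```python
# Simplified recall computation over playbook graphs (illustration only;
# the exact metric definitions live in the paper and the Evaluation folder).
def recall(predicted: set, reference: set) -> float:
    """Fraction of reference items that also appear in the prediction."""
    return len(predicted & reference) / len(reference) if reference else 1.0

# Hypothetical playbooks reduced to node labels and directed edges.
reference_nodes = {"isolate_host", "collect_logs", "reset_credentials"}
reference_edges = {("isolate_host", "collect_logs"), ("collect_logs", "reset_credentials")}

generated_nodes = {"isolate_host", "collect_logs"}
generated_edges = {("isolate_host", "collect_logs")}

print("node recall:", recall(generated_nodes, reference_nodes))  # 2/3
print("edge recall:", recall(generated_edges, reference_edges))  # 1/2
```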
- Currently, some tests are provided in the `Test` folder.
- Run them with `pytest tests/test_training_state.py`.
- Use a CACAO-compatible SOAR interface to execute playbooks for faster adoption and integration.
- Playbook Execution: Execute playbooks and provide step-by-step guidance for incident response.