- CyberPlaybookLLM is a large language model (LLM) fine-tuned on a dataset of "cyber playbooks."
- By "cyber playbooks," we refer to structured documents that outline the steps and procedures to be followed in response to specific cybersecurity incidents or scenarios.
- By combining the capabilities of LLMs with the structured approach of playbooks, it aims to enhance the efficiency and effectiveness of incident response and security operations.
- It is designed to assist cybersecurity professionals in security operations centers (SOCs) in creating, managing, and executing playbooks for various security incidents and scenarios.
- Currently, CACAO and MITRE ATT&CK are the main sources of playbooks and incidents (an abridged CACAO-style example is sketched after this list):
- CACAO
- MITRE ATT&CK
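To make the notion of a structured playbook concrete, the snippet below sketches an abridged CACAO-style playbook as a Python dictionary. It is an illustration only: the field names follow the CACAO v2.0 specification as we understand it, the identifiers and commands are made up, and the object is not a complete or validated CACAO document.

```python
# Abridged, illustrative CACAO-style playbook represented as a Python dict.
# Identifiers and the command are made up; consult the CACAO specification
# for the full set of required properties.
example_playbook = {
    "type": "playbook",
    "spec_version": "cacao-2.0",
    "id": "playbook--11111111-1111-4111-8111-111111111111",
    "name": "Contain suspicious lateral movement",
    "description": "Isolate the affected host and collect volatile evidence.",
    "workflow_start": "start--aaaaaaaa-aaaa-4aaa-8aaa-aaaaaaaaaaaa",
    "workflow": {
        "start--aaaaaaaa-aaaa-4aaa-8aaa-aaaaaaaaaaaa": {
            "type": "start",
            "on_completion": "action--bbbbbbbb-bbbb-4bbb-8bbb-bbbbbbbbbbbb",
        },
        "action--bbbbbbbb-bbbb-4bbb-8bbb-bbbbbbbbbbbb": {
            "type": "action",
            "name": "Isolate host from the network",
            "commands": [{"type": "bash", "command": "sudo iptables -A OUTPUT -j DROP"}],
            "on_completion": "end--cccccccc-cccc-4ccc-8ccc-cccccccccccc",
        },
        "end--cccccccc-cccc-4ccc-8ccc-cccccccccccc": {"type": "end"},
    },
}
```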
- Mitigation planning: Generate detailed mitigation plans for specific cybersecurity incidents.
- Playbook Generation: Generate playbooks for specific cybersecurity incidents or scenarios.
- Synthetic data generation: Generate synthetic data for training and testing purposes using the scripts in the `Dataset` folder (an illustrative sketch follows this list).
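As an illustration of how such synthetic samples can be produced, the sketch below prompts a GPT model through the OpenAI Python SDK to draft a structured sample for a given MITRE ATT&CK technique. This is not the repository's generation script; the prompt wording, model name, and output handling are assumptions.

```python
# Illustrative synthetic-sample generation (not the repo's script).
# Assumes the OpenAI Python SDK (>=1.0) and an OPENAI_API_KEY in the environment.
import json
from openai import OpenAI

client = OpenAI()

def generate_sample(technique_id: str, technique_name: str) -> dict:
    """Ask a GPT model to draft a structured mitigation/playbook sample for one technique."""
    prompt = (
        f"You are a SOC analyst. For MITRE ATT&CK technique {technique_id} "
        f"({technique_name}), produce a JSON object with fields 'incident', "
        f"'mitigations' (list of steps), and 'playbook_outline'."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; swap for whichever GPT model you use
        messages=[{"role": "user", "content": prompt}],
    )
    return {"technique": technique_id, "raw_output": response.choices[0].message.content}

if __name__ == "__main__":
    sample = generate_sample("T1021", "Remote Services")
    print(json.dumps(sample, indent=2))
```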
- The dataset used for fine-tuning the model is located in the `Dataset` folder. It contains playbooks and incident data in a structured format.
- The subfolder `Dataset/Samples` contains samples generated by different (GPT-based) models and ground truth data from Atomic Red Team (https://github.com/redcanaryco/atomic-red-team).
- `Dataset/MITRE_utils.py` contains functions to extract data from the MITRE ATT&CK framework and convert it into a structured format suitable for training the model, including optional filtering. Note that, as described in the paper, we filtered for incidents related to distributed systems.
- The main incidents dataset is located in `Dataset/Main`. It contains:
  - Aggregated ground-truth samples from Atomic Red Team (`dataset_humain_in_the_loop.json`)
  - Merged GPT-generated samples (`dataset_synthetic.json`)
  - The aggregated, final dataset used for training and evaluation (`dataset_merged.json`)
- The script `Dataset/dataset_analyzer.py` can be used to analyze the dataset and generate statistics, merge samples, and filter them based on specific criteria (a simplified example follows this list).
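As a rough illustration of the kind of analysis `dataset_analyzer.py` performs, the sketch below loads the merged dataset and prints simple statistics. The record layout (a list of objects with a `mitigations` field) is an assumption for illustration; see the actual files in `Dataset/Main` for the real schema.

```python
# Illustrative dataset statistics (not the repo's dataset_analyzer.py).
# The record layout below is an assumption; check Dataset/Main for the real schema.
import json
from collections import Counter
from pathlib import Path

def summarize(path: Path) -> None:
    records = json.loads(path.read_text(encoding="utf-8"))
    print(f"{path.name}: {len(records)} records")

    # Count how many mitigations each record carries (assumed 'mitigations' field).
    mitigation_counts = Counter(len(r.get("mitigations", [])) for r in records)
    for n_mitigations, n_records in sorted(mitigation_counts.items()):
        print(f"  {n_records} records with {n_mitigations} mitigations")

if __name__ == "__main__":
    summarize(Path("Dataset/Main/dataset_merged.json"))
```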
```
project/
│
├── model.py            # Model + LoRA + optimizer setup
├── data.py             # Dataset + dataloaders
├── train_low_level.py  # Training loop only
├── TrainingState.py    # State handling
└── utils/              # Logging, checkpoint, metrics
```
- The fine-tuning scripts and configurations are located in the `Training` folder. The training process involves fine-tuning the LLaMA model on the playbook dataset using PyTorch and Hugging Face Transformers.
- Training uses the supervised fine-tuning (SFT) module from Hugging Face on the dataset described above.
- The training process is divided into three stages:
- Mitigation training: The model is first trained on a dataset of mitigations, which are specific actions or steps taken to address security incidents.
- Playbook training: The model is then trained on a dataset of playbooks, which are structured documents outlining the steps to be followed in response to specific incidents.
- Incident training: Finally, the model is trained on a dataset of incidents, which are real-world cybersecurity events and scenarios.
- The main script for training is `train_low_level.py`, which handles the training loop and model updates. The script uses the `TrainingState` class to manage the training state, including loading and saving checkpoints, logging metrics, and handling early stopping.
- The curriculum learning approach is implemented in the `TrainingState` class, which manages the training process and allows for easy switching between the training stages. The class also handles loading and saving of model checkpoints, logging of training metrics, and early stopping based on validation performance.
- Use `curriculum_trainer.py` to train the model with curriculum learning. The script takes care of loading the dataset, initializing the model, and managing the training process (an illustrative sketch follows).
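For orientation, the sketch below shows how a three-stage curriculum (mitigations, then playbooks, then incidents) could be run with Hugging Face TRL's `SFTTrainer` and a PEFT LoRA configuration. It is a simplified stand-in for `curriculum_trainer.py` and `train_low_level.py`: the stage file names, base model identifier, hyperparameters, and the default `text` field are assumptions, and the real scripts manage stage transitions through the `TrainingState` class.

```python
# Simplified curriculum SFT sketch (not the repo's curriculum_trainer.py).
# Assumes recent versions of trl, peft, and datasets, and that each stage file
# holds records with a "text" field already formatted for supervised fine-tuning.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

STAGES = [  # curriculum order: narrower tasks first
    ("mitigation", "Dataset/Main/stage_mitigations.json"),  # assumed file name
    ("playbook", "Dataset/Main/stage_playbooks.json"),      # assumed file name
    ("incident", "Dataset/Main/stage_incidents.json"),      # assumed file name
]

lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

for stage_name, stage_file in STAGES:
    dataset = load_dataset("json", data_files=stage_file, split="train")
    args = SFTConfig(
        output_dir=f"checkpoints/{stage_name}",
        num_train_epochs=1,
        per_device_train_batch_size=2,
    )
    trainer = SFTTrainer(
        model="meta-llama/Llama-2-7b-hf",  # assumed base model; the real loop carries
        train_dataset=dataset,             # weights and optimizer state across stages
        args=args,                         # via the TrainingState class
        peft_config=lora_config,
    )
    trainer.train()
```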
- The evaluation scripts are located in the `Evaluation` folder. The evaluation process involves testing the fine-tuned model on a separate test set of playbooks and incidents to assess its performance and accuracy.
- The technical paper describes the custom metrics we adopted to evaluate the model's performance.
- In principle, we evaluate how the model performs in terms of:
- Playbook generation: The model's ability to generate accurate and relevant playbooks for specific incidents (recall of mitigations, nodes, and edges; see the sketch after this list).
- Mitigation planning: The model's ability to generate detailed and actionable mitigation plans for specific incidents.
- Synthetic data generation: The model's ability to generate realistic and useful synthetic data for training and testing purposes.
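To illustrate the recall-style metrics, the sketch below compares the nodes and edges of a generated playbook graph against ground truth. The graph representation (sets of node labels and directed edges) is a simplification for illustration; the exact metric definitions are in the technical paper and the `Evaluation` scripts.

```python
# Simplified recall computation over playbook graphs (illustration only;
# the exact metric definitions live in the paper and the Evaluation folder).
def recall(predicted: set, reference: set) -> float:
    """Fraction of reference items that also appear in the prediction."""
    return len(predicted & reference) / len(reference) if reference else 1.0

# Hypothetical playbooks reduced to node labels and directed edges.
reference_nodes = {"isolate_host", "collect_logs", "reset_credentials"}
reference_edges = {("isolate_host", "collect_logs"), ("collect_logs", "reset_credentials")}

generated_nodes = {"isolate_host", "collect_logs"}
generated_edges = {("isolate_host", "collect_logs")}

print("node recall:", recall(generated_nodes, reference_nodes))  # 2/3
print("edge recall:", recall(generated_edges, reference_edges))  # 1/2
```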
- Currently, some tests are provided in the `Test` folder.
- Run them with `pytest tests/test_training_state.py`.
- Use a CACAO-compatible SOAR interface to execute playbooks for faster adoption and integration.
- Playbook Execution: Execute playbooks and provide step-by-step guidance for incident response.