A sleek implementation of a GPT-2-style Large Language Model (LLM) built from scratch in Python and PyTorch, featuring transformer blocks, multi-head self-attention, and configurable text generation (greedy, temperature-scaled, and top-k sampling).
- GPT-2 Architecture: Transformer blocks with multi-head attention, GELU-activated feed-forward networks, and layer normalization.
- Efficient Tokenization: Byte-pair encoding via TikToken (5,104-token training dataset).
- Custom Data Pipeline: Sliding window dataset and dataloader for seamless training.
- Text Generation: Deterministic and probabilistic outputs with temperature scaling and top-k sampling.
- Pretraining: 10-epoch training loop with loss and perplexity tracking.
- Clone the repo: `git clone https://github.com/Rohitw3code/LLM-from-scratch.git`, then `cd LLM-from-scratch`.
- Install dependencies: `pip install -r requirements.txt` (`requirements.txt` lists `torch>=2.0.0` and `tiktoken>=0.7.0`).
- Add your text dataset to the project directory.
- Prepare Data: Tokenize text using TikToken's GPT-2 encoder (see `previous_chapters.py`).
- Train: Configure `GPT_CONFIG_124M` and run the training loop in `4_Pretraining_on_unlabeled_Data.ipynb`.
- Generate Text: Use `generate_text_simple` for text generation with customizable sampling (a rough end-to-end sketch follows this list).
- Evaluate: Monitor loss and perplexity during training.
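A minimal end-to-end sketch of these steps is shown below. It assumes `previous_chapters.py` exposes a `GPTModel` class and that `generate_text_simple` accepts the model, a batch of token IDs, a `max_new_tokens` count, and a context size; the config fields not listed elsewhere in this README (`emb_dim`, `drop_rate`, `qkv_bias`) are assumptions, so adjust names and values to match the actual code:

```python
import tiktoken
import torch

# GPTModel and the exact generate_text_simple signature are assumptions
# about previous_chapters.py -- adjust them to the actual module.
from previous_chapters import GPTModel, generate_text_simple

GPT_CONFIG_124M = {
    "vocab_size": 50257,     # TikToken GPT-2 BPE vocabulary
    "context_length": 1024,  # maximum sequence length
    "emb_dim": 768,          # standard GPT-2 "small" width (~124M parameters)
    "n_heads": 12,           # attention heads per block
    "n_layers": 12,          # transformer blocks
    "drop_rate": 0.1,        # assumed dropout rate
    "qkv_bias": False,       # assumed; bias on the QKV projections
}

tokenizer = tiktoken.get_encoding("gpt2")
model = GPTModel(GPT_CONFIG_124M)
model.eval()

prompt = "Every effort moves you"  # placeholder prompt
idx = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)  # shape: (1, num_tokens)

with torch.no_grad():
    out = generate_text_simple(
        model=model,
        idx=idx,
        max_new_tokens=25,
        context_size=GPT_CONFIG_124M["context_length"],
    )

print(tokenizer.decode(out.squeeze(0).tolist()))
```

Swapping `generate_text_simple` for a sampling-based variant enables the temperature-scaled and top-k behavior described under Text Generation below.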
LLM-from-scratch/
├── previous_chapters.py # Model, dataset, and dataloader
├── 1_Data-Tokenization.ipynb # Data loading and tokenization with TikToken
├── 2_Self_Attention_mechanism.ipynb # Multi-head attention details
├── 3_LLM_Architecture.ipynb # Model architecture and generation demo
├── 4_Pretraining_on_unlabeled_Data.ipynb # Training and evaluation
├── requirements.txt # Dependencies
└── README.md # Documentation
- Embeddings: Token and positional embeddings for input processing.
- Multi-Head Attention: Captures complex dependencies with 12 heads.
- Feed-Forward Networks: GELU activation for non-linearity.
- Transformer Blocks: 12-layer stack with 124M parameters (sketched after this list).
- Config: 50,257 vocab size, 1,024 context length.
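For illustration, a block with the shape listed above could look like the following sketch. It uses PyTorch's built-in `nn.MultiheadAttention` and a pre-LayerNorm layout for brevity; the repository's own block in `previous_chapters.py` likely implements the attention heads, causal masking, and dropout by hand, so treat this as a rough reference rather than the actual code:

```python
import torch
import torch.nn as nn


class TransformerBlock(nn.Module):
    """Pre-LayerNorm block: multi-head attention + GELU feed-forward, each with a residual."""

    def __init__(self, emb_dim=768, n_heads=12, drop_rate=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(emb_dim)
        self.attn = nn.MultiheadAttention(
            emb_dim, n_heads, dropout=drop_rate, batch_first=True
        )
        self.norm2 = nn.LayerNorm(emb_dim)
        self.ff = nn.Sequential(
            nn.Linear(emb_dim, 4 * emb_dim),  # expand
            nn.GELU(),                        # GELU non-linearity
            nn.Linear(4 * emb_dim, emb_dim),  # project back
        )
        self.drop = nn.Dropout(drop_rate)

    def forward(self, x, attn_mask=None):
        # Multi-head self-attention sub-layer with residual connection.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + self.drop(attn_out)
        # Feed-forward sub-layer with residual connection.
        return x + self.drop(self.ff(self.norm2(x)))


# Causal mask for autoregressive decoding (True = position is masked).
seq_len = 8
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
block = TransformerBlock()
out = block(torch.randn(2, seq_len, 768), attn_mask=mask)  # (batch, seq, emb_dim)
```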
- Deterministic: Uses `torch.argmax` to pick the highest-probability token at each step.
- Probabilistic: Temperature scaling and top-k sampling for diverse outputs (see the sketch after this list).
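A minimal sketch of that decoding logic for a single step over a logits vector; the `sample_next_token` helper is hypothetical and not part of the repository:

```python
import torch


def sample_next_token(logits, temperature=1.0, top_k=50):
    """Pick the next token ID from a logits vector of shape (vocab_size,)."""
    if top_k is not None:
        top_k = min(top_k, logits.size(-1))
        top_logits, _ = torch.topk(logits, top_k)
        # Mask everything below the k-th largest logit.
        logits = torch.where(
            logits < top_logits[..., -1:],
            torch.full_like(logits, float("-inf")),
            logits,
        )

    if temperature > 0:
        # Probabilistic: temperature-scaled softmax + multinomial sampling.
        probs = torch.softmax(logits / temperature, dim=-1)
        return torch.multinomial(probs, num_samples=1)

    # Deterministic: greedy argmax decoding.
    return torch.argmax(logits, dim=-1, keepdim=True)
```

Setting `temperature=0` reproduces the deterministic `torch.argmax` behavior; higher temperatures and larger `top_k` values increase output diversity.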
- Dataset: 5,104 tokens via TikToken.
- Training: 10 epochs, batch size 4, sequence length 256, stride 128 (dataloader sketched after this list).
- Evaluation: Loss and perplexity.
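A self-contained sketch of the sliding-window dataset with these hyperparameters; the class name and the `my_dataset.txt` filename are placeholders, and the repository's actual dataset and dataloader live in `previous_chapters.py`:

```python
import tiktoken
import torch
from torch.utils.data import Dataset, DataLoader


class SlidingWindowDataset(Dataset):
    """Chunks a token stream into overlapping (input, target) pairs."""

    def __init__(self, text, tokenizer, max_length, stride):
        token_ids = tokenizer.encode(text)
        self.inputs, self.targets = [], []
        # Slide a max_length window over the tokens, shifting by `stride`;
        # targets are the inputs shifted one position to the right.
        for i in range(0, len(token_ids) - max_length, stride):
            self.inputs.append(torch.tensor(token_ids[i : i + max_length]))
            self.targets.append(torch.tensor(token_ids[i + 1 : i + max_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]


tokenizer = tiktoken.get_encoding("gpt2")
with open("my_dataset.txt", "r", encoding="utf-8") as f:  # placeholder filename
    raw_text = f.read()

dataset = SlidingWindowDataset(raw_text, tokenizer, max_length=256, stride=128)
loader = DataLoader(dataset, batch_size=4, shuffle=True, drop_last=True)
```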
- Add top-p sampling (see the sketch after this list).
- Support fine-tuning for specific tasks.
- Scale to larger model configurations.
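As a starting point for the first item, one common way to implement top-p (nucleus) sampling is to keep the smallest set of tokens whose cumulative probability exceeds `p` and mask the rest; the sketch below is hypothetical and not part of the repository:

```python
import torch


def top_p_filter(logits, top_p=0.9):
    """Mask logits outside the nucleus whose cumulative probability exceeds top_p."""
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cumulative = torch.cumsum(torch.softmax(sorted_logits, dim=-1), dim=-1)

    # Remove tokens once the cumulative probability passes top_p,
    # but always keep at least the single most likely token.
    remove = cumulative > top_p
    remove[..., 1:] = remove[..., :-1].clone()
    remove[..., 0] = False

    sorted_logits[remove] = float("-inf")
    # Scatter the filtered logits back into the original vocabulary order.
    return torch.full_like(logits, float("-inf")).scatter(-1, sorted_idx, sorted_logits)
```

The filtered logits can then go through the same temperature/softmax/multinomial step used for top-k sampling.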
- Fork the repo.
- Create a feature branch (`git checkout -b feature`).
- Commit changes (`git commit -m "Add feature"`).
- Push (`git push origin feature`).
- Open a pull request.