This project focuses on the English-to-Korean transliteration task (e.g., english -> 잉글리시). Specifically, it speeds up inference and reduces the large model size (~1.2GB) of the previous MT5-based Transliterator.
## Highlights
- Providing a lightweight model (~400MB) with faster and accurate transliteration results.
  - See the performance comparisons below.
- Applying LoRA to the MarianMT translation model.
  - The corresponding fine-tuned model is available on Hugging Face.
## Performance Comparisons
 
## How to Use

1. Clone the repository:

   ```bash
   git clone https://github.com/feVeRin/enko_transliterator.git
   ```

2. Install dependencies (assuming PyTorch is already installed):

   ```bash
   pip install -r requirements.txt
   ```
### Transliteration (w/ pre-trained model)

```python
from transliteration import Transliterator

model = Transliterator.from_pretrained('feVeRin/enko-transliterator')
result = model.transliterate('LORA IS ALL YOU NEED')
print(result)  # 로라 이즈 올 유 니드
```
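For reference, the wrapper above is assumed to boil down to a standard seq2seq generation pass. Below is a minimal sketch using plain `transformers`; the actual `Transliterator` internals may differ.

```python
# Sketch of the assumed inference path: tokenize the English input,
# generate with the seq2seq model, and decode the Korean output.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = 'feVeRin/enko-transliterator'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer('LORA IS ALL YOU NEED', return_tensors='pt')
with torch.no_grad():
    outputs = model.generate(**inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```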
### Model Training (from scratch)

Training is logged to wandb. If this is not needed, add `report_to=None` to the `train()` function.

```python
from train import LoRATrainer

trainer = LoRATrainer()
trainer.set_lora(r=16, alpha=32, dropout=0.1)
train_dataset, val_dataset = trainer.data_split('./data/data.txt', 0.2)
trainer.train(train_dataset, val_dataset)
```
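For reference, here is a minimal sketch of the LoRA setup that `set_lora()` is assumed to perform with `peft` on top of the MarianMT base model; the base-model ID and `target_modules` below are placeholders, not the project's exact configuration.

```python
# Sketch of an assumed LoRA configuration matching set_lora(r=16, alpha=32, dropout=0.1).
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

base_model_id = 'Opus-hplt-EN-KO'  # placeholder: substitute the actual Hub ID of the base model
base = AutoModelForSeq2SeqLM.from_pretrained(base_model_id)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                                 # rank (r=16)
    lora_alpha=32,                        # scaling factor (alpha=32)
    lora_dropout=0.1,                     # dropout (dropout=0.1)
    target_modules=['q_proj', 'v_proj'],  # assumption: attention projections
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the low-rank adapter weights are trainable
```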
 
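After training, the LoRA adapter can be merged back into the base model so the result ships as a single standalone checkpoint (as with the ~400MB model above). The sketch below uses `peft`; the base-model ID and adapter path are placeholders, and the project's own export path may differ.

```python
# Sketch: merge trained LoRA weights into the base model and save one checkpoint.
from transformers import AutoModelForSeq2SeqLM
from peft import PeftModel

base = AutoModelForSeq2SeqLM.from_pretrained('Opus-hplt-EN-KO')  # placeholder base-model ID
merged = PeftModel.from_pretrained(base, './lora_adapter').merge_and_unload()  # hypothetical adapter path
merged.save_pretrained('./enko_transliterator_merged')
```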
## References

- This project uses the dataset from EngtoKor-Transliterator.
- Opus-hplt-EN-KO was used as the base model for applying LoRA.
 
