English-to-Korean Transliterator | 영한 음역기

▶ README: ENGLISH | 한국어

This project tackles the English-to-Korean transliteration task (e.g., english -> 잉글리시). Specifically, it speeds up inference and shrinks the large (~1.2 GB) model of the previous MT5-based transliterator.

  • Highlights

    1. A lightweight model (~400 MB) with faster, more accurate transliteration.
      • See the performance comparisons below.
    2. LoRA applied to the MarianMT translation model (a minimal sketch follows the figure below).
      • The fine-tuned model is available on Hugging Face.
  • Performance Comparisons

    (figure: performance comparison against the previous MT5-based transliterator)
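
To make the second highlight concrete, the hedged sketch below shows one way to attach LoRA adapters to a MarianMT model with the peft library. The base checkpoint name and target modules are illustrative assumptions, not necessarily what this repo uses; the actual setup lives in train.py (LoRATrainer, shown under "How to Use").

    # Hedged sketch: attaching LoRA adapters to MarianMT via peft.
    # The checkpoint and target_modules are assumptions for illustration.
    from transformers import MarianMTModel
    from peft import LoraConfig, TaskType, get_peft_model

    base = MarianMTModel.from_pretrained('Helsinki-NLP/opus-mt-tc-big-en-ko')  # assumed base model
    config = LoraConfig(
        task_type=TaskType.SEQ_2_SEQ_LM,        # MarianMT is an encoder-decoder
        r=16, lora_alpha=32, lora_dropout=0.1,  # mirrors set_lora() below
        target_modules=['q_proj', 'v_proj'],    # assumed attention projections
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # only the small adapter weights are trainable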

How to Start

  1. Clone the repository:

    git clone https://github.com/feVeRin/enko_transliterator.git
  2. Install dependencies (assuming PyTorch is already installed):

    pip install -r requirements.txt
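
After installing, a quick import check can confirm the environment. This assumes requirements.txt pulls in transformers and peft, which is an inference from the LoRA + MarianMT setup above rather than a documented fact:

    # Quick sanity check (assumes transformers and peft are in requirements.txt)
    import torch, transformers, peft
    print(torch.__version__, transformers.__version__, peft.__version__)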

How to Use

  1. Transliteration (w/ pre-trained model)

    from transliteration import Transliterator
    
    model = Transliterator.from_pretrained('feVeRin/enko-transliterator')
    result = model.transliterate('LORA IS ALL YOU NEED')
    print(result)  # 로라 이즈 올 유 니드
  2. Model Training (from scratch)

    • Training logs to wandb by default. To disable it, pass report_to=None to train() (see the combined example after this list).
    from train import LoRATrainer
    
    trainer = LoRATrainer()
    trainer.set_lora(r=16, alpha=32, dropout=0.1)
    train_dataset, val_dataset = trainer.data_split('./data/data.txt', 0.2)
    trainer.train(train_dataset, val_dataset)
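
Putting the two steps together, the hedged sketch below fine-tunes from scratch with wandb disabled (assuming train() accepts report_to as a keyword argument, per the note above) and then batch-transliterates a few illustrative words with the published checkpoint:

    # Hedged end-to-end sketch combining the two usage steps above.
    from train import LoRATrainer
    from transliteration import Transliterator

    # Fine-tune with LoRA; report_to=None turns off wandb logging (assumed kwarg).
    trainer = LoRATrainer()
    trainer.set_lora(r=16, alpha=32, dropout=0.1)
    train_dataset, val_dataset = trainer.data_split('./data/data.txt', 0.2)
    trainer.train(train_dataset, val_dataset, report_to=None)

    # Batch transliteration with the published checkpoint (inputs are illustrative).
    model = Transliterator.from_pretrained('feVeRin/enko-transliterator')
    for word in ['transformer', 'attention', 'lora']:
        print(word, '->', model.transliterate(word))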
