English-to-Korean Transliterator | 영한 음역기

▶ README: ENGLISH | 한국어

This project tackles the English-to-Korean transliteration task (e.g., english -> 잉글리시). Specifically, it speeds up inference and shrinks the large (~1.2 GB) model of the previous MT5-based transliterator.

  • Highlights

    1. A lightweight model (~400 MB) with faster, more accurate transliteration.
      • See the performance comparisons below.
    2. LoRA applied to the MarianMT translation model (a minimal sketch follows the figure below).
      • The fine-tuned model is available on Hugging Face.
  • Performance Comparisons

    (figure: performance comparison against the previous MT5-based transliterator)
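
To make the second highlight concrete, the hedged sketch below shows one way to attach LoRA adapters to a MarianMT model with the peft library. The base checkpoint name and target modules are illustrative assumptions, not necessarily what this repo uses; the actual setup lives in train.py (LoRATrainer, shown under "How to Use").

    # Hedged sketch: attaching LoRA adapters to MarianMT via peft.
    # The checkpoint and target_modules are assumptions for illustration.
    from transformers import MarianMTModel
    from peft import LoraConfig, TaskType, get_peft_model

    base = MarianMTModel.from_pretrained('Helsinki-NLP/opus-mt-tc-big-en-ko')  # assumed base model
    config = LoraConfig(
        task_type=TaskType.SEQ_2_SEQ_LM,        # MarianMT is an encoder-decoder
        r=16, lora_alpha=32, lora_dropout=0.1,  # mirrors set_lora() below
        target_modules=['q_proj', 'v_proj'],    # assumed attention projections
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # only the small adapter weights are trainable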

How to Start

  1. Clone the repository:

    git clone https://github.com/feVeRin/enko_transliterator.git
  2. Install dependencies (assuming PyTorch is already installed):

    pip install -r requirements.txt
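
After installing, a quick import check can confirm the environment. This assumes requirements.txt pulls in transformers and peft, which is an inference from the LoRA + MarianMT setup above rather than a documented fact:

    # Quick sanity check (assumes transformers and peft are in requirements.txt)
    import torch, transformers, peft
    print(torch.__version__, transformers.__version__, peft.__version__)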

How to Use

  1. Transliteration (w/ pre-trained model)

    from transliteration import Transliterator
    
    model = Transliterator.from_pretrained('feVeRin/enko-transliterator')
    result = model.transliterate('LORA IS ALL YOU NEED')
    print(result)  # 로라 이즈 올 유 니드
  2. Model Training (from scratch)

    • Training logs to wandb by default. To disable it, pass report_to=None to train() (see the combined example after this list).
    from train import LoRATrainer
    
    trainer = LoRATrainer()
    trainer.set_lora(r=16, alpha=32, dropout=0.1)
    train_dataset, val_dataset = trainer.data_split('./data/data.txt', 0.2)
    trainer.train(train_dataset, val_dataset)
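
Putting the two steps together, the hedged sketch below fine-tunes from scratch with wandb disabled (assuming train() accepts report_to as a keyword argument, per the note above) and then batch-transliterates a few illustrative words with the published checkpoint:

    # Hedged end-to-end sketch combining the two usage steps above.
    from train import LoRATrainer
    from transliteration import Transliterator

    # Fine-tune with LoRA; report_to=None turns off wandb logging (assumed kwarg).
    trainer = LoRATrainer()
    trainer.set_lora(r=16, alpha=32, dropout=0.1)
    train_dataset, val_dataset = trainer.data_split('./data/data.txt', 0.2)
    trainer.train(train_dataset, val_dataset, report_to=None)

    # Batch transliteration with the published checkpoint (inputs are illustrative).
    model = Transliterator.from_pretrained('feVeRin/enko-transliterator')
    for word in ['transformer', 'attention', 'lora']:
        print(word, '->', model.transliterate(word))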
