MSGO

This repository provides the data and methods for the paper:
Pseudodata-based molecular structure generator to reveal unknown chemicals

Accepted for publication in Nature Machine Intelligence

Authors: Nanyang Yu†, Zheng Ma†, Qi Shao†, Laihui Li, Xuebing Wang, Bingcai Pan, Hongxia Yu and Si Wei‡

†: Equal contribution
‡: Corresponding author

Setup

Environment

Python: 3.7
Torch: 1.7.1
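
As a quick sanity check before training or evaluation, the snippet below compares the running interpreter and the installed torch build against the two versions listed above. It is a minimal sketch and not part of the repository: the function name `check_environment` and the report keys are hypothetical, and it only reports mismatches rather than enforcing anything.

```python
import importlib
import sys

EXPECTED_PYTHON = (3, 7)   # Python version listed in this README
EXPECTED_TORCH = "1.7.1"   # Torch version listed in this README


def check_environment():
    """Return a dict saying whether Python and torch match the README versions."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "python_ok": tuple(sys.version_info[:2]) == EXPECTED_PYTHON,
        "torch": None,
        "torch_ok": False,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # torch is not installed; leave the torch fields at their defaults.
        return report
    # Builds such as "1.7.1+cu110" carry a local suffix, so compare the base version.
    report["torch"] = str(torch.__version__)
    report["torch_ok"] = report["torch"].split("+")[0] == EXPECTED_TORCH
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, value)
```

A `False` value for `python_ok` or `torch_ok` does not necessarily mean the code will fail; it only signals that the environment differs from the one listed here.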

Data

For training, we use 30k+ pseudo SMILES-spectrum pairs generated by CFM-ID (you can download the raw SMILES list file here). For evaluation, we use 300+ real spectra to verify our method (download here). For evaluation on real samples, we use one LC–QTOF dataset of wastewater samples to verify our model (download here, code: gmas).

Model weights

We provide the MSGO models (PFAS, code: 0bfg; lipid, code: 37it), trained on pseudo SMILES-spectrum pairs with all the methods described in the paper. You can also train your own model with other methods.

Training

You can replicate our experiment, including all the techniques:

python tools/train.py --id all_trick --user_precurso 1 --use_mask 1 --use_formual 1

More options can be viewed in opt.py

Evaluation

Download the model weights into ckpts/pfas or ckpts/lipid, then run:

python tools/eval.py --log_path [ckpts/pfas or ckpts/lipid]

Predict real data

We provide example data in data/example.

For PFAS, run:

python tools/eval_standard.py --log_path ckpts/pfas --real_csv ./data/example/pfas.csv --out_csv ./pfas_results.csv --beam_size 500 --polar neg

For lipid, run:

python tools/eval_standard.py --log_path ckpts/lipid --real_csv ./data/example/lipid.csv --out_csv ./lipid_results.csv --beam_size 300 --polar pos

You will then obtain a results CSV file containing the top 10 predictions.
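
To take a first look at the output, the sketch below prints the header and the first few rows of a results file such as ./pfas_results.csv. It is not part of the repository: the helper name `preview_results` is hypothetical, and because the column layout of the results file is not documented here, the code makes no assumption about column names and simply echoes whatever the file contains.

```python
import csv
import os


def preview_results(csv_path, max_rows=5):
    """Return (header, rows) holding the first `max_rows` data rows of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows at all
        # zip stops after max_rows without reading the rest of the file.
        rows = [row for _, row in zip(range(max_rows), reader)]
    return header, rows


if __name__ == "__main__":
    # Path taken from the --out_csv argument in the PFAS command above.
    path = "./pfas_results.csv"
    if os.path.exists(path):
        header, rows = preview_results(path)
        print(header)
        for row in rows:
            print(row)
    else:
        print("No results file found at", path)
```

Only the standard `csv` module is used here, so the preview works without installing anything beyond the environment listed above.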


Todos

  • Release model weights
  • Release pseudo and real data
  • Release training process

Baseline models implementation

All the code is in the baseline_models folder.

For baseline_models/ms2mol

cd ms_bart 
python train.py

For massgenie and spec2mol

Training: You can replicate our experiment with default settings by running

python tools/train.py

Evaluation: You can run

python tools/utils_eval.py

Predict real data: We provide an example.py for your reference. You can replace [data path] with your own data for prediction.
