MSGO

This repository provides the data and methods for the paper:
Pseudodata-based molecular structure generator to reveal unknown chemicals

Accepted for publication in Nature Machine Intelligence

Authors: Nanyang Yu†, Zheng Ma†, Qi Shao†, Laihui Li, Xuebing Wang, Bingcai Pan, Hongxia Yu and Si Wei‡

†: Equal contribution
‡: Corresponding author

Setup

Environment

Python: 3.7
Torch: 1.7.1
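
To confirm that an existing environment matches the versions listed above, a small check along the following lines can help. This is only a sketch: the helper names (`parse_version`, `matches`, `check_environment`) are illustrative and not part of this repository, and it assumes that matching on the listed version prefix (any 3.7.x Python, torch 1.7.1 with any build suffix such as `+cu110`) is sufficient.

```python
import sys

# Versions taken from the Environment section of this README.
REQUIRED_PYTHON = "3.7"
REQUIRED_TORCH = "1.7.1"


def parse_version(text):
    """Turn '1.7.1+cu110' into (1, 7, 1); the build suffix is ignored."""
    core = text.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:3] if part.isdigit())


def matches(actual, required):
    """True if `actual` starts with the components given in `required`."""
    wanted = parse_version(required)
    return parse_version(actual)[:len(wanted)] == wanted


def check_environment():
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    if not matches(python_version, REQUIRED_PYTHON):
        problems.append(f"Python {python_version} found, {REQUIRED_PYTHON} expected")
    try:
        import torch  # imported lazily so the check also works without torch
    except ImportError:
        problems.append("torch is not installed")
    else:
        if not matches(torch.__version__, REQUIRED_TORCH):
            problems.append(f"torch {torch.__version__} found, {REQUIRED_TORCH} expected")
    return problems
```

Calling `check_environment()` returns an empty list when both versions match; otherwise each entry describes one mismatch.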

Data

For training, we use 30k+ pseudo SMILES-spectrum pairs generated by CFM-ID (you can download the raw SMILES list file here). For evaluation, we use 300+ real spectra to verify our method (download here). For evaluation on real samples, we use one LC–QTOF dataset of wastewater samples to verify our model (download here, code: gmas).
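
Before generating pseudo spectra from a raw SMILES list, it can be useful to load and deduplicate the list. The sketch below assumes a plain-text file with one SMILES string per line, optionally followed by other whitespace-separated fields; the actual layout of the downloadable file is not documented here, so adjust accordingly. The function name `load_smiles_list` is hypothetical and not part of this repository.

```python
from pathlib import Path


def load_smiles_list(path):
    """Read SMILES strings from a text file, one per line.

    Assumption: the SMILES string is the first whitespace-separated token
    on each line. Blank lines and repeated entries are skipped, and the
    original order is preserved.
    """
    seen = set()
    smiles = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        tokens = line.split()
        if not tokens:
            continue  # blank line
        candidate = tokens[0]
        if candidate not in seen:
            seen.add(candidate)
            smiles.append(candidate)
    return smiles
```

This only cleans the list; it does not validate that each entry is chemically meaningful.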

Model weights

We provide MSGO models (pfas, code: 0bfg; lipid, code: 37it) trained on pseudo SMILES-spectrum pairs using all of the methods described in the paper. You can also train your own model with other methods.

Training

To replicate our experiment with all of the techniques enabled, run:

python tools/train.py --id all_trick --user_precurso 1 --use_mask 1 --use_formual 1

More options are listed in opt.py.

Evaluation

Download the model weights into ckpts/pfas or ckpts/lipid, then run:

python tools/eval.py --log_path [ckpts/pfas or ckpts/lipid]

Predict real data

We provide example data in data/example.

For pfas, run:

python tools/eval_standard.py --log_path ckpts/pfas --real_csv ./data/example/pfas.csv --out_csv ./pfas_results.csv --beam_size 500 --polar neg

For lipid, run:

python tools/eval_standard.py --log_path ckpts/lipid --real_csv ./data/example/lipid.csv --out_csv ./lipid_results.csv --beam_size 300 --polar pos

You will then obtain a results CSV file containing the top 10 predictions.
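
The two prediction commands differ only in the checkpoint, the input and output paths, the beam size, and the polarity, so they can be assembled from one small table. The sketch below builds the argument list for `tools/eval_standard.py` using only the flags shown above, and adds a helper to peek at the output file. The names `PRESETS`, `build_predict_command`, `preview_csv`, and `run_prediction` are illustrative, and the column layout of the results CSV is not documented here, so the preview simply assumes the first row is a header.

```python
import csv
import subprocess
import sys

# Settings copied from the two commands in this section.
PRESETS = {
    "pfas": {"beam_size": 500, "polar": "neg"},
    "lipid": {"beam_size": 300, "polar": "pos"},
}


def build_predict_command(kind, real_csv=None, out_csv=None):
    """Return the argument list for tools/eval_standard.py for one class."""
    preset = PRESETS[kind]
    return [
        sys.executable, "tools/eval_standard.py",
        "--log_path", f"ckpts/{kind}",
        "--real_csv", real_csv or f"./data/example/{kind}.csv",
        "--out_csv", out_csv or f"./{kind}_results.csv",
        "--beam_size", str(preset["beam_size"]),
        "--polar", preset["polar"],
    ]


def preview_csv(path, n_rows=5):
    """Return (header, first n_rows data rows) of a CSV file.

    Assumption: the first row of the file is a header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    header = rows[0] if rows else []
    return header, rows[1:1 + n_rows]


def run_prediction(kind):
    """Run the prediction from the repository root and preview its output."""
    command = build_predict_command(kind)
    subprocess.run(command, check=True)  # raises if the script fails
    out_csv = command[command.index("--out_csv") + 1]
    return preview_csv(out_csv)
```

From the repository root, `run_prediction("pfas")` would run the same command as the pfas example and return the header plus the first few result rows.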

Todos

Release model weights
Release pseudo and real data
Release training process

Baseline models implementation

All the code is in the baseline_models folder.

For baseline_models/ms2mol

cd ms_bart 
python train.py

For massgenie and spec2mol

Training: to replicate our experiment with the default settings, run

python tools/train.py

Evaluation: run

python tools/utils_eval.py

Predict real data: we provide example.py for reference. Replace [data path] with the path to your own data for prediction.
