DrugOOD: OOD Dataset Curator and Benchmark for AI-aided Drug Discovery

Project Description

AI-aided drug discovery (AIDD) is gaining popularity for its promise of making the search for new pharmaceuticals quicker, cheaper, and more efficient. Despite its extensive use in fields such as ADMET prediction, virtual screening, protein folding, and generative chemistry, little has been explored regarding the out-of-distribution (OOD) learning problem under noise, which is inevitable in real-world AIDD applications.

In this work, we present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery, which comes with an open-source Python package that fully automates the data curation and OOD benchmarking processes. We focus on one of the most crucial problems in AIDD: drug-target binding affinity prediction, which involves both macromolecules (protein targets) and small molecules (drug compounds). In contrast to only providing fixed datasets, DrugOOD offers an automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise annotations, and rigorous benchmarking of state-of-the-art OOD algorithms. Since molecular data are often modeled as irregular graphs with graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for graph OOD learning problems. Extensive empirical studies show a significant performance gap between in-distribution and out-of-distribution experiments, which highlights the need for better schemes that enable OOD generalization under noise in AIDD.

Keywords: AI-aided drug discovery (AIDD), graph OOD learning, OOD generalization, learning under noise, binding affinity prediction, drug-target interaction, virtual screening

Dataset Curator

DrugOOD provides large-scale, realistic, and diverse datasets for Drug AI OOD research. Specifically, DrugOOD focuses on the problem of domain generalization, in which we train and test the model on disjoint domains, e.g., molecules in a new assay environment. Top Left: Based on the ChEMBL database, we present an automated dataset curator for flexibly customizing OOD datasets. Top Right: DrugOOD releases realized exemplar datasets spanning different domain shifts. In each dataset, each data sample $(x, y, d)$ is associated with a domain annotation $d$. We use the background colours light blue and green to denote seen data and unseen test data, respectively. Bottom: Examples with different noise levels from the DrugOOD dataset. DrugOOD identifies and annotates three noise levels (left to right: core, refined, general) according to several criteria; as the level increases, the data volume grows and more noisy sources are involved.
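To make the $(x, y, d)$ sample structure concrete, here is a minimal sketch in plain Python. The field contents (a SMILES string as input, a binarized activity label, an assay ID as domain annotation) follow the description above, but the class and field names are illustrative, not the actual DrugOOD data schema.

```python
from dataclasses import dataclass

@dataclass
class OODSample:
    x: str   # molecule input, e.g. a SMILES string
    y: int   # binarized bioactivity label (1 = active, 0 = inactive)
    d: int   # domain annotation, e.g. an assay identifier

# A toy sample: aspirin, labeled active, from a hypothetical assay domain.
sample = OODSample(x="CC(=O)Oc1ccccc1C(=O)O", y=1, d=1734)
print(sample.d)  # domain splitting groups samples by this annotation
```

Under domain generalization, training and test sets are built so that their sets of `d` values are disjoint.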

We construct all the datasets based on ChEMBL, a large-scale, open-access drug discovery database that aims to capture medicinal chemistry data and knowledge across the pharmaceutical research and development process. We use the latest release in SQLite format: ChEMBL 29. Moreover, we consider OOD settings and different noise levels, an inevitable concern when machine learning models are applied in the drug development process. For example, when predicting SBAP bioactivity in practice, the target protein seen at inference time can be very different from those in the training set and may not even belong to the same protein family. This real-world domain gap challenges the accuracy of the model. On the other hand, data used in the wild often carry various kinds of noise: activities measured through experiments often come with different confidence levels and different "cut-off" noise. It is therefore necessary to construct datasets with varying noise levels in order to better align with real scenarios.

The automated dataset curator implements three major steps on top of the ChEMBL data source: noise filtering, uncertainty processing, and domain splitting. We provide 96 built-in configuration files to generate the realized datasets, configured over two tasks, three noise levels, four measurement types, and five domains.
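The three curation steps can be sketched as simple filters over ChEMBL-style activity records. This is a hypothetical illustration of the pipeline's shape, assuming toy record fields (`confidence_score`, `relation`, `assay_id`) and made-up thresholds; it is not the actual DrugOOD implementation or schema.

```python
def noise_filter(records, level="core"):
    """Keep records meeting the confidence bar of the chosen noise level.

    The thresholds here are illustrative; stricter levels keep fewer records.
    """
    min_conf = {"core": 9, "refined": 5, "general": 0}[level]
    return [r for r in records if r["confidence_score"] >= min_conf]

def process_uncertainty(records):
    """Drop records whose measurement carries an uncertain relation ('>', '<')."""
    return [r for r in records if r["relation"] == "="]

def split_by_domain(records, domain_key="assay_id"):
    """Group records into domains by the chosen annotation (e.g. assay)."""
    domains = {}
    for r in records:
        domains.setdefault(r[domain_key], []).append(r)
    return domains

records = [
    {"confidence_score": 9, "relation": "=", "assay_id": 1, "value": 7.2},
    {"confidence_score": 3, "relation": "=", "assay_id": 1, "value": 6.1},
    {"confidence_score": 9, "relation": ">", "assay_id": 2, "value": 5.0},
]
domains = split_by_domain(process_uncertainty(noise_filter(records, "core")))
print(sorted(domains))  # only the first toy record survives -> [1]
```

Once records are grouped by domain, OOD splits are obtained by assigning whole domains, rather than individual samples, to train, validation, and test.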

Benchmarking

DrugOOD conducts a comprehensive benchmark for developing and evaluating OOD generalization algorithms for AIDD. After loading any of the datasets generated by the data curator, users can combine different types of modules, including algorithms, backbones, etc., to develop OOD generalization algorithms in a flexible and disciplined manner.
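The module combination described above can be pictured as a single declarative configuration. The fragment below is a hypothetical sketch of that config-driven style; the keys and values are illustrative and do not reproduce the exact DrugOOD config schema.

```python
# Hypothetical benchmark configuration: pick a curated dataset, an OOD
# algorithm, a GNN backbone, and an optimizer, then hand the dict to a runner.
config = dict(
    dataset="drugood-lbap-core-ic50-assay",
    algorithm=dict(type="ERM"),  # or IRM, DeepCoral, DANN, Mixup, GroupDro
    backbone=dict(type="GIN", num_layers=5, hidden_dim=300),
    optimizer=dict(type="Adam", lr=1e-4),
)
print(config["algorithm"]["type"])
```

Swapping the benchmarked algorithm then amounts to changing one entry (e.g. `algorithm=dict(type="IRM")`) while the rest of the pipeline stays fixed.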

Experiments

In-distribution (ID) results correspond to the train-to-train setting. Parentheses show the standard deviation across three replicates.


Baseline results on the dataset drugood-lbap-core-ic50-assay for the six OOD algorithms are shown below.

| Algos | Val(ID)-ACC | Val(ID)-AUC | Val(OOD)-ACC | Val(OOD)-AUC | Test(ID)-ACC | Test(ID)-AUC | Test(OOD)-ACC | Test(OOD)-AUC |
|---|---|---|---|---|---|---|---|---|
| ERM | 89.05 (0.35) | 89.91 (1.78) | 88.79 (2.23) | 70.32 (0.80) | 89.34 (0.38) | 89.62 (2.04) | 82.14 (0.86) | 71.98 (0.29) |
| IRM | 88.14 (0.17) | 82.82 (0.87) | 90.67 (0.07) | 68.23 (0.31) | 88.39 (0.25) | 83.10 (0.46) | 82.41 (0.20) | 69.22 (0.51) |
| DeepCoral | 88.59 (0.10) | 88.10 (1.42) | 91.09 (0.15) | 70.26 (1.04) | 88.88 (0.15) | 88.23 (1.42) | 83.04 (0.08) | 71.76 (0.60) |
| DANN | 88.47 (0.20) | 83.39 (1.15) | 91.32 (0.38) | 68.30 (0.22) | 88.72 (0.16) | 83.20 (1.28) | 83.22 (0.10) | 70.08 (0.65) |
| Mixup | 88.80 (0.52) | 89.01 (2.06) | 88.76 (1.92) | 69.14 (0.56) | 89.01 (0.43) | 88.95 (2.17) | 81.65 (1.06) | 71.34 (0.41) |
| GroupDro | 88.80 (0.15) | 89.42 (0.43) | 89.81 (0.41) | 70.34 (0.91) | 88.96 (0.15) | 89.24 (0.82) | 82.62 (0.23) | 71.54 (0.46) |

Baseline results on the dataset drugood-lbap-core-ic50-scaffold for the six OOD algorithms:

| Algos | Val(ID)-ACC | Val(ID)-AUC | Val(OOD)-ACC | Val(OOD)-AUC | Test(ID)-ACC | Test(ID)-AUC | Test(OOD)-ACC | Test(OOD)-AUC |
|---|---|---|---|---|---|---|---|---|
| ERM | 97.04 (0.13) | 94.84 (0.60) | 85.11 (0.40) | 78.96 (0.67) | 90.51 (0.14) | 87.15 (0.48) | 76.33 (0.64) | 69.54 (0.52) |
| IRM | 92.51 (4.34) | 77.66 (0.79) | 81.52 (4.25) | 72.77 (0.66) | 86.99 (3.93) | 77.22 (0.32) | 72.96 (4.32) | 64.94 (0.30) |
| DeepCoral | 95.76 (0.29) | 81.60 (1.45) | 85.37 (0.47) | 77.09 (0.26) | 89.96 (0.08) | 81.13 (0.49) | 76.90 (0.37) | 68.54 (0.01) |
| DANN | 95.89 (0.09) | 77.09 (0.64) | 85.13 (0.81) | 75.04 (0.65) | 89.86 (0.10) | 77.30 (0.65) | 77.11 (0.66) | 66.37 (0.20) |
| Mixup | 97.19 (0.08) | 95.51 (0.44) | 85.55 (0.08) | 79.42 (0.62) | 90.74 (0.11) | 87.35 (0.33) | 77.18 (0.19) | 69.29 (0.24) |
| GroupDro | 96.02 (0.12) | 78.67 (2.75) | 85.01 (0.64) | 74.57 (0.60) | 89.87 (0.18) | 78.32 (1.09) | 76.18 (0.86) | 66.67 (0.67) |

Baseline results on the dataset drugood-lbap-core-ic50-size for the six OOD algorithms:

| Algos | Val(ID)-ACC | Val(ID)-AUC | Val(OOD)-ACC | Val(OOD)-AUC | Test(ID)-ACC | Test(ID)-AUC | Test(OOD)-ACC | Test(OOD)-AUC |
|---|---|---|---|---|---|---|---|---|
| ERM | 93.99 (0.04) | 92.91 (0.16) | 84.22 (0.22) | 81.50 (0.19) | 93.75 (0.06) | 92.35 (0.15) | 71.46 (0.67) | 67.48 (0.47) |
| IRM | 86.77 (4.05) | 66.41 (1.79) | 76.49 (4.36) | 60.59 (0.29) | 87.08 (3.72) | 69.80 (1.74) | 66.39 (3.92) | 57.00 (0.39) |
| DeepCoral | 92.40 (0.06) | 70.70 (1.68) | 83.34 (0.27) | 61.90 (0.39) | 92.28 (0.20) | 73.08 (0.98) | 72.36 (0.32) | 57.31 (0.44) |
| DANN | 91.31 (2.12) | 80.13 (3.59) | 81.99 (2.46) | 73.73 (0.49) | 91.12 (2.12) | 78.53 (3.71) | 70.08 (3.50) | 63.45 (0.18) |
| Mixup | 94.23 (0.21) | 92.89 (0.30) | 84.64 (0.26) | 81.79 (0.20) | 93.88 (0.17) | 92.48 (0.55) | 72.73 (0.68) | 67.73 (0.27) |
| GroupDro | 92.73 (0.17) | 78.46 (0.91) | 83.54 (0.52) | 67.68 (0.42) | 92.67 (0.27) | 80.02 (0.44) | 72.64 (0.33) | 60.90 (1.23) |
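Across all three splits, the tables show a large drop from ID to OOD performance. As a quick sanity check, a few lines of Python reproduce the ERM Test AUC gaps from the numbers reported above:

```python
# ERM Test(ID)-AUC and Test(OOD)-AUC, copied from the three tables above.
erm_test_auc = {
    "assay":    (89.62, 71.98),
    "scaffold": (87.15, 69.54),
    "size":     (92.35, 67.48),
}

# ID -> OOD AUC gap per domain split.
gaps = {split: round(id_auc - ood_auc, 2)
        for split, (id_auc, ood_auc) in erm_test_auc.items()}
print(gaps)  # the size split shows the largest gap for ERM
```

Every gap exceeds 17 AUC points, which is the performance gap between in-distribution and out-of-distribution settings highlighted in the project description.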

Code and Documentation

Code: https://github.com/tencent-ailab/DrugOOD

Paper

Preprint: https://arxiv.org/abs/2201.09637

Contact

Email: DrugAIOOD@gmail.com