Computational Models for Predicting Human Toxicities




      Sean Ekins, M.Sc, Ph.D., D.Sc.

       Collaborations in Chemistry,
            Fuquay-Varina, NC.

  Collaborative Drug Discovery, Burlingame, CA.
School of Pharmacy, Department of Pharmaceutical
        Sciences, University of Maryland.
                   215-687-1320
              ekinssean@yahoo.com
Outline

• Key enablers
• What has been modeled – a quick review
• What will be modeled
• Future
Why Use Computational Models For Toxicology?



Goal of a model – Alert you to potential toxicity, enable you to
focus efforts on best molecules – reduce risk

Selection of model – trade off between interpretability,
insights for modifying molecules, speed of calculation and
coverage of chemistry space – applicability domain

Models can be built with proprietary, open and commercial
tools

software (descriptors + algorithms) + data = model/s

Human operator decides whether a model is acceptable
Key enablers: Hardware is getting smaller
[Figure: timeline of computer size. 1930s: room size. 1980s: desktop size. 1990s. Devices pictured: laptop, netbook, phone, watch.]
Not to scale and not equivalent computing power – illustrates mobility
Key Enablers: More data available and open tools

 • Details
 • Details
What has been modeled
• Physicochemical properties, LogP, logD,
  Solubility, boiling point, melting point
• QSAR for various proteins, complex properties
• Homology models, Docking
• Expert systems
• Hybrid methods – combine different approaches
• Mutagenicity (Ames, micronucleus, clastogenicity,
  DNA damage), developmental tox., ...
• Environmental Tox – Aquatic, dermatotoxicology
• Mixtures – using PBPK
Physicochemical properties
• Solubility – thousands of data points in the literature
• Model median error ~0.5 log units, about equal to experimental error
• LogP – tens of thousands of data points available
• Fragment-based or whole-molecule predictors
• Not all logP predictors are equal. Median error ~0.3 log units, about equal to
  experimental error
• People now accept solubility and logP predictions as if they were measured values
ACD predictions + EpiSuite predictions in www.chemspider.com
•   Mobile molecular data sheet
•   Links to melting point predictor from open notebook science
•   Required curation of data
Simple Rules
•   Rule of 5
•   Lipinski, Lombardo, Dominy, Feeney. Adv. Drug Deliv. Rev. 23: 3-25 (1997).

•   AlogP98 vs PSA
•   Egan, Merz, Baldwin. J. Med. Chem. 43: 3867-3877 (2000).

•   More than ten rotatable bonds correlates with decreased rat oral bioavailability
•   Veber, Johnson, Cheng, Smith, Ward, Kopple. J. Med. Chem. 45: 2615-2623 (2002).

•   Compounds with ClogP < 3 and total polar surface area > 75 Å² have fewer animal toxicity
    findings
•   Hughes, et al. Bioorg. Med. Chem. Lett. 18: 4872-4875 (2008).
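
The simple rules above can be turned into quick flags. The following is a minimal sketch, assuming the descriptors have already been calculated by some other tool; the `Descriptors` class and the function names are hypothetical. The Rule of 5 cutoffs used are the commonly quoted ones (molecular weight > 500, logP > 5, more than 5 hydrogen bond donors, more than 10 hydrogen bond acceptors). The Egan AlogP98 vs PSA region is not included because its boundary is not given on the slide.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mol_weight: float       # molecular weight, Da
    logp: float             # calculated logP, e.g. ClogP
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int
    tpsa: float             # total polar surface area, Å²


def rule_of_5_violations(d: Descriptors) -> int:
    """Count violations of the commonly quoted Rule of 5 cutoffs."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def simple_rule_flags(d: Descriptors) -> dict:
    """Apply the rules listed on the slide to one set of descriptors."""
    return {
        "rule_of_5_violations": rule_of_5_violations(d),
        # Veber et al.: more than ten rotatable bonds, lower rat oral bioavailability
        "many_rotatable_bonds": d.rotatable_bonds > 10,
        # Hughes et al.: ClogP < 3 and TPSA > 75 Å², fewer animal toxicity findings
        "lower_tox_property_space": d.logp < 3 and d.tpsa > 75,
    }
```

These flags are alerts for prioritising molecules, not predictions of an outcome for any single compound.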
MetaPrint 2D in Bioclipse – free metabolism site predictor

Uses fingerprint descriptors and a metabolite database to learn the
frequencies of metabolites in various substructures

              L. Carlsson, et al., BMC Bioinformatics 2010, 11:362
QSAR for Various Proteins
• Enzymes – predominantly Cytochrome
  P450s - for drug-drug interactions
• Transporters – predominantly P-gp but some
  others, e.g. OATP, BCRP
• Receptors – PXR, CAR, for hepatotoxicity
• Ion Channels – predominantly hERG for
  cardiotoxicity
• Issues – initially small training sets – public
  data is a fraction of what drug companies
  have
Pharmacophores

Ideal when we have few molecules for training
In silico database searching
Accelrys Catalyst in Discovery Studio
Geometric arrangement of functional groups necessary for a biological response

•Generate 3D conformations
•Align molecules
•Select features contributing to activity
•Regress hypothesis
•Evaluate with new molecules

•Excluded volumes – relate to inactive molecules

Targets: CYP2B6, CYP2C9, CYP2D6, CYP3A4, CYP3A5, CYP3A7, hERG, P-gp, OATPs,
OCT1, OCT2, BCRP, hOCTN2, ASBT, hPEPT1, hPEPT2, FXR, LXR, CAR, PXR etc.
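
To illustrate what "geometric arrangement of functional groups" means when searching a database, here is a minimal sketch of a distance-based match between a pharmacophore and the features of one conformer. It is not the Catalyst algorithm; the function name, the feature labels and the 1.0 Å tolerance are assumptions made for the example, and conformer generation, alignment and excluded volumes are left out.

```python
from itertools import permutations
from math import dist


def matches_pharmacophore(pharmacophore, conformer_features, tolerance=1.0):
    """Return True if the conformer can reproduce the pharmacophore geometry.

    Both arguments are lists of (feature_type, (x, y, z)) tuples, coordinates
    in Å. A match needs one conformer feature of the right type for every
    pharmacophore feature, with all pairwise distances agreeing within
    `tolerance`. Comparing distances only means mirror images also match.
    """
    n = len(pharmacophore)
    for candidate in permutations(conformer_features, n):
        # Feature types must agree position by position.
        if any(c[0] != p[0] for c, p in zip(candidate, pharmacophore)):
            continue
        distances_agree = all(
            abs(dist(candidate[i][1], candidate[j][1])
                - dist(pharmacophore[i][1], pharmacophore[j][1])) <= tolerance
            for i in range(n)
            for j in range(i + 1, n)
        )
        if distances_agree:
            return True
    return False
```

A database search would call this for each stored conformer of each molecule and keep the molecules with at least one match. The brute-force loop over permutations is only practical for the handful of features a typical pharmacophore contains.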
hOCTN2 – Organic Cation Transporter Pharmacophore
•   High affinity cation/carnitine transporter – expressed in kidney, skeletal muscle, heart,
    placenta and small intestine

•   Inhibition correlates with muscle weakness – rhabdomyolysis
•   A common features pharmacophore was developed with 7 inhibitors
•   Searched a database of over 600 FDA approved drugs and selected drugs for in vitro testing
•   Of 33 tested drugs predicted to map to the pharmacophore, 27 inhibited hOCTN2 in vitro
•   Compounds were more likely to cause rhabdomyolysis if the Cmax/Ki ratio was higher than
    0.0025

        Diao, Ekins, and Polli, Pharm Res, 26, 1890, (2009)
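
The Cmax/Ki criterion is simple arithmetic. A minimal sketch follows, assuming Cmax and Ki are supplied in the same concentration units; the function names are hypothetical, and only the 0.0025 cutoff and the 27-of-33 count come from the slide.

```python
RHABDO_RATIO_CUTOFF = 0.0025  # Cmax/Ki cutoff quoted on the slide


def cmax_ki_ratio(cmax: float, ki: float) -> float:
    """Ratio of peak plasma concentration to inhibition constant (same units)."""
    if ki <= 0:
        raise ValueError("Ki must be positive")
    return cmax / ki


def above_rhabdo_cutoff(cmax: float, ki: float,
                        cutoff: float = RHABDO_RATIO_CUTOFF) -> bool:
    """True if the ratio is higher than the cutoff associated with rhabdomyolysis."""
    return cmax_ki_ratio(cmax, ki) > cutoff


# Fraction of pharmacophore-selected drugs confirmed as inhibitors in vitro.
confirmed_fraction = 27 / 33  # about 0.82
```

A ratio above the cutoff marks a compound as more likely to be associated with rhabdomyolysis; it is a correlation, not a diagnosis.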
hOCTN2 – Organic Cation Transporter Pharmacophore
[Figure: hOCTN2 pharmacophore]

Diao, Ekins, and Polli, Pharm Res, 26, 1890, (2009)
• QSAR Examples




Gupta RR, et al., Drug Metab Dispos, 38: 2083-2090, 2010
• Examples – P-gp
 Open source descriptors (CDK) and C5.0 algorithm
 ~60,000 molecules with P-gp efflux data from Pfizer
 Training set: MDR < 2.5 (low risk) (N = 14,175), MDR > 2.5 (high risk) (N = 10,820)
 Test set:     MDR < 2.5 (N = 10,441), MDR > 2.5 (N = 7,972)

 Metric         CDK + fragment descriptors    MOE 2D + fragment descriptors
 Kappa          0.65                          0.67
 Sensitivity    0.86                          0.86
 Specificity    0.78                          0.80
 PPV            0.84                          0.84

 Could facilitate model sharing?

Gupta RR, et al., Drug Metab Dispos, 38: 2083-2090, 2010
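
For readers less familiar with the statistics reported for these models, the sketch below computes Kappa, sensitivity, specificity and PPV from a two-class confusion matrix using their standard definitions. The function name and the example counts in the usage note are hypothetical; the slide does not give the confusion matrix or state which class was treated as positive.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard two-class statistics from confusion-matrix counts."""
    n = tp + fp + tn + fn
    observed = (tp + tn) / n  # observed agreement (accuracy)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),   # fraction of true positives recovered
        "specificity": tn / (tn + fp),   # fraction of true negatives recovered
        "ppv": tp / (tp + fp),           # fraction of positive calls that are correct
        "kappa": (observed - expected) / (1 - expected),  # Cohen's kappa
    }
```

For example, `classification_metrics(tp=86, fp=22, tn=78, fn=14)` gives sensitivity 0.86, specificity 0.78, PPV of about 0.80 and Kappa 0.64. Kappa corrects accuracy for chance agreement, which matters when the two classes differ in size as they do here.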
Time dependent inhibition for P450 3A4
•   Pfizer generated a large dataset (~2000 compounds) and went through sequential Bayesian model
    generation and testing cycles




     Test set 2: 20 active in 156 compounds. Combined both model predictions.




                                              Zientek et al., Chem Res Toxicol 23: 664-676 (2010)
• 3A4 TDI


The indazole, pyrazole, and methoxy-aminopyridine rings are
important for TDI

Approach decreased in vitro screening by 30%

Helps identify compounds that form reactive metabolites
Zientek et al., Chem Res Toxicol 23: 664-676 (2010)
• Drug Induced Liver Injury Models
•   74 compounds – classification models (linear discriminant analysis, artificial neural
    networks, and machine learning algorithms (OneR))
     – Internal cross-validation: accuracy 84%, sensitivity 78%, specificity 90%. Testing
       on sets of 6 and 13 compounds gave > 80% accuracy for each.

                   (Cruz-Monteagudo et al., J Comput Chem 29: 533-549, 2008).

•   A second study used binary QSAR (248 active and 283 inactive) with support vector
    machine models
     – External 5-fold cross-validation procedures, and 78% accuracy for a set of 18
       compounds

                    (Fourches et al., Chem Res Toxicol 23: 171-183, 2010).

•   A third study created a knowledge base with structural alerts from 1266 chemicals.
     – The alerts were used to predict results for 626 Pfizer compounds (sensitivity
       46%, specificity 73%, and concordance 56% for the latest version)

                   (Greene et al., Chem Res Toxicol 23: 1215-1222, 2010).
• DILI Model - Bayesian
•   Laplacian-corrected Bayesian classifier models were generated using Discovery
    Studio (version 2.5.5; Accelrys).
•   Training set = 295, test set = 237 compounds

•   Uses two-dimensional descriptors to distinguish between compounds that are
    DILI-positive and those that are DILI-negative
     –   ALogP
     –   ECFC_6 (extended connectivity fingerprints)
     –   Apol
     –   logD
     –   molecular weight
     –   number of aromatic rings
     –   number of hydrogen bond acceptors
     –   number of hydrogen bond donors
     –   number of rings
     –   number of rotatable bonds
     –   molecular polar surface area
     –   molecular surface area
     –   Wiener and Zagreb indices


Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010
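
As a rough picture of how a Laplacian-corrected Bayesian classifier weights features, here is a minimal sketch. It uses the commonly described correction in which a feature seen in N_F training compounds, A_F of them active, gets the weight log((A_F + 1) / (N_F × base rate + 1)), and a compound's score is the sum of the weights of its features. That this matches the Discovery Studio implementation in detail is an assumption. The function names are hypothetical, and the features are plain strings standing in for fingerprint bits such as ECFC_6 features.

```python
from collections import Counter
from math import log


def train_laplacian_bayes(samples):
    """Learn per-feature weights from (feature_set, is_active) pairs."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one training sample")
    base_rate = sum(1 for _, active in samples if active) / len(samples)
    seen = Counter()         # N_F: compounds containing each feature
    active_seen = Counter()  # A_F: active compounds containing each feature
    for features, active in samples:
        for f in set(features):
            seen[f] += 1
            if active:
                active_seen[f] += 1
    # The +1 terms pull weights of rarely seen features toward zero.
    return {f: log((active_seen[f] + 1) / (seen[f] * base_rate + 1))
            for f in seen}


def bayes_score(weights, features):
    """Sum the weights of a compound's features; unseen features add 0."""
    return sum(weights.get(f, 0.0) for f in set(features))
```

Positive scores point toward the active class (here, DILI-positive) and negative scores toward the inactive class. Turning a score into a yes/no call needs a cutoff chosen on the training data.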
• DILI Bayesian

   Features in DILI +                      Features in DILI -




Avoid: long aliphatic chains, phenols, ketones, diols, α-methyl styrene,
           conjugated structures, cyclohexenones, amides
Test set analysis




•    Compounds of most interest
      – well known hepatotoxic drugs (U.S. Food and Drug Administration
        Guidance for Industry “Drug-Induced Liver Injury: Premarketing Clinical
        Evaluation,” 2009), plus their less hepatotoxic comparators, if clinically
        available.
    Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010
What will be modeled

• Mitochondrial toxicity, hepatotoxicity
• More transporters – MATE, OATPs, BSEP, etc. – bigger datasets – driven by
  academia
• Screening centers – more data – more models
• Understanding differences between ligands for Nuclear Receptors
   – CAR vs PXR

• Models will become replacements for data as datasets expand (as has happened
  for logP)
• Toxicity Models used for Green Chemistry




             Chem Rev. 2010 Oct 13;110(10):5845-82
… Near Future
Wider use of models
New methods
Free tools – need good validation studies
Free databases – need to ensure structures / data are correct (DDT editorial Sept 2011)
Concepts perfected on desktop may migrate to apps, e.g. collaboration (MolSync+DropBox)
Selective sharing of models
Computational ADME/Tox mobile apps?
More efficient tools




  Williams et al., DDT, in press, 2011
  Bunin & Ekins, DDT 16: 643-645, 2011
Acknowledgments
•   University of Maryland
     – Lei Diao
     – James E. Polli
•   Pfizer
     –   Rishi Gupta
     –   Eric Gifford
     –   Ted Liston
     –   Chris Waller
•   Merck
     – Jim Xu

•   Antony J. Williams (RSC)

•   Accelrys
•   CDD

•   Email: ekinssean@yahoo.com

•   Slideshare: http://www.slideshare.net/ekinssean

•   Twitter: collabchem

•   Blog: http://www.collabchem.com/

•   Website: http://www.collaborations.com/CHEMISTRY.HTM

SOT short course on computational toxicology
