T. Grear, D. Jacobs, C. Avery
The University of North Carolina at Charlotte,
United States
Keywords: Drug Discovery; Molecular Design; Function Recognition; Classification; Discriminant Analysis
Summary:
A novel machine learning (ML) method for discriminant analysis, called Supervised Projective Learning for Orthogonal Congruences (SPLOC), is described in the context of identifying functional dynamics within a molecule. SPLOC is benchmarked on synthetic data against current ML methods such as Support Vector Machines, Orthogonal Projections to Latent Structures, Quadratic Discriminant Analysis, and Linear Discriminant Analysis. The synthetic data consists of molecular dynamics simulation trajectories of 24 small molecules, with features that define functionality as well as features added to create decoy molecules. The question posed is whether dynamical motions that distinguish functional from non-functional molecules can be identified from the simulation trajectories. The procedure and protocols followed in this work parallel the rapidly growing quest in science for an integrated analysis scheme that combines data from computational models and experiment. A grand challenge for ML in bioinformatics is to recognize the causal connections between the microscopic molecular properties of a system and its macroscopic outcomes and consequences. Once the critical features that make a system functional are identified, computational modeling and classification can be employed in a bottom-up approach to rational experimental design. Posed as an inverse problem, the task becomes one of optimizing how molecular properties are altered to enhance functional attributes and mitigate undesirable ones. SPLOC addresses two primary challenges for ML: reducing the dimensionality of data without losing critical information carried by features that are hidden among many degrees of freedom irrelevant to the function of interest, and identifying critical features that do not separate into well-defined categories.
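The problem of one informative feature hidden among many irrelevant degrees of freedom can be illustrated with a toy example. The sketch below is not SPLOC and does not use molecular dynamics data: it assumes Gaussian-distributed features, a single informative feature, and a simple per-feature Fisher score as the ranking criterion. The function names `make_synthetic` and `fisher_scores`, and all parameter values, are hypothetical choices made for illustration only.

```python
import random
import statistics


def make_synthetic(n_per_class=200, n_decoy=20, shift=1.5, seed=0):
    """Build two classes of samples (assumed Gaussian, illustration only).

    Feature 0 is shifted between the classes and carries the signal;
    the remaining n_decoy features are identical noise in both classes.
    """
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = rng.gauss(shift * label, 1.0)
            decoys = [rng.gauss(0.0, 1.0) for _ in range(n_decoy)]
            samples.append([informative] + decoys)
            labels.append(label)
    return samples, labels


def fisher_scores(samples, labels):
    """Per-feature Fisher score: (gap between class means)^2 / (sum of class variances)."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        class0 = [row[j] for row, y in zip(samples, labels) if y == 0]
        class1 = [row[j] for row, y in zip(samples, labels) if y == 1]
        gap = statistics.fmean(class0) - statistics.fmean(class1)
        spread = statistics.variance(class0) + statistics.variance(class1)
        scores.append(gap * gap / spread)
    return scores


samples, labels = make_synthetic()
scores = fisher_scores(samples, labels)
# Index of the feature that best separates the two classes.
best = max(range(len(scores)), key=scores.__getitem__)
print("most discriminating feature index:", best)
```

A univariate score like this only finds features that separate the classes through their means. It would miss features that differ in variance or in correlated combinations, which is the harder case the summary describes.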
Positive results obtained on the synthetic data suggest SPLOC is well suited for a wide array of applications, such as protein engineering using iterative site-directed mutagenesis, where successes and failures train an underlying neural network. Building on these results, we have also applied SPLOC to molecular dynamics simulations of three mutations of beta-lactamase. Beta-lactamase enzymes are the most common cause of resistance to beta-lactam antibiotics such as penicillin. Several point mutations to the wild-type enzyme can confer a wider resistance profile, resulting in Extended Spectrum resistance, which poses a major challenge to antibiotic drug development. Using SPLOC, we identify several motions that differ between the wild-type and Extended Spectrum enzymes.