Molecular Geometry

A summary of molecular geometry, intended as background for writing a software module that describes molecular properties.

Key Principles

Mathematical Chemistry

wikipedia: Mathematical Chemistry

Topology

wikipedia: Topology (chemistry)

In chemistry, topology provides a convenient way of describing and predicting molecular structure within the constraints of three-dimensional (3-D) space. Given the determinants of chemical bonding and the chemical properties of the atoms, topology provides a model for explaining how the atoms' ethereal wave functions must fit together. Molecular topology is the part of mathematical chemistry that deals with the algebraic description of chemical compounds, allowing a unique and easy characterization of them.

Topology is insensitive to the details of a scalar field, and can often be determined using simplified calculations. Scalar fields such as electron density, Madelung field, covalent field and the electrostatic potential can be used to model topology.[1]

Each scalar field has its own distinctive topology and each provides different information about the nature of chemical bonding and structure. The analysis of these topologies, when combined with simple electrostatic theory and a few empirical observations, leads to a quantitative model of localized chemical bonding. In the process, the analysis provides insights into the nature of chemical bonding.

Applied topology explains how large molecules reach their final shapes and how biological molecules achieve their activity.

Circuit topology is a topological property of folded linear polymers. This notion has been applied to structural analysis of biomolecules such as proteins and RNAs.

Topological Index

wikipedia: Topological Index

In the fields of chemical graph theory, molecular topology, and mathematical chemistry, a topological index, also known as a connectivity index, is a type of molecular descriptor that is calculated from the molecular graph of a chemical compound.[1] Topological indices are numerical parameters of a graph that characterize its topology and are usually graph invariants. Topological indices are used, for example, in the development of quantitative structure-activity relationships (QSARs), in which the biological activity or other properties of molecules are correlated with their chemical structure.[2]
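As a concrete illustration of a topological index, the sketch below computes the Wiener index: the sum of shortest-path distances, counted in bonds, over all pairs of atoms in a hydrogen-suppressed molecular graph. The adjacency-dictionary representation and the name `wiener_index` are assumptions made for this example, and the graph is assumed to be connected.

```python
from collections import deque

def wiener_index(adjacency):
    """Sum of shortest-path lengths (in bonds) over all unordered atom pairs."""
    total = 0
    for source in adjacency:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in distance:
                    distance[neighbor] = distance[atom] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
    # Each pair was counted twice, once from each end.
    return total // 2

# Hydrogen-suppressed carbon skeletons; the atom numbering is arbitrary.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

The two isomers have the same formula but different indices, and renumbering the atoms does not change either value, which is what makes the index a graph invariant.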

Chemical Graph Theory

wikipedia: Chemical Graph Theory

Chemical graph theory is the topology branch of mathematical chemistry, which applies graph theory to the mathematical modelling of chemical phenomena.[1] The pioneers of chemical graph theory are Alexandru Balaban, Ante Graovac, Ivan Gutman, Haruo Hosoya, Milan Randić and Nenad Trinajstić[2] (also Harry Wiener and others). In 1988, it was reported that several hundred researchers worked in this area, producing about 500 articles annually. A number of monographs have been written in the area, including the two-volume comprehensive text by Trinajstić, Chemical Graph Theory, which summarized the field up to the mid-1980s.[3]

The adherents of the theory maintain that the properties of a chemical graph (i.e., a graph-theoretical representation of a molecule) give valuable insights into the chemical phenomena. The opponents contend that graphs play only a fringe role in chemical research.[4] One variant of the theory is the representation of materials as infinite Euclidean graphs, particularly crystals by periodic graphs.

Cheminformatics

wikipedia: Cheminformatics

Cheminformatics (also known as chemoinformatics, chemioinformatics and chemical informatics) is the use of computer and informational techniques applied to a range of problems in the field of chemistry. These in silico techniques are used, for example, in pharmaceutical companies in the process of drug discovery. These methods can also be used in chemical and allied industries in various other forms.

Cheminformatics combines the scientific fields of chemistry, computer science and information science, for example in the areas of topology, chemical graph theory, information retrieval and data mining in chemical space.[5][6][7][8] Cheminformatics can also be applied to data analysis in industries such as paper and pulp, dyes and allied industries.

Chemical Space

Concept: Theoretical Space vs. Concrete Space.

wikipedia: Chemical Space

Chemical space is a concept in cheminformatics referring to the property space spanned by all possible molecules and chemical compounds adhering to a given set of construction principles and boundary conditions.

Molecule Mining

The article lists roughly 20 different computational methods for mining molecules.

wikipedia: molecule mining

Molecular query methods: Warmr[12][13], AGM[14][15], PolyFARM[16], FSG[17][18], MolFea[19], MoFa/MoSS[20][21][22], Gaston[23], LAZAR[24], ParMol[25] (contains MoFa, FFSM, gSpan, and Gaston), optimized gSpan[26][27], SMIREP[28], DMax[29], SAm/AIm/RHC[30], AFGen[31], gRed[32], G-Hash[33]

Methods based on special architectures of neural networks: BPZ[34][35], ChemNet[36], CCS[37][38], MolNet[39], Graph machines[40]

This page describes mining for molecules. Since molecules may be represented by molecular graphs, this is strongly related to graph mining and structured data mining. The main problem is how to represent molecules while discriminating the data instances. One way to do this is with chemical similarity metrics, which have a long tradition in the field of cheminformatics.

Typical approaches to calculating chemical similarity use chemical fingerprints, but this loses the underlying information about the molecule's topology. Mining the molecular graphs directly avoids this problem. So does the inverse QSAR problem, which is preferable for vectorial mappings.
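The loss of topological information can be seen in a small sketch. It uses a deliberately crude, hypothetical fingerprint (the set of element pairs joined by a bond) and compares fingerprints with the Tanimoto coefficient, the size of the intersection divided by the size of the union. Real fingerprints are far richer; the names `bond_type_fingerprint` and `tanimoto` and the example molecules are assumptions for illustration only.

```python
def bond_type_fingerprint(atoms, bonds):
    """Toy fingerprint: the set of element pairs joined by a bond."""
    return {"-".join(sorted((atoms[i], atoms[j]))) for i, j in bonds}

def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two feature sets."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)

# Hydrogen-suppressed skeletons: atom index -> element, plus bonded index pairs.
n_butane = ({0: "C", 1: "C", 2: "C", 3: "C"}, [(0, 1), (1, 2), (2, 3)])
isobutane = ({0: "C", 1: "C", 2: "C", 3: "C"}, [(0, 1), (1, 2), (1, 3)])
ethanol = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])

fp_butane = bond_type_fingerprint(*n_butane)
fp_isobutane = bond_type_fingerprint(*isobutane)
fp_ethanol = bond_type_fingerprint(*ethanol)

print(tanimoto(fp_butane, fp_isobutane))  # 1.0
print(tanimoto(fp_butane, fp_ethanol))    # 0.5
```

n-Butane and isobutane score as identical here even though their graphs differ, which is the kind of information that mining the molecular graph directly would retain.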

Molecular Query Language

wikipedia: Molecular Query Language

The Molecular Query Language (MQL) was designed to allow more complex, problem-specific search methods in chemoinformatics. In contrast to the widely used SMARTS queries, MQL provides for the specification of spatial and physicochemical properties of atoms and bonds. Additionally, it can easily be extended to handle non-atom-based graphs, also known as "reduced feature" graphs. The query language is based on an extended Backus–Naur form (EBNF) using JavaCC.

SMARTS: SMILES arbitrary target specification

The article has a great deal of detail about SMARTS, which is apparently a contrasting method to MQL (Molecular Query Language).

wikipedia: SMILES arbitrary target specification

SMiles ARbitrary Target Specification (SMARTS) is a language for specifying substructural patterns in molecules. The SMARTS line notation is expressive and allows extremely precise and transparent substructural specification and atom typing.
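The idea behind a substructural pattern can be sketched without any SMARTS parsing: treat both the pattern and the molecule as labeled graphs and search for a mapping of pattern atoms onto molecule atoms that preserves elements and bonds. The code below is a minimal backtracking sketch under that assumption. It does not implement the SMARTS language, it ignores bond orders and hydrogens, and `find_substructure` and the example graphs are hypothetical names.

```python
def find_substructure(pattern, molecule):
    """Return one mapping pattern-atom -> molecule-atom, or None if no match.

    Both arguments are (elements, adjacency) pairs: a dict of atom index ->
    element symbol and a dict of atom index -> set of bonded atom indices.
    """
    p_elements, p_adj = pattern
    m_elements, m_adj = molecule
    order = list(p_elements)

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        p_atom = order[len(mapping)]
        for m_atom in m_elements:
            if m_atom in mapping.values():
                continue  # each molecule atom may be used only once
            if m_elements[m_atom] != p_elements[p_atom]:
                continue  # element labels must agree
            # Every pattern bond to an already-mapped atom must exist in the molecule.
            if all(mapping[q] in m_adj[m_atom] for q in p_adj[p_atom] if q in mapping):
                mapping[p_atom] = m_atom
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[p_atom]  # backtrack and try the next candidate
        return None

    return extend({})

# Pattern: a carbon bonded to an oxygen (hydrogens suppressed).
c_o = ({0: "C", 1: "O"}, {0: {1}, 1: {0}})
ethanol = ({0: "C", 1: "C", 2: "O"}, {0: {1}, 1: {0, 2}, 2: {1}})
propane = ({0: "C", 1: "C", 2: "C"}, {0: {1}, 1: {0, 2}, 2: {1}})

print(find_substructure(c_o, ethanol))  # {0: 1, 1: 2}
print(find_substructure(c_o, propane))  # None
```

A full SMARTS implementation adds atom and bond primitives, logical operators and ring descriptors on top of this kind of search.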

SMARTS is related to the SMILES line notation[1] that is used to encode molecular structures and, like SMILES, was originally developed by David Weininger and colleagues at Daylight Chemical Information Systems. The most comprehensive descriptions of the SMARTS language can be found in Daylight's SMARTS theory manual,[2] tutorial[3] and examples.[4] OpenEye Scientific Software has developed its own version of SMARTS, which differs from the original Daylight version in how the R descriptor (ring membership) is defined. The standard may require updates to accommodate the new elements named in 2016.[5]

Transferability

Both bond length and bond angle are transferable.

The opposite of transferable is conserved, e.g. standard atomic weight.

From: wikipedia: Transferability (chemistry)

Transferability, in chemistry, is the assumption that a chemical property that is associated with an atom or a functional group in a molecule will have a similar (but not identical) value in a variety of different circumstances.
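For a software module, transferability suggests a simple lookup: store one typical value per bond type and treat it as an estimate that holds, within a tolerance, across different molecules. The sketch below assumes approximate textbook bond lengths in angstroms; the table contents, the function names and the 0.05 angstrom tolerance are illustrative assumptions, not values taken from the article.

```python
# Approximate textbook bond lengths in angstroms, keyed by
# (element, element, bond order) with the elements in sorted order.
# Illustrative assumptions only, not data from the source.
TYPICAL_BOND_LENGTH = {
    ("C", "C", 1): 1.54,
    ("C", "C", 2): 1.34,
    ("C", "C", 3): 1.20,
    ("C", "H", 1): 1.09,
    ("H", "O", 1): 0.96,
}

def estimate_bond_length(element_a, element_b, order=1):
    """Look up a transferable estimate; the order of the two elements does not matter."""
    first, second = sorted((element_a, element_b))
    return TYPICAL_BOND_LENGTH[(first, second, order)]

def matches_estimate(observed, element_a, element_b, order=1, tolerance=0.05):
    """True if an observed length lies within `tolerance` angstroms of the estimate."""
    return abs(observed - estimate_bond_length(element_a, element_b, order)) <= tolerance

print(estimate_bond_length("O", "H"))     # 0.96
print(matches_estimate(1.53, "C", "C"))   # True
print(matches_estimate(1.40, "C", "C"))   # False
```

The last call shows the "similar but not identical" caveat: an aromatic carbon-carbon distance of about 1.40 angstroms falls outside the single-bond estimate, so the bond type used as the key has to be chosen with care.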

My ideas about constraints

You could use experimental bond angle ranges predictively.
How much synthesis/prediction can really be done?
How often do symmetric molecules 'flip flop', or are they really static? What effect from/on Brownian motion?
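One way the bond angle idea could look in code: compute the angle at a central atom from 3-D coordinates, then test it against an allowed range. The sketch below uses a hypothetical water geometry built from an O-H length of 0.96 angstroms and an H-O-H angle of 104.5 degrees; the range of 100 to 110 degrees is a placeholder, not an experimental range, and the function names are assumptions.

```python
import math

def bond_angle(a, center, b):
    """Angle a-center-b in degrees, from 3-D Cartesian coordinates."""
    u = [p - q for p, q in zip(a, center)]
    v = [p - q for p, q in zip(b, center)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def within_range(angle, low, high):
    """True if the angle lies inside the closed interval [low, high]."""
    return low <= angle <= high

# Hypothetical water geometry: O at the origin, second hydrogen rotated
# by 104.5 degrees in the xy-plane.
theta = math.radians(104.5)
oxygen = (0.0, 0.0, 0.0)
hydrogen_1 = (0.96, 0.0, 0.0)
hydrogen_2 = (0.96 * math.cos(theta), 0.96 * math.sin(theta), 0.0)

angle = bond_angle(hydrogen_1, oxygen, hydrogen_2)
print(round(angle, 1))                    # 104.5
print(within_range(angle, 100.0, 110.0))  # True
```

Used predictively, the same range check could reject candidate geometries whose angles fall outside what has been observed for that atom triple.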

Resources

wikipedia: Molecular Geometry