Google Tech Talks November 2, 2006
ABSTRACT The central token of information in chemistry is a chemical substance, an entity that can often be represented as a well-defined chemical structure. With InChI we have a means of representing this entity as a unique string of characters; it is otherwise represented by a variety of 2-D and 3-D chemical drawings, ‘connection tables’ and synonyms. An InChI therefore represents a discrete physical entity, with which an array of chemical properties and data is associated. NIST has long been involved in disseminating chemical reference data associated with such discrete substances, and an InChI is the key index to this data. Many other types of data and information are also naturally tied to it, including biological information, commercial availability, toxicity, drug effectiveness and so forth.

Because of the diversity of properties and interactions of a chemical substance, effective location of chemical information generally requires further qualifiers, which may be represented coarsely as a keyword, but more precisely using a controlled vocabulary. There are no simple separations between the information sought by different disciplines and for different objectives. However, reference data may be organized according to the disciplines most directly involved in making the measurements:

- isolated substance – mass, infrared and NMR spectra; physical properties
- substance in the context of others – solubility, affinity, ..
- properties of a mixture containing the substance

The desired data can be a number, vector or image, usually associated with dimensions and links to source information. In some cases, this information is converted to a curve or diagram for use by an expert and may be further processed by specialized software. In other cases, a single numerical value is the target. Also, some complexities of structure that must be dealt with in practical searching are represented in InChI, but must be decoded before they can be used in a search.
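
As a rough illustration of what "decoding" an InChI for search might involve, the sketch below splits an identifier into its slash-separated layers and uses the full string as a key into a small index. It is a minimal sketch, not the method presented in the talk: the function name `split_inchi_layers`, the `reference_index` dictionary and its contents are hypothetical, and the example uses a standard ("1S") InChI for ethanol, a format that postdates this talk.

```python
def split_inchi_layers(inchi: str) -> dict:
    """Split an InChI string into version, formula, and prefixed layers.

    Hypothetical helper for illustration only. Layers are separated by "/";
    the formula layer has no prefix letter, while later layers start with a
    lowercase letter (for example "c" for connectivity, "h" for hydrogens).
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    rest = parts[1:]
    # The formula layer, when present, does not begin with a lowercase letter.
    if rest and rest[0] and not rest[0][0].islower():
        layers["formula"] = rest[0]
        rest = rest[1:]
    for part in rest:
        if part:
            # Keep the first occurrence if a prefix letter repeats.
            layers.setdefault(part[0], part[1:])
    return layers


# Hypothetical index: the InChI string itself is the key to associated data.
ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
reference_index = {ethanol: {"name": "ethanol"}}

# A coarse search on one decoded layer: every entry with a given formula.
matches = [
    record["name"]
    for key, record in reference_index.items()
    if split_inchi_layers(key).get("formula") == "C2H6O"
]
```

Searching on a single decoded layer, as in the last step, is deliberately coarse: isomers share a formula layer, so a practical search would also compare the connectivity and later layers.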