229th ACS National Meeting, in San Diego, CA, March 13-17, 2005

COMP 316

Automatic discovery and annotation of organic chemical names in patents

James W Cooper (1), Stephen Boyer (2), Alex Nevidomsky (3), and Anni R Coden (1). (1) Text Analytics, IBM T J Watson Research Center, PO Box 704, Yorktown Heights, NY 10598, (2) Life Sciences, IBM Corporation, 18710 Vista de Almaden, San Jose, CA 95120, (3) Languageware, IBM Ireland, UNIT 12, AIRWAYS INDUSTL ESTATE, CLOGHRAN COUNTY, Dublin 20, Ireland
We have designed a series of algorithms that recognize and annotate organic chemical names in technical documents, and we have applied the resulting system to one year of US patents. The system is primarily rule-based and uses only two small dictionaries. Once the names have been extracted, any of several commercial products can convert them to SMILES strings, which are then loaded into a database. This database supports searching the patents by chemical substructure rather than by chemical name, which provides a much more thorough search of the compounds mentioned in the patents. We will present evaluation data and demonstrate the search system in action.
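
To give a rough sense of what a rule-based recognizer backed by small dictionaries might look like, here is a minimal Python sketch. It is not the system described in this abstract: the two word lists (FRAGMENTS, SUFFIXES), the matching rules, and the function names looks_chemical and annotate are all hypothetical, chosen only for illustration. Conversion of the names to SMILES strings and the substructure search are not shown.

```python
import re

# Hypothetical mini-dictionaries; the abstract does not say what the real ones contain.
FRAGMENTS = ("meth", "eth", "prop", "but", "benz", "phen",
             "chlor", "amin", "hydroxy", "acet")
SUFFIXES = ("ane", "ene", "yne", "ol", "one", "ate", "ide", "ine", "yl", "oic")

# Tokens may contain digits, commas, hyphens, and parentheses so that
# locants such as "1,2-" stay attached to the name they belong to.
TOKEN = re.compile(r"[A-Za-z0-9,\-\(\)']+")
LOCANT = re.compile(r"\d+(,\d+)*-")


def looks_chemical(token):
    """Assumed rule: a known fragment plus either a known suffix or a leading locant."""
    word = token.lower().strip(",-()'")
    has_fragment = any(f in word for f in FRAGMENTS)
    has_suffix = word.endswith(SUFFIXES)
    has_locant = LOCANT.match(token) is not None
    return has_fragment and (has_suffix or has_locant)


def annotate(text):
    """Return (start, end, name) spans for candidate chemical names in text."""
    spans = []
    for m in TOKEN.finditer(text):
        if not looks_chemical(m.group()):
            continue
        # Merge with the previous span when exactly one space separates them,
        # so that two-word names such as "ethyl acetate" become one annotation.
        if spans and text[spans[-1][1]:m.start()] == " ":
            spans[-1] = (spans[-1][0], m.end())
        else:
            spans.append((m.start(), m.end()))
    return [(s, e, text[s:e]) for s, e in spans]


if __name__ == "__main__":
    sample = "A solution of 2-chloroethanol in benzene was added to ethyl acetate."
    for start, end, name in annotate(sample):
        print(start, end, name)
```

Rules this simple both miss names (for example "toluene", which contains none of the listed fragments) and flag ordinary words (for example "examine"), so a realistic system would need considerably more careful rules and an evaluation of the kind the abstract mentions.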

© 2005, American Chemical Society. All rights reserved.