COMP 316
Automatic discovery and annotation of organic chemical names in patents
James W Cooper (1), Stephen Boyer (2), Alex Nevidomsky (3), and Anni R Coden (1). (1) Text Analytics, IBM T J Watson Research Center, PO Box 704, Yorktown Heights, NY 10598; (2) Life Sciences, IBM Corporation, 18710 Vista de Almaden, San Jose, CA 95120; (3) Languageware, IBM Ireland, Unit 12, Airways Industrial Estate, Cloghran County, Dublin 20, Ireland
We have designed a series of algorithms that recognize and annotate organic chemical names in technical documents, and have applied the resulting system to one year of US patents. The system is primarily rule-based and uses only two small dictionaries. Once the names are extracted, any of several commercial products can convert them to SMILES strings, which are then loaded into a database. The database supports searching the patents by chemical substructure rather than by chemical name, which gives a much more thorough search of the compounds mentioned in the patents. We will present evaluation data and demonstrate the search system in action.
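
To make the idea of a rule-based recognizer backed by two small dictionaries concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not say what the two dictionaries contain, so the fragment list, the common-word exclusion list, the locant and suffix rules, and the function names below are all assumptions made for illustration. The later steps (name-to-SMILES conversion and substructure search) depend on external tools and are not sketched.

```python
import re
from typing import List

# Hypothetical dictionary 1: fragments that often occur inside organic names.
CHEM_FRAGMENTS = ("meth", "eth", "prop", "phen", "benz", "chlor",
                  "brom", "nitro", "amino", "hydroxy", "acet")

# Hypothetical dictionary 2: ordinary words that would otherwise pass the rules.
COMMON_WORDS = {"appropriate", "hyphenate", "whetstone"}

# Assumed rules: a leading locant such as "2-" or "2,4-", or a typical suffix.
LOCANT = re.compile(r"^\d+(,\d+)*-")
SUFFIX = re.compile(r"(ane|ene|yne|ol|one|ate|ide|ine|oic|yl)$")


def is_chemical_token(token: str) -> bool:
    """Return True if a single token looks like an organic chemical name."""
    word = token.lower()
    if word in COMMON_WORDS:
        return False
    if not any(frag in word for frag in CHEM_FRAGMENTS):
        return False
    # A fragment alone is weak evidence; require a locant or a chemical suffix.
    return bool(LOCANT.match(word)) or bool(SUFFIX.search(word))


def find_chemical_names(text: str) -> List[str]:
    """Split on whitespace, trim surrounding punctuation, keep matching tokens."""
    names = []
    for raw in text.split():
        token = raw.strip(".,;:()")
        if token and is_chemical_token(token):
            names.append(token)
    return names


sample = ("An appropriate amount of 2-chloro-4-nitrophenol was dissolved "
          "in ethanol, then 4-aminobenzaldehyde was added.")
print(find_chemical_names(sample))
# → ['2-chloro-4-nitrophenol', 'ethanol', '4-aminobenzaldehyde']
```

This toy version handles only single-token names and will miss multiword names and produce false positives on real patent text; it shows only how a short fragment list and a short exclusion list can drive a rule-based pass.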