
Ann. occup. Hyg., Vol. 46, No. 2, pp. 261-268, 2002
© 2002 British Occupational Hygiene Society
Published by Oxford University Press


Article

Efficiency of 22 Online Databases in the Search for Physicochemical, Toxicological and Ecotoxicological Information on Chemicals

MICHEL GUERBET* and GAETAN GUYODO

Rouen University, Faculty of Pharmacy, Laboratory of Toxicology, 22 boulevard Gambetta, F-76183 Rouen Cedex 1, France

Received 9 July 2001; in final form 12 September 2001.


    ABSTRACT
 
The objective of this study was to evaluate the efficiency of 22 free online databases that could be used for an exhaustive search of physicochemical, toxicological and/or ecotoxicological information about various chemicals. Twenty-two databases with free access on the Internet were referenced. We then selected 27 major physicochemical, toxicological and ecotoxicological criteria and 14 compounds belonging to seven different chemical classes, and used them to interrogate all the databases. Two indices were calculated to evaluate the efficiency of each database, one taking its specialization into account and the other not. On average, each database ‘knew’ only slightly more than half of the 14 chemicals, and the quantity of information provided differs greatly from one database to another; most are poorly documented. Two categories clearly emerge: specialized and non-specialized databases. HSDB is the most efficient general database and should be searched first, because it is well documented for most of the 27 criteria. However, some specialized databases (e.g. EXTOXNET, SOLVEDB) must be searched afterwards to find additional information.

Keywords: database; physicochemical; toxicological; ecotoxicological


    INTRODUCTION
 
In the field of environmental and industrial safety, it is necessary to be able to quickly obtain reliable and complete data on very different chemicals to determine their physicochemical properties, toxicity and ecotoxicity, behavior in the environment and other characteristics. Many databases are accessible free of charge on the Internet and so can be easily consulted (Dassler, 2001). This multiplicity of databases is sometimes puzzling for users who are accustomed to one or a few familiar databases and who miss others which could be much more useful for their query (Cox et al., 1992; Voigt and Breuggermann, 1995; Ludl et al., 1996; Gehanno et al., 1998). It is indeed difficult to judge the efficiency of various databases, because their criteria often differ, some being very general and others much more specialized. The descriptions of databases are usually insufficient to evaluate their efficiency, and it is not easy for the user to find what is sought (South, 2001). Only a concrete search for information in a database allows the user to judge its efficiency. The objective of our study is to list a series of databases with free access on the Internet and to compare them objectively in order to identify one or more effective databases for supplying toxicological and/or ecotoxicological information. It should be noted, however, that the objective of this work is not, strictly speaking, to validate the data supplied in these databases, because the validation of data is a difficult task which can only be achieved by a group of experts. Moreover, it is not possible to compare the full content of the 22 databases we selected for this study. We therefore decided to choose 14 chemicals belonging to various chemical classes and to interrogate the databases about a series of 27 major physicochemical, toxicological and ecotoxicological criteria. From the results of this inquiry, two indices are calculated to evaluate the efficiency of the databases.


    MATERIAL AND METHODS
 
Referenced databases
For this study, we chose 22 databases with free access on the Internet (Table 1). All are factual databases, specializing in toxicology and/or ecotoxicology, and supervised by public or research institutions. The list is not exhaustive, but it takes into account some major databases, most of which are maintained by US organizations. There are many electronic sources of information available on the Internet and we are conscious that some major databases could have been omitted from this study. It would have been interesting to include other databases, especially from European organizations, but unfortunately there is not the same openness as is found in the USA with its Freedom of Information legislation. Finally, databases regularly appear and disappear, so this list would need to be updated continuously.


Table 1. List of the 22 tested online databases
 
Global evaluation
Because these databases can be useful for different users, primarily hygienists and toxicologists but also safety practitioners, information specialists, trade union staff and others, it would be interesting to have an overall evaluation of the search facilities and other features that affect the quality of service in use. However, this global evaluation is difficult because it is based mainly on more-or-less subjective criteria.

The first criterion which is useful to know is the organization, government department or agency from which the data emanate, although this is not a guarantee of accuracy.

The number of chemicals referenced in each database is another early criterion which is usually noted in a quick evaluation. However, the quantity and quality of information provided by a database is not necessarily related to this number. This is why we have evaluated and compared the databases after searching each of them for selected criteria and chemicals.

The search system was rated as good or poor. This subjective assessment takes into account the search facilities (chemical names, synonyms, CAS Nos, browse index, etc.) available to an occasional user.

The presence of bibliographic references in a database could be a useful criterion for taking the information query further.

Most databases are constantly updated, but some are not. This criterion is not always easy to evaluate because, unfortunately, most databases do not specify when or if they update their information. A continuous study with regular querying of each database over a long period would be necessary to confirm this evaluation.

The last criterion noted was ‘user friendliness’. This is certainly a very subjective criterion, taking into account the ease of query and use and the clarity of result presentation. The result could differ from one user to another, so this criterion must be interpreted carefully.

Searched criteria and chemicals
To compare the 22 databases, we selected 27 criteria which appear to be the most important environmental data for risk assessment. The selected criteria were:

physicochemical (11 criteria): molecular weight, decomposition temperature, density, vapor pressure, water solubility, octanol/water partition coefficient (log Pow), Henry constant, biodegradability, Koc, environmental half-life and bioconcentration factor (BCF);

toxicological (eight criteria): acute mammal toxicity, chronic mammal toxicity, genotoxicity, carcinogenicity, teratology, irritation, public health data (ADI, etc.) and industrial hygiene data (IDLH, TLV, etc.);

ecotoxicological (eight criteria): aquatic toxicity (algae, crustaceans, fish), atmospheric toxicity (insects, birds) and ground toxicity (bacteria, molluscs, plants).

It is not possible to interrogate all 22 databases about all the products that they contain. The selection of test chemicals is therefore essential and must allow a sensitive comparison of the efficiency of the databases. For such a comparison, the usual way is to draw up a series of chemicals and to query the databases for them. This random approach did not appear to be the most effective in our study, because about half of the databases are of general interest while the others are specialized in a particular field (e.g. genotoxicity or ecotoxicity) or in a category of chemicals (e.g. solvents or pesticides). A rational choice of test chemicals was therefore needed to limit the risk of obtaining no positive response from specialized databases which, nevertheless, give interesting results in some particular cases. Our investigation of the 22 databases was based on 14 compounds belonging to seven chemical classes (Table 2). Ten chemicals in the first five categories were selected to cover various industrial areas, with two pesticides (atrazine and malathion), two metallic salts (cadmium and tin), two solvents (chlorinated and non-chlorinated), two chemical synthesis compounds and two drugs. The sixth category, denoted ‘chemical family’, included two groups of chemical compounds, organic (xylenes) and mineral (mercury compounds), which are often referenced in bibliographies under general and inexactly defined terms and include mixtures or similar products. The last category of products did not correspond to a precise class of compounds. It included products which were selected at random to assess how relevant the databases are in all situations.


Table 2. List of the 14 selected test chemicals
 
Comparison of the 22 databases
The 22 referenced databases were interrogated about the 14 test chemicals using the 27 selected criteria. The presence (1) or the absence (0) of information for every criterion was noted for every database in a contingency table.

In most studies, the efficiency of a database is estimated by calculating the percentage of information found. This measure has the major drawback of penalizing specialized databases in comparison with non-specialized databases. That is why we used two different and complementary indices to evaluate the efficiency of databases. We called these two indices ‘quality of the information’ (Qi index) and ‘power of the information’ (Pi index). The Qi index takes into account the number of test products present in each of the 22 databases, whereas the Pi index does not.

The Qi index corresponds to the proportion of criteria documented for the chemicals which are found in a database:

Qi = no. of documented criteria/(27 × no. of products present).

The Qi index compares databases independently of the number of products present in each database. This index therefore allows us to compare specialized and non-specialized databases on an equal footing.

The Pi index corresponds to the proportion of criteria documented for all 14 test chemicals. It corresponds to the classic measure used in comparative studies:

Pi = no. of documented criteria/(27 × 14).

For example, a database with information about nine products, for which 180 data are found on the 27 criteria, gives Qi = 180/(27 × 9) = 0.74 and Pi = 180/(27 × 14) = 0.48. The Qi and Pi indices are close to one another in their principles of calculation; the only difference is whether the number of referenced products in a database is taken into account. The Pi index is always less than or equal to the Qi index, and for both indices the closer the value is to one, the more successful is the database. These two indices are useful because they allow an overall assessment of each of the 22 listed databases. Four cases can arise, as follows.

Low Qi and low Pi: poorly documented databases, referencing only a small number of chemicals for which they provide little information. This category of database is, a priori, not very interesting or useful.

High Qi and low Pi: specialized databases which are well documented but on a small number of chemicals (e.g. pesticides, solvents, etc.). These databases are interesting as a second step to provide precise information. However, these databases cannot be considered as universal sources of information.

Low Qi and high Pi: databases referencing many chemicals on which they provide limited information. These databases are of limited interest because they are poor sources of information. They are, however, simple to use and are interesting in the early stages of searching for data.

High Qi and high Pi: databases referencing many chemicals with many documented criteria. These are powerful databases and are to be recommended.
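The two indices and the four cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy table and the two cut-off values are hypothetical and do not come from the study, which gives no numerical boundary between "low" and "high". Because Pi can never exceed Qi, a single shared cut-off would leave the "low Qi and high Pi" case empty, so the sketch uses a lower cut-off for Pi than for Qi.

```python
N_CRITERIA = 27        # criteria searched per chemical
N_TEST_CHEMICALS = 14  # size of the test set


def qi_pi(table):
    """Return (Qi, Pi) for one database.

    `table` maps each test chemical found in the database to a list of
    27 presence flags (1 = criterion documented, 0 = not documented).
    """
    documented = sum(sum(flags) for flags in table.values())
    present = len(table)
    qi = documented / (N_CRITERIA * present) if present else 0.0
    pi = documented / (N_CRITERIA * N_TEST_CHEMICALS)
    return qi, pi


def classify(qi, pi, qi_cut=0.5, pi_cut=0.25):
    """Assign one of the four cases; both cut-offs are assumptions."""
    qi_label = "high Qi" if qi >= qi_cut else "low Qi"
    pi_label = "high Pi" if pi >= pi_cut else "low Pi"
    return f"{qi_label}, {pi_label}"


# Toy table reproducing the worked example: 9 products, 180 documented data.
example = {f"chemical_{i}": [1] * 20 + [0] * 7 for i in range(9)}
qi, pi = qi_pi(example)
print(round(qi, 2), round(pi, 2), classify(qi, pi))
```

With the toy table, the rounded values match the worked example in the text (0.74 and 0.48); the label returned by `classify` depends entirely on the assumed cut-offs.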


    RESULTS AND DISCUSSION
 
Global evaluation
The global evaluation of the databases is given in Table 3. Most of the 22 databases have a search system rated as good. However, four databases (CPDB, HCD, IARC, TOX-IN) did not include a browse index or a search system based on chemical name, synonym and CAS No., which would be useful for the user.


Table 3. Global evaluation of the 22 online databases
 
Nine of the databases provided bibliographic references. This is not an essential criterion because this study was based on factual databases. However, bibliographic references can help to validate data and to extend the search for information.

Only five databases (EXTOXNET, ICSC, NTP, OSHAPEL, SOLVEDB) seemed to us not to be regularly updated. This important criterion is, however, difficult to assess with certainty, because the update status is not always clearly stated in the database introduction and updates may be only partial.

The last criterion noted in the table is ‘user friendliness’. This is the most subjective parameter of this global evaluation. Ten databases were rated as good, while 12 appeared not very user friendly in terms of search system or presentation of results. However, this criterion depends on the user’s opinion, which might change with training.

Searched criteria and chemicals
The 22 databases have been searched on the 27 criteria for the 14 test chemicals. Each positive response was scored 1 to obtain a global result table for the 22 databases (Table 4).


Table 4. Results of information search for the 27 criteria and 14 test chemicals with the 22 databases
 
It is interesting to compare the number of test chemicals found in each database. None of the 22 databases documented all 14 chemicals. The best score, 13 positive answers, was obtained with HSDB, CCRIS and IPCS-INCHEM. HSDB and IPCS-INCHEM are non-specialized databases which have information about almost all criteria, whereas CCRIS is a database specialized in genotoxicity and carcinogenicity. At the opposite extreme, the weakest results were obtained with EXTOXNET, NIOSHDOE and TOX-IN, which documented only two products.

An average of 7.2 documented chemicals per database was obtained, which is only a little more than half (51.4%) of the test compounds. This confirms that it is impossible to find a universal database able to provide all toxicological and/or ecotoxicological and/or physicochemical information on the 14 test compounds, which cover different chemical groups. Moreover, the information available differs from one chemical to another. For example, data concerning algae may be available for one chemical, whereas information concerning plants is not available for this chemical but is available for another. We have chosen not to take account of this fact because it would have required a table with three axes for the 22 databases, 27 criteria and 14 chemicals. This kind of table could be interesting, but it was not possible to exploit it in this study, whose main objective was to compare databases.
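As a rough illustration of the three-axis table mentioned above, the following Python sketch stores one list of 27 presence flags per database and chemical, then derives the number of chemicals documented per database and collapses the chemical axis. The database names and flag values are invented toy data, and treating a chemical as "documented" when at least one of its flags is 1 is an assumption of this sketch, not a rule stated in the study.

```python
from statistics import mean

N_TEST_CHEMICALS = 14


def chemicals_per_database(cube):
    """Count, for each database, the chemicals with at least one documented criterion."""
    return {
        db: sum(1 for flags in chemicals.values() if any(flags))
        for db, chemicals in cube.items()
    }


def collapse_chemical_axis(cube):
    """For each database and criterion, count how many chemicals are documented."""
    return {
        db: [sum(column) for column in zip(*chemicals.values())]
        for db, chemicals in cube.items()
    }


# Toy data: cube[database][chemical] -> 27 presence flags (hypothetical values).
toy = {
    "DB_A": {"atrazine": [1] * 27, "benzene": [1, 0] + [0] * 25},
    "DB_B": {"atrazine": [0] * 27, "benzene": [0, 1] + [0] * 25},
}

counts = chemicals_per_database(toy)
average = mean(list(counts.values()))
print(counts, average)

# The share reported in the text: 7.2 chemicals out of 14.
print(round(7.2 / N_TEST_CHEMICALS * 100, 1))
```

Collapsing the chemical axis gives the kind of database-by-criterion summary used for the comparison, at the cost of losing which chemical each datum belongs to.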

We also analysed the available information in the 22 databases searched, by calculating Qi and Pi indices for each of them.

Quality of the information (Qi index)
The Qi index is a measure of the quality of information in each database for the referenced chemicals. The closer the Qi index is to one, the more informative is the database in all physicochemical, toxicological and ecotoxicological domains. It does not take into account the number of documented chemicals, so that two databases can be scored with an equal Qi index even when they do not have an equal number of referenced chemicals.

Only four databases obtained a Qi index >0.50, and the average Qi index for all 22 databases was ~0.27, which is quite low (Fig. 1). The four databases which produced the best results (Qi > 0.50) were ATSDR, EXTOXNET, HSDB and SOLVEDB. ATSDR (Agency for Toxic Substances and Disease Registry) is a database of the US Department of Health and Human Services. This database achieved a very good Qi index because it is well documented for physicochemical and toxicological criteria. Unfortunately, ATSDR covers only 275 chemicals and only three test compounds (benzene, mercury and xylenes) were found in it, which is not a good result. Because these three chemicals are well documented for physicochemistry and toxicology, the Qi index is high, but it does not reflect the real efficiency of the database. The same remark applies to EXTOXNET (Extension TOXicology NETwork), which also obtained a very good Qi index. EXTOXNET is a database maintained by the University of California–Davis, Oregon State University, Michigan State University, Cornell University and the University of Idaho. It is a specialized database which provides much physicochemical, toxicological and ecotoxicological information, but on only ~200 pesticides. In the same way, SOLVEDB is a specialized database for ~300 commercially available solvents. HSDB (Hazardous Substances Data Bank) is a database on the National Library of Medicine’s (NLM) Toxicology Data Network (TOXNET). It is a general database which covers physicochemistry, toxicology and ecotoxicology for ~4500 potentially hazardous chemicals. It is enhanced with information on human exposure, industrial hygiene, emergency handling procedures, environmental fate, regulatory requirements and related areas. In our query, HSDB documented 13 test chemicals and was the only database in which all 27 criteria were covered, although not every criterion was documented for every chemical.



Fig. 1. Global comparison of the 22 databases according to their Qi and Pi indices.

 
As previously stated, the utility of the Qi index lies not in the number of documented chemicals (this is included in the Pi index), but in the estimation of the efficiency of databases for each of the three classes of criteria. It is therefore difficult to compare specialized and non-specialized databases with only a global Qi index. The NIOSH databases clearly exclude physicochemical and ecotoxicological information, but they are quite complete for toxicological data. The opposite is true of the EPA databases, which are clearly specialized in ecotoxicological information. Because a global Qi index could give a biased estimate of the real quality of the tested databases, we decided to determine a Qi index for each of the three categories of criteria (physicochemical, toxicological and ecotoxicological). It appears that only EXTOXNET, HSDB and IPCS-INCHEM provide data in all three categories (Fig. 2). Most of the tested databases are documented for toxicology and some of them (IRIS, OSHAPEL, IARC, etc.) are clearly specialized in information on human health effects. On the other hand, ECOTOX is an EPA database that contains selected ecotoxicity information. This differential approach allows the user to select the most appropriate database for the information required. However, it is necessary to remember that the Qi index does not take into account the number of chemicals found. This is why some databases, such as ATSDR and EXTOXNET, appear to perform very well. In fact, these databases are well documented, but only for very few of the 14 searched chemicals. They could be useful, but there is a high probability that the chemical being searched for will not be found because of the small number of referenced chemicals. This is why our study was completed by determining a Pi index, which takes this parameter into account.
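A per-category Qi index of the kind described above could be computed as in the following Python sketch. It assumes that the 27 presence flags for each chemical are stored in the order physicochemical (11), toxicological (8), ecotoxicological (8); that ordering, the function name and the toy data are assumptions made for illustration only.

```python
# Assumed column layout of the 27 flags: 11 + 8 + 8 criteria.
CATEGORY_SLICES = {
    "physicochemical": slice(0, 11),
    "toxicological": slice(11, 19),
    "ecotoxicological": slice(19, 27),
}


def qi_by_category(table):
    """Return one Qi value per category for a single database.

    `table` maps each test chemical found in the database to its
    27 presence flags (1 = documented, 0 = not documented).
    """
    present = len(table)
    result = {}
    for name, columns in CATEGORY_SLICES.items():
        n_criteria = columns.stop - columns.start
        documented = sum(sum(flags[columns]) for flags in table.values())
        result[name] = documented / (n_criteria * present) if present else 0.0
    return result


# Toy database documenting toxicological criteria only (hypothetical values).
toxicology_only = {
    "chemical_1": [0] * 11 + [1] * 8 + [0] * 8,
    "chemical_2": [0] * 11 + [1] * 4 + [0] * 4 + [0] * 8,
}
print(qi_by_category(toxicology_only))
```

For the toy database, the toxicological value is 12/16 = 0.75 and the other two categories are zero, which is the pattern the text describes for databases specialized in human health effects.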



Fig. 2. Global comparison of the 22 databases according to their Qi indices for physicochemical, toxicological and ecotoxicological criteria.

 
Power of the information (Pi index)
The Pi index corresponds to the proportion of the 27 criteria documented in a database over the 14 test chemicals (Fig. 1). HSDB is the only database which obtains a score >0.50. As previously noted, HSDB is a general database able to supply complete information on a wide range of ~4500 chemicals. Next comes the IPCS-INCHEM database, which contains data from various sources on ~5500 major chemicals. Third is NTP (National Toxicology Program), a database on ~2000 chemicals, which is well documented for 11 products but only for physicochemical and toxicological data. Next come ICSC (International Chemical Safety Cards) of NIOSH (National Institute for Occupational Safety and Health) and HSFS (Hazardous Substances Fact Sheets) of the New Jersey Department of Health, but these two databases are specialized in toxicological data. The Pi index is especially punishing for databases which are specialized in a category of chemicals (e.g. TELETOX, SOLVEDB) or in some precise criteria (e.g. GENETOX, GAP).


    CONCLUSION
 
The judicious choice of database is fundamental in the search for any information on the Internet. An ideal database would be one which referenced a large number of chemicals, provided physicochemical, toxicological and ecotoxicological information on them, and therefore obtained high Pi and Qi indices simultaneously. In this comparative study, HSDB was at the head of the database evaluation, with results of 0.71 (Qi) and 0.62 (Pi). HSDB is the most efficient database and so can be recommended for a global search for information about any chemical. The other databases either are poorly documented (Qi low) or are specialized databases (Qi high, Pi low) which, however, can be useful when searching for data on a very precise chemical category (e.g. SOLVEDB for solvents) or on particular criteria (e.g. IARC for carcinogen data). In conclusion, it appears that an efficient search for information on the Internet must be conducted on several databases. It is advisable to begin the search by interrogating a non-specialized database such as HSDB and then to look for more complete and/or more precise data in specialized databases. Several databases can be searched together on the Internet through various websites, or via some Internet service providers who pool several databases. For this, the most interesting system of databases is TOXNET (Toxicology Data Network) of the National Library of Medicine, which can be found at http://toxnet.nlm.nih.gov/ (Wexler, 2001). TOXNET first of all provides free access to HSDB (Hazardous Substances Data Bank), which is the most efficient of the 22 tested databases. It also offers a free search interface for several specialized databases (IRIS, CCRIS, GENETOX), bibliographical databases (TOXLINE, EMIC, DART/ETIC), physicochemical data (ChemIDplus, HSDB Structures, NCI-3D) and environmental data (TRI). TOXNET doubtless constitutes one of the best and most efficient database resources for physicochemical, toxicological and ecotoxicological information on the Internet.
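The two-step strategy recommended above (a general database first, then specialized ones for what is still missing) can be sketched in Python as follows. The function, the greedy choice of the next database and the toy coverage data are all hypothetical: the database names are placeholders, not the databases evaluated in the study, and the study does not prescribe any particular ordering of the specialized databases.

```python
def search_plan(coverage, general_db, wanted):
    """Order the databases to query for one chemical.

    `coverage` maps a database name to the set of criteria it documents
    for the chemical of interest. The general database is queried first;
    specialized databases are then added greedily while they still fill gaps.
    Returns the plan and the criteria that remain undocumented.
    """
    missing = set(wanted)
    first = coverage.get(general_db, set()) & missing
    plan = [(general_db, first)]
    missing -= first

    remaining = {db: crit for db, crit in coverage.items() if db != general_db}
    while missing and remaining:
        # Pick the database that documents the most still-missing criteria.
        best = max(remaining, key=lambda db: len(remaining[db] & missing))
        gain = remaining.pop(best) & missing
        if not gain:
            break  # no remaining database adds anything
        plan.append((best, gain))
        missing -= gain
    return plan, missing


# Hypothetical coverage for a single chemical.
coverage = {
    "GENERAL_DB": {"density", "acute toxicity"},
    "SPECIALIZED_A": {"fish toxicity", "algae toxicity"},
    "SPECIALIZED_B": {"fish toxicity"},
}
wanted = {"density", "fish toxicity", "algae toxicity", "carcinogenicity"}
plan, still_missing = search_plan(coverage, "GENERAL_DB", wanted)
print(plan, still_missing)
```

In this toy case the second specialized database is never queried, because the first one already supplies everything it could add, and one criterion remains undocumented after all useful databases have been searched.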


    FOOTNOTES
 
* To whom correspondence should be addressed. Tel: +33-2-35-14-86-11; fax: +33-2-35-14-86-11; e-mail: michel.guerbert@univ-rouen.fr


    REFERENCES
 

Cox JJ, Dawson KJ, Hobbs KEF. (1992) The electronic information revolution and how to exploit it. Br J Surg; 79: 1004–10.

Dassler WL. (2001) Using internet search engines and library catalogs to locate toxicology information. Toxicology; 157: 121–39.

Gehanno JF, Paris C, Thirion B, Caillard JF. (1998) Assessment of bibliographic databases’ performance in information retrieval for occupational and environmental toxicology. Occup Environ Med; 55: 562–6.

Ludl H, Schope LH, Mangelsdorf I. (1996) Searching for information on toxicological data of chemical substances in selected bibliographic databases. Selection of essential databases for toxicological researches. Chemosphere; 32: 867–80.

South JC. (2001) Online resources for news about toxicology and other environmental topics. Toxicology; 157: 153–64.

Voigt K, Breuggermann R. (1995) Toxicology databases in the metadatabank of online databases. Toxicology; 100: 225–40.

Wexler P. (2001) TOXNET: an evolving web resource for toxicology and environmental health information. Toxicology; 157: 3–10.

