Answering Family Physicians’ Clinical Questions Using Electronic Medical Databases
Testing
Two family physicians with experience in computer searching (B.A., D.W.) independently searched for answers using each of the included databases. In the case of DynaMed, for which Dr Alper is the medical director, another family physician was substituted as a searcher, and Dr Alper had no input or control over the testing or arbitration process for answers from DynaMed. Testing took place in April and May 2000.
Searching was performed on computers with Pentium III processors, a 100 megabit-per-second network connection to the Internet, and server-mounted CD-ROMs.
Each searcher evaluated every database with the same 20 questions. The order in which the databases were evaluated was at the searchers' discretion, but testing of one database was completed before testing of another began. Before testing each database, searchers familiarized themselves with it using the 5 screening questions.
A maximum of 10 minutes was allowed per question. Each answer was rated as adequate or inadequate; an answer was considered adequate if it contained sufficient information to guide clinical practice. For example, for the question “How do I determine the cause of chronic pruritus?”, the answer from the University of Iowa Family Practice Handbook (www.vh.org/Providers/ClinRef/FPHandbook/Chapter13/01-13.html) was considered adequate because it included clinically useful recommendations: “History should include details about (1) any skin lesions preceding the pruritus; (2) history of weight loss, fatigue, fever, malaise; (3) any recent stress emotionally; and (4) recent medications and travel. Physical examination with emphasis on the skin and its appendages — xerosis, excoriation, lichenification, hydration. Laboratory tests as suggested by the PE, which may include CBC, ESR, fasting glucose, renal or liver function tests, hepatitis panel, thyroid tests, stool for parasites, CXR.”
Sources that provided general recommendations without information that could specifically guide clinical practice were considered inadequate, for example: “The cause of generalized pruritus should be sought and corrected. If no skin disease is apparent, a systemic disorder or drug-related cause should be sought.” The searcher recorded the answer and the time taken to obtain it, rounded to the nearest minute (1-10).
Scoring and Arbitration
The 2 physician searchers judged the adequacy of the answers to each question for each database. If the searchers both found adequate answers, the result was accepted as adequate, and the average time required to find and interpret the answer was recorded. If neither searcher found an adequate answer, then the answer was deemed inadequate. If only one searcher found an adequate answer, the second searcher evaluated that answer. If the answer was acceptable to the second searcher, it was considered an adequate answer, and the time for the first searcher was recorded.
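The adjudication rules above can be sketched as a small decision function. This is purely illustrative; the function and parameter names are our own, not the study's.

```python
# Sketch of the scoring rules described above (illustrative only;
# names are our invention, not the study's).

def score_question(a_found, b_found, a_time, b_time, cross_review_ok=None):
    """Combine two searchers' results for one question on one database.

    a_found / b_found: True if that searcher found an adequate answer.
    a_time / b_time: minutes to the answer (None if none was found).
    cross_review_ok: for a single-searcher find, whether the other
        searcher accepted the answer on review (None if not applicable).
    Returns (adequate, recorded_time_or_None, needs_arbitration).
    """
    if a_found and b_found:
        # Both adequate: accept, record the average search time.
        return True, (a_time + b_time) / 2, False
    if not a_found and not b_found:
        # Neither found an answer: deemed inadequate.
        return False, None, False
    # Only one searcher found an answer: the other reviews it.
    finder_time = a_time if a_found else b_time
    if cross_review_ok:
        # Second searcher accepts: adequate, finder's time recorded.
        return True, finder_time, False
    # Persisting disagreement goes to the arbitration panel.
    return False, None, True

# Example: both searchers answered adequately in 3 and 5 minutes.
print(score_question(True, True, 3, 5))  # (True, 4.0, False)
```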
When searchers disagreed on the adequacy of identified answers, an arbitration panel consisting of 3 family physicians who were not affiliated with any of the databases met independently from the searchers to determine the adequacy of the answers by consensus.
Analysis
Our primary outcome was the proportion of questions adequately answered by each database. We calculated 95% confidence limits for the proportions of adequate answers.7 Means and medians were determined for the time to reach adequate answers for each database. We calculated the κ statistic for the independent findings of the 2 searchers and for the results after the searchers reviewed each other’s searches.8 We combined the results of individual databases to determine the proportion of questions answered by all combinations of 2, 3, and 4 databases. We considered a question adequately answered by a combination if any of its individual databases adequately answered the question.
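The analysis steps above can be sketched in code. This is our own illustrative implementation, not the study's: the study's exact confidence-interval and κ methods follow its cited references, whereas the sketch below uses the Wilson score interval (one common choice for a proportion), Cohen's κ for two raters, and a brute-force union over database combinations.

```python
# Illustrative sketch of the analysis (our implementation, not the study's).
import math
from itertools import combinations

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a proportion (one common choice)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = p + z**2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (centre - margin) / denom, (centre + margin) / denom

def cohens_kappa(pairs):
    """Cohen's kappa for two raters' adequate/inadequate judgments.

    pairs: list of (rater1, rater2) booleans, one per assessment.
    """
    n = len(pairs)
    po = sum(a == b for a, b in pairs) / n    # observed agreement
    p1 = sum(a for a, _ in pairs) / n         # rater 1 "adequate" rate
    p2 = sum(b for _, b in pairs) / n         # rater 2 "adequate" rate
    pe = p1 * p2 + (1 - p1) * (1 - p2)        # agreement expected by chance
    return (po - pe) / (1 - pe)

def best_combinations(answers, k):
    """Rank k-database combinations by the questions they cover.

    answers: {database_name: set of question ids answered adequately}.
    A question counts as answered if any database in the combination
    answered it (union of the sets).
    """
    combos = [(len(set().union(*(answers[d] for d in c))), c)
              for c in combinations(answers, k)]
    return sorted(combos, reverse=True)
```

For instance, `best_combinations({'A': {1, 2}, 'B': {2, 3}, 'C': {1}}, 2)` ranks the pair combinations by how many of the 3 questions they jointly answer.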
Results
Thirty-eight databases were nominated; 24 did not meet our inclusion criteria (Table W1*). Fourteen databases met the inclusion criteria (Table 3) and were evaluated with the set of 20 questions (280 answer assessments) by 2 searchers. The Figure summarizes the process of evaluating the answers. Initial agreement between searchers was good (κ=0.69). Discussion between the searchers resolved 21 (52.5%) of the 40 discrepant answer assessments; in these cases one searcher had failed to find an answer because of inadequate searching or timing out (searching the full 10 minutes) and, on review, agreed with the adequacy of the answer found by the other searcher. Agreement between searchers at this stage was excellent (κ=0.94).
The remaining 19 discrepant assessments (for which the searchers had different opinions regarding the adequacy of the answers identified) were referred to the arbitration panel for determination of the final results. Ten of these were deemed adequate.
Results for individual databases, in rank order of the proportion of questions answered followed by the average time to identify adequate answers, are reported in Table 3. The combination of STAT!Ref and MDConsult could answer 85% of our set of 20 questions. Four combinations of 2 databases (STAT!Ref and either MAXX, MDChoice.com, Primary Care Guidelines, or Medscape) could answer 80% of our questions. Two combinations of 3 databases (STAT!Ref, MDConsult, and either DynaMed or MAXX) could answer 90% of our questions. Combinations of 4 databases answered the most sample questions (95%, 19/20); these combinations consisted of STAT!Ref, DynaMed, MAXX, and either MDConsult or American Family Physician.