USMLE Session Biostatistics March 12, 2014

The city of Cancerville had a population of 10,000,000 (50% women) in 1995. In 1995, there were 80,000 women with previously diagnosed ovarian cancer in Cancerville. Twenty thousand new cases of ovarian cancer were diagnosed in 1995. What was the incidence rate of ovarian cancer in Cancerville in 1995?
(A) 2,000 per 100,000 population
(B) 4,000 per 100,000 population
(C) 200 per 100,000 population
(D) 400 per 100,000 population
(E) 1,000 per 100,000 population

Answer: (D) 400 per 100,000 population
The incidence rate is the number of new cases of a disease during a specific period per population at risk. Twenty thousand new cases divided by 5 million women gives a rate of 1 case per 250 women, or 400 cases per 100,000 population. (Strictly, the 80,000 women with previously diagnosed disease are no longer at risk; excluding them gives 20,000 / 4,920,000, or about 407 per 100,000, and 400 remains the closest choice.)

A laboratory has developed a new test for rapid ascertainment of serum parathyroid hormone levels. The test is repeated twenty times on the same sample with a resulting coefficient of variation of one percent. This is a measure of
(A) Accuracy
(B) Reliability
(C) Precision
(D) Validity
(E) Mode

Answer: (B) Reliability
- The mode is the most commonly occurring value in a series of data.
- Reliability is a measure of the reproducibility of a test over different conditions.
- Accuracy is a measure of the extent to which a test approximates the real value of that which is measured. New tests are measured against the gold standard, if one exists.
- Validity is the assessment of the degree to which a test measures that for which it was designed. In other words, you need to determine whether it reflects the outcome of interest or other outcomes.
- Precision is the degree to which a measurement is not subject to random variation.

At a large university, a study of pulse rates at rest was conducted on 5,000 students. The mean pulse rate was 70, with a standard deviation of 10. Which of the following statements is true?
(A) Approximately 95% of the students had pulses between 60 and 80
(B) Approximately 68% of the students had pulses between 60 and 80
(C) Approximately 99.7% of the students had pulses between 50 and 90
(D) Approximately 95% of the students had pulses between 40 and 100
(E) Approximately 68% of the students had pulses between 50 and 90

Answer: (B) Approximately 68% of the students had pulses between 60 and 80
When a measurement is normally distributed in a population, about 68% of the population will have values within one standard deviation of the mean, about 95% within two standard deviations of the mean, and about 99.7% within three standard deviations of the mean. Therefore, in this population, 68% of the pulses will be between 60 and 80, 95% between 50 and 90, and 99.7% between 40 and 100.

A statistician analyzes data for several academic departments. She is free to choose the appropriate methodology to perform her analyses. Which of the following data would best be analyzed by non-parametric statistical methods?
(A) Results of a study on the effect of a new lipid-lowering drug on LDL cholesterol
(B) Results of a study on the effect of asbestos exposure on forced vital capacity
(C) Results of a study on the relationship between gender and lung cancer
(D) Results of a study on the differences in weight distributions between children in different countries
(E) Results of a study on the relationship between hemoglobin and reticulocyte count

Answer: (C) Results of a study on the relationship between gender and lung cancer
Parametric techniques can be used to analyze data where at least one of the variables is quantitative (interval or ratio) and where the data are distributed normally. If the data are not distributed normally or both variables are qualitative (nominal or ordinal), non-parametric techniques must be used.
Gender and lung cancer are both qualitative variables, so non-parametric techniques, such as chi-square, are used to determine the relationship between them.

The public health officials of a particular city wish to evaluate the lead levels of its constituents. In order to develop a sample population, they choose every 10th family in the city for the study. This is an example of what kind of population sample?
(A) Stratified selected sample
(B) Cluster selected sample
(C) Simple random sample
(D) Systematically selected sample
(E) Nonrandom selected sample

Answer: (B) Cluster selected sample
In cluster selected samples, the population of interest is divided into subunits, such as families, and a random sample of these units is used. In simple random samples, each individual member of a population has an equal probability of being chosen. In stratified selected samples, individuals are chosen randomly from within stratified groups, such as age groups. In systematically selected samples, the population is ordered by some characteristic, such as age, a starting point for selection is randomly chosen, and then the remainder of the sample is collected by a predetermined scheme, such as choosing every x-th person. In nonrandom selected samples, some predetermined scheme is used, such as the first x people presenting to a clinic with a certain disease.

In reporting the results from a clinical study of a new anti-inflammatory drug for the treatment of post-operative pain, the study's authors present data comparing the total days of hospitalization for comparable groups of patients who have received either the investigational anti-inflammatory drug or a placebo. The attached table appears in their report. Which of the following would be a valid interpretation of the data presented in this table?
(A) The p-value is greater than 0.05, indicating that there is no true treatment effect upon total days of post-operative hospitalization
(B) The treatment and placebo groups have unequal numbers of participants, and therefore the statistical test results are not interpretable
(C) The results are suggestive of a true treatment effect, but the study has limited power to detect the effect due to the relatively small number of study subjects
(D) Statistical testing of two group means yields a t-value, not a p-value

Answer: (C) The results are suggestive of a true treatment effect, but the study has limited power to detect the effect due to the relatively small number of study subjects
While the p-value for the difference between the mean days of post-operative hospitalization is not below the conventional level of 0.05, it is relatively close to that value. The treatment group and placebo group means (3.0 and 4.5 days, respectively) do suggest that there is an effect of treatment. It is likely that the statistical power of the study is rather limited, given the modest number of people enrolled in each group. Ideally, this study would be repeated with larger numbers of study subjects in each of the two groups. While it would be a mistake to conclude that there was definitively a treatment effect, it would also be a mistake to conclude that there was no evidence for a treatment effect.
In clinical trials, it is not necessary that the comparison groups have identical numbers of subjects, although there should be a sufficient number of participants in each study group to effectively evaluate the treatment being considered.
While statistical testing of two group means may use the t-test, the t-value that the test produces is used to derive a p-value.

Week 7 USMLE Step 1 Review: Biostatistics, Behavioral Science, and Nutrition, Steven Katz MSIV
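The arithmetic behind the Cancerville incidence answer and the pulse-rate answer can be checked with a short Python sketch. This is only an illustration under stated assumptions: the function names `incidence_per_100k` and `fraction_within` are hypothetical helpers, not part of the original handout, and the second incidence call (which removes the 80,000 previously diagnosed women from the denominator) is a refinement the original explanation does not make.

```python
import math


def incidence_per_100k(new_cases, population_at_risk):
    """New cases per 100,000 people at risk during the period (hypothetical helper)."""
    return new_cases / population_at_risk * 100_000


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


# Cancerville: 10,000,000 residents, half of them women, 20,000 new cases in 1995.
women = 10_000_000 // 2
print(round(incidence_per_100k(20_000, women), 1))           # denominator used in the answer
# Assumption beyond the handout: exclude the 80,000 women already diagnosed.
print(round(incidence_per_100k(20_000, women - 80_000), 1))

# Pulse rates: mean 70, standard deviation 10.
mean, sd = 70, 10
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"{low} to {high}: {fraction_within(k):.1%}")
```

With all women in the denominator the rate comes to about 400 per 100,000; excluding the previously diagnosed cases raises it only slightly, so the answer choice does not change. The loop reproduces the 68%, 95%, and 99.7% figures for the ranges 60 to 80, 50 to 90, and 40 to 100.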