Statistics

Calculating the positive predictive value of a diagnostic test

Will a diagnostic test have the same predictive value regardless of how it is used? My intuitive answer is “It should”, but hey, this is why we do statistics before implementing large screening programs. The predictive value of a test is different when the test is used in a high-risk population compared to when it is used in a low-risk population. This means that the positive predictive value of a test differs if the test is used as a screening test versus when it is used as a confirmatory test, or used in two populations with different prevalence. Sensitivity, specificity and accuracy are important principles to consider when evaluating how and when diagnostic tests should be used, such as the mammography screening program, or when evaluating the differences in policies regarding COVID testing during the early and late stages of the pandemic.
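
To make the prevalence effect concrete, here is a minimal R sketch that computes the positive predictive value from sensitivity, specificity and prevalence using Bayes' theorem. The function name `ppv` and the numbers (90% sensitivity, 95% specificity, 1% versus 30% prevalence) are made up for illustration and do not describe any real test.

```r
# Positive predictive value: P(disease | positive test)
# ppv() is a hypothetical helper, not a function from any package.
ppv <- function(sensitivity, specificity, prevalence) {
  true_pos  <- sensitivity * prevalence              # diseased and test positive
  false_pos <- (1 - specificity) * (1 - prevalence)  # healthy but test positive
  true_pos / (true_pos + false_pos)
}

# Same (hypothetical) test, two different populations
ppv_screening    <- ppv(0.90, 0.95, prevalence = 0.01)  # low-risk: about 0.15
ppv_confirmatory <- ppv(0.90, 0.95, prevalence = 0.30)  # high-risk: about 0.89

print(round(c(screening = ppv_screening, confirmatory = ppv_confirmatory), 3))
```

With these assumed numbers, most positive results in the low-prevalence setting are false positives, even though the test itself has not changed.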

Plot coxcomb diagrams like Florence Nightingale

Florence Nightingale (1820-1910) is best known as a pioneer in modern nursing, but she was also a pioneer in statistics and the use of statistical graphics in data analysis. In her work during the Crimean War, she tended to the wounded soldiers in the hospitals and helped to improve the conditions in which they were treated. She collected data on patients and their outcomes, and used a coxcomb diagram to visually display the causes of death among soldiers. Nightingale's coxcomb plot “Diagram of the Causes of Mortality in the Army in the East” illustrated that the main cause of death among the British troops in the Crimean War was preventable disease rather than injuries from fighting. The plot also shows that the death rate decreased when a Sanitary Commissioner arrived to aid in improving hygiene and sanitation. Nightingale later used the coxcomb plot to lobby for improved sanitation and hygiene in hospitals, which eventually led to a reduction in the death rate from disease in hospitals. She was a firm believer that statistical data presented as charts and diagrams is a powerful tool for making complex data more understandable. It helps people see relationships in the data and enables us to make informed decisions. I wanted to recreate Nightingale's historical plot using R, and at the same time give a tutorial on how to make a coxcomb plot (also called a polar-area plot or rose diagram).

For loop for Multiple Trend in Proportions

When you have only one parameter to test, following my previous tutorial on the test for trend in proportions will be sufficient. However, if you have many independent variables to test across several dependent variables, doing them one by one quickly becomes tedious. Therefore, I wrote a for-loop that creates all the summary tables, performs the test for trend in proportions on each table, adds the test result to the count matrix and saves the output neatly in CSV/Excel format. Here I explain each step in the process.
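
As a rough idea of what such a loop can look like, here is a minimal sketch, not the code from the tutorial. The data frame `dat`, the marker names `geneA` and `geneB`, and the output file name are all hypothetical, and the counts are invented.

```r
# Hypothetical data: one ordinal variable (stage) and two binary markers
dat <- data.frame(
  stage = factor(rep(c("I", "II", "III"), each = 40), levels = c("I", "II", "III")),
  geneA = rep(rep(c("Mutated", "Wild-type"), 3), times = c(5, 35, 12, 28, 20, 20)),
  geneB = rep(rep(c("Mutated", "Wild-type"), 3), times = c(18, 22, 20, 20, 19, 21))
)

markers <- c("geneA", "geneB")
results <- list()

for (m in markers) {
  # Summary table: rows = marker status, columns = stage
  tab <- table(dat[[m]], dat$stage)

  # Test for trend in the proportion of "Mutated" across the ordered stages
  test <- prop.trend.test(tab["Mutated", ], colSums(tab))

  # Attach the test result to the count table
  out <- as.data.frame.matrix(tab)
  out$marker  <- m
  out$status  <- rownames(tab)
  out$p_trend <- test$p.value
  results[[m]] <- out
}

# Combine everything and save as CSV (written to a temporary folder here)
summary_table <- do.call(rbind, results)
out_file <- file.path(tempdir(), "trend_results.csv")
write.csv(summary_table, out_file, row.names = FALSE)
print(summary_table)
```

Collecting each result in a list and binding once at the end keeps the loop body short and avoids growing a data frame row by row.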

Calculate Z-Score and plot heatmaps

The z-score measures how far a value deviates from the mean of a given population, expressed in standard deviations. Calculating z-scores is a handy way to standardize, or normalize, data. This kind of normalization is frequently used in gene expression studies to visualize heat maps of differentially expressed genes.
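
As a small illustration, the sketch below computes z-scores by hand and with base R's `scale()`, first for a vector and then row-wise for a tiny expression matrix. The matrix `expr` and its gene and sample names are invented for the example.

```r
# z = (x - mean) / standard deviation
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
z_manual <- (x - mean(x)) / sd(x)
z_scale  <- as.numeric(scale(x))   # scale() centers and divides by sd by default

# Hypothetical expression matrix: genes in rows, samples in columns
expr <- matrix(
  c(10, 12, 30, 28,
     5,  6,  2,  1,
   100, 90, 95, 105),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene1", "gene2", "gene3"), c("s1", "s2", "s3", "s4"))
)

# scale() works on columns, so transpose to standardize each gene (row)
z_expr <- t(scale(t(expr)))
print(round(z_expr, 2))

# To draw a heat map of the z-scores without further rescaling:
# heatmap(z_expr, scale = "none")
```

After row-wise standardization every gene has mean 0 and standard deviation 1, so genes with very different absolute expression levels can share one color scale.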

Test for trend in proportions

The test for trend in proportions is also known as the Cochran-Armitage test. It performs a Chi-squared test for trend in proportions and is used to test whether a proportion changes systematically across ordered groups, taking the size of the groups into account. It takes count data from contingency tables where one variable is nominal with two levels (e.g. “Mutated”, “Wild-type”) and the other variable is ordinal with at least three naturally ranked levels.
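
A minimal sketch with base R's `prop.trend.test()` is shown here. The counts are invented: the number of “Mutated” cases and the group sizes in three ordered groups, with the group names chosen only for illustration.

```r
# Hypothetical counts in three naturally ordered groups
mutated <- c(low = 5, medium = 12, high = 20)   # events per group
total   <- c(low = 50, medium = 50, high = 50)  # group sizes

# Chi-squared test for trend in proportions (default scores are 1, 2, 3)
res <- prop.trend.test(mutated, total)
print(res)

# Observed proportions, for reference
print(mutated / total)
```

The `score` argument can be set explicitly if the groups are not equally spaced, for example when they represent dose levels.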

Chi-square in R

The Chi-square test is used to test whether two or more categorical variables are associated. All variables must be ordinal or nominal and summarized as a frequency table. It is a non-parametric test, meaning that it is also suitable for data that are not normally distributed. Some of the assumptions for performing a Chi-square test are that each observation is independent of all the others (one observation per subject), and that the categories are mutually exclusive so that a subject fits into only one of the categories.
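
As a quick sketch, this is how a frequency table can be passed to base R's `chisq.test()`. The 2x3 table of counts is made up for the example.

```r
# Hypothetical frequency table: response (rows) by group (columns)
tab <- matrix(
  c(30, 20,
    25, 25,
    15, 35),
  nrow = 2,
  dimnames = list(response = c("Yes", "No"), group = c("A", "B", "C"))
)

res <- chisq.test(tab)
print(res)

# Expected counts under independence; a common rule of thumb is that
# these should not be too small (often quoted as at least 5 per cell)
print(res$expected)
```

Checking `res$expected` is a simple way to decide whether the Chi-square approximation is reasonable or whether an exact test would be a better choice.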

Fisher-Exact in R

Disregarding the problematic side of Fisher, the statistical methods he developed are still very useful. Read any clinical paper, and I guarantee you that a Fisher exact test has been performed. Fisher-Exact is a statistical test used for 2x2 contingency tables of categorical data. It is particularly useful for small sample sizes where other tests, like the Chi-square test, would be unsuitable. Fisher-Exact from a 2x2 table: First you need to enter your data, and I will use some real life examples.
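
Before the real data, here is a minimal sketch of the general pattern using base R's `fisher.test()`. The 2x2 counts and the row and column labels are invented placeholders, not the real-life examples from the tutorial.

```r
# Hypothetical 2x2 table with small counts
tab <- matrix(
  c(8, 2,
    1, 9),
  nrow = 2,
  dimnames = list(outcome   = c("Responder", "Non-responder"),
                  treatment = c("Drug", "Placebo"))
)

res <- fisher.test(tab)   # two-sided by default
print(res)

# The p-value and the estimated odds ratio can be pulled out directly
print(res$p.value)
print(res$estimate)
```

Note that `matrix()` fills column by column, so it is worth printing `tab` once to confirm the counts ended up in the intended cells.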