Fisher-Exact in R

A short guide to performing the Fisher exact test and summarizing a 2x2 table of categorical values.

By Synnøve Yndestad in R Statistics Clinical science

July 10, 2021

Setting aside the problematic side of Fisher himself, the statistical methods he developed are still very useful. Read almost any clinical paper, and chances are that a Fisher exact test has been performed.

The Fisher exact test is a statistical test for contingency tables of categorical data, most commonly 2x2 tables. It is particularly useful for small sample sizes, where tests that rely on large-sample approximations, such as the chi-square test, would be unsuitable.
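To make the small-sample point concrete, here is a minimal sketch using a made-up 2x2 table (the counts and names are hypothetical and not from any study). A common rule of thumb is that the chi-square approximation becomes unreliable when expected cell counts fall below 5; chisq.test() exposes those expected counts, so you can check them before choosing a test.

```r
# Hypothetical small 2x2 table, for illustration only
small <- matrix(c(3, 1, 1, 4), nrow = 2,
                dimnames = list(Group = c("A", "B"),
                                Outcome = c("Yes", "No")))

# Expected counts under independence; warnings are suppressed because
# R warns when the chi-square approximation may be incorrect
expected <- suppressWarnings(chisq.test(small)$expected)
expected

# TRUE if any expected count is below the rule-of-thumb threshold of 5
any(expected < 5)

# The exact test does not rely on that approximation
fisher.test(small)$p.value
```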

Fisher-Exact from a 2x2 table:

First you need to enter your data. I will use a real-life example.

The research group I am part of has performed a clinical study in which patients with triple-negative breast cancer received olaparib monotherapy. Olaparib is a PARP inhibitor, and when PARP is inhibited, cells become dependent on repairing DNA through homologous recombination (HR). If they lack the ability to repair DNA this way, they die. Using the Fisher exact test, we can test whether patients whose tumors are HR deficient respond better to treatment than patients whose tumors can repair DNA by HR. HR deficiency in this context may be caused by an inherited germline mutation in an HR gene, by a mutation in HR genes in the tumor tissue, or by epigenetic silencing through methylation of BRCA1 in the tumor tissue.

Using the data listed in table two (subgroup analysis no. 4), we can create a 2x2 table from vectors.

Make two vectors of count data, one per group, with the counts in the desired order, here c(Responder, Non-responder):

HR_deficient = c(16, 4)
HR_proficient = c(2, 10)

Combine the two vectors in a data frame and transpose it with t(), so that each vector becomes a row. Note that t() returns a matrix, which fisher.test() accepts.

aTable = t(data.frame(HR_deficient,HR_proficient))

Have a look at your table

aTable
##               [,1] [,2]
## HR_deficient    16    4
## HR_proficient    2   10

You can change column or row-names on your table using dimnames(). Use [[1]] to specify row-names and [[2]] for column names.

Add/change column names:
dimnames(aTable)[[2]] = c("New Column-name A", "New Column-name B")
Add/change row names:
dimnames(aTable)[[1]] = c("New Row-name A", "New Row-name B")

dimnames(aTable)[[2]] = c("Responder", "Non-responder")  #Add/change column-names

Have a look

aTable
##               Responder Non-responder
## HR_deficient         16             4
## HR_proficient         2            10
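The same table can also be built in a single step with matrix(). This is only a sketch of an alternative; the object name altTable is just for illustration, and byrow = TRUE is needed so that the counts fill the table row by row.

```r
# Build the 2x2 table directly, with row and column names attached
altTable <- matrix(c(16, 4,    # HR deficient:  responders, non-responders
                     2, 10),   # HR proficient: responders, non-responders
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("HR_deficient", "HR_proficient"),
                                   c("Responder", "Non-responder")))
altTable
```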

Run the Fisher exact test with fisher.test() and inspect the result.

fisher.test(aTable)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  aTable
## p-value = 0.0007899
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##    2.459807 228.297235
## sample estimates:
## odds ratio 
##   17.57049

This test is significant, with a p-value of 0.0008!
This tells us that we can reject the null hypothesis of no association between HR status and response: the distribution of responders differs between the two groups. We can then conclude that, in this data set, patients with HR-deficient breast cancer tumors respond better to olaparib treatment than patients with HR-proficient tumors.
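To see where the effect comes from, it can help to look at the response proportions and the plain cross-product odds ratio. The sketch below rebuilds the table so that it runs on its own (respTable and sampleOR are illustrative names). Note that fisher.test() reports a conditional maximum likelihood estimate of the odds ratio, so it will not exactly match the cross-product value.

```r
respTable <- matrix(c(16, 4, 2, 10), nrow = 2, byrow = TRUE,
                    dimnames = list(c("HR_deficient", "HR_proficient"),
                                    c("Responder", "Non-responder")))

# Proportion of responders and non-responders within each row
prop.table(respTable, margin = 1)

# Cross-product (sample) odds ratio: (16 * 10) / (4 * 2)
sampleOR <- (respTable[1, 1] * respTable[2, 2]) /
            (respTable[1, 2] * respTable[2, 1])
sampleOR

# Conditional estimate reported by fisher.test(), for comparison
fisher.test(respTable)$estimate
```

Here 16 of 20 HR-deficient patients responded (80%) against 2 of 12 HR-proficient patients (about 17%), and the cross-product odds ratio is 20.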

By default, fisher.test() is two-sided.
For a one-sided test, use fisher.test(aTable, alternative = "less") or fisher.test(aTable, alternative = "greater"), which test whether the true odds ratio is less than or greater than 1, respectively.
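As a sketch of how the one-sided options behave on the table above (rebuilt here so the block runs on its own, with illustrative object names): since the HR-deficient row comes first and has more responders, alternative = "greater" is the direction that matches the observed effect.

```r
osTable <- matrix(c(16, 4, 2, 10), nrow = 2, byrow = TRUE,
                  dimnames = list(c("HR_deficient", "HR_proficient"),
                                  c("Responder", "Non-responder")))

# Same table, three alternative hypotheses
pTwoSided <- fisher.test(osTable)$p.value
pGreater  <- fisher.test(osTable, alternative = "greater")$p.value
pLess     <- fisher.test(osTable, alternative = "less")$p.value

# Collect the three p-values side by side
c(two.sided = pTwoSided, greater = pGreater, less = pLess)
```

The direction of a one-sided test should be decided before looking at the data.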




Starting from a dataset; Summarize data + Fisher

If you haven't summarized your data in a 2x2 table yet, it is very handy to create the summary table directly from your raw data.

If you have your data in a tidy Excel sheet or CSV file, you can read it with:

MyData <- readxl::read_xlsx("Path_to_your_file/Name_of_your_file.xlsx")
MyData <- read.csv("Path_to_your_file/Name_of_your_file.csv")
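If you want to try the CSV route without a file of your own, the following sketch writes a few made-up rows to a temporary file and reads them back. The rows and the object names are hypothetical; only the layout (one row per patient, one column per variable) matters.

```r
# Hypothetical example rows, one patient per row
demo <- data.frame(Condition = c("HR deficient", "HR proficient", "HR deficient"),
                   Outcome   = c("Responder", "Non-Responder", "Responder"))

# Write to a temporary CSV file, then read it back with read.csv()
csvPath <- tempfile(fileext = ".csv")
write.csv(demo, csvPath, row.names = FALSE)
fromCsv <- read.csv(csvPath)

# Inspect column names and types
str(fromCsv)
```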

But here we will make it from scratch, using the data from supplementary table 4 that was the raw data for the count table.

Where:
CR = Complete response
PR = Partial response
SD = Stable disease
PD = Progressive disease

This is then dichotomized into responders (CR + PR) vs non-responders (SD + PD).
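The dichotomization itself can be done in R. A minimal sketch, assuming the response categories are stored in a character vector (bestResponse and its values are hypothetical, not the study data):

```r
# Hypothetical best-response codes, one per patient
bestResponse <- c("CR", "PD", "PR", "SD")

# CR and PR count as responders, everything else as non-responders
outcomeDemo <- ifelse(bestResponse %in% c("CR", "PR"),
                      "Responder", "Non-Responder")
outcomeDemo
```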

# Read vectors
Condition <- c("HR deficient", "HR proficient", "HR deficient", "HR deficient", "HR deficient", "HR deficient", "HR deficient", "HR deficient", "HR deficient", "HR proficient", "HR proficient", "HR deficient", "HR proficient", "HR deficient", "HR deficient", "HR proficient", "HR deficient","HR deficient", "HR deficient", "HR proficient","HR proficient", "HR proficient", "HR proficient", "HR proficient", "HR proficient", "HR deficient","HR deficient", "HR deficient","HR deficient","HR deficient","HR deficient","HR proficient")
Outcome <- c("Responder", "Non-Responder", "Responder","Responder","Non-Responder", "Responder", "Responder", "Responder", "Responder", "Non-Responder", "Responder", "Responder", "Non-Responder", "Responder", "Responder", "Responder", "Non-Responder","Non-Responder", "Responder", "Non-Responder", "Non-Responder", "Non-Responder", "Non-Responder", "Non-Responder", "Non-Responder", "Responder", "Responder", "Non-Responder", "Responder", "Responder", "Responder", "Non-Responder")

# Turn vectors into a data frame
MyData <- data.frame(Condition,Outcome)

# Optional:
# Turn characters into factor to set the order in which the different categories are listed in the summary table
MyData$Outcome <- factor(MyData$Outcome, levels = c("Responder","Non-Responder"))

# Print data frame
DT::datatable(MyData, options = list(searching = FALSE))
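The optional factor step matters more than it may seem. Without it, R orders the categories alphabetically, which would put "Non-Responder" in the first column of the summary table; the odds ratio would then describe the opposite comparison. A small sketch with hypothetical values:

```r
# Hypothetical outcome values
outcomeChr <- c("Responder", "Non-Responder", "Responder")

# Default: levels are sorted alphabetically
levels(factor(outcomeChr))

# Explicit levels: "Responder" is listed first
levels(factor(outcomeChr, levels = c("Responder", "Non-Responder")))
```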

Now that you have your data in the form of a data frame, you may either:

1-Perform Fisher directly on the two columns, by passing them to fisher.test()
2-Make a summary table and perform Fisher on the table.

1-Perform Fisher by columns Condition vs Outcome

fisher.test(MyData$Condition, MyData$Outcome)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  MyData$Condition and MyData$Outcome
## p-value = 0.0007899
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##    2.459807 228.297235
## sample estimates:
## odds ratio 
##   17.57049

2-Make a summary table

Summarize categorical data using the table() function:

anotherTable = table(MyData)
anotherTable
##                Outcome
## Condition       Responder Non-Responder
##   HR deficient         16             4
##   HR proficient         2            10
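Adding row and column totals is a quick sanity check that every patient has been counted. The sketch below reconstructs the same counts with rep() instead of the full vectors so that it runs on its own; the object names are illustrative.

```r
# Rebuild one row per patient from the known counts
cond <- rep(c("HR deficient", "HR proficient"), times = c(20, 12))
outc <- factor(c(rep(c("Responder", "Non-Responder"), times = c(16, 4)),
                 rep(c("Responder", "Non-Responder"), times = c(2, 10))),
               levels = c("Responder", "Non-Responder"))

# Cross-tabulate and append row, column and grand totals
checkTable <- table(Condition = cond, Outcome = outc)
addmargins(checkTable)
```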

Then perform Fisher using the summary table

fisher.test(anotherTable)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  anotherTable
## p-value = 0.0007899
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##    2.459807 228.297235
## sample estimates:
## odds ratio 
##   17.57049

Do the two methods give the same result?

# Fetch the first element of the result list, which is the p-value
PvalA = fisher.test(MyData$Condition, MyData$Outcome)[[1]]
PvalB = fisher.test(anotherTable)[[1]]

PvalA == PvalB
## [1] TRUE

Yes they do!
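Comparing with == works here because both calls run the same computation on the same counts. In general, exact equality on floating-point numbers is fragile, so a tolerance-based comparison with all.equal() is a safer habit. A sketch, rebuilding the counts with rep() so that it runs on its own (object names are illustrative):

```r
# One row per patient, reconstructed from the known counts
cond <- rep(c("HR deficient", "HR proficient"), times = c(20, 12))
outc <- c(rep(c("Responder", "Non-Responder"), times = c(16, 4)),
          rep(c("Responder", "Non-Responder"), times = c(2, 10)))

# p-value from the two columns and from the summary table
pFromColumns <- fisher.test(cond, outc)$p.value
pFromTable   <- fisher.test(table(cond, outc))$p.value

# TRUE if the two values agree within numerical tolerance
isTRUE(all.equal(pFromColumns, pFromTable))
```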

To get details on the data and method used, e.g. alternative = "two.sided", assign the test result to an object and inspect its list elements in the RStudio global environment.

getDetails = fisher.test(anotherTable)

or just type

unlist(fisher.test(anotherTable))
##                              p.value                            conf.int1 
##               "0.000789927616836742"                   "2.45980654095026" 
##                            conf.int2                  estimate.odds ratio 
##                   "228.297235459106"                   "17.5704859283499" 
##                null.value.odds ratio                          alternative 
##                                  "1"                          "two.sided" 
##                               method                            data.name 
## "Fisher's Exact Test for Count Data"                       "anotherTable"
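Individual values can also be pulled out by name, which is handy for reporting. A minimal sketch (the table is rebuilt so the block runs on its own; resTable and res are illustrative names):

```r
resTable <- matrix(c(16, 4, 2, 10), nrow = 2, byrow = TRUE,
                   dimnames = list(c("HR deficient", "HR proficient"),
                                   c("Responder", "Non-Responder")))
res <- fisher.test(resTable)

# Names of the elements stored in the result
names(res)

# Pick out single elements by name
signif(res$p.value, 3)   # p-value, rounded to 3 significant digits
res$conf.int             # confidence interval for the odds ratio
unname(res$estimate)     # odds ratio estimate as a plain number
```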

Type ?fisher.test for more details on the method.
