Fisher-Exact in R
A short guide on performing the Fisher-Exact test and summarizing a 2x2 table of categorical values.
By Synnøve Yndestad in R Statistics Clinical science
July 10, 2021
Disregarding the problematic side of Fisher, the statistical methods he developed are still very useful. Read any clinical paper, and I guarantee you that a Fisher exact test has been performed.
Fisher-Exact is a statistical test used for 2x2 contingency tables of categorical data. It is particularly useful for small sample sizes where other tests, like the Chi-square test, would be unsuitable.
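A common rule of thumb (my addition, not something taken from the study discussed below) is that the Chi-square approximation gets unreliable when the expected cell counts drop below about 5. Here is a minimal sketch of how you could check that, using made-up counts purely for illustration:

```r
# Hypothetical counts, invented only to illustrate a small table
smallTable <- matrix(c(3, 1, 1, 3), nrow = 2)

# chisq.test() warns that the approximation may be incorrect for tables
# this small; the warning is suppressed because only the expected
# counts are of interest here
expectedCounts <- suppressWarnings(chisq.test(smallTable))$expected
expectedCounts

# TRUE if any expected count is below 5, a hint to prefer Fisher-Exact
any(expectedCounts < 5)
```

If the last line returns TRUE, Fisher-Exact is usually the safer choice.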
Fisher-Exact from a 2x2 table:
First you need to enter your data, and I will use some real life examples.
The research group that I am a part of has performed a clinical study in which patients with triple negative breast cancer received olaparib monotherapy. Olaparib is a PARP inhibitor, and by inhibiting PARP, cells become dependent on repairing DNA through homologous recombination (HR). If they lack the ability to repair DNA, they will die. Using Fisher exact, we can test whether patients whose tumors are HR deficient respond better to treatment than patients whose tumors can repair DNA by HR. HR deficiency in this context may be caused by an inherited germline mutation in an HR gene, by a mutation in HR genes in the tumor tissue, or by epigenetic silencing of BRCA1 through methylation in the tumor tissue.
Using data listed in table two (Subgroup analysis nr 4) we can create a 2x2 table from vectors.
Make two vectors of count data in the desired order, like this: c("Responder", "Non-Responder")
HR_deficient = c(16, 4)
HR_proficient = c(2, 10)
Turn the two vectors into a Data Frame, nested inside a transpose t().
aTable = t(data.frame(HR_deficient,HR_proficient))
Have a look at your table
aTable
## [,1] [,2]
## HR_deficient 16 4
## HR_proficient 2 10
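As an alternative sketch (not the approach used above), the same 2x2 table can be built in one step with matrix(), setting the row and column names right away. The object name aMatrix is just my choice for illustration:

```r
# Counts from the text, filled row by row: each row is one HR group
aMatrix <- matrix(
  c(16, 4,
     2, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("HR_deficient", "HR_proficient"),  # row names
    c("Responder", "Non-responder")      # column names
  )
)
aMatrix
```

Without byrow = TRUE, R fills the matrix column by column, which would silently swap two of the cells.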
You can change column or row-names on your table using dimnames(). Use [[1]] to specify row-names and [[2]] for column names.
Add/change column-names:
dimnames(aTable)[[2]] = c("New Column-name A", "New Column-name B")
Add/change row-names:
dimnames(aTable)[[1]] = c("New Row-name A", "New Row-name B")
dimnames(aTable)[[2]] = c("Responder", "Non-responder") #Add/change column-names
Have a look
aTable
## Responder Non-responder
## HR_deficient 16 4
## HR_proficient 2 10
Run the Fisher exact test by fisher.test() and inspect your result.
fisher.test(aTable)
##
## Fisher's Exact Test for Count Data
##
## data: aTable
## p-value = 0.0007899
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 2.459807 228.297235
## sample estimates:
## odds ratio
## 17.57049
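This test is significant, with a p-value of 0.0008!
This means we can reject the null hypothesis of no association between HR status and response: the odds of responding differ between the two groups. With an estimated odds ratio well above 1, we can conclude that patients whose breast cancer tumors are HR deficient are more likely to respond to olaparib treatment than patients whose tumors are HR proficient.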
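To see where the p-value comes from: with the row and column totals held fixed, the count in the top-left cell follows a hypergeometric distribution under the null hypothesis. The two-sided p-value adds up the probabilities of all tables that are no more likely than the observed one. Below is a rough sketch of that calculation with dhyper(), rebuilding the table from the counts above. The variable names and the small tolerance are my own choices for illustration.

```r
tab <- matrix(c(16, 4, 2, 10), nrow = 2, byrow = TRUE)

nResponders    <- sum(tab[, 1])  # 18 responders in total
nNonResponders <- sum(tab[, 2])  # 14 non-responders in total
nDeficient     <- sum(tab[1, ])  # 20 HR deficient patients

# All values the top-left cell can take when the margins are fixed
lowest   <- max(0, nDeficient - nNonResponders)
highest  <- min(nDeficient, nResponders)
possible <- lowest:highest

# Probability of each possible table under the null hypothesis
probs     <- dhyper(possible, nResponders, nNonResponders, nDeficient)
pObserved <- probs[possible == tab[1, 1]]

# Sum the tables that are as likely or less likely than the observed one
# (the tiny relative tolerance guards against floating point ties)
pManual <- sum(probs[probs <= pObserved * (1 + 1e-7)])

pManual
fisher.test(tab)$p.value
```

The two printed values should agree, which shows that the "exact" in Fisher-Exact simply means the p-value is summed from exact probabilities rather than approximated.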
The fisher.test() is two-tailed by default.
You can use fisher.test(aTable, alternative = "less") or fisher.test(aTable, alternative = "greater") for a one-sided test.
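A small sketch of the one-sided options, rebuilding the table from the counts above so that it runs on its own. Keep in mind that "greater" and "less" refer to the odds ratio as R computes it for this particular layout of rows and columns, so swapping rows or columns flips the direction.

```r
tab <- matrix(c(16, 4, 2, 10), nrow = 2, byrow = TRUE,
              dimnames = list(c("HR_deficient", "HR_proficient"),
                              c("Responder", "Non-responder")))

# "greater": the alternative is that the true odds ratio is above 1
pGreater <- fisher.test(tab, alternative = "greater")$p.value

# "less": the alternative is that the true odds ratio is below 1
pLess <- fisher.test(tab, alternative = "less")$p.value

# Default two-sided test for comparison
pTwoSided <- fisher.test(tab)$p.value

c(greater = pGreater, less = pLess, two.sided = pTwoSided)
```

Pick the direction before looking at the data. Choosing the one-sided test that happens to give the smaller p-value defeats the purpose.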
Starting from a dataset; Summarize data + Fisher
If you haven't summarized the 2x2 table yet, it is very handy to create the summary table directly from your raw data.
If you have your data in a tidy Excel sheet or CSV file, you can read it with
MyData <- readxl::read_xlsx("Path_to_your_file/Name_of_your_file.xlsx")
MyData <- read.csv("Path_to_your_file/Name_of_your_file.csv")
But here we will make it from scratch, using the data from supplementary table 4 that was the raw data for the count table.
Where:
CR = Complete response
PR = Partial response
SD = Stable disease
PD = Progressive disease
This is then dichotomized into responders (CR+PR) vs non-responders (SD+PD).
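If your raw data holds the four response codes rather than the dichotomized outcome, the recoding could be sketched like this. The bestResponse vector is made up for illustration and is not the study data:

```r
# Hypothetical best-response codes, invented for illustration only
bestResponse <- c("CR", "PR", "SD", "PD", "PR")

# CR and PR count as responders, SD and PD as non-responders
outcome <- ifelse(bestResponse %in% c("CR", "PR"),
                  "Responder", "Non-Responder")

# Fix the order of the categories for later summary tables
outcome <- factor(outcome, levels = c("Responder", "Non-Responder"))
table(outcome)
```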
# Read vectors
Condition <- c("HR deficient", "HR proficient", "HR deficient", "HR deficient", "HR deficient", "HR deficient", "HR deficient", "HR deficient", "HR deficient", "HR proficient", "HR proficient", "HR deficient", "HR proficient", "HR deficient", "HR deficient", "HR proficient", "HR deficient","HR deficient", "HR deficient", "HR proficient","HR proficient", "HR proficient", "HR proficient", "HR proficient", "HR proficient", "HR deficient","HR deficient", "HR deficient","HR deficient","HR deficient","HR deficient","HR proficient")
Outcome <- c("Responder", "Non-Responder", "Responder","Responder","Non-Responder", "Responder", "Responder", "Responder", "Responder", "Non-Responder", "Responder", "Responder", "Non-Responder", "Responder", "Responder", "Responder", "Non-Responder","Non-Responder", "Responder", "Non-Responder", "Non-Responder", "Non-Responder", "Non-Responder", "Non-Responder", "Non-Responder", "Responder", "Responder", "Non-Responder", "Responder", "Responder", "Responder", "Non-Responder")
# Turn vectors into a data frame
MyData <- data.frame(Condition,Outcome)
# Optional:
# Turn characters into factor to set the order in which the different categories are listed in the summary table
MyData$Outcome <- factor(MyData$Outcome, levels = c("Responder","Non-Responder"))
# Print data frame
DT::datatable(MyData, options = list(searching = FALSE))
Now that you have your data in the form of a data frame, you may:
1-Perform Fisher by columns inside the fisher.test() function call
2-Make a summary table and perform Fisher on the table.
1-Perform Fisher by columns Condition vs Outcome
fisher.test(MyData$Condition, MyData$Outcome)
##
## Fisher's Exact Test for Count Data
##
## data: MyData$Condition and MyData$Outcome
## p-value = 0.0007899
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 2.459807 228.297235
## sample estimates:
## odds ratio
## 17.57049
2-Make a summary table
Summarize categorical data using the table() function:
anotherTable = table(MyData)
anotherTable
## Outcome
## Condition Responder Non-Responder
## HR deficient 16 4
## HR proficient 2 10
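Counts alone can be hard to read, so it may help to add totals and row-wise proportions. Here is a sketch using addmargins() and prop.table(). To keep it self-contained, the data are rebuilt with rep(), so the row order differs from the raw data above but the counts are the same. The names countTable and responseRates are my own.

```r
# Rebuild the same counts with rep(); only the row order differs
Condition <- rep(c("HR deficient", "HR proficient"), times = c(20, 12))
Outcome <- c(rep(c("Responder", "Non-Responder"), times = c(16, 4)),
             rep(c("Responder", "Non-Responder"), times = c(2, 10)))
Outcome <- factor(Outcome, levels = c("Responder", "Non-Responder"))
countTable <- table(Condition, Outcome)

# Add row and column totals
addmargins(countTable)

# Proportion of each outcome within each condition (rows sum to 1)
responseRates <- prop.table(countTable, margin = 1)
round(responseRates, 2)
```

With these counts, 16 of 20 HR deficient patients responded (80%), compared with 2 of 12 HR proficient patients (about 17%).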
Then perform Fisher using the summary table
fisher.test(anotherTable)
##
## Fisher's Exact Test for Count Data
##
## data: anotherTable
## p-value = 0.0007899
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 2.459807 228.297235
## sample estimates:
## odds ratio
## 17.57049
Do the two methods give the same result?
## Fetch the p-value, which is the first element of the result list
PvalA = fisher.test(MyData$Condition, MyData$Outcome)$p.value
PvalB = fisher.test(anotherTable)$p.value
PvalA == PvalB
## [1] TRUE
Yes they do!
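A side note on comparing numbers: == asks for exact equality, which works here because both calls do the same computation. In general, floating point results are safer to compare with a tolerance. Here is a sketch using all.equal(), with the data rebuilt by rep() so it runs on its own (the variable names are mine):

```r
# Rebuild the same counts with rep(); only the row order differs
Condition <- rep(c("HR deficient", "HR proficient"), times = c(20, 12))
Outcome <- c(rep(c("Responder", "Non-Responder"), times = c(16, 4)),
             rep(c("Responder", "Non-Responder"), times = c(2, 10)))

pFromColumns <- fisher.test(Condition, Outcome)$p.value
pFromTable   <- fisher.test(table(Condition, Outcome))$p.value

# all.equal() compares with a small numerical tolerance; wrap it in
# isTRUE() because it returns a description when the values differ
isTRUE(all.equal(pFromColumns, pFromTable))
```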
To get details on the data and method used, e.g. alternative = "two.sided", assign the test results to an object and inspect the list elements in the RStudio global environment.
getDetails = fisher.test(anotherTable)
or just type
unlist(fisher.test(anotherTable))
## p.value conf.int1
## "0.000789927616836742" "2.45980654095026"
## conf.int2 estimate.odds ratio
## "228.297235459106" "17.5704859283499"
## null.value.odds ratio alternative
## "1" "two.sided"
## method data.name
## "Fisher's Exact Test for Count Data" "anotherTable"
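The individual elements can also be pulled out by name, which is handy when you want to report them. The sketch below rebuilds the table from the counts so that it runs on its own. It also shows why the reported odds ratio is not simply the cross-product of the counts: fisher.test() returns the conditional maximum likelihood estimate, as described on its help page.

```r
tab <- matrix(c(16, 4, 2, 10), nrow = 2, byrow = TRUE)
res <- fisher.test(tab)

res$p.value   # p-value
res$estimate  # odds ratio (conditional maximum likelihood estimate)
res$conf.int  # 95 percent confidence interval for the odds ratio

# Simple cross-product odds ratio from the raw counts, for comparison
sampleOR <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
sampleOR
```

The cross-product gives (16 × 10) / (4 × 2) = 20, while fisher.test() reports about 17.57. Both describe the same association, so just make clear which one you are reporting.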
Type ?fisher.test for more details on the method.