tidyverse

Read and merge multiple files by folder

Often the data we need is spread across multiple files, and we need a way to read all the files and merge their content. The goal is to generate a tidy data frame: tidy data have variables in columns and observations in rows. Here I demonstrate how to gather data from multiple files into a tidy dataset from: 1) a folder with one file per measurement; 2) a folder with one file per sample, each containing multiple measurements; 3) a folder where regex is used to select specific files. Finally, 4) I show off some superpowers by applying this in a function that merges a folder of VCF files into one long data frame.
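
As a rough illustration of the general idea (not the code from the post itself), the sketch below writes two small CSV files to a temporary folder, selects them with a regex, and stacks them into one data frame with a column recording the source file. The folder name, file names, and columns are made up for the example.

```r
library(readr)
library(purrr)
library(dplyr)

# Hypothetical example data: two small CSV files in a temporary folder
folder <- file.path(tempdir(), "measurements_demo")
dir.create(folder, showWarnings = FALSE)
write_csv(data.frame(gene = c("A", "B"), value = c(1.2, 3.4)),
          file.path(folder, "sample1.csv"))
write_csv(data.frame(gene = c("A", "B"), value = c(5.6, 7.8)),
          file.path(folder, "sample2.csv"))

# Select files with a regex: only names matching sample<digits>.csv
files <- list.files(folder, pattern = "^sample[0-9]+\\.csv$", full.names = TRUE)

# Read every file into a named list, one element per file
tables <- map(set_names(files, basename(files)),
              read_csv, show_col_types = FALSE)

# Stack the rows; the list names end up in the "file" column
merged <- bind_rows(tables, .id = "file")
```

Keeping the file name as a column means each row can still be traced back to the sample it came from after merging.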

Plotting bar charts in R, geom_bar vs geom_col

Plotting the Nightingale data made me realize that there is more to plotting a bar chart than first meets the eye. While a histogram visualizes the distribution of a numerical variable, a bar plot visualizes the relationship between a categorical variable and a numerical variable. ggplot2 has two functions for plotting bar charts, geom_bar() and geom_col(). In short, geom_bar() counts the rows in each category for you, while geom_col() takes already summarized numerical values as input.
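
A minimal sketch of the difference, using a made-up data frame rather than the Nightingale data: both plots below show the same bars, but geom_bar() does the counting itself while geom_col() is given the counts.

```r
library(ggplot2)
library(dplyr)

# Hypothetical raw data: one row per observation
raw <- data.frame(group = c("a", "a", "a", "b", "b", "c"))

# geom_bar() counts the rows in each category itself
p_bar <- ggplot(raw, aes(x = group)) +
  geom_bar()

# geom_col() expects the bar heights to be computed already
counts <- count(raw, group)  # columns: group, n
p_col <- ggplot(counts, aes(x = group, y = n)) +
  geom_col()
```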

For loop for Multiple Trend in Proportions

When you have only one parameter to test, following my previous tutorial on the test for trend in proportions is sufficient. However, if you have many independent variables to test across several dependent variables, doing them one by one quickly becomes tedious. Therefore, I wrote a for-loop that creates all the summary tables, performs the test for trend in proportions on each table, adds the test result to the count matrix, and saves the output neatly in CSV/Excel format. Here I explain each step in the process.
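
The loop could look roughly like the sketch below. This is an assumption-laden simplification, not the code from the post: the data are simulated, the variable names are invented, only one independent variable (dose) is looped over two outcomes, and the test is base R's prop.trend.test().

```r
# Hypothetical data: an ordered exposure with three levels and two binary outcomes
set.seed(1)
dat <- data.frame(
  dose     = factor(sample(c("low", "mid", "high"), 200, replace = TRUE),
                    levels = c("low", "mid", "high")),
  outcome1 = rbinom(200, 1, 0.3),
  outcome2 = rbinom(200, 1, 0.5)
)

outcomes <- c("outcome1", "outcome2")
results <- vector("list", length(outcomes))

for (i in seq_along(outcomes)) {
  # Summary table: rows are dose levels, columns are outcome values 0/1
  tab <- table(dat$dose, dat[[outcomes[i]]])
  events <- tab[, "1"]
  totals <- rowSums(tab)

  # Chi-squared test for trend in proportions across the ordered dose levels
  test <- prop.trend.test(events, totals)

  # Attach the p-value to the counts for this outcome
  results[[i]] <- data.frame(
    outcome = outcomes[i],
    dose    = names(events),
    events  = as.vector(events),
    total   = as.vector(totals),
    p_trend = test$p.value
  )
}

# Combine all outcomes and save as CSV (written to a temporary folder here)
summary_table <- do.call(rbind, results)
out_file <- file.path(tempdir(), "trend_tests.csv")
write.csv(summary_table, out_file, row.names = FALSE)
```

Looping over several independent variables as well would mean adding an outer loop over the grouping column names.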

Plotting categorical values as a tiled chart

Plotting your variables as a tiled map can visualize the interactions between them very efficiently. Here is a how-to for plotting categorical values as a tiled chart with fixed squares.
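
One way to get such a chart in ggplot2 is geom_tile() combined with coord_fixed(), sketched below on invented data (the sample, gene, and status columns are placeholders, and the post may take a different route).

```r
library(ggplot2)

# Hypothetical data: one categorical status per sample/gene combination
dat <- expand.grid(sample = paste0("S", 1:4),
                   gene   = c("geneA", "geneB", "geneC"))
dat$status <- rep(c("wild type", "mutated", "deleted"), length.out = nrow(dat))

p <- ggplot(dat, aes(x = sample, y = gene, fill = status)) +
  geom_tile(colour = "white") +  # white borders separate the tiles
  coord_fixed()                  # aspect ratio 1 keeps every tile square
```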

Extract tables from pdf files with tabulizer

Far too often I find myself in a situation where I need to fetch lists of genes, expression data or similar from journal articles, only to realize that the data is buried somewhere deep within the supplementary material in the form of a giant PDF. (The horror!) Here is a how-to for scraping data from a linked PDF file (by URL) using the tabulizer R package.

How to ‘Pivot Wider’ when you have only character values

Reshaping your data by pivoting from long to wide, or from wide to long, is a frequent step when wrangling data. The pivot_longer() and pivot_wider() functions from the tidyr package do this job excellently in most cases. I have, however, had some issues when reshaping data containing only characters. This is my solution to this issue.
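
A typical symptom with character-only data is that duplicated sample/key combinations turn into list-columns. One possible workaround, which may or may not match the solution in the post, is to collapse the duplicates with the values_fn argument of pivot_wider(). The data frame below is invented for the example.

```r
library(tidyr)

# Hypothetical long data with only character values;
# sample s1 has two entries for the key "gene"
long <- data.frame(
  sample = c("s1", "s1", "s1", "s2"),
  key    = c("gene", "gene", "tissue", "gene"),
  value  = c("TP53", "KRAS", "lung", "EGFR")
)

# Collapse duplicated values into one string instead of a list-column
wide <- pivot_wider(
  long,
  names_from  = key,
  values_from = value,
  values_fn   = function(x) paste(x, collapse = ";")
)
```

Here the two gene entries for s1 are joined into a single semicolon-separated string, and the missing tissue for s2 is filled with NA.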