VCF

Read and merge multiple files by folder

Often the data we need is spread across multiple files, and we need a way to read all the files and merge their content. The goal is to generate a tidy data frame: tidy data have variables in columns and observations in rows. Here I demonstrate how to gather data from multiple files into a tidy dataset from: 1- a folder with one file per measurement; 2- a folder with one file per sample, each containing multiple measurements; 3- a folder where a regex is used to select specific files. 4- Finally, I show off some superpowers by applying this to create a function that merges a folder of VCF files into one long data frame.
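As a rough illustration of the folder-merge idea, here is a minimal sketch in Python using only the standard library. It is not the code from the post itself: the function name `merge_folder`, the `source_file` column, and the default pattern are all assumptions made for this example, and it assumes each file is a delimited text file with a header row.

```python
import csv
import re
from pathlib import Path


def merge_folder(folder, pattern=r".*\.csv$", delimiter=","):
    """Read every file in `folder` whose name matches `pattern` and
    return one long list of row dicts, each tagged with its source file.

    `merge_folder` and the `source_file` key are hypothetical names
    chosen for this sketch.
    """
    name_re = re.compile(pattern)
    rows = []
    # Sort the paths so the merged output has a stable, predictable order.
    for path in sorted(Path(folder).iterdir()):
        # Skip subfolders and any file the regex does not select.
        if not path.is_file() or not name_re.search(path.name):
            continue
        with path.open(newline="") as handle:
            # DictReader uses the header row as the variable (column) names.
            for record in csv.DictReader(handle, delimiter=delimiter):
                # Keep track of which file each observation came from.
                record["source_file"] = path.name
                rows.append(record)
    return rows
```

Calling `merge_folder("data", pattern=r"^sample_.*\.tsv$", delimiter="\t")` would, under these assumptions, select only the tab-separated files whose names start with `sample_`. Every value is kept as a string, so numeric columns would still need converting afterwards.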

NGS sequencing file formats

Sequencing data come in a wide variety of formats, each containing very specific information. This is a collection of notes on different formats and how to interact with some of them using command-line tools like samtools, or R. Next Generation Sequencing (NGS) technology in brief: NGS and Sanger sequencing are similar in principle. DNA polymerase adds fluorescently labeled nucleotides to a growing DNA strand synthesized along a template. Each base (C, T, G, A) is identified by the fluorescent signal it emits as it is added to the nucleic acid chain.