fasta

NGS sequencing file formats

Sequencing data comes in a wide variety of formats, each containing very specific information. This is a collection of notes on different formats and how to interact with some of them using command-line tools like samtools, or R. Next Generation Sequencing (NGS) technology in brief: NGS and Sanger sequencing are similar in principle. DNA polymerase adds fluorescently labelled nucleotides to a growing DNA strand that is complementary to the template. Each base (C, T, G, A) is identified by the fluorescent signal it emits as it is added to the nucleic acid chain.

How to make R and Python scripts, and make them executable

While practicing my R and Python skills, I have written a script in both R and Python that performs the same task: take a DNA sequence from a fasta file, then count its length and the number of G's, C's and N's, treating uppercase and lowercase letters alike. It then prints a message with the results. I also included a step that counts how many seconds the calculations take, so we can compare which script runs faster.
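To make the task concrete, here is a minimal Python sketch of the counting and timing steps. It is not the script from the post: the function names `read_fasta_sequence` and `count_bases` and the small in-memory example record are assumptions made for illustration, and only the first record of a fasta input is read.

```python
import time


def read_fasta_sequence(lines):
    """Return the sequence of the first fasta record as one string.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    parts = []
    seen_header = False
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seen_header:
                break  # a second header means the first record has ended
            seen_header = True
            continue
        parts.append(line)
    return "".join(parts)


def count_bases(sequence):
    """Count length and G, C, N occurrences, ignoring letter case."""
    seq = sequence.upper()  # fold lowercase (soft-masked) bases to uppercase
    return {
        "length": len(seq),
        "G": seq.count("G"),
        "C": seq.count("C"),
        "N": seq.count("N"),
    }


# Hypothetical example record; a real run would pass an open file instead,
# e.g. `with open(path) as handle: read_fasta_sequence(handle)`.
example = [">example_record", "ACGTnnGC", "ggccAT"]

start = time.perf_counter()
stats = count_bases(read_fasta_sequence(example))
elapsed = time.perf_counter() - start

print(
    f"Length: {stats['length']}, G: {stats['G']}, "
    f"C: {stats['C']}, N: {stats['N']}"
)
print(f"Calculation took {elapsed:.6f} seconds")
```

Converting the sequence to uppercase once before counting keeps the case handling in one place. `time.perf_counter()` is used here because it is intended for measuring short elapsed intervals.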