Have you ever wondered how scientists investigate human disease at the molecular level? Transcriptome profiling provides important insight into which genes are expressed within a cell, and it is a common methodology in cancer and other biomedical research. RNA sequencing (RNA-Seq) is a method for measuring transcriptional activity, and it can be used to discover changes in gene expression that may be associated with cancer and other diseases. But how exactly is RNA-Seq done? By the end of this article, you will have covered the basics of RNA sequencing, giving you an understanding of the process from the lab all the way to the bioinformatic analysis.
What is RNA?
To understand RNA-Seq, it’s best to first understand the type of nucleic acid we’re working with. Ribonucleic acid (RNA) is a nucleic acid made from the nucleotide bases adenine, guanine, cytosine and uracil. It is produced from DNA in a process called transcription, and messenger RNA (mRNA) is then translated into protein in a process called translation. RNA is present in all human cells in varying amounts and can be sequenced to measure gene expression, unlocking key insights into a cell’s biology. Earlier methods for studying RNA molecules include Sanger sequencing and microarray assays; however, both have now been largely superseded by RNA sequencing, whose high-throughput and sensitive process provides more accurate results.
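To make the transcription step concrete, here is a minimal Python sketch of the base-pairing rule behind it. It makes simplifying assumptions: the sequences are made up, only the pairing of DNA bases to RNA bases is modelled, and everything else about real transcription (promoters, RNA polymerase, splicing) is ignored. The function names are hypothetical, chosen for illustration. The *template* strand is the DNA strand the RNA is copied from; the *coding* strand is its partner, which has the same sequence as the RNA except that it carries thymine (T) where the RNA has uracil (U).

```python
# Complementary base pairing: DNA template base -> RNA base built opposite it.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template: str) -> str:
    """Build the RNA copied from a DNA template strand written 3'->5'.

    Each template base is paired with its RNA complement, so the RNA
    comes out 5'->3' in the same left-to-right order.
    """
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template.upper())


def transcribe_coding(coding: str) -> str:
    """Shortcut: the RNA equals the coding (sense) strand with T replaced by U."""
    return coding.upper().replace("T", "U")


# Made-up example: a short coding strand and its template strand.
coding_strand = "ATGGCC"      # written 5'->3'
template_strand = "TACCGG"    # written 3'->5'
print(transcribe_template(template_strand))  # AUGGCC
print(transcribe_coding(coding_strand))      # AUGGCC
```

Both routes give the same RNA, because the coding strand and the RNA are both complementary to the template strand.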
How It’s Done
In the lab…
Many types of RNA can be sequenced, including mRNA, pre-mRNA, rRNA, microRNA and long non-coding RNA (lncRNA). Total RNA contains all the types of RNA found in the cell, so it is important to choose a preparation protocol suited to the type you want to sequence. Ribosomal RNA (rRNA) makes up the bulk (about 95%) of all RNA in the cell, so removing it allows the detection of other, less abundant RNA molecules. rRNA can be removed using commercially available depletion kits, or avoided by selecting for polyadenylated (poly(A)) RNA, which includes most mRNA.

Once the required RNA has been isolated from the sample, it is converted into complementary DNA (cDNA) by reverse transcription. Depending on the fragment-size limit of the sequencing platform used, the RNA may need to be fragmented before this conversion, or the cDNA fragmented after it. Finally, the cDNA library is amplified using PCR (polymerase chain reaction) and sequenced using next-generation sequencing (NGS). Multiplex sequencing is an efficient way to sequence several libraries at the same time: a unique barcode sequence is added to each library, so that every read (the short sequences produced from the sample) can be traced back to the sample it came from.
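The barcode idea can be illustrated with a minimal Python sketch of demultiplexing, the step that sorts pooled reads back into their samples. For simplicity it assumes a hypothetical 6-base barcode at the start of each read and a made-up sample sheet. On real Illumina runs the barcode (index) is typically sequenced as a separate index read and demultiplexing is done by the vendor’s software, so treat this as an illustration of the logic only.

```python
from collections import defaultdict

# Hypothetical sample sheet: barcode sequence -> sample name.
SAMPLE_BARCODES = {
    "ACGTAC": "control_1",
    "TGCATG": "treated_1",
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=1):
    """Assign each read to a sample using a barcode at its 5' end.

    Assumes all barcodes have the same length and sit at the start of the
    read; the barcode is trimmed off before the read is stored. Reads that
    are too short, match no barcode, or match more than one barcode within
    the mismatch limit go to 'undetermined'.
    """
    length = len(next(iter(barcodes)))
    by_sample = defaultdict(list)
    for read in reads:
        if len(read) <= length:
            by_sample["undetermined"].append(read)
            continue
        prefix, insert = read[:length], read[length:]
        hits = [sample for barcode, sample in barcodes.items()
                if hamming(prefix, barcode) <= max_mismatches]
        if len(hits) == 1:
            by_sample[hits[0]].append(insert)
        else:
            by_sample["undetermined"].append(read)
    return dict(by_sample)


# Made-up pooled reads; the third has one sequencing error in its barcode.
reads = ["ACGTACGGATTACA", "TGCATGCCCTTAAG", "ACGTTCGGATTACA", "GGGGGGAAAA"]
print(demultiplex(reads, SAMPLE_BARCODES))
# {'control_1': ['GGATTACA', 'GGATTACA'], 'treated_1': ['CCCTTAAG'], 'undetermined': ['GGGGGGAAAA']}
```

Allowing one mismatch tolerates a single sequencing error in the barcode, which only works safely when the barcodes in a pool differ from each other at enough positions.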
On the Computer…
Once the sequencing data have been collected, the raw reads are first checked as part of pre-processing to ensure they are of high quality, usually with the command-line tool FastQC. The reads then need to be mapped (aligned) to a reference genome or transcriptome and quantified as counts per gene. Many command-line tools support these steps: STAR is a widely used aligner and RSEM a widely used quantification tool, and alternatives such as the aligner Bowtie2 are also available. Downstream analysis can then begin using R packages such as DESeq2, which supports normalisation, principal component analysis (PCA) and differential gene expression analysis. A PCA plot is an effective way to visualise how samples group together, while differential expression analysis identifies the genes, if any, whose expression levels differ between groups. Together, these results let you draw conclusions about the biology behind your samples.
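To show what normalisation and a fold change actually compute, here is a minimal plain-Python sketch of the median-of-ratios idea that DESeq2 uses for its size factors, followed by a simple log2 fold change between two groups. It is an illustration under stated assumptions (made-up counts for four hypothetical samples, a pseudocount of 1), not DESeq2’s implementation: DESeq2 goes further, fitting a negative binomial model and testing each gene statistically before calling it differentially expressed.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factor for each sample.

    counts: dict mapping gene -> list of raw counts (one per sample).
    Genes with a zero in any sample are skipped, because their geometric
    mean would be zero. Assumes at least one gene has no zeros.
    """
    n_samples = len(next(iter(counts.values())))
    ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        # The per-gene geometric mean acts as a pseudo-reference sample.
        geo_mean = math.exp(sum(math.log(c) for c in row) / n_samples)
        for j, c in enumerate(row):
            ratios[j].append(c / geo_mean)
    return [median(r) for r in ratios]


def normalise(counts, factors):
    """Divide each raw count by its sample's size factor."""
    return {gene: [c / s for c, s in zip(row, factors)]
            for gene, row in counts.items()}


def log2_fold_changes(norm, control, treated, pseudocount=1.0):
    """log2((mean treated + pc) / (mean control + pc)) per gene.

    control/treated are lists of sample (column) indices. The pseudocount
    avoids taking the log of zero for genes undetected in one group.
    """
    result = {}
    for gene, row in norm.items():
        ctrl = sum(row[i] for i in control) / len(control)
        trt = sum(row[i] for i in treated) / len(treated)
        result[gene] = math.log2((trt + pseudocount) / (ctrl + pseudocount))
    return result


# Made-up counts; columns are ctrl_1, ctrl_2, trt_1, trt_2.
# ctrl_2 and trt_2 were sequenced roughly twice as deeply.
counts = {
    "GENE_A": [100, 200, 100, 200],
    "GENE_B": [50, 100, 50, 100],
    "GENE_C": [10, 20, 40, 80],
    "GENE_D": [0, 3, 1, 0],
}

factors = size_factors(counts)
print([round(f, 3) for f in factors])  # [0.707, 1.414, 0.707, 1.414]
lfc = log2_fold_changes(normalise(counts, factors), control=[0, 1], treated=[2, 3])
print({gene: round(value, 2) for gene, value in lfc.items()})
```

The size factors correct for differences in sequencing depth: the deeper samples get factors about twice as large, so after normalisation GENE_A shows no change, while GENE_C comes out about four-fold higher in the treated group (a log2 fold change near 2, slightly lower here because of the pseudocount).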
Conclusion
In summary, RNA-Seq is an efficient, high-throughput method for analysing gene expression and investigating the molecular function of the cell. The limited sensitivity of earlier methods has made RNA-Seq the preferred approach. Emerging techniques such as single-cell RNA-Seq (scRNA-Seq) point to the future of the field, providing an even clearer picture of gene expression by examining individual cells rather than the bulk population of cells in a sample. As RNA-Seq continues to evolve, it remains an indispensable tool for unravelling the complexities of human disease.