count data designed for use with differential expression¹ and differential exon usage tools², as well as
individual-sample and/or group-summary genome track files suitable for use with the UCSC genome
browser (or any compatible browser).
The QoRTs package is composed of two parts: a Java jar file (for data processing) and a companion R
package (for generating tables, figures, and plots). The Java utility is written in the Scala programming
language (v2.11.1); however, it has been compiled to Java bytecode and does not require an installation
of Scala (or any other external libraries) in order to function. The entire QoRTs toolkit can be used on
almost any operating system that supports Java and R.
The most recent release of QoRTs is available on the QoRTs GitHub page.
The latest version of this walkthrough is available online, along with a full example dataset (a 400 MB
file) and example bam files (a 2 GB file).
2 Using this Walkthrough
This walkthrough demonstrates the use of this pipeline on one particular example dataset. However,
many of the scripts and commands used here could be used on any RNA-Seq dataset with minimal
modification. File locations will have to be modified, as well as the path to the Java jar file (in the
example scripts, it is softwareRelease/QoRTs.jar).
Additionally, in this example walkthrough all commands are carried out in series. In actual use,
it is generally recommended that separate runs be executed simultaneously in separate threads, or, if
available, as separate jobs on a cluster job-queuing engine (such as SGE). Example SGE scripts are
provided for this purpose.
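As a rough illustration of running independent per-sample commands side by side on a single machine, the following Python sketch starts each command as its own process and collects the exit codes. The sample names and the placeholder command are hypothetical and are not part of the walkthrough's scripts; in practice each entry would be the full command line for one sample.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(cmd):
    """Run one command (a list of arguments) and return its exit code."""
    return subprocess.run(cmd).returncode


def run_all(commands, max_workers=4):
    """Run independent commands concurrently; return exit codes in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_command, commands))


if __name__ == "__main__":
    # Hypothetical sample IDs; the placeholder command only prints a message.
    samples = ["SAMP1", "SAMP2", "SAMP3"]
    commands = [
        [sys.executable, "-c", f"print('processing {s}')"] for s in samples
    ]
    exit_codes = run_all(commands, max_workers=2)
    print(exit_codes)
```

Each run needs its own share of memory, so max_workers should be chosen so that the combined requirement of the simultaneous runs fits within the available RAM.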
3 Requirements
Hardware: The QoRTs [5] Java utility does the bulk of the data processing, and will generally require
at least 4 GB of RAM; 8 GB or more is recommended, if available. The QoRTs R package is
only responsible for some light data processing and for plotting/visualization, and thus has much lower
resource requirements. It should run adequately on any reasonably powerful workstation. In general,
it is preferable to run bioinformatic analyses on a dedicated Linux-based cluster running some sort of
job-queuing engine.
Software: The QoRTs software package requires R version 3.0.2 or higher, as well as Java 6 or higher.
It does not require any other software. Some of the other software packages used in this walkthrough
may have their own individual dependencies, which may be subject to change over time.
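The version requirements above can be checked with a few lines of Python. This is a minimal sketch, assuming the version strings have already been read from the output of `R --version` and `java -version`; the function names are illustrative only. Note that Java releases before Java 9 report themselves as 1.x, so Java 6 appears as a string such as 1.6.0_45.

```python
import re


def parse_version(text):
    """Turn a version string such as '3.0.2' into a tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def java_major(version_string):
    """Return the Java major version from '1.6.0_45' (6) or '11.0.2' (11)."""
    parts = parse_version(version_string)
    # Releases before Java 9 use the 1.x scheme, so the second field is the major version.
    if parts[0] == 1 and len(parts) > 1:
        return parts[1]
    return parts[0]


def meets_requirements(r_version, java_version):
    """Check against the minimums stated above: R 3.0.2 and Java 6."""
    return parse_version(r_version) >= (3, 0, 2) and java_major(java_version) >= 6


if __name__ == "__main__":
    # Example version strings, chosen only for illustration.
    print(meets_requirements("3.0.2", "1.6.0_45"))
    print(meets_requirements("2.15.3", "1.8.0_292"))
```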
Annotation: QoRTs requires transcript annotations in the form of a gtf file. If you are using an
annotation-guided aligner (which is STRONGLY recommended), it is likely you already have a transcript
gtf file for your reference genome. We recommend you use the same annotation gtf for alignment, QC,
and downstream analysis. We have found the Ensembl "Gene Sets" gtf³ suitable for these purposes.
However, any file that adheres to the gtf file specification⁴ will work.
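A quick sanity check of an annotation file can catch formatting problems before a long run. The sketch below is not part of QoRTs and is not a full validator: it only tests the basic layout of a gtf line (nine tab-separated fields, integer start and end coordinates, a valid strand, and a gene_id attribute). The function name and the example records are made up for illustration.

```python
def check_gtf_line(line):
    """Return a list of problems found in one gtf line (empty if none)."""
    if line.startswith("#") or not line.strip():
        return []  # comment and blank lines carry no feature
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 tab-separated fields, found {len(fields)}"]
    problems = []
    start, end, strand, attributes = fields[3], fields[4], fields[6], fields[8]
    if not (start.isdigit() and end.isdigit() and int(start) <= int(end)):
        problems.append("start and end must be integers with start <= end")
    if strand not in {"+", "-", "."}:
        problems.append(f"unexpected strand {strand!r}")
    # Simplification: only the presence of a gene_id attribute is checked.
    if "gene_id" not in attributes:
        problems.append("attributes lack gene_id")
    return problems


if __name__ == "__main__":
    # Made-up records: the first is well formed, the second has too few fields.
    good = 'chr1\tsrc\texon\t100\t250\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1";'
    bad = "chr1\tsrc\texon\t100"
    print(check_gtf_line(good))
    print(check_gtf_line(bad))
```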
Additional QC metrics can be generated with the use of additional (optional) annotation files. To
generate reference mismatch rates you will also need to supply a genome fasta file. In order to generate
¹ Such as DESeq, DESeq2 [1] or edgeR [8]
² Such as DEXSeq [2] or JunctionSeq
³ Which can be acquired from the Ensembl website at http://www.ensembl.org
⁴ See the gtf file specification here