1 Introduction
The gmapR package provides users with a way to access GSNAP, bam_tally, and other utilities from the GMAP
suite of tools within an R session. In this vignette, we briefly look at the GMAP suite of tools available through the
gmapR package and work through an example.
2 What are GMAP, GSNAP, and bam_tally?
The GMAP suite offers useful tools for the following:
• Genomic mapping: Given a cDNA, find where it best aligns to an entire genome
• Genomic alignment: Given a cDNA and a genomic segment, provide a nucleotide-level correspondence for the
exons of the cDNA to the genomic segment
• Summarization: Given a set of aligned sequencing reads, report coverage along with reference and alternative
allele counts over a given genomic location
GMAP (Genomic Mapping and Alignment Program) is particularly suited to relatively long mRNA and EST
sequences such as those that are obtained from Roche 454 or Pacific Biosciences sequencing technologies. (At present,
only GSNAP is available through the gmapR package. GMAP integration is scheduled for the near future.)
GSNAP (Genomic Short-read Nucleotide Alignment Program) also provides genomic mapping and alignment
capabilities, but is optimized for the issues that arise when aligning short reads generated by sequencing
technologies such as Illumina/Solexa or ABI/SOLiD. GSNAP offers the following functionality, as described in
"Fast and SNP-tolerant detection of complex variants and splicing in short reads" by Thomas D. Wu and
Serban Nacu:
• fast detection of complex variants and splicing in short reads, based on a successively constrained search process
of merging and filtering position lists from a genomic index
• alignment of both single- and paired-end reads as short as 14 nt and of arbitrarily long length
• detection of short- and long-distance splicing, including interchromosomal splicing, in individual reads, using
probabilistic models or a database of known splice sites
• SNP-tolerant alignment to a reference space of all possible combinations of major and minor alleles
• alignment of reads from bisulfite-treated DNA for the study of methylation state
bam_tally provides users with the coverage as well as counts of reference alleles, alternative alleles, and
indels over a genomic region.
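To make this kind of summary concrete, the toy sketch below computes per-position coverage and reference/alternative allele counts in base R. It is only an illustration of the idea, not the gmapR API or the algorithm bam_tally uses. The reference sequence, the reads, their start positions, and all variable and column names are hypothetical, and the reads are assumed to align without gaps (so indels are not handled).

```r
# Hypothetical toy data: a short reference and three gap-free reads,
# each with a 1-based start position on the reference.
ref   <- "ACGTACGT"
reads <- c("ACGTA", "CGAAC", "GTACG")
start <- c(1L, 2L, 3L)

ref_bases <- strsplit(ref, "")[[1]]
n_pos     <- length(ref_bases)

# One row per aligned base: its reference position and the observed base.
aligned <- do.call(rbind, lapply(seq_along(reads), function(i) {
  bases <- strsplit(reads[i], "")[[1]]
  data.frame(pos  = start[i] + seq_along(bases) - 1L,
             base = bases,
             stringsAsFactors = FALSE)
}))

# Does each observed base agree with the reference at its position?
is_ref <- aligned$base == ref_bases[aligned$pos]

# Per-position tally: coverage, reference count, alternative count.
tally <- data.frame(pos = seq_len(n_pos), ref = ref_bases,
                    stringsAsFactors = FALSE)
tally$coverage  <- tabulate(aligned$pos, nbins = n_pos)
tally$count_ref <- tabulate(aligned$pos[is_ref], nbins = n_pos)
tally$count_alt <- tally$coverage - tally$count_ref

print(tally)
```

In this toy data, only the second read disagrees with the reference, at position 4, so that is the only position with a nonzero alternative count.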
For more information on the GMAP suite of tools, including a detailed explanation of the underlying algorithms,
see http://research-pub.gene.com/gmap/.
3 Create a GmapGenome Object
To align reads with GMAP or GSNAP, or to use bam_tally, you will need to either obtain or create a GmapGenome
object. GmapGenome objects can be created from FASTA files or BSgenome objects, as the following example
demonstrates: