Comparative ncRNA Gene and Structure Prediction Using Foldalign and FoldalignM

Internet, 2013-12-31

Abstract

This unit describes how to use Foldalign and FoldalignM to make structural alignments of non-protein-coding RNA (ncRNA). These tools can be used to find new ncRNAs, to find the structure of novel ncRNAs, and to improve alignments for known ncRNAs. Curr. Protoc. Bioinform. 39:12.11.1-12.11.15. © 2012 by John Wiley & Sons, Inc.

Keywords: ncRNA; RNA structural alignment; RNA structure prediction; RNA multiple sequence alignment; ncRNA gene finding; non-coding RNA

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Introduction
Basic Protocol 1: Searching for New RNA Structures Using Foldalign
Basic Protocol 2: Pairwise Global Structural Alignment Using Foldalign
Alternate Protocol 1: Pairwise Local Structural Alignment Using the Foldalign Web Server
Basic Protocol 3: Multiple Alignment Using FoldalignM and the Foldalign Scoring Method
Alternate Protocol 2: Multiple Alignment Using FoldalignM and the McCaskill Scoring Method
Basic Protocol 4: AlignToStructure
Support Protocol 1: Installing Foldalign
Support Protocol 2: Installing FoldalignM
Guidelines for Understanding Results
Commentary
Literature Cited
Figures

Figures

Figure 12.11.1 The summary section of the Foldalign output. This section summarizes the best alignment found: its score and sequence identity, the alignment itself, and the common structure. The structure is drawn in standard dot‐bracket RNA secondary structure notation, in which unpaired nucleotides are indicated by a dot “.” and base‐paired nucleotides by matching parentheses, “(” and “)”. Foldalign additionally uses matching angle brackets, “<” and “>”, to indicate base‐paired nucleotides in one sequence that are aligned to gaps in the other sequence. Other parts of the output list the parameters used by Foldalign, so it is always possible to see how a given alignment was made. For a more detailed description of the output, see the Documentation part of the Foldalign Web site (http://foldalign.ku.dk).
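
The notation described in this caption can be read mechanically. The following is a minimal Python sketch, not part of Foldalign itself, that turns a structure string into a list of base-pair positions. The function name parse_dot_bracket is hypothetical, and the sketch assumes the structure line contains only ".", "(", ")", "<", and ">"; real Foldalign output may contain additional characters, so the documentation on the Foldalign Web site remains the authority on the format.

```python
def parse_dot_bracket(structure):
    """Return sorted (open, close) index pairs (0-based) for a structure string.

    Assumption: "(" pairs only with ")" and "<" pairs only with ">", so each
    bracket family gets its own stack. Any other character except "." is
    rejected rather than silently skipped.
    """
    closers = {")": "(", ">": "<"}
    stacks = {"(": [], "<": []}
    pairs = []
    for position, symbol in enumerate(structure):
        if symbol in stacks:
            # Opening bracket: remember where it was seen.
            stacks[symbol].append(position)
        elif symbol in closers:
            opener = closers[symbol]
            if not stacks[opener]:
                raise ValueError(f"unmatched {symbol!r} at position {position}")
            # Closing bracket: pair it with the most recent unmatched opener.
            pairs.append((stacks[opener].pop(), position))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {position}")
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # A made-up structure: an outer stem in "()" enclosing an inner stem in "<>".
    for pair in parse_dot_bracket("((..<<..>>..))"):
        print(pair)
```

Keeping a separate stack per bracket type means a "<" can never be closed by a ")", which makes malformed lines fail loudly instead of producing wrong pairs.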

Figure 12.11.2 The first line contains the information used to calculate the significance of the alignment. This line is usually too long for the display and is therefore most often wrapped over several lines, as in this example. The following lines contain information about the alignments. The first fields are the name and the start and stop coordinates of the first sequence, followed by the same for the second sequence. Then come the alignment score, Z‐score, P‐value, and alignment rank. If the P‐value of an alignment is significant, there is an asterisk (*) at the end of the alignment line. The significance level is set using the option -p=<level>. The P‐value is the probability of observing an alignment with this score by chance between those two sequences. It is calculated using the extreme value distribution, as in BLAST. It is highly recommended to ignore the Z‐score, since it is calculated assuming that the scores follow a Gaussian distribution, whereas they are known to follow an extreme value distribution.
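
To illustrate why the caption advises against the Z‐score, the Python sketch below compares the upper-tail probability of a Gumbel (extreme value) distribution with that of a Gaussian that has the same mean and standard deviation. This is not Foldalign's own significance calculation: the function names and the location and scale values are hypothetical, chosen only for illustration. The program estimates its actual parameters from the information on the first output line.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_p_value(score, location, scale):
    """Upper tail P(S >= score) for a Gumbel (extreme value) distribution."""
    # 1 - exp(-exp(-(score - location) / scale)), computed stably via expm1.
    return -math.expm1(-math.exp(-(score - location) / scale))


def gaussian_p_value(score, location, scale):
    """Upper tail of a Gaussian matched to the Gumbel's mean and std deviation."""
    mean = location + EULER_GAMMA * scale
    std = scale * math.pi / math.sqrt(6.0)
    z_score = (score - mean) / std
    return 0.5 * math.erfc(z_score / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical parameters, not taken from any Foldalign run.
    location, scale = 100.0, 50.0
    for score in (300.0, 400.0, 500.0):
        print(score,
              gumbel_p_value(score, location, scale),
              gaussian_p_value(score, location, scale))
```

For high scores the Gaussian tail falls off much faster than the extreme value tail, so a significance estimate derived from the Z‐score would make an alignment look far less likely to arise by chance than the extreme value P‐value indicates.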

Figure 12.11.3 Multiple alignment using FoldalignM. The first line shows the aligned sequence; the second line contains the predicted common structure in dot‐bracket notation. The alignment score can be seen on the bottom line.

Figure 12.11.4 Cluster output file (output.cluster.info) from FoldalignM, describing the total number of clusters created and the number of members in each cluster. In addition, a multiple sequence alignment (see Figure ) for each cluster is written to the default output directory.

Literature Cited

	Gardner, P., Wilm, A., and Washietl, S. 2005. A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res. 33:2433‐2439.
	Gorodkin, J., Heyer, L., and Stormo, G. 1997a. Finding common sequence and structure motifs in a set of RNA sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 5:120‐123.
	Gorodkin, J., Heyer, L., and Stormo, G. 1997b. Finding the most significant common sequence and structure motifs in a set of RNA sequences. Nucleic Acids Res. 25:3724‐3732.
	Gorodkin, J., Lyngsø, R., and Stormo, G. 2001a. A mini‐greedy algorithm for faster structural RNA stem‐loop search. Genome Inform. Ser. Workshop Genome Inform. 12:184‐193.
	Gorodkin, J., Stricklin, S., and Stormo, G. 2001b. Discovering common stem‐loop motifs in unaligned RNA sequences. Nucleic Acids Res. 29:2135‐2144.
	Havgaard, J., Lyngsø, R., Stormo, G., and Gorodkin, J. 2005a. Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. Bioinformatics 21:1815‐1824.
	Havgaard, J., Lyngsø, R., and Gorodkin, J. 2005b. The FOLDALIGN web server for pairwise structural RNA alignment and mutual motif search. Nucleic Acids Res. 33:W650‐W653.
	Havgaard, J., Torarinsson, E., and Gorodkin, J. 2007. Fast pairwise structural RNA alignments by pruning of the dynamical programming matrix. PLoS Comput. Biol. 3:1896‐1908.
	Klein, R. and Eddy, S. 2003. RSEARCH: Finding homologs of single structured RNA sequences. BMC Bioinformatics 4:44.
	Mathews, D., Sabina, J., Zuker, M., and Turner, D. 1999. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288:911‐940.
	McCaskill, J. 1990. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29:1105‐1119.
	Sankoff, D. 1985. Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J. Appl. Math. 45:810‐825.
	Torarinsson, E., Sawera, M., Havgaard, J., Fredholm, M., and Gorodkin, J. 2006. Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure. Genome Res. 16:885‐889.
	Torarinsson, E., Havgaard, J., and Gorodkin, J. 2007. Multiple structural alignment and clustering of RNA sequences. Bioinformatics 23:926‐932.
Key References
	Havgaard et al., 2007. See above.
	Describes the latest implementation of the Foldalign algorithm.
	Havgaard et al., 2005a. See above.
	A more detailed description of the energy and substitution models used by Foldalign.
	Havgaard et al., 2005b. See above.
	Describes the Foldalign Web server.
	Torarinsson et al., 2006. See above.
	A large‐scale search for novel ncRNA structures in human and mouse using Foldalign.
	Torarinsson et al., 2007. See above.
	Describes the FoldalignM algorithm.
	Torarinsson, E. and Lindgreen, S. 2008. WAR: Webserver for aligning structural RNAs. Nucleic Acids Res. 36:W79‐W84.
	Describes the Web server for aligning structural RNAs, which includes FoldalignM as well as other methods.
Internet Resources
	http://foldalign.ku.dk
	The Foldalign Web site, the Foldalign Web server, and extra documentation, as well as the source code for Foldalign and FoldalignM can be found here.
	http://genome.ku.dk/resources/war/
	The Web server for aligning structural RNAs, which among many other methods includes FoldalignM.