Submitting a Sequence to GenBank

Internet, 2013-12-31

Abstract

In the post‐genomic era, more and more research projects involve the generation of molecular sequence data. How should these newly obtained DNA/protein sequences be analyzed, and how should they be prepared for submission to sequence databases? In this unit, we provide guidelines and a flow chart to help first‐time users process new sequence data using third‐party freeware programs, and give a step‐by‐step demonstration of how to prepare a sequence file for submission to GenBank using the Sequin program. Curr. Protoc. Essential Lab. Tech. 1:11.2.1‐11.2.20. © 2009 by John Wiley & Sons, Inc.

Keywords: Sequin; BankIt; bioinformatics; BLAST; BioEdit; Jalview; coding sequence

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Overview and Principles
Strategic Planning
Basic Protocol 1: Submitting a Novel Sequence to GenBank
Basic Protocol 2: Updating an Existing GenBank Record
Literature Cited
Figures

Materials

Basic Protocol 1: Submitting a Novel Sequence to GenBank

Materials

Computer running Sequin (see )

Figures

Figure 11.2.1 Example of a sequence file written in the GenBank format.
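As a rough illustration of the flat-file layout referred to in this caption, the following Python sketch reads the keyword lines and the ORIGIN sequence block from a GenBank-style record. The record shown is made up for this example (it is not the one in the figure), and `parse_genbank` is a hypothetical helper name; for real work, an established parser such as Biopython's `SeqIO` would be the safer choice.

```python
# Hypothetical, hand-written record used only to illustrate the layout.
SAMPLE_RECORD = """\
LOCUS       EXAMPLE01                 24 bp    DNA     linear   UNA 01-JAN-2009
DEFINITION  Made-up example sequence.
ACCESSION   EXAMPLE01
ORIGIN
        1 atggccaagg ttcgtaacct gtaa
//
"""


def parse_genbank(text):
    """Return (fields, sequence) from a single GenBank-style record."""
    fields = {}
    seq_parts = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # "//" terminates the record
        if in_origin:
            # Sequence lines: a position number followed by blocks of bases.
            seq_parts.extend(line.split()[1:])
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line[:1].isalpha():
            # Top-level keywords start in column 1; continuation lines are indented.
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
    return fields, "".join(seq_parts).upper()


fields, sequence = parse_genbank(SAMPLE_RECORD)
print(fields["ACCESSION"], len(sequence))
```

The sketch ignores the feature table and multi-line fields, which a complete record would normally contain.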

Figure 11.2.2 Nucleotide sequence alignment of a genomic sequence (top) with its corresponding cDNA sequence (bottom) created using Jalview. An intron bounded by the spliceosomal intron sequence (GT…AG) can be readily spotted on this alignment. The intron spans positions 1487 to 1516 of the genomic sequence.
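The GT…AG check described in this caption can also be done programmatically once the intron coordinates are known. The sketch below is a minimal illustration, assuming 1-based inclusive coordinates (so an intron spanning positions 1487 to 1516 would be 1516 - 1487 + 1 = 30 bases long). The function names and the toy sequence are invented for this example and do not come from the protocol.

```python
def has_gt_ag_boundaries(genomic, start, end):
    """Check whether genomic[start..end] (1-based, inclusive) starts with GT and ends with AG."""
    intron = genomic[start - 1:end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def splice_out(genomic, start, end):
    """Remove the intron, giving the sequence expected in the cDNA."""
    return genomic[:start - 1] + genomic[end:]


# Toy data (made up): exon 1 + intron + exon 2.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "AAGTAA"
toy_genomic = exon1 + intron + exon2

print(has_gt_ag_boundaries(toy_genomic, 7, 18))  # intron occupies positions 7-18
print(splice_out(toy_genomic, 7, 18) == exon1 + exon2)
```

Comparing the spliced result with the cDNA sequence is a quick sanity check that the intron coordinates read off the alignment are correct.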

Figure 11.2.3 Beginning of a chromatogram generated by the Sanger sequencing method. The vector and the cloned sequences are separated by an EcoRI site, indicated by a divider. In addition to the “N” bases, there are also miscalled bases, such as an additional G and a missed A, both indicated by arrows.

Figure 11.2.4 Graphic display of a Mega BLAST search of the nucleotide sequence AY463803 against GenBank. The only positive match found was the AY463803 sequence itself. Mega BLAST is an optimized BLASTN protocol for searching for highly similar nucleotide sequences in the database.

Figure 11.2.5 Graphic display of a BLASTX search of the nucleotide sequence AY463803 against GenBank. Many strong matches were detected (E‐values < 10^–20), indicating that this nucleotide sequence encodes a protein belonging to the casein kinase protein family.

Figure 11.2.6 Alignment result of a BLASTX search of the nucleotide sequence AY463803 against GenBank. The two arrows indicate translational frameshift and hence the presence of intron(s).
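To see why a frameshift in a BLASTX alignment hints at an intron, it can help to translate a nucleotide sequence in all three forward frames. The sketch below is only an illustration and not part of the protocol: it uses the standard genetic code (NCBI translation table 1), and the function names and toy sequences are made up. The toy intron is 11 bases long, so it is not a multiple of three and pushes the second exon into a different frame.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq):
    """Translate complete codons from the start of seq; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[p:p + 3], "X") for p in range(0, len(seq) - 2, 3))


def three_frame_translation(seq):
    """Translate the forward strand starting at offsets 0, 1, and 2."""
    return [translate(seq[offset:]) for offset in range(3)]


# Toy data (made up): the same two exons with and without an 11-base intron.
spliced = "ATGGCC" + "AAGTAA"
genomic = "ATGGCC" + "GTAAGTTTCAG" + "AAGTAA"

print(translate(spliced))        # MAK*
frames = three_frame_translation(genomic)
print(frames[0])                 # MAVSFRS  (exon 1 is read in frame 0)
print(frames[2])                 # GRKFQK*  (exon 2 is read in frame 2)
```

In the genomic translation, the start of the protein appears in one frame and its end in another, which is the kind of pattern the arrows in the figure point to.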

Figure 11.2.7 Flow chart of processes involved in submitting sequences to GenBank.

Figure 11.2.8 Sequin's program and Help windows.

Figure 11.2.9 Sequin's information window.

Figure 11.2.10 Sequin's Sequence Format window.

Figure 11.2.11 Sequin's Organism and Sequences window.

Figure 11.2.12 Sequin's Organism Editor.

Figure 11.2.13 Correcting a feature in Sequin. Clicking on a feature opens a window that allows for feature correction.

Figure 11.2.14 Sequin's sequence editing window.

Figure 11.2.15 Sequin's gene information window.

Figure 11.2.16 New gene window. The newly added gene is shown as a black line below the sequence.

Figure 11.2.17 Coding sequence editing. Coding sequences are displayed on both the sequence editing window and the main editing window.

Figure 11.2.18 Examples of sequence validation errors.

Figure 11.2.19 Web page showing a sequence from the GenBank database. This screenshot shows the nucleotide sequence with accession number L39906.

Figure 11.2.20 Sequin's Network Configuration window.

Figure 11.2.21 Sequin's program window with direct sequence download function enabled.

Figure 11.2.22 Sequin's direct sequence download window.

Literature Cited

	Baxevanis, A.D. 2004. An overview of gene identification: Approaches, strategies, and considerations. Curr. Protoc. Bioinform. 6:4.1.1‐4.1.9.
	Benson, D.A., Karsch‐Mizrachi, I., Lipman, D.J., Ostell, J., and Sayers, E.W. 2009. GenBank. Nucleic Acids Res. 37:D26‐D31.
	Clamp, M., Cuff, J., Searle, S.M., and Barton, G.J. 2004. The Jalview Java alignment editor. Bioinformatics 20:426‐427.
	Hall, T.A. 1999. BioEdit: A user‐friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41:95‐98.
	Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., Thompson, J.D., Gibson, T.J., and Higgins, D.G. 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23:2947‐2948.
	Maddison, W.P. and Maddison, D.R. 2009. Mesquite: A modular system for evolutionary analysis. Version 2.6. http://mesquiteproject.org.
	Marchler‐Bauer, A., Anderson, J.B., Derbyshire, M.K., DeWeese‐Scott, C., Gonzales, N.R., Gwadz, M., Hao, L., He, S., Hurwitz, D.I., Jackson, J.D., Ke, Z., Krylov, D., Lanczycki, C.J., Liebert, C.A., Liu, C., Lu, F., Lu, S., Marchler, G.H., Mullokandov, M., Song, J.S., Thanki, N., Yamashita, R.A., Yin, J.J., Zhang, D., and Bryant, S.H. 2007. CDD: A conserved domain database for interactive domain family analysis. Nucleic Acids Res. 35:D237‐D240.
Internet Resources
	http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
	Conserved domain database (CDD; Marchler‐Bauer et al., 2007). Search for conserved motifs in your protein sequences.
	http://www.ncbi.nlm.nih.gov/
	National Center for Biotechnology Information (NCBI) homepage. Gateway linked to numerous useful resources, such as GenBank, BLAST, and Entrez, among others.
	http://www.ncbi.nlm.nih.gov/Sequin/QuickGuide/sequin.htm
	Sequin quick guide. Get detailed information on how to use Sequin.