Submitting a Sequence to GenBank
Abstract
In the post‐genomic era, more and more research projects involve the generation of molecular sequence data. How should these newly obtained DNA/protein sequences be analyzed, and how should they be prepared for submission to sequence databases? In this unit, we provide guidelines and a flow chart to help first‐time users process new sequence data using third‐party freeware programs and give a step‐by‐step demonstration on the preparation of a sequence file for submission to GenBank using the Sequin program. Curr. Protoc. Essential Lab. Tech. 1:11.2.1‐11.2.20. © 2009 by John Wiley & Sons, Inc.
Keywords: Sequin; BankIt; bioinformatics; BLAST; BioEdit; Jalview; coding sequence
Table of Contents
- Overview and Principles
- Strategic Planning
- Basic Protocol 1: Submitting a Novel Sequence to GenBank
- Basic Protocol 2: Updating an Existing GenBank Record
- Literature Cited
- Figures
Materials
Basic Protocol 1: Submitting a Novel Sequence to GenBank
Materials
Figures
- Figure 11.2.1 Example of a sequence file written in the GenBank format.
- Figure 11.2.2 Nucleotide sequence alignment of a genomic sequence (top) with its corresponding cDNA sequence (bottom) created using Jalview. An intron bounded by the spliceosomal intron sequence (GT…AG) can be readily spotted on this alignment. The intron spans positions 1487 to 1516 of the genomic sequence.
- Figure 11.2.3 Beginning of a chromatogram generated by Sanger's sequencing method. The vector and the cloned sequences are separated by an EcoRI site, indicated by a divider. In addition to the “N” bases, there are also miscalled bases, such as an additional G and a missed A, both indicated by arrows.
- Figure 11.2.4 Graphic display of a Mega BLAST search of the nucleotide sequence AY463803 against GenBank. The only positive match found was the AY463803 sequence itself. Mega BLAST is an optimized BLASTN protocol for searching for highly similar nucleotide sequences in the database.
- Figure 11.2.5 Graphic display of a BLASTX search of the nucleotide sequence AY463803 against GenBank. Many strong matches were detected (E‐values <10^-20), indicating that this nucleotide sequence encodes a protein belonging to the casein kinase protein family.
- Figure 11.2.6 Alignment result of a BLASTX search of the nucleotide sequence AY463803 against GenBank. The two arrows indicate translational frameshift and hence the presence of intron(s).
- Figure 11.2.7 Flow chart of processes involved in submitting sequences to GenBank.
- Figure 11.2.8 Sequin's program and Help windows.
- Figure 11.2.9 Sequin's information window.
- Figure 11.2.10 Sequin's Sequence Format window.
- Figure 11.2.11 Sequin's Organism and Sequences window.
- Figure 11.2.12 Sequin's Organism Editor.
- Figure 11.2.13 Correcting a feature in Sequin. Clicking on a feature opens a window that allows for feature correction.
- Figure 11.2.14 Sequin's sequence editing window.
- Figure 11.2.15 Sequin's gene information window.
- Figure 11.2.16 New gene window. The newly added gene is shown as a black line below the sequence.
- Figure 11.2.17 Coding sequence editing. Coding sequences are displayed on both the sequence editing window and the main editing window.
- Figure 11.2.18 Examples of sequence validation errors.
- Figure 11.2.19 Web page showing a sequence from the GenBank database. This screenshot shows the nucleotide sequence with accession number L39906.
- Figure 11.2.20 Sequin's Network Configuration window.
- Figure 11.2.21 Sequin's program window with direct sequence download function enabled.
- Figure 11.2.22 Sequin's direct sequence download window.
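The GT…AG boundary rule mentioned in the caption of Figure 11.2.2 can be checked programmatically once candidate intron coordinates are known. The following is a minimal Python sketch, not part of the unit: the function name `has_canonical_splice_sites` and the toy sequence are hypothetical and chosen only for illustration, and coordinates are assumed to be 1-based and inclusive, as in GenBank feature tables.

```python
def has_canonical_splice_sites(genomic: str, start: int, end: int) -> bool:
    """Check whether the intron at 1-based inclusive [start, end]
    begins with GT and ends with AG (canonical spliceosomal boundaries)."""
    # Convert 1-based inclusive coordinates to a Python slice.
    intron = genomic[start - 1:end].upper()
    # An intron needs at least the two dinucleotides to be meaningful.
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Toy sequence made up for illustration (not taken from the unit):
# exon (positions 1-6), intron (positions 7-18), exon (positions 19-24).
toy = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"

print(has_canonical_splice_sites(toy, 7, 18))
```

For a real record, the genomic sequence and the intron coordinates suggested by the genomic/cDNA alignment (for example, 1487 and 1516 in Figure 11.2.2) would be passed in place of the toy values. A passing check is consistent with a spliceosomal intron but does not by itself prove one.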
Literature Cited
Baxevanis, A.D. 2004. An overview of gene identification: Approaches, strategies, and considerations. Curr. Protoc. Bioinform. 6:4.1.1‐4.1.9.
Benson, D.A., Karsch‐Mizrachi, I., Lipman, D.J., Ostell, J., and Sayers, E.W. 2009. GenBank. Nucleic Acids Res. 37:D26‐D31.
Clamp, M., Cuff, J., Searle, S.M., and Barton, G.J. 2004. The Jalview Java alignment editor. Bioinformatics 20:426‐427.
Hall, T.A. 1999. BioEdit: A user‐friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41:95‐98.
Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., Thompson, J.D., Gibson, T.J., and Higgins, D.G. 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23:2947‐2948.
Maddison, W.P. and Maddison, D.R. 2009. Mesquite: A modular system for evolutionary analysis. Version 2.6. http://mesquiteproject.org.
Marchler‐Bauer, A., Anderson, J.B., Derbyshire, M.K., DeWeese‐Scott, C., Gonzales, N.R., Gwadz, M., Hao, L., He, S., Hurwitz, D.I., Jackson, J.D., Ke, Z., Krylov, D., Lanczycki, C.J., Liebert, C.A., Liu, C., Lu, F., Lu, S., Marchler, G.H., Mullokandov, M., Song, J.S., Thanki, N., Yamashita, R.A., Yin, J.J., Zhang, D., and Bryant, S.H. 2007. CDD: A conserved domain database for interactive domain family analysis. Nucleic Acids Res. 35:D237‐D240.
Internet Resources
http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
Conserved domain database (CDD; Marchler‐Bauer et al., 2007). Search for conserved motifs in your protein sequences.
http://www.ncbi.nlm.nih.gov/
National Center for Biotechnology Information (NCBI) homepage. Gateway to numerous useful resources, such as GenBank, BLAST, and Entrez, among others.
http://www.ncbi.nlm.nih.gov/Sequin/QuickGuide/sequin.htm
Sequin quick guide. Get detailed information on how to use Sequin.