You are designing algorithms for the bioinformatic prediction of gene sequences. How might algorithms differ for predicting genes in bacterial versus eukaryotic genomic sequence?
15. Genomes and Genomics: Bioinformatics, Problem 15
In the course of the Drosophila melanogaster genome project, the following genomic DNA sequences were obtained. Try to assemble the sequences into a single contig.
5' TTCCAGAACCGGCGAATGAAGCTGAAGAAG 3'
5' GAGCGGCAGATCAAGATCTGGTTCCAGAAC 3'
5' TGATCTGCCGCTCCGTCAGGCATAGCGCGT 3'
5' GGAGAATCGAGATGGCGCACGCGCTATGCC 3'
5' GGAGAATCGAGATGGCGCACGCGCTATGCC 3'
5' CCATCTCGATTCTCCGTCTGCGGGTCAGAT 3'
Go to the URL provided in Problem 14, and using the sequence you have just assembled, perform a blastn search in the 'Nucleotide collection (nr/nt)' database. Does the search produce sequences similar to your assembled sequence, and if so, what are they? Can you tell if your sequence is transcribed, and if it represents protein-coding sequence? Perform a tblastx search, first choosing the 'Nucleotide collection (nr/nt)' database and then limiting the search to human sequences by typing Homo sapiens in the organism box. Are homologous sequences found in the human genome? Annotate the assembled sequence.

Solution
Begin by examining each DNA sequence, noting that all of them are written 5' to 3'. Because these are genomic fragments, the goal is to find overlapping regions between them and use those overlaps to assemble the fragments into one continuous stretch of sequence, called a contig.
Compare the 3' end of each sequence with the 5' end of every other sequence. A substring at the 3' end of one fragment that exactly matches the 5' start of another indicates that the two fragments are adjacent in the genome. Genomic fragments can come from either strand of the DNA, so also compare each fragment with the reverse complement of the others: in this set, the third and sixth sequences overlap the rest only after they are reverse-complemented (or, equivalently, after the others are). The fourth and fifth sequences are identical, so one of them adds no new information.
Once you have identified an overlap, merge the two fragments by aligning the overlapping nucleotides, which must match exactly. Repeat this step, joining fragments one at a time, until all of them form a single contig.
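The overlap-and-merge steps above can be sketched as a small greedy assembler in Python. This is a minimal sketch under stated assumptions, not how production assemblers work: the function names (`reverse_complement`, `overlap_len`, `greedy_assemble`) and the 8-nt minimum overlap are hypothetical choices for illustration, reads are assumed to contain only A, C, G and T, overlaps must match exactly, and reads lying entirely inside another read are not detected.

```python
"""Greedy overlap assembly of DNA reads into contigs (teaching sketch)."""

# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_len(left: str, right: str, min_len: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`.

    Returns 0 if no such overlap is at least `min_len` (treated as >= 1) long.
    """
    for k in range(min(len(left), len(right)), max(min_len, 1) - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 8) -> list[str]:
    """Merge the pair of contigs with the longest exact overlap until none remain.

    Both orientations of every contig are tried, because genomic fragments
    can come from either strand. Contained reads are not handled (simplification).
    """
    # Drop exact duplicates, including reads identical to another's reverse complement.
    contigs: list[str] = []
    for read in reads:
        if read not in contigs and reverse_complement(read) not in contigs:
            contigs.append(read)

    while len(contigs) > 1:
        best_k, best_i, best_j, best_merge = 0, -1, -1, ""
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                # Try all four strand combinations of the pair.
                for left in (a, reverse_complement(a)):
                    for right in (b, reverse_complement(b)):
                        k = overlap_len(left, right, min_overlap)
                        if k > best_k:
                            best_k, best_i, best_j = k, i, j
                            best_merge = left + right[k:]
        if best_k == 0:
            break  # no overlaps left; return the separate contigs
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(best_merge)
    return contigs


if __name__ == "__main__":
    reads = [
        "TTCCAGAACCGGCGAATGAAGCTGAAGAAG",
        "GAGCGGCAGATCAAGATCTGGTTCCAGAAC",
        "TGATCTGCCGCTCCGTCAGGCATAGCGCGT",
        "GGAGAATCGAGATGGCGCACGCGCTATGCC",
        "GGAGAATCGAGATGGCGCACGCGCTATGCC",
        "CCATCTCGATTCTCCGTCTGCGGGTCAGAT",
    ]
    for contig in greedy_assemble(reads):
        print(len(contig), contig)
```

Merging the longest overlap first works for a handful of clean reads like these; real genome projects must also cope with sequencing errors and repeated sequences, which is why genome assemblers use more elaborate methods.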
After assembling the contig, use it as the query in a blastn search against the 'Nucleotide collection (nr/nt)' database. The similar sequences it returns show whether the contig corresponds to known genomic regions, genes, or other elements; hits to mRNA or cDNA sequences are evidence that the region is transcribed.
Next, perform a tblastx search with the contig, first against the entire nr/nt database and then restricted to Homo sapiens sequences. tblastx translates the query in all six reading frames and compares those translations with the six-frame translations of the nucleotide sequences in the database, so it can detect similarity at the protein level even where the DNA sequences have diverged. A match conserved at the amino-acid level in one reading frame suggests a protein-coding region, and hits in the human-restricted search indicate homologous sequences in the human genome. Use these results to annotate the assembled sequence, stating whether it is transcribed and whether it is likely to encode a protein.
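The six-frame translation that tblastx applies to a nucleotide query (the search itself runs on NCBI's servers) can be illustrated with a short sketch. It assumes the standard genetic code, writes stop codons as `*`, marks codons containing anything other than A, C, G or T as `X`, and labels frames +1 to +3 and -1 to -3 by their offset on each strand, which is one common convention; the function names are hypothetical.

```python
"""Six-frame translation of a DNA sequence (standard genetic code)."""

from itertools import product

BASES = "TCAG"
# Amino acids for codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon, dropping a trailing partial codon; unknown codons -> 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq: str) -> dict[str, str]:
    """Translate both strands starting at offsets 0, 1 and 2."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCC").items():
        print(frame, protein)
```

Applied to the assembled contig, a frame whose translation contains no `*` is a candidate open reading frame worth comparing with the tblastx hits.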

Key Concepts
Here are the essential concepts you must grasp in order to answer the question correctly.
Sequence Assembly and Contig Formation
Sequence assembly involves aligning and merging overlapping DNA fragments to reconstruct the original sequence, forming a continuous stretch called a contig. Knowing how to identify overlaps, and how to order and orient the fragments correctly (some may come from the opposite strand), is essential for genome projects and for accurate downstream analysis.
BLAST Searches and Sequence Similarity
BLAST (Basic Local Alignment Search Tool) compares a query sequence against a database to find regions of local similarity, which can indicate homology or a functional relationship. Different BLAST programs serve different purposes: blastn compares a nucleotide query with a nucleotide database, whereas tblastx compares the six-frame translations of a nucleotide query with the six-frame translations of a nucleotide database. Together they help identify related sequences and infer function.
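BLAST finds local alignments with a fast seed-and-extend heuristic rather than an exhaustive search. To show what a "region of local similarity" means, the sketch below swaps in the exhaustive method that BLAST approximates, Smith-Waterman local alignment with a linear gap penalty. The function name and scoring values (match +2, mismatch -1, gap -2) are illustrative assumptions, not BLAST's defaults.

```python
"""Smith-Waterman local alignment with a linear gap penalty (teaching sketch)."""


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Return (best score, aligned segment of a, aligned segment of b)."""

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,  # start a new alignment here
                H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match or mismatch
                H[i - 1][j] + gap,  # a[i-1] aligned to a gap in b
                H[i][j - 1] + gap,  # b[j-1] aligned to a gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    aligned_a: list[str] = []
    aligned_b: list[str] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


if __name__ == "__main__":
    print(smith_waterman("TTTACGTAAA", "GGACGTGG"))
```

The alignment is local: only the shared core of the two sequences is reported, and the unrelated flanks are ignored rather than penalized.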
Gene Annotation and Functional Inference
Gene annotation involves identifying features like coding regions, transcriptional activity, and homologous genes within a sequence. By analyzing BLAST results and sequence characteristics, one can determine if a sequence is transcribed, protein-coding, and conserved across species, providing insights into its biological role.
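One simple first step in identifying coding regions is scanning both strands for open reading frames (ORFs): stretches that begin with ATG and run to an in-frame stop codon. This is a minimal sketch with a hypothetical `find_orfs` function and an arbitrary `min_codons` threshold. It ignores ORFs that run off the end of the sequence and does not model introns, which interrupt the coding sequence of many eukaryotic genes, so on eukaryotic genomic DNA it can miss or truncate coding regions.

```python
"""Open reading frame (ORF) scan on both strands (teaching sketch)."""

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int, str]]:
    """Return (strand, start, end, orf) for ATG...stop ORFs.

    `start`/`end` are 0-based, end-exclusive positions on that strand's
    sequence (the reverse complement for '-'); `end` includes the stop codon.
    `min_codons` counts codons from ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the current open ATG, if any
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos
                elif start is not None and codon in STOP_CODONS:
                    if (pos - start) // 3 >= min_codons:
                        orfs.append((strand, start, pos + 3, s[start:pos + 3]))
                    start = None  # close the ORF and look for the next ATG
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC", min_codons=3))
```

A short genomic fragment may lie entirely inside a coding region and so contain neither the start nor the stop codon; in that case a reading frame free of stop codons, together with BLAST evidence, is more informative than an ORF scan.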