1. How many hits will you get if you search for genes associated with colon cancer in the human genome?
On the NCBI homepage there is a box titled "Draft Human Genome" with a link named "human genome resource". Follow that link. At the top of the next page, type "colon cancer" into the query field and set the search box to "GenBank". The search returns 2384 hits.
2. How many loci will you find if you search LocusLink for human in GenBank?
On the NCBI homepage there is a box titled "Draft Human Genome" with a link named "human genome resource". Follow that link. At the top of the next page, type "colon cancer" into the query field and leave the search box set to "LocusLink". The search returns 21 loci, together with information about each locus such as its symbol and location.
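The two web searches above can also be expressed as query URLs. The following is a minimal sketch, assuming NCBI's E-utilities ESearch endpoint, the database name "nuccore" for GenBank nucleotide records, and "gene" for the Gene database that succeeded LocusLink; none of these appear in the original walkthrough. It only builds the URLs and does not send them, and the counts returned today would not be expected to match 2384 or 21.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term):
    """Build (but do not send) an ESearch URL for the given database and query."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


# Assumption: restricting to human with the [Organism] field tag mirrors
# what the "human genome resource" page did implicitly.
query = "colon cancer AND Homo sapiens[Organism]"
genbank_url = esearch_url("nuccore", query)  # question 1: GenBank hits
locus_url = esearch_url("gene", query)       # question 2: loci
print(genbank_url)
print(locus_url)
```

The hit count is reported in the `Count` element of the XML that ESearch returns.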
3. Give the locus ID and position of MLH1.
In the result list from the previous question, MLH1 is the fourth entry, with LocusID 4292. Its position is 3p21.3.
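A position such as 3p21.3 packs three pieces of information: the chromosome, the arm (p for short, q for long), and the band. As an illustration only, a small hypothetical helper (`parse_cytoband` is not part of any NCBI tool) can split a single-band position into those parts. It deliberately does not handle ranges such as 2p22-p21.

```python
import re


def parse_cytoband(location):
    """Split a single-band cytogenetic position like '3p21.3' into its parts."""
    match = re.fullmatch(r"(\d{1,2}|X|Y)([pq])(\d+(?:\.\d+)?)", location)
    if match is None:
        raise ValueError(f"not a single-band position: {location!r}")
    chromosome, arm, band = match.groups()
    return {
        "chromosome": chromosome,
        "arm": "short" if arm == "p" else "long",
        "band": band,
    }


print(parse_cytoband("3p21.3"))
# {'chromosome': '3', 'arm': 'short', 'band': '21.3'}
```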
4. Find the %ID of the nucleotide sequence for its possible orthologs in mouse.
In the result list from the previous question, click "4292" to open the LocusLink page for MLH1. Several colored boxes appear below the mRNA Genomic Alignment. Click the blue-green "Homol" box to enter the site that lists homologies to the corresponding UniGene clusters. The first line shows:
Hs.57301
MLH1 mutL (E. coli) homolog 1 (colon cancer, nonpolyposis type 2)
Click the "more" box at the end of that line to see the information about the orthologs of the MLH1 gene. Among the calculated orthologs, the answer is 89.3. Check the gene description carefully: the entry
Mm.196006
AF250844
LocusLink 17350 Mlh1
is really what we need instead of the one
Mm.4438
AK010617
LocusLink 15361 Hmga1
giving the answer 90.5.
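For reference, a %ID figure like the ones above is, in its simplest form, the share of aligned positions at which two sequences carry the same base. The sketch below shows that calculation under the assumption that the two sequences are already aligned to equal length with "-" marking gaps. The exact definition used by the NCBI homology page is not stated in the original, and the two short sequences are made-up toy data, not MLH1 or Mlh1.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free aligned columns where both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy data: 10 aligned bases with one mismatch.
print(percent_identity("ATGTCGTTCG", "ATGTCTTTCG"))  # 90.0
```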
5. Find the total number of mutations of MLH1 reported in the Human Gene Mutation Database.
Continuing from the previous question, we are on the LocusLink page for MLH1. Click the green "HGMD" box to see the number of mutations of each type. The total is 147 mutations.
6. Give the DNA sequence of MLH1.
On the LocusLink page from question 4 there is an "NCBI Reference Sequence" section. Click the GenBank Source: U07343.
On the record page, change the display mode from "GenBank" to "FASTA" and click Display. The DNA sequence is shown below:
Human DNA mismatch repair protein homolog (hMLH1) mRNA, complete cds
gi|463988|gb|U07343.1|HSU07343[463988]
>gi|463988|gb|U07343.1|HSU07343 Human DNA mismatch repair protein homolog (hMLH1) mRNA, complete cds
CTTGGCTCTTCTGGCGCCAAAATGTCGTTCGTGGCAGGGGTTATTCGGCGGCTGGACGAGACAGTGGTGAACCGCATCGC
GGCGGGGGAAGTTATCCAGCGGCCAGCTAATGCTATCAAAGAGATGATTGAGAACTGTTTAGATGCAAAATCCACAAGTA
TTCAAGTGATTGTTAAAGAGGGAGGCCTGAAGTTGATTCAGATCCAAGACAATGGCACCGGGATCAGGAAAGAAGATCTG
GATATTGTATGTGAAAGGTTCACTACTAGTAAACTGCAGTCCTTTGAGGATTTAGCCAGTATTTCTACCTATGGCTTTCG
AGGTGAGGCTTTGGCCAGCATAAGCCATGTGGCTCATGTTACTATTACAACGAAAACAGCTGATGGAAAGTGTGCATACA
GAGCAAGTTACTCAGATGGAAAACTGAAAGCCCCTCCTAAACCATGTGCTGGCAATCAAGGGACCCAGATCACGGTGGAG
GACCTTTTTTACAACATAGCCACGAGGAGAAAAGCTTTAAAAAATCCAAGTGAAGAATATGGGAAAATTTTGGAAGTTGT
TGGCAGGTATTCAGTACACAATGCAGGCATTAGTTTCTCAGTTAAAAAACAAGGAGAGACAGTAGCTGATGTTAGGACAC
TACCCAATGCCTCAACCGTGGACAATATTCGCTCCATCTTTGGAAATGCTGTTAGTCGAGAACTGATAGAAATTGGATGT
GAGGATAAAACCCTAGCCTTCAAAATGAATGGTTACATATCCAATGCAAACTACTCAGTGAAGAAGTGCATCTTCTTACT
CTTCATCAACCATCGTCTGGTAGAATCAACTTCCTTGAGAAAAGCCATAGAAACAGTGTATGCAGCCTATTTGCCCAAAA
ACACACACCCATTCCTGTACCTCAGTTTAGAAATCAGTCCCCAGAATGTGGATGTTAATGTGCACCCCACAAAGCATGAA
GTTCACTTCCTGCACGAGGAGAGCATCCTGGAGCGGGTGCAGCAGCACATCGAGAGCAAGCTCCTGGGCTCCAATTCCTC
CAGGATGTACTTCACCCAGACTTTGCTACCAGGACTTGCTGGCCCCTCTGGGGAGATGGTTAAATCCACAACAAGTCTGA
CCTCGTCTTCTACTTCTGGAAGTAGTGATAAGGTCTATGCCCACCAGATGGTTCGTACAGATTCCCGGGAACAGAAGCTT
GATGCATTTCTGCAGCCTCTGAGCAAACCCCTGTCCAGTCAGCCCCAGGCCATTGTCACAGAGGATAAGACAGATATTTC
TAGTGGCAGGGCTAGGCAGCAAGATGAGGAGATGCTTGAACTCCCAGCCCCTGCTGAAGTGGCTGCCAAAAATCAGAGCT
TGGAGGGGGATACAACAAAGGGGACTTCAGAAATGTCAGAGAAGAGAGGACCTACTTCCAGCAACCCCAGAAAGAGACAT
CGGGAAGATTCTGATGTGGAAATGGTGGAAGATGATTCCCGAAAGGAAATGACTGCAGCTTGTACCCCCCGGAGAAGGAT
CATTAACCTCACTAGTGTTTTGAGTCTCCAGGAAGAAATTAATGAGCAGGGACATGAGGTTCTCCGGGAGATGTTGCATA
ACCACTCCTTCGTGGGCTGTGTGAATCCTCAGTGGGCCTTGGCACAGCATCAAACCAAGTTATACCTTCTCAACACCACC
AAGCTTAGTGAAGAACTGTTCTACCAGATACTCATTTATGATTTTGCCAATTTTGGTGTTCTCAGGTTATCGGAGCCAGC
ACCGCTCTTTGACCTTGCCATGCTTGCCTTAGATAGTCCAGAGAGTGGCTGGACAGAGGAAGATGGTCCCAAAGAAGGAC
TTGCTGAATACATTGTTGAGTTTCTGAAGAAGAAGGCTGAGATGCTTGCAGACTATTTCTCTTTGGAAATTGATGAGGAA
GGGAACCTGATTGGATTACCCCTTCTGATTGACAACTATGTGCCCCCTTTGGAGGGACTGCCTATCTTCATTCTTCGACT
AGCCACTGAGGTGAATTGGGACGAAGAAAAGGAATGTTTTGAAAGCCTCAGTAAAGAATGCGCTATGTTCTATTCCATCC
GGAAGCAGTACATATCTGAGGAGTCGACCCTCTCAGGCCAGCAGAGTGAAGTGCCTGGCTCCATTCCAAACTCCTGGAAG
TGGACTGTGGAACACATTGTCTATAAAGCCTTGCGCTCACACATTCTGCCTCCTAAACATTTCACAGAAGATGGAAATAT
CCTGCAGCTTGCTAACCTGCCTGATCTATACAAAGTCTTTGAGAGGTGTTAAATATGGTTATTTATGCACTGTGGGATGT
GTTCTTCTTTCTCTGTATTCCGATACAAAGTGTTGTATCAAAGTGTGATATACAAAGTGTACCAACATAAGTGTTGGTAG
CACTTAAGACTTATACTTGCCTTCTGATAGTATTCCTTTATACACAGTGGATTGATTATAAATAAATAGATGTGTCTTAA
CATA
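A FASTA record like the one above consists of a header line starting with ">" followed by the sequence split over several lines. The following is a minimal sketch of reading such a record, using a hypothetical helper `parse_fasta` written for this illustration. It assumes the header sits on a single line, so a header wrapped onto a second line would be misread as sequence. Only the header and the first sequence line of the record are included to keep the example short.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Header and first sequence line copied from the record above.
example = (
    ">gi|463988|gb|U07343.1|HSU07343 Human DNA mismatch repair protein "
    "homolog (hMLH1) mRNA, complete cds\n"
    "CTTGGCTCTTCTGGCGCCAAAATGTCGTTCGTGGCAGGGGTTATTCGGCGGCTGGACGAGACAGTGGTGAACCGCATCGC\n"
)
header, sequence = parse_fasta(example)[0]
accession = header.split("|")[3]  # fourth "|"-separated field of the header
print(accession, len(sequence))
```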
7. Give the DNA sequence of the E. coli mismatch repair gene mutS.
Going back to the result list from question 2, locus No. 5 is:
4436 Hs MSH2 mutS (E. coli) homolog 2 (colon cancer, nonpolyposis type 1) 2p22-p21
Click it and follow the procedure from question 6. The record reached this way is the mRNA of the human mutS homolog, hMSH2:
Human (hMSH2) mRNA, complete cds
gi|432997|gb|U04045.1|HSU04045[432997]
>gi|432997|gb|U04045.1|HSU04045 Human (hMSH2) mRNA, complete cds
GGCGGGAAACAGCTTAGTGGGTGTGGGGTCGCGCATTTTCTTCAACCAGGAGGTGAGGAGGTTTCGACATGGCGGTGCAG
CCGAAGGAGACGCTGCAGTTGGAGAGCGCGGCCGAGGTCGGCTTCGTGCGCTTCTTTCAGGGCATGCCGGAGAAGCCGAC
CACCACAGTGCGCCTTTTCGACCGGGGCGACTTCTATACGGCGCACGGCGAGGACGCGCTGCTGGCCGCCCGGGAGGTGT
TCAAGACCCAGGGGGTGATCAAGTACATGGGGCCGGCAGGAGCAAAGAATCTGCAGAGTGTTGTGCTTAGTAAAATGAAT
TTTGAATCTTTTGTAAAAGATCTTCTTCTGGTTCGTCAGTATAGAGTTGAAGTTTATAAGAATAGAGCTGGAAATAAGGC
ATCCAAGGAGAATGATTGGTATTTGGCATATAAGGCTTCTCCTGGCAATCTCTCTCAGTTTGAAGACATTCTCTTTGGTA
ACAATGATATGTCAGCTTCCATTGGTGTTGTGGGTGTTAAAATGTCCGCAGTTGATGGCCAGAGACAGGTTGGAGTTGGG
TATGTGGATTCCATACAGAGGAAACTAGGACTGTGTGAATTCCCTGATAATGATCAGTTCTCCAATCTTGAGGCTCTCCT
CATCCAGATTGGACCAAAGGAATGTGTTTTACCCGGAGGAGAGACTGCTGGAGACATGGGGAAACTGAGACAGATAATTC
AAAGAGGAGGAATTCTGATCACAGAAAGAAAAAAAGCTGACTTTTCCACAAAAGACATTTATCAGGACCTCAACCGGTTG
TTGAAAGGCAAAAAGGGAGAGCAGATGAATAGTGCTGTATTGCCAGAAATGGAGAATCAGGTTGCAGTTTCATCACTGTC
TGCGGTAATCAAGTTTTTAGAACTCTTATCAGATGATTCCAACTTTGGACAGTTTGAACTGACTACTTTTGACTTCAGCC
AGTATATGAAATTGGATATTGCAGCAGTCAGAGCCCTTAACCTTTTTCAGGGTTCTGTTGAAGATACCACTGGCTCTCAG
TCTCTGGCTGCCTTGCTGAATAAGTGTAAAACCCCTCAAGGACAAAGACTTGTTAACCAGTGGATTAAGCAGCCTCTCAT
GGATAAGAACAGAATAGAGGAGAGATTGAATTTAGTGGAAGCTTTTGTAGAAGATGCAGAATTGAGGCAGACTTTACAAG
AAGATTTACTTCGTCGATTCCCAGATCTTAACCGACTTGCCAAGAAGTTTCAAAGACAAGCAGCAAACTTACAAGATTGT
TACCGACTCTATCAGGGTATAAATCAACTACCTAATGTTATACAGGCTCTGGAAAAACATGAAGGAAAACACCAGAAATT
ATTGTTGGCAGTTTTTGTGACTCCTCTTACTGATCTTCGTTCTGACTTCTCCAAGTTTCAGGAAATGATAGAAACAACTT
TAGATATGGATCAGGTGGAAAACCATGAATTCCTTGTAAAACCTTCATTTGATCCTAATCTCAGTGAATTAAGAGAAATA
ATGAATGACTTGGAAAAGAAGATGCAGTCAACATTAATAAGTGCAGCCAGAGATCTTGGCTTGGACCCTGGCAAACAGAT
TAAACTGGATTCCAGTGCACAGTTTGGATATTACTTTCGTGTAACCTGTAAGGAAGAAAAAGTCCTTCGTAACAATAAAA
ACTTTAGTACTGTAGATATCCAGAAGAATGGTGTTAAATTTACCAACAGCAAATTGACTTCTTTAAATGAAGAGTATACC
AAAAATAAAACAGAATATGAAGAAGCCCAGGATGCCATTGTTAAAGAAATTGTCAATATTTCTTCAGGCTATGTAGAACC
AATGCAGACACTCAATGATGTGTTAGCTCAGCTAGATGCTGTTGTCAGCTTTGCTCACGTGTCAAATGGAGCACCTGTTC
CATATGTACGACCAGCCATTTTGGAGAAAGGACAAGGAAGAATTATATTAAAAGCATCCAGGCATGCTTGTGTTGAAGTT
CAAGATGAAATTGCATTTATTCCTAATGACGTATACTTTGAAAAAGATAAACAGATGTTCCACATCATTACTGGCCCCAA
TATGGGAGGTAAATCAACATATATTCGACAAACTGGGGTGATAGTACTCATGGCCCAAATTGGGTGTTTTGTGCCATGTG
AGTCAGCAGAAGTGTCCATTGTGGACTGCATCTTAGCCCGAGTAGGGGCTGGTGACAGTCAATTGAAAGGAGTCTCCACG
TTCATGGCTGAAATGTTGGAAACTGCTTCTATCCTCAGGTCTGCAACCAAAGATTCATTAATAATCATAGATGAATTGGG
AAGAGGAACTTCTACCTACGATGGATTTGGGTTAGCATGGGCTATATCAGAATACATTGCAACAAAGATTGGTGCTTTTT
GCATGTTTGCAACCCATTTTCATGAACTTACTGCCTTGGCCAATCAGATACCAACTGTTAATAATCTACATGTCACAGCA
CTCACCACTGAAGAGACCTTAACTATGCTTTATCAGGTGAAGAAAGGTGTCTGTGATCAAAGTTTTGGGATTCATGTTGC
AGAGCTTGCTAATTTCCCTAAGCATGTAATAGAGTGTGCTAAACAGAAAGCCCTGGAACTTGAGGAGTTTCAGTATATTG
GAGAATCGCAAGGATATGATATCATGGAACCAGCAGCAAAGAAGTGCTATCTGGAAAGAGAGCAAGGTGAAAAAATTATT
CAGGAGTTCCTGTCCAAGGTGAAACAAATGCCCTTTACTGAAATGTCAGAAGAAAACATCACAATAAAGTTAAAACAGCT
AAAAGCTGAAGTAATAGCAAAGAATAATAGCTTTGTAAATGAAATCATTTCACGAATAAAAGTTACTACGTGAAAAATCC
CAGTAATGGAATGAAGGTAATATTGATAAGCTATTGTCTGTAATAGTTTTATATTGTTTTATATTAA
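The "switch to FASTA and display" step used in questions 6 and 7 can also be written as a direct request for each accession. This is a sketch only, assuming NCBI's E-utilities EFetch endpoint and the "nuccore" database name, neither of which is part of the original walkthrough. It builds the URLs without sending them.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities EFetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build (but do not send) an EFetch URL returning a record as FASTA text."""
    params = {
        "db": "nuccore",      # assumed database for GenBank nucleotide records
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


# Accessions taken from the answers to questions 6 and 7.
for accession in ("U07343", "U04045"):
    print(efetch_fasta_url(accession))
```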