1. Compare the MLH1 (answer to assignment 2.6) and mutS (answer to 2.7) sequences.

No significant similarity was found.


2. Translate the above two gene sequences to protein sequences.

MLH1---

22 atgtcgttcgtggcaggggttattcggcggctggacgagacagtg
M S F V A G V I R R L D E T V
67 gtgaaccgcatcgcggcgggggaagttatccagcggccagctaat
V N R I A A G E V I Q R P A N
112 gctatcaaagagatgattgagaactgtttagatgcaaaatccaca
A I K E M I E N C L D A K S T
157 agtattcaagtgattgttaaagagggaggcctgaagttgattcag
S I Q V I V K E G G L K L I Q
202 atccaagacaatggcaccgggatcaggaaagaagatctggatatt
I Q D N G T G I R K E D L D I
247 gtatgtgaaaggttcactactagtaaactgcagtcctttgaggat
V C E R F T T S K L Q S F E D
292 ttagccagtatttctacctatggctttcgaggtgaggctttggcc
L A S I S T Y G F R G E A L A
337 agcataagccatgtggctcatgttactattacaacgaaaacagct
S I S H V A H V T I T T K T A
382 gatggaaagtgtgcatacagagcaagttactcagatggaaaactg
D G K C A Y R A S Y S D G K L
427 aaagcccctcctaaaccatgtgctggcaatcaagggacccagatc
K A P P K P C A G N Q G T Q I
472 acggtggaggaccttttttacaacatagccacgaggagaaaagct
T V E D L F Y N I A T R R K A
517 ttaaaaaatccaagtgaagaatatgggaaaattttggaagttgtt
L K N P S E E Y G K I L E V V
562 ggcaggtattcagtacacaatgcaggcattagtttctcagttaaa
G R Y S V H N A G I S F S V K
607 aaacaaggagagacagtagctgatgttaggacactacccaatgcc
K Q G E T V A D V R T L P N A
652 tcaaccgtggacaatattcgctccatctttggaaatgctgttagt
S T V D N I R S I F G N A V S
697 cgagaactgatagaaattggatgtgaggataaaaccctagccttc
R E L I E I G C E D K T L A F
742 aaaatgaatggttacatatccaatgcaaactactcagtgaagaag
K M N G Y I S N A N Y S V K K
787 tgcatcttcttactcttcatcaaccatcgtctggtagaatcaact
C I F L L F I N H R L V E S T
832 tccttgagaaaagccatagaaacagtgtatgcagcctatttgccc
S L R K A I E T V Y A A Y L P
877 aaaaacacacacccattcctgtacctcagtttagaaatcagtccc
K N T H P F L Y L S L E I S P
922 cagaatgtggatgttaatgtgcaccccacaaagcatgaagttcac
Q N V D V N V H P T K H E V H
967 ttcctgcacgaggagagcatcctggagcgggtgcagcagcacatc
F L H E E S I L E R V Q Q H I
1012 gagagcaagctcctgggctccaattcctccaggatgtacttcacc
E S K L L G S N S S R M Y F T
1057 cagactttgctaccaggacttgctggcccctctggggagatggtt
Q T L L P G L A G P S G E M V
1102 aaatccacaacaagtctgacctcgtcttctacttctggaagtagt
K S T T S L T S S S T S G S S
1147 gataaggtctatgcccaccagatggttcgtacagattcccgggaa
D K V Y A H Q M V R T D S R E
1192 cagaagcttgatgcatttctgcagcctctgagcaaacccctgtcc
Q K L D A F L Q P L S K P L S
1237 agtcagccccaggccattgtcacagaggataagacagatatttct
S Q P Q A I V T E D K T D I S
1282 agtggcagggctaggcagcaagatgaggagatgcttgaactccca
S G R A R Q Q D E E M L E L P
1327 gcccctgctgaagtggctgccaaaaatcagagcttggagggggat
A P A E V A A K N Q S L E G D
1372 acaacaaaggggacttcagaaatgtcagagaagagaggacctact
T T K G T S E M S E K R G P T
1417 tccagcaaccccagaaagagacatcgggaagattctgatgtggaa
S S N P R K R H R E D S D V E
1462 atggtggaagatgattcccgaaaggaaatgactgcagcttgtacc
M V E D D S R K E M T A A C T
1507 ccccggagaaggatcattaacctcactagtgttttgagtctccag
P R R R I I N L T S V L S L Q
1552 gaagaaattaatgagcagggacatgaggttctccgggagatgttg
E E I N E Q G H E V L R E M L
1597 cataaccactccttcgtgggctgtgtgaatcctcagtgggccttg
H N H S F V G C V N P Q W A L
1642 gcacagcatcaaaccaagttataccttctcaacaccaccaagctt
A Q H Q T K L Y L L N T T K L
1687 agtgaagaactgttctaccagatactcatttatgattttgccaat
S E E L F Y Q I L I Y D F A N
1732 tttggtgttctcaggttatcggagccagcaccgctctttgacctt
F G V L R L S E P A P L F D L
1777 gccatgcttgccttagatagtccagagagtggctggacagaggaa
A M L A L D S P E S G W T E E
1822 gatggtcccaaagaaggacttgctgaatacattgttgagtttctg
D G P K E G L A E Y I V E F L
1867 aagaagaaggctgagatgcttgcagactatttctctttggaaatt
K K K A E M L A D Y F S L E I
1912 gatgaggaagggaacctgattggattaccccttctgattgacaac
D E E G N L I G L P L L I D N
1957 tatgtgccccctttggagggactgcctatcttcattcttcgacta
Y V P P L E G L P I F I L R L
2002 gccactgaggtgaattgggacgaagaaaaggaatgttttgaaagc
A T E V N W D E E K E C F E S
2047 ctcagtaaagaatgcgctatgttctattccatccggaagcagtac
L S K E C A M F Y S I R K Q Y
2092 atatctgaggagtcgaccctctcaggccagcagagtgaagtgcct
I S E E S T L S G Q Q S E V P
2137 ggctccattccaaactcctggaagtggactgtggaacacattgtc
G S I P N S W K W T V E H I V
2182 tataaagccttgcgctcacacattctgcctcctaaacatttcaca
Y K A L R S H I L P P K H F T
2227 gaagatggaaatatcctgcagcttgctaacctgcctgatctatac
E D G N I L Q L A N L P D L Y
2272 aaagtctttgagaggtgttaa 2292
K V F E R C *

mutS---

88 atgagtgcaatagaaaatttcgacgcccatacgcccatgatgcag
M S A I E N F D A H T P M M Q
133 cagtatctcaagctgaaagcccagcatcccgagatcctgctgttt
Q Y L K L K A Q H P E I L L F
178 taccggatgggtgatttttatgaactgttttatgacgacgcaaaa
Y R M G D F Y E L F Y D D A K
223 cgcgcgtcgcaactgctggatatttcactgaccaaacgcagtgct
R A S Q L L D I S L T K R S A
268 tcggcgggagagccgatcccgatggcggggattccctaccatgcg
S A G E P I P M A G I P Y H A
313 gtggaaaactacctcgccaaactggtgaatcagggcgagtccgtt
V E N Y L A K L V N Q G E S V
358 gccatctgcgaacaaattggcgatccggcgaccagcaaaggtccg
A I C E Q I G D P A T S K G P
403 gttgagcgcaaagttgtgcgtatcgttacgccaggcaccatcagc
V E R K V V R I V T P G T I S
448 gatgaagccctgttgcaggagcgtcaggacaacctgctggcggct
D E A L L Q E R Q D N L L A A
493 atctggcaggacagcaaaggtttcgcctacgcgacgctggatatc
I W Q D S K G F A Y A T L D I
538 agttccggtcgttttcgcctgagcgaaccggctgaccgcgaaacg
S S G R F R L S E P A D R E T
583 atggcggcagaactgcaacgcactaatcctgcggaactgctgtat
M A A E L Q R T N P A E L L Y
628 gcagaagattttgctgaaatgtcgttaattgaaggccgtcgcggc
A E D F A E M S L I E G R R G
673 ctgcgccgtcgcccgctgtgggagtttgaaatcgacaccgcgcgc
L R R R P L W E F E I D T A R
718 cagcagttgaatctgcaatttgggacccgcgatctggtcggtttt
Q Q L N L Q F G T R D L V G F
763 ggcgtcgagaacgcgccgcgcggactttgtgctgccggttgtctg
G V E N A P R G L C A A G C L
808 ttgcagtatgcgaaagatacccaacgtacgactctgccgcatatt
L Q Y A K D T Q R T T L P H I
853 cgttccatcaccatggaacgtgagcaggacagcatcattatggat
R S I T M E R E Q D S I I M D
898 gccgcgacgcgtcgtaatctggaaatcacccagaacctggcgggt
A A T R R N L E I T Q N L A G
943 ggtgcggaaaatacgctggcttctgtgctcgactgcaccgtcacg
G A E N T L A S V L D C T V T
988 ccgatgggcagccgtatgctgaaacgctggctgcatatgccagtg
P M G S R M L K R W L H M P V
1033 cgccatacccgcgtgttgcttgagcgccagcaaactattggcgca
R H T R V L L E R Q Q T I G A
1078 ttgcaggatttcaccgccgagttgcagccggtactacgtcaggtc
L Q D F T A E L Q P V L R Q V
1123 ggcgacctggaacgtattctggcgcgtctggctttacgaaccgct
G D L E R I L A R L A L R T A
1168 cgcccacgcgatctggcccgtatgcgtcacgctttccagcaactg
R P R D L A R M R H A F Q Q L
1213 ccggagctgcgtgcgcagttagaaactgtcgatagtgcaccggta
P E L R A Q L E T V D S A P V
1258 caggcgctacgtgagaagatgggcgagtttgccgagctgcgcgat
Q A L R E K M G E F A E L R D
1303 ctgctggagcgagcaatcatcgacacaccgccggtgctggtacgc
L L E R A I I D T P P V L V R
1348 gacggtggtgttatcgcatcaggctataacgaagagctggatgag
D G G V I A S G Y N E E L D E
1393 tggcgcgcgctggctgacggcgcgaccgattatctggagcgtctg
W R A L A D G A T D Y L E R L
1438 gaagtccgcgagcgtgaacgtaccggcctggacacgctgaaagtt
E V R E R E R T G L D T L K V
1483 ggctttaatgcggtgcacggctactacattcaaatcagccgtggg
G F N A V H G Y Y I Q I S R G
1528 caaagccatctggcacctatcaactatatgcgtcgccagacgctg
Q S H L A P I N Y M R R Q T L
1573 aaaaacgccgagcgctacatcattccagagctaaaagagtacgaa
K N A E R Y I I P E L K E Y E
1618 gataaagtcctcacttcaaaaggcaaagcactggctctggaaaaa
D K V L T S K G K A L A L E K
1663 cagctttatgaagagctgttcgacctgctgttgccgcatctggaa
Q L Y E E L F D L L L P H L E
1708 gcgttgcaacagagcgcgagcgcgctggcggaactcgacgtgctg
A L Q Q S A S A L A E L D V L
1753 gtgaacctggcggaacgggcctataccctgaactacacctgcccg
V N L A E R A Y T L N Y T C P
1798 accttcattgataaaccgggcattcgcattaccgaaggccgccat
T F I D K P G I R I T E G R H
1843 ccggtggttgaacaggtgctgaacgagccatttatcgccaacccg
P V V E Q V L N E P F I A N P
1888 ctgaatctgtcaccgcagcgccggatgttgattatcaccggtccg
L N L S P Q R R M L I I T G P
1933 aacatgggcggtaaaagtacctatatgcgccagaccgcactgatt
N M G G K S T Y M R Q T A L I
1978 gcgctgatggcctacatcggcagctacgtaccggcgcaaaaagtc
A L M A Y I G S Y V P A Q K V
2023 gagattggcccgattgaccgcatctttacccgcgtaggcgcggca
E I G P I D R I F T R V G A A
2068 gatgatctggcgtccgggcgttcaacctttatggtggagatgacc
D D L A S G R S T F M V E M T
2113 gaaaccgctaatattctgcataacgccaccgagtacagtctggtg
E T A N I L H N A T E Y S L V
2158 ctgatggatgagattgggcgcggaacgtccacttacgatggtctg
L M D E I G R G T S T Y D G L
2203 tcgctggcgtgggcgtgcgcggaaaatctggcgaataagattaag
S L A W A C A E N L A N K I K
2248 gcgttgacgctgtttgccacccactatttcgagctgacccagtta
A L T L F A T H Y F E L T Q L
2293 ccggagaaaatggaaggcgtcgccaacgtgcatctcgatgcactg
P E K M E G V A N V H L D A L
2338 gagcacggcgacaccattgcctttatgcatagcgtgcaggatggc
E H G D T I A F M H S V Q D G
2383 gcggcgagcaaaagctacggcctggcggttgcagctctggccggc
A A S K S Y G L A V A A L A G
2428 gtgccaaaagaggttattaagcgcgcacggcaaaaactgcgtgag
V P K E V I K R A R Q K L R E
2473 ctggaaagcatttcgccgaacgccgccgctacgcaagtggatggt
L E S I S P N A A A T Q V D G
2518 acgcaaatgtctttgctgtcagtaccagaagaaacttcgcctgcg
T Q M S L L S V P E E T S P A
2563 gtcgaagctctggaaaatcttgatccggattcactcaccccgcgt
V E A L E N L D P D S L T P R
2608 caggcgctggaatggatttatcgcttgaagagtctggtgtaa 2649
Q A L E W I Y R L K S L V *
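
The translations above can be spot-checked with a short script. This is a minimal sketch, assuming the standard genetic code (NCBI translation table 1); the helper names translate and orf_length_aa are made up for illustration and are not part of any tool used in the assignment.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered t, c, a, g.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, start=1):
    """Translate dna from 1-based position start, stopping after the first stop codon."""
    seq = dna.lower()[start - 1:]
    protein = []
    for i in range(0, len(seq) - 2, 3):
        # "X" marks codons containing ambiguous or unexpected characters.
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def orf_length_aa(first_nt, last_nt):
    """Residue count for an ORF spanning first_nt..last_nt (1-based, stop codon included)."""
    n_nt = last_nt - first_nt + 1
    if n_nt % 3:
        raise ValueError("ORF length is not a multiple of three")
    return n_nt // 3 - 1  # subtract the stop codon


# First and last codon rows of the MLH1 listing above.
print(translate("atgtcgttcgtggcaggggttattcggcggctggacgagacagtg"))  # MSFVAGVIRRLDETV
print(translate("aaagtctttgagaggtgttaa"))  # KVFERC*

# Coordinates taken from the listings: MLH1 22..2292, mutS 88..2649.
print(orf_length_aa(22, 2292), orf_length_aa(88, 2649))  # 756 853
```

With the coordinates shown in the listings, the MLH1 reading frame covers 2271 nt (756 residues plus the stop codon) and the mutS reading frame covers 2562 nt (853 residues plus the stop codon).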


3. Perform a protein sequence homology search for MLH1 in GenBank. Give the 10 highest hits.

GI:13878583, MLH1_Mouse, DNA MISMATCH REPAIR PROTEIN MLH1 (MUTL PROTEIN HOMOLOG 1).
GI:13591989, NP_112315, mismatch repair protein [Rattus norvegicus].
GI:4557757, NP_000240, mutL homolog 1; mutL (E. coli) homolog 1 [Homo sapiens].
GI:466462, AAA17374, human homolog of E. coli mutL gene product
GI:604369, AAA85687.1, hMLH1 gene product.
GI:12835158, BAB23172.1, putative [Mus musculus].
GI:13543339, AAH05833.1, Similar to mutL (E. coli) homolog 1 (colon cancer, nonpolyposis type 2) [Homo sapiens].
GI:7304079, AAF59117.1, Mlh1 gene product [Drosophila melanogaster].
GI:3192877, AAC19117.1, mutL homolog [Drosophila melanogaster].
GI:460627, AAA16835.1, Mlh1p.

4. Compare the human MLH1 protein with MLH1 in M. musculus, R. norvegicus and D. melanogaster. Give the pairwise alignments and the % sequence similarity.
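
Once a pairwise alignment is available, the percentages can be counted column by column. This is a minimal sketch, not the scoring used by any particular alignment program: the function name alignment_stats and the similarity groups are assumptions made for illustration, and the two aligned strings in the example are made-up toy data, not a real MLH1 alignment.

```python
# Crude similarity groups: an illustrative assumption, not the substitution
# matrix (for example BLOSUM62) that alignment tools use to count "positives".
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def alignment_stats(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) for two aligned rows.

    Both inputs must be rows of the same pairwise alignment, with '-' for gaps.
    Percentages are taken over the full alignment length, gap columns included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns count toward the length only
        if a == b:
            identical += 1
            similar += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    length = len(aligned_a)
    return 100.0 * identical / length, 100.0 * similar / length


# Toy rows (hypothetical): 7 identical columns, 2 similar columns, 1 gap column.
identity, similarity = alignment_stats("MSIVAG-VKR", "MSLVAGEVRR")
print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```

Different tools divide by different lengths (alignment length, shorter sequence, or ungapped columns), so the denominator should be stated alongside any reported percentage.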


5. Search for the conserved domain (CD) in MLH1. Give the position of the CD, the name of the CD, and the Pfam ID number.

position--- 147 ~ 325 a.a.

name---DNA_mis_repair, DNA mismatch repair protein. Also known as the mutL/hexB/PMS1 family.

Pfam ID number---01119
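
The reported position is 1-based and inclusive, so residues 147 to 325 span 325 - 147 + 1 = 179 amino acids. A minimal sketch of cutting that region out of a protein string follows; the function name domain_slice is hypothetical, and the placeholder sequence stands in for the real 756-residue MLH1 translation.

```python
def domain_slice(protein, start, end):
    """Return residues start..end (1-based, inclusive) of a protein string."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError("domain coordinates fall outside the protein")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return protein[start - 1:end]


# Placeholder sequence of the same length as MLH1 (756 residues); substitute
# the real translation to obtain the actual domain sequence.
placeholder = "A" * 756
domain = domain_slice(placeholder, 147, 325)
print(len(domain))  # 179
```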


6. Show a multiple alignment of the MLH1 conserved domain with the 5 sequences from the top of the CD alignment.

answer