1. Compare the MLH1 (answer to assignment 2.6) and mutS (answer to assignment 2.7) sequences.

Sequence 1 gi 463988 Human DNA mismatch repair protein homolog (hMLH1) mRNA, complete cds. Length 2484

Sequence 2 gi 146905 Escherichia coli DNA mismatch repair protein (fdv) gene, complete cds. Length 3327
No significant similarity was found

How to find : NCBI > Blast > Pairwise BLAST


2. Translate the above two gene sequences to protein sequences.

hMLH1:MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKE
       GGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALAS
       ISHVAHVTITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIA
       TRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVD
       NIRSIFGNAVSRELIEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRLVE  
       STSLRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI  
       LERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMVKSTTSLTSSSTSGSSD
       KVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDISSGRARQQDEE
       MLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSSNPRKRHREDSDVEMVED
       DSRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQW
       ALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDS
       PESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNY
       VPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQ
       QSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC
mutS:MSAIENFDAHTPMMQQYLRLKAQHPEILLFYRMGDFYELFYDDA
   KRASQLLDISLTKRGASAGEPIPMAGIPYHAVENYLAKLVNQGESVAICEQIGDPATS
   KGPVERKVVRIVTPGTISDEALLQERQDNLLAAIWQDSKGFGYATLDISSGRFRLSEP
   ADRETMAAELQRTNPAELLYAEDFAEMSLIEGRRGLRRRPLWEFEIDTARQQLNLQFG
   TRDLVGFGVENAPRGLCAAGCLLQYAKDTQRTTLPHIRSITMEREQDSIIMDAATRRN
   LEITQNLAGGAENTLASVLDCTVTPMGSRMLKRWLHMPVRDTRVLLERQQTIGALQDF
   TAGLQPVLRQVGDLERILARLALRTARPRDLARMRHAFQQLPELRAQLETVDSAPVQA
   LREKMGEFAELRDLLERAIIDTPPVLVRDGGVIASGYNEELDEWRALADGATDYLERL
   EVRERERTGLDTLKVGFNAVHGYYIQISRGQSHLAPINYMRRQTLKNAERYIIPELKE
   YEDKVLTSKGKALALEKQLYEELFDLLLPHLEALQQSASALAELDVLVNLAERAYTLN
   YTCPTFIDKPGIRITEGRHPVVEQVLNEPFIANPLNLSPQRRMLIITGPNMGGKSTYM
   RQTALIALMAYIGSYVPAQKVEIGPIDRIFTRVGAADDLASGRSTFMVEMTETANILH
   NATEYSLVLMDEIGRGTSTYDGLSLAWACAENLANKIKALTLFATHYFELTQLPEKME
   GVANVHLDALEHGDTIAFMHSVQDGAASKSYGLAVAALAGVPKEVIKRARQKLRELES
   ISPNAAATQVDGTQMSLLSVPEETSPAVEALENLDPDSLTPRQALEWIYRLKSLV
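
The translation step can also be reproduced offline. Below is a minimal sketch, assuming the standard genetic code and an input that starts at the ATG of the coding region (the GenBank entries above also contain untranslated flanking sequence, which would have to be trimmed first). The function name `translate` and the 21-base test string are illustrative only; the test string is one possible encoding of the first six hMLH1 residues (MSFVAG) followed by a stop codon, not a quote from the GenBank record.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence, reading frame 1, up to the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept mRNA (U) as well as DNA (T)
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # "X" for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical example input: codons chosen to encode MSFVAG, then a stop.
example_protein = translate("ATGTCGTTCGTGGCAGGGTAA")
```

Reading stops at the first in-frame stop codon, so any trailing untranslated sequence is ignored.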

3. Perform a protein sequence homology search for MLH1 in GenBank. Give the 10 highest hits.
                                                                      Score   E
Sequences producing significant alignments:                           (bits) value
gi|4557757 |ref|NP_000240.1| mutL homolog 1; mutL (E. coli) ...         1467   0.0
gi|466462  |gb|AAA17374.1|   (U07418) human homolog of E. coli ...      1466   0.0
gi|604369  |gb|AAA85687.1|   (U17857) hMLH1 gene product [Homo ...      1453   0.0
gi|13878583|sp|Q9JK91|       MLH1_MOUSE DNA MISMATCH REPAIR PROTEI...   1292   0.0
gi|13591989|ref|NP_112315.1| mismatch repair protein [Rattu...          1289   0.0
gi|12835158|dbj|BAB23172.1|  (AK004105) putative [Mus musculus]          753   0.0
gi|13543339|gb|AAH05833.1|   AAH05833 (BC005833) Similar to mu...        731   0.0
gi|7304079 |gb|AAF59117.1|   (AE003838) Mlh1 gene product [Dro...        615  e-175
gi|3192877 |gb|AAC19117.1|   (AF068257) mutL homolog [Drosophi...        608  e-173
gi|460627  |gb|AAA16835.1|   (U07187) Mlh1p [Saccharomyces cere...       471  e-132

How to find : NCBI > Blast > Protein BLAST > Paste sequence > Blast > Format
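
To rank hits by score outside the browser, the summary lines can be parsed directly. This is a rough sketch that assumes each line ends with the bit score followed by the E value, as in the list above; `parse_hit` is a hypothetical helper name, and the three sample lines are copied from the hit list.

```python
def parse_hit(line):
    """Split one BLAST summary line into (description, bit score, E value)."""
    fields = line.split()
    score = float(fields[-2])
    evalue_text = fields[-1]
    # BLAST abbreviates 1e-175 as "e-175"; restore the leading 1 before parsing.
    if evalue_text.startswith("e"):
        evalue_text = "1" + evalue_text
    description = " ".join(fields[:-2])
    return description, score, float(evalue_text)


sample_lines = [
    "gi|13878583|sp|Q9JK91|       MLH1_MOUSE DNA MISMATCH REPAIR PROTEI... 1292   0.0",
    "gi|4557757 |ref|NP_000240.1| mutL homolog 1; mutL (E. coli) ...       1467   0.0",
    "gi|7304079 |gb|AAF59117.1|   (AE003838) Mlh1 gene product [Dro...      615  e-175",
]

# Highest bit score first, which is how BLAST orders its own summary.
ranked = sorted((parse_hit(line) for line in sample_lines),
                key=lambda hit: hit[1], reverse=True)
```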


4. Compare the human MLH1 protein with MLH1 in M. musculus, R. norvegicus and D. melanogaster. Give the pairwise alignments and the % sequence similarity.


Mus musculus : Identities = 651/760 (85%), Positives = 693/760 (90%), Gaps = 4/760 (0%)

Pairwise alignment

Rattus norvegicus : Identities = 639/758 (84%), Positives = 684/758 (89%), Gaps = 3/758 (0%)

Pairwise alignment

Drosophila melanogaster : Identities = 334/751 (44%), Positives = 453/751 (59%), Gaps = 95/751 (12%)

Pairwise alignment

How to find : NCBI > Blast > Protein BLAST > Limit by Entrez query or select from "Mus musculus", "R. norvegicus", "D. melanogaster" > BLAST > Format
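
The Identities and Gaps figures are simple column counts over the aligned pair. The sketch below shows the counting under the assumption that both rows are already aligned and use "-" for gaps; `alignment_stats` is a hypothetical helper, and Positives are left out because they depend on the substitution matrix. The two example rows are the first 12 columns of the consensus and gi 3914081 rows from the second block of the CD alignment in answer 6, so columns where both rows have a gap are skipped.

```python
def alignment_stats(row_a, row_b, gap="-"):
    """Return (identities, gap columns, aligned length) for two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    identities = gaps = length = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap and b == gap:
            continue  # column only exists because of other sequences
        length += 1
        if a == gap or b == gap:
            gaps += 1
        elif a == b:
            identities += 1
    return identities, gaps, length


identities, gaps, length = alignment_stats("KTSP--SSLKER", "KKS----SLKER")
percent_identity = 100.0 * identities / length
```

For the excerpt this gives 7 identities and 2 gap columns over 10 aligned columns, i.e. 70% identity.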


5. Search for the conserved domain (CD) of MLH1. Give the position of the CD, the name of the CD and the Pfam ID number.

Position: amino acids 147-327

Name: DNA_mis_repair, DNA mismatch repair protein. Also known as the mutL/hexB/PMS1 family

Pfam ID: pfam01119

How to find : NCBI > Blast > Pairwise BLAST > Search for conserved domains > Paste sequence > Search


6. Show the multiple alignment of the MLH1 conserved domain with 5 sequences from the top of the CD alignment.

                       10        20        30        40        50        60
               ....*....|....*....|....*....|....*....|....*....|....*....|
consensus    1 GTTVEVRDLFYNLPVRRKFLKSPKKEFRKILDLLQRYALIHPNVSFSLTKEG--KALLQL 58
1B63_A     144 GTTLEVLDLFYNTPARRKFLRTEKTEFNHIDEIIRRIALARFDVTINLSHNG--KIVRQY 201
gi 8039787 159 GTVVRVEQLFENFPARKRFLGRQSAETTLCRSALIDVSLAHHPVEFRFTVDGthKLTLLS 218
gi 8928214 141 GTIVDVTKIFHNFPARKRFLKQEPIETKMCLKVLEEKIITHPEINFEIN-LN--QKLRKI 197
gi 3914081 141 GTEVEVRDLFFNLPVRRKFLKKEDTERRKVLELIKEYALTNPEVEFTLFSEG--RETLKL 198

                       70        80        90       100       110       120
               ....*....|....*....|....*....|....*....|....*....|....*....|
consensus   59 KTSP--SSLKERIRSVFGTAVLKNLIPFEEKDGDFRIEG-FISSPNVSR-SSRDRQFLFI 114
1B63_A     202 RAVPegGQKERRLGAICGTAFLEQALAIEWQHGDLTLRG-WVADPNHTTpALAEIQYCYV 260
gi 8039787 219 QQTR--KDRCLETQMLKGDPALFHTIEGG--DCSFHFHLvLSEPAICRR--ERRGIFTFV 272
gi 8928214 198 YFK---ESLIDRVQNVYGNVIENNKFRVLKKEHDNIKIEiFLAPDNFSK-KSKRHIKTFV 253
gi 3914081 199 KKS----SLKERVEEVFQTKTEELYAERE--GITLRA---FVSRNQRQG-----KYYVFI 244

                      130       140       150       160       170       180
               ....*....|....*....|....*....|....*....|....*....|....*....|
consensus  115 NGRPVEDKLLLKAIREVYATYLPRGRYPVFVLNLELPPELVDVNVHPDKKEVRLLKEEEI 174
1B63_A     261 NGRMMRDRLINHAIRQACEDKLGADQQPAFVLYLEIDPHQVDVNVHPAKHEVRFHQSRLV 320
gi 8039787 273 NGRRIFDYGLVQALVLGSEGYFPNGTFPVACLFLTVNSERIDFNIHPAKKEVHLQDYAHI 332
gi 8928214 254 NRRPIDQKDLLEAITNGHSRILSPGNFPICYLFLEINPEYIDFNVHPQKKEVRFYNLPFL 313
gi 3914081 245 NKRPIQNKNLKEFLRKVFG------YKTLVVLYAELPPFMVDFNVHPKKKEVNILKERKF 298

               ....*
consensus  175 LDLIK 179
1B63_A     321 HDFIY 325
gi 8039787 333 RHTLS 337
gi 8928214 314 FKLIS 318
gi 3914081 299 LELVR 303

How to find : NCBI > Blast > Pairwise BLAST > Search for conserved domains > Paste sequence > Search > gnl|Pfam|pfam01119 > View alignment showing "up to 5" "top listed sequences"
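
Conserved columns can be picked out of the multiple alignment with a few lines of code. The sketch below is only an illustration: `conserved_mask` is a hypothetical helper that keeps a residue where every row agrees and prints "." elsewhere. The input is a 12-column excerpt (the region aligned to VDVNVHPDKKEV in the consensus) taken from the third block of the four sequence rows above; the consensus row itself is left out.

```python
def conserved_mask(rows, gap="-"):
    """Return a string with the residue at fully conserved columns, '.' elsewhere."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    mask = []
    for column in zip(*(row.upper() for row in rows)):
        residues = set(column)
        if len(residues) == 1 and gap not in residues:
            mask.append(column[0])
        else:
            mask.append(".")
    return "".join(mask)


excerpt = [
    "VDVNVHPAKHEV",  # 1B63_A
    "IDFNIHPAKKEV",  # gi 8039787
    "IDFNVHPQKKEV",  # gi 8928214
    "VDFNVHPKKKEV",  # gi 3914081
]
mask = conserved_mask(excerpt)
```

For this excerpt the mask is `.D.N.HP.K.EV`, i.e. 7 of the 12 columns are identical in all four sequences.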