Compare the human colon cancer gene MLH1 with other genes.
- Use ORF Finder to translate a DNA sequence into protein sequences in all reading frames.
- Use blastn, blastp, CD search and BLAST 2 Sequences for searching and comparison.
Deadline: 11/06/2001

1. Compare the MLH1 (answer to assignment 2.6) and mutS (answer to assignment 2.7) sequences.

From the NCBI homepage, we follow the "Tools" link on the left, enter the BLAST system and choose "Standard nucleotide-nucleotide BLAST". We then open "BLAST 2 sequences" and paste in the two sequences.

However, the comparison returns no alignment:

Sequence 1 lcl|seq_1 Length 2484

Sequence 2 lcl|seq_2 Length 2947
No significant similarity was found
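
A rough way to see how a nucleotide comparison can come back empty: blastn starts from short exact word matches (word size 11 by default in classic blastn), so two sequences that share no such word give it nothing to extend. The sketch below only counts shared words; it is a simplified illustration, not the BLAST algorithm, and the function name `shared_words` and the toy sequences are made up for this example.

```python
def shared_words(seq_a, seq_b, word_size=11):
    """Return the set of exact words of length word_size found in both sequences."""
    a = seq_a.upper()
    b = seq_b.upper()
    words_a = {a[i:i + word_size] for i in range(len(a) - word_size + 1)}
    words_b = {b[i:i + word_size] for i in range(len(b) - word_size + 1)}
    return words_a & words_b


# Toy sequences (hypothetical, not the MLH1 or mutS sequences).
print(shared_words("ACGTACGTACGTA", "TTTACGTACGTACGAAA"))
print(shared_words("AAAAAAAAAAAA", "CCCCCCCCCCCC"))
```

Running the same function on the two assignment sequences would show whether they share any 11-letter word at all.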


2. Translate the above two gene sequences to protein sequences.

We copy each sequence and paste it into the Nucleotide query of the Translated BLAST Search. After the BLAST step and the Format step, we get the following results:

MLH1:

BLASTX 2.2.1 [Apr-13-2001]

Reference:
Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

RID: 1004725957-17574-30390

Query=
(2484 letters)


Database: nr
791,492 sequences; 251,575,206 total letters

Sequences producing significant alignments: (bits) Value

gi|4557757|ref|NP_000240.1| mutL homolog 1; mutL (E. coli) ... 1443 0.0
gi|466462|gb|AAA17374.1| (U07418) human homolog of E. coli ... 1442 0.0
gi|604369|gb|AAA85687.1| (U17857) hMLH1 gene product [Homo ... 1429 0.0
gi|13878583|sp|Q9JK91|MLH1_MOUSE DNA MISMATCH REPAIR PROTEI... 1290 0.0
gi|13591989|ref|NP_112315.1| mismatch repair protein [Rattu... 1263 0.0

>gi|4557757|ref|NP_000240.1| mutL homolog 1; mutL (E. coli) homolog 1 [Homo sapiens]
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQIQDNGTGIRKEDLDIVCERF
TTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIA
TRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCEDKTLAF
KMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEE
SILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPL
SKPLSSQPQAIVTEDKTDISSGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSSNPRKRHREDSDVE
MVEDDSRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTTKLSEELF
YQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLP
LLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEHIV
YKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC

E. coli mismatch repair gene mutS:

BLASTX 2.2.1 [Apr-13-2001]

Reference:
Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

RID: 1004725247-10838-22729

Query=
(2947 letters)


Database: nr
791,492 sequences; 251,575,206 total letters



Sequences producing significant alignments: (bits) Value

gi|4557761|ref|NP_000242.1| mutS homolog 2; mutS (E. coli) ... 1778 0.0
gi|2135744|pir||I37550 mismatch repair protein MSH2 - human... 1675 0.0
gi|6678938|ref|NP_032654.1| mutS homolog 2 (E. coli) [Mus m... 1658 0.0
gi|726086|gb|AAA75027.1| (U21011) MutS homolog 2 [Mus muscu... 1657 0.0
gi|13591999|ref|NP_112320.1| mismatch repair protein [Rattu... 1646 0.0
gi|7448104|pir||I64827 gene MSH2 protein - human >gi|100088... 1503 0.0
gi|1079288|pir||S53609 DNA mismatch repair protein MSH2 - A... 1451 0.0
gi|14736868|ref|XP_034901.1| mutS homolog 2 [Homo sapiens] ... 1334 0.0
gi|15625578|gb|AAL04169.1|AF412833_1 (AF412833) mismatch re... 1321 0.0
gi|1000883|gb|AAB59571.1| (L47579) Insertion mutation resul... 1042 0.0

>gi|4557761|ref|NP_000242.1| mutS homolog 2; mutS (E. coli) homolog 2 [Homo sapiens]
MAVQPKETLQLESAAEVGFVRFFQGMPEKPTTTVRLFDRGDFYTAHGEDALLAAREVFKTQGVIKYMGPAGAKNLQSVVL
SKMNFESFVKDLLLVRQYRVEVYKNRAGNKASKENDWYLAYKASPGNLSQFEDILFGNNDMSASIGVVGVKMSAVDGQRQ
VGVGYVDSIQRKLGLCEFPDNDQFSNLEALLIQIGPKECVLPGGETAGDMGKLRQIIQRGGILITERKKADFSTKDIYQD
LNRLLKGKKGEQMNSAVLPEMENQVAVSSLSAVIKFLELLSDDSNFGQFELTTFDFSQYMKLDIAAVRALNLFQGSVEDT
TGSQSLAALLNKCKTPQGQRLVNQWIKQPLMDKNRIEERLNLVEAFVEDAELRQTLQEDLLRRFPDLNRLAKKFQRQAAN
LQDCYRLYQGINQLPNVIQALEKHEGKHQKLLLAVFVTPLTDLRSDFSKFQEMIETTLDMDQVENHEFLVKPSFDPNLSE
LREIMNDLEKKMQSTLISAARDLGLDPGKQIKLDSSAQFGYYFRVTCKEEKVLRNNKNFSTVDIQKNGVKFTNSKLTSLN
EEYTKNKTEYEEAQDAIVKEIVNISSGYVEPMQTLNDVLAQLDAVVSFAHVSNGAPVPYVRPAILEKGQGRIILKASRHA
CVEVQDEIAFIPNDVYFEKDKQMFHIITGPNMGGKSTYIRQTGVIVLMAQIGCFVPCESAEVSIVDCILARVGAGDSQLK
GVSTFMAEMLETASILRSATKDSLIIIDELGRGTSTYDGFGLAWAISEYIATKIGAFCMFATHFHELTALANQIPTVNNL
HVTALTTEETLTMLYQVKKGVCDQSFGIHVAELANFPKHVIECAKQKALELEEFQYIGESQGYDIMEPAAKKCYLEREQG
EKIIQEFLSKVKQMPFTEMSEENITIKLKQLKAEVIAKNNSFVNEIISRIKVTT
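
The assignment also asks for translation in all reading frames, which is what ORF Finder and blastx do internally. The following is a minimal sketch of six-frame translation using the standard genetic code; the helper names (`translate`, `reverse_complement`, `six_frames`) are chosen for this example, and the input is a toy fragment picked so that frame +1 encodes the first six residues (MSFVAG) of the MLH1 protein shown above. It is not the actual MLH1 cDNA.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(dna):
    """Return the translations of the three forward and three reverse frames."""
    frames = {}
    rev = reverse_complement(dna)
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(dna[offset:])
        frames["-%d" % (offset + 1)] = translate(rev[offset:])
    return frames


for name, protein in sorted(six_frames("ATGTCTTTCGTGGCAGGG").items()):
    print(name, protein)
```

Stop codons appear as `*`, so an open reading frame is any stretch between an `M` and the next `*` within one frame.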


3. Perform protein sequence homology searching for MLH1 in GenBank. Give the 10 highest hits.

We click "Protein query - Translated db [tblastn]" under the Translated BLAST Search and paste the protein sequence. After the BLAST step and the Format step, the reply looks like this:

Reference:
Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

RID: 1004726621-24036-23745

Query=
(756 letters)


Database: nt
1,000,461 sequences; 70,461,447 total letters

Sequences producing significant alignments: (bits) Value

gi|14725770|ref|XM_044891.1| Homo sapiens mutL (E. coli) ho... 1443 0.0
gi|13905125|gb|BC006850.1|BC006850 Homo sapiens, mutL (E. c... 1443 0.0
gi|4557756|ref|NM_000249.1| Homo sapiens mutL (E. coli) hom... 1443 0.0
gi|463988|gb|U07343.1|HSU07343 Human DNA mismatch repair pr... 1443 0.0
gi|466461|gb|U07418.1|HSHMLHI Human DNA mismatch repair (hm... 1442 0.0
gi|7595953|gb|AF250844.1|AF250844 Mus musculus MutL homolog... 1290 0.0
gi|13591988|ref|NM_031053.1| Rattus norvegicus mismatch rep... 1263 0.0
gi|1724117|gb|U80054.1|RNU80054 Rattus norvegicus mismatch ... 1263 0.0
gi|12835157|dbj|AK004105.1|AK004105 Mus musculus 18 days em... 771 0.0
gi|13543416|gb|BC005866.1|BC005866 Homo sapiens, Similar to... 738 0.0
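
To pick the highest hits programmatically rather than by eye, the summary lines can be split from the right, since the bit score and E-value are always the last two fields. This is a small sketch under that assumption; `parse_hit_line` and `top_hits` are names invented here, and the code only handles the one-line summary format shown above.

```python
def parse_hit_line(line):
    """Split one BLAST summary line into identifier, description, bits, E-value."""
    description, bits, evalue = line.rsplit(None, 2)
    if evalue.startswith("e"):
        # BLAST prints values such as "e-175" without a leading digit.
        evalue = "1" + evalue
    return {
        "id": description.split()[0],
        "description": description,
        "bits": float(bits),
        "evalue": float(evalue),
    }


def top_hits(lines, n=10):
    """Return the n hits with the highest bit score."""
    hits = [parse_hit_line(line) for line in lines if line.strip()]
    return sorted(hits, key=lambda hit: hit["bits"], reverse=True)[:n]


report = [
    "gi|12835157|dbj|AK004105.1|AK004105 Mus musculus 18 days em... 771 0.0",
    "gi|14725770|ref|XM_044891.1| Homo sapiens mutL (E. coli) ho... 1443 0.0",
]
for hit in top_hits(report):
    print(hit["id"], hit["bits"], hit["evalue"])
```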


4. Compare human MLH1 protein with MLH1 in M. musculus, R. norvegicus and D. melanogaster. Give the pairwise alignment and % of sequence similarity.

We click "Standard protein-protein BLAST" under Protein BLAST, paste the amino acid sequence and format the results. The report contains the following information:

>gi|7595954|gb|AAF64514.1|AF250844_1 (AF250844) MutL homolog 1 protein [Mus musculus]
Length = 760

Score = 1292 bits (3344), Expect = 0.0
Identities = 651/760 (85%), Positives = 693/760 (90%), Gaps = 4/760 (0%)

 

>gi|13591989|ref|NP_112315.1| mismatch repair protein [Rattus norvegicus]
gi|13878571|sp|P97679|MLH1_RAT DNA MISMATCH REPAIR PROTEIN MLH1 (MUTL PROTEIN HOMOLOG 1)
gi|1724118|gb|AAB38506.1| (U80054) mismatch repair protein [Rattus norvegicus]
Length = 757

Score = 1289 bits (3336), Expect = 0.0
Identities = 639/758 (84%), Positives = 684/758 (89%), Gaps = 3/758 (0%)

 

>gi|7304079|gb|AAF59117.1| (AE003838) Mlh1 gene product [Drosophila melanogaster]
Length = 664

Score = 615 bits (1586), Expect = e-175
Identities = 335/751 (44%), Positives = 453/751 (59%), Gaps = 94/751 (12%)
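
The similarity figures asked for in this question come from the "Identities" and "Positives" counts on each statistics line. The sketch below pulls those counts out with a regular expression and computes exact fractions; `parse_alignment_stats` is a name made up for this example, and the exact fractions need not agree with the whole-number percentages that BLAST prints, since those are rounded by the program.

```python
import re

STAT_PATTERN = re.compile(r"(Identities|Positives|Gaps) = (\d+)/(\d+)")


def parse_alignment_stats(line):
    """Return {'Identities': (matches, length), ...} from a BLAST statistics line."""
    return {name: (int(count), int(total))
            for name, count, total in STAT_PATTERN.findall(line)}


mouse = "Identities = 651/760 (85%), Positives = 693/760 (90%), Gaps = 4/760 (0%)"
stats = parse_alignment_stats(mouse)
for name, (count, total) in stats.items():
    print(name, count, total, round(count / total, 4))
```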

 


5. Search the conserved domain (CD) for MLH1. Give the position of the CD, name of CD and Pfam ID number.

We click "Search the Conserved Domain Database using RPS-BLAST", then paste the protein sequence and search:

RPS-BLAST 2.2.1 [Aug-1-2001]

Query= local sequence:
(756 letters)

Database: oasis_sap.v1.54
3693 PSSMs; 718,011 total columns


PSSMs producing significant alignments: Score

gnl|Pfam|pfam01119 DNA_mis_repair, DNA mismatch repair protein. Also known as the... 202 7e-53
gnl|Pfam|pfam02518 HATPase_c, Histidine kinase-, DNA gyrase B-, phytochrome-like ... 40.8 3e-04
gnl|Smart|smart00387 HATPase_c, Histidine kinase-like ATPases; Histidine kinase-, D... 40.0 5e-04


6. Show multiple alignment of MLH1 conserved domain with 5 sequences from the top of the CD alignment.

We choose the top-scoring domain:

gnl|Pfam|pfam01119, DNA_mis_repair, DNA mismatch repair protein. Also known as the mutL/hexB/PMS1 family.

We change the view to a multiple alignment displaying up to 5 sequences from the top of the CD alignment. The result looks like this:

                       10        20        30        40        50        60
               ....*....|....*....|....*....|....*....|....*....|....*....|
consensus    1 GTTVEVRDLFYNLPVRRKFLKSPKKEFRKILDLLQRYALIHPNVSFSLTKEG--KALLQL 58
query      147 GTQITVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQG--ETVADV 204
1B63_A     144 GTTLEVLDLFYNTPARRKFLRTEKTEFNHIDEIIRRIALARFDVTINLSHNG--KIVRQY 201
gi 8039787 159 GTVVRVEQLFENFPARKRFLGRQSAETTLCRSALIDVSLAHHPVEFRFTVDGthKLTLLS 218
gi 8928214 141 GTIVDVTKIFHNFPARKRFLKQEPIETKMCLKVLEEKIITHPEINFEIN-LN--QKLRKI 197


                       70        80        90       100       110       120
               ....*....|....*....|....*....|....*....|....*....|....*....|
consensus   59 KTSP--S-SLKERIRSVFGTAVLKNLIPF--EEKDGDFRIEG-FISSPNVSR-SSRDRQF 111
query      205 RTLP--NaSTVDNIRSIFGNAVSRELIEIgcEDKTLAFKMNG-YISNANYSV--KKCIFL 259
1B63_A     202 RAVPegG-QKERRLGAICGTAFLEQALAI--EWQHGDLTLRG-WVADPNHTTpALAEIQY 257
gi 8039787 219 QQTR--K-DRCLETQMLKGDPALFHTIEG--G--DCSFHFHLvLSEPAICRR--ERRGIF 269
gi 8928214 198 YFK---E-SLIDRVQNVYGNVIENNKFRV--LKKEHDNIKIEiFLAPDNFSK-KSKRHIK 250


                      130       140       150       160       170       180
               ....*....|....*....|....*....|....*....|....*....|....*....|
consensus  112 LFINGRPVEDKLLLKAIREVYATYLPRGRYPVFVLNLELPPELVDVNVHPDKKEVRLLKE 171
query      260 LFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHE 319
1B63_A     258 CYVNGRMMRDRLINHAIRQACEDKLGADQQPAFVLYLEIDPHQVDVNVHPAKHEVRFHQS 317
gi 8039787 270 TFVNGRRIFDYGLVQALVLGSEGYFPNGTFPVACLFLTVNSERIDFNIHPAKKEVHLQDY 329
gi 8928214 251 TFVNRRPIDQKDLLEAITNGHSRILSPGNFPICYLFLEINPEYIDFNVHPQKKEVRFYNL 310

               ....*...
consensus  172 EEILDLIK 179
query      320 ESILERVQ 327
1B63_A     318 RLVHDFIY 325
gi 8039787 330 AHIRHTLS 337
gi 8928214 311 PFLFKLIS 318

The alignment is displayed on a separate web page.
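
Question 5 asks for the position of the conserved domain, and the query rows of the alignment above carry that information in their start and end coordinates. A minimal sketch for reading it off, assuming each row has the layout "label start residues end" (the function name `query_span` is invented for this example):

```python
def query_span(rows):
    """Return (first_start, last_end, residue_count) for alignment rows."""
    starts, ends, residues = [], [], 0
    for row in rows:
        # Split from the right so labels containing spaces still work.
        _label, start, segment, end = row.rsplit(None, 3)
        starts.append(int(start))
        ends.append(int(end))
        residues += sum(1 for ch in segment if ch != "-")  # skip gap characters
    return min(starts), max(ends), residues


query_rows = [
    "query 147 GTQITVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQG--ETVADV 204",
    "query 205 RTLP--NaSTVDNIRSIFGNAVSRELIEIgcEDKTLAFKMNG-YISNANYSV--KKCIFL 259",
    "query 260 LFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHE 319",
    "query 320 ESILERVQ 327",
]
print(query_span(query_rows))
```

Read this way, the pfam01119 (DNA_mis_repair) alignment covers residues 147 to 327 of the MLH1 query.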