Homework 7 (due on 12/31)

Sequence analysis / model building

Problem: David Liu, a student in the Department of Life Sciences, obtained the following mRNA sequence from a rat liver cDNA library.

GCGTCGACGGGCTTGGCATCGGGCCTCCGCAGCCGCCCACCGCCAGAAGCTTCCAGCCT
CACCACTATGGATCCCCGCAAAGTGAGCGAGCTTCGGGCCTTCGTGAAGATGTGTAGG
CAGGACCCGAGCGTCCTGCACACCGAGGAAATGCGTTTCCTGAGGGAGTGGGTGGAGA
GCATGGGGGGTAAAGTACCACCTGCTACTCATAAAGCGAAGTCAGAAGAAAACACTAA
GGAAGAAAAAAGAGACAAGACGACAGAGGACAACATAAAGACAGAGGAGCCATCGAG
TGAGGAGAGCGATCTAGAAATTGACAATGAAGGTGTAATTGAAGCAGACACTGATGCT
CCTCAGGAAATGGGAGATGAAAATGCAGAGATAACTGAGGCGATGATGGATGAAGCAA
ATGAAAAGAAGGGGGCTGCCATCGACGCTCTAAATGATGGTGAGCTCCAGAAAGCCAT
TGACTTGTTCACAGACGCCATCAAGCTAAACCCTCGCTTGGCCATTCTGTATGCCAAGA
GAGCCAGTGTTTTCGTCAAATTACAGAAGCCAAATGCTGCCATCCGAGACTGTGACAGA
GCTATTGAAATAAACCCTGATTCAGCTCAGCCATACAAATGGAGAGGGAAAGCGCACA
GACTCCTGGGTCACTGGGAAGAAGCAGCTCGCGATCTTGCCCTGGCCTGTAAATTGGAC
TATGATGAGGACGCCAGTGCAATGCTGAGAGAAGTCCAGCCTCGGGCTCAAAAAATTGC
TGAACATCGGAGAAAGTATGAGCGAAAACGTGAAGAGCGAGAGATAAAAGAACGAAT
AGAAAGGGTGAAGAAGGCTCGAGAAGAGCATGAAAAAGCCCAAAGGGAAGAAGAAGC
CAGAAGACAATCTGGATCTCAGTTTGGCTCTTTTCCAGGTGGTTTTCCTGGGGGAATGC
CTGGTAATTTTCCTGGAGGAATGCCTGGAATGGGAGGGGCCATGCCAGGAATGGCAGG
AATGCCTGGACTCAACGAAATCCTCAGTGACCCAGAGGTTCTTGCAGCCATGCAGGATC
CAGAAGTCATGGTGGCTTTCCAGGATGTGGCCCAGAACCCATCAAATATGTCAAAATAT
CAGAACAACCCAAAGGTTATGAATCTCATCAGTAAATTGTCAGCCAAGTTTGGAGGTCA
CTCATAATGTCAAAGCCCTTGCTGAATGAAGAACAGCTTAGCTCACTTACTGGATGTTG
CAATAATACAAACCAGTGTACCTCTGACCTCACCAGAGAGCTGGGGCGCTTCGAAGATA
ATCCCTACCCTCTGCATCATATGCGGCTGAGGCATATTACAGTGGTTTGCCATTAGAGT
GTTCATTCAGATAATGTTTTCCTATTAGGAATTACAAACTTAAAACATTTTTCAACCTTA
AACATATTTTTTAAAAATTTAGGGGATGTCAATTCCTACATTTTTCGTTACTAATCTTTT
TGGGTTTTTCCTTTTGAATTACTGGGCAAGGAAGGTGAATGTGGATGATTTACTGCTTT
CATGAATGAAATAAAGATTTGTTAGTGGGAAGCAAATAAAACACATTTAAGTTGATTG
AGTCGGACATACGGTTACTGCAACATCTTGAATTGTCTTTAATGTTTTACTTCACAATG
ATCTATTTCAGTAAATCTTTTGGGACCACCAAAAAAAAAAAAAAAAAAAAAAAAA

Unfortunately, he does not know how to use the sequence analysis tools available on the Internet, since he has not taken Bioinformatics before.
Could you help him with the following analyses?

(1) Find its corresponding polypeptide sequence (DNA -> Protein translation).
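Ans: Use the Translate Tool (DNA -> Protein) under Sequence analysis tools in the ExPASy tools menu.
It returns six candidate polypeptide sequences, one for each reading frame.
Comparing them, the first frame has the longest stretch from Met (start) to a stop codon, so it is the one most likely to be the protein actually made in the organism.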
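
The frame comparison above can also be reproduced offline. The following is a minimal sketch, not the Translate Tool's own implementation: it assumes the standard genetic code, and the names `six_frames`, `longest_orf`, `best_orf` and the frame labels `+1` ... `-3` are made up for illustration.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Translate all six reading frames (hypothetical labels +1..+3, -1..-3)."""
    dna = "".join(dna.split()).upper().replace("U", "T")
    frames = {}
    for sign, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(seq[offset:])
    return frames


def longest_orf(protein):
    """Longest stretch that starts at Met and is terminated by a stop codon."""
    best = ""
    for segment in protein.split("*")[:-1]:  # last piece has no closing stop
        start = segment.find("M")
        if start >= 0 and len(segment) - start > len(best):
            best = segment[start:]
    return best


def best_orf(dna):
    """Return (frame label, peptide) for the frame with the longest ORF."""
    candidates = {name: longest_orf(p) for name, p in six_frames(dna).items()}
    name = max(candidates, key=lambda n: len(candidates[n]))
    return name, candidates[name]
```

Calling `best_orf` on the cDNA sequence from the problem (whitespace is stripped automatically) returns the frame label and the Met-to-stop peptide to carry into the later questions.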

(2) Identify this protein. Is it a new protein? What kind of rat does this protein belong to? (China, Norway, Germany, ...)
Ans: Queried with Direct WU-BLAST submission at EMBNet-CH (Lausanne, Switzerland).
The search found sp|P50503|HIP_RAT (HIP) HSC70-INTERACTING PROTEIN, which has 100% identity with this protein, so it is not a new protein.
Its source organism is "RATTUS NORVEGICUS", i.e. the Norway rat.

(3) Report the total number of negatively charged residues and positively charged residues.
Ans: Queried with ProtParam (Sequence analysis tools).
The output contains many statistics; the values we need are:
Total number of negatively charged residues (Asp + Glu): 69
Total number of positively charged residues (Arg + Lys): 56
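
As a cross-check on the two counts, they can be recomputed directly from the translated protein. This is only a sketch of the counting rule stated in the ProtParam output (Asp + Glu versus Arg + Lys); the function name `charged_residue_counts` is made up here.

```python
def charged_residue_counts(protein):
    """Return (negative, positive) counts: (Asp + Glu, Arg + Lys)."""
    protein = protein.upper()
    negative = sum(protein.count(residue) for residue in "DE")
    positive = sum(protein.count(residue) for residue in "RK")
    return negative, positive
```

Passing the full peptide from question (1) should reproduce the two totals above if the same sequence was submitted to ProtParam.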

(4) Draw the hydrophobicity map for this protein using the Eisenberg hydrophobicity scale with a window size of 7. The relative weight of the window edges compared to the window center should be set to 40%.
Ans: Queried with ProtScale; the result is as follows:
[Figure: ProtScale hydrophobicity profile]
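
The settings in this question (window size 7, edge weight 40%) correspond to a weighted sliding-window average. Below is a rough sketch of that idea, under several assumptions: the scale values are the Eisenberg consensus values as I recall them and should be checked against the table ProtScale displays; the weights rise linearly from the edge to 100% at the center; and each window is normalized by the sum of its weights, which may not match ProtScale's own normalization. The names `window_weights` and `hydrophobicity_profile` are made up.

```python
# Eisenberg consensus hydrophobicity scale (assumed values; verify before use).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def window_weights(size=7, edge=0.4):
    """Linear weights: `edge` at both ends of the window, 1.0 at the center."""
    if size < 3 or size % 2 == 0:
        raise ValueError("window size must be an odd number >= 3")
    half = size // 2
    return [edge + (1.0 - edge) * (1.0 - abs(i - half) / half)
            for i in range(size)]


def hydrophobicity_profile(protein, size=7, edge=0.4, scale=EISENBERG):
    """Return [(center position (1-based), weighted mean score), ...]."""
    protein = protein.upper()
    weights = window_weights(size, edge)
    total = sum(weights)  # assumed normalization, see lead-in
    half = size // 2
    profile = []
    for center in range(half, len(protein) - half):
        window = protein[center - half:center + half + 1]
        score = sum(w * scale[aa] for w, aa in zip(weights, window)) / total
        profile.append((center + 1, score))
    return profile
```

Plotting the second element of each pair against the first gives a curve of the same kind as the ProtScale plot; the first and last three residues have no score because the window does not fit there.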

(5) Please help him use the Prosite scanning tool to find possible functions or patterns of this protein.
Ans: Queried with ScanProsite; the protein may contain the following five patterns:

[1] PDOC00001 PS00001  ASN_GLYCOSYLATION
N-glycosylation site

           343-346 NMSK                                                        

[2] PDOC00004 PS00004  CAMP_PHOSPHO_SITE
cAMP- and cGMP-dependent protein kinase phosphorylation site

Number of matches: 3
      1        4-7 RKVS                                                        
      2    152-155 KRAS                                                        
      3    270-273 RRQS                                                        

[3]  PDOC00005 PS00005  PKC_PHOSPHO_SITE
Protein kinase C phosphorylation site

Number of matches: 2
      1      46-48 THK                                                         
      2    361-363 SAK                                                         

[4]  PDOC00006 PS00006  CK2_PHOSPHO_SITE
Casein kinase II phosphorylation site

Number of matches: 5
      1      55-58 TKEE                                                        
      2      63-66 TTED                                                        
      3      74-77 SSEE                                                        
      4      78-81 SDLE                                                        
      5    317-320 SDPE                                                        

[5]  PDOC00008 PS00008  MYRISTYL
N-myristoylation site

Number of matches: 10
      1      86-91 GVIEAD                                                      
      2    274-279 GSQFGS                                                      
      3    278-283 GSFPGG                                                      
      4    282-287 GGFPGG                                                      
      5    286-291 GGMPGN                                                      
      6    287-292 GMPGNF                                                      
      7    290-295 GNFPGG                                                      
      8    294-299 GGMPGM                                                      
      9    298-303 GMGGAM                                                      
     10    301-306 GAMPGM                                                      
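
The five patterns reported above are short PROSITE motifs, so the hits can be rechecked with regular expressions. The sketch below hand-translates the PROSITE patterns as I understand them (for example `N-{P}-[ST]-{P}` for PS00001) into Python regexes; the dictionary `PATTERNS` and the function `scan` are made-up names, and a lookahead is used so that overlapping hits such as 274-279 and 278-283 are both found.

```python
import re

# PROSITE patterns rewritten as regular expressions (hand-translated, assumed).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",           # N-{P}-[ST]-{P}
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",            # [RK](2)-x-[ST]
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",                # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",             # [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",    # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
}


def scan(protein, regex):
    """Return every (start, end, match), 1-based inclusive, overlaps included."""
    hits = []
    for m in re.finditer(f"(?=({regex}))", protein.upper()):
        matched = m.group(1)
        hits.append((m.start() + 1, m.start() + len(matched), matched))
    return hits


def scan_all(protein):
    """Scan with all five patterns; returns {pattern name: list of hits}."""
    return {name: scan(protein, regex) for name, regex in PATTERNS.items()}
```

For example, `scan("GSQFGSFPGG", PATTERNS["MYRISTYL"])` returns the two overlapping sites `GSQFGS` and `GSFPGG`, numbered relative to that fragment rather than to the full protein.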

(6) Color the protein by the hydrophobicity of the amino acids.
Ans: Used Protein Colourer. The result uses four colors to distinguish regions of different hydrophobicity:
Blue: AGILPV
Red: FYW
Green: DENQRHSTK
Yellow: CM
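
The four color groups listed in this answer amount to a lookup table from residue to color. A small sketch of that table follows; the names `COLOR_GROUPS` and `colour_residues` are made up, and any letter outside the four groups is labelled "unknown".

```python
# Color groups exactly as listed in the answer above.
COLOR_GROUPS = {
    "blue": "AGILPV",
    "red": "FYW",
    "green": "DENQRHSTK",
    "yellow": "CM",
}

# Invert to a per-residue lookup table.
RESIDUE_COLOR = {residue: color
                 for color, residues in COLOR_GROUPS.items()
                 for residue in residues}


def colour_residues(protein):
    """Return [(residue, color), ...] for every residue in the protein."""
    return [(aa, RESIDUE_COLOR.get(aa, "unknown")) for aa in protein.upper()]
```

The four groups together cover all 20 standard amino acids, so "unknown" only appears for non-standard letters such as X.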