Bioinformatics Homework 6

  • David Liu, the same researcher from the last homework, has also isolated a cDNA clone from a snake venom library. The nucleotide sequence of this cDNA clone is shown here:

    AAAACCATCAAATACGTTATGCTGGAATGCAACGAACTGATCCCGCTGTTCTACGAAACCT

    GCCCGGCTGGTGAAAACATCTGCTACGAAATGTTCATGGTTGCTACCCCGAAAGTTCCGTGC

    GAACGTGGTTGCATCGACGTTTGCCCGGAATCTTCTCTGATCGTTAAATACGTTTGCTGCAA

    CACCGACCGTTGCCAGTAATCCAGCGCCTGATCTCTCGAAATAAAAGCCGCATTG

    (1) Please help him analyze the cutting patterns of the following restriction enzymes: EcoRI, Sau3AI, HinfI, and MseI.

    (2) Please help him to find its corresponding polypeptide sequence.

    (3) Please help him to identify this toxin. Is it a new toxin?

    (4) David would like to see its structure. Could you help him find the structure of this toxin, or build a model if it is a new protein? Show the structure on your homepage (3 different views).


    Solution:

    (1) The cutting patterns of EcoRI, Sau3AI, HinfI, and MseI:

          EcoRI    0
          Sau3AI    3    
            Sau3AI      'GATC_ - 3 Cut(s)
                             39    162    215
          HinfI    1
            HinfI       G'AnT_C - 1 Cut(s)
                           152
          MseI    1
            MseI        T'TA_A - 1 Cut(s)
                           168
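
    The cut positions listed above can be cross-checked with a short script. This is only a minimal sketch, not the program that produced the listing: the names `SEQ`, `ENZYMES`, `cut_positions` and `fragment_lengths` are made up for illustration, and it assumes that each reported position is the 1-based index of the first base 3' of the cut. All four recognition sites are palindromic, so scanning one strand is enough.

```python
import re

# cDNA sequence from the problem statement (240 nt), joined into one string.
SEQ = (
    "AAAACCATCAAATACGTTATGCTGGAATGCAACGAACTGATCCCGCTGTTCTACGAAACCT"
    "GCCCGGCTGGTGAAAACATCTGCTACGAAATGTTCATGGTTGCTACCCCGAAAGTTCCGTGC"
    "GAACGTGGTTGCATCGACGTTTGCCCGGAATCTTCTCTGATCGTTAAATACGTTTGCTGCAA"
    "CACCGACCGTTGCCAGTAATCCAGCGCCTGATCTCTCGAAATAAAAGCCGCATTG"
)

# Recognition site (N = any base) and offset of the cut inside the site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),   # G'AATTC
    "Sau3AI": ("GATC", 0),    # 'GATC
    "HinfI": ("GANTC", 1),    # G'ANTC
    "MseI": ("TTAA", 1),      # T'TAA
}


def cut_positions(seq, site, offset):
    """Return the 1-based position of the first base 3' of each cut."""
    # A lookahead pattern also reports overlapping occurrences of the site.
    pattern = "(?=" + site.replace("N", "[ACGT]") + ")"
    return [m.start() + offset + 1 for m in re.finditer(pattern, seq)]


def fragment_lengths(seq_len, cuts):
    """Fragment lengths for a linear molecule cut at the given positions."""
    bounds = [1] + sorted(cuts) + [seq_len + 1]
    return [b - a for a, b in zip(bounds, bounds[1:])]


for name, (site, offset) in ENZYMES.items():
    cuts = cut_positions(SEQ, site, offset)
    print(name, len(cuts), cuts, fragment_lengths(len(SEQ), cuts))
```

    For each enzyme the script prints the number of cuts, their positions, and the resulting fragment lengths, treating the clone as a linear molecule.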
    
    Restriction Map of Sequence
                                               
                                                   Sau3AI 
                                                   \               
          1   aaaaccatcaaatacgttatgctggaatgcaacgaactgatcccgctgttctacgaaacc     60
              ttttggtagtttatgcaatacgaccttacgttgcttgactagggcgacaagatgctttgg
                  ^    *    ^    *    ^    *    ^    *    ^    *    ^    *
              K  T  I  K  Y  V  M  L  E  C  N  E  L  I  P  L  F  Y  E  T
    
         61   tgcccggctggtgaaaacatctgctacgaaatgttcatggttgctaccccgaaagttccg    120
              acgggccgaccacttttgtagacgatgctttacaagtaccaacgatggggctttcaaggc
                  ^    *    ^    *    ^    *    ^    *    ^    *    ^    *
              C  P  A  G  E  N  I  C  Y  E  M  F  M  V  A  T  P  K  V  P
    
                                                            MseI
                                                      Sau3AI                 
                                            HinfI                              
                                            \         \     \
        121   tgcgaacgtggttgcatcgacgtttgcccggaatcttctctgatcgttaaatacgtttgc    180
              acgcttgcaccaacgtagctgcaaacgggccttagaagagactagcaatttatgcaaacg
                  ^    *    ^    *    ^    *    ^    *    ^    *    ^    *
              C  E  R  G  C  I  D  V  C  P  E  S  S  L  I  V  K  Y  V  C
    
                                               Sau3AI
                                               \                
        181   tgcaacaccgaccgttgccagtaatccagcgcctgatctctcgaaataaaagccgcatt     240
              acgttgtggctggcaacggtcattaggtcgcggactagagagctttattttcggcgtaa
                  ^    *    ^    *    ^    *    ^    *    ^    *    ^    *
              C  N  T  D  R  C  Q  Z  S  S  A  Z  S  L  E  I  K  A  A
    
    Ladder Map of Restriction Enzyme Cut Sites
    
    
                        20          40          60          80         100         120         140         160         180         200         220         240
                         :           :           :           :           :           :           :           :           :           :           :           :
        HinfI ------------------------------------------------------------------------------------------\-----------------------------------------------------
         MseI ----------------------------------------------------------------------------------------------------\-------------------------------------------
       Sau3AI ----------------------\-------------------------------------------------------------------------\-------------------------------\---------------
    

    (2) The corresponding polypeptide sequence (open reading frame at nucleotides 19-204, 61 residues):

    MLECNELIPLFYETCPAGENICYEMFMVATPKVPCERGCIDVCPESSLIVKYVCCNTDRCQ*
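
    The translation can be reproduced with a few lines of Python. This is a sketch under stated assumptions: it uses the standard genetic code, takes the first ATG in the sequence as the start codon, and reads in frame until the first stop codon. The helper name `translate_first_orf` is hypothetical.

```python
# cDNA sequence from the problem statement (240 nt).
SEQ = (
    "AAAACCATCAAATACGTTATGCTGGAATGCAACGAACTGATCCCGCTGTTCTACGAAACCT"
    "GCCCGGCTGGTGAAAACATCTGCTACGAAATGTTCATGGTTGCTACCCCGAAAGTTCCGTGC"
    "GAACGTGGTTGCATCGACGTTTGCCCGGAATCTTCTCTGATCGTTAAATACGTTTGCTGCAA"
    "CACCGACCGTTGCCAGTAATCCAGCGCCTGATCTCTCGAAATAAAAGCCGCATTG"
)

# Standard genetic code with codons ordered T, C, A, G at every position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_first_orf(seq):
    """Translate from the first ATG to the first in-frame stop codon.

    Returns (start, end, peptide) with 1-based, inclusive nucleotide
    coordinates (end includes the stop codon), or None if no complete
    open reading frame is found.
    """
    start = seq.find("ATG")
    if start < 0:
        return None
    residues = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            return start + 1, i + 3, "".join(residues)
        residues.append(aa)
    return None  # ran off the end without meeting a stop codon


print(translate_first_orf(SEQ))
```

    Taking the first ATG is a simplification that happens to suit this clone; a more careful analysis would inspect all six reading frames.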


    (3) It is a new toxin.

    No sequence in the non-redundant nucleotide database is 100% identical to this cDNA
    (the best hit, a Malayan spitting cobra cardiotoxin mRNA, shares only 75% identity), so David Liu must have found a new toxin.

    Database:  Non-redundant GenBank+EMBL+DDBJ+PDB sequences
               278,785 sequences; 413,172,879 total letters.
    
                                                                         Smallest
                                                                           Sum
                                                                  High  Probability
    Sequences producing High-scoring Segment Pairs:              Score  P(N)      N
    
    gb|L04640|NAJCRDTXN  Malayan spitting cobra cardiotoxin m...   510  2.9e-35   1
    gb|U58486|NAU58486   Naja atra cardiotoxin 3' mRNA, compl...   382  9.5e-23   1
    gb|U42585|NAU42585   Naja atra cardiotoxin III mRNA, comp...   364  4.0e-21   1
    
    gb|L04640|NAJCRDTXN Malayan spitting cobra cardiotoxin mRNA, complete cds.
                Length = 373 
    
      Plus Strand HSPs:
    
     Score = 510 (140.9 bits), Expect = 2.9e-35, P = 2.9e-35
     Identities = 138/183 (75%),Positives = 138/183 (75%), Strand = Plus / Plus
    
    Query:    22 CTGGAATGCAACGAACTGATCCCGCTGTTCTACGAAACCTGCCCGGCTGGTGAAAACATC 81
                 ||| | |||||| | ||| | || || |||||| |||| || ||||| ||  ||||| |
    Sbjct:    98 CTGAAGTGCAACAAGCTGGTGCCCCTTTTCTACAAAACTTGTCCGGCCGGCAAAAACCTT 157
    
    Query:    82 TGCTACGAAATGTTCATGGTTGCTACCCCGAAAGTTCCGTGCGAACGTGGTTGCATCGAC 141
                 |||||| |||||||||||||||| |||||||| |||||   | |||| || ||||| |||
    Sbjct:   158 TGCTACAAAATGTTCATGGTTGCCACCCCGAAGGTTCCTGTCAAACGCGGGTGCATTGAC 217
    
    Query:   142 GTTTGCCCGGAATCTTCTCTGATCGTTAAATACGTTTGCTGCAACACCGACCGTTGCCAG 201
                 || |||||  |    || ||  | || ||||| || ||||||||||||||| | ||| |
    Sbjct:   218 GTATGCCCCAAGAGCTCGCTCCTGGTGAAATATGTGTGCTGCAACACCGACAGGTGCAAC 277
    
    Query:   202 TAA 204
                 | |
    Sbjct:   278 TGA 280
    
    
    
    gb|U58486|NAU58486 Naja atra cardiotoxin 3' mRNA, complete cds
                Length = 472
    
      Plus Strand HSPs:
    
     Score = 382 (105.6 bits), Expect = 9.5e-23, P = 9.5e-23
     Identities = 122/179 (68%),Positives = 122/179 (68%), Strand = Plus / Plus
    
    Query:    26 AATGCAACGAACTGATCCCGCTGTTCTACGAAACCTGCCCGGCTGGTGAAAACATCTGCT 85
                 |||||||| ||||  | ||  | |||||  | || || || || ||  | ||| | ||||
    Sbjct:    68 AATGCAACAAACTCGTTCCTTTATTCTATAAGACTTGTCCAGCAGGGAAGAACTTATGCT 127
    
    Query:    86 ACGAAATGTTCATGGTTGCTACCCCGAAAGTTCCGTGCGAACGTGGTTGCATCGACGTTT 145
                 |  ||||||||||||| || || || || |||||   | || | || || ||||| ||||
    Sbjct:   128 ATAAAATGTTCATGGTGGCGACGCCAAAGGTTCCTGTCAAAAGGGGATGTATCGATGTTT 187
    
    Query:   146 GCCCGGAATCTTCTCTGATCGTTAAATACGTTTGCTGCAACACCGACCGTTGCCAGTAA 204
                 ||||  ||     |||  | || || || || || ||||| || ||| | ||| | | |
    Sbjct:   188 GCCCTAAAAGCAGTCTCCTAGTGAAGTATGTGTGTTGCAATACAGACAGATGCAACTGA 246
    
    
    
    gb|U42585|NAU42585 Naja atra cardiotoxin III mRNA, complete cds.
                Length = 474
    
      Plus Strand HSPs:
    
     Score = 364 (100.6 bits), Expect = 4.0e-21, P = 4.0e-21
     Identities = 120/179 (67%), Positives = 120/179 (67%), Strand = Plus / Plus
    
    Query:    26 AATGCAACGAACTGATCCCGCTGTTCTACGAAACCTGCCCGGCTGGTGAAAACATCTGCT 85
                 |||| ||| ||||  | ||  | |||||  | || || || || ||  | ||| | ||||
    Sbjct:    68 AATGTAACAAACTCGTTCCTTTATTCTATAAGACTTGTCCAGCAGGGAAGAACTTATGCT 127
    
    Query:    86 ACGAAATGTTCATGGTTGCTACCCCGAAAGTTCCGTGCGAACGTGGTTGCATCGACGTTT 145
                 |  ||||||||||||| || || || || |||||   | || | || || || || ||||
    Sbjct:   128 ATAAAATGTTCATGGTGGCGACGCCAAAGGTTCCTGTCAAAAGGGGATGTATTGATGTTT 187
    
    Query:   146 GCCCGGAATCTTCTCTGATCGTTAAATACGTTTGCTGCAACACCGACCGTTGCCAGTAA 204
                 ||||  ||     |||  | || || || || || ||||| || ||| | ||| | | |
    Sbjct:   188 GCCCTAAAAGCAGTCTCCTAGTGAAGTATGTGTGTTGCAATACAGACAGATGCAACTGA 246
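
    The identity figures in the BLAST report are simply matches divided by alignment length. The sketch below recomputes them from the reported counts and shows how the same number can be obtained from two aligned strings; `percent_identity` is an illustrative helper and assumes an ungapped alignment of equal-length strings, as in the report above.

```python
def percent_identity(query, subject):
    """Return (matches, length, percent) for two aligned, equal-length strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(q == s for q, s in zip(query, subject))
    return matches, len(query), 100.0 * matches / len(query)


# Identity counts as reported for the three hits (matches, alignment length).
REPORTED = {
    "L04640": (138, 183),
    "U58486": (122, 179),
    "U42585": (120, 179),
}

for accession, (matches, length) in REPORTED.items():
    print(accession, f"{100.0 * matches / length:.1f}%")

# Last alignment row of the top hit: query TAA against subject TGA.
print(percent_identity("TAA", "TGA"))
```

    None of the three hits comes close to 100% identity, which is the basis for calling the clone a new toxin.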
    

    (4) Structure:
         Number of H-Bonds --- 21
         Number of Helices --- 0
         Number of Strands --- 5
         Number of Turns ----- 7
         Script:
              select all
              wireframe off
              cartoons on
              color group
         Color key:
              beta sheets            : yellow    [255,255,0]
              turns                  : pale blue [96,128,255]
              all other residues     : white
              hbonds forming sheets  : yellow
              hbonds forming turns   : magenta
    
    
         Script:
              select all
              wireframe off
              ribbon on
              color structure
              set specpower 20
              set specular on
              hbond on
              color hbond type
         Color key for the RasMol amino color scheme:
              ASP, GLU        bright red   [230,10,10]
              LYS, ARG        blue         [20,90,255]
              CYS, MET        yellow       [230,230,0]
              SER, THR        orange       [250,150,0]
              PHE, TYR        mid blue     [50,50,170]
              ASN, GLN        cyan         [0,220,220]
              GLY             light grey   [235,235,235]
              LEU, VAL, ILE   green        [15,130,15]
              ALA             dark grey    [200,200,200]
              TRP             pink         [180,90,180]
              HIS             pale blue    [130,130,210]
              PRO             flesh        [220,150,130]
    
         Script:
              (Display Sticks)
              color amino
              (Turn the molecule)