Bioinformatics Homework 5

  • David Liu, a student in the Department of Life Sciences, was bitten by a snake in the backyard of the Life Science Building. He was so angry that he killed the snake and purified several toxins from its venom. He sequenced one of the toxins and obtained this sequence:

    1 L K C N K L V P L F Y K T C P A G K N L C Y K M F M V A T P

    31 K V P V K R G C I D V C P K S S L L V K Y V C C N T D R C N

    (1) Is this a new toxin? Please help him to identify this toxin.

    (2) Can you find proteins that share sequence homology with this toxin?

    Show them in multiple alignment form.

    (3) Please predict its secondary structure.

    (4) Please show its charge distribution.


    Solution:

    (1) sp|Q02454|CX1_NAJSP CARDIOTOXIN PRECURSOR. pir||A44335 cardiotoxin -
        Naja naja gi|213375 (L04640) cardiotoxin [Naja naja]
        Length = 81

    Score = 340 (163.9 bits), Expect = 1.0e-44, P = 1.0e-44
    Identities = 60/60 (100%), Positives = 60/60 (100%)

    Query:  1 LKCNKLVPLFYKTCPAGKNLCYKMFMVATPKVPVKRGCIDVCPKSSLLVKYVCCNTDRCN 60
              LKCNKLVPLFYKTCPAGKNLCYKMFMVATPKVPVKRGCIDVCPKSSLLVKYVCCNTDRCN
    Sbjct: 22 LKCNKLVPLFYKTCPAGKNLCYKMFMVATPKVPVKRGCIDVCPKSSLLVKYVCCNTDRCN 81
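
    The BLAST report aligns query residues 1-60 with subject residues 22-81 of the precursor. As a quick sanity check, the identity figure can be recomputed by hand. The sketch below is not part of the BLAST output; it assumes the 81-residue precursor sequence is the one printed in the GARNIER plot of part (3), and the function name percent_identity is only illustrative.

```python
# Query: the 60-residue toxin sequence from the problem statement.
query = (
    "LKCNKLVPLFYKTCPAGKNLCYKMFMVATP"
    "KVPVKRGCIDVCPKSSLLVKYVCCNTDRCN"
)

# Subject: the 81-residue precursor, copied from the GARNIER plot in part (3).
precursor = (
    "MKTLLLTTVVVTIVCLDLEYTLKCNKLVPLFYKTCPAGKNLCYKMFMVATPKVPVKRGCI"
    "DVCPKSSLLVKYVCCNTDRCN"
)


def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two equal-length, ungapped strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Residues 22..81 (1-based, inclusive) correspond to the slice [21:81].
subject = precursor[21:81]
identity = percent_identity(query, subject)
print(f"Identities = {round(identity * len(query) / 100)}/{len(query)} ({identity:.0f}%)")
```

    The first 21 residues of the precursor are not covered by the query, which is why the subject numbering starts at 22.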


    (2)

                                    Version 1.4
    ----------------------------------------------------------------------------
    
                            Multi Sequence Align Results
    
    ----------------------------------------------------------------------------
                               CARDIOTOXIN PRECURSOR.
                        CARDIOTOXIN III (NMR, 13 STRUCTURES)
                 CARDIOTOXIN II (NMR, MINIMIZED AVERAGE STRUCTURE)
                       CARDIOTOXIN CTX I (NMR, 11 STRUCTURES)
                CARDIOTOXIN GAMMA (NMR, MINIMIZED AVERAGE STRUCTURE)
                      CARDIOTOXIN CTX IIB (NMR, 20 STRUCTURES)
                       CARDIOTOXIN V=4===/II$== (TOXIN /III$)
                         CARDIOTOXIN V (NMR, 2 STRUCTURES)
    ----------------------------------------------------------------------------
    


    (3)

    GARNIER Results

    Predict secondary structure of protein sequences using the method of Garnier, Osguthorpe, and Robson, J. Mol. Biol. (1978) 120:97-120.

    CARDIOTOXIN PRECURSOR.

    Garnier plot of CARDIOTOXIN PRECURSOR.
     81 aa; DCH = 0, DCS = 0
     
               .   10    .   20    .   30    .   40    .   50    .   60
           MKTLLLTTVVVTIVCLDLEYTLKCNKLVPLFYKTCPAGKNLCYKMFMVATPKVPVKRGCI
     helix HHHHHH          HHHHHHH                HHH     HHHH         
     sheet       EEEEEEEEEE         EEEEEEEEE        EEEEE      E E  EE
     turns                        TT          TTTT            T  T TT  
     coil                                    C                 C       
    
               .   70    .   80
           DVCPKSSLLVKYVCCNTDRCN
     helix                      
     sheet EEE    EEEEEEEE      
     turns    TTTT        TTTTTT
     coil                       
    
     Residue totals: H: 20   E: 39   T: 20   C:  2
            percent: H: 30.8 E: 60.0 T: 30.8 C:  3.1
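
    The four state rows of the plot are easier to compare with other predictions when collapsed into one state letter per residue. The following is a minimal sketch for the second block of the plot (residues 61-81) only; the segment boundaries are read off the sheet and turns rows above, and the helper name states_from_segments is hypothetical.

```python
from collections import Counter

# Residues 61-81 of the precursor (second block of the Garnier plot).
sequence = "DVCPKSSLLVKYVCCNTDRCN"
first_residue = 61

# (state, start, end) with 1-based inclusive residue numbers,
# read off the "sheet" and "turns" rows of the plot.
segments = [("E", 61, 63), ("T", 64, 67), ("E", 68, 75), ("T", 76, 81)]


def states_from_segments(segments, first, length):
    """Build a one-letter-per-residue state string; '-' marks unassigned positions."""
    states = ["-"] * length
    for state, start, end in segments:
        for pos in range(start, end + 1):
            states[pos - first] = state
    return "".join(states)


states = states_from_segments(segments, first_residue, len(sequence))
counts = Counter(states)

print(sequence)
print(states)
print(dict(counts))
```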
    

    (4) CHARGE DISTRIBUTIONAL ANALYSIS

           1  0+00000000 000000-0-0 00+00+0000 00+00000+0 000+000000 0+000++000 
          61  -000+00000 +000000-+0 0
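
    The charge string above can be reproduced directly from the sequence. The sketch below assumes the classification implied by the scoring tables in section B: K and R are positive, D and E are negative, and every other residue (including H) is treated as uncharged. The function names are illustrative and not part of the analysis program.

```python
# The 81-residue precursor sequence, as printed in the GARNIER plot.
precursor = (
    "MKTLLLTTVVVTIVCLDLEYTLKCNKLVPLFYKTCPAGKNLCYKMFMVATPKVPVKRGCI"
    "DVCPKSSLLVKYVCCNTDRCN"
)

POSITIVE = set("KR")  # assumed positive residues
NEGATIVE = set("DE")  # assumed negative residues


def charge_string(seq: str) -> str:
    """Map each residue to '+', '-' or '0'."""
    return "".join(
        "+" if aa in POSITIVE else "-" if aa in NEGATIVE else "0" for aa in seq
    )


def format_blocks(s: str, block: int = 10, per_line: int = 60) -> str:
    """Lay the string out in blocks of 10, 60 per line, with the start position."""
    lines = []
    for start in range(0, len(s), per_line):
        chunk = s[start:start + per_line]
        blocks = " ".join(chunk[i:i + block] for i in range(0, len(chunk), block))
        lines.append(f"{start + 1:>4}  {blocks}")
    return "\n".join(lines)


charges = charge_string(precursor)
print(format_blocks(charges))
print("positive:", charges.count("+"), "negative:", charges.count("-"))
```

    With 12 positive and 4 negative residues out of 81, the precursor carries a net positive charge under this simple classification.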
    
    A. CHARGE CLUSTERS.
    
    
    Positive charge clusters (cmin = 12/30 or 16/45 or 19/60):  none
    
    
    Negative charge clusters:  not evaluated (frequency of - < 5%, too low)
    
    
    Mixed charge clusters (cmin = 14/30 or 19/45 or 24/60):  none
    
    
    B. HIGH SCORING (UN)CHARGED SEGMENTS.
    
    
    ______________________________________
    High scoring positive charge segments:
    
    score=   2.00 frequency=   0.148  ( KR )
    score=   0.00 frequency=   0.000  ( BZX )
    score=  -1.00 frequency=   0.802  ( LAGSVTIPNFQYHMCW )
    score=  -2.00 frequency=   0.049  ( ED )
    
     Expected score/letter:  -0.605
     - now scoring for positive charge segments;    Average information/letter:   0.727
     Minimal length of displayed segments set to:  20
    
    M_0.01= 11.01  (cv=  6.42, lambda=  0.68489, k=  0.23341, x=  4.59;
                    90% confidence interval for segment length:  15 +-  15)
    M_0.05=  8.63  (x=  2.21)
    
    # of segments (>=20 residues) exceeding M_0.05: none
    
    
    ______________________________________
    High scoring negative charge segments:
    
    score=   2.00 frequency=   0.049  ( ED )
    score=   0.00 frequency=   0.000  ( BZX )
    score=  -1.00 frequency=   0.802  ( LAGSVTIPNFQYHMCW )
    score=  -2.00 frequency=   0.148  ( KR )
    
     Expected score/letter: -100.00
     - now scoring for negative charge segments;    Average information/letter:   2.722
     Minimal length of displayed segments set to:  20
    
    M_0.01=  5.91  (cv=  3.17, lambda=  1.38629, k=  0.44580, x=  2.74;
                    90% confidence interval for segment length:   4 +-   4)
    M_0.05=  4.73  (x=  1.56)
    
    # of segments (>=20 residues) exceeding M_0.05: none
    
    
    ___________________________________
    High scoring mixed charge segments:
    
    score=   1.00 frequency=   0.198  ( KEDR )
    score=   0.00 frequency=   0.000  ( BZX )
    score=  -1.00 frequency=   0.802  ( LAGSVTIPNFQYHMCW )
    
     Expected score/letter:  -0.605
     - now scoring for mixed charge segments;    Average information/letter:   1.223
     Minimal length of displayed segments set to:  20
    
    M_0.01=  5.86  (cv=  3.13, lambda=  1.40180, k=  0.45603, x=  2.72;
                    90% confidence interval for segment length:  10 +-   8)
    M_0.05=  4.69  (x=  1.56)
    
    # of segments (>=20 residues) exceeding M_0.05: none
    
    
    ________________________________
    High scoring uncharged segments:
    
    score=   1.00 frequency=   0.802  ( LAGSVTIPNFQYHMCW )
    score=   0.00 frequency=   0.000  ( BZX )
    score=  -8.00 frequency=   0.198  ( KEDR )
    
     Expected score/letter:  -0.778
     - now scoring for uncharged segments;    Average information/letter:   0.128
     Minimal length of displayed segments set to:  20
    
    M_0.01= 38.95  (cv= 26.39, lambda=  0.16653, k=  0.08143, x= 12.56;
                    90% confidence interval for segment length:  73 +-  63)
    M_0.05= 29.16  (x=  2.78)
    
    # of segments (>=20 residues) exceeding M_0.05: none
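
    The "Expected score/letter" values in this section follow from the score tables: each is the sum of score times frequency over the residue classes. The sketch below recomputes them from the precursor sequence. The dictionary layout and function names are assumptions made for illustration, and the ambiguity codes B, Z and X (score 0 in the tables) are not handled because they do not occur in this sequence.

```python
from collections import Counter

precursor = (
    "MKTLLLTTVVVTIVCLDLEYTLKCNKLVPLFYKTCPAGKNLCYKMFMVATPKVPVKRGCI"
    "DVCPKSSLLVKYVCCNTDRCN"
)

# Score per residue class for each scheme, taken from the tables above.
SCHEMES = {
    "positive":  {"KR": 2,  "ED": -2, "other": -1},
    "negative":  {"ED": 2,  "KR": -2, "other": -1},
    "mixed":     {"KR": 1,  "ED": 1,  "other": -1},
    "uncharged": {"KR": -8, "ED": -8, "other": 1},
}


def residue_class(aa: str) -> str:
    """Classify a residue as 'KR', 'ED' or 'other'."""
    if aa in "KR":
        return "KR"
    if aa in "ED":
        return "ED"
    return "other"


def expected_score(seq: str, scheme: dict) -> float:
    """Mean score per letter: sum of score * count, divided by sequence length."""
    counts = Counter(residue_class(aa) for aa in seq)
    return sum(scheme[cls] * n for cls, n in counts.items()) / len(seq)


expected = {name: expected_score(precursor, s) for name, s in SCHEMES.items()}
for name, value in expected.items():
    print(f"{name:>9}: {value:.3f}")
```

    The positive, mixed and uncharged schemes reproduce the printed values (-0.605, -0.605 and -0.778). For the negative scheme the same formula gives -1.000, so the -100.00 printed above may be a formatting artifact of the program rather than the actual expectation.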
    
    
    C. CHARGE RUNS AND PATTERNS.
    
    pattern  (+)|  (-)|  (*)|  (0)| (+0)| (-0)| (*0)|(+00)|(-00)|(*00)|
    lmin0     5 |   3 |   5 |  34 |   9 |   6 |  10 |  11 |   7 |  12 | 
    lmin1     6 |   4 |   7 |  41 |  11 |   8 |  12 |  13 |   9 |  15 | 
    lmin2     7 |   5 |   8 |  45 |  12 |   9 |  14 |  15 |  10 |  16 | 
    
    There are no charge runs or patterns exceeding the given minimal lengths.
    
    Run count statistics:
    
      +  runs >=   3:   0
      -  runs >=   3:   0
      *  runs >=   4:   0
      0  runs >=  22:   0
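
    The run count statistics can be checked against the charge string from the start of part (4). This is a minimal sketch that assumes "*" means any charged residue (either sign) and that a run is a maximal stretch of consecutive residues of the given type; the thresholds are the ones printed above, and the helper name run_lengths is hypothetical.

```python
from itertools import groupby

# Charge string of the 81-residue precursor, copied from part (4).
charges = (
    "0+00000000" "000000-0-0" "00+00+0000" "00+00000+0" "000+000000"
    "0+000++000" "-000+00000" "+000000-+0" "0"
)


def run_lengths(s: str, symbols: str) -> list:
    """Lengths of all maximal runs made up only of characters in `symbols`."""
    return [
        sum(1 for _ in group)
        for is_member, group in groupby(s, key=lambda c: c in symbols)
        if is_member
    ]


# (label, symbols that belong to the run, minimal run length from the report)
checks = [("+", "+", 3), ("-", "-", 3), ("*", "+-", 4), ("0", "0", 22)]

results = {}
for label, symbols, min_len in checks:
    lengths = run_lengths(charges, symbols)
    results[label] = (max(lengths), sum(n >= min_len for n in lengths))
    print(f"{label}  runs >= {min_len:>3}: {results[label][1]}  (longest: {results[label][0]})")
```

    The longest uncharged run is 14 residues, well short of the threshold of 22, which is consistent with the report finding no significant runs.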