Homework 5 (due on 12/17)

Sequence comparison / Homology search

Question 1: Shirley Lee, a student in the Department of Life Sciences, was bitten by a snake in the backyard of the Life Science building. She was so angry that she killed the snake and purified several toxins from its venom. She sequenced one of the toxins and obtained this sequence:

1 L K C N K L V P L F Y K T C P A G K N I C Y K M F M V A T P
31 K L P V K R G C I D V C P K S S L L V R Y V C C N T D K C N

(1) Is this a new toxin? Please help her identify this toxin. (2) Can you find proteins that share sequence homology with this toxin? Show them as a multiple alignment. (3) Please predict its secondary structure. (4) Please show its charge distribution.

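Ans: searched with the Biology Workbench protein tools.
(1) First I added the protein sequence, then compared it against the databank with BLASTP. Three toxins differ from this sequence at only four positions, with 93% identity: 1. cardiotoxin - Naja naja, 2. cytotoxin 3 - Chinese cobra, 3. cytotoxin 10 - monocled cobra.
Because no database sequence is completely identical to it, this is probably a new toxin.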
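A rough check of where the 93% comes from: over an ungapped 60-residue comparison, four mismatches leave 56 identical positions, and 56/60 is about 93.3%. The Python sketch below assumes equal-length, gap-free sequences (BLASTP actually reports identity over its own local alignment), and the variant is a made-up sequence with four substitutions, not one of the three real hits.

```python
# The query toxin sequence from Question 1 (60 residues).
QUERY = "LKCNKLVPLFYKTCPAGKNICYKMFMVATPKLPVKRGCIDVCPKSSLLVRYVCCNTDKCN"


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions in an ungapped, equal-length comparison."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length, gap-free sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def substitute(seq: str, changes: dict) -> str:
    """Return seq with point substitutions applied ({0-based position: new residue})."""
    residues = list(seq)
    for pos, aa in changes.items():
        residues[pos] = aa
    return "".join(residues)


# Hypothetical homolog with four substitutions (NOT a real database entry).
variant = substitute(QUERY, {0: "M", 9: "A", 26: "I", 58: "Q"})
print(f"{percent_identity(QUERY, variant):.1f}%")  # prints 93.3%
```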

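(2) More proteins that share sequence homology with this toxin:
  (a) The BLASTP search in (1) returned dozens of proteins. I chose eight of them and compared their similarity with MSA (Multiple Sequence Alignment).
  (b) Then I used MSASHADE (Color-coded Plots of Pre-Aligned Sequences) to display the alignment result in different colors.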
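The idea behind a color-coded alignment plot can be sketched as follows: for each alignment column, measure what fraction of the sequences share the most common residue, then map that fraction to a shade. The Python sketch below uses made-up aligned fragments (not the eight proteins chosen above) and a simple two-threshold scheme ('#' fully identical, '+' majority, '.' otherwise); MSASHADE's own coloring rules may differ.

```python
from collections import Counter

# Hypothetical pre-aligned fragments (NOT real database hits); '-' marks a gap.
ALIGNED = [
    "LKCNKLVPLF",
    "LKCNKLIPLA",
    "LKCHKLVPLF",
    "MKCNKLIP-F",
]


def column_conservation(aligned):
    """For each column, the fraction of sequences carrying its most common non-gap residue."""
    n_seqs = len(aligned)
    fractions = []
    for column in zip(*aligned):
        residues = [aa for aa in column if aa != "-"]
        if not residues:
            fractions.append(0.0)
            continue
        _, top_count = Counter(residues).most_common(1)[0]
        fractions.append(top_count / n_seqs)
    return fractions


def shade(fractions, high=1.0, mid=0.5):
    """One character per column: '#' identical, '+' above mid, '.' otherwise."""
    track = []
    for f in fractions:
        if f >= high:
            track.append("#")
        elif f > mid:
            track.append("+")
        else:
            track.append(".")
    return "".join(track)


for row in ALIGNED:
    print(row)
print(shade(column_conservation(ALIGNED)))
```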

(3) I predicted its secondary structure with CHOFAS (Predict secondary protein structure, Chou-Fasman plot). The results are as follows:

60 aa


                .         .         .         .         .         .
       LKCNKLVPLFYKTCPAGKNICYKMFMVATPKLPVKRGCIDVCPKSSLLVRYVCCNTDKCN
 helix <--------->        <------------->       <-------->         
 sheet     EEEEEEEEE      EEEEEEEEE         EEEE     EEEEEEEEE     
 turns                  T           T     T       TT               
       

 Residue totals: H: 36   E: 31   T:  5
        percent: H: 60.0 E: 51.7 T:  8.3
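The residue totals and percentages follow directly from the prediction tracks: count the marked positions in each row and divide by the 60 residues. The Python sketch below uses the tracks transcribed from the output above (row labels dropped; only the marks matter for the counts). The three percentages add up to more than 100% because the predicted helix, sheet and turn regions overlap (36 + 31 + 5 = 72 marks for 60 residues).

```python
# Prediction tracks transcribed from the CHOFAS output above (row labels removed).
HELIX = "<--------->        <------------->       <-------->"
SHEET = "    EEEEEEEEE      EEEEEEEEE         EEEE     EEEEEEEEE"
TURNS = "                 T           T     T       TT"
SEQ_LEN = 60  # length of the toxin sequence


def residue_total(track: str) -> int:
    """Residues assigned to a state = non-blank characters in its track."""
    return sum(1 for ch in track if ch != " ")


for label, track in (("H", HELIX), ("E", SHEET), ("T", TURNS)):
    total = residue_total(track)
    print(f"{label}: {total:3d}  {100 * total / SEQ_LEN:5.1f}%")
```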

(4) Charge distribution, computed with SAPS (Statistical Analysis of Protein Sequences). The results are as follows:
LKCNKLVPLF YKTCPAGKNI CYKMFMVATP KLPVKRGCID VCPKSSLLVR YVCCNTDKCN
0+00+00000 0+00000+00 00+0000000 +000++000- 000+00000+ 000000-+00
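The charge line is a per-residue lookup printed in blocks of ten. A minimal Python sketch, assuming K/R count as '+', D/E as '-', and everything else (including H, which does not occur in this toxin) as '0':

```python
SEQ = "LKCNKLVPLFYKTCPAGKNICYKMFMVATPKLPVKRGCIDVCPKSSLLVRYVCCNTDKCN"


def charge_symbol(aa: str) -> str:
    """'+' for K/R, '-' for D/E, '0' for all other residues (H treated as uncharged here)."""
    if aa in "KR":
        return "+"
    if aa in "DE":
        return "-"
    return "0"


def charge_pattern(seq: str, block: int = 10) -> str:
    """Charge symbols for seq, grouped into space-separated blocks like the SAPS output."""
    symbols = "".join(charge_symbol(aa) for aa in seq)
    return " ".join(symbols[i:i + block] for i in range(0, len(symbols), block))


print(charge_pattern(SEQ))
```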

A. CHARGE CLUSTERS.
Positive charge clusters (cmin = 13/30 or 18/45 or 22/60):  none
Negative charge clusters:  not evaluated (frequency of - < 5%, too low)
Mixed charge clusters (cmin = 15/30 or 20/45 or 25/60):  none
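A cmin value such as 13/30 presumably reads as "at least 13 charged residues of that kind within 30 consecutive residues". The Python sketch below is a simplified sliding-window check against the thresholds printed above, not SAPS's own procedure. With only 11 K/R residues in the whole 60-residue toxin (13 charged residues in total), no window can reach any threshold, which agrees with the "none" results.

```python
SEQ = "LKCNKLVPLFYKTCPAGKNICYKMFMVATPKLPVKRGCIDVCPKSSLLVRYVCCNTDKCN"


def max_in_window(seq: str, residues: str, width: int) -> int:
    """Largest number of `residues` found in any window of `width` consecutive positions."""
    width = min(width, len(seq))
    flags = [aa in residues for aa in seq]
    count = sum(flags[:width])
    best = count
    for i in range(width, len(seq)):
        # Slide the window one position: add the new residue, drop the old one.
        count += flags[i] - flags[i - width]
        best = max(best, count)
    return best


# (window width, minimum count) pairs as printed in the SAPS report.
POSITIVE_CMIN = [(30, 13), (45, 18), (60, 22)]
MIXED_CMIN = [(30, 15), (45, 20), (60, 25)]

for name, residues, thresholds in (("positive", "KR", POSITIVE_CMIN),
                                   ("mixed", "KRDE", MIXED_CMIN)):
    hits = [(w, c) for w, c in thresholds if max_in_window(SEQ, residues, w) >= c]
    print(f"{name} clusters: {hits or 'none'}")
```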

B. HIGH SCORING (UN)CHARGED SEGMENTS.

a.High scoring positive charge segments:

   2.00 (KR)   1.00 (H)   0.00 (BZX)  -1.00 (LAGSVTIPNFQYMCW)  -2.00 (ED)

 Expected score/letter:  -0.483
 - now scoring for positive charge segments;    Average information/letter:   0.430
 Minimal length of displayed segments set to:  20

M_0.01= 13.07  (cv=  7.77, lambda=  0.52686, k=  0.16371, x=  5.30;
                90% confidence interval for segment length:  23 +-  25)
M_0.05=  9.97  (x=  2.20)

# of segments (>=20 residues) exceeding M_0.05: none

b.High scoring negative charge segments:

score=   2.00 frequency=   0.033  ( ED )
score=   0.00 frequency=   0.000  ( BZX )
score=  -1.00 frequency=   0.783  ( LAGSVTIPNFQYHMCW )
score=  -2.00 frequency=   0.183  ( KR )

 Expected score/letter:  -1.083
 - now scoring for negative charge segments;    Average information/letter:   3.490
 Minimal length of displayed segments set to:  20

M_0.01=  4.94  (cv=  2.54, lambda=  1.61122, k=  0.47671, x=  2.40;
                90% confidence interval for segment length:   3 +-   3)
M_0.05=  3.92  (x=  1.38)

# of segments (>=20 residues) exceeding M_0.05: none

c.High scoring mixed charge segments:

score=   1.00 frequency=   0.217  ( KEDR )
score=   0.00 frequency=   0.000  ( HBZX )
score=  -1.00 frequency=   0.783  ( LAGSVTIPNFQYMCW )

 Expected score/letter:  -0.567
 - now scoring for mixed charge segments;    Average information/letter:   1.051
 Minimal length of displayed segments set to:  20

M_0.01=  6.07  (cv=  3.19, lambda=  1.28520, k=  0.40993, x=  2.89;
                90% confidence interval for segment length:  11 +-   9)
M_0.05=  4.80  (x=  1.62)

# of segments (>=20 residues) exceeding M_0.05: none

d.High scoring uncharged segments:

score=   1.00 frequency=   0.783  ( LAGSVTIPNFQYMCW )
score=   0.00 frequency=   0.000  ( BZX )
score=  -2.00 frequency=   0.000  ( H )
score=  -8.00 frequency=   0.217  ( KEDR )

 Expected score/letter:  -0.950
 - now scoring for uncharged segments;    Average information/letter:   0.173
 Minimal length of displayed segments set to:  20

M_0.01= 32.53  (cv= 20.56, lambda=  0.19916, k=  0.10900, x= 11.97;
                90% confidence interval for segment length:  54 +-  44)
M_0.05= 24.34  (x=  3.79)

# of segments (>=20 residues) exceeding M_0.05: none.
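Each "Expected score/letter" is the composition-weighted average of the scoring scheme, i.e. the sum over residue groups of score × frequency. With this toxin's composition (11 K/R, 2 D, no H, 47 other residues out of 60), the positive scheme gives (2*11 - 47 - 2*2)/60 = -29/60 ≈ -0.483. The Python sketch below recomputes all four values from the score tables printed above, assuming the frequencies are those of this sequence (they match the printed 0.033, 0.783, 0.183 and 0.217).

```python
SEQ = "LKCNKLVPLFYKTCPAGKNICYKMFMVATPKLPVKRGCIDVCPKSSLLVRYVCCNTDKCN"

# Score tables copied from the SAPS report: residue group -> score.
SCHEMES = {
    "positive": {"KR": 2.0, "H": 1.0, "BZX": 0.0, "LAGSVTIPNFQYMCW": -1.0, "ED": -2.0},
    "negative": {"ED": 2.0, "BZX": 0.0, "LAGSVTIPNFQYHMCW": -1.0, "KR": -2.0},
    "mixed": {"KEDR": 1.0, "HBZX": 0.0, "LAGSVTIPNFQYMCW": -1.0},
    "uncharged": {"LAGSVTIPNFQYMCW": 1.0, "BZX": 0.0, "H": -2.0, "KEDR": -8.0},
}


def expected_score_per_letter(seq: str, scheme: dict) -> float:
    """Average score per residue, i.e. sum over groups of score * frequency in seq."""
    score_of = {aa: score for group, score in scheme.items() for aa in group}
    return sum(score_of[aa] for aa in seq) / len(seq)


for name, scheme in SCHEMES.items():
    print(f"{name:9s} {expected_score_per_letter(SEQ, scheme):7.3f}")
```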

C. CHARGE RUNS AND PATTERNS.

pattern  (+)|  (-)|  (*)|  (0)| (+0)| (-0)| (*0)|(+00)|(-00)|(*00)|
lmin0     5 |   3 |   6 |  29 |  10 |   6 |  10 |  11 |   6 |  12 | 
lmin1     6 |   4 |   7 |  36 |  12 |   7 |  12 |  14 |   8 |  14 | 
lmin2     7 |   4 |   8 |  39 |  13 |   8 |  14 |  15 |   9 |  16 | 

There are no charge runs or patterns exceeding the given minimal lengths.

Run count statistics:

  +  runs >=   3:   0
  -  runs >=   3:   0
  *  runs >=   4:   0
  0  runs >=  20:   0
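The run-count statistics can be checked by scanning the charge line for stretches of the same class. A minimal Python sketch using itertools.groupby on the charge pattern shown at the start of part (4), where "*" means any charged residue (+ or -): the longest runs are 2 for "+", 1 for "-", 2 for "*" and 7 for "0", so none reaches the 3/3/4/20 limits.

```python
from itertools import groupby

# Charge line from the SAPS output above, with the block spaces removed.
CHARGES = "0+00+00000 0+00000+00 00+0000000 +000++000- 000+00000+ 000000-+00".replace(" ", "")

# Symbols belonging to each run class; '*' means any charged residue.
CLASSES = {"+": "+", "-": "-", "*": "+-", "0": "0"}
MIN_LENGTHS = {"+": 3, "-": 3, "*": 4, "0": 20}  # limits from the SAPS report


def run_lengths(pattern: str, symbols: str) -> list:
    """Lengths of maximal stretches whose characters all belong to `symbols`."""
    return [sum(1 for _ in group)
            for member, group in groupby(pattern, key=lambda ch: ch in symbols)
            if member]


for cls, symbols in CLASSES.items():
    lengths = run_lengths(CHARGES, symbols)
    n_long = sum(1 for n in lengths if n >= MIN_LENGTHS[cls])
    print(f"{cls} runs >= {MIN_LENGTHS[cls]:2d}: {n_long}   (longest: {max(lengths)})")
```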