HOMEWORK #8
Sequence analysis / model building
- due on 5/28/98
TGGATGCCATGTTCCGGAGGTAATATGAAGAAATCAATATTATTTATTTTTCTTT
CTGTATTGTCTTTTTCACCTTTCGCTCAGGATGCTAAACCAGTAGAGTCTTCAAA
AGAAAAAATCACACTAGAATCAAAAAAATGTAACATTGCAAAAAAAAGTAATA
AAAGTGGTCCTGAAAGCATGAATAGTAGCAATTACTGCTGTGAATTGTGTTGTA
ATCCTGCTTGTACCGGGTGCTATTAATAATATAAAGGGAACTAAACAGTTCCCT
TTATATTTGTTCTGATTCTGATGATGTCTGTAACGTATGTCCTGTTGCTTTGTTG
AATAAATCGA
Unfortunately, he doesn't know how to use the sequence analysis tools available on the Internet, since he has not taken a bioinformatics course. Could you help him with the following analysis?
Answer :
SEQUENCE : 72 AA; 7909 MW;
MKKSILFIFL SVLSFSPFAQ DAKPVESSKE KITLESKKCN IAKKSNKSGP ESMNSSNYCC ELCCNPACTG CY
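The translation above can be checked with a few lines of code. The following is a minimal sketch, assuming the standard genetic code and a start offset supplied by the caller; the function name `translate` and the table construction are illustrative and are not the tool that produced the answer.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna, start=0):
    """Translate dna from offset start, stopping at the first stop codon."""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)

# First 39 bases of the coding region, copied from the sequence above.
cds_start = "ATGAAGAAATCAATATTATTTATTTTTCTTTCTGTATTG"
print(translate(cds_start))  # MKKSILFIFLSVL
```

The start offset is passed in explicitly because the sketch does not try to decide which ATG is the real start codon.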
gb|M29255|ECOTOXHS
E.coli heat-stable toxin (st) gene, complete cds. Length = 336
*Plus Strand HSPs:
Score = 1380 (381.3 bits), Expect = 1.0e-104, P = 1.0e-104
Identities = 286/336 (85%), Positives = 286/336 (85%), Strand = Plus / Plus
*Minus Strand HSPs:
Score = 134 (37.0 bits), Expect = 0.46, P = 0.37
Identities = 30/34 (88%), Positives = 30/34 (88%), Strand = Minus / Plus
*It's a new protein
Total number of negatively charged residues (Asp + Glu): 6
Total number of positively charged residues (Arg + Lys): 10
D(Asp), E(Glu)
K(Lys), R(Arg)
MKKSILFIFL SVLSFSPFAQ DAKPVESSKE KITLESKKCN IAKKSNKSGP ESMNSSNYCC ELCCNPACTG CY
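The charged-residue totals can be verified by counting letters in the sequence. This is a small sketch; the helper name `count_residues` is made up for illustration.

```python
# Protein sequence from the answer, with the spaces between blocks removed.
protein = ("MKKSILFIFLSVLSFSPFAQDAKPVESSKEKITLESKKCN"
           "IAKKSNKSGPESMNSSNYCCELCCNPACTGCY")

def count_residues(seq, residues):
    """Count how many positions in seq hold any of the given one-letter codes."""
    return sum(seq.count(r) for r in residues)

negative = count_residues(protein, "DE")  # Asp + Glu
positive = count_residues(protein, "RK")  # Arg + Lys
print(len(protein), negative, positive)   # 72 6 10
```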
Using the scale Hphob. / Eisenberg et al., the individual values for the 20 amino acids are:
Ala: 0.620 , Arg: -2.530 , Asn: -0.780 , Asp: -0.900
Cys: 0.290 , Gln: -0.850 , Glu: -0.740 , Gly: 0.480
His: -0.400 , Ile: 1.380 , Leu: 1.060 , Lys: -1.500
Met: 0.640 , Phe: 1.190 , Pro: 0.120 , Ser: -0.180
Thr: -0.050 , Trp: 0.810 , Tyr: 0.260 , Val: 1.080
Asx: -0.840 , Glx: -0.795 , Xaa: -0.000
Weights for window positions 1,..,7, using linear weight variation model:
  1   |  2   |  3   |   4    |  5   |  6   |  7   |
 0.40 | 0.60 | 0.80 |  1.00  | 0.80 | 0.60 | 0.40 |
 edge |      |      | center |      |      | edge |
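The scale values and window weights above are enough to sketch how a hydrophobicity profile is computed. The code below is an illustration only: the function names are invented, and dividing each window score by the sum of the weights is an assumption, so the absolute values may differ from what the web tool reports.

```python
# Eisenberg hydrophobicity values, copied from the table above.
EISENBERG = {
    "A": 0.620, "R": -2.530, "N": -0.780, "D": -0.900,
    "C": 0.290, "Q": -0.850, "E": -0.740, "G": 0.480,
    "H": -0.400, "I": 1.380, "L": 1.060, "K": -1.500,
    "M": 0.640, "F": 1.190, "P": 0.120, "S": -0.180,
    "T": -0.050, "W": 0.810, "Y": 0.260, "V": 1.080,
}

def linear_weights(window=7, edge=0.4):
    """Weights rising linearly from `edge` at both ends to 1.0 at the center."""
    center = window // 2
    return [edge + (1.0 - edge) * (1 - abs(i - center) / center)
            for i in range(window)]

def profile(seq, window=7, edge=0.4):
    """Weighted mean hydrophobicity for every full window along seq."""
    weights = linear_weights(window, edge)
    total = sum(weights)  # normalization choice is an assumption
    scores = []
    for i in range(len(seq) - window + 1):
        chunk = seq[i:i + window]
        score = sum(w * EISENBERG[aa] for w, aa in zip(weights, chunk))
        scores.append(score / total)
    return scores

scores = profile("MKKSILFIFLSVLSFSPFAQ")
print(len(scores))  # 14 windows for a 20-residue stretch
```

Each score belongs to the residue at the center of its window, so the first three and last three residues of the sequence get no value.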
ScanProsite - Protein against PROSITE
MKKSILFIFL SVLSFSPFAQ DAKPVESSKE KITLESKKCN IAKKSNKSGP ESMNSSNYCC ELCCNPACTG CY
[1] PDOC00001 PS00001
ASN_GLYCOSYLATION N-glycosylation site
Number of matches: 2
-Consensus pattern: N-{P}-[ST]-{P} [N is the glycosylation site]
[2] PDOC00005 PS00005
PKC_PHOSPHO_SITE Protein kinase C phosphorylation site
Number of matches: 3
-Consensus pattern: [ST]-x-[RK] [S or T is the phosphorylation site]
[3] PDOC00006 PS00006
CK2_PHOSPHO_SITE Casein kinase II phosphorylation site
Number of matches: 2
-Consensus pattern: [ST]-x(2)-[DE] [S or T is the phosphorylation site]
-Note: this pattern is found in most of the known physiological substrates.
[4] PDOC00246 PS00273
ENTEROTOXIN_H_STABLE Heat-stable enterotoxins signature
-Consensus pattern: C-C-x(2)-C-C-x-P-A-C-x-G-C [The six C's are involved in disulfide bonds]
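The consensus patterns listed above can be checked against the sequence by turning each one into a regular expression. This is a minimal sketch that handles only the syntax used in these four patterns (single letters, `x`, `[..]`, `{..}` and repeat counts); `prosite_to_regex` and `find_sites` are hypothetical helper names, not part of ScanProsite.

```python
import re

protein = ("MKKSILFIFLSVLSFSPFAQDAKPVESSKEKITLESKKCN"
           "IAKKSNKSGPESMNSSNYCCELCCNPACTGCY")

def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern such as N-{P}-[ST]-{P} to a regex."""
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":                 # any residue
            core = "."
        elif core.startswith("{"):      # excluded residues
            core = "[^" + core[1:-1] + "]"
        if low:                         # repeat count, e.g. x(2) or x(2,4)
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

def find_sites(seq, pattern):
    """Return 1-based start positions of all matches, including overlapping ones."""
    regex = "(?=" + prosite_to_regex(pattern) + ")"
    return [m.start() + 1 for m in re.finditer(regex, seq)]

print(find_sites(protein, "N-{P}-[ST]-{P}"))              # [46, 54]
print(find_sites(protein, "[ST]-x-[RK]"))                 # [27, 36, 45]
print(find_sites(protein, "[ST]-x(2)-[DE]"))              # [27, 48]
print(find_sites(protein, "C-C-x(2)-C-C-x-P-A-C-x-G-C"))  # [59]
```

The lookahead wrapper `(?=...)` is used so that overlapping sites are all reported; a plain `re.findall` would skip a site that starts inside a previous match.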