HOMEWORK #4                                                                          due on 5/19


Kevin Lyu, a graduate student in our department, obtained a cDNA from rice that encodes a protein. Here is the sequence:

CGGCACGAGCAGCAACTTAACTTGATCTTCGTGTGACCGATCGATGGCTCGCGCGGGACATAATAAGTAT
GTGGCGCGCGTGATGGTGGTGGCGCTGCTGTTGGCCGCGCGGGCACCCGTGACATGCGGGCAGGTGGTGA
GCACTTGGGCGCCGTGCATCATGTACGCCGACGGGGAGGGTGTCGCCCCCACCGGCGGCTGCTGCGACGG
GGTCAGGACCCTCAACTCCGCCGCCGCCACCACCGCCGACCGCCAGACCACCTGCGCCTGCCTCAAGCAG
CAGACCAAGGCCATGGGCCGGCTGAGGCCCGACCACGTCGCCGGCATCCCCTCCAAGTGCGGGGTCAACA
TCCCCTACGCTATCAGCCCTTCCACCGACTGCTCCAGGGTGCACTGAGTGGATCAACGTCAAGTGATGCC
ACAATAATAATGGAGAGATGGATCCATCGATCTGCGGCTCTCATTTTGCGGTTGCTATCTGCAATATTCG
TCGTCGTCGGAGAGATCGAGCTAGAAATGCATGTTACTCCTCCGTTCTGTTACTATCTGCTTACCTGTTG
CTTCGTGCGGTTTGATAGTGTCGTTATAGCTAGTGTAAGAGTGTGAGGGTTGATTTTGATCTGTCTCCTT
TACGGGACGAGGGGCACGGCGAATCATGCATGAATCTTAGAGGACCTGCTTGCATTGTACCTTACTCAGT
GCATGCTTCAATATATATCCATCAAATGAAGATCTTTTAATGAAAAAAAAAAAAAAAAAAAAAAAA

5'3' Frame 1
R H E Q Q L N L I F V Stop P I D G S R G T Stop Stop V C G A R D G G G A A V G R A G T R D Met R A G G E H
L G A V H H V R R R G G C R P H R R L L R R G Q D P Q L R R R H H R R P P D H L R L P Q A A D Q G H G P A
E A R P R R R H P L Q V R G Q H P L R Y Q P F H R L L Q G A L S G S T S S D A T I I Met E R W I H R S A A L I
L R L L S A I F V V V G E I E L E Met H V T P P F C Y Y L L T C C F V R F D S V V I A S V R V Stop G L I L I C
L L Y G T R G T A N H A Stop I L E D L L A L Y L T Q C Met L Q Y I S I K Stop R S F N E K K K K K K K
5'3' Frame 2
G T S S N L T Stop S S C D R S Met A R A G H N K Y V A R V Met V V A L L L A A R A P V T C G Q V V S T W
A P C I Met Y A D G E G V A P T G G C C D G V R T L N S A A A T T A D R Q T T C A C L K Q Q T K A Met G
R L R P D H V A G I P S K C G V N I P Y A I S P S T D C S R V H Stop V D Q R Q V Met P Q Stop Stop W R D G
S I D L R L S F C G C Y L Q Y S S S S E R S S Stop K C Met L L L R S V T I C L P V A S C G L I V S L Stop L V
Stop E C E G Stop F Stop S V S F T G R G A R R I Met H E S Stop R T C L H C T L L S A C F N I Y P S N E D L L
Met K K K K K K K K
5'3' Frame 3
A R A A T Stop L D L R V T D R W L A R D I I S Met W R A Stop W W W R C C W P R G H P Stop H A G R W
Stop A L G R R A S C T P T G R V S P P P A A A A T G S G P S T P P P P P P P T A R P P A P A S S S R P R P W
A G Stop G P T T S P A S P P S A G S T S P T L S A L P P T A P G C T E W I N V K Stop C H N N N G E Met D P
S I C G S H F A V A I C N I R R R R R D R A R N A C Y S S V L L L S A Y L L L R A V Stop Stop C R Y S Stop C
K S V R V D F D L S P L R D E G H G E S C Met N L R G P A C I V P Y S V H A S I Y I H Q Met K I F Stop Stop
K K K K K K K
3'5' Frame 1
F F F F F F F F H Stop K I F I Stop W I Y I E A C T E Stop G T Met Q A G P L R F Met H D S P C P S S R K G D R
S K S T L T L L H Stop L Stop R H Y Q T A R S N R Stop A D S N R T E E Stop H A F L A R S L R R R R I L Q I A
T A K Stop E P Q I D G S I S P L L L W H H L T L I H S V H P G A V G G R A D S V G D V D P A L G G D A G D
V V G P Q P A H G L G L L L E A G A G G L A V G G G G G G G V E G P D P V A A A A G G G D T L P V G V H
D A R R P S A H H L P A C H G C P R G Q Q Q R H H H H A R H I L I Met S R A S H R S V T R R S S Stop V A A
R A
3'5' Frame 2
F F F F F F F F I K R S S F D G Y I L K H A L S K V Q C K Q V L Stop D S C Met I R R A P R P V K E T D Q N Q
P S H S Y T S Y N D T I K P H E A T G K Q I V T E R R S N Met H F Stop L D L S D D D E Y C R Stop Q P Q N E
S R R S Met D P S L H Y Y C G I T Stop R Stop S T Q C T L E Q S V E G L I A Stop G Met L T P H L E G Met P A
T W S G L S R P Met A L V C C L R Q A Q V V W R S A V V A A A E L R V L T P S Q Q P P V G A T P S P S A Y
Met Met H G A Q V L T T C P H V T G A R A A N S S A T T I T R A T Y L L C P A R A I D R S H E D Q V K L L L V P
3'5' Frame 3
F F F F F F F S L K D L H L Met D I Y Stop S Met H Stop V R Y N A S R S S K I H A Stop F A V P L V P Stop R R
Q I K I N P H T L T L A I T T L S N R T K Q Q V S R Stop Stop Q N G G V T C I S S S I S P T T T N I A D S N R K
Met R A A D R W I H L S I I I V A S L D V D P L S A P W S S R W K G Stop Stop R R G C Stop P R T W R G C R
R R G R A S A G P W P W S A A Stop G R R R W S G G R R W W R R R S Stop G S Stop P R R S S R R W G R H P
P R R R T Stop C T A P K C S P P A R Met S R V P A R P T A A P P P S R A P H T Y Y V P R E P S I G H T K I K
L S C C S C
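
The six-frame translation above can be reproduced with a short script. The following is a minimal sketch, assuming the standard genetic code and a sequence containing only A, C, G and T; the function names (`reverse_complement`, `translate`, `six_frames`) are illustrative and are not part of the assignment. Stops are written as `*` rather than `Stop`, and Met as `M`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a Stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; trailing bases are dropped."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Map each frame label to its one-letter translation."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for k in range(3):
        frames[f"5'3' Frame {k + 1}"] = translate(dna[k:])
        frames[f"3'5' Frame {k + 1}"] = translate(rc[k:])
    return frames


# First 70 nucleotides of the cDNA above.
first_line = "CGGCACGAGCAGCAACTTAACTTGATCTTCGTGTGACCGATCGATGGCTCGCGCGGGACATAATAAGTAT"
for label, protein in six_frames(first_line).items():
    print(label, protein)
```

Run on the first line of the cDNA only, the 5'3' frames agree with the start of the corresponding frames listed above. The 3'5' frames of a fragment will not line up with those of the full sequence, because the reverse strand is read from the other end.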

The most probable protein sequence, taken from the open reading frame in 5'3' Frame 2 (shown here without the initiating Met), is:
ARAGHNKYVARVMVVALLLAARAPVTCGQVVSTWAPCIMYADGEGVA
PTGGCCDGVRTLNSAAATTADRQTTCACLKQQTKAMGRLRPDHVAGIP
SKCGVNIPYAISPSTDCSRVH
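
One common way to pick such a candidate is to take the longest stretch that starts at a Met and runs to the next Stop. The sketch below assumes that rule (the assignment does not state how the sequence was chosen) and a translation written in one-letter code with `*` for Stop; `longest_orf` is a hypothetical helper name.

```python
def longest_orf(protein):
    """Return the longest Met-to-Stop stretch in a one-letter translation.

    Only stretches that are closed by a Stop ("*") are considered, so the
    unterminated tail after the last Stop is ignored.
    """
    best = ""
    for segment in protein.split("*")[:-1]:
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


# Start of 5'3' Frame 2 from the listing above (first three lines).
frame2_start = (
    "GTSSNLT*SSCDRSMARAGHNKYVARVMVVALLLAARAPVTCGQVVSTW"
    "APCIMYADGEGVAPTGGCCDGVRTLNSAAATTADRQTTCACLKQQTKAMG"
    "RLRPDHVAGIPSKCGVNIPYAISPSTDCSRVH*VDQRQVMPQ**WRDG"
)
orf = longest_orf(frame2_start)
print(len(orf), orf)
```

The result includes the initiating Met, so it is one residue longer than the sequence printed above.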

             Oryza sativa lipid transfer protein
 
          May be a transmembrane domain or substrate-binding site.
 
 Positions are numbered along the 5'3' Frame 2 translation with Stop codons omitted.

 Motif                                      Position   Sequence
 N-glycosylation site                       5-8        NLTS
 N-myristoylation site                      1-6        GTSSNL
                                            42-47      GQVVST
                                            59-64      GVAPTG
                                            65-70      GCCDGV
                                            185-190    GLIVSL
 Casein kinase II phosphorylation site      8-11       SSCD
                                            79-82      TTAD
                                            160-163    SSSE
                                            231-234    SNED
 Protein kinase C phosphorylation site      162-164    SER
                                            165-167    SSK
                                            202-204    TGR
 Plant lipid transfer proteins signature    108-129    IPSKCGVNIPYAISPSTDCSRV
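
Short motifs like these can be located with regular expressions. The sketch below is an illustration only: the four patterns are an assumed transcription of the commonly cited PROSITE-style consensus patterns for these sites, not something given in the assignment, and `MOTIFS` and `scan` are hypothetical names. Positions are reported 1-based and inclusive, and a lookahead is used so that overlapping sites are all found.

```python
import re

# Assumed PROSITE-style consensus patterns, rewritten as regular expressions.
MOTIFS = {
    "N-glycosylation site": r"N[^P][ST][^P]",
    "Casein kinase II phosphorylation site": r"[ST]..[DE]",
    "Protein kinase C phosphorylation site": r"[ST].[RK]",
    "N-myristoylation site": r"G[^EDRKHPFYW]..[STAGCN][^P]",
}


def scan(protein, motifs=MOTIFS):
    """Return (motif name, start, end, matched residues) for every hit."""
    hits = []
    for name, pattern in motifs.items():
        # The zero-width lookahead lets overlapping matches be reported.
        for m in re.finditer(f"(?=({pattern}))", protein):
            start = m.start() + 1
            end = start + len(m.group(1)) - 1
            hits.append((name, start, end, m.group(1)))
    return hits


# First 11 residues of 5'3' Frame 2 with the Stop omitted.
for hit in scan("GTSSNLTSSCD"):
    print(hit)
```

Because these patterns are short and degenerate, they match by chance very often, so a hit on its own is weak evidence that the site is used in the real protein.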