Bioinformatics HW#7

Jason Lee, a student in the Department of Life Sciences, isolated an mRNA from a plant. Here is its sequence:

   1 atccactact tcatcataaa cctcacaact actattctat cttctcttct ctaattttca
  61 taatcattaa gaatggaaat ggttaacaag attgcatgct ttgtgctttt atgcatggta
 121 gtggttgcac cccatgcaga ggcactaact tgtggtcaag ttacatctac cttggctcct
 181 tgtctccctt atctaatgaa tcgcggtcct ctcggaggct gttgtggtgg tgttaagggt
 241 cttttgggtc aagcccagac tacagtagac cgacagaccg catgcacttg cctaaaatca
 301 gctgcttctt cttttacagg ccttgatttg ggcaaagctg ctagtctccc tagcacttgt
 361 agtgtcaaca tcccttacaa gatcagcccc tctactgact gctctaaagt tcagtaaagc
 421 tgatcatcag aatttggttt catgaggaga attaagaata agatagatag cattgatctt
 481 gcttatggat cctttctttc tatgttgtat cagttgtcac tttctgtttt ttctgtgttt
 541 cctttaaatt ctcgtatgta gtcgagtctt gtatcgaaat ttgacgattg attatattgt
 601 atcagttgtt actttctgtt ttcctgtgtt tcttttaaaa tcgtatgtag tcgagtcttg
 661 tatcgaaatt tcccgattgg ctatgttgta ttaatctaat ctttgataat acacatctat
 721 cttatttggt 


1. Please help him translate the DNA sequence into a protein sequence.

Here is the most probable protein sequence among the 6 possible ORFs (open reading frames):
MEMVNKIACF VLLCMVVVAP HAEALTCGQV TSTLAPCLPY LMNRGPLGGC CGGVKGLLGQ
AQTTVDRQTA CTCLKSAASS FTGLDLGKAA SLPSTCSVNI PYKISPSTDC SKVQ
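
The six-frame search behind this answer can be sketched in a few lines of Python. This is a minimal illustration, not the tool that was actually used: the function names (clean, revcomp, translate, six_frame_orfs, longest_orf) are hypothetical, an ORF is assumed to run from the first ATG after a stop codon to the next stop codon, and only unambiguous bases are handled. The toy fragment is the first six codons of the ORF above with an artificial stop codon and flanking bases added.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def clean(seq):
    """Keep nucleotide letters only (drops GenBank position numbers and spaces)."""
    return "".join(ch for ch in seq.upper() if ch in "ACGTU").replace("U", "T")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frame_orfs(seq):
    """Peptides that start at Met and end at a stop codon, in all six frames."""
    seq = clean(seq)
    found = []
    for strand in (seq, revcomp(seq)):
        for frame in range(3):
            pieces = translate(strand[frame:]).split("*")
            for piece in pieces[:-1]:  # the last piece has no stop codon
                start = piece.find("M")
                if start != -1:
                    found.append(piece[start:])
    return found


def longest_orf(seq):
    return max(six_frame_orfs(seq), key=len, default="")


# Toy fragment in GenBank-style layout (hypothetical, see the note above).
fragment = "   1 ccatggaaat ggttaacaag taagg"
print(longest_orf(fragment))  # MEMVNK
```

Pasting the full numbered mRNA listing into longest_orf should work the same way, since clean() discards the position numbers and spaces.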


2. Please help him to identify the complete cds of this gene. Use the graphic view to explain all the features of this gene.

The gene is: LPU66466 Lycopersicon pennellii lipid transfer protein 2 (LpLTP2) gene (1854 bp), NID=01519356
Complete cds:
    1 gtaatccagc taagaacgtc agaagtaaaa caaacttgtc gtaaaatatt taatttgaag
   61 ttgtatttaa atcttaatta ttttttttta aagctatact cacatcattt caattattct
  121 ttttgtaaaa gtatctctag agcttcataa tttttttttt aaaaatcttc gatcaaactg
  181 ttagagtagg taaaagtctc acattgatgg ggaaatagac tgattatttg cttataagga
  241 tgtggacaat actcctctca tataatagca tttaagatta aattagacct aaataacata
  301 ttttagcatg atattagagt tatattcatt cttgtttgaa cttccgatcc acatctcaat
  361 tggatctaca taaaaaaggg atattaaagt aagtaaaagc cctacattaa tcgaggaatc
  421 tacttatacg aactttggtg ataaaaaaaa agactcctac acgtaagatg ttagaactag
  481 ctaccacatg actttagagc cagcataata atgtacacca tcaaaatgct ttaaattttc
  541 aacctaacaa ccaactacct ctctcactcc tccattggcc atctactcca aatttccctc
  601 tataaaaaca ctcaaccaaa acacatttct tctcatccac tacttcatca taaacctcac
  661 aactactatt ctatcttctc ttctctaatt ttcataatca ttaagaatgg aaatggttaa
  721 caagattgca tgctttgtgc ttttatgcat ggtagtggtt gcaccccatg cagaggcact
  781 aacttgtggt caagttacat ctaccttggc tccttgtctc ccttatctaa tgaatcgcgg
  841 tcctctcgga ggctgttgtg gtggtgttaa gggtcttttg ggtcaagccc agactacagt
  901 agaccgacag accgcatgca cttgcctaaa atcagctgct tcttctttta caggccttga
  961 tttgggcaaa gctgctagtc tccctagcac ttgtagtgtc aacatccctt acaagatcag
 1021 cccctctact gactgctcta agtatgttaa tttttcatct tttttgacct ataacaacac
 1081 ctaactcttc gtattaatcc tagtacgaaa aataaagtaa caaaaaaatg atatgtgcta
 1141 gcacattgtc acaatatgac atgcaagtgt gtttggtttt ctcaaaaaat aagtggattt
 1201 tttatttata ttttagtgtt aagaaatatt agtttaaaaa tatttatata tgtaattata
 1261 aagaaaaaag atactattat agttagtaca ttatgttttt gttatcatta tcattattat
 1321 tattattaat gttggttttg ttcattgtta atgcagagtt cagtaaagct gatcatcaga
 1381 atttggtttc atgaggagaa ttaagaataa gatagatagc attgatcttg cttatggatc
 1441 ctttctttct atgttgtatc agttgtcact ttctgttttt tctgtgtttc ctttaaattc
 1501 tcgtatgtag tcgagtcttg tatcgaaatt tgacgattga ttatattgta tcagttgtta
 1561 ctttctgttt tcctgtgttt cttttaaaat cgtatgtagt cgagtcttgt atcgaaattt
 1621 cccgattggc tatgttgtat taatctaatc tttgataata cacatctatc ttatttggta
 1681 tatgtactct ctcgtctatt caatattttt ggtctacttt tactagggtt tttttaatat
 1741 gcattacaca tatatatcaa attcgagtaa tatatagtat acgctattgt gtgctcattc
 1801 atctaggtac ctcctttttc taaccacttc ttacacgtac aatgctaatt attg
The features of the gene:
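
One way to locate the exons is to map the mRNA from question 1 onto the genomic sequence and record where the match breaks and resumes. The sketch below is a simple greedy exact-match approach under stated assumptions: map_exons and its seed parameter are hypothetical names, the two sequences are assumed to be identical within exons (no sequencing differences, no poly-A tail), and the toy sequences are invented for illustration.

```python
def map_exons(mrna, genomic, seed=8):
    """Return 1-based inclusive (start, end) genomic coordinates of each exon.

    Greedy strategy: find the next `seed` bases of the mRNA in the genomic
    sequence, then extend the match base by base until it breaks.
    """
    mrna, genomic = mrna.upper(), genomic.upper()
    exons = []
    m = g = 0  # current positions in the mRNA and in the genomic sequence
    while m < len(mrna):
        hit = genomic.find(mrna[m:m + seed], g)
        if hit == -1:
            raise ValueError(f"no genomic match for mRNA position {m + 1}")
        length = 0
        while (m + length < len(mrna) and hit + length < len(genomic)
               and mrna[m + length] == genomic[hit + length]):
            length += 1
        exons.append((hit + 1, hit + length))
        m += length
        g = hit + length
    return exons


# Toy example: two exons separated by an intron with GT...AG ends.
EXON1 = "ACGTACGGCTAA"
INTRON = "GTAAGTCCCCTTTCAG"
EXON2 = "CGATCCATTGCA"
genomic = "TTTT" + EXON1 + INTRON + EXON2 + "AAAA"
mrna = EXON1 + EXON2

print(map_exons(mrna, genomic))  # [(5, 16), (33, 44)]
```

Because the extension is greedy, a reported boundary can shift by a base or two when the first bases of an intron happen to repeat the start of the next exon, so the boundaries are worth checking against the GT...AG rule by eye.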



3. After translation of this gene, please help him do the protein sequence analysis (including pI, mol. wt., secondary structure prediction, hydrophobic profile, homology search, PROSITE scanning, etc.)

Molecular weight: 11715.8 Da

Theoretical pI: 8.36
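
The molecular weight can be checked by summing average residue masses and adding one water molecule for the free termini. This is a minimal sketch: the residue masses are the commonly tabulated average values (an assumption, since the source does not say which table the tool used), and molecular_weight is a hypothetical helper name. The pI is not covered here.

```python
# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one H2O for the free N- and C-termini

# Protein sequence from question 1 (114 residues).
PROTEIN = (
    "MEMVNKIACFVLLCMVVVAPHAEALTCGQVTSTLAPCLPYLMNRGPLGGCCGGVKGLLGQ"
    "AQTTVDRQTACTCLKSAASSFTGLDLGKAASLPSTCSVNIPYKISPSTDCSKVQ"
)


def molecular_weight(protein):
    """Average molecular weight in Da of an unmodified peptide chain."""
    protein = "".join(protein.split()).upper()
    if not protein:
        return 0.0
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


print(round(molecular_weight(PROTEIN), 1))  # 11715.8
```

With these masses the full 114-residue translation reproduces the reported value, which suggests the figure refers to the whole precursor, including the N-terminal hydrophobic stretch.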

Secondary structure prediction:

(from BCM Protein Secondary Structure Search: SSPAL / Nearest-neighbor with local alignments SS prediction)

Length=114

10 20 30 40 50
PredSS aaaaaaaabbbb bbbbb
AA seq MEMVNKIACFVLLCMVVVAPHAEALTCGQVTSTLAPCLPYLMNRGPLGGC
ProbA 56555543333332111111122221111111222122244421101000
ProbB 22321123566653666521011233344333332211122210111111

60 70 80 90 100
PredSS bb bbb bbbb
AA seq CGGVKGLLGQAQTTVDRQTACTCLKSAASSFTGLDLGKAASLPSTCSVNI
ProbA 11022323211111112111122222322111111111010000111000
ProbB 11123232111235532335553431111233353100111111244554

110
PredSS bb
AA seq PYKISPSTDCSKVQ
ProbA 00000000111222
ProbB 45543201221235

Hydrophobic profile:

(from ExPASy ProtScale tool)

Using the scale Hphob. / Kyte & Doolittle, the individual values for the 20 amino acids are:

Ala:  1.800   Arg: -4.500   Asn: -3.500   Asp: -3.500
Cys:  2.500   Gln: -3.500   Glu: -3.500   Gly: -0.400
His: -3.200   Ile:  4.500   Leu:  3.800   Lys: -3.900
Met:  1.900   Phe:  2.800   Pro: -1.600   Ser: -0.800
Thr: -0.700   Trp: -0.900   Tyr: -1.300   Val:  4.200
Asx: -3.500   Glx: -3.500   Xaa: -0.490

Weights for window positions 1,..,9, using linear weight variation model:

   1     2     3     4     5     6     7     8     9
 1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
 edge              center              edge

MIN: -1.54444444444444
MAX: 3.51111111111111
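
Since all nine window weights are 1.00, the profile is a plain moving average of the Kyte & Doolittle values. The sketch below recomputes it for the protein from question 1. It is an illustration rather than the ProtScale implementation: kd_profile is a hypothetical name, and only the 20 standard residues are covered.

```python
# Kyte & Doolittle hydropathy values, one-letter codes.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Protein sequence from question 1 (114 residues).
PROTEIN = (
    "MEMVNKIACFVLLCMVVVAPHAEALTCGQVTSTLAPCLPYLMNRGPLGGCCGGVKGLLGQ"
    "AQTTVDRQTACTCLKSAASSFTGLDLGKAASLPSTCSVNIPYKISPSTDCSKVQ"
)


def kd_profile(protein, window=9):
    """Unweighted moving average of hydropathy over `window` residues."""
    scores = [KD[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


profile = kd_profile(PROTEIN)
print(len(profile))            # 106 windows for 114 residues
print(round(min(profile), 3))  # -1.544
print(round(max(profile), 3))  # 3.511
```

The maximum comes from the window FVLLCMVVV (residues 10-18, sum 31.6) and the minimum from QAQTTVDRQ (residues 60-68, sum -13.9), which agree with the MIN and MAX reported by the tool.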

Homology searching:

(from BLAST Search)

Sequences producing significant alignments:               Score (bits)  E-Value
--------------------------------------------------------------------------------
gb|U66466|LPU66466 Lycopersicon pennellii lipid transfer pr...  807  0.0
gb|U66465|LPU66465 Lycopersicon pennellii lipid transfer pr...  262  1e-67
emb|X56040|LETSW12 L.esculentum TSW12 mRNA                      248  2e-63
dbj|D13952|TOBLTP Nicotiana tabacum mRNA for lipid transfer...  165  2e-38
gb|U81996|LEU81996 Lycopersicon esculentum non specific lip...  153  8e-35
emb|X62395|NTLTP1 N.tabacum ltp1 gene for lipid transferase     101  3e-19
gb|AF044204|AF044204 Gossypium hirsutum cultivar Siokra 1-2...   56  1e-05
gb|S78173|S78173 LTP=lipid transfer protein {clone GH3} [Go...   56  1e-05
gb|U15153|GHU15153 Gossypium hirsutum nonspecific lipid tra...   56  1e-05
emb|X92648|HALTP H.annuus mRNA for non-specific lipid-trans...   48  0.003
gb|AF118131|AF118131 Capsicum annuum lipid transfer protein...   44  0.054
gb|U64874|GHU64874 Gossypium hirsutum lipid transfer protei...   44  0.054
gb|AF031649|AF031649 Arabidopsis thaliana neutral amino aci...   42  0.22
gb|AF002994|HSAF002994 Homo sapiens cosmids Qc4G10, Qc3C7, ...   42  0.22
emb|AL109612.7|HSJ1018A4 Human DNA sequence from clone 1018...   40  0.85
emb|AJ245873.1|BNA245873 Brassica napus LTP gene for non-sp...   40  0.85
gb|AE001715.1|AE001715 Thermotoga maritima section 27 of 13...   40  0.85
gb|AE001714.1|AE001714 Thermotoga maritima section 26 of 13...   40  0.85
emb|X92748|BVIWF1 B.vulgaris mRNA for IWF1'                      40  0.85
gb|U22175|BNU22175 Brassica napus germination-specific lipi...   40  0.85
gb|L33906|BNALTPWC Brassica oleracea lipid transfer protein...   40  0.85
gb|L29767|BNALTP Broccoli lipid transfer protein mRNA, comp...   40  0.85
emb|AL035467.23|HS288M22 Human DNA sequence from clone RP1-...   38  3.4
gb|AF109195.1|AF109195 Hordeum vulgare lipid transfer prote...   38  3.4
gb|AC011362.2|AC011362 Homo sapiens chromosome 5 clone CIT-...   38  3.4
gb|AC005406.2|AC005406 Homo sapiens, complete sequence           38  3.4
gb|AC004049|AC004049 Homo sapiens chromosome 4 clone B203C2...   38  3.4
dbj|AB007893|AB007893 Homo sapiens KIAA0433 mRNA, partial cds    38  3.4
gb|AC003113|AC003113 Arabidopsis thaliana BAC F24O1 chromos...   38  3.4
gb|U90882|HIVU90882 HIV-2 clone D3.6 from Spain, gag protei...   38  3.4
emb|Z80152|HSCAC44 H.sapiens CACNL1A4 gene, exon 44 >gi|477...   38  3.4
emb|X57655|HSHUSIII H.sapiens RNA for acrosin-trypsin inhib...   38  3.4
gb|M91438|HUMHUSII Human kazal-type serine proteinase (HUSI...   38  3.4

Prosite Scanning:

(from ExPASy ScanProsite tool: scan a sequence for the occurrence of PROSITE patterns)

[1] Casein kinase II phosphorylation site (PDOC00006 PS00006 CK2_PHOSPHO_SITE)
Number of matches: 2
  1   63-66   TTVD
  2   82-85   TGLD

[2] N-myristoylation site (PDOC00008 PS00008 MYRISTYL)
Number of matches: 6
  1   28-33   GQVTST
  2   48-53   GGCCGG
  3   49-54   GCCGGV
  4   52-57   GGVKGL
  5   59-64   GQAQTT
  6   83-88   GLDLGK

[3] Plant lipid transfer proteins signature (PDOC00516 PS00597 PLANT_LTP)
  92-113   LPSTCSVNIPYKISPSTDCSKV
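
A PROSITE pattern is essentially a regular expression, so the first two results can be reproduced with a small converter. This is a sketch under stated assumptions: the pattern strings for PS00006 and PS00008 are quoted from the PROSITE database as commonly published and do not appear in the output above, prosite_to_regex and scan are hypothetical names, and the converter handles only the syntax elements these two patterns use (x, x(n), [..], {..}).

```python
import re

# Protein sequence from question 1 (114 residues).
PROTEIN = (
    "MEMVNKIACFVLLCMVVVAPHAEALTCGQVTSTLAPCLPYLMNRGPLGGCCGGVKGLLGQ"
    "AQTTVDRQTACTCLKSAASSFTGLDLGKAASLPSTCSVNIPYKISPSTDCSKVQ"
)

# Assumed PROSITE pattern strings (not taken from the output above).
CK2_PHOSPHO_SITE = "[ST]-x(2)-[DE]"
MYRISTYL = "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}"

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)\))?")


def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, count = match.groups()
        if core == "x":
            core = "."                         # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"     # any residue except these
        parts.append(core + ("{" + count + "}" if count else ""))
    return "".join(parts)


def scan(protein, pattern):
    """Return (start, end, matched text) for every hit, 1-based inclusive.

    A lookahead is used so that overlapping hits are all reported.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(1) + 1, m.end(1), m.group(1)) for m in regex.finditer(protein)]


print(scan(PROTEIN, CK2_PHOSPHO_SITE))
# [(63, 66, 'TTVD'), (82, 85, 'TGLD')]
for hit in scan(PROTEIN, MYRISTYL):
    print(hit)
```

The lookahead matters for the N-myristoylation site: hits 48-53 and 49-54 overlap, and a plain left-to-right search would report only the first of them.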