Bioinformatics HW#7
Jason Lee, a student in the Department of Life Sciences, isolated an mRNA from a plant. Here is the sequence:
1 atccactact tcatcataaa cctcacaact actattctat cttctcttct ctaattttca
61 taatcattaa gaatggaaat ggttaacaag attgcatgct ttgtgctttt atgcatggta
121 gtggttgcac cccatgcaga ggcactaact tgtggtcaag ttacatctac cttggctcct
181 tgtctccctt atctaatgaa tcgcggtcct ctcggaggct gttgtggtgg tgttaagggt
241 cttttgggtc aagcccagac tacagtagac cgacagaccg catgcacttg cctaaaatca
301 gctgcttctt cttttacagg ccttgatttg ggcaaagctg ctagtctccc tagcacttgt
361 agtgtcaaca tcccttacaa gatcagcccc tctactgact gctctaaagt tcagtaaagc
421 tgatcatcag aatttggttt catgaggaga attaagaata agatagatag cattgatctt
481 gcttatggat cctttctttc tatgttgtat cagttgtcac tttctgtttt ttctgtgttt
541 cctttaaatt ctcgtatgta gtcgagtctt gtatcgaaat ttgacgattg attatattgt
601 atcagttgtt actttctgtt ttcctgtgtt tcttttaaaa tcgtatgtag tcgagtcttg
661 tatcgaaatt tcccgattgg ctatgttgta ttaatctaat ctttgataat acacatctat
721 cttatttggt
1. Please help him translate the DNA sequence into a protein sequence.
Here is the most probable protein sequence among the 6 possible reading frames (ORFs):
MEMVNKIACF VLLCMVVVAP HAEALTCGQV TSTLAPCLPY LMNRGPLGGC CGGVKGLLGQ
AQTTVDRQTA CTCLKSAASS FTGLDLGKAA SLPSTCSVNI PYKISPSTDC SKVQ
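The six-frame translation behind this answer can be sketched in a few lines of Python. This is a minimal illustration, not the tool actually used for the homework: the function names (clean, reverse_complement, translate, longest_orf) are made up here, the standard genetic code is assumed, and "most probable" is approximated as the longest ORF that runs from a Met to a stop codon. The short test sequence is only the start of the coding region with a stop codon appended.

```python
import re
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def clean(raw):
    """Keep only A/C/G/T (drops GenBank position numbers and spaces).

    Simplification: ambiguity codes such as N are dropped as well.
    """
    return re.sub(r"[^ACGT]", "", raw.upper().replace("U", "T"))


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate from the first base; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(0, len(seq) - 2, 3))


def longest_orf(raw):
    """Return (strand, frame, protein) of the longest Met-to-stop ORF."""
    seq = clean(raw)
    best = ("+", 1, "")
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            # Each match starts at the first M after a stop and ends
            # just before the next stop codon ('*').
            for m in re.finditer(r"M[^*]*(?=\*)", translate(s[frame:])):
                if len(m.group()) > len(best[2]):
                    best = (strand, frame + 1, m.group())
    return best


if __name__ == "__main__":
    # Toy input in GenBank layout: start of the coding region plus a stop.
    print(longest_orf("1 ccatggaaat ggttaacaag taacc"))
```

Pasting the full mRNA (position numbers included) into longest_orf should give the ORF to compare with the protein listed above.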
2. Please help him identify the complete CDS of this gene.
Use the graphic view to explain all the features of this gene.
The gene is:
LPU66466 Lycopersicon pennellii lipid transfer protein 2 (LpLTP2) gene (1854 bp), NID=01519356
Gene sequence containing the complete CDS (1854 bp):
1 gtaatccagc taagaacgtc agaagtaaaa caaacttgtc gtaaaatatt taatttgaag
61 ttgtatttaa atcttaatta ttttttttta aagctatact cacatcattt caattattct
121 ttttgtaaaa gtatctctag agcttcataa tttttttttt aaaaatcttc gatcaaactg
181 ttagagtagg taaaagtctc acattgatgg ggaaatagac tgattatttg cttataagga
241 tgtggacaat actcctctca tataatagca tttaagatta aattagacct aaataacata
301 ttttagcatg atattagagt tatattcatt cttgtttgaa cttccgatcc acatctcaat
361 tggatctaca taaaaaaggg atattaaagt aagtaaaagc cctacattaa tcgaggaatc
421 tacttatacg aactttggtg ataaaaaaaa agactcctac acgtaagatg ttagaactag
481 ctaccacatg actttagagc cagcataata atgtacacca tcaaaatgct ttaaattttc
541 aacctaacaa ccaactacct ctctcactcc tccattggcc atctactcca aatttccctc
601 tataaaaaca ctcaaccaaa acacatttct tctcatccac tacttcatca taaacctcac
661 aactactatt ctatcttctc ttctctaatt ttcataatca ttaagaatgg aaatggttaa
721 caagattgca tgctttgtgc ttttatgcat ggtagtggtt gcaccccatg cagaggcact
781 aacttgtggt caagttacat ctaccttggc tccttgtctc ccttatctaa tgaatcgcgg
841 tcctctcgga ggctgttgtg gtggtgttaa gggtcttttg ggtcaagccc agactacagt
901 agaccgacag accgcatgca cttgcctaaa atcagctgct tcttctttta caggccttga
961 tttgggcaaa gctgctagtc tccctagcac ttgtagtgtc aacatccctt acaagatcag
1021 cccctctact gactgctcta agtatgttaa tttttcatct tttttgacct ataacaacac
1081 ctaactcttc gtattaatcc tagtacgaaa aataaagtaa caaaaaaatg atatgtgcta
1141 gcacattgtc acaatatgac atgcaagtgt gtttggtttt ctcaaaaaat aagtggattt
1201 tttatttata ttttagtgtt aagaaatatt agtttaaaaa tatttatata tgtaattata
1261 aagaaaaaag atactattat agttagtaca ttatgttttt gttatcatta tcattattat
1321 tattattaat gttggttttg ttcattgtta atgcagagtt cagtaaagct gatcatcaga
1381 atttggtttc atgaggagaa ttaagaataa gatagatagc attgatcttg cttatggatc
1441 ctttctttct atgttgtatc agttgtcact ttctgttttt tctgtgtttc ctttaaattc
1501 tcgtatgtag tcgagtcttg tatcgaaatt tgacgattga ttatattgta tcagttgtta
1561 ctttctgttt tcctgtgttt cttttaaaat cgtatgtagt cgagtcttgt atcgaaattt
1621 cccgattggc tatgttgtat taatctaatc tttgataata cacatctatc ttatttggta
1681 tatgtactct ctcgtctatt caatattttt ggtctacttt tactagggtt tttttaatat
1741 gcattacaca tatatatcaa attcgagtaa tatatagtat acgctattgt gtgctcattc
1801 atctaggtac ctcctttttc taaccacttc ttacacgtac aatgctaatt attg
The features of the gene:
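When the graphic view is not at hand, the exon/intron layout can be approximated by mapping the mRNA from part 1 back onto the gene sequence. The sketch below is a minimal exact-match mapper, not the method behind the graphic view. The names map_exons and seed_len are hypothetical, and it assumes the mRNA matches the gene base for base with the exons in order. The toy gene and mRNA are invented for illustration only.

```python
def map_exons(mrna, gene, seed_len=12):
    """Return 1-based inclusive (start, end) gene coordinates of each
    block of the mRNA that matches the gene exactly."""
    mrna, gene = mrna.upper(), gene.upper()
    blocks = []
    i = 0        # next unmatched mRNA position (0-based)
    g_from = 0   # exons appear in order, so only search downstream
    while i < len(mrna):
        # Locate a short seed from the mRNA in the gene ...
        g = gene.find(mrna[i:i + seed_len], g_from)
        if g < 0:
            raise ValueError(f"no exact match for mRNA position {i + 1}")
        # ... then extend the match base by base until the first mismatch.
        n = 0
        while (i + n < len(mrna) and g + n < len(gene)
               and mrna[i + n] == gene[g + n]):
            n += 1
        blocks.append((g + 1, g + n))
        i += n
        g_from = g + n
    return blocks


if __name__ == "__main__":
    # Invented toy gene: flank + exon 1 + intron (GT...AG) + exon 2 + flank.
    exon1, exon2 = "ATGGCTTGTAAA", "CATCATTGATTT"
    gene = "GGGG" + exon1 + "GTAAGTCCCCCCTTTTCAG" + exon2 + "CCCC"
    print(map_exons(exon1 + exon2, gene, seed_len=8))
```

Blocks are reported as gene coordinates; the gaps between consecutive blocks are candidate introns (check for GT...AG boundaries), and the CDS is the part of the blocks lying between the ATG and the stop codon found in part 1. Because the extension is greedy, a boundary can shift by a base or two when the intron begins with the same base as the next exon.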
3. After translating this gene, please help him analyze the protein sequence (including pI, molecular weight, secondary structure prediction, hydrophobic profile, homology search, PROSITE scanning, etc.)
Molecular weight: 11715.8
Theoretical pI: 8.36
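The molecular weight can be checked by hand: add up the average residue masses and one water molecule for the free termini. The sketch below assumes the commonly tabulated average residue masses and a hypothetical function name molecular_weight; it covers the mass only (the pI is not computed, since it depends on the pK set chosen).

```python
# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # one H2O for the free N- and C-termini

# The 114-residue translation from part 1.
PROTEIN = (
    "MEMVNKIACFVLLCMVVVAPHAEALTCGQVTSTLAPCLPYLMNRGPLGGC"
    "CGGVKGLLGQAQTTVDRQTACTCLKSAASSFTGLDLGKAASLPSTCSVNI"
    "PYKISPSTDCSKVQ"
)


def molecular_weight(protein):
    """Average molecular weight of an unmodified linear peptide, in Da."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


if __name__ == "__main__":
    print(f"{len(PROTEIN)} residues, {molecular_weight(PROTEIN):.1f} Da")
```

For the full 114-residue translation this reproduces the value reported above to within rounding.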
Secondary structure prediction:
(from BCM Protein Secondary Structure Search: SSPAL / Nearest-neighbor with local alignments SS prediction)
Length=114
10 20 30 40 50
PredSS aaaaaaaabbbb bbbbb
AA seq MEMVNKIACFVLLCMVVVAPHAEALTCGQVTSTLAPCLPYLMNRGPLGGC
ProbA 56555543333332111111122221111111222122244421101000
ProbB 22321123566653666521011233344333332211122210111111
60 70 80 90 100
PredSS bb bbb bbbb
AA seq CGGVKGLLGQAQTTVDRQTACTCLKSAASSFTGLDLGKAASLPSTCSVNI
ProbA 11022323211111112111122222322111111111010000111000
ProbB 11123232111235532335553431111233353100111111244554
110
PredSS bb
AA seq PYKISPSTDCSKVQ
ProbA 00000000111222
ProbB 45543201221235
Hydrophobic profile:
(from ExPASy ProtScale tool)
Using the scale Hphob. / Kyte & Doolittle, the individual values for the 20 amino acids are:
Ala: 1.800 Arg: -4.500 Asn: -3.500 Asp: -3.500 Cys: 2.500 Gln: -3.500
Glu: -3.500 Gly: -0.400 His: -3.200 Ile: 4.500 Leu: 3.800 Lys: -3.900
Met: 1.900 Phe: 2.800 Pro: -1.600 Ser: -0.800 Thr: -0.700 Trp: -0.900
Tyr: -1.300 Val: 4.200 Asx: -3.500 Glx: -3.500 Xaa: -0.490
Weights for window positions 1,..,9, using linear weight variation model:
1 2 3 4 5 6 7 8 9
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
edge center edge
MIN: -1.54444444444444
MAX: 3.51111111111111
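With all nine window weights equal to 1.00, the profile is just a sliding average of the Kyte & Doolittle values listed above. The sketch below recomputes it under that assumption; the function name hydropathy_profile is hypothetical and this is not ProtScale's own code.

```python
# Kyte & Doolittle hydropathy values, as listed above.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

PROTEIN = (
    "MEMVNKIACFVLLCMVVVAPHAEALTCGQVTSTLAPCLPYLMNRGPLGGC"
    "CGGVKGLLGQAQTTVDRQTACTCLKSAASSFTGLDLGKAASLPSTCSVNI"
    "PYKISPSTDCSKVQ"
)


def hydropathy_profile(protein, window=9):
    """Unweighted sliding-window mean; entry i covers residues i+1 .. i+window."""
    scores = [KD[aa] for aa in protein]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


if __name__ == "__main__":
    profile = hydropathy_profile(PROTEIN)
    print(f"MIN: {min(profile):.4f}")
    print(f"MAX: {max(profile):.4f}")
    # 1-based position of the centre residue of the most hydrophobic window
    print("centre of MAX window:", profile.index(max(profile)) + 9 // 2 + 1)
```

The minimum and maximum agree with the MIN and MAX values reported above.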
Homology searching:
(from BLAST Search)
Sequences producing significant alignments: score(bits) E-Value
-------------------------------------------------------------------------------
gb|U66466|LPU66466 Lycopersicon pennellii lipid transfer pr... 807 0.0
gb|U66465|LPU66465 Lycopersicon pennellii lipid transfer pr... 262 1e-67
emb|X56040|LETSW12 L.esculentum TSW12 mRNA 248 2e-63
dbj|D13952|TOBLTP Nicotiana tabacum mRNA for lipid transfer... 165 2e-38
gb|U81996|LEU81996 Lycopersicon esculentum non specific lip... 153 8e-35
emb|X62395|NTLTP1 N.tabacum ltp1 gene for lipid transferase 101 3e-19
gb|AF044204|AF044204 Gossypium hirsutum cultivar Siokra 1-2... 56 1e-05
gb|S78173|S78173 LTP=lipid transfer protein {clone GH3} [Go... 56 1e-05
gb|U15153|GHU15153 Gossypium hirsutum nonspecific lipid tra... 56 1e-05
emb|X92648|HALTP H.annuus mRNA for non-specific lipid-trans... 48 0.003
gb|AF118131|AF118131 Capsicum annuum lipid transfer protein... 44 0.054
gb|U64874|GHU64874 Gossypium hirsutum lipid transfer protei... 44 0.054
gb|AF031649|AF031649 Arabidopsis thaliana neutral amino aci... 42 0.22
gb|AF002994|HSAF002994 Homo sapiens cosmids Qc4G10, Qc3C7, ... 42 0.22
emb|AL109612.7|HSJ1018A4 Human DNA sequence from clone 1018... 40 0.85
emb|AJ245873.1|BNA245873 Brassica napus LTP gene for non-sp... 40 0.85
gb|AE001715.1|AE001715 Thermotoga maritima section 27 of 13... 40 0.85
gb|AE001714.1|AE001714 Thermotoga maritima section 26 of 13... 40 0.85
emb|X92748|BVIWF1 B.vulgaris mRNA for IWF1' 40 0.85
gb|U22175|BNU22175 Brassica napus germination-specific lipi... 40 0.85
gb|L33906|BNALTPWC Brassica oleracea lipid transfer protein... 40 0.85
gb|L29767|BNALTP Broccoli lipid transfer protein mRNA, comp... 40 0.85
emb|AL035467.23|HS288M22 Human DNA sequence from clone RP1-... 38 3.4
gb|AF109195.1|AF109195 Hordeum vulgare lipid transfer prote... 38 3.4
gb|AC011362.2|AC011362 Homo sapiens chromosome 5 clone CIT-... 38 3.4
gb|AC005406.2|AC005406 Homo sapiens, complete sequence 38 3.4
gb|AC004049|AC004049 Homo sapiens chromosome 4 clone B203C2... 38 3.4
dbj|AB007893|AB007893 Homo sapiens KIAA0433 mRNA, partial cds 38 3.4
gb|AC003113|AC003113 Arabidopsis thaliana BAC F24O1 chromos... 38 3.4
gb|U90882|HIVU90882 HIV-2 clone D3.6 from Spain, gag protei... 38 3.4
emb|Z80152|HSCAC44 H.sapiens CACNL1A4 gene, exon 44 >gi|477... 38 3.4
emb|X57655|HSHUSIII H.sapiens RNA for acrosin-trypsin inhib... 38 3.4
gb|M91438|HUMHUSII Human kazal-type serine proteinase (HUSI... 38 3.4
Prosite Scanning:
(from ExPASy ScanProsite tool: scan a sequence for the occurrence of PROSITE patterns)
[1] Casein kinase II phosphorylation site (PDOC00006 PS00006 CK2_PHOSPHO_SITE)
Number of matches: 2
1 63-66 TTVD
2 82-85 TGLD
[2] N-myristoylation site (PDOC00008 PS00008 MYRISTYL)
Number of matches: 6
1 28-33 GQVTST
2 48-53 GGCCGG
3 49-54 GCCGGV
4 52-57 GGVKGL
5 59-64 GQAQTT
6 83-88 GLDLGK
[3] Plant lipid transfer proteins signature (PDOC00516 PS00597 PLANT_LTP)
92-113 LPSTCSVNIPYKISPSTDCSKV
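The two short patterns can be reproduced with ordinary regular expressions. The sketch below assumes the PROSITE patterns [ST]-x(2)-[DE] for CK2_PHOSPHO_SITE and G-{EDRKHPFYW}-x(2)-[STAGCN]-{P} for MYRISTYL, hand-translated into Python regex syntax; the function name scan is hypothetical. The PLANT_LTP pattern is left out because it is not given in the output above.

```python
import re

PROTEIN = (
    "MEMVNKIACFVLLCMVVVAPHAEALTCGQVTSTLAPCLPYLMNRGPLGGC"
    "CGGVKGLLGQAQTTVDRQTACTCLKSAASSFTGLDLGKAASLPSTCSVNI"
    "PYKISPSTDCSKVQ"
)

# PROSITE patterns rewritten by hand as regular expressions (assumed forms).
PATTERNS = {
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",             # [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",    # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
}


def scan(protein, regex):
    """Return (start, end, match) with 1-based inclusive positions.

    The lookahead makes overlapping matches visible, which a plain
    finditer would skip (e.g. 48-53 and 49-54).
    """
    hits = []
    for m in re.finditer(f"(?=({regex}))", protein):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1)) - 1, m.group(1)))
    return hits


if __name__ == "__main__":
    for name, regex in PATTERNS.items():
        hits = scan(PROTEIN, regex)
        print(f"{name}: {len(hits)} matches")
        for start, end, text in hits:
            print(f"  {start}-{end} {text}")
```

Both patterns are short and occur frequently by chance, so matches of this kind are weak evidence on their own; the family-specific lipid transfer protein signature is the informative hit in this scan.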