About
Identifying the subcellular location of a protein has long been an area of interest in protein science, and it has been studied extensively with computational methods over the past few decades. However, most of these methods focus on predicting a single location.
The proposed approach predicts 11 single locations (cell membrane, cell wall, plastid, cytoplasm, endoplasmic reticulum, extracellular, Golgi apparatus, mitochondrion, nucleus, peroxisome and vacuole) and three important multi-location classes (cytoplasm-nucleus, mitochondrion-plastid and cytoplasm-Golgi apparatus).
Various sequence-derived features based on composition and physicochemical properties, such as amino acid composition, pseudo amino acid composition, dipeptide composition and hybrids of these, are used to represent the protein.
Below is a brief description of the sequence features used by the application.
(AAC): Amino Acid Composition. A 20-dimensional descriptor with the percentage of each amino acid in the whole sequence.
(Dipep): Dipeptide Composition. A 400-dimensional descriptor with the percentage of each pair of adjacent amino acids in the whole sequence.
(PseAAC): Pseudo Amino Acid Composition. A 30-dimensional descriptor containing the AAC information plus 10 additional values based on the hydrophobicity and hydrophilicity of the amino acids present.
(NCC): N-Center-C terminal Composition. A 60-dimensional descriptor containing the AAC of each of three subsequences of the full-length sequence: N-terminal is the first 25 amino acids, C-terminal is the last 25 amino acids, and Center is the remaining amino acids.
(CTDC): Composition descriptor based on attributes such as hydrophobicity, normalized van der Waals volume, polarity and polarizability. 21-dimensional descriptor (protr R package).
(CTDT): Transition descriptor based on attributes such as hydrophobicity, normalized van der Waals volume, polarity and polarizability. 21-dimensional descriptor (protr R package).
(QSO): Quasi-Sequence-Order descriptor, based on the normalized occurrence of amino acids. 100-dimensional descriptor (protr R package).
(PseAACNCCDipep): Hybrid of Pseudo AAC, N-Center-C terminal (3 parts) and Dipeptide Composition. 490-dimensional descriptor.
(NCCDipepCTDCCTDTQSO): Hybrid of N-Center-C terminal (3 parts), Dipeptide Composition, Composition and Transition (Dubchak) and Quasi-Sequence-Order descriptors. 602-dimensional descriptor.
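To make the composition-based descriptors concrete, here is a minimal Python sketch of AAC, Dipep and NCC as described above. It is an illustration only, not the application's own code. The function names (aac, dipep, ncc), the residue ordering, the handling of non-standard residues and of sequences shorter than 50 residues, and the use of len - 1 as the dipeptide denominator are all assumptions. The vector sizes suggest that the hybrid descriptors are concatenations of their parts (for example 30 + 60 + 400 = 490), which the last lines imitate for two of the parts.

```python
from itertools import product

# Assumption: the 20 standard residues in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Amino acid composition: percentage of each residue (20 values)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    # Assumption: non-standard letters (e.g. X) count toward the length
    # but fall into none of the 20 bins.
    return [100.0 * seq.count(a) / n for a in AMINO_ACIDS]


def dipep(seq):
    """Dipeptide composition: percentage of each adjacent pair (400 values)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return [0.0] * (len(AMINO_ACIDS) ** 2)
    counts = {}
    for p in pairs:
        counts[p] = counts.get(p, 0) + 1
    # Assumption: percentages are taken over the len(seq) - 1 overlapping pairs.
    return [100.0 * counts.get(a + b, 0) / len(pairs)
            for a, b in product(AMINO_ACIDS, repeat=2)]


def ncc(seq, terminal_len=25):
    """N-Center-C composition: AAC of the three parts, concatenated (60 values)."""
    seq = seq.upper()
    n_part = seq[:terminal_len]
    c_part = seq[-terminal_len:]
    # Assumption: for sequences of 50 residues or fewer the center is empty
    # (all zeros) and the two terminal parts may overlap.
    center = seq[terminal_len:-terminal_len]
    return aac(n_part) + aac(center) + aac(c_part)


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the vector sizes.
    example = "MKTAYIAKQR" * 8
    print(len(aac(example)), len(dipep(example)), len(ncc(example)))
    # Concatenating parts, as the hybrid descriptors appear to do:
    print(len(ncc(example) + dipep(example)))
```

The PseAAC, CTDC, CTDT and QSO descriptors are left out of this sketch because they depend on physicochemical property tables; the last three are attributed above to the protr R package.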
Author