Interolog prediction can be defined as the search for homologs of unknown interacting proteins in protein-protein interaction (PPI) databases. The server first aligns your input sequences with Diamond-Blast against Diamond representations of the PPI databases, and then tries to match the resulting hits using SQL.
You can provide two different datasets, the Host Dataset and the Pathogen Dataset. The service accepts either nucleotide or protein sequences in FASTA format. You can use nucleotide sequences for one dataset and protein sequences for the other (or vice versa), or the same type for both; however, whichever type you choose for a dataset must be used for that entire dataset. The web server determines the sequence type from the sequences themselves.
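The rule that each dataset must contain a single sequence type can be illustrated with a short Python sketch. The heuristic used here (the fraction of A/C/G/T/U/N letters, with a 0.9 cutoff) and all function names are assumptions made for this example; the server's actual detection logic is not documented here and may differ.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def guess_sequence_type(sequence, cutoff=0.9):
    """Guess 'nucleotide' or 'protein' from the share of A/C/G/T/U/N letters.

    The cutoff is a hypothetical value chosen for this illustration.
    """
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "nucleotide" if fraction >= cutoff else "protein"


def dataset_type(fasta_text):
    """Return the one type shared by every record, or raise if types are mixed."""
    types = {guess_sequence_type(seq) for _, seq in parse_fasta(fasta_text)}
    if len(types) != 1:
        raise ValueError("dataset is empty or mixes nucleotide and protein sequences")
    return types.pop()
```

A host dataset and a pathogen dataset would each be passed to `dataset_type` separately, so one may be nucleotide and the other protein.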
If you want to run the search for only one dataset, you can deactivate either the Host or the Pathogen panel (by clicking the button next to the Dataset title). The search changes depending on which panel you deactivate. If you run an "Only Host" query, your sequences are searched against column interactor_A of the PPI databases; if you select the "Only Pathogen" option, they are searched against column interactor_B (see the PSI-MITAB standard definition).
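The matching step can be sketched with an in-memory SQLite database. The table names (`ppi`, `host_hits`, `pathogen_hits`), their columns and the toy identifiers are hypothetical and chosen only for illustration; the real schema of the service is not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ppi (interactor_a TEXT, interactor_b TEXT, source_db TEXT);
    CREATE TABLE host_hits (query_id TEXT, subject_id TEXT);
    CREATE TABLE pathogen_hits (query_id TEXT, subject_id TEXT);
""")

# Toy data: two known interactions and the alignment hits of the user's queries.
conn.executemany("INSERT INTO ppi VALUES (?, ?, ?)",
                 [("P1", "V1", "demo_db"), ("P2", "V2", "demo_db")])
conn.executemany("INSERT INTO host_hits VALUES (?, ?)",
                 [("hostX", "P1"), ("hostY", "P2")])
conn.executemany("INSERT INTO pathogen_hits VALUES (?, ?)",
                 [("pathX", "V1")])

# Both datasets: host hits must match interactor_a AND pathogen hits interactor_b.
both = conn.execute("""
    SELECT h.query_id, p.query_id, ppi.interactor_a, ppi.interactor_b, ppi.source_db
    FROM ppi
    JOIN host_hits AS h ON h.subject_id = ppi.interactor_a
    JOIN pathogen_hits AS p ON p.subject_id = ppi.interactor_b
    ORDER BY h.query_id, p.query_id
""").fetchall()

# "Only Host": the query sequences are matched against interactor_a alone.
only_host = conn.execute("""
    SELECT h.query_id, ppi.interactor_a, ppi.interactor_b
    FROM ppi
    JOIN host_hits AS h ON h.subject_id = ppi.interactor_a
    ORDER BY h.query_id
""").fetchall()
```

An "Only Pathogen" query would mirror the second statement, joining `pathogen_hits` on `interactor_b` instead.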
The default values for the initial BLAST step are an e-value of 1e-10, an identity of 30% and a coverage of 30%. These values produce a large result set. Consider increasing or decreasing them to suit your particular interest.
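As an illustration of how the three thresholds act together, the sketch below filters alignment hits in Python. The `Hit` class, its field names and the coverage definition (aligned query span divided by query length) are assumptions made for this example; they do not reproduce the server's code or Diamond's output format.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query_id: str
    subject_id: str
    identity: float    # percent identity, 0-100
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive
    query_length: int
    evalue: float

    @property
    def coverage(self):
        """Percent of the query covered by the alignment (assumed definition)."""
        return 100.0 * (self.query_end - self.query_start + 1) / self.query_length


def filter_hits(hits, max_evalue=1e-10, min_identity=30.0, min_coverage=30.0):
    """Keep only the hits that pass all three thresholds."""
    return [
        h for h in hits
        if h.evalue <= max_evalue
        and h.identity >= min_identity
        and h.coverage >= min_coverage
    ]


hits = [
    Hit("q1", "s1", 45.0, 1, 90, 100, 1e-30),   # passes every default
    Hit("q1", "s2", 25.0, 1, 90, 100, 1e-30),   # identity too low
    Hit("q2", "s3", 80.0, 1, 20, 100, 1e-40),   # coverage too low
    Hit("q3", "s4", 60.0, 10, 80, 100, 1e-5),   # e-value too high
]
kept = filter_hits(hits)
```

Raising `min_identity` or lowering `max_evalue` shrinks the result set; relaxing them enlarges it.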
The International Molecular Exchange Consortium (IMEX) brings together the largest public interaction data providers. From those we have selected five (HPIDB, MINT, DIP, BioGRID and IntAct), which we consider the most comprehensive. In addition, four resources outside IMEX were included: VirHostNet, STRING, the HIV-1 Human Interaction Database (NCBI) and PHISTO. VirHostNet is one of the most complete resources for virus-host interactions. STRING is the largest repository of protein-protein interactions; only its experimental interactions are included in this service. Finally, the datasets used in the benchmark of the PredHPI manuscript were also included.
Summary of the database versions running on this service:
HPIDB has 19,395 sequences with 62,653 interactions.
MINT has 24,226 sequences with 123,890 interactions.
DIP has 20,532 sequences with 76,881 interactions.
BioGRID has 50,096 sequences with 1,530,395 interactions.
IntAct has 93,869 sequences with 843,123 interactions.
VirHostNet has 8,932 sequences with 34,760 interactions.
STRING has 130,820 sequences with 2,028,129 interactions.
HIV-1 Human Interaction Database has 3,882 sequences with 16,215 interactions.
ArabHPI (PPIN-2, ATIN) has 659 sequences with 982 interactions.
PHISTO has 12,747 sequences with 62,347 interactions.
Please note that data can overlap between databases. The network visualization handles this by drawing an additional edge, in a different color, for each overlapping interaction.
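One way to picture the overlap is to group the interactions by protein pair and list the databases that report each pair, so that every database contributes one edge. The function name and the toy protein identifiers below are hypothetical; this is a sketch of the idea, not the visualization code of the service.

```python
from collections import defaultdict


def databases_by_pair(interactions):
    """Map each (interactor_a, interactor_b) pair to the sorted databases reporting it."""
    sources = defaultdict(set)
    for interactor_a, interactor_b, database in interactions:
        sources[(interactor_a, interactor_b)].add(database)
    return {pair: sorted(dbs) for pair, dbs in sources.items()}


interactions = [
    ("H1", "V1", "IntAct"),
    ("H1", "V1", "MINT"),
    ("H2", "V2", "DIP"),
    ("H1", "V1", "IntAct"),  # repeated record from the same database
]
edges = databases_by_pair(interactions)

# Pairs reported by more than one database would get one extra edge per database.
overlapping = {pair: dbs for pair, dbs in edges.items() if len(dbs) > 1}
```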
Please upload a valid FASTA file or paste a valid sequence in the text area for Host.
Please upload a valid FASTA file or paste a valid sequence in the text area for Pathogen.
Please upload a valid FASTA file or paste a valid sequence in the text area for both datasets (Pathogen & Host).
Max number of sequences for Host dataset was exceeded.
Max number of sequences for Pathogen dataset was exceeded.
Max number of sequences for both datasets was exceeded.
There is a problem with the input provided.
Please check the email (you must provide a valid email).
Please check your minimum identity percentage (valid number from 1 to 100).
Please check your minimum coverage percentage (valid number from 1 to 100).
Please check your evalue (valid number).
You can also leave it blank (the default values will be used) to continue with the Interolog prediction service.
You have successfully submitted a query to the Interolog prediction service at PredHPI.
This process takes the amino acid sequences provided by the user and uses the HMMER software to find protein domains in those sequences, based on the information available in Pfam. It then searches the DDI (Domain-Domain Interaction) databases using SQL, trying to find a match between the domains obtained.
You can provide two different datasets, the Host Dataset and the Pathogen Dataset. The service accepts amino acid sequences in FASTA format.
If you want to run the search for a single dataset, you can deactivate one of the Dataset panels by clicking the button next to the Dataset name. Whichever dataset you use alone, the results will not represent a host-pathogen (or protein-protein) interaction; that is only obtained when you use both datasets. Any other scenario produces a protein-domain interaction.
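The two-dataset case can be sketched as a lookup of domain pairs in a set of known domain-domain interactions. The sketch uses plain Python sets instead of SQL for brevity; the function name and the toy domain names are hypothetical, and treating a DDI as an unordered pair is an assumption of this example.

```python
def predict_by_domains(host_domains, pathogen_domains, ddi_pairs):
    """Report host-pathogen protein pairs that share at least one known DDI.

    host_domains / pathogen_domains: dict mapping protein id -> set of domain ids.
    ddi_pairs: iterable of (domain, domain) tuples, treated as unordered pairs.
    """
    known = {frozenset(pair) for pair in ddi_pairs}
    predictions = []
    for host_id, h_domains in sorted(host_domains.items()):
        for pathogen_id, p_domains in sorted(pathogen_domains.items()):
            for h_dom in sorted(h_domains):
                for p_dom in sorted(p_domains):
                    if frozenset((h_dom, p_dom)) in known:
                        predictions.append((host_id, pathogen_id, h_dom, p_dom))
    return predictions


host_domains = {"h1": {"domA", "domB"}, "h2": {"domC"}}
pathogen_domains = {"p1": {"domX"}, "p2": {"domB"}}
ddi_pairs = [("domX", "domA"), ("domB", "domB")]

predicted = predict_by_domains(host_domains, pathogen_domains, ddi_pairs)
```

With a single dataset there is no second protein to pair with, which is why that mode yields protein-domain interactions instead of host-pathogen pairs.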
If you select the "No filters" option for HMMScan, sensitivity will increase but the search will take much longer.
3DID is the default database because of how often it is updated.
Summary of the database versions running on this service:
3DID has 11,200 interactions.
IDDI has 204,716 interactions.
DOMINE has 26,219 interactions.
Result IDs are shown as they appear in the queried database. IDs in 3DID are domain names and IDs in IDDI are Pfam IDs.
The Pfam release used in this version is Pfam 31.0.
There is a problem with the input provided.
Please check the email (you must provide a valid email).
Please check your Viterbi p-value.
Please check your Forward p-value.
Please check your MSV p-value.
Please check your evalue (valid number).
Please check your coverage value (domE).
You can also leave it blank (the default values will be used) to continue with the Domain prediction service.
You have successfully submitted a query to the Domain Host-pathogen interaction prediction service at PredHPI.
This process takes the sequences provided by the user and uses the InterProScan software to find the associated GO terms. It then uses the R package GOSemSim to calculate the similarity between the GO terms; depending on the overall similarity, the protein pair is predicted as interacting or not.
You must provide two different datasets, named as the Host Dataset and the Pathogen Dataset. The service receives sequences in FASTA format.
The GOSemSim package calculates a similarity matrix between the GO term sets of Host and Pathogen, then combines those matrix values (pairwise similarities between two GO terms) according to the selected strategy.
The Wang method is used in GOSemSim to calculate the similarities because it ignores the BP, CC and MF subontologies in the graph, which makes it possible to use all the GO terms found by InterProScan.
Be aware of which GO dataset is used for your analysis: results can change dramatically between datasets, so use the one closest to the host species being analyzed.
Semantic similarity is calculated for each pairwise combination of GO terms. Some proteins have several GO terms associated, so it is necessary to combine those values into a single one. GOSemSim provides several combine methods, four of which are available in PredHPI (BMA, max, average, rcmax). The descriptions are taken from the GOSemSim website; further information at (https://bioconductor.org/packages/devel/bioc/vignettes/GOSemSim.html#combine-methods).
BMA: Best-match average strategy, calculates the average of all maximum similarities for each pairwise comparison.
max: Maximum.
average: Average.
rcmax: The similarities between two sets of GO terms form a matrix; the rcmax method takes the maximum of the row score and the column score, where each score is the average of the maximum similarity in each row (or column).
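The four strategies can be compared on a small similarity matrix. The sketch below is a Python re-statement of the definitions as given in the GOSemSim documentation, to the best of my reading (rcmax and BMA both work on the row and column maxima); it is not the GOSemSim code itself, the function name is hypothetical, and missing values are not handled.

```python
def combine_scores(matrix, method="BMA"):
    """Collapse a GO-term similarity matrix into one score.

    matrix: list of rows; rows are the GO terms of one protein,
    columns are the GO terms of the other.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_max = [max(row) for row in matrix]
    col_max = [max(col) for col in zip(*matrix)]
    if method == "max":
        return max(row_max)
    if method == "average":
        return sum(sum(row) for row in matrix) / (n_rows * n_cols)
    if method == "rcmax":
        # Larger of the mean row maximum and the mean column maximum.
        return max(sum(row_max) / n_rows, sum(col_max) / n_cols)
    if method == "BMA":
        # Best-match average: mean over all row and column maxima.
        return (sum(row_max) + sum(col_max)) / (n_rows + n_cols)
    raise ValueError(f"unknown combine method: {method}")


# Toy matrix: 3 GO terms for the host protein, 2 for the pathogen protein.
similarities = [
    [0.9, 0.2],
    [0.4, 0.6],
    [0.1, 0.3],
]
scores = {m: combine_scores(similarities, m) for m in ("BMA", "max", "average", "rcmax")}
```

For this matrix the strategies give roughly 0.66 (BMA), 0.90 (max), 0.42 (average) and 0.75 (rcmax), which shows how much the choice of strategy can move a pair across a fixed threshold.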
There is a problem with the input provided.
Please check the email (you must provide a valid email).
Please check your Threshold value (you must provide a number from 0 to 1).
You have successfully submitted a query to the GOppi Host-pathogen interaction prediction service at PredHPI.
Protein-protein interaction prediction based on phylogenetic profiling consists of aligning (Diamond-Blast) the proteins against a wide range of genomes and marking a binary presence (1) or absence (0) for each genome, according to whether the protein aligned to it.
The vector of zeros and ones for each protein is defined as its profiling pattern; if two proteins share a similar pattern, they are identified as interacting. The default similarity threshold (thresholdPhylo) is 0.5 (half similarity); however, we encourage users to experiment with this value.
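The profile comparison can be sketched as follows. The similarity measure is not named in this help text, so the Jaccard index used below is an assumption, as are the function names, the toy genome pool and the choice to accept pairs whose similarity is greater than or equal to the threshold.

```python
def build_profile(genomes_with_hit, genome_pool):
    """Return the 0/1 presence vector of a protein over an ordered genome pool."""
    return [1 if genome in genomes_with_hit else 0 for genome in genome_pool]


def jaccard(profile_a, profile_b):
    """Shared presences divided by presences in either profile (assumed measure)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genome pool")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 or b == 1)
    return both / either if either else 0.0


def predict_pairs(host_profiles, pathogen_profiles, threshold=0.5):
    """List (host, pathogen, similarity) for pairs at or above the threshold."""
    pairs = []
    for host_id, h_profile in sorted(host_profiles.items()):
        for pathogen_id, p_profile in sorted(pathogen_profiles.items()):
            score = jaccard(h_profile, p_profile)
            if score >= threshold:
                pairs.append((host_id, pathogen_id, score))
    return pairs


pool = ["g1", "g2", "g3", "g4", "g5"]
host_profiles = {"h1": build_profile({"g1", "g2", "g4"}, pool)}
pathogen_profiles = {
    "p1": build_profile({"g1", "g2"}, pool),
    "p2": build_profile({"g3", "g5"}, pool),
}
predicted_pairs = predict_pairs(host_profiles, pathogen_profiles)
```

Here `h1` and `p1` share two of the three genomes in which either is present, so they pass the default threshold, while `h1` and `p2` share none.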
The genome pools used in this service were generated from the UniProt reference proteomes set (ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/) using the species in the Bioconductor Org.db annotation packages plus other model species added manually. The complete list of species present in the pools can be found in the FAQ.
BC18: Bioconductor 18, the pool of genomes built from the 18 annotated Bioconductor Org.db species.
UP82: UniProt 82, the pool of genomes built from model UniProt proteomes.
ProtPhylo490: ProtPhylo 490, the pool of genomes built from the original ProtPhylo pool of proteomes.
There is a problem with the input provided.
Please check the email (you must provide a valid email).
Please check your minimum identity percentage (valid number from 1 to 100).
Please check your minimum coverage percentage (valid number from 1 to 100).
Please check your evaluePhylo (valid number).
Please check your Threshold value (you must provide a number from 0 to 1).
You have successfully submitted a query to the Phylo-profiling Host-pathogen interaction prediction service at PredHPI.