TY - JOUR
T1 - Predicting Binding within Disordered Protein Regions to Structurally Characterised Peptide-Binding Domains
AU - Khan, Waqasuddin
AU - Duffy, Fergal
AU - Pollastri, Gianluca
AU - Shields, Denis C.
AU - Mooney, Catherine
N1 - Funding Information: The authors acknowledge the Research IT Service at University College Dublin and the SFI/HEA Irish Centre for High-End Computing (ICHEC) for providing high performance computing (HPC) resources that have contributed to the research results reported within this paper. The authors thank Kevin Rue for technical assistance. W. Khan gratefully acknowledges the award of a European Commission (EC) Erasmus Mundus Europe Asia (EMEA) Split-Doctoral Scholarship Scheme in University College Dublin (UCD), Ireland.
PY - 2013/9/3
Y1 - 2013/9/3
N2 - Disordered regions of proteins often bind to structured domains, mediating interactions within and between proteins. However, it is difficult to identify a priori the short disordered regions involved in binding. We set out to determine if docking such peptide regions to peptide binding domains would assist in these predictions. We assembled a redundancy-reduced dataset of SLiM (Short Linear Motif)-containing proteins from the ELM database. We selected 84 sequences which had an associated PDB structure showing the SLiM bound to a protein receptor, where the SLiM was found within a 50-residue region of the protein sequence which was predicted to be disordered. First, we investigated the Vina docking scores of overlapping tripeptides from the 50-residue SLiM-containing disordered regions of the protein sequence to the corresponding PDB domain. We found only weak discrimination of docking scores between peptides involved in binding and adjacent non-binding peptides in this context (AUC 0.58). Next, we trained a bidirectional recurrent neural network (BRNN) using as input the protein sequence, predicted secondary structure, Vina docking score and predicted disorder score. The results were very promising (AUC 0.72) showing that multiple sources of information can be combined to produce results which are clearly superior to any single source. We conclude that the Vina docking score alone has only modest power to define the location of a peptide within a larger protein region known to contain it. However, combining this information with other knowledge (using machine learning methods) clearly improves the identification of peptide binding regions within a protein sequence. This approach combining docking with machine learning is primarily a predictor of binding to peptide-binding sites, and is not intended as a predictor of specificity of binding to particular receptors.
AB - Disordered regions of proteins often bind to structured domains, mediating interactions within and between proteins. However, it is difficult to identify a priori the short disordered regions involved in binding. We set out to determine if docking such peptide regions to peptide binding domains would assist in these predictions. We assembled a redundancy-reduced dataset of SLiM (Short Linear Motif)-containing proteins from the ELM database. We selected 84 sequences which had an associated PDB structure showing the SLiM bound to a protein receptor, where the SLiM was found within a 50-residue region of the protein sequence which was predicted to be disordered. First, we investigated the Vina docking scores of overlapping tripeptides from the 50-residue SLiM-containing disordered regions of the protein sequence to the corresponding PDB domain. We found only weak discrimination of docking scores between peptides involved in binding and adjacent non-binding peptides in this context (AUC 0.58). Next, we trained a bidirectional recurrent neural network (BRNN) using as input the protein sequence, predicted secondary structure, Vina docking score and predicted disorder score. The results were very promising (AUC 0.72) showing that multiple sources of information can be combined to produce results which are clearly superior to any single source. We conclude that the Vina docking score alone has only modest power to define the location of a peptide within a larger protein region known to contain it. However, combining this information with other knowledge (using machine learning methods) clearly improves the identification of peptide binding regions within a protein sequence. This approach combining docking with machine learning is primarily a predictor of binding to peptide-binding sites, and is not intended as a predictor of specificity of binding to particular receptors.
UR - http://www.scopus.com/inward/record.url?scp=84883418471&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0072838
DO - 10.1371/journal.pone.0072838
M3 - Article
C2 - 24019881
AN - SCOPUS:84883418471
SN - 1932-6203
VL - 8
JO - PLoS ONE
JF - PLoS ONE
IS - 9
M1 - e72838
ER -