UniDL4BioPep: A universal deep learning architecture for bioactive peptide prediction

The webserver is the implementation of the paper "Du, Z., Ding, X., Xu, Y., & Li, Y. (2023). UniDL4BioPep: a universal deep learning architecture for binary classification in peptide bioactivity. Briefings in Bioinformatics, bbad135."

Notice: To process very large datasets, please download our model and run it locally, or contact us at zhenjiao@ksu.edu or yonghui@ksu.edu for further assistance.

Quick output version: 1. Choose a model → 2. Input a peptide sequence

Large-scale output version: 1. Prepare your files (xls, xlsx, fasta, or txt) and click “Choose File” for uploading → 2. Choose one or multiple models → 3. Download the results.

Usage of the webserver:

Example for “Quick output version”:

1. Select the “Antihypertensive” model for antihypertensive activity prediction → 2. Enter a peptide or protein sequence, e.g. “VPP” → 3. Click “Run” → 4. The result is returned within seconds below the “Run” button.

Notice: Multiple sequences can be submitted at the same time. Enter them separated by commas with no spaces, e.g. “VPP,IPP,CCL,AGR”.
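If the sequences come from a script or a spreadsheet column, the comma-separated string described above can be assembled programmatically. The following is a minimal sketch, not part of the webserver: the function name `format_quick_input` is hypothetical, and the restriction to the 20 standard amino acid letters is an assumption rather than a documented rule of the server.

```python
# The 20 standard amino acid one-letter codes.
# Assumption: the server expects only these letters; this is not stated in the README.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def format_quick_input(sequences):
    """Join peptide sequences into one comma-separated string with no spaces."""
    cleaned = []
    for seq in sequences:
        seq = seq.strip().upper()  # drop surrounding whitespace, normalise case
        if not seq:
            continue  # skip empty entries
        invalid = set(seq) - STANDARD_AA
        if invalid:
            raise ValueError(
                f"{seq!r} contains non-standard residues: {sorted(invalid)}"
            )
        cleaned.append(seq)
    return ",".join(cleaned)


print(format_quick_input(["VPP", " ipp", "CCL", "AGR "]))  # VPP,IPP,CCL,AGR
```

The returned string can be pasted directly into the sequence box of the quick output version.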

Example for “Large-scale output version”:

1. Prepare your xls, xlsx, txt, or fasta file → 2. Upload the file with the “Choose File” button → 3. Select one or several models → 4. Click “Run” → 5. Your results will be downloaded automatically.

Notice: Prepare your files following the examples in this repository: https://github.com/dzjxzyd/UniDL4BioPep_webserver/tree/main/Example%20uploading%20files
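For the fasta option, an upload file can be generated from a list of peptides. The sketch below writes standard FASTA records (a “>” header line followed by the sequence). The helper names `to_fasta` and `write_fasta` and the header prefix `peptide` are hypothetical, and the exact layout the server expects should be checked against the example files in the repository linked above.

```python
def to_fasta(sequences, prefix="peptide"):
    """Return FASTA text: one '>' header line and one sequence line per record."""
    lines = []
    for i, seq in enumerate(sequences, start=1):
        lines.append(f">{prefix}_{i}")  # hypothetical header naming scheme
        lines.append(seq.strip().upper())
    return "\n".join(lines) + "\n"


def write_fasta(path, sequences, prefix="peptide"):
    """Write the FASTA text for the given sequences to a file."""
    with open(path, "w") as handle:
        handle.write(to_fasta(sequences, prefix))


print(to_fasta(["VPP", "IPP", "CCL", "AGR"]), end="")
```

Calling `write_fasta("peptides.fasta", ["VPP", "IPP"])` would produce a file that can then be selected with “Choose File”.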

Detailed explanation of the activity abbreviations

Antihypertensive: angiotensin-converting enzyme (ACE) inhibitory activity (main target for hypertension)
DPPIV: dipeptidyl peptidase IV inhibitory activity (main target for diabetes)
AMP: antimicrobial activity
AMAP: antimalarial activity (“main” and “alternative” correspond to the two datasets)
QS: quorum-sensing activity
ACP: anticancer activity
MRSA: anti-methicillin-resistant S. aureus strains activity
TTCA: tumor T cell antigens
BBP: blood-brain barrier peptide
APP: anti-parasitic activity
FL: indicates the model version trained with focal loss as the loss function. We generally recommend the FL version when it is available. For balanced datasets we do not use FL when building the models, but you can try it using the template tutorial on our GitHub.
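To illustrate what the FL versions change, here is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), next to plain binary cross-entropy. The defaults `gamma=2.0` and `alpha=0.25` are the commonly cited values from the original focal loss paper and are an assumption here; they are not necessarily the settings used to train the UniDL4BioPep models.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Cross-entropy for one example: -log(p_t)."""
    p = min(max(p_pred, eps), 1.0 - eps)  # clip to avoid log(0)
    p_t = p if y_true == 1 else 1.0 - p   # probability assigned to the true class
    return -math.log(p_t)


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p = min(max(p_pred, eps), 1.0 - eps)
    p_t = p if y_true == 1 else 1.0 - p
    alpha_t = alpha if y_true == 1 else 1.0 - alpha  # class weighting term
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Compare an easy positive (p = 0.95) with a hard positive (p = 0.30).
for p in (0.95, 0.30):
    print(p, binary_cross_entropy(1, p), binary_focal_loss(1, p))
```

The factor (1 - p_t)^gamma shrinks the contribution of well-classified examples, so training focuses on the hard ones, which is why focal loss tends to help on imbalanced datasets. With `gamma=0` and `alpha=1` the positive-class loss reduces to ordinary cross-entropy.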

The whole model architecture

[Figure: whole model architecture]

Datasets for the models

| Bioactivity | Training dataset | Test dataset |
| --- | --- | --- |
| ACE inhibitory activity | 913 positives and 913 negatives | 386 positives and 386 negatives |
| DPP IV inhibitory activity | 532 positives and 532 negatives | 133 positives and 133 negatives |
| Bitter | 256 positives and 256 negatives | 64 positives and 64 negatives |
| Umami | 112 positives and 241 negatives | 28 positives and 61 negatives |
| Antimicrobial activity | 3876 positives and 9552 negatives | 2584 positives and 6369 negatives |
| Antimalarial activity | Main dataset (111 positives and 1708 negatives); alternative dataset (111 positives and 542 negatives) | Main dataset (28 positives and 427 negatives); alternative dataset (28 positives and 135 negatives) |
| Quorum sensing activity | 200 positives and 200 negatives | 20 positives and 20 negatives |
| Anticancer activity | Main dataset (689 positives and 689 negatives); alternative dataset (776 positives and 776 negatives) | Main dataset (172 positives and 172 negatives); alternative dataset (194 positives and 194 negatives) |
| Anti-MRSA strains activity | 118 positives and 678 negatives | 30 positives and 169 negatives |
| Tumor T cell antigens | 470 positives and 318 negatives | 122 positives and 75 negatives |
| Blood-Brain Barrier | 100 positives and 100 negatives | 19 positives and 19 negatives |
| Anti-parasitic activity | 255 positives and 255 negatives | 46 positives and 46 negatives |
| Neuropeptide | 1940 positives and 1940 negatives | 485 positives and 485 negatives |
| Antibacterial activity | 6583 positives and 6583 negatives | 1695 positives and 1695 negatives |
| Antifungal activity | 778 positives and 778 negatives | 215 positives and 215 negatives |
| Antiviral activity | 2321 positives and 2321 negatives | 623 positives and 623 negatives |
| Toxicity | 1642 positives and 1642 negatives | 290 positives and 290 negatives |
| Antioxidant activity | 582 positives and 541 negatives | 146 positives and 135 negatives |