Volunteer your browser for antibiotic discovery. Your computer runs ESM-2, a 150-million-parameter protein language model, to analyze peptide sequences for antimicrobial activity. Nothing to install. Every batch you complete is saved permanently, even if you close the tab early. The model downloads once (~566MB) and is cached in your browser for future sessions.

how it works

01. generate

The server generates novel peptide sequences using guided mutation and recombination of known antimicrobial scaffolds. Each sequence is a unique candidate that has never been tested.
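
As a rough sketch of this step (the scaffolds, mutation rate, and single-point crossover below are illustrative assumptions, not the server's actual algorithm):

```ts
// Illustrative sketch of scaffold-guided generation. The scaffolds here
// are two well-known AMPs (magainin-2 and cecropin A); the mutation rate
// and crossover scheme are assumptions for demonstration only.
const AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY";

// Point-mutate a scaffold: each residue has a small chance of being
// replaced with a random amino acid.
function mutate(scaffold: string, rate = 0.1): string {
  return [...scaffold]
    .map((aa) =>
      Math.random() < rate
        ? AMINO_ACIDS[Math.floor(Math.random() * AMINO_ACIDS.length)]
        : aa
    )
    .join("");
}

// Recombine two scaffolds at a random crossover point.
function recombine(a: string, b: string): string {
  const shorter = Math.min(a.length, b.length);
  const cut = 1 + Math.floor(Math.random() * (shorter - 1));
  return a.slice(0, cut) + b.slice(cut);
}

const magainin2 = "GIGKFLHSAKKFGKAFVGEIMNS";
const cecropinA = "KWKLFKKIEKVGQNIRDGIIKAGPAVAVVGQATQIAK";
const candidate = mutate(recombine(magainin2, cecropinA));
```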

02. analyze

Your browser runs ESM-2 (150M parameters, trained on 250M protein sequences) to compute a 640-dimensional embedding vector for each candidate. This captures structural and functional properties.
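
Concretely, this step can be done with Transformers.js, the library that serves the Xenova/esm2_t30_150M_UR50D weights listed under data sources; the mean-pooling choice below is an assumption:

```ts
import { pipeline } from "@xenova/transformers";

// First visit downloads and caches the weights; later visits load
// from the browser cache.
const extractor = await pipeline(
  "feature-extraction",
  "Xenova/esm2_t30_150M_UR50D"
);

// One embedding per candidate. Mean-pooling over residue positions
// (an assumption) collapses the output to a single 640-dim vector.
async function embed(sequence: string): Promise<Float32Array> {
  const output = await extractor(sequence, { pooling: "mean" });
  return output.data as Float32Array;
}

const vec = await embed("GIGKFLHSAKKFGKAFVGEIMNS");
console.log(vec.length); // 640
```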

03. classify

A classifier trained on 453 known antimicrobial peptides and 453 non-antimicrobial controls scores each candidate. High-scoring sequences are flagged as potential new antibiotics for experimental validation.
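
The scoring step itself is lightweight; a logistic-regression-style scorer over the 640-dimensional embedding would look roughly like this (the architecture and threshold are assumptions, shown only to make the step concrete):

```ts
// Hypothetical linear classifier over ESM-2 embeddings. The project's
// actual classifier and flagging threshold may differ.
interface AmpClassifier {
  weights: Float32Array; // 640 learned coefficients
  bias: number;
}

// sigmoid(w · x + b), read as the predicted probability of
// antimicrobial activity.
function score(clf: AmpClassifier, embedding: Float32Array): number {
  let z = clf.bias;
  for (let i = 0; i < clf.weights.length; i++) {
    z += clf.weights[i] * embedding[i];
  }
  return 1 / (1 + Math.exp(-z));
}

// Sequences above the threshold get flagged for experimental follow-up.
const isCandidate = (clf: AmpClassifier, emb: Float32Array): boolean =>
  score(clf, emb) > 0.9; // 0.9 is illustrative
```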


what makes a good antimicrobial peptide

Antimicrobial peptides (AMPs) kill bacteria by disrupting their cell membranes. The ideal candidate has:

- a net positive charge, which draws it to the negatively charged bacterial membrane
- an amphipathic structure, with hydrophobic and hydrophilic faces that let it embed in the lipid bilayer
- a short length, typically 10 to 50 residues, which keeps synthesis practical

Only about 3,500 AMPs are currently known. The theoretical sequence space for a 20-residue peptide is 20²⁰, roughly 10²⁶ possible combinations. Exhaustive screening is impossible. Distributed computing with ML-guided exploration is how we navigate this space.
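
The arithmetic behind that estimate:

```ts
// 20 possible residues at each of 20 positions.
const combinations = Math.pow(20, 20);        // ≈ 1.05 × 10^26
const orderOfMagnitude = 20 * Math.log10(20); // ≈ 26.02
```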


why this matters

Antibiotic resistance is one of the most urgent threats to global health. Drug-resistant infections kill 1.27 million people annually. By 2050, projections estimate 10 million deaths per year. Traditional antibiotic development has largely stalled because it is not profitable enough for pharmaceutical companies.

AMPs are promising because bacteria have difficulty developing resistance to them. Unlike traditional antibiotics, which target specific proteins that a single mutation can change, AMPs attack the physical structure of the bacterial membrane itself. Evading them would require the bacterium to remodel its membrane, a far more costly change, so resistance is much slower to evolve.


the model

ESM-2 is a protein language model developed by Meta's Fundamental AI Research team. It was trained on 250 million protein sequences from UniRef and published in Science (2023). The model learns structural and functional properties of proteins directly from sequence data.

We use the 150M parameter variant (esm2_t30_150M_UR50D), which produces 640-dimensional embedding vectors. This is small enough to run in a browser via WebAssembly but large enough to capture meaningful biological signal. The embeddings feed into a classifier trained on validated AMPs from the APD3 database.

Citation: Lin, Z. et al. "Evolutionary-scale prediction of atomic-level protein structure with a language model." Science 379.6637 (2023).


transparency and disclosures

What runs on your computer: ESM-2 150M (open source, by Meta AI) runs in your browser via WebAssembly. It performs mathematical operations on amino acid sequences. No other software is installed or executed.

Resource usage: The model uses CPU cycles and approximately 400-800MB of RAM. Your browser tab will use moderate to heavy CPU while running. Close the tab at any time to stop. Your work up to that point is saved.

What data is sent: Only peptide sequences and their computed embedding vectors (arrays of numbers). No personal data, browsing history, cookies, IP addresses, or device information is collected or stored.
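
As a sketch, an uploaded batch might have a shape like this (field names are hypothetical; the point is that the payload contains only sequences and numbers):

```ts
// Hypothetical shape of a completed batch upload.
interface BatchResult {
  batchId: string; // server-issued batch identifier
  results: {
    sequence: string;    // amino acid string
    embedding: number[]; // 640 floats from ESM-2
    score: number;       // classifier output in [0, 1]
  }[];
}
```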

Open source: All code, data, and methods are publicly available. No patents will be filed on volunteer-discovered candidates. Results will be published openly on bioRxiv.

Limitations: All results are computational predictions that have not been experimentally validated. "Candidate" means a sequence with promising predicted properties, not a confirmed antimicrobial agent. Experimental validation by a qualified laboratory is required before any medical or therapeutic claims can be made.

Research status: This is an independent citizen science project. It is not affiliated with any university, pharmaceutical company, or government agency. It has not undergone institutional peer review.

data sources

APD3 (Antimicrobial Peptide Database, University of Nebraska Medical Center): 453 validated antimicrobial peptides used as positive training data. Wang et al., Nucleic Acids Research (2016).

UniProt/Swiss-Prot: Reviewed non-antimicrobial peptides used as negative training data. Sequences explicitly lacking antimicrobial, antibiotic, defensin, or bacteriocin annotations.

ESM-2: Pre-trained protein language model. Lin et al., Science (2023). Weights from Hugging Face (Xenova/esm2_t30_150M_UR50D).