Harnessing the power of peptides

26 May 2025
Maxwell Tabarrok argues that the creation of a new million-peptide database could help beat antibiotic resistance
Komodo dragons, the world’s largest lizards, eat carrion and live on Indonesian islands. Their saliva hosts many of the world’s most stubborn and infectious bacteria. However, Komodos almost never get infections¹. Even when they have open wounds, Komodo dragons can trudge happily along through rotting corpses and mud without worry.
Their resilience is partially due to an arsenal of antimicrobial peptides in their blood. Peptides such as these are an important potential solution in the fight against antimicrobial resistance, which now causes over a million deaths a year globally – and many more acute, chronic or repeated illnesses – and on its current trajectory threatens to upend centuries of progress in medicine in the decades ahead.
Peptides are highly promising candidates for fighting pathogens, including drug-resistant ones, for two reasons. First, they can resist resistance. While antibiotics often target very narrow biochemical reaction pathways of a pathogen, or particular proteins found in its cytoplasm, peptides target general properties of a bacterium’s entire membrane, such as charge or lipid composition. While a pathogen may be able to change one residue in a target protein, it is much harder to change the electric charge over its entire surface. This general targeting has enabled antimicrobial peptides to be effective first defences against pathogens, in all classes of life, for millions of years².
Peptides also show in vitro activity against some of the toughest targets in medicine, including antibiotic-resistant bacteria such as MRSA³, HIV⁴, fungal infections⁵ and even cancer⁶. But they are still not common on pharmacy shelves or in hospital treatments. Some current clinical trials⁷ may change this, but the main barrier is still in the fundamental research.
Second, they are easy to programme and synthesise. Their properties and structure are a result of short amino acid chains, so it is relatively easy to work with them computationally and apply bioinformatics or machine learning to programme desired properties. It is also fairly simple to manufacture and test peptides⁸. This is in contrast to small molecule antibiotic manufacturing, where figuring out how to synthesise a particular chemical can take years of trial and error – and making that synthesis efficient can take even longer.
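To give a sense of how little code it takes to start working with peptides computationally, here is a minimal sketch that derives a few crude features from a sequence string. The residue groupings, the simple charge rule (lysine and arginine count as +1, aspartate and glutamate as -1) and the example sequence are all simplifying assumptions made for illustration; they are not the features used in any of the studies cited here.

```python
# Illustrative only: crude sequence-level features for a peptide.
# The residue sets and the charge rule below are simplifying assumptions.

POSITIVE = set("KR")           # lysine, arginine: positively charged near neutral pH
NEGATIVE = set("DE")           # aspartate, glutamate: negatively charged near neutral pH
HYDROPHOBIC = set("AILMFVW")   # one simplified choice of hydrophobic residues


def peptide_features(sequence: str) -> dict:
    """Return length, approximate net charge and hydrophobic fraction."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    positive = sum(residue in POSITIVE for residue in seq)
    negative = sum(residue in NEGATIVE for residue in seq)
    hydrophobic = sum(residue in HYDROPHOBIC for residue in seq)
    return {
        "length": len(seq),
        "net_charge": positive - negative,
        "hydrophobic_fraction": hydrophobic / len(seq),
    }


# A made-up sequence, used purely as an example input
print(peptide_features("GKRLLKKAVDE"))
```

Real models use far richer representations, but the point stands: a peptide is a short string over a 20-letter alphabet, which makes it unusually convenient input for machine learning.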
The data problem
Researchers at the frontier of this field use machine learning to predict new sequences that will have antimicrobial properties, or to classify known sequences⁹. This field of research is highly promising, but progress is not fast enough to meet the challenge of antibiotic resistance. It is moving so slowly due to one main constraint: data.
Google DeepMind’s AlphaFold protein structure prediction system was trained on just over 100,000 3D protein structures from the Protein Data Bank. The company’s board-game model, AlphaGo, was trained on 30 million moves from human games and orders of magnitude more moves from games it played against itself. The largest language models are trained on at least 60 terabytes of text.
The data available for antimicrobial peptides is nowhere near these benchmarks. Some databases contain a few thousand peptides each, but they are scattered, unstandardised, incomplete and frequently duplicative. Data on a few thousand peptide sequences and a scattershot view of their biological properties is simply not sufficient to get accurate machine learning predictions for a system as complex as protein-chemical reactions.
The Antimicrobial Peptide Database (APD3) is small, with just under 4,000 sequences, but is among the most tightly curated and detailed. However, most of the sequences available are from amphibians.
Another database, CAMPR4, has around 20,000 sequences, but roughly half are ‘predicted’ or synthetic peptides that may not have experimental validation, and therefore contain less information about their source and activity.

The formatting of each of these sources is different, so it’s not easy to put all the sequences into one model. More inconsistencies and idiosyncrasies stack up for the dozens of other data sets that are available.
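As a rough illustration of what standardising involves, the sketch below merges records from two differently formatted sources into one table keyed by a cleaned-up sequence. The database names, metadata fields and sequences are hypothetical and do not reflect the real export formats of APD3, CAMPR4 or any other database.

```python
# Hypothetical example: names, fields and sequences are invented for illustration.

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def normalise(sequence):
    """Remove whitespace and upper-case; return None if non-standard letters remain."""
    seq = "".join(sequence.split()).upper()
    if seq and set(seq) <= VALID_RESIDUES:
        return seq
    return None


def merge_sources(sources):
    """Merge {database name: [(raw sequence, metadata dict), ...]} into one
    dict keyed by cleaned sequence, recording which databases list each peptide."""
    merged = {}
    for name, records in sources.items():
        for raw_sequence, metadata in records:
            seq = normalise(raw_sequence)
            if seq is None:
                continue  # a real pipeline would set these aside for manual review
            entry = merged.setdefault(
                seq, {"sequence": seq, "databases": [], "metadata": {}}
            )
            if name not in entry["databases"]:
                entry["databases"].append(name)
            entry["metadata"].update(metadata)
    return merged


sources = {
    "db_a": [("gllkkllkk", {"origin": "amphibian"}), ("KKLL KKLL", {"origin": "synthetic"})],
    "db_b": [("GLLKKLLKK", {"validated": True}), ("KWKXLF", {"validated": False})],
}
merged = merge_sources(sources)
print(len(merged), "unique sequences")
```

Even this toy version has to make judgment calls, such as what to do with non-standard residues or conflicting metadata, which is part of why the work is better done once, centrally, than repeated by every lab.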
There is even less negative training data – that is, data on all the amino acid sequences without interesting publishable properties. Labs will test dozens or even hundreds of peptide sequences for activity against certain pathogens, but usually only publish and upload the sequences that worked. Training a model without this data makes it extremely difficult to avoid false positive predictions.
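A small calculation shows why false positives matter so much here. If only a small share of candidate sequences is truly active, even a modest false positive rate means most predicted hits will fail in the lab, and that rate can only be measured and driven down with confirmed inactive sequences. The numbers below are illustrative assumptions, not measurements from any study.

```python
def expected_precision(prevalence, sensitivity, false_positive_rate):
    """Fraction of predicted hits expected to be genuinely active (Bayes' rule)."""
    true_hits = prevalence * sensitivity
    false_hits = (1 - prevalence) * false_positive_rate
    return true_hits / (true_hits + false_hits)


# Assumed for illustration: 1% of candidates are active and the model finds 90% of them
for fpr in (0.20, 0.05, 0.01):
    precision = expected_precision(
        prevalence=0.01, sensitivity=0.9, false_positive_rate=fpr
    )
    print(f"false positive rate {fpr:.0%}: {precision:.0%} of predicted hits are real")
```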
Million-peptide solution
The problem can be solved with an investment in public data production. A massive, standardised and detailed data set of one million peptide sequences and their antimicrobial properties (or lack thereof) would accelerate progress toward new drugs that are effective against antibiotic-resistant pathogens.
There are no significant scientific barriers to generating a data set 1,000 times or 10,000 times larger than those that exist.
Several high-throughput testing methods¹⁰ have been successfully demonstrated, with some screening as many as 800,000 peptide sequences¹¹ and nearly doubling the number of unique antimicrobial peptides reported in publicly available data sets.
These methods will need to be scaled up, not only by testing more peptides, but also by testing them against different bacteria, checking for human toxicity and testing other chemical properties.
The idea of targeted data infrastructure investments has three successful precedents: PubChem, the Human Genome Project (HGP) and ProteinDB. With an annual budget of $3m, PubChem exceeded the size of the leading private molecule database by around 10,000 times in 2011 and made the data free. It’s credited with supporting a renaissance in machine learning for chemistry¹².
ProteinDB, at an estimated cost of $764m, has become the primary repository for protein structure discoveries and, like the Human Genome Project, was paired with a large data generation programme, the Protein Structure Initiative (PSI). The hundreds of thousands of detailed 3D protein structures in the databank became the essential training data behind the success of AlphaFold¹³.
Making it happen
To create our million-peptide database, we would need to start by merging and standardising existing peptide data sets. Organising this data once and for all, opening it to all interested researchers and establishing a central repository for all future peptide sequence discoveries would save thousands of hours of researcher time.
Collecting existing data won’t be nearly enough, though. The next step would be to industrialise peptide testing. Mass-produced protein synthesis and testing are already well-established techniques in the field, so such a project won’t need the remarkable advances in technology seen over the course of the HGP. Scientific funders need only support scaling up of existing techniques.
Researchers can already test tens or hundreds of thousands of peptides simultaneously¹⁰,¹¹. Industrialising peptide testing is more complicated than the demonstrations in these papers, however, because we need to screen for lots of variables in addition to a single measure of antimicrobial activity. We want to know about the peptide’s activity against a broad range of bacteria, viruses, fungi and cancer cells; we want to know about the peptide’s effects on benign human cells or beneficial bacteria so it doesn’t do too much collateral damage; and we want to know about the peptides that failed to have any interesting effects, so our machine learning models know what to avoid.
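One way to picture the kind of record such a database would need is sketched below. The field names, the use of a minimum inhibitory concentration as the activity measure and the example values are assumptions made for illustration, not a proposed standard.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PeptideRecord:
    """Hypothetical schema for one tested peptide."""
    sequence: str
    # Target organism or cell line -> minimum inhibitory concentration (µg/mL).
    # None means the peptide was tested and no activity was seen.
    activity: Dict[str, Optional[float]] = field(default_factory=dict)
    # Human cell type or beneficial bacterium -> concentration at which harm was seen.
    # None means it was tested and no harm was seen.
    toxicity: Dict[str, Optional[float]] = field(default_factory=dict)

    def is_negative(self) -> bool:
        """True if the peptide was tested and showed no activity against any target."""
        return bool(self.activity) and all(v is None for v in self.activity.values())


# Made-up example values
hit = PeptideRecord("GLLKKLLKK", activity={"E. coli": 8.0, "S. aureus": None})
dud = PeptideRecord("GSGSGSGS", activity={"E. coli": None, "S. aureus": None})
print(hit.is_negative(), dud.is_negative())
```

Storing "tested, nothing found" explicitly, rather than leaving the entry out, is what lets the failed peptides serve as negative training data.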
By my calculations¹⁴ this entire project could be completed in five years for around $350m.
In contrast, the direct treatment cost for just six drug-resistant infections is around $4.6bn annually in the US¹⁵, with a far greater cost coming from the excess mortality and damaged health. A single concentrated effort over several years would lay a foundation for a renaissance in antimicrobial peptide research, as PubChem, the HGP and ProteinDB did for their respective fields.
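The comparison can be made concrete with a few lines of arithmetic. The inputs are the figures quoted above; the per-peptide figure is a naive division of the whole budget by one million sequences and ignores how the money would actually be split between infrastructure and testing.

```python
project_cost = 350e6            # total project cost in dollars, from the estimate above
project_years = 5
annual_treatment_cost = 4.6e9   # US direct treatment cost for six infections, per year
target_peptides = 1_000_000

annual_project_cost = project_cost / project_years
share_of_treatment_cost = annual_project_cost / annual_treatment_cost
cost_per_peptide = project_cost / target_peptides

print(f"Project cost per year: ${annual_project_cost / 1e6:.0f}m")
print(f"Share of annual US treatment cost: {share_of_treatment_cost:.1%}")
print(f"Naive cost per peptide: ${cost_per_peptide:.0f}")
```

On these figures the project would cost about $70m a year, roughly 1.5 per cent of what the US already spends each year treating those six infections.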
No time to wait
Infectious diseases that harried humanity for millennia are regaining strength as antibiotic resistance spreads across the globe. Every year antibiotic-resistant infections claim more than 1.2 million lives worldwide. Peptides – found in everything from Komodo dragon blood to human saliva – have been nature’s first line of defence against these infections for millions of years. We can learn from and improve on nature’s example, making effective new treatments for some of the world’s deadliest and most intransigent diseases.
Peptide antimicrobials might even exceed the effectiveness and versatility of antibiotics. They are just short proteins, the machinery of all living things, and often demonstrate other useful properties against viruses, fungal infections and cancer. Once we figure out how the properties of a peptide change as we substitute different amino acid building blocks, we might be able to design, test and mass-manufacture new treatments within a matter of weeks, rather than the decades it takes for new antibiotics to come to market.
Targeted approach
The path towards this future is clear. Computational prediction from amino acid sequences is a promising and tractable way to advance our understanding of, and control over, the properties of antimicrobial peptides. The most difficult scientific bottlenecks for this strategy have already been cleared.
We can meet this challenge and solve it quickly if we target our resources at building open-data infrastructure that thousands of research projects will use. Let’s not wait while antibiotic-resistant pathogens get stronger.
Maxwell Tabarrok is a pre-doctoral researcher at Dartmouth College, New Hampshire, US, studying the economics of science.
1) Chung, E.M.C. et al. npj Biofilms Microbiomes 3, 9 (2017).
2) Wang, G. Methods Mol. Biol. 1268, 43–66 (2015).
3) Menousek, J. et al. Int. J. Antimicrob. Agents 39(5), 402–406 (2012).
4) Qureshi, A. et al. PLoS One 8(1), e54908 (2013).
5) Agrawal, P. et al. 26(9), 323 (2018).
6) Beheshtirouy, S. et al. Curr. Protein Pept. Sci. 22(1), 74–88 (2021).
7) Intratumoral Injections of LL37 for Melanoma. National Library of Medicine, 2021. clinicaltrials.gov/study/NCT02225366#more-information
8) Johnston, S. A. et al. Sci. Rep. 7, 17610 (2017).
9) Wan, F. et al. Nat. Rev. Bioeng. 2(5), 392–407 (2024).
10) Koch, P. et al. Sci. Rep. 12, 4097 (2022).
11) Tucker, A.T. et al. Cell 172(3), 618–628 (2018).
12) Baskin, I. et al. Expert Opin. Drug Discov. 11(8), 785–795 (2016).
13) Jumper, J. et al. Nature 596, 583–589 (2021).
14) How scientific incentives stalled the fight against antibiotic resistance, and how we can fix it. Institute for Progress (2025).
15) CDC partners estimate healthcare cost of antimicrobial-resistant infections. CDC (2021).