What does Evo 2 AI do?
An AI trained on genomic language
Evo 2 is a new biological foundation model, trained on DNA data at an unprecedented scale, that can read and predict genome features across the tree of life. Built by exposing a large neural network to trillions of base pairs, the system learns statistical patterns that link sequence to function, for example where genes, splice sites and regulatory elements sit in a genome. Because it was trained on genomes from many kinds of organisms, the model can generalize: it recognizes conserved signals across bacteria, plants and animals, and it can propose short, plausible DNA sequences in contexts where biology follows common rules.
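To make "learning statistical patterns from sequence" concrete, here is a deliberately tiny sketch in Python. It is not Evo 2's architecture or API: it assumes a simple k-mer counting model as a stand-in for a large neural network, and the function names (`train`, `log_likelihood`, `generate`) and the toy sequences are hypothetical, invented for illustration. The idea it shares with a genomic language model is only the general one: estimate how likely each base is given the bases before it, then use those estimates to score or extend a sequence.

```python
import math
import random
from collections import defaultdict

ALPHABET = "ACGT"

def train(sequences, k=3):
    """Count which base follows each k-base context in the training data."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1
    return counts

def next_base_probs(counts, context, pseudocount=1.0):
    """Smoothed probability of each base given the preceding context."""
    seen = counts.get(context, {})
    total = sum(seen.get(b, 0) for b in ALPHABET) + pseudocount * len(ALPHABET)
    return {b: (seen.get(b, 0) + pseudocount) / total for b in ALPHABET}

def log_likelihood(counts, seq, k=3):
    """Sum the log-probability of each base given the k bases before it."""
    return sum(
        math.log(next_base_probs(counts, seq[i - k:i])[seq[i]])
        for i in range(k, len(seq))
    )

def generate(counts, seed, length, k=3, rng=None):
    """Extend a seed one base at a time by sampling from the learned probabilities."""
    rng = rng or random.Random(0)
    seq = seed
    while len(seq) < length:
        probs = next_base_probs(counts, seq[-k:])
        weights = [probs[b] for b in ALPHABET]
        seq += rng.choices(list(ALPHABET), weights=weights)[0]
    return seq

# Made-up toy sequences, not real genomic data.
training = ["ATGGCCATGGCCATGGCC", "ATGGCCATGGCTATGGCC"]
model = train(training)

typical = log_likelihood(model, "ATGGCCATGGCC")   # resembles the training data
unusual = log_likelihood(model, "ATGTTTTTTGCC")   # breaks the learned pattern
proposal = generate(model, "ATG", 12)             # a short sampled sequence
```

In this sketch the sequence that resembles the training data receives a higher log-likelihood than the one that breaks the pattern. A far larger model could use the same kind of signal to flag where a genome looks typical or unusual, though any such prediction would still need checking in the lab.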
Why that matters:
- It speeds annotation: regions that used to require laborious experiments can be prioritized by model predictions.
- It aids design: synthetic biology and protein engineering workflows can use the model to propose candidate sequences with desired properties.
- It opens discovery: pattern recognition at scale can highlight unusual or previously overlooked motifs that merit follow-up.
Practical and ethical caveats remain. The approach does not replace experiments: model outputs are probabilistic and must be validated in the lab. There are also biosecurity and governance implications because sequence-generation tools could, in principle, be misused to design harmful constructs. Responsible stewardship — including transparency about training data, limits on generation capabilities, and alignment with biosafety frameworks — will shape how quickly and widely the technology can be adopted.
In short, Evo 2 represents a major step in applying large-scale machine learning to genomic science: it promises to accelerate research, but it also raises practical, safety and policy questions that researchers and regulators will need to address.