After Proteins, Google's AI Reads Dark Matter in DNA

After AlphaFold, the artificial intelligence model capable of deciphering the 3D structure of proteins, which earned Demis Hassabis and John Jumper the 2024 Nobel Prize in Chemistry (shared with David Baker), Google DeepMind has presented a new AI model called AlphaGenome. It is designed to read the so-called 'dark matter' of DNA, that is, the set of genetic sequences that do not code for proteins but influence their activity. Long and wrongly branded as 'junk DNA,' these mysterious sequences make up the vast majority of human DNA, a full 98%. The AlphaGenome model is described in an article that has not yet been peer-reviewed.
"The coding part of our genome, made up of around 20,000 genes, is now well known," Giuseppe Novelli, a geneticist at the University of Rome Tor Vergata, told ANSA. "The rest, however, is extremely heterogeneous: one part is made up of repetitive DNA, another is made up of mobile elements that can change their position. In any case, these are still genes, estimated at 60,000 to 63,000, but they code for RNA (the single-stranded molecule related to DNA). Given their enormous quantity," Novelli stresses, "it is very important to have a tool like this, which can at least indicate which family they belong to."
AlphaGenome can read long sequences of DNA, up to 1 million letters, and make thousands of predictions about their role and the possible effects of any mutations. In one example, researchers led by Žiga Avsec tested the model on mutations identified in people with leukemia, and AlphaGenome accurately predicted that the mutations would indirectly activate a nearby gene considered one of the most common causes of this type of tumor. The new AI is still limited, however: it has been trained only on data from humans and mice, and it struggles when mutations alter genes located very far away.
"Given the great development that RNA-based drugs are having and will continue to have in the future," adds Novelli, "having a tool that allows us to predict what role these molecules play could also help us identify potential targets for future drugs more quickly."
ANSA