A new algorithm provides rich and detailed information about the location and function of proteins in a cell

Humans are good at looking at pictures and finding patterns or making comparisons. Look at a collection of dog photos, for example, and you can sort them by color, ear size, face shape, and more. But could you compare them quantitatively? And perhaps more intriguingly, could a machine extract meaningful information from images that humans cannot?

Now, a team of scientists at the Chan Zuckerberg Biohub has developed a machine learning method to quantitatively analyze and compare images – in this case, microscopy images of proteins – without any prior knowledge. As reported in Nature Methods, their algorithm, dubbed “cytoself,” provides rich and detailed information about the location and function of proteins in a cell. This capability could shorten research time for cell biologists and could eventually accelerate drug discovery and screening.

“It’s very exciting – we’re applying AI to a new kind of problem and still picking up everything humans know, and more. In the future, we might do this for different types of images. It opens up a lot of possibilities.”

Loïc Royer, co-corresponding author of the study

Cytoself not only demonstrates the power of machine learning algorithms; it has also generated insights into cells, the basic building blocks of life, and proteins, the molecular building blocks of cells. Each cell contains around 10,000 different types of protein, some working alone, many working together, performing various tasks in various parts of the cell to keep it healthy. “A cell is much more spatially organized than we previously thought. This is an important biological finding on how the human cell is wired,” said Manuel Leonetti, also a co-corresponding author of the study.

And like all the tools developed at CZ Biohub, cytoself is open source and accessible to everyone. “We hope this will inspire many people to use similar algorithms to solve their own image analysis problems,” Leonetti said.

No doctorate needed: machines can learn for themselves

Cytoself is an example of what is called self-supervised learning: unlike in supervised learning, humans do not teach the algorithm anything about the protein images. “In supervised learning, you have to teach the machine one by one with examples; it’s a lot of work and very tedious,” said Hirofumi Kobayashi, lead author of the study. And limiting the machine to the categories that humans teach it can introduce biases into the system.

“Manu [Leonetti] and I thought the information was already in the pictures,” Kobayashi said. “We wanted to see what the machine could figure out on its own.”

Indeed, the team, which also included CZ Biohub software engineer Keith Cheveralls, was surprised by the amount of information the algorithm was able to extract from the images.

“The degree of detail in protein localization was much higher than we would have thought,” said Leonetti, whose group develops tools and technologies to understand cellular architecture. “The machine turns each protein image into a mathematical vector. So you can start classifying images that look alike. We realized that by doing this we could predict, with high specificity, which proteins work together in the cell just by comparing their images, which was quite surprising.”
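The idea Leonetti describes, turning each image into a vector and then looking for vectors that resemble each other, can be illustrated with a small sketch. This is not cytoself's code: the protein names, the four-number vectors, and the choice of cosine similarity as the comparison measure are all assumptions made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(query, embeddings):
    """Rank every other protein by similarity to the query protein, best first."""
    scores = [
        (name, cosine_similarity(embeddings[query], vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical vectors standing in for image embeddings (made-up values).
embeddings = {
    "protein_A": [0.9, 0.1, 0.0, 0.2],
    "protein_B": [0.8, 0.2, 0.1, 0.3],
    "protein_C": [0.0, 0.9, 0.8, 0.1],
}

ranking = most_similar("protein_A", embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy example, protein_B ranks above protein_C because its vector points in nearly the same direction as protein_A's. With vectors derived from real images, proteins with high similarity would be candidates for sharing a location or working together, which is the kind of prediction described in the quote above.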

First of its kind

Although there has been previous work on protein images using self-supervised or unsupervised models, never before has self-supervised learning been used so successfully on such a large dataset: more than one million images covering more than 1,300 proteins measured in living human cells, said Kobayashi, an expert in machine learning and high-speed imaging.

The images were a product of CZ Biohub’s OpenCell, a project led by Leonetti to create a comprehensive map of the human cell, possibly including characterization of the approximately 20,000 types of proteins that power our cells. The first 1,310 proteins the team characterized were published earlier this year in Science, including images of each protein (produced using a type of fluorescent tag) and maps of their interactions with each other.

Cytoself has been key to the success of OpenCell (all images are available at opencell.czbiohub.org), providing very granular and quantitative information about protein localization.

“The question of what are all the possible ways for a protein to localize in a cell – all the places it can be and all sorts of combinations of places – is fundamental,” Royer said. “Biologists have tried to establish every possible place, over decades, and every possible structure within a cell. But it has always been done by humans looking at the data. The question is how much human limitations and biases have made this process imperfect.”

Royer added, “As we’ve shown, machines can do this better than humans. They can find finer categories and see distinctions in images that are extremely fine.”

The team’s next goal for cytoself is to track how small changes in protein localization can be used to recognize different cell states, for example, a normal cell versus a cancerous cell. This could hold the key to a better understanding of many diseases and facilitate drug discovery.

“Drug testing is basically trial and error,” Kobayashi said. “But with cytoself, it’s a big leap because you won’t need to experiment one-by-one with thousands of proteins. It’s a low-cost method that could dramatically increase research speed.”


Journal reference:

Kobayashi, H. et al. (2022) Self-supervised deep learning encodes high-resolution features of protein subcellular localization. Nature Methods. doi.org/10.1038/s41592-022-01541-z.
