Single-cell technologies have advanced greatly over the last few decades. The amount of single-cell
data being generated is growing exponentially across multiple areas and formats, such as high-content
imaging, flow or mass cytometry, and RNA sequencing. However, most current machine learning
techniques cannot cope with this volume of information; the most common approach is to
summarize the data from all the cells of a sample into a single vector of means or medians,
which discards cell-to-cell variation. In this project, we will tackle this problem from a
Multi-Instance Learning point of view, in order to extract the rich knowledge that such large
datasets are expected to contain.
We will develop parallel, scalable algorithms able to cope with the large volumes of single-cell
data that are currently being generated. We will also explore and develop new Deep Learning
approaches to address this problem.
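
The contrast between the summary-vector approach and a multi-instance view can be illustrated with a minimal sketch. Everything here is an assumption made for illustration only: the toy data, the function names (summarize, bag_score, cell_score) and the max-pooling rule are hypothetical and are not the methods the project will develop.

```python
from statistics import mean

def summarize(bag):
    """Collapse a bag of cells (feature vectors) into one vector of per-feature means."""
    return [mean(column) for column in zip(*bag)]

def bag_score(bag, cell_score):
    """Toy multi-instance rule (an assumption): a sample scores as high as its highest-scoring cell."""
    return max(cell_score(cell) for cell in bag)

def cell_score(cell):
    """Hypothetical per-cell score: the sum of the cell's features."""
    return sum(cell)

# Hypothetical samples with 1000 cells and 2 features per cell.
# The second sample contains 5 rare cells with strongly shifted features.
sample_a = [[0.0, 0.0] for _ in range(1000)]
sample_b = [[0.0, 0.0] for _ in range(995)] + [[6.0, 6.0] for _ in range(5)]

# The mean vectors of the two samples are almost identical ...
mean_a = summarize(sample_a)
mean_b = summarize(sample_b)

# ... while treating each sample as a bag of cells exposes the rare population.
score_a = bag_score(sample_a, cell_score)
score_b = bag_score(sample_b, cell_score)

print(mean_a, mean_b)
print(score_a, score_b)
```

In this toy setting the five rare cells shift each mean by only a few hundredths, whereas the bag-level score separates the two samples clearly. Max pooling is just one simple way to aggregate instances; learned aggregation functions are another option.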
The developed methods will be applied to the wide variety of single-cell datasets that are being,
and will be, generated by the host institution and other collaborating research facilities. We
expect this close collaboration to lead to novel biological results, which in turn can provide new
insights into cellular and disease mechanisms.