In the era of big data, analyzing and extracting knowledge from large-scale data sets is becoming a very challenging task. Applying standard data mining tools to such data sets is not straightforward. Hence, a new class of scalable mining methods that exploits the huge storage and processing capacities of cloud platforms is required.
In this research proposal, we will design highly scalable methods for preprocessing and data mining tasks in order to tackle big data problems in general and, in particular, those that can be framed within the semi-supervised multi-label learning context. This recent topic is attracting much attention in many real-world applications, such as bioinformatics, image classification, text mining, web mining, and speech recognition.
We will address the semi-supervised multi-label learning problem by using preprocessing techniques and new classification methods. After a careful study of the state of the art, we will develop new classification algorithms. Then, we will develop new feature selection/weighting algorithms as well as instance reduction techniques in order to tackle big data problems, overcoming the limited scalability of existing proposals by using cloud-based technologies.
As an application domain, we will focus on data mining problems in the context of bioinformatics.
The recent data explosion in this field requires the use of scalable data mining tools.