Project

Multi-labeled semi-supervised learning in big data problem

Code
01P13514
Duration
01 October 2014 → 01 June 2016
Funding
Regional and community funding: Special Research Fund
Research disciplines
  • Natural sciences
    • Animal biology
  • Agricultural and food sciences
    • Veterinary medicine
    • Other veterinary sciences
    • Other agricultural, veterinary and food sciences
Keywords
data analysis, multi-label
 
Project description

In the era of big data, analyzing and extracting knowledge from large-scale data sets is becoming a very challenging task. Applying standard data mining tools to such data sets is not straightforward. Hence, a new class of scalable mining methods that exploits the large storage and processing capacities of cloud platforms is required.
In this research proposal, we will design highly scalable methods for preprocessing and data mining in order to tackle big data problems in general and, in particular, those that can be framed as semi-supervised multi-label learning. This recent topic is attracting much attention in many real-world applications, such as bioinformatics, image classification, text mining, web mining and speech recognition.
We will address the semi-supervised multi-label learning problem by using preprocessing techniques and new classification methods. After a careful study of the state of the art, we will develop new classification algorithms. Then, we will develop new feature selection/weighting algorithms as well as instance reduction techniques in order to tackle big data problems, overcoming the lack of scalability of existing proposals by using cloud-based technologies.
As application domains, we will focus on data mining problems in the context of bioinformatics.
The recent data explosion in this field requires the use of scalable data mining tools.