Natural sciences
- Information retrieval and web search
- Information technologies
- Knowledge management
- Database systems and architectures
- Web information systems
Data on the Web is increasingly concentrated in centralized data silos, where personal data is under the control of large companies. Following recent data abuse scandals, there is a growing push to decentralize data on the Web into a large number of user-owned data vaults. Query engines are typically used for interacting with data because they abstract away the complexities of data access, but they are optimized for centralized data storage. To build usable applications over decentralized data, new querying algorithms for massively distributed data are needed. The recent link traversal query execution paradigm is a promising match, as I have shown that it can effectively discover data over large decentralized environments during query execution. However, it is often too slow because it lacks query planning algorithms that are compatible with this live data discovery. Hence, I will introduce new query planning algorithms that adaptively optimize the query plan as structural information and heterogeneous interfaces are discovered within decentralized environments. To evaluate these algorithms, I will compare them to existing approaches by simulating realistic decentralized environments. Adaptive query plan optimization over sources discovered live will be the primary academic breakthrough of this research; it is an open problem in Semantic Web research and an urgent need for developers of decentralized applications and for organizations.