Dissertation Defense: Adriano Neves, 20/07/17, at 10:00, Room 202-B, CEAD Auditorium


Dynamic Topic Hierarchies and Segmented Rankings in Textual OLAP Technology


OLAP technology emerged 20 years ago and has recently been redesigned so that its dimensions, hierarchies, and measures can support the particularities of textual data. Topic hierarchies offer a way to organize textual data hierarchically. Currently, the topic hierarchy is defined only once per data cube, i.e., for the entire lattice of cuboids. However, such a hierarchy is sensitive to the content of the document collection. Since a data cube cell can contain a document collection distinct from those of other cells in the same cube, the topic hierarchy may change from cell to cell. Furthermore, the text segment used in the OLAP analysis (e.g., titles, abstracts, or paragraphs) also changes this hierarchy. In this work, we present a textual data cube with multiple dynamic topic hierarchies for each cube cell; there are multiple hierarchies because the proposed approach builds one topic hierarchy per text segment. Another contribution of this work concerns query answering. The state of the art normally returns the top-k documents for the topic selected in the query. We go further by also returning other text segments, such as the most significant titles, abstracts, and paragraphs. The approach is organized into four additional steps, each of which further mitigates the impact of building multiple topic hierarchies and segmented rankings per cube cell. Experiments using a subset of DBLP papers as the document collection support our hypotheses.

