ALEXANDRIA – Temporal Retrieval, Exploration and Analytics in Web Archives


The Web is one of the most important socio-technical systems of our time, mirroring trivia and popular culture, propaganda and politics, literature and high culture. Yet our capabilities for accessing and exploring the past of the Web remain very limited, in stark contrast to how much we could learn from it. Those who cannot remember the past are condemned to repeat it.

Within ALEXANDRIA, we want to provide at least some of the tools that enable us to analyze the past, based on what is and what was available on the Web. Our goal is to significantly advance semantic and time-based indexing for Web archives, using human-compiled knowledge available on the Web to efficiently index, retrieve and explore information about entities and events from the past. In doing so, we will focus on the concurrent evolution of this knowledge and of the Web content to be indexed, and take into account the diversity and incompleteness of this knowledge.

We will further investigate mixed crowd- and machine-based Web analytics to support long-running, collaborative retrieval and analysis processes on Web archives. Implicit human feedback will be essential here, both to improve indexing with insights gained during the analysis process and to better focus the harvesting of content.

Finally, finding an equilibrium between the user’s right to privacy and the public’s right to information is a key goal of ALEXANDRIA, and bringing together legal knowledge with privacy-preserving data mining models and algorithms is crucial to accomplishing that goal.