Research Roadmap


The amount of information on the Web is enormous and growing exponentially. Indeed, it is a major challenge to measure the amount of information contained in the Web. It is even harder to assess how much of this information is useful or original. In addition, the information on the Web comes in a huge range of formats from a vast number of disparate sources. All of these aspects raise a crucial research topic: how are we to browse, explore and query the Web at this scale? Once again, this theme requires the inter-disciplinary approach embodied in Web Science. From a computer science perspective, we need to know how inference can be supported at the Web scale; for example, how can context be represented and supported? What do psychology and linguistics tell us about the design of interfaces for querying complex data? How can the data sources within the Web be exploited to help us develop an understanding of the sociological aspects of the Web? Understanding the possibilities for inference online is an important skill for those moving into scientific research and development.

Security, Privacy and Trust

All economic, social and legal interactions are based on certain assumptions: that individuals can verify identities; can rely on the rules and institutions governing the interactions; and are assured that certain information will remain private. These assumptions are challenged by the Web: an environment where security, privacy and trust can be very difficult to monitor, verify and enforce. Will the Web grind to a halt as a result? Will ways be found to ensure that these basic features are present? Or will users of the Web find their own ways to cope with the absence of e.g., trust? These questions call on a broad range of Web Science disciplines: to understand how individuals perceive trust and privacy when they use the Web; to see how concepts such as trust can be computationally represented; to develop the legal institutions needed to govern Web interactions. An understanding of the technology underlying security, the variables underlying trust and the extent of the privacy that Web users demand is clearly valuable in a number of industries.

The Dynamics of the Web

The Web is different from most hitherto-studied systems in that it is changing at a rate which is of the same order as, or maybe greater than, our ability to observe it. This introduces many new inter-disciplinary research challenges. How are we to instrument the Web and how can we log it or identify behaviours? Once we are able to measure what is happening in this world of constant flux, we can then turn to the issue of how to model and understand it. Mathematical tools help to analyse the changing structure of the Web (e.g., using graph theory). Sociology can develop an understanding of the two-way process by which individuals and technologies shape each other. A legal perspective is needed to assess whether law is a catalyst for Web dynamics, or merely reactive to it. Linguistics will allow us to assess how language (and e.g., the preponderance of people for whom English is a second language) is affecting the development of the Web. As the Web changes, so does practice. Understanding the dynamics and pace of change is very important in a number of industries.

The Openness of the Web

The Web, as it exists today, is a complex mixture of open, public areas and closed, private zones. There are prominent advocates of two positions: those who maintain that the Web must be based on open platforms, and those who argue that property rights provide the strongest incentive for innovation in the Web. There has been little systematic and coherent research to resolve these positions. That research must be interdisciplinary. From a technical computing perspective, we need to know exactly what is meant by “openness”. How can legal frameworks be constructed to deal with openness on the Web? Is openness necessary for innovation, or are private and commercial incentives more effective? Is openness compatible with the security requirements of e.g., e-health applications? Economic and legal issues predominate when we examine the open Web, and there are important questions of balance for proprietary Web development organisations. When is it important to release intellectual property to build a user base, and when should a more restrictive business model come into place? These are central issues for those concerned with providing software services and content on the Web.

Collective Intelligence

Collective intelligence is the surprising result that collaborative endeavour with only light rules of co-ordination can lead to the emergence of large-scale, coherent resources (such as Wikipedia). The existence and stability of these resources present major challenges for all the researchers engaged in Web Science. How, from a technical point of view, can collective intelligence be enabled? What are the socio-economic reasons why individuals participate in collective endeavour? What legal framework governs (or should govern) the resources that are created? What is the psychology of identification with an online collective community? How can collective intelligence emerge, given the different languages used by different genders, races, classes and communities? What role is there for policy-makers to engage in and facilitate collaborative endeavour? In an age where political participation is declining, harnessing the potential of collective and collaborative intelligence is an important theme for governments as they try to engage citizens, verify their legitimacy and find creative policy levers.

A Legal Perspective

Techniques for representing and reasoning over legal and social rules: what new tools need to be developed within legal theory to explore and understand the impact of law as a driver in shaping Web development? Should law be a catalyst for change or merely reactive to it, and how should it interact with and respond to economic, social and technological influences?

Is the present intellectual property regulatory regime fit for purpose in the Web 2.0 (+) environment, given that its legal principles were established in the offline world? What is content in the Semantic Web, and what rights should attach to it, particularly when much is likely to be “computer generated”?

Which technologies within the Web should the law ensure remain “open” rather than becoming the “property” of one or more commercial entities and what are the consequences of the choices available?

To what extent are service providers going to become the legal gatekeepers for public authorities in delivering their public policy objectives, e.g. Web policing for what is judged to be “illegal and harmful content”?

What privacy issues arise in a Web environment of increasingly sophisticated information sharing? Does the rise of vigilante justice Web sites suggest that we are entering a new phase of macro technological regulation in which traditional forms of regulatory regime are simply sidestepped?

An Economic Perspective

What are the economics of Web 2.0 (+)? What new economic issues are raised by the opportunities that Web 2.0 gives for users to generate content and share it in self-forming networks?

What are the economic forces that shape the formation of social networks on the Web? What are the properties of those networks? What is the relationship between the economic structure of the Web and its social and mathematical structures?

What are the commercial incentives created by the Web? What will be the industrial structure? Is the Web inherently prone to concentration, where a large part of the structure is owned and controlled by a small number of players? Or are there forces that will allow smaller scale operations to co-exist with large firms?

What are the economic arguments for and against open platforms in the Web? Should policy (economic and public) play any role in shaping or determining the openness of Web platforms?

What (economic and social) mechanisms can be designed to improve the performance of the Web? For example, are there mechanisms that can improve the extent and quality of participation in online communities?

How can economics help with such issues as piracy, privacy and identity?

A Social Science Perspective

How can we develop inter-disciplinary epistemologies that will enable us to understand the Web as a complex socio-technical phenomenon?

How can we do mixed-methods research to explore the relations between ethnographic insights into Web practice and the emergence of the Web at the macro level?

How can we draw on new data sources, e.g. digital records of network use, to develop an understanding of the sociological aspects of the Web?

What are the on-going iterative relations between use and design of the Web?

How and why do people use newly emergent forms of the Web in the way that they do? What kinds of sociological and psychological concepts do we need to understand this? What implications does this have for our understanding of key sociological categories, e.g. kinship, gender, race, class and community, and vice versa? What implications does this have for our understanding of psychological constructs, e.g. personal and group identity, collaborative decision making, perception and attitudes?

How is the Web situated within networks of power and in relation to social inequalities? To what extent might the Web offer empowering political resources? How might the Web change further as new populations access it?

A Mathematical Perspective

How do we model the transient or ephemeral Web? Billions of Web pages are dynamically generated; they exist for the period of a particular query or transaction. How do we model this graph beneath the graph that is the Web?

How are Bayesian or other uncertainty representations best used within the Web?
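As one illustration of what a Bayesian uncertainty representation could look like, the sketch below keeps a Beta belief about how often a Web source is accurate and updates it as claims from that source are checked. The function names, the uniform Beta(1, 1) prior and the counts are assumptions made for this example only; they are not taken from the roadmap.

```python
from fractions import Fraction


def update_reliability(prior, accurate, inaccurate):
    """Beta-Bernoulli update: add the observed counts to the prior pseudo-counts.

    prior is an (alpha, beta) pair; both names are hypothetical.
    """
    alpha, beta = prior
    return alpha + accurate, beta + inaccurate


def expected_reliability(belief):
    """Posterior mean of a Beta(alpha, beta) belief, kept exact as a Fraction."""
    alpha, beta = belief
    return Fraction(alpha, alpha + beta)


# Assumed scenario: start from a uniform Beta(1, 1) prior, then check ten
# claims from the source and find eight accurate and two inaccurate.
belief = update_reliability((1, 1), accurate=8, inaccurate=2)
print(belief, expected_reliability(belief))
```

The belief stays a pair of counts, so evidence gathered by different observers can be merged by simple addition; whether that is an adequate model of uncertainty on the Web is exactly the open question above.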

What is the topological structure of the Web? Can connections always be established between its various parts, or do particular dynamic and time-dependent conditions create disconnected regions or sub-regions within it?
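The connectivity question can be made concrete with a small sketch. It assumes, purely for illustration, that each link carries a time interval during which it is live, and it ignores link direction, so it reports weakly connected components of the snapshot at a chosen time. The data layout and the name components_at are hypothetical.

```python
from collections import defaultdict


def components_at(edges, t):
    """Return the connected components of the link graph at time t.

    edges: iterable of (source, target, start, end); a link is treated as
    live when start <= t < end. Direction is ignored in this sketch.
    """
    adjacency = defaultdict(set)
    for source, target, start, end in edges:
        adjacency[source]  # register both pages even if the link is not live
        adjacency[target]
        if start <= t < end:
            adjacency[source].add(target)
            adjacency[target].add(source)

    seen, components = set(), []
    for node in sorted(adjacency):
        if node in seen:
            continue
        # Depth-first search collects everything reachable from this node.
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(adjacency[current] - seen)
        components.append(component)
    return components


# Hypothetical links with made-up lifetimes.
EDGES = [("a", "b", 0, 10), ("b", "c", 5, 8), ("c", "d", 0, 3)]
for moment in (1, 6, 9):
    print(moment, components_at(EDGES, moment))
```

On this toy data the same four pages form two regions at time 1, a different two at time 6 and three at time 9, which is the kind of time-dependent fragmentation the question asks about.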

A particular query about a given subject may organize Web pages, existing or virtual, according to “how close they are” with respect to the given search criteria. This changes the virtual “shape” of the Web, as observed by the user. Given the huge numbers of searches performed simultaneously, the Web, at any given moment, will present a different structure to different users. It is a mathematical challenge to develop tools to describe this structure.

How do we measure the level of complexity of the Web? For a graph, this can be done by finding a linear space of the lowest dimension in which the graph will fit as a metric subspace. Such techniques are studied in pure mathematics and also in computer science.
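To make "fitting as a metric subspace" concrete, the sketch below embeds a small connected graph into a coordinate space so that the distance between any two points equals the shortest-path distance between the corresponding nodes. It uses the standard Fréchet construction under the maximum norm, where each node is mapped to its vector of distances to every node. This gives only an upper bound on the dimension (the number of nodes), not the lowest dimension the text asks for, and the choice of the maximum norm is an assumption of this example.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adjacency, source):
    """Shortest-path (hop count) distances from source in an unweighted graph."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def frechet_embedding(adjacency):
    """Map each node to its vector of distances to all nodes.

    Assumes the graph is connected; otherwise some distances are undefined.
    """
    nodes = sorted(adjacency)
    dist = {v: bfs_distances(adjacency, v) for v in nodes}
    coords = {v: tuple(dist[v][w] for w in nodes) for v in nodes}
    return coords, dist


def is_isometric(coords, dist):
    """Check that max-norm distances between points equal graph distances."""
    return all(
        max(abs(a - b) for a, b in zip(coords[u], coords[v])) == dist[u][v]
        for u, v in combinations(coords, 2)
    )


# A 4-cycle, used here only as a toy graph.
CYCLE = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["a", "c"]}
coords, dist = frechet_embedding(CYCLE)
print(coords["a"], is_isometric(coords, dist))
```

Finding the smallest dimension for which such an embedding exists, and deciding which norm is appropriate for the Web graph, is the harder problem.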

A Computational Perspective

With the emergence of the so-called Linked Data Web or Semantic Web, a key challenge, as we move from a Web of documents to a Web of linked data at a more fine-grained level, is how we are to browse, explore and query such a Web at scale.

Collective Intelligence is the surprising result that collaborative endeavour with only light rules of social coordination can lead to the emergence of large-scale, coherent resources such as Wikipedia. What are the characteristics of such resources? Why do people contribute and how do they maintain a highly stable core body of connected content?

How do we support inference at a Web scale? What types of reasoning are possible? How is context represented and supported in Web inference?

How are concepts such as trust and provenance computationally represented, maintained and repaired on the Web?

As the Web has grown, substantial amounts of it have become disconnected, atrophied or in other ways redundant. How are we to identify such necrotic and non-functional parts of the Web and what should be done about them?
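A first, very rough approximation of "necrotic" content can be sketched on a crawled link graph. The example assumes we already have a mapping from pages to their outgoing links and a set of pages that still resolve; it then reports dangling links (live pages pointing at pages that no longer resolve) and orphans (live pages that no other live page links to). The names find_necrotic, LINKS and LIVE, and the toy data, are hypothetical.

```python
def find_necrotic(links, live_pages):
    """Return (dangling_links, orphan_pages) for a crawled link graph.

    links: dict mapping a page to the list of pages it links to.
    live_pages: set of pages that still resolve.
    """
    # Links from live pages whose target no longer resolves.
    dangling = sorted(
        (source, target)
        for source, targets in links.items()
        if source in live_pages
        for target in targets
        if target not in live_pages
    )
    # Every page that at least one live page links to.
    linked_to = {
        target
        for source, targets in links.items()
        if source in live_pages
        for target in targets
    }
    # Live pages with no inbound link from a live page. A site's entry
    # point would also appear here if nothing links back to it.
    orphans = sorted(page for page in live_pages if page not in linked_to)
    return dangling, orphans


# Toy data: "gone" no longer resolves and nothing links to "old".
LINKS = {
    "home": ["news", "about"],
    "news": ["home", "gone"],
    "about": ["home"],
    "old": ["home"],
}
LIVE = {"home", "news", "about", "old"}
dangling, orphans = find_necrotic(LINKS, LIVE)
print(dangling, orphans)
```

This captures only structural decay; deciding whether a page is atrophied or redundant in content, and what should then be done with it, remains the open research question.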