Basics of Open Data

Facts and Figures about Open Data

Open data means allocating appropriate licences to data to make it freely available to the public and usable for a wide variety of output channels. Open data is data that can be used, shared and reused by anyone for any purpose. Use restrictions are only permitted to safeguard the origin and openness of knowledge, e.g., by naming the author or using a share-alike clause.

As not all data can be made accessible immediately and with the highest possible degree of openness, a step-by-step approach is recommended. Tim Berners-Lee's 5-star deployment scheme for open data classifies the degree of openness of data sets, ranging from one star (e.g., a PDF published on the web) to five stars (linked open data, LOD).

* make your data available on the web (whatever format) under an open licence

** make it available as structured data (e.g., Excel instead of image scan of a table)

*** make it available in a non-proprietary open format (e.g., CSV instead of Excel)

**** use URIs to denote things so that data can be linked

***** link your data to other data to provide context
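The cumulative nature of the scheme (each star presupposes the ones before it) can be sketched in a few lines of Python. This is only an illustration: the `Dataset` class, its field names and the `star_rating` function are hypothetical and not part of any official tooling.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published data set."""
    on_web_with_open_licence: bool  # 1 star: online, any format, open licence
    structured: bool                # 2 stars: e.g. spreadsheet, not an image scan
    open_format: bool               # 3 stars: e.g. CSV instead of Excel
    uses_identifiers: bool          # 4 stars: things are denoted by URLs/URIs
    linked_to_other_data: bool      # 5 stars: linked to other data for context


def star_rating(ds: Dataset) -> int:
    """Return 0-5 stars; a star only counts if all previous ones are met."""
    criteria = [
        ds.on_web_with_open_licence,
        ds.structured,
        ds.open_format,
        ds.uses_identifiers,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


pdf_scan = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)
print(star_rating(pdf_scan))  # 1
print(star_rating(csv_file))  # 3
```

Stopping at the first unmet criterion reflects the step-by-step approach: a data set without an open licence earns no stars, however well structured it is.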

The free use of data can lead to new kinds of reporting and analyses and trigger new products, services or business models. Open data is therefore key for innovation.

What Is Structured Data?

For data to meet the criteria of open data it must be structured in a uniform way and made available in a machine-readable format so that it can be filtered, searched and processed by other applications.

The order and labelling of data is perhaps the most important basis of open data. Without it, information will not be found.

Schema.org, the de facto global standard, is widely used to structure and describe data uniformly. Schema.org is an ontology, i.e., a collection of terms for describing things on the web and their relationships to each other. On websites, Schema.org markup is embedded in the source code of the page and is not visible to users. This embedding makes the content machine-readable and machine-interpretable.

Simplified example: Structured data according to Schema.org
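As a minimal sketch of what such markup can look like, the following Python snippet builds a Schema.org description of a venue and wraps it in a JSON-LD script tag, one common way of embedding structured data in a page. The venue and all of its values are invented for illustration; `EventVenue`, `PostalAddress` and the properties used are existing Schema.org terms, but the choice of terms here is an assumption, not a prescribed data model.

```python
import json

# Hypothetical venue; every value below is made up for illustration.
venue = {
    "@context": "https://schema.org",
    "@type": "EventVenue",
    "name": "Example Congress Centre",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Berlin",
        "addressCountry": "DE",
    },
    "maximumAttendeeCapacity": 1200,
}

# Serialise to JSON-LD and wrap it in the script tag that would sit in
# the page source, invisible to visitors but readable by machines.
json_ld = json.dumps(venue, indent=2)
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(script_tag)
```

Because the labels (`name`, `address`, `maximumAttendeeCapacity`) come from a shared vocabulary, another application can search or filter such records without knowing anything about the page layout.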

Schema.org is not a finished product and is expanded on an ongoing basis. Certain properties, e.g., those needed to describe MICE data sets, may not yet be covered by Schema.org. The vocabulary is being extended through domain specifications.

Clearly Defined Licences for Free Use

Rights for images, video and text must be clearly defined in licences in order to make data available for open use. The Creative Commons (CC) licensing system is recommended. Creative Commons is a non-profit organisation that has developed a range of standard licence agreements that allow creators to grant the public rights to use their work. Different Creative Commons licence types provide for different uses. The preferred licences are CC0 ("No Rights Reserved"), CC BY (Attribution) and CC BY-SA (Attribution-ShareAlike), which allow free use and redistribution under their respective terms. Like CC0, the CC BY and CC BY-SA licences allow commercial use, which is a requirement for open data.
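The licence requirement can be expressed as a simple check. The sketch below is illustrative only: the permission flags follow the published Creative Commons licence terms (NC excludes commercial use, ND excludes adaptations), while the `is_open_data_licence` helper and its criteria are assumptions derived from the definition above (free use and reuse for any purpose, including commercial use).

```python
# Permissions of the Creative Commons licence types.
# "commercial": commercial use is allowed
# "derivatives": adaptations of the work may be shared
CC_LICENCES = {
    "CC0":         {"commercial": True,  "derivatives": True},
    "CC BY":       {"commercial": True,  "derivatives": True},
    "CC BY-SA":    {"commercial": True,  "derivatives": True},
    "CC BY-NC":    {"commercial": False, "derivatives": True},
    "CC BY-NC-SA": {"commercial": False, "derivatives": True},
    "CC BY-ND":    {"commercial": True,  "derivatives": False},
    "CC BY-NC-ND": {"commercial": False, "derivatives": False},
}


def is_open_data_licence(name: str) -> bool:
    """Hypothetical helper: True if the licence permits both commercial
    use and reuse in adapted form. Unknown licences count as not open."""
    licence = CC_LICENCES.get(name)
    if licence is None:
        return False
    return licence["commercial"] and licence["derivatives"]


for name in CC_LICENCES:
    print(f"{name:12} open data: {is_open_data_licence(name)}")
```

Under these assumptions only CC0, CC BY and CC BY-SA pass, which matches the preferred licences named above.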

The CC0 marking of Creative Commons | © Creative Commons

Connecting Data in a Knowledge Graph

Providing German MICE data as open data is a first step towards more visibility and reach. However, only when data is connected and interrelated can benefits and added value be realised.

A knowledge graph is a semantic database. By connecting data in a knowledge graph, a network of objects (e.g., people, places, organisations and events) and the relationships between these objects is created. Information about venues, for example, can be linked with infrastructure data, travel information and sights at the location.

Knowledge graphs are particularly well suited to complex and nested queries and analyses. The branched graph structure makes it possible to find connections that are otherwise difficult to visualise. A knowledge graph does not have a fixed schema but can be adapted flexibly: data sets can be added to the graph as needed by relating them to an existing data set. Individual data points and their relationships to each other are managed independently of the output channel. As a result, the same data can be delivered in different contexts, in different ways and across a range of channels.
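The idea can be illustrated with a deliberately small, in-memory graph of subject-predicate-object triples. This is a sketch under stated assumptions, not a description of how the Open Data Knowledge Graph is implemented: the `KnowledgeGraph` class, the identifiers and the relation names are all hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy triple store: subject -> list of (predicate, object) pairs."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add(self, subject, predicate, obj):
        """Relate two nodes; new nodes come into existence implicitly."""
        self._edges[subject].append((predicate, obj))

    def objects(self, subject, predicate):
        """All objects linked to `subject` via `predicate`."""
        return [o for p, o in self._edges.get(subject, []) if p == predicate]


kg = KnowledgeGraph()

# Initial data set: a venue and the town it is located in.
kg.add("venue:example-hall", "locatedIn", "city:example-town")

# Further data sets are attached later simply by relating them to a
# node that already exists; no schema change is needed.
kg.add("city:example-town", "hasStation", "station:example-central")
kg.add("city:example-town", "hasSight", "sight:old-market")


def near_venue(graph, venue, relation):
    """Nested query: venue -> city -> things related to that city."""
    return [
        thing
        for city in graph.objects(venue, "locatedIn")
        for thing in graph.objects(city, relation)
    ]


print(near_venue(kg, "venue:example-hall", "hasSight"))
# ['sight:old-market']
print(near_venue(kg, "venue:example-hall", "hasStation"))
# ['station:example-central']
```

The venue record itself never mentions sights or stations; the connection only emerges by following relationships through the graph, which is what makes nested queries of this kind straightforward.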

Target Groups of the Open Data Knowledge Graph

The Open Data Knowledge Graph is freely accessible to all interested parties, such as global sales platforms, tourism and MICE service providers or start-up companies, so that they can generate new services from the linked data. Using the data also opens up new opportunities for the data providers themselves to create offers for their customers.

The target groups for these new services in the tourism sector are end consumers, i.e., tourists planning a holiday trip or already at their holiday destination. In the MICE sector, by contrast, the services address event planners, so use here is primarily B2B-oriented.

The data models used in the tourism and MICE sectors differ accordingly. The set-up of the Knowledge Graph, with its own MICE subgraph and various options for feeding data into the graph, takes account of the different requirements and uses in the two sectors.