Architecture Decentralized Digital Heritage Network

Living Document

This version:
https://github.com/ErfgoedPod/architecture
Issue Tracking:
GitHub
Inline In Spec
Editor:
(meemoo)

Abstract

This specification gives a high-level overview of a decentralized digital heritage network using principles from Solid and the Researcher Pod project.

1. Set of documents

This document is one of the Decentralized Digital Heritage Network specifications produced by the ErfgoedPod project by Netwerk Digitaal Erfgoed, meemoo - Flemish Institute for Archives and Ghent University - IDLab:

  1. Decentralized Digital Heritage Network architecture (this document)

  2. Use cases & Business processes

  3. Common infrastructure in Cultural Heritage Institutions

This project also contributes to the following companion specifications of the ResearcherPod project:

  1. Orchestrator

  2. Data Pod

  3. Rule language

  4. Artefact Lifecycle Event Log

  5. Notifications

  6. Collector

2. Introduction

The Decentralized Digital Heritage Network is a protocol and a set of best practices to establish a sustainable exchange network of digital heritage data between cultural heritage institutions and their services. It is an application of the generic ResearcherPod boilerplate architecture for decentralized Web networks based on the [solid-protocol] and [ACTIVITYSTREAMS-VOCABULARY]. This document lays out the high-level concepts and design of the Decentralized Digital Heritage Network.

3. Overview

The decentralized digital heritage network

create diagram when more is known about the components on mellon side

4. Actors

A Decentralized Digital Heritage Network consists of multiple parties that actively participate in the exchange. Such a party is henceforth called a network actor.

All actors operate under the same protocol, but can differ in purpose. Every actor in a decentralized digital heritage network is therefore one of the following types:

Cultural Heritage Institution

an individual or organisation producing and sharing digital heritage data;

Service provider

an organisation consuming and processing digital heritage data to provide a service to other actors in the network;

Service portal

an organisation consuming digital heritage data to provide a service to end-users.

5. Artefacts

A Digital Heritage Artefact is a unit of digital heritage data that is the object of exchange between actors. We distinguish the following types of artefacts:

Dataset

The description of a collection of data as defined in Requirements for Datasets §dataset.

Dataset description

Metadata that publishers should provide about their dataset aligned with the machine-readable publication model described in [requirements-datasets].

Actor profile

A description of an actor that is a member of the network. Often, this is about the organisation acting as cultural heritage institution, service provider, or service portal. It is used as part of a registration or identification process.

All artefacts have a lifecycle that consists of a sequence of lifecycle events. A Lifecycle Event is a documented activity that reflects changes to the artefact’s presence or positioning on the network. The occurrence of such an event can render the artefact eligible for certain services or exchanges. For instance, a dataset (= artefact) can only be archived by an archiving service once it is registered (= lifecycle event).
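
The relation between lifecycle events and eligibility can be sketched as follows. This is a minimal illustration in Python, not part of the specification: the `Artefact` class, the `ELIGIBILITY` table, and the rule that archiving requires a prior `Register` event are assumptions modelled on the example in the paragraph above.

```python
from dataclasses import dataclass, field

# Hypothetical table: service name -> lifecycle events that must have occurred first.
ELIGIBILITY = {"archive": {"Register"}}


@dataclass
class Artefact:
    uri: str
    events: list = field(default_factory=list)  # ordered lifecycle event names

    def record(self, event: str) -> None:
        """Append a lifecycle event to the artefact's history."""
        self.events.append(event)

    def eligible_for(self, service: str) -> bool:
        """True once every event required by a known service has occurred."""
        required = ELIGIBILITY.get(service)
        return required is not None and required <= set(self.events)


dataset = Artefact("pod.kb.nl/dataset.ttl")
dataset.record("Create")
before = dataset.eligible_for("archive")   # only created, not yet registered
dataset.record("Register")
after = dataset.eligible_for("archive")    # registered, so archiving is allowed
```

Keeping the requirements in a table rather than in code is only one possible design; it makes it easy to add further services without touching the artefact model.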

Figure: lifecycle events of an artefact (Create, Destroy, Register, Index, Preserve, Update, Store, Announce, Enrich, Link).

We distinguish the following types of lifecycle events:

complete the list of lifecycle events.

6. Components

6.1. Digital Heritage Pod

The Digital Heritage Pod is a Cultural Heritage Institution's main exchange hub for sharing digital heritage information with external service providers and other cultural heritage institutions. By design, the Digital Heritage Pod is a passive component: it can respond to requests for the digital heritage artefacts it stores, but cannot start an interaction with other actors (see the additional components laid out in § 6.3 Participating in a decentralized Digital Heritage Network for active participation).

The Digital Heritage Pod is built on a Solid Data Pod (The Solid Protocol §data-pod) consisting of:

The digital heritage artefacts stored in the datastore originate from two data management systems at the cultural heritage institution:

Finally, the pod exposes an Artefact Lifecycle Event Log: a resource containing an immutable log that records all lifecycle events related to artefacts known to the pod.

Figure: Digital Heritage Pod. Labels: Inbox; Linked Data Platform (LDP) API; ACL; Artefact Lifecycle Event Log; Datastore; Collections Management System; Digital Asset Management System; Maintainer.

6.2. Digital Heritage Service Hub

A Digital Heritage Service Hub is a service provider's exchange hub to make its services available to other network actors such as cultural heritage institutions or service portals. It consists of some of the same interface components as the Digital Heritage Pod:

In contrast to the Digital Heritage Pod, it is unspecified what other subcomponents a service hub should provide. Processes that store data, provide security or execute the services are considered a black box.

Figure: Digital Heritage Service Hub. Labels: Inbox; Linked Data Platform (LDP) API; Artefact Lifecycle Event Log; Service.

6.3. Participating in a decentralized Digital Heritage Network

To actively participate in the network, actors require a few components that enable them to interact with other actors. For cultural heritage institutions, these components commonly complement a Digital Heritage Pod or Digital Heritage Service Hub.

Interactions between actors are always about a digital heritage artefact and result in a lifecycle event of that artefact.

Figure: components for active participation. Labels: Dashboard (UI); Orchestrator (agent); policy; Inbox; Datastore; Linked Data Platform (LDP) API; ACL; Collections Management System; Digital Asset Management System; possibly integrated; Maintainer.

6.4. Collecting information from a decentralized Digital Heritage Network

Actors retrieve two types of information from the network:

Query index

An index that allows for fine-grained search into the contents of the files stored in the Data Pod.

Collector

Agent that queries or crawls the decentralized network for distributed information targeted by a certain query that needs solving.

Filters

Description of the information that needs to be collected.
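
The interplay between a Collector and its Filters can be sketched as below. This is a hedged illustration only: the in-memory `pods` dictionary stands in for real pods reached over HTTP, and the `collect` function, the `example` URLs, and the artefact fields are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Artefact = Dict[str, str]
Filter = Callable[[Artefact], bool]


def collect(pods: Dict[str, List[Artefact]], filters: Iterable[Filter]) -> List[Artefact]:
    """Visit every pod and keep the artefacts that satisfy all filters."""
    active = list(filters)
    return [
        artefact
        for pod_url in sorted(pods)        # deterministic crawl order
        for artefact in pods[pod_url]
        if all(matches(artefact) for matches in active)
    ]


# Hypothetical in-memory stand-in for two pods on the network.
pods = {
    "https://pod-a.example/": [
        {"uri": "https://pod-a.example/dataset.ttl", "type": "Dataset"},
        {"uri": "https://pod-a.example/profile.ttl", "type": "ActorProfile"},
    ],
    "https://pod-b.example/": [
        {"uri": "https://pod-b.example/dataset.ttl", "type": "Dataset"},
    ],
}

datasets = collect(pods, [lambda artefact: artefact["type"] == "Dataset"])
```

Here a filter is modelled as a plain predicate; a real deployment would more likely describe filters declaratively so they can be shared between actors.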

Figure: collecting information from the network. Labels: Collector; Filters; Query Index; cultural heritage objects; lifecycle events; Inbox; Datastore; Linked Data Platform (LDP) API; ACL.

7. Technical aspects

7.1. WebID

A WebID is a simple universal identification mechanism for the Web and a core aspect of Solid. ErfgoedPod uses it to identify acting organisations in the network (e.g. a Cultural Heritage Institution, a Registry, ...).

Example: http://kb.nl#me

7.2. Linked Data Notifications (LDN)

Linked Data Notifications [LDN] is the communication protocol between two actors in the network. Each actor exposes an inbox that receives notifications. An inbox can be discovered through a Link header returned when requesting a resource, such as the WebID.

Example:

POST /inbox HTTP/1.1
Host: registry.nde.nl
Content-Type: application/ld+json;profile="https://www.w3.org/ns/activitystreams"
Content-Language: en

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "KB created dataset.ttl",
  "type": "Create",
  "actor": "http://kb.nl#me",
  "object": "http://pod.kb.nl/dataset.ttl"
}
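
Inbox discovery and the construction of such a notification can be sketched in Python as follows. This is a simplified illustration that performs no network requests: `find_inbox` and `build_notification` are hypothetical helper names, the Link header parsing ignores edge cases such as commas inside URLs, and the `https` URLs in the example header are assumptions.

```python
import json
import re
from typing import Optional

LDP_INBOX = "http://www.w3.org/ns/ldp#inbox"  # link relation defined by LDN


def find_inbox(link_header: str) -> Optional[str]:
    """Return the target of the first link whose rel is ldp:inbox, if any."""
    for target, params in re.findall(r"<([^>]*)>([^,]*)", link_header):
        rel = re.search(r'rel\s*=\s*"?([^";]+)"?', params)
        if rel and LDP_INBOX in rel.group(1).split():
            return target
    return None


def build_notification(actor: str, activity_type: str, obj: str, summary: str) -> str:
    """Serialise an ActivityStreams activity to send as the POST body."""
    return json.dumps(
        {
            "@context": "https://www.w3.org/ns/activitystreams",
            "summary": summary,
            "type": activity_type,
            "actor": actor,
            "object": obj,
        },
        indent=2,
    )


# Example Link header as it might be returned for a WebID (URLs are assumptions).
example_header = (
    '<https://registry.nde.nl/inbox>; rel="http://www.w3.org/ns/ldp#inbox", '
    '<https://registry.nde.nl/.acl>; rel="acl"'
)
inbox = find_inbox(example_header)
body = build_notification(
    "http://kb.nl#me", "Create", "http://pod.kb.nl/dataset.ttl", "KB created dataset.ttl"
)
```

The resulting `body` would then be sent with an HTTP POST to `inbox`, using the `application/ld+json` content type shown in the example above.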

7.3. Eventlog

The eventlog is a mandatory log stored in each Pod or Service Hub (e.g. a Registry) that participates in the network. Lifecycle events of datasets (and other artefacts) are stored there.

Example:

pod.kb.nl/eventlog

@prefix lode: <http://linkedevents.org/ontology/>.
@prefix time: <http://www.w3.org/2006/time#>.
@prefix dc: <http://purl.org/dc/terms/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

_:1 a lode:Event;
    lode:atTime [
        a time:Instant;
        time:inXSDDateTimeStamp "2020-04-12T10:30:00+10:00"^^xsd:dateTimeStamp
    ];
    lode:involvedAgent <http://kb.nl#me>;
    dc:description "Created pod.kb.nl/dataset.ttl".

...
[2020-04-12T10:30:00+10:00] Created pod.kb.nl/dataset.ttl
[2020-04-12T11:30:00+10:00] Created pod.kb.nl/dataset-desc.ttl
[2020-04-12T12:30:00+10:00] Requested registration: pod.kb.nl/dataset.ttl with registry.nde.nl
[2020-04-12T13:30:00+10:00] registry.nde.nl registered pod.kb.nl/dataset.ttl
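
The human-readable log lines above can be produced from an append-only structure, as in the following sketch. The `EventLog` class is hypothetical and keeps entries in memory only; a real pod would persist each entry as a resource, for example as the RDF shown earlier.

```python
from datetime import datetime, timedelta, timezone


class EventLog:
    """Append-only list of (timestamp, description) pairs."""

    def __init__(self) -> None:
        self._entries = []

    def append(self, when: datetime, description: str) -> None:
        """Record a lifecycle event; existing entries are never modified."""
        self._entries.append((when, description))

    def lines(self) -> list:
        """Render entries in the bracketed-timestamp form used above."""
        return [f"[{when.isoformat()}] {text}" for when, text in self._entries]


tz = timezone(timedelta(hours=10))
log = EventLog()
log.append(datetime(2020, 4, 12, 10, 30, tzinfo=tz), "Created pod.kb.nl/dataset.ttl")
log.append(datetime(2020, 4, 12, 11, 30, tzinfo=tz), "Created pod.kb.nl/dataset-desc.ttl")
```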

7.3.1. Rulebook

A rulebook is a configuration file with machine-readable business rules and is the driver for the Orchestrator component. It dictates what actions the Orchestrator should take when it is notified of an event (typically by an incoming Linked Data Notification).
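
How a rulebook drives the Orchestrator can be sketched as follows. The rule format used here (a `when` condition and a `then` action name) is purely hypothetical and chosen for brevity; the actual rule language is defined in the companion Rule language specification.

```python
# Hypothetical rulebook: each rule matches notification fields and names an action.
RULEBOOK = [
    {"when": {"type": "Create"}, "then": "request_registration"},
    {"when": {"type": "Update"}, "then": "notify_registry_of_update"},
]


def actions_for(notification: dict, rulebook: list) -> list:
    """Return the actions of all rules whose conditions match the notification."""
    return [
        rule["then"]
        for rule in rulebook
        if all(notification.get(key) == value for key, value in rule["when"].items())
    ]


incoming = {"type": "Create", "actor": "http://kb.nl#me", "object": "http://pod.kb.nl/dataset.ttl"}
todo = actions_for(incoming, RULEBOOK)
```

Returning action names instead of executing them keeps the matching logic separate from side effects, which makes a rulebook easier to test.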

8. Application to business processes

This section implements each business process from [bp-nde] using the architecture described in this document. The process description, the involvement of the § 6 Components, and other implementation details are noted in the following template:

Roles

The different actor roles that interact with the process.

Components

The components that are involved.

Goal

The final successful outcome that completes the process.

Stakeholders

Anybody or anything with an interest or investment in how the system performs.

Preconditions

The elements that must be true before a use case can occur.

Triggers

The events that cause the use case to begin.

Postconditions

What the system should have completed by the end of the steps.

Procedure

The process and steps taken to reach the end goal, including the necessary functional requirements and their anticipated behaviors.

In addition to this template, an example HTTP exchange is added for each process.

8.1. (BP1) Initialize a Digital Heritage Pod

Roles
Components
Goal The organization has an operational Digital Heritage Pod.
Stakeholders
Preconditions A Cultural Heritage Object Maintainer and a reachable running Solid Data Pod are provided by the Cultural Heritage Institution.
Triggers A Cultural Heritage Object Maintainer wants to share metadata with Solid networks.
Postconditions A Solid Pod has been initialized as Digital Heritage Pod. It is running and is reachable by other actors in the Decentralized Digital Heritage Network.
Procedure
  1. The Maintainer logs in to the Dashboard and enters the base URL of a Solid Data Pod.
  2. The Dashboard discovers the location of the Inbox resource by requesting the pod’s base URL. If authorization is required, the Dashboard requests access to the inbox.
  3. The Dashboard creates an eventlog resource inside the Solid Data Pod and stores the URI.
  4. If the event log was created successfully, the Dashboard informs the Maintainer that the initialization is complete.

Example HTTP Sequence

use a more generic organization domain than kb.nl

8.2. (BP2) Register an Orchestrator for a Digital Heritage Pod

Roles
Components
Goal The organization has an operational Orchestrator connected to its Digital Heritage Pod.
Stakeholders
Preconditions There is a reachable Digital Heritage Pod within the Cultural Heritage Institution.
Triggers A Cultural Heritage Institution wants to share metadata with Solid networks.
Postconditions An orchestrator is running and is reachable by the institution.
Procedure
  1. The Maintainer logs into the Dashboard and starts the registration of a new orchestrator service.
  2. The Dashboard starts an Orchestrator process locally or uses an Orchestrator Web service. Once complete, the orchestrator process returns the location of the Orchestrator's Inbox resource.
  3. The Dashboard supplies a location of an Inbox resource - most likely that of a Digital Heritage Pod - to the Orchestrator by sending a notification to its inbox. The orchestrator now watches the supplied Inbox resources for incoming notifications.
  4. The Dashboard informs the Maintainer that the Orchestrator service is up and running.
  5. Next, the Maintainer configures a list of rulebook file locations, which are used by the Dashboard to initialize the Orchestrator.
  6. The Orchestrator acknowledges that the rulebooks are now active. The Dashboard informs the Maintainer that this completes the initialization.

Example HTTP Sequence

8.3. (BP3) Adding a Cultural Heritage Institution to the Registry

Roles
Components
Goal The network is aware of the organization: it can be discovered by exploring or querying the Registry.
Stakeholders
Preconditions
Triggers A Cultural Heritage Institution wants to join the network.
Postconditions Organisation Profile is added to the search query index of the Registry.
The Cultural Heritage Institution is aware that it is registered with the Registry.
Procedure
  1. The Maintainer uploads or curates the Organisation Profile using the Dashboard. The profile is made available through a URI, which can coincide with the institution’s persistent identifier.
  2. A Maintainer of the Cultural Heritage Institution then uses the Dashboard to start registration.
  3. The Orchestrator of the Cultural Heritage Institution detects the registration event and sends a notification on the institution’s existence to the Inbox of its preferred Registry(s). It contains the organization’s identifier and a link to the profile.
  4. When notified, the Orchestrator of the Registry follows the link to the Organisation Profile and attempts to download it.
  5. Optional step in case the Organisation Profile is not openly licensed: when the Registry has read access to the Organisation Profile in the Digital Heritage Pod’s ACL, it is granted access. Else, the download is refused.
  6. When downloaded, the Orchestrator of the Registry stores the profile in its Digital Heritage Pod and adds it to the Query Index.
  7. After completion, the Orchestrator of the Registry sends a notification to the institution’s inbox that the registration was successful.
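
The Registry side of this procedure (download, ACL check, indexing, and reply) can be sketched with an in-memory stand-in for the pod. Everything here is an illustrative assumption: the `Pod` class, the `register` function, the `example` URLs, and the use of the ActivityStreams `Accept` and `Reject` types for the reply; the procedure itself only requires a notification on success.

```python
from typing import Dict, Optional, Set


class Pod:
    """In-memory stand-in for a Digital Heritage Pod with a per-resource ACL."""

    def __init__(self) -> None:
        self.resources: Dict[str, str] = {}
        self.readers: Dict[str, Set[str]] = {}  # resource URI -> agents with read access
        self.public: Set[str] = set()           # openly licensed, readable by anyone

    def get(self, uri: str, agent: str) -> Optional[str]:
        """Return the resource if the agent may read it, else None (download refused)."""
        if uri in self.public or agent in self.readers.get(uri, set()):
            return self.resources.get(uri)
        return None


def register(notification: dict, pod: Pod, registry_id: str, index: Dict[str, str]) -> dict:
    """Download the profile, add it to the index, and build the reply notification."""
    profile_uri = notification["object"]
    profile = pod.get(profile_uri, registry_id)
    if profile is None:
        return {"type": "Reject", "actor": registry_id, "object": profile_uri}
    index[notification["actor"]] = profile
    return {"type": "Accept", "actor": registry_id, "object": profile_uri}


pod = Pod()
pod.resources["https://pod.example/profile.ttl"] = "<profile data>"
pod.readers["https://pod.example/profile.ttl"] = {"https://registry.example/#me"}

index: Dict[str, str] = {}
request = {"type": "Announce", "actor": "https://institution.example/#me",
           "object": "https://pod.example/profile.ttl"}
reply = register(request, pod, "https://registry.example/#me", index)
```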

Example HTTP Sequence

split sequence with and without orchestrator?

8.4. (BP4) Adding a new Dataset to the Registry

is there a new version of the dataset or of the dataset description?

Roles
Components
Goal The Dataset’s location shows up in search results when querying the Registry with dataset-level metadata.
Stakeholders
  • Other Cultural Heritage Institutions
  • Applications, Portals and services
  • The Knowledge graph
Preconditions
Triggers A new Dataset is ready for publication
Postconditions
  • The Metadata Dataset is added to the search query index of the Registry.
  • The Cultural Heritage Institution is aware that the Dataset is registered with the Registry, because it has confirmation that the Metadata Dataset was accepted and processed.
Procedure
  1. A Maintainer of the Cultural Heritage Institution exports a new Dataset from its Collection Management System and stores it in the Data Pod. The Dataset is given a persistent URI through which it is remotely accessible.
  2. Using the Dashboard, the Maintainer creates a ‘Metadata Dataset’ profile with a brief description of the Dataset (object descriptions, annotations), which is also stored in the Data Pod. In practice, the Collection Management System will adopt the role of Dashboard and the creation of the profile happens transparently.
  3. The Maintainer uses the Dashboard to start the registration of the Dataset.
  4. The Orchestrator of the Cultural Heritage Institution detects the dataset registration event and sends a notification on the existence of the published Dataset to the preferred Registry’s Inbox. It contains the identifier of the Dataset and the link to the Metadata Dataset.
  5. When notified, the Orchestrator of the Registry follows the link to the Metadata Dataset and attempts to download it.
  6. Optional step in case the Metadata Dataset is not openly licensed: when the Registry has read access to the Metadata Dataset in the Data Pod’s ACL, it is granted access. Else, the download is refused.
  7. After the Metadata Dataset has been downloaded, the Orchestrator of the Registry adds the metadata with the Dataset’s location to the Query Index for search.
  8. After the indexing has completed, the Orchestrator of the Registry sends a notification to the institution’s inbox that the Dataset was registered.
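
The indexing step, and the goal that a Dataset's location shows up when searching on dataset-level metadata, can be sketched with a small inverted index. The `QueryIndex` class, its whitespace tokenisation, and the sample metadata are assumptions for illustration; this document does not prescribe how a Registry implements its Query Index.

```python
from collections import defaultdict


class QueryIndex:
    """Inverted index from metadata terms to Dataset locations."""

    def __init__(self) -> None:
        self._by_term = defaultdict(set)

    def add(self, dataset_location: str, metadata: dict) -> None:
        """Index every whitespace-separated word of every metadata value."""
        for value in metadata.values():
            for term in str(value).lower().split():
                self._by_term[term].add(dataset_location)

    def search(self, query: str) -> set:
        """Return the locations whose metadata contains all query words."""
        terms = query.lower().split()
        if not terms:
            return set()
        return set.intersection(*(self._by_term.get(term, set()) for term in terms))


query_index = QueryIndex()
query_index.add(
    "https://pod.example/dataset.ttl",
    {"title": "Medieval manuscripts", "publisher": "Example Library"},
)
hits = query_index.search("medieval manuscripts")
```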

Example HTTP Sequence

8.5. (BP5) Updating a Dataset in the Registry

Roles
Components
Goal Reflect the latest changes of a registered Dataset in the query results of the Registry.
Stakeholders
Preconditions
Triggers A new version of a Dataset is ready to be released.
Postconditions
Procedure
  1. A Maintainer of the Cultural Heritage Institution exports a new version of an existing Dataset from its Collection Management System.
  2. The Maintainer updates the Dataset in the Data Pod using the Dashboard. An updated Metadata Dataset is created. In practice, the Collection Management System will adopt the role of Dashboard and the update happens transparently.
  3. The Orchestrator of the Cultural Heritage Institution picks up this update event and sends a notification to the Registry’s inbox that the Dataset has been updated.
  4. The Orchestrator of the Registry downloads the updated Metadata Dataset and adjusts its index entries if the profile introduces changes.
  5. After completion, the Orchestrator of the Registry sends a notification to the institution’s inbox that the update was successful.
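
One way for the Registry to decide whether an updated Metadata Dataset "introduces changes" is to compare content digests, as sketched below. This is an assumed technique rather than something this document mandates; `digest` and `update_if_changed` are hypothetical names, and a byte-level digest will also flag purely syntactic changes such as reordered triples.

```python
import hashlib


def digest(metadata: str) -> str:
    """SHA-256 digest of the serialised Metadata Dataset."""
    return hashlib.sha256(metadata.encode("utf-8")).hexdigest()


def update_if_changed(known: dict, dataset_uri: str, metadata: str) -> bool:
    """Store the new digest and report True only when the content differs."""
    new_digest = digest(metadata)
    if known.get(dataset_uri) == new_digest:
        return False  # nothing to re-index
    known[dataset_uri] = new_digest
    return True


known_digests = {}
first = update_if_changed(known_digests, "https://pod.example/dataset.ttl", "title: A")
repeat = update_if_changed(known_digests, "https://pod.example/dataset.ttl", "title: A")
changed = update_if_changed(known_digests, "https://pod.example/dataset.ttl", "title: B")
```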

Example HTTP Sequence

8.6. (BP6) Adding a Registry to the ACL of a Digital Heritage Pod

Roles
Components
Goal The Registry can access the necessary data in the Digital Heritage Pod
Stakeholders /
Preconditions The Registry does not have access to the Digital Heritage Pod of the Cultural Heritage Institution
Triggers The Registry instance is unknown to the Digital Heritage Pod
Postconditions The Registry is added to the ACL of the Digital Heritage Pod.
Procedure
  1. The Maintainer opens the Dashboard and adds the instance to the list of relevant Registries.
  2. The Maintainer grants the Registry (partial) access to the Organisation Profile and Metadata Dataset.
  3. The Orchestrator of the Cultural Heritage Institution picks up this change and sends a notification to the Registry’s Inbox to inform it on its access control rights.
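
Assuming the pod uses Web Access Control (one of the authorization mechanisms used with Solid), granting a Registry read access could amount to writing an authorization like the one generated below. The `acl_for_registry` helper, the `#registry-read` fragment, and the `example` URLs are hypothetical; only the `acl:` vocabulary terms come from the Web Access Control vocabulary.

```python
def acl_for_registry(registry_webid: str, resource: str) -> str:
    """Build a Turtle authorization granting read access on one resource."""
    return (
        "@prefix acl: <http://www.w3.org/ns/auth/acl#>.\n"
        "\n"
        "<#registry-read> a acl:Authorization;\n"
        f"    acl:agent <{registry_webid}>;\n"
        f"    acl:accessTo <{resource}>;\n"
        "    acl:mode acl:Read.\n"
    )


acl_document = acl_for_registry(
    "https://registry.example/#me",
    "https://pod.example/dataset-desc.ttl",
)
```

Granting only `acl:Read` on individual resources matches the "(partial) access" mentioned in the procedure; broader grants would be a policy decision for the institution.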

Example HTTP Sequence

is the orchestrator picking up new links or does it get a notification?

Roles
Components
Goal Cultural Heritage Institution B adds links to Objects of Cultural Heritage Institution A after becoming aware of links from Objects of Cultural Heritage Institution A to Objects of Cultural Heritage Institution B.
Stakeholders Knowledge Graphs
Browser/ Service Portals
Users
Preconditions A Dataset of Cultural Heritage Institution A known to the Registry contains links to objects of Cultural Heritage Institution B
The links have not been processed by the Registry yet.
Triggers The Registry has added or updated a Metadata Dataset in its Query Index
Postconditions Cultural Heritage Institution B has enriched its Datasets with links to Objects of Cultural Heritage Institution A
Procedure
  1. The Orchestrator of the Registry picks up new links in the Query index and selects the known Datasets that contain the Objects targeted by the links.
  2. Based on the Dataset’s origin, it sends a notification to the Inbox of Cultural Heritage Institution B about the found links.
  3. The Orchestrator of the Cultural Heritage Institution B picks up this change and generates new backlinks to Cultural Heritage Institution A, which it stores as a linkset in its Data Pod.
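
Generating backlinks from the links reported by the Registry can be sketched as a simple inversion. The `backlinks` function, the pair representation of a link, and the `example` URLs are assumptions; the format of the stored linkset is not defined here.

```python
from typing import Iterable, List, Tuple

Link = Tuple[str, str]  # (source object URI, target object URI)


def backlinks(links: Iterable[Link], own_prefix: str) -> List[Link]:
    """Invert every link that targets one of our own objects."""
    return sorted(
        {(target, source) for source, target in links if target.startswith(own_prefix)}
    )


# Links found by the Registry in a Dataset of institution A.
found = [
    ("https://a.example/object/1", "https://b.example/object/9"),
    ("https://a.example/object/2", "https://b.example/object/9"),
    ("https://a.example/object/3", "https://c.example/object/5"),  # not about B
]
linkset = backlinks(found, "https://b.example/")
```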

Example HTTP Sequence

8.8. (BP8) Subscribing to a topic

Roles
Components
Goal A Service Portal Provider or Network of Terms subscribes to a topic - defined by a query - at a Registry, of which it wants to receive notifications about related artefacts.
Stakeholders
  • Users
Preconditions
  1. A Registry is known to the Service Portal.
  2. The Service Portal has a query that defines the subscription topic.
Triggers A Service Portal wants to subscribe to artefacts or a certain topic.
Postconditions The Service Portal is subscribed to a dataset topic in the Registry.
Procedure
  1. A Service Portal sends a notification to the Registry requesting subscription to a dataset topic. This request includes metadata about the topic, i.e. a list of Terms and the inbox of the Service Portal.
  2. The Registry adds the Service Portal's inbox URL to an index per term. If an index does not exist, the Registry creates it.
  3. When the indexing is completed, the Registry sends a notification to the Service Portal's inbox to confirm the subscription.
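
The per-term index of subscriber inboxes described in this procedure can be sketched as follows. The `SubscriptionIndex` class and the shape of the confirmation are hypothetical; in particular, the use of an `Accept` type for the confirmation is an assumption.

```python
from typing import Dict, Iterable, Set


class SubscriptionIndex:
    """Maps each term to the inbox URLs subscribed to it."""

    def __init__(self) -> None:
        self._inboxes_by_term: Dict[str, Set[str]] = {}

    def subscribe(self, terms: Iterable[str], inbox: str) -> dict:
        """Add the inbox under every term, creating a term's index when missing."""
        terms = list(terms)
        for term in terms:
            self._inboxes_by_term.setdefault(term, set()).add(inbox)
        # Confirmation to send back to the subscriber's inbox.
        return {"type": "Accept", "object": {"terms": terms, "inbox": inbox}}

    def subscribers(self, term: str) -> Set[str]:
        """Return a copy of the inboxes subscribed to a term."""
        return set(self._inboxes_by_term.get(term, set()))


subscriptions = SubscriptionIndex()
confirmation = subscriptions.subscribe(["maps", "manuscripts"], "https://portal.example/inbox")
subscriptions.subscribe(["maps"], "https://other-portal.example/inbox")
```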

Example HTTP Sequence

Roles
Components
Goal A Service Portal Provider retrieved a discovered Dataset for further processing.
Stakeholders
  • Users
Preconditions
  1. The Registry is known to the Service Portal.
  2. The Service Portal is registered to a certain term at the Registry.
  3. The Registry has a new or updated Metadata Dataset related to the desired topic.
Triggers The Registry has added or updated a Metadata Dataset in its Query Index.
Postconditions The Service Portal has retrieved the Dataset.
Procedure
  1. On the reception of a new or updated Metadata Dataset (via § 8.4 (BP4) Adding a new Dataset to the Registry or § 8.5 (BP5) Updating a Dataset in the Registry), the Registry cross-checks the list of subscriptions.
  2. If a Metadata Dataset matches a subscription term, the Registry sends a notification to the inboxes of the Service Portals that are subscribed. The notification contains a link to the Dataset, which was extracted from the Dataset Description.
  3. When receiving this notification, the Service Portal extracts the Dataset link and downloads the Dataset from the Cultural Heritage Institution's Digital Heritage Pod.
  4. After the download is complete, the Service Portal processes the Dataset.
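
The cross-check against subscriptions and the resulting notifications can be sketched as below. The `fan_out` function, the `terms` and `dataset` keys of the metadata, and the choice of the ActivityStreams `Announce` type are assumptions made for this illustration; no notification is actually sent.

```python
from typing import Dict, List, Set, Tuple


def fan_out(metadata: dict, subscriptions: Dict[str, Set[str]], registry: str) -> List[Tuple[str, dict]]:
    """Return one (inbox, notification) pair per subscriber of a matching term."""
    inboxes: Set[str] = set()
    for term in metadata["terms"]:
        inboxes |= subscriptions.get(term, set())
    notification = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Announce",
        "actor": registry,
        "object": metadata["dataset"],  # link to the Dataset in the institution's pod
    }
    return [(inbox, notification) for inbox in sorted(inboxes)]


subscriptions = {
    "maps": {"https://portal.example/inbox", "https://other-portal.example/inbox"},
    "coins": {"https://coins-portal.example/inbox"},
}
metadata = {"terms": ["maps", "atlases"], "dataset": "https://pod.example/dataset.ttl"}
outgoing = fan_out(metadata, subscriptions, "https://registry.example/#me")
```

Collecting inboxes in a set first ensures that a portal subscribed to several matching terms receives a single notification per Dataset.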

Example HTTP Sequence

8.10. (BP10) Discovering a Term source

Roles
Components
Goal A Network of Terms Provider adds a new Term Source to its index.
Stakeholders
  • Users
Preconditions
  1. The Registry is known to the Network of Terms.
  2. The Network of Terms is registered to a certain term at the Registry.
  3. The Registry has a new or updated Metadata Term Source related to the desired term.
Triggers The Registry has added or updated a Metadata Term Source in its Query Index.
Postconditions The Network of Terms has retrieved the Term Source.
Procedure
  1. On the reception of a new or updated Metadata Term Source (via § 8.4 (BP4) Adding a new Dataset to the Registry or § 8.5 (BP5) Updating a Dataset in the Registry), the Registry cross-checks the list of subscriptions.
  2. If a Metadata Term Source matches a subscription term, the Registry sends a notification to the inboxes of the Networks of Terms that are subscribed. The notification contains a link to the Term Source, which was extracted from the Dataset Description.
  3. When receiving this notification, the Network of Terms extracts the Term Source link and downloads the Term Source from the Cultural Heritage Institution's Digital Heritage Pod.
  4. After the download is complete, the Network of Terms processes the Term Source.

Example HTTP Sequence

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[ACTIVITYSTREAMS-VOCABULARY]
James Snell; Evan Prodromou. Activity Vocabulary. 23 May 2017. REC. URL: https://www.w3.org/TR/activitystreams-vocabulary/
[LDN]
Sarven Capadisli; Amy Guy. Linked Data Notifications. 2 May 2017. REC. URL: https://www.w3.org/TR/ldn/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119

Informative References

[BP-NDE]
Miel Vander Sande. Use Cases & Business Processes Decentralized Digital Heritage Network. Living Standard. URL: https://erfgoedpod.github.io/usecases
[REQUIREMENTS-DATASETS]
David de Boer; Bob Coret. Requirements for Datasets. Living Standard. URL: https://netwerk-digitaal-erfgoed.github.io/requirements-datasets
[SOLID-PROTOCOL]
Sarven Capadisli; et al. The Solid Protocol. Editor’s Draft. URL: https://solidproject.org/TR/protocol/
