Global Architecture

cmarat edited this page Jan 28, 2013 · 1 revision

Goal

Implement an entity registry that works in both online and offline settings. In online mode, the registry uses information coming from a centralised data server, whereas in offline mode the information comes from other instances of the registry. Every registry instance potentially contributes information to the global data space, either as a cache or as an authoritative source.

Design considerations

  1. Use URIs as identifiers. The URIs may belong to any domain but must use the HTTP protocol.
  2. Use open standards supported by the W3C, including those currently under discussion (cf. the work of the Linked Data Platform (LDP) working group).
  3. Focus on command-line, API, and network-based interfaces.
  4. Validation and stress tests are performed on the XO-1 from OLPC and the SheevaPlug from GlobalScale Technologies.
  5. Implementation essentially done in Python 2. Possible use of Java for the centralised data server. Possible use of a compiled (C, C++, …) low-level data back-end for better performance on the XO.

Target features

  1. Search for the existence of an identifier
  2. Get the list of properties/values associated with an identifier through a dedicated API call
  3. Get the list of properties/values associated with an identifier through HTTP access to the identifier
  4. Edit the properties/values associated with an identifier
  5. Create a new identifier
  6. Delete an identifier and all the related properties/values
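Feature 3 differs from feature 2 in that the identifier itself is dereferenced over HTTP. A minimal sketch of what such an access could look like from a client is given below. The function name `build_lookup_request` and the choice of Turtle (`text/turtle`) as the requested serialisation are assumptions made for illustration; this page does not specify a format.

```python
from urllib.request import Request


def build_lookup_request(identifier, media_type="text/turtle"):
    """Build (but do not send) a GET request that dereferences an identifier.

    Hypothetical helper: the Accept header asks the server for an RDF
    serialisation of the description; Turtle is an assumed default.
    """
    return Request(identifier, headers={"Accept": media_type}, method="GET")


# Example identifier under a placeholder domain.
request = build_lookup_request("http://example.org/entity/amsterdam")
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup against whichever server answers for the identifier.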

Target milestones

The entity registry is developed in an incremental way with target milestones for the implemented features. Every new milestone is based on the work of the previous milestone, enhancing it with new features.

M4: System for locally maintaining a set of identifiers and their descriptions (i.e. lists of properties/values)

Using this implementation, one can:

  1. Check for the existence of a particular identifier
  2. Retrieve the description associated with an identifier (lookup)
  3. Edit the description of an identifier
  4. Delete an identifier and its associated description

This system features a local consistency mechanism ensuring that there are no duplicate identifiers.
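The M4 operations and the duplicate check can be sketched with an in-memory store. This is only an illustration under stated assumptions: the class `LocalRegistry`, its method names, and the `DuplicateIdentifier` exception are hypothetical, a description is modelled as a mapping from property to a set of values, and `https` is accepted alongside `http`. The actual back-end is still to be chosen (see the tasks below).

```python
from urllib.parse import urlparse


class DuplicateIdentifier(Exception):
    """Raised when an identifier is created a second time."""


class LocalRegistry:
    """In-memory stand-in for the M4 back-end (hypothetical API)."""

    def __init__(self):
        # identifier (URI string) -> {property: set of values}
        self._entities = {}

    def exists(self, uri):
        return uri in self._entities

    def create(self, uri, description=None):
        # Identifiers are HTTP URIs; accepting https too is an assumption.
        if urlparse(uri).scheme not in ("http", "https"):
            raise ValueError("identifier must be an HTTP URI: %r" % uri)
        # Local consistency mechanism: refuse duplicate identifiers.
        if uri in self._entities:
            raise DuplicateIdentifier(uri)
        self._entities[uri] = {
            prop: set(values) for prop, values in (description or {}).items()
        }

    def lookup(self, uri):
        # Return a copy so that callers cannot bypass edit().
        return {prop: set(vals) for prop, vals in self._entities[uri].items()}

    def edit(self, uri, prop, values):
        # Replace all values of one property; KeyError if the identifier is unknown.
        self._entities[uri][prop] = set(values)

    def delete(self, uri):
        # Removes the identifier together with its whole description.
        del self._entities[uri]
```

A short usage example: `reg = LocalRegistry()`, then `reg.create("http://example.org/entity/1", {"name": ["Amsterdam"]})`; a second `create` with the same URI raises `DuplicateIdentifier`.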

Tasks:

  1. Teodor: check M. Hausenblas’ and assess whether it could be used as a basis for the back-end
  2. Teodor: install and test ld-in-couch or a Cassandra-based solution (e.g. CumulusRDF, or something else); it might be a good idea to use Python if we want to deploy on the XOs (TBD)
  3. VU: test whether CouchDB runs on the XO, and implement the same with SemanticXO

M7: Connection of several systems in a network

The additional target features are:

  1. Distributed lookup within the network of registry instances
  2. Replication of entity descriptions across the instances

The publication of entities is still local to each instance. The consistency check is extended to ensure that there are no duplicate identifiers across all the instances.
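A rough sketch of the M7 behaviour follows: lookup falls back to the other instances, a description found remotely is replicated locally, and publication is refused if any known instance already holds the identifier. The `Instance` class and its methods are hypothetical, peers are plain Python objects rather than network endpoints, and only directly known peers are queried (one hop).

```python
class Instance:
    """One registry instance with a list of known peers (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.store = {}   # identifier -> description (dict)
        self.peers = []   # other Instance objects reachable on the network

    def exists_anywhere(self, uri):
        # Extended consistency check: look locally, then at every known peer.
        return uri in self.store or any(uri in p.store for p in self.peers)

    def publish(self, uri, description):
        # Publication stays local, but duplicates are rejected network-wide.
        if self.exists_anywhere(uri):
            raise ValueError("duplicate identifier: %s" % uri)
        self.store[uri] = dict(description)

    def lookup(self, uri):
        if uri in self.store:
            return self.store[uri]
        for peer in self.peers:
            if uri in peer.store:
                # Replicate the remote description into the local store.
                self.store[uri] = dict(peer.store[uri])
                return self.store[uri]
        return None
```

In a real deployment the peer list would come from some discovery mechanism, which this page does not yet define.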

M9: Addition of centralised system

The network of registry instances is enhanced with access to a centralised authority that allows identifiers to be published and removed. Added features:

  1. Access to the description of an identifier through HTTP requests to the centralised server
  2. Publication and removal of authoritative descriptions of identifiers
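Once the centralised server exists, a lookup has to choose between the authority (online) and the other instances (offline). One possible resolution order is sketched below, assuming the central server and each peer are represented by a callable that returns a description or `None`. The function `resolve`, its parameters, and the source labels are hypothetical.

```python
def resolve(uri, central, peers, cache):
    """Return (description, source) for uri, or (None, None) if unknown.

    central: callable or None; may raise OSError when the server is unreachable.
    peers:   iterable of callables standing in for other registry instances.
    cache:   dict used as the local cache of descriptions.
    """
    if central is not None:
        try:
            description = central(uri)
        except OSError:
            description = None  # server unreachable: behave as in offline mode
        else:
            if description is not None:
                cache[uri] = description
                return description, "authority"
    if uri in cache:
        return cache[uri], "cache"
    for peer in peers:
        description = peer(uri)
        if description is not None:
            cache[uri] = description
            return description, "peer"
    return None, None
```

Returning the source alongside the description lets a caller distinguish authoritative data from cached or peer-provided data.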

M12: Integration with other registries

Integration of the system with other registries such as DNS, and inclusion of LOD (Linked Open Data) data for the identifiers that are also described in that data space. Added features:

  1. Integration of DNS records, resolution of identifiers from these records
  2. Integration of LOD data as part of the description of an identifier

Additional features and design issues

These points are likely to be desired features but are not mission critical:

  1. Encryption of the data
  2. Compression of the data to save bandwidth
  3. Full provenance tracking of the information, recorded as part of the description of an identifier
  4. ACLs for the access to the information (probably to be coupled with provenance)
  5. Versioning

Open (design) issues

  1. If the system is used to describe dbpedia:Amsterdam, how do we ensure that this description is accessible through HTTP? Ideas:
     1. Tweak the URI to have something like wikireg.com/entity/dbpedia:Amsterdam, and add the description from dbpedia:Amsterdam as additional, external information.
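The URI-tweaking idea can be illustrated with a small mapping function. This is a sketch under assumptions: the `http://` scheme in front of `wikireg.com/entity/` is added here for illustration, the helper names are hypothetical, and the prefix table only contains the `dbpedia` prefix, expanded to the DBpedia resource namespace.

```python
from urllib.parse import quote

REGISTRY_BASE = "http://wikireg.com/entity/"  # host and path from the example; scheme assumed
PREFIXES = {"dbpedia": "http://dbpedia.org/resource/"}


def registry_uri(curie):
    """Map a prefixed name such as 'dbpedia:Amsterdam' to a registry-hosted URI."""
    # Keep ':' unescaped so that the result matches the form used in the example.
    return REGISTRY_BASE + quote(curie, safe=":")


def external_source(curie):
    """Expand the prefixed name to the external URI whose description is imported."""
    prefix, local_name = curie.split(":", 1)
    return PREFIXES[prefix] + local_name


print(registry_uri("dbpedia:Amsterdam"))     # http://wikireg.com/entity/dbpedia:Amsterdam
print(external_source("dbpedia:Amsterdam"))  # http://dbpedia.org/resource/Amsterdam
```

The registry would then serve its own description at the first URI and attach whatever it retrieves from the second as external information.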

Related work

  1. OKKAM: centralised entity registry
  2. LD-in-Couch: entity store whose goals are similar to those of the system targeted for M4
  3. Work by Alethia Hume on distributed entity search