A.M. Kuchling
akuchlin@mems-exchange.org
Release 0.34
October 15, 2002
This document collects my thoughts about implementing functionality required for the Matisse project. The plan is to actually implement and use this for the MEMS Exchange's internal architecture, because it's simple, small, provides the features we need, and should be readily implementable on top of our existing tools.
The goal of the Matisse project is to build an infrastructure for distributed scientific computing. This proposal aims to specify a nice, simple protocol for internal use at the MEMS Exchange that we can implement reasonably quickly and easily on top of Quixote and our other tools. For Matisse participants relying on the Sarnoff Matisse code, we could then build a gateway between the Matisse session manager and the interfaces described in this document.
This document describes the small set of basic interfaces, and tries to do so in enough detail to make writing the implementation and writing applications for it straightforward.
It turns out that the list of required interfaces is pretty short:
A client doesn't have to use every single interface. For example, CNRI's remote microscope only needs to use the Login and Authorization interfaces, as it doesn't generate any experimental data that needs to be analyzed further.
I've taken these interfaces and described bindings for accessing them using XML-RPC. The interfaces described here are implemented and running on http://www.mems-exchange.org; feel free to experiment with them.
To start out, let's define some terms:
Application: A client or server program that performs a particular task or controls a particular instrument. There will be a fixed list of applications, identified by string IDs. When you write a new application, you'll need to register the application ID by sending an e-mail to a human. Think of application IDs as being similar to MIME types. You can request a specific version of an application by appending a "/" and the version number to the application ID.
(A specification for version numbers is needed. Greg Ward wrote a lengthy spec that's described in the docstrings for Lib/distutils/version.py in the Python source tree. I won't copy that text right now, but it basically just formalizes the version numbers programmers are familiar with, e.g. 0.4, 0.4.1, 0.5a2.)
Service: A particular instance of an application, running somewhere on the network. Services can come and go fairly often. Services also have string IDs; they might be URLs, opaque strings, or something else.
For example, "mx-microscope" might be the identifier for the MEMS Exchange's remote microscope application, in any version. There would then be 5 services supporting this application, one for each of the remote microscope machines running a copy of the microscope server. Requesting "mx-microscope/2.0" would get only the machines running version 2.0 of the application.
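The following is a minimal sketch of how a client might interpret versioned application IDs. It assumes short IDs like "mx-microscope" that contain no "/" of their own, and that an ID without a version suffix matches every version. The function names `split_app_id` and `matches` are hypothetical and are not part of any MEMS Exchange code.

```python
from typing import Optional, Tuple


def split_app_id(app_id: str) -> Tuple[str, Optional[str]]:
    """Split 'mx-microscope/2.0' into ('mx-microscope', '2.0').

    An ID with no '/' (or an empty version after it) yields version None.
    """
    name, _, version = app_id.partition("/")
    return name, (version or None)


def matches(requested: str, offered: str) -> bool:
    """True if a service offering `offered` satisfies a request for `requested`."""
    req_name, req_version = split_app_id(requested)
    off_name, off_version = split_app_id(offered)
    # A request without a version accepts any version of the application.
    return req_name == off_name and (req_version is None or req_version == off_version)
```

With these definitions, `matches("mx-microscope", "mx-microscope/2.0")` is true, while `matches("mx-microscope/2.0", "mx-microscope/1.5")` is false.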
The basic idea is to define a minimal set of interfaces for storing and accessing data files, and to use RDF for indexing these files. This section will go through the interfaces in the order that an application such as the remote microscope client would likely use them. I'll use simple Python code to keep the examples short, but the interface should be usable from any language with an XML-RPC implementation. (See http://www.xmlrpc.com/ for a list.)
The interface consists of a bunch of XML-RPC methods. The first step in calling them is to create an XML-RPC server object. In Python, it looks like this:
>>> import xmlrpclib
>>> from mems.services import transport
>>> server = xmlrpclib.Server('http://ute.mems-exchange.org/xml/rpc',
...                           transport = transport.CookieAwareTransport())
>>>
(That URL points to my personal workstation, and won't work unless you're inside CNRI. Use www.mems-exchange.org if you're an external user.)
First, you can get a list of available services. Each service represents a single application server running on a particular host and port somewhere:
>>> server.list_services()
[{'application': 'http://services.mems-exchange.org/application/microscope',
'description': '',
'host': 'cnf-uscope.mems-exchange.org',
'id': 'cnf-scope',
'name': 'cnf-scope',
'port': 19000},
{'application': 'http://services.mems-exchange.org/application/microscope',
'description': '',
'host': 'mx-uscope.mems-exchange.org',
'id': 'mx-scope',
'name': 'mx-scope',
'port': 19000},
...
]
In XML-RPC terms, this is an array of structs, which maps to a list of dictionaries in Python, a Vector of Hashtables in Java, something appropriate in Tcl, &c. Each struct contains information about a single service. Services are referred to by their ID (cnf-scope, mx-scope) and run on a host and port somewhere. Each service runs a particular application, identified by a URI. I've chosen http://services.mems-exchange.org/application/microscope for the remote microscope, for example.
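Because each struct is just a dictionary on the Python side, the result of list_services() can be processed with ordinary code. The sketch below builds a lookup from service ID to (host, port). The helper name `service_addresses` is hypothetical, and the sample data is copied from the output shown above.

```python
from typing import Dict, List, Tuple


def service_addresses(services: List[dict]) -> Dict[str, Tuple[str, int]]:
    """Map each service ID to the (host, port) pair where it runs."""
    return {s["id"]: (s["host"], s["port"]) for s in services}


# Sample data taken from the list_services() output above.
services = [
    {"application": "http://services.mems-exchange.org/application/microscope",
     "description": "", "host": "cnf-uscope.mems-exchange.org",
     "id": "cnf-scope", "name": "cnf-scope", "port": 19000},
    {"application": "http://services.mems-exchange.org/application/microscope",
     "description": "", "host": "mx-uscope.mems-exchange.org",
     "id": "mx-scope", "name": "mx-scope", "port": 19000},
]

print(service_addresses(services)["mx-scope"])
# prints ('mx-uscope.mems-exchange.org', 19000)
```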
You can also request a list of the services for one particular application:
>>> server.list_services('http://services.mems-exchange.org/application/datastore')
[{'application': 'http://services.mems-exchange.org/application/datastore',
'description': '',
'host': 'ute.mems-exchange.org',
'id': 'mx-datastore',
'name': 'mx-datastore',
'port': 80}]
>>>
Or the information about a particular service:
>>> server.get_service('mx-scope')
{'application': 'http://services.mems-exchange.org/application/microscope',
'description': '',
'host': 'mx-uscope.mems-exchange.org',
'id': 'mx-scope',
'name': 'mx-scope',
'port': 19000}
>>>
There's no programmatic interface for adding services or removing them; right now there's a text file that I'll edit manually, made up of entries like this:
Service: mx-datastore
Application-URI: http://services.mems-exchange.org/application/datastore
Host: ute.mems-exchange.org
Port: 80
Key: 17e4d18f0348f191189aee1e47185bb5
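A file in this format is simple to parse. The sketch below assumes two things about the real file that are not confirmed here: entries are separated by blank lines, and each "Field: value" pair fits on one line. The name `parse_service_file` is hypothetical, and all values, including Port, are returned as strings.

```python
from typing import Dict, List


def parse_service_file(text: str) -> List[Dict[str, str]]:
    """Parse blank-line-separated blocks of 'Field: value' lines into dicts."""
    entries: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line ends the current entry (assumed separator).
            if current:
                entries.append(current)
                current = {}
            continue
        # Split on the first ': ' only, so "http://..." values stay intact.
        field, sep, value = line.partition(": ")
        if not sep:
            raise ValueError("malformed line: %r" % line)
        current[field] = value
    if current:
        entries.append(current)
    return entries
```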
What's the key for? That will be explained in the next section.
To get access to a service, you need to present a ticket for that service (an idea borrowed from Kerberos). To get a ticket, you provide your user name and password:
>>> server.get_service_ticket('akuchlin', 'password', 'mx-scope')
'peOYGwxQ8HOkt3LR5zXXTHTqweNw+aB3gTPr3/MqIPBD4SrbGG4/c62imaZm8ciF'
This gets me a ticket for the mx-scope service, under the user ID 'akuchlin'.
The ticket is a chunk of data, encrypted with the service's key, which is given in the Key: line of the service's entry. (Section 4.2 of this document specifies the ticket format.) This means the ticket is opaque to clients, which can't do anything to a ticket beyond handing it to a service. Tickets contain the service and user ID they were generated for, and a lifetime; tickets expire after a given time (6 hours, in my current code).
Clients will therefore prompt the user for user name and password, and request a ticket for the service selected by the user. This ticket can then be sent to the service using some protocol (unspecified -- that's up to the application). The service then checks the ticket, and grants or denies access accordingly.
If you're writing a service in this model, you'll therefore need to implement handling of tickets in your programming language of choice. That shouldn't be difficult, as the specification is pretty short and simple.
You don't need a ticket to talk to the single central service, though. XML-RPC requests to our server will set a cookie containing the ID of your server session. When you call the login() method, your session will be flagged as authenticated. (Note that this requires your client software to read and interpret the HTTP "Set-Cookie" header.)
>>> server.login('akuchlin', 'password')
'' # Returns nothing useful
>>>
Other XML-RPC requests to the server will return an error if you haven't logged in.
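The CookieAwareTransport class from mems.services.transport used in the first example isn't shown in this document. Below is a sketch of one way to get similar behaviour with Python 3's `xmlrpc.client.Transport`. It overrides `parse_response` to record Set-Cookie values and `send_headers` to send them back. The class name `SessionCookieTransport` is hypothetical. The sketch keeps only name=value pairs and ignores attributes such as Path and Expires.

```python
import xmlrpc.client
from typing import Dict


class SessionCookieTransport(xmlrpc.client.Transport):
    """XML-RPC transport that remembers session cookies between calls."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.cookies: Dict[str, str] = {}

    def remember_cookies(self, response) -> None:
        # response.msg holds the headers of an http.client.HTTPResponse.
        for header in response.msg.get_all("Set-Cookie") or []:
            pair = header.split(";", 1)[0].strip()  # drop Path=, Expires=, ...
            name, sep, value = pair.partition("=")
            if sep:
                self.cookies[name.strip()] = value.strip()

    def send_headers(self, connection, headers):
        # Send back every cookie received so far, then the normal headers.
        if self.cookies:
            cookie = "; ".join("%s=%s" % item for item in self.cookies.items())
            connection.putheader("Cookie", cookie)
        super().send_headers(connection, headers)

    def parse_response(self, response):
        self.remember_cookies(response)
        return super().parse_response(response)
```

An instance would be passed as the `transport` argument to `xmlrpc.client.ServerProxy`, the Python 3 counterpart of `xmlrpclib.Server`.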
Suppose that, after connecting to an instrument and using it for an hour, you want to save a file of data to storage, plus some metadata about the file (creator, contents, time of measurement, &c).
First, the client requests a location to store the data:
>>> url = server.new_cache_location()
>>> url
'http://ute.mems-exchange.org/data/akuchlin/1019068230/'
You get back a URL, which is how you'll access this data in the future.
So how do you store the data and the corresponding metadata? For the data file, you do an HTTP PUT to the URL. I'll assume an existing function PUT(url, body) that does a PUT to the given URL with the contents of 'body' as the body of the HTTP request, and returns the numeric response code. You'll therefore do a PUT to a URL like http://ute.mems-exchange.org/data/akuchlin/1019068230/. The HTTP request should also contain the requisite "Cookie" header, so that the session tracking will work.
>>> resp = PUT(url, 'contents of data file go here')
>>>
'resp' should be 204, the HTTP status code for a "204 No Content" response. To read the data again, simply do an HTTP GET on the same URL:
>>> import urllib
>>> f = urllib.urlopen(url)
>>> f.read()
'contents of data file go here'
>>> f.close()
>>>
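The examples assume a PUT(url, body) helper without defining one. Here is one possible Python 3 sketch built on `http.client`. The optional `cookie` parameter is an addition to the signature assumed above. It carries the session cookie string (e.g. "name=value") that the "Cookie" header requirement calls for. How the client captures that cookie from the XML-RPC server is left open.

```python
import http.client
from typing import Optional, Union
from urllib.parse import urlsplit


def PUT(url: str, body: Union[str, bytes], cookie: Optional[str] = None) -> int:
    """PUT `body` to `url` and return the numeric HTTP status code."""
    parts = urlsplit(url)
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    conn = conn_class(parts.hostname, parts.port)  # port None means the default
    if isinstance(body, str):
        body = body.encode("utf-8")
    headers = {"Content-Length": str(len(body))}
    if cookie:
        headers["Cookie"] = cookie  # needed so the server can find the session
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("PUT", path, body=body, headers=headers)
        response = conn.getresponse()
        response.read()  # drain the (normally empty) response body
        return response.status
    finally:
        conn.close()
```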
Metadata is handled similarly, by PUTting to and GETting from url + '/metadata': http://ute.mems-exchange.org/data/akuchlin/1019068230/metadata, in this case.
The data can be in any format that your application requires, but the metadata has to be a well-formed XML file because it will be run through an XML parser. To be most useful, this XML file should contain some RDF information, but the server doesn't check that the metadata contains RDF.
>>> resp = PUT(url + '/metadata',
"""<?xml version="1.0"?>
<metadata>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:s="http://xml.mems-exchange.org/schema/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://ute.mems-exchange.org/data/akuchlin/1019068230/">
<dc:Creator>akuchlin</dc:Creator>
<s:Run>45</s:Run>
</rdf:Description>
</rdf:RDF>
<element />
</metadata>""")
>>>
Again, you should get an HTTP "204 No Content" response.
In RDF, metadata is encoded as a collection of (subject, predicate, object) triples. Consider this part of the example above:
<rdf:Description rdf:about="http://ute.mems-exchange.org/data/akuchlin/1019068230/">
<dc:Creator>akuchlin</dc:Creator>
<s:Run>45</s:Run>
</rdf:Description>
http://ute.mems-exchange.org/data/akuchlin/1019068230/, or rather the data file at that URL, is the subject. Creator and Run are predicates, with corresponding objects 'akuchlin' and '45'. So this is saying:

<datafile> http://purl.org/dc/elements/1.1/Creator 'akuchlin'
<datafile> http://xml.mems-exchange.org/schema/Run '45'
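To make the triple model concrete, here is a sketch that pulls (subject, predicate, object) triples out of a metadata document with Python 3's `xml.etree.ElementTree`. It handles only the simple pattern used in this document. Description elements must be in the RDF namespace (rdf:Description). The subject comes from rdf:about or an unqualified about attribute. Each child element is a predicate with a literal text object. rdf:resource objects, nested descriptions, and other RDF/XML forms are not handled, so this is not a general RDF parser. The function name `description_triples` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def _predicate_uri(tag: str) -> str:
    # ElementTree writes namespaced names as "{namespace}local"; RDF names the
    # predicate by concatenating the namespace URI and the local name.
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace + local
    return tag


def description_triples(metadata_xml: str) -> List[Tuple[str, str, str]]:
    """Return literal-valued triples from every rdf:Description in the document."""
    root = ET.fromstring(metadata_xml)
    triples = []
    for desc in root.iter("{%s}Description" % RDF_NS):
        subject = desc.get("{%s}about" % RDF_NS) or desc.get("about")
        for prop in desc:
            triples.append((subject, _predicate_uri(prop.tag), (prop.text or "").strip()))
    return triples
```

Elements outside the RDF section, such as the `<element />` in the example, are simply never visited.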
RDF works in terms of schemas. One such schema is Dublin Core (http://www.dublincore.org), which specifies Title, Subject, Creator, and a bunch of additional fields that roughly correspond to the fields on a card in a library catalog. The MEMS Exchange would like to associate our own specialized data with these files, such as the run and step number, so we can just invent our own schema with Run and Step fields, and give it the identifying URI http://mems-exchange.org/ns/files/1.0#.
This means that the metadata is nicely expandable. If the MIT Microvision developers need their own metadata, they can assign a URI for their schema, such as http://mtl.mit.edu/xml/something#, and give it fields such as 'Experiment-type' and 'Driving-signal', or whatever. They can store and retrieve their own metadata without requiring any changes to the server, and I can write simple applications that work with Dublin Core fields and aren't affected by the presence of extra metadata.
(Should I expand this explanation of RDF? Let me know... In the meantime, see http://www.w3.org/RDF/ for more about RDF.)
This finishes the tutorial section.
This interface allows listing the available services and getting the information for a particular service.
The struct describing a service contains a few key/value pairs:
| Key | Value |
|---|---|
| id | (String) Service ID, e.g. "mx-scope" |
| application | (String) Application ID, e.g. "mx-microscope/" |
| name | (String) Name of service ("CNRI Leica INS1000") |
| description | (String) Long description of this service, suitable for use as help text when offering the user a choice of services. |
| host | (String) Hostname of this service. |
| port | (Integer) Port number for this service. |
More keys may be added later.
Lets users request a ticket for a given service. Services can then verify that this ticket is correct, and can trust that the user is in fact who she claims to be. Tickets can be used exactly once.
Each service has a secret key for the Advanced Encryption Standard (AES, formerly known as Rijndael) algorithm; this key is also known to the central server that generates tickets, and is used to generate authentication tickets identifying a user to a service. (If all this reminds you of Kerberos, that's not surprising; it's very heavily influenced by the Kerberos design.)
Tickets contain the following information:
This string is then encrypted with the service's AES key in CBC mode, using an initialization vector of binary zeros. The encrypted data is then encoded as a series of hex digits, and the resulting string comprises the ticket.
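AES decryption needs a third-party library, because Python's standard library has no AES implementation. The sketch below therefore covers only the stdlib part of ticket handling. It turns the hex string back into ciphertext bytes and rejects strings that cannot be AES-CBC output, since CBC ciphertext is always a whole number of 16-byte blocks. The name `ticket_ciphertext` is hypothetical. The bytes it returns would then be decrypted with the service's key and a zero IV, and parsed according to the ticket format.

```python
import binascii

AES_BLOCK_SIZE = 16  # bytes; AES always uses 128-bit blocks


def ticket_ciphertext(ticket: str) -> bytes:
    """Decode a hex-encoded ticket into the AES-CBC ciphertext it carries.

    Raises ValueError if the string isn't hex or isn't a whole number of
    AES blocks long.
    """
    try:
        data = binascii.unhexlify(ticket.strip())
    except binascii.Error as exc:
        raise ValueError("ticket is not hex-encoded") from exc
    if not data or len(data) % AES_BLOCK_SIZE:
        raise ValueError("ticket length is not a multiple of the AES block size")
    return data
```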
On being presented with a ticket, a service must go through the following steps to check it. Invalid tickets should always cause a log entry to be written, because invalid tickets should never be encountered in normal usage.
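Here is a rough sketch of such a check. It covers only the properties this document states elsewhere: the service ID must match, the lifetime must not have run out, and a ticket may be used exactly once. It assumes the ticket has already been decrypted and parsed. The `DecodedTicket` fields are illustrative rather than the actual ticket layout. Single use is enforced here by remembering the raw ticket strings already accepted, which is an assumed mechanism. Every rejection is logged, as the text requires.

```python
import logging
import time
from dataclasses import dataclass
from typing import Optional, Set

log = logging.getLogger("ticket-check")


@dataclass
class DecodedTicket:
    # Hypothetical decoded form of a ticket; field names are illustrative.
    service_id: str
    user_id: str
    issued: float    # Unix time the ticket was generated
    lifetime: float  # seconds, e.g. 6 * 3600


def check_ticket(raw_ticket: str, ticket: DecodedTicket, my_service_id: str,
                 used_tickets: Set[str], now: Optional[float] = None) -> Optional[str]:
    """Return the authenticated user ID, or None (after logging) if invalid."""
    now = time.time() if now is None else now
    if ticket.service_id != my_service_id:
        log.warning("ticket for %r presented to %r", ticket.service_id, my_service_id)
        return None
    if now > ticket.issued + ticket.lifetime:
        log.warning("expired ticket for user %r", ticket.user_id)
        return None
    if raw_ticket in used_tickets:
        log.warning("reused ticket for user %r", ticket.user_id)
        return None
    used_tickets.add(raw_ticket)  # tickets can be used exactly once
    return ticket.user_id
```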
This interface lets an application check whether user X is permitted to use the application. The authorization database is therefore centralized, and individual services generally won't manage their own databases (though they can, if the author decides to).
Note that there's no defined interface for managing authorizations by adding or removing them. That's done through some other mechanism specific to the implementation of the central server. For example, we might store the database as a plain text file and just have the administrators SSH to the machine to edit that file. No such mechanism will be defined in this document, to limit the damage from any flaws in this protocol: if an attacker manages to steal an administrator's password or to break the ticket format, the attacker still won't be able to grant or remove rights from other people.
Currently this interface is very simple, restricted to a yes-or-no decision, so a user can either access all features of a service or none of them. The idea of privileges and/or roles may need to be introduced later, but as we haven't yet needed privileges for the virtual fab system, there's no real evidence for that.
This interface lets an application submit a metadata record to be indexed, and allows getting a location to store the experimental data. This location is a cache; some other application will have to worry about moving it from the cache to some more permanent storage.
The data is then stored by doing an HTTP PUT to the URL. The metadata is stored by doing a PUT to the URL formed by adding '/metadata' to the URL for the data.
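The whole storage sequence can be sketched as a single function. `server` is assumed to be an XML-RPC proxy on which login() has already succeeded. `put` is assumed to be a PUT(url, body) helper like the one the tutorial assumes, sending the session cookie itself. The function name `store_result` is hypothetical. Two details are design choices rather than requirements from this document. First, any 2xx status is accepted, although 204 is what the server is described as returning. Second, a trailing slash is stripped before '/metadata' is appended, so that the result matches the .../1019068230/metadata form shown in the tutorial.

```python
from typing import Callable


def store_result(server, put: Callable[[str, bytes], int],
                 data: bytes, metadata_xml: str) -> str:
    """Store one data file plus its metadata; return the data URL."""
    url = server.new_cache_location()
    status = put(url, data)
    if not 200 <= status < 300:
        raise RuntimeError("storing data at %s failed (HTTP %d)" % (url, status))
    # Avoid a doubled slash when the cache URL already ends in '/'.
    metadata_url = url.rstrip("/") + "/metadata"
    status = put(metadata_url, metadata_xml.encode("utf-8"))
    if not 200 <= status < 300:
        raise RuntimeError("storing metadata at %s failed (HTTP %d)" % (metadata_url, status))
    return url
```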
This section proposes a generic format for metadata about experimental results (and probably other artifacts as well).
The metadata for a single artifact is represented as an XML document. Usually it will contain a number of RDF triples, which can then be downloaded separately and searched.
Applications that want customized information of their own and aren't happy with RDF can include XML elements and attributes from non-RDF namespaces.
An example metadata file might look like this:
<metadata>
<!--Section 1: RDF-->
<RDF:RDF xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:s="http://.../myschema/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<RDF:Description RDF:about="http://URL/...">
<dc:title>Device #1: microscope image</dc:title>
<dc:creator>akuchlin</dc:creator>
<dc:identifier>http://URL/...</dc:identifier>
<dc:description>An image of a given device.</dc:description>
<!--Definitions for someone else's schema-->
<s:term1>Value for term1</s:term1>
<s:term2>ABC</s:term2>
</RDF:Description>
</RDF:RDF>
<!--Section 2: anything you like-->
<myelement attr='1'>
...
</myelement>
</metadata>
The only RDF vocabulary specified in this document will be the 15 Dublin Core elements. We may eventually hammer out our own more elaborate vocabulary for the MEMS field, but that's left for future work.
Things we may want to think about:
This document was generated using the LaTeX2HTML translator.