Category Archives: JavaScript

JSON-LD and MongoDB

For the last several months, I’ve been engaged in an interesting project with Wikia. Wikia hosts hundreds of thousands of special-interest wikis for things as varied as Pokémon, cellphone rate comparisons, TV shows and video games.

For those of you not aware of Wikia, it is an outgrowth of MediaWiki and was founded by Jimmy Wales as a for-profit means of using the MediaWiki platform for exactly such interests.

Recently Wikimedia Deutschland started work on Wikidata, an effort to use Semantic Web principles to create a factual knowledge base that can be used within Wikis (typically to replace Infobox information, which can vary between different language versions). This is a somewhat different direction than Semantic MediaWiki, which is more about using Wiki markup to express semantic relationships within a Wiki. As it happens, JSON-LD is being considered as the data representation model for Wikidata.

Linked Data at Wikia

As it turns out, Wikia has been quite interested in leveraging these tools. As I mentioned, Wikia is a for-profit company, and one way it earns revenue is through in-page advertising. But the amount of knowledge curated by the hundreds of thousands of communities is staggering. Unfortunately, native Wiki markup just isn’t that semantic. However, much of the information represented is factual (at least within the world-view of the wiki community).

To that end, I’ve been working on an experiment using JSON-LD and MongoDB to power a parallel structured data representation of much of the information contained in a wiki. The idea is to add a minimal amount of markup (hopefully) to the Wiki text and templates so that information can be represented in the generated HTML using RDFa. This allows the content of the Wiki to be mirrored in a MongoDB-based service using JSON-LD. Once the data has been freed from the context of the limited Wiki markup, it can now be re-purposed outside of the Wiki itself.

Knowledge modeling and data representation

Why use RDFa and not Microdata? The primary driver is the need to use multiple vocabularies to represent information. In my opinion, any new vocabulary needs to take into consideration schema.org; microdata works great with schema.org, and can generate RDF (see Microdata to RDF) as long as you’re constrained to a single vocabulary, don’t need to keep typed data, and don’t need to capture actual HTML markup. Unfortunately, any serious application beyond simple Search Engine Optimization (SEO) does need to use these features. In our case, much of the interesting data to capture are fragments of the Wiki pages themselves. Moreover, the content of any Wiki, much less one that has as much special meaning as, say, a Video Game, needs to describe relationships that are not natively part of the schema.org vocabulary. Schema does provide an extension mechanism partly for this purpose, and recently the ability to tag subjects with an additional type, not part of the primary vocabulary (presumably schema.org) was introduced. But, once the decision is made to use multiple vocabularies, RDFa has better mechanisms in place anyway.

At Wikia, we define a vocabulary as an extension to schema.org, that is, the classes defined within that vocabulary are sub-classes of schema.org classes, although typically the properties are not sub-properties of schema.org properties (we may revisit this). For example, a wikia:VideoGame is a sub-class of schema:CreativeWork, and a wikia:WikiText is a sub-class of schema:WebPageElement. There are additional class and property definitions to describe the structural detail common to Video Games in describing characters, levels, weapons, and so forth. An RDFa description will assert both the native class (e.g., wikia:VideoGame) and the schema.org extension class (e.g. schema:CreativeWork/VideoGame). This allows search engines to make sense of the structured data, without the need to understand an externally defined vocabulary.
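As a rough illustration of the dual typing described above, the sketch below adds the schema.org extension type next to the Wikia type on a node. The extensionTypes table, the withSchemaTypes helper and the example.org identifier are hypothetical names invented for this example; only the wikia:VideoGame pairing is taken from the text.

```javascript
// Hypothetical lookup table: Wikia class -> schema.org extension class.
// Only the VideoGame pairing comes from the text above.
const extensionTypes = {
  "wikia:VideoGame": "schema:CreativeWork/VideoGame"
};

// Return a copy of a node whose @type lists both the native class and,
// when one is known, its schema.org extension class.
function withSchemaTypes(node) {
  const types = [].concat(node["@type"] || []);
  const extra = types
    .map(function (t) { return extensionTypes[t]; })
    .filter(function (t) { return t && types.indexOf(t) === -1; });
  return Object.assign({}, node, { "@type": types.concat(extra) });
}

const game = withSchemaTypes({
  "@id": "http://example.org/wiki/Some_Game", // placeholder identifier
  "@type": "wikia:VideoGame"
});
console.log(game["@type"]);
```

A consumer that only understands schema.org can then ignore the first type and still make sense of the second.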

However, for Wikia’s purposes, and those of people wanting to work within the Wikia structured-data ecosystem, having a vocabulary that models the information contained within Wikia Wikis can be of great benefit. Key to this is knowing how much to model with classes and properties, and how much to leave to things such as naming conventions and keywords. In fact, there are likely cases where more per-wiki modeling is required, and we are continuing to explore ways in which we can further extend the vocabularies without imposing a large burden on ontology development, while keeping the data reasonably generically useful.

Linked Data API

Although RDFa structured in HTML can be quite useful as an API itself, modern Single Page Applications are better served through RESTful interfaces with a JSON representation. JSON-LD was developed as a means of expressing Linked Data in JSON. It is fully compatible with RDF. Indeed, many of the concepts used in RDFa can be seen in JSON-LD – Compact IRIs, language- and datatyped-values, globally identified properties, and the basic graph data model of RDF.

Furthermore, a JSON-LD-based service allows resource descriptions that may be spread across multiple HTML pages to be consolidated into individual subject definitions. By storing these subject definitions in a JSON-friendly datastore such as MongoDB, the full power of a scalable document store becomes available to the data otherwise spread out across numerous Wiki pages. And because the JSON-LD can be fully generated from the RDFa contained in the generated Wiki pages, the data will remain synchronized.
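To make the consolidation step more concrete, here is a minimal sketch that merges JSON-LD node objects sharing an @id into a single subject definition, the kind of object that could then be stored as one document. The consolidateSubjects helper, the sample pages and the example.org identifier are assumptions made for illustration, not the service’s actual code.

```javascript
// Merge JSON-LD node objects that share an @id into one subject definition.
// Property values are collected into arrays, and duplicates are dropped.
function consolidateSubjects(nodes) {
  const subjects = {};
  nodes.forEach(function (node) {
    const id = node["@id"];
    const subject = subjects[id] || (subjects[id] = { "@id": id });
    Object.keys(node).forEach(function (key) {
      if (key === "@id") return;
      const existing = subject[key] || [];
      [].concat(node[key]).forEach(function (value) {
        const seen = existing.some(function (v) {
          return JSON.stringify(v) === JSON.stringify(value);
        });
        if (!seen) existing.push(value);
      });
      subject[key] = existing;
    });
  });
  return Object.keys(subjects).map(function (id) { return subjects[id]; });
}

// Two hypothetical wiki pages that each describe part of the same character.
const fromPageA = { "@id": "http://example.org/wiki/Hero", "name": "Hero" };
const fromPageB = {
  "@id": "http://example.org/wiki/Hero",
  "name": "Hero",
  "weapon": "Sword"
};

const merged = consolidateSubjects([fromPageA, fromPageB]);
console.log(JSON.stringify(merged));
```

Because the merge is driven purely by @id, re-running it over freshly extracted RDFa would rebuild the same subject definitions.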

In the future, with the growth and adoption of systems such as Wikidata, we can expect to see much of the factual information currently expressed as Wiki markup moved to outside services. The needs of the Wiki communities remain paramount, as they are at the heart of the data explosion we’ve seen in the hundreds of thousands of Wikis hosted at Wikia and elsewhere, not to mention Wikipedia and related MediaWiki projects.

As the communities become more comfortable with using knowledge stores, such as Wikidata and Wikia’s linked data platform, we should see a further explosion in the amount of structured information available on the web in general. The real future, then, lies not only in the efforts of communities to curate their information, but in the ability to use the principles of the Semantic Web and Linked Data to infer connections based on distributed information.

I’ll be speaking more about JSON-LD and MongoDB at NoSQL Now! later this week in San Jose. Slides for my talk are available on slideshare.

BrowserID versus DDOS

How BrowserID saved the RDFa Test Suite from a DDOS (Distributed Denial of Service Attack).

This article is the third in a three-part series on implementing the RDFa
Test Suite. The first article discussed the use of Sinatra, Backbone.js and
Bootstrap.js in creating the test harness. The second article discussed the
use of JSON-LD. In this article, we focus on our use of BrowserID in
responding to a Distributed Denial of Service Attack (DDOS).

RDFa Test Suite

Working on the updated RDFa Test Suite has really been a lot of fun.
It was a great opportunity to explore new Web application technologies,
such as Bootstrap.js and Backbone.js. The test suite is a
single-page web application which uses a Sinatra based service to run
individual test cases.

The site was becoming stable, and we were starting to flesh out more test
cases for odd corner cases, when the site started to not respond. Manu
Sporny, whose company Digital Bazaar is kindly donating hosting for the web
site, noticed that there were a number of Ruby processes that were
consuming available Ruby workers, and causing new requests to
block. The service is fairly resource intensive, as it must invoke an
external processor and run a SPARQL query over the results for each
test. It seemed as if the site was being hammered by a large number of
overzealous search crawlers! Naturally, we put a robots.txt in place,
expecting that conforming search engines would detect the site’s crawl
preferences and back off, but that didn’t happen. Upon further examination
of the server logs, we noted requests were streaming in from all over the
world! Clearly, we were under attack. (Who might wish ill of the RDFa
development effort? Who knows, but most likely this was just an anonymous,
and not specifically malicious, attack.)

My first thought was to make use of a secret API token, configured into the
server and the web app, but that didn’t really do the trick either; it
seemed that modern day malware actually just executes the JavaScript, so
it picks up the API key naturally!

BrowserID to the Rescue!

Okay, how about authentication? It’s typically a pain, and we were
reluctant to put up barriers in front of people who might want to test
their own processors or see how listed processors perform. The two current
contenders are WebID and BrowserID.

WebID has the laudable goal of combining personally maintained profile
information with SSL certificates (it was previously known as FOAF+SSL).
Basically, it’s a mechanism to allow users to use a profile page as
their identity. This could come off of their blog, Facebook, Twitter or
other social networking site. By configuring an SSL certificate into the
browser and pointing to their profile page, a service can determine that
the profile page actually belongs to the user. (There’s much more to it;
you can read more in the WebID Spec.) A key advantage here is that the
service now has access to all of the self-asserted information the user
wants to provide about themselves as defined in their profile page, such
as foaf:name, foaf:knows, and so forth. The chief downside is that the
common sources of existing user identities in the world haven’t bought
into this, and there’s a competing solution that offers similar benefits.

BrowserID is a Mozilla initiative to enable people with e-mail
addresses to use those e-mails to log in to websites, kind of like
OpenID, only more secure. Basically, as I understand it, a service
wanting to support this would include the BrowserID JavaScript client
code in their application and use a simple Sign In button that invokes
this code. That sends a request off to the identity provider (IDP) to
authenticate the user, which has probably already happened in the past,
with the result maintained in a cookie. The IDP then sends a response
which invokes a callback. The client then calls back to the service to
complete the login, passing the assertion.

The beauty is, using a tool such as the sinatra-browserid Ruby gem,
this becomes dirt simple! Basically, on the API side, put in a call to
authorized? to determine if the user is authorized. If not, either
direct them to a login screen, or in the case of the RDFa Test Suite,
place an informational message telling them why we need them to login, and
identify the BrowserID button at the top of the page.

The principal entry point to the test suite on the service side is
/test-suite/check-test/:version/:suite/:num. The only real change to
this method was to check for authorization before performing the test.

# Run a test
get '/test-suite/check-test/:version/:suite/:num' do |version, suite, num|
  return [403, "Unauthorized access is not allowed"] unless authorized?

  # Get the SPARQL query
  source = File.open(File.expand_path("../tests/#{num}.sparql"))

  # Do host-language specific modifications of the SPARQL query.
  query = SPARQL.parse(source)

  # Invoke the processor and retrieve results, parsed into an RDF graph
  graph = RDF::Graph.load(params['rdfa-extractor'] + test_path(version, suite, num, format))

  # Run the query
  result = query.execute(graph)

  # Return results as JSON
  {:status => result}.to_json
end

In the banner, we add a little bit of Haml:

...
%div.navbar-text.pull-right
  - if email
    %p.email
      Logged in as
      %span.email
        = email
      %a{:href => '/test-suite/logout'}
        (logout)
  - else
    = render_login_button

When the page is returned, the email variable is set if the user is
authorized, so they’ll see the email address if they’ve authenticated, and
a login button otherwise. The render_login_button is handled entirely
by sinatra-browserid; no muss, no fuss!

The only other thing to do is to not show the test cases in the test
suite, unless the user has authenticated, which we can tell because
$("span.email") won’t be empty. In our application.js, we use this to
either show the tests, or an explanation:

// If logged in, create primary test collection and view
if ($("span.email").length > 0) {
  this.testList = new TestCollection([], {version: this.version});
  this.testList.fetch();
  this.testListView = new TestListView({model: this.testList});
} else {
  this.unauthorizedView = new UnauthorizedView();
}

That’s pretty much all there is to it. The only complication I faced is
that, when developing with shotgun, the session ID is changed with each
invocation, so it wasn’t remembering the login. By fixing the session
secret this problem went away. Total time from discovery of the problem to
deployed solution: about 1 hour. Not too bad.
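The session-secret problem is easier to see with a small sketch of how a signed session cookie works. This is not Sinatra’s or Rack’s actual implementation; it is a simplified stand-in using Node’s built-in crypto module, and the sign and verify helpers are hypothetical names. The assumption is that, without a fixed secret, a new random secret is generated every time the application is reloaded, so cookies signed earlier no longer verify.

```javascript
const crypto = require("crypto");

// Sign a session value with an HMAC so the server can detect tampering.
function sign(value, secret) {
  const mac = crypto.createHmac("sha256", secret).update(value).digest("hex");
  return value + "--" + mac;
}

// Return the session value if the signature matches, otherwise null.
function verify(cookie, secret) {
  const index = cookie.lastIndexOf("--");
  if (index < 0) return null;
  const value = cookie.slice(0, index);
  return sign(value, secret) === cookie ? value : null;
}

const cookie = sign("email=user@example.org", "fixed-secret");

// Same secret after a reload: the login is still recognized.
console.log(verify(cookie, "fixed-secret"));

// A fresh random secret, as after a reload: the old cookie is rejected.
console.log(verify(cookie, crypto.randomBytes(16).toString("hex")));
```

A production implementation would also compare signatures in constant time; that detail is left out to keep the sketch short.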

It’s important to note that the RDFa Test Suite is stateless, and we
don’t really need any personal information; we don’t collect information
anywhere, even in our logs. BrowserID basically becomes a gatekeeper
to help ward off abuse. It imposes a very low barrier to entry, so it
doesn’t interfere with people using the site any way they choose.

I do miss other user-asserted information, such as the user’s name and
so forth. OpenID, another single sign-on initiative that has lost momentum
lately, provides a Simple Registration Extension add-on that allows users
to assert simple information such as nickname, mail, fullname and so
forth. IMO, the right way to do this is with something like FOAF or the
schema.org Person class. Perhaps BrowserID will provide something like
this in the future.

The Use of JSON-LD in the RDFa Test Harness

This article is the second in a three-part series on implementing the RDFa Test Suite. The first article discussed the use of Sinatra, Backbone.js and Bootstrap.js in creating the test harness. In this article, we focus on JSON-LD, a Linked Data technology that complements RDFa in creating modern Web applications.

Test Manifest

The RDFa test manifest is a Turtle document used to specify the tests that apply to different versions and host languages in RDFa. Turtle is a great language for representing information in a reasonably human-understandable way. Most people authoring RDF by hand stick to Turtle, because of its ease of use and concise way of expressing Linked Data graphs. For example, to specify a specific test entry, we could write some Turtle as follows:

<test-cases/0001> a test:TestCase;
   dc:title "Predicate establishment with @property";
   rdfatest:rdfaVersion "rdfa1.0", "rdfa1.1";
   rdfatest:hostLanguage "xml", "xhtml1", "html4", "html5", "xhtml5";
   test:classification test:required;
   test:informationResourceInput <test-cases/0001.html>;
   test:informationResourceResults <test-cases/0001.sparql> .

Basically, this defines a (relative) URL identifying the test case, gives it a title, describes the relevant RDFa versions and host languages, says it’s required, and shows the files used to provide input and to test the results. The problem is, this is not a convenient form to use programmatically. Modern Web applications make use of JSON for representing data, partly because JSON can be represented natively in JavaScript, but also because it has a convenient representation in Ruby and other languages.

Let’s look at the equivalent test representation in JSON-LD:

{
  "@context": "http://rdfa.info/contexts/rdfa-test.jsonld",
  "@graph": [
    {
      "@id": "http://rdfa.info/test-suite/test-cases/0001",
      "@type": "test:TestCase",
      "num": "0001",
      "classification": "test:required",
      "description": "Predicate establishment with @property",
      "input": "http://rdfa.info/test-suite/test-cases/0001.html",
      "results": "http://rdfa.info/test-suite/test-cases/0001.sparql",
      "expectedResults": true,
      "hostLanguages": ["html4","html5","xhtml1","xhtml5","xml"],
      "versions": ["rdfa1.0","rdfa1.1"]
    }
  ]
}

Other than the encapsulating elements, this looks pretty similar to the Turtle representation. There are a couple of differences, though: instead of dc:title we use the term description, and instead of rdfatest:hostLanguage we use hostLanguages. How are these related? The key is the @context value. Looking at http://rdfa.info/contexts/rdfa-test.jsonld, we see the following:

{
  "@context": {
    "dc":         "http://purl.org/dc/terms/",
    "xsd":        "http://www.w3.org/2001/XMLSchema#",
    "rdfatest":   "http://rdfa.info/vocabs/rdfa-test#",
    "test":       "http://www.w3.org/2006/03/test-description#",

    "classification": {"@id": "test:classification"},
    "contributor":    {"@id": "dc:contributor"},
    "description":    {"@id": "dc:title"},
    "expectedResults":{"@id": "test:expectedResults",
                       "@type": "xsd:boolean"},
    "hostLanguages":  {"@id": "rdfatest:hostLanguage",
                       "@container": "@set"},
    "input":          {"@id": "test:informationResourceInput",
                       "@type": "@id"},
    "num":            {"@id": "rdfatest:num"},
    "purpose":        {"@id": "test:purpose"},
    "versions":       {"@id": "rdfatest:rdfaVersion",
                       "@container": "@set"},
    "reference":      {"@id": "test:specificationReference"},
    "results":        {"@id": "test:informationResourceResults",
                       "@type": "@id"}
  }
}

The context does exactly that: it provides a context for interpreting JSON data. Note the definition of hostLanguages: this indicates that hostLanguages is a term definition, meaning that the term is replaced with the @id value, in this case rdfatest:hostLanguage, the same as used in Turtle. Both of these expand to an equivalent IRI http://rdfa.info/vocabs/rdfa-test#hostLanguage. In RDF, and in Linked Data in general, everything is described as a resource, either an IRI, a Literal or a Blank Node (basically a variable representing something we don’t know or don’t want to identify). The "@container": "@set" bit just says to expect that the value of hostLanguages will always be an array, to make processing more convenient.
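The term lookup described above can be sketched in a few lines of JavaScript. This is a deliberately simplified illustration that handles only plain terms and compact IRIs against a trimmed copy of the context; it is not the full JSON-LD expansion algorithm, and expandIri is a hypothetical helper name.

```javascript
// A trimmed copy of the context shown above.
const context = {
  "rdfatest": "http://rdfa.info/vocabs/rdfa-test#",
  "test": "http://www.w3.org/2006/03/test-description#",
  "hostLanguages": { "@id": "rdfatest:hostLanguage", "@container": "@set" },
  "classification": { "@id": "test:classification" }
};

// Expand a term or compact IRI to a full IRI using the context.
// Only the simple cases used here are covered.
function expandIri(value, ctx) {
  const definition = ctx[value];
  if (typeof definition === "string") {
    return definition;
  }
  if (definition && typeof definition === "object") {
    // A term definition: continue with its @id value.
    return expandIri(definition["@id"], ctx);
  }
  const colon = value.indexOf(":");
  if (colon > 0) {
    const prefix = value.slice(0, colon);
    if (typeof ctx[prefix] === "string") {
      // A compact IRI: replace the prefix with its namespace.
      return ctx[prefix] + value.slice(colon + 1);
    }
  }
  return value;
}

console.log(expandIri("hostLanguages", context));
console.log(expandIri("rdfatest:hostLanguage", context));
```

Both calls yield the same IRI, which is why the JSON term and the Turtle property refer to the same thing.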

Because we use terms in JSON Object key positions, this means that access from JavaScript can be quite convenient. Taking a look at the test suite Test model description, we can download the Manifest with an Ajax request and access elements using ‘.’ notation, such as the following:

var filteredTests = _.filter(this.loadedData, function(data) {
  return _.include(data.versions, version) &&
         _.include(data.hostLanguages, hostLanguage);
});

Another advantage in using JSON is that the parse time is negligible. The manifest has about 3000 triples, which can actually take a while to parse as Turtle, but opening and parsing the JSON document is substantially faster.

As with many modern Web applications, the RDFa Test Suite is a single-page application that uses Ajax calls to communicate with the server. The first call is to retrieve the JSON manifest. Subsequent calls retrieve test results, also expressed as JSON. The manifest is used to populate a Backbone.js Collection. When a specific version and hostLanguage is selected, this collection is filtered to show only relevant tests, as is described in the previous example. The Collection then drives a view element, which instantiates a view for each model to be tested.

Collating Test Results

The second area where JSON-LD is used within the RDFa Test Suite is for collating test results. After running a series of tests, a test user can generate EARL test results. Being an RDFa test suite, this report is naturally expressed in RDFa. Here the Backbone.js view technology comes into play, since it is easy to use an HTML template to generate individual results, with the RDFa markup baked into the template.

The basic EARL template looks like the following:

<script id='earl-item-template' type='text/template'>
  <h4>
    [
     <span property='rdfatest:rdfaVersion'><%= version %></span>
     <span property='rdfatest:hostLanguage'><%= hostLanguage %></span>
    ]
    Test <%= num %>:
    <span property='dc:title'><%= description %></span>
    <span property='earl:mode' resource='earl:automatic' />
  </h4>
  <p property='dc:description'><%= purpose %></p>
  <div class='property processorURL resource detailsURL'
       typeof='earl:Assertion'>
    <span property='earl:assertedBy' resource='' />
    <span class='resource processorURL' property='earl:subject' />
    <span class='resource docURL' rel='earl:test' />
    <p property='earl:result' typeof='earl:TestResult'>
      Result:
      <strong class='resource outcome'
              property='earl:outcome'
              resource=''><%= result %></strong>
    </p>
  </div>
</script>

The Earl view uses this template to generate a report for an individual test entry and fills in attribute or content values from within the view:

var EarlItemView = Backbone.View.extend({
  template: _.template($('#earl-item-template').html()),

  render: function () {
    var JSON = this.model.toJSON();
    JSON.processorURL = this.options.processorURL;

    this.$el.html(this.template(JSON));
    this.$el.attr("resource", this.model.docURL());
    this.$(".property.processorURL")
      .attr("property",JSON.processorURL);
    this.$(".resource.processorURL")
      .attr("resource", JSON.processorURL);
    this.$(".resource.detailsURL")
      .attr("resource", this.model.detailsURL());
    this.$(".resource.docURL")
      .attr("resource", this.model.docURL());
    this.$(".resource.outcome")
      .attr("resource", 'earl:' +
                        this.model.get('result').toLowerCase());
    return this;
  }
});

The result is a test result for a specific processor with a specific RDFa version and host-language. You can see an example report here.

However, this is not the end of it; to exit the W3C Candidate Recommendation phase, it’s necessary to have at least two interoperable implementations. What is needed, then, is a collated report that combines the output from several different processors into a single report. Because each individual report is an information resource representing a specific RDF graph, we can parse all of these documents into a single graph. But, to generate an HTML result, it would be convenient to have all the data available in a format convenient to use with Ruby Haml.

This is where the use of JSON-LD in languages like Ruby comes into play. Ruby has great libraries for working with JSON, which basically transform the JSON to a combination of Ruby native Hash, Array, String, Number and Boolean values. A JSON-LD representation of a test assertion entry looks like the following:

{
  "@id": "http://rdfa.info/test-suite/test-details/rdfa1.1/...",
  "@type": "earl:Assertion",
  "assertedBy": "http://rdfa.info/test-suite/",
  "test": "http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/0001.html",
  "subject": "http://rubygems.org/gems/rdf-rdfa",
  "result": {
    "@type": "earl:TestResult",
    "outcome": "earl:pass"
  }
}

Transforming this to Ruby gives essentially the exact same representation, so we can iterate over this using Ruby Haml. The natural thing to do is see how we can represent EARL test results through a hierarchical test structure.

As it happens, the EARL representation is not actually ideal. Each assertion is listed with a subject that indicates the specifics of the processor, test, version and host language. It indicates that it is asserted by the test suite, the test being run, the processor being tested, and the result of this test. However, I’d like to show the results in a tabular form, with the test suite at the top, followed by sections for each version and host language, and a table with a row for each generic test and a column for each processor. A typical result looks like the following:

Test                                         clj-rdfa  librdfa  pyRdfa  RDF::RDFa
0001 Predicate establishment with @property  PASS      PASS     PASS    PASS

To take advantage of JSON-LD chaining, we really want a data structure that we can easily iterate on. By adding some extra markup to the report, we can do this using JSON-LD Framing, basically a query language for JSON-LD that allows us to change the data into a format we want to use. The frame document allows us to specify how we’d like our output. An abbreviated example is the following:

{
  "@context": "http://rdfa.info/contexts/rdfa-earl.jsonld",
  "@type": "earl:Software",
  "rdfa1.1": {
    "@type": "rdfatest:Version",
    "html5": [{"@type": "earl:TestCase"}]
  }
}

This says show items of type earl:Software with a property (associated with the version), referencing an object of type rdfatest:Version, which has a property for each host language, which references a list of earl:TestCase items. This gives us a JSON-LD snippet such as the following:

{
  "@context": "http://rdfa.info/contexts/rdfa-earl.jsonld",
  "@id": "http://rdfa.info/test-suite/",
  "@type": [
    "earl:Software",
    "doap:Project"
  ],
  "homepage": "http://rdfa.info/",
  "name": "RDFa Test Suite",
  "rdfa1.1": {
    "@type": "rdfatest:Version",
    "html5": [
      {
        "@id": "http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/0001.html",
        "@type": "earl:TestCase",
        "num": "0001",
        "title": "Predicate establishment with @property",
        "description": "Tests @property ...",
        "mode": "earl:automatic",
        "http://rubygems.org/gems/rdf-rdfa": {
          "@id": "http://rdfa.info/test-suite/...",
          "@type": "earl:Assertion",
          "assertedBy": "http://rdfa.info/test-suite/",
          "test": "http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/0001.html",
          "subject": "http://rubygems.org/gems/rdf-rdfa",
          "result": {
            "@type": "earl:TestResult",
            "outcome": "earl:pass"
          }
        },
        "http://www.w3.org/2012/pyRdfa": { "@type": "earl:Software", ... },
        "https://github.com/niklasl/clj-rdfa": { "@type": "earl:Software", ... },
        "https://github.com/rdfa/librdfa": { "@type": "earl:Software", ... },
        "http://rubygems.org/gems/rdf-rdfa": { "@type": "earl:Software", ... }
      }
    ]
  }
}

We’ve basically wrapped each individual test case in a structure that inverts the information contained within the test case. Now we can use this within a Haml template to create the HTML we’re interested in.
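The inversion can also be pictured in plain JavaScript. The sketch below is not JSON-LD Framing; it only illustrates the resulting shape, where each test case holds its assertions keyed by processor, and how that shape maps onto table rows. The groupByTest and toRows helpers are hypothetical, and the test and processor identifiers are shortened placeholders rather than full IRIs.

```javascript
// Flat EARL assertions, shaped like the example assertion shown earlier.
const assertions = [
  { test: "0001", subject: "pyRdfa",    result: { outcome: "earl:pass" } },
  { test: "0001", subject: "RDF::RDFa", result: { outcome: "earl:pass" } },
  { test: "0002", subject: "pyRdfa",    result: { outcome: "earl:fail" } }
];

// Group assertions under their test case, keyed by processor.
function groupByTest(list) {
  const tests = {};
  list.forEach(function (assertion) {
    const entry = tests[assertion.test] || (tests[assertion.test] = {});
    entry[assertion.subject] = assertion;
  });
  return tests;
}

// Build one table row per test, with one cell per processor column.
// A missing assertion is shown as "-".
function toRows(tests, processors) {
  return Object.keys(tests).sort().map(function (num) {
    const cells = processors.map(function (name) {
      const assertion = tests[num][name];
      return assertion
        ? assertion.result.outcome.replace("earl:", "").toUpperCase()
        : "-";
    });
    return [num].concat(cells);
  });
}

const rows = toRows(groupByTest(assertions), ["pyRdfa", "RDF::RDFa"]);
console.log(rows);
```

With the data nested this way, a template only needs two loops: one over tests and one over processor columns.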

To see the complete EARL report, look here.

Conclusions

JSON-LD is the right technology for dealing with RDF and Linked Data in Web applications. It has a convenient representation for working from within various programming languages, such as JavaScript and Ruby. Its use in implementing the RDFa Test Suite proves its worth as a complementary technology for working with Linked Data on the Web along with RDFa.

Next up, we talk about the Distributed Denial of Service attack against the test suite and how we solved this very easily and quickly using BrowserID.

A new RDFa Test Harness

Implementing the RDFa Test Suite as a modern Web application using Sinatra, Backbone.js and Bootstrap.js.

Recently, RDFa entered the Candidate Recommendation phase for releasing
RDFa Core 1.1, RDFa 1.1 Lite, and XHTML+RDFa 1.1 as W3C Standards.
I’ve been using RDFa for a couple of years, originally as part of the Connected Media Experience,
and lately because I’ve become passionate about the Semantic Web. For the last 10 months or so, this has extended
to my becoming an Invited Expert in the W3C, where I’ve worked on RDFa, HTML microdata and JSON-LD.

This is an introductory blog post on the creation of a new RDFa Test Suite. Here we discuss the use of
Sinatra, Backbone.js and Bootstrap.js to run the test suite. Later will come articles on the usefulness
of JSON-LD as a means of driving a test harness and generating test reports, and on the use of BrowserID
to deal with Distributed Denial of Service attacks that cropped up overnight.

RDFa Test Suite

Along with other RDF parsers and serializers (see sidebar), I have an RDFa parser and serializer.
In implementing the parser, and while working on new features for RDFa 1.1,
the RDFa Test Suite has been an invaluable resource. In my testing, I would use the
test manifest, describing the sets of inputs and expected outputs in the form of a SPARQL ASK query.

A basic RDFa test is a small amount of markup intended to test a single feature.

<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/elements/1.1/">
<head>
   <title>Test 0001</title>
</head>
<body>
  <p>
    This photo was taken by
    <span class="author"
          about="photo1.jpg"
          property="dc:creator">Mark Birbeck</span>.</p>
</body>
</html>

In this example, we’re testing that the @about attribute sets the subject, @property sets the property and the
text content sets the object of a single RDF statement. Rendered as Turtle, it would look like the following:

@prefix dc: <http://purl.org/dc/elements/1.1/> .
<photo1.jpg> dc:creator "Mark Birbeck" .

A query to test this looks like the following:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
ASK WHERE {
    <http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/photo1.jpg>
      dc:creator "Mark Birbeck" .
}

Note that the relative IRI in the @about is expanded relative to the document location, as is tested in the SPARQL query.
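The resolution step can be checked with the standard URL class available in browsers and Node. This is a small sketch, assuming the test document is served as 0001.html under the rdfa1.1/html5 test-cases path.

```javascript
// The test document's location (the 0001.html file name is assumed here).
const documentLocation =
  "http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/0001.html";

// Resolve the relative @about value against the document location.
const subject = new URL("photo1.jpg", documentLocation).href;
console.log(subject);
```

The resolved value is the same absolute IRI that the ASK query uses as its subject.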

Using the test suite requires a publicly available endpoint, for which I released the RDF Distiller
to test my implementation.
The test suite works with a provided URL, which invokes the processor with a test document. Basically, it does the following:

  1. The Web application performs a GET on the /test-suite/check-test/:version/:suite/:num service URL along
    with the processor endpoint as a query parameter.
  2. The service invokes the processor endpoint passing the URL of the test document.
  3. The processor then parses that document
    and returns a result in a different RDF format (for example Turtle or RDF/XML).
  4. The service parses the returned RDF document into a graph, and performs a SPARQL query against that graph.
  5. The result is a true or false value, which determines if the test passes or not.
  6. The result is formatted as JSON and returned to the Web application.
  7. The Web application updates the test status in the UI.
  8. If running all tests, the completion event triggers the next test to run.
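The service-side part of these steps (2 through 6) can be sketched as a single function. Everything here is a stand-in: checkTest and its three callbacks (invokeProcessor, parseGraph, ask) are hypothetical names, the processor URL is a placeholder, and the stubs below fake the HTTP call, the RDF parser and the SPARQL engine so the flow can be followed without any libraries.

```javascript
// Run one test. The callbacks stand in for the HTTP call to the processor,
// the RDF parser, and the SPARQL engine.
function checkTest(options) {
  // Step 2: invoke the processor endpoint with the URL of the test document.
  const serialized = options.invokeProcessor(
    options.processorURL + options.testURL);
  // Step 4: parse the returned RDF document into a graph.
  const graph = options.parseGraph(serialized);
  // Steps 4-5: the ASK query yields true or false.
  const passed = options.ask(graph, options.query);
  // Step 6: format the result as JSON for the Web application.
  return JSON.stringify({ status: passed });
}

// Stubbed usage: a "graph" is just an array of triples here.
const response = checkTest({
  processorURL: "http://processor.example/extract?uri=",
  testURL: "http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/0001.html",
  query: ["photo1.jpg", "dc:creator", "Mark Birbeck"],
  invokeProcessor: function (url) {
    return "photo1.jpg|dc:creator|Mark Birbeck";
  },
  parseGraph: function (text) {
    return [text.split("|")];
  },
  ask: function (graph, pattern) {
    return graph.some(function (triple) {
      return triple.join("|") === pattern.join("|");
    });
  }
});
console.log(response);
```

Passing the collaborators in as callbacks keeps the sketch self-contained; a real service would wire in an HTTP client, an RDF parser and a SPARQL engine instead.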

Sinatra

Sinatra is a great lightweight framework for deploying simple Ruby applications on the web. The needs of this application,
while requiring a lot of different libraries, were really fairly simple. Basically, return a page listing the various tests,
respond to requests for test case source documents, activate a test with a specified processor endpoint and return the results.

The basic setup of the app is fairly straightforward:

# Return the test suite driver page
get '/test-suite/' do
  haml :test_suite
end

# Return a particular test, or SPARQL query
get '/test-suite/test-cases/:version/:host_language/:num' do |version, host_language, num|
  source = File.open(File.expand_path("../tests/#{num}.html"))
  case host_language
  when 'xhtml'
    # do XHTML-specific formatting of the test
  when 'html'
    # do HTML-specific formatting of the test
  when 'xml'
  when 'svg'
  end
end

# Run a test
get '/test-suite/check-test/:version/:suite/:num' do |version, suite, num|
  # Get the SPARQL query
  source = File.open(File.expand_path("../tests/#{num}.sparql"))

  # Do host-language specific modifications of the SPARQL query.
  query = SPARQL.parse(source)

  # Invoke the processor and retrieve results, parsed into an RDF graph
  graph = RDF::Graph.load(params['rdfa-extractor'] + test_path(version, suite, num, format))

  # Run the query
  result = query.execute(graph)

  # Return results as JSON
  {:status => result}.to_json
end

Backing up the Sinatra application are a number of Ruby Gems for working with Linked Data and SPARQL. In
addition to reading and writing RDFa, there are gems for managing RDF graphs, reading other formats, such as Turtle
and RDF/XML, and running the SPARQL queries.

Driving the test suite is a Web application built using Backbone.js and Bootstrap.js.

Backbone.js

Backbone is a JavaScript model-view-controller framework for building responsive applications.
It encourages building modular applications split into multiple classes with weak interdependencies. Models and
Collections are used to maintain application state, and reflect information from a server. The RDFa test suite has
two main models and a collection.

The Version model keeps track of information about what is being run. This includes the RDFa version and host language being tested
along with the current processor endpoint. It looks something like the following:

window.Version = Backbone.Model.extend({
  defaults: {
    processorURL: "http://www.w3.org/2012/pyRdfa/extract?uri=",
    processorName: "pyRdfa",
    processorDOAP: "http://www.w3.org/2012/pyRdfa",

    // List of processors
    processors: {}
  },

  // Appropriate host languages for the current version
  hostLanguages: function() {
    return {
      "rdfa1.0": ["SVG", "XHTML1"],
      "rdfa1.1": ["HTML4", "HTML5", "SVG", "XHTML1", "XHTML5", "XML"],
      "rdfa1.1-vocab": ["HTML4", "HTML5", "SVG", "XHTML1", "XHTML5", "XML"]
    }[this.get("version")];
  }
});

The Test model represents a single test, and the Test Collection uses the test manifest to instantiate a number of
Test model instances. Changing information in the Version model causes different tests to be enabled or disabled,
as appropriate for the given RDFa version and host language. It also affects URL generation for retrieving and running
different tests. In addition to instantiating tests, the Test Collection also allows the complete sequence of tests
to be run: it listens for the completion event fired when a test model finishes running, and then starts
the next test.
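
The event-driven sequencing can be sketched without Backbone as shown here. This is not the actual Test Collection code: the Test constructor, onComplete, and runAll are hypothetical names, and the runner function stands in for the Ajax call to the check-test service.

```javascript
// Minimal stand-in for a test model. 'runner' is a hypothetical
// function(num, done) that calls done(true) or done(false) when finished.
function Test(num, runner) {
  this.num = num;
  this.status = "pending";
  this.listeners = [];
  this.runner = runner;
}

// Register a listener for this test's completion event.
Test.prototype.onComplete = function (fn) {
  this.listeners.push(fn);
};

// Run the test, record the outcome, then notify listeners.
Test.prototype.run = function () {
  var self = this;
  this.runner(this.num, function (passed) {
    self.status = passed ? "pass" : "fail";
    self.listeners.forEach(function (fn) { fn(self); });
  });
};

// Run every test in order: the completion of one test starts the next.
function runAll(tests, finished) {
  tests.forEach(function (test, i) {
    test.onComplete(function () {
      var next = tests[i + 1];
      if (next) {
        next.run();
      } else if (finished) {
        finished(tests);
      }
    });
  });
  if (tests.length > 0) {
    tests[0].run();
  } else if (finished) {
    finished(tests);
  }
}
```

Chaining on completion events means only one request is outstanding at a time, so a slow processor endpoint is not flooded with every test at once.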

Styling the User Interface

I’m no designer, but I like a good-looking and efficient user interface. Fortunately, the people at Twitter do too,
and they released Bootstrap.js as a means of tackling common problems. I won’t go into detail here, but check out
their example page to get an idea of the things you can do with Bootstrap.
What I immediately noticed about it is that I didn’t really need to worry about layout. Note that you can even run
the Test Suite from an iPhone!

Data Driven Tests

Of course, returning the test suite HTML is just part of the problem; we also need to get details about each test to
the page, so that it can respond to requests to run specific tests. The tests are managed through a
test manifest, which is kept in Turtle format to make it easy to add
tests. A typical entry looks like the following:

<test-cases/0001> a test:TestCase;
   dc:title "Predicate establishment with @property";
   rdfatest:rdfaVersion "rdfa1.0", "rdfa1.1";
   rdfatest:hostLanguage "xml", "xhtml1", "html4", "html5", "xhtml5";
   test:classification test:required;
   test:informationResourceInput <test-cases/0001.html>;
   test:informationResourceResults <test-cases/0001.sparql> .

This basically describes an IRI for the test, in this case test-cases/0001 relative to the location of the test suite,
the title of the test, the RDFa versions and host languages it applies to and a reference to the input and result documents.
RDFa has over 200 such tests defined. This is all well and good, but requiring yet another data format is an added complication.
Better to have the tests defined in a format more appropriate for use within a Web application, such as JSON. As it happens
JSON-LD is another specification that is still underway, but proving to be quite flexible and useful for our needs. For a
peek at the JSON-LD version of the RDFa test suite manifest, look here.
More on using JSON-LD, and why it’s such a good match for RDFa in the next post.
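
To make the manifest fields concrete, here is a sketch of how an entry like the one above could be filtered by RDFa version and host language once it is available as plain JSON. The object shape and property names are assumptions for illustration only and are not taken from the actual JSON-LD manifest; the values come from the Turtle entry shown earlier.

```javascript
// Hypothetical plain-JSON shape for one manifest entry. Property names are
// illustrative; the real JSON-LD manifest may be structured differently.
var manifest = [
  {
    id: "test-cases/0001",
    title: "Predicate establishment with @property",
    rdfaVersion: ["rdfa1.0", "rdfa1.1"],
    hostLanguage: ["xml", "xhtml1", "html4", "html5", "xhtml5"],
    input: "test-cases/0001.html",
    results: "test-cases/0001.sparql"
  }
];

// Return the entries that apply to the given RDFa version and host language.
// Host languages are compared in lower case, as written in the manifest.
function testsFor(entries, version, hostLanguage) {
  var language = hostLanguage.toLowerCase();
  return entries.filter(function (entry) {
    return entry.rdfaVersion.indexOf(version) !== -1 &&
           entry.hostLanguage.indexOf(language) !== -1;
  });
}

// Example usage: test 0001 applies to RDFa 1.1 in HTML5, but not to SVG.
console.log(testsFor(manifest, "rdfa1.1", "HTML5").length);
console.log(testsFor(manifest, "rdfa1.1", "SVG").length);
```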