Category Archives: Ruby

SPARQL 1.0 for Ruby

I’ve just released version 0.0.2 of the Ruby sparql gem. This version is based on earlier work by Pius and Arto and incorporates code from the SPARQL Grammar and SPARQL Algebra gems. Further documentation is available here.

This gem integrates with RDF.rb and uses rdf-xsd to provide additional literal semantics.

Why release SPARQL for Ruby? Probably not because of the killer performance, at least right now. However, I believe it’s important that Ruby have a complete tool chain for manipulating Linked Data (including RDF and SPARQL), and this was the remaining piece.

In spite of the 0.0.2 release number, it is a fully functioning implementation of SPARQL 1.0 semantics and passes all the DAWG data-r2 test cases. The gem uses RDF::Query to perform basic graph pattern (BGP) matching on RDF::Queryable objects (such as RDF::Repository). The gem has some support for query optimization, but this remains largely unimplemented and will be addressed in future releases. I’d also like to support SPARQL 1.1 queries and updates at some point.

This is a pure Ruby implementation and does not directly rely on any native libraries (although some RDF readers, such as RDFa and RDF/XML, presently do).

The basic strategy is to parse SPARQL and transform it into an S-Expression-based algebra, very close to the one used by Jena ARQ (SPARQL S-Expressions, or SSE). This allows queries to be written directly in SSE, or SPARQL to be parsed into SSE and then executed.

The linkeddata gem has also been updated to have a soft reference to SPARQL, in addition to new processors for RDF::Turtle, JSON::LD, and RDF::Microdata.

The gem is tested on Ruby 1.8.7, 1.9.2 and JRuby. (JRuby has some spec issues, probably due to Nokogiri differences.)

Many thanks to Pius Uzamere for helping to make this release happen, and to Arto Bendiken for the work in RDF.rb, SPARQL::Algebra and SPARQL::Grammar that preceded this.

rdf.rb 0.3.4 released

After several months of gathering updates for RDF.rb, we’ve released version 0.3.4 with several new features:

  • Update to BGP query model to support SPARQL semantics,
  • Expandable Literal support, to allow further implementation of XSD datatypes outside of RDF.rb (see RDF::XSD gem),
  • More advanced content type detection to allow better selection of the appropriate reader from those available on the client. (Includes selecting among HTML types, such as Microdata and RDFa)
  • Improved CLI with the rdf executable providing access to all loaded readers and writers for cross-language serialization and deserialization.

As an example of format detection, consider the following:

require 'linkeddata'
RDF::Graph.load("http://greggkellogg.net/foaf.ttl")

should select the Turtle or N3 reader, if one is installed. This becomes more important for ambiguous file types, such as HTML, which could be either RDFa or Microdata, and application/xml, which could be TriX, RDF/XML or even RDFa.
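To make the idea of format detection concrete, here is a minimal pure-Ruby sketch of choosing a reader from a content type, a body sample and a file extension. It is not RDF.rb's implementation: the lookup tables, the `detect_format` name and the sniffing heuristics (looking for `itemscope`, `<TriX` or `rdf:RDF`) are assumptions made for illustration only.

```ruby
# Hypothetical lookup tables; the real gems register their own formats.
EXTENSIONS = {
  ".ttl" => :turtle, ".n3" => :n3, ".nt" => :ntriples, ".rdf" => :rdfxml
}.freeze

CONTENT_TYPES = {
  "text/turtle" => :turtle, "text/n3" => :n3, "application/rdf+xml" => :rdfxml
}.freeze

# Guess a format symbol from a path, an optional Content-Type and a body sample.
def detect_format(path, content_type: nil, sample: "")
  # Drop parameters such as "; charset=utf-8" before looking up the type.
  type = content_type.to_s.split(";").first.to_s.strip.downcase
  return CONTENT_TYPES[type] if CONTENT_TYPES.key?(type)

  case type
  when "text/html", "application/xhtml+xml"
    # HTML is ambiguous: Microdata uses @itemscope, otherwise assume RDFa.
    return sample.include?("itemscope") ? :microdata : :rdfa
  when "application/xml"
    # Generic XML could be TriX, RDF/XML or RDFa; sniff the markup.
    return :trix if sample.include?("<TriX")
    return sample.include?("rdf:RDF") ? :rdfxml : :rdfa
  end

  # Fall back to the file extension when the content type did not decide.
  EXTENSIONS[File.extname(path).downcase]
end

detect_format("http://greggkellogg.net/foaf.ttl")  # => :turtle
```

Checking the unambiguous signals (an explicit RDF content type) before sniffing keeps the heuristics confined to the genuinely ambiguous HTML and XML cases.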

See documentation for more specifics on this version of RDF.rb. Note that I’ve attempted to incorporate suggestions for improving the documentation.

Most of the reader/writer gems have been updated to match this release, in particular JSON::LD, RDF::Microdata, RDF::N3, RDF::RDFa, RDF::RDFXML, and RDF::Turtle. A future update to the linkeddata gem should reference the latest versions of each, but a simple gem update will work too.

There is a slight semantic change for repositories to support SPARQL: a context of false should not match a variable context. This is straight out of SPARQL semantics. Repository implementors who have provided custom implementations of #query_pattern should check behavior against rdf-spec version 0.3.4 to verify correct operation.
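The context rule can be illustrated with a small pure-Ruby sketch of quad-pattern matching. This is not RDF.rb's `#query_pattern`; the `Quad` struct, the `context_matches?` and `query_contexts` names, and the convention that a `nil` pattern context means "no constraint" are assumptions. The point it shows is the one stated above: a variable context (a Symbol here) binds only named graphs, so a statement with a context of false or nil never matches it.

```ruby
# Hypothetical quad representation for this sketch; not RDF::Statement.
Quad = Struct.new(:subject, :predicate, :object, :context)

# Conventions assumed for this sketch:
#   nil    -> no constraint on the context
#   false  -> default graph only
#   Symbol -> a variable, which binds only named graphs
#   other  -> exact match on a named graph
def context_matches?(pattern_ctx, stmt_ctx)
  default_graph = stmt_ctx.nil? || stmt_ctx == false
  case pattern_ctx
  when nil    then true
  when false  then default_graph
  when Symbol then !default_graph # a variable never matches the default graph
  else pattern_ctx == stmt_ctx
  end
end

def query_contexts(quads, pattern_ctx)
  quads.select { |q| context_matches?(pattern_ctx, q.context) }
end

repo = [
  Quad.new("ex:s", "ex:p", "ex:o", false),  # default graph
  Quad.new("ex:s", "ex:p", "ex:o", "ex:g1") # named graph
]
query_contexts(repo, :g).map(&:context) # => ["ex:g1"]
```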

Next up is a release of SPARQL implemented in pure Ruby. This gem provides full support for SPARQL 1.0 queries.

RDF::RDFa update with vocabulary expansion, RDF collections and more

I’ve updated RDF::RDFa with recent changes to RDFa Core:

  • Deprecate explicit use of @profile
  • Add rdfa:hasVocabulary when encountering @vocab
  • Implemented Reader#expand to perform vocabulary expansion using RDFS rules 5, 7, 9 and 11.

Additionally, experimental support for RDF Collections (lists) has been added, based on RDF Web Applications Working Group wiki notes.

Remove RDFa Profiles

RDFa Profiles were a mechanism added to allow groups of terms and prefixes to be defined in an external resource and loaded to affect the processing of an RDFa document. This introduced a problem for some implementations needing to perform a cross-origin GET in order to retrieve the profiles. The working group elected to drop support for user-defined RDFa Profiles (the default profiles defined by RDFa Core and host languages still apply) and replace it with an inference regime using vocabularies. Parsing of @profile has been removed from this version.

Vocabulary Expansion

One of the issues with vocabularies was that they discourage re-use of existing vocabularies when terms from several vocabularies are used at the same time. As it is common (and encouraged) for RDF vocabularies to form sub-class and/or sub-property relationships with well-defined vocabularies, the RDFa vocabulary expansion mechanism takes advantage of this.

As an optional part of RDFa processing, an RDFa processor will perform limited RDFS entailment, specifically rules rdfs5, 7, 9 and 11. This causes the super-classes of type IRIs and the super-properties of property IRIs to be added to the output graph.

RDF::RDFa::Reader implements this using the #expand method, which looks for rdfa:hasVocabulary properties within the output graph and performs such expansion. See an example in the usage section.
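The four entailment rules are simple enough to sketch directly. The following pure-Ruby fixpoint loop is a minimal illustration, not the code behind `RDF::RDFa::Reader#expand`: triples are plain `[subject, predicate, object]` arrays of CURIE strings, and the function name and example data are hypothetical. It applies rdfs5 and rdfs11 (transitivity of `rdfs:subPropertyOf` and `rdfs:subClassOf`), rdfs7 (a statement using a property also holds for its super-properties) and rdfs9 (a resource typed with a class is also typed with its super-classes) until nothing new is produced.

```ruby
require "set"

RDF_TYPE     = "rdf:type"
SUB_PROPERTY = "rdfs:subPropertyOf"
SUB_CLASS    = "rdfs:subClassOf"

# Apply rdfs5, rdfs7, rdfs9 and rdfs11 until the graph stops growing.
def rdfs_expand(triples)
  graph = Set.new(triples)
  loop do
    found       = Set.new
    sub_props   = graph.select { |_, p, _| p == SUB_PROPERTY }
    sub_classes = graph.select { |_, p, _| p == SUB_CLASS }

    # rdfs5: p subPropertyOf q, q subPropertyOf r => p subPropertyOf r
    sub_props.each do |a, _, b|
      sub_props.each { |b2, _, c| found << [a, SUB_PROPERTY, c] if b == b2 }
    end
    # rdfs11: C subClassOf D, D subClassOf E => C subClassOf E
    sub_classes.each do |a, _, b|
      sub_classes.each { |b2, _, c| found << [a, SUB_CLASS, c] if b == b2 }
    end
    # rdfs7: x p y, p subPropertyOf q => x q y
    graph.each do |x, p, y|
      sub_props.each { |p2, _, q| found << [x, q, y] if p == p2 }
    end
    # rdfs9: x rdf:type C, C subClassOf D => x rdf:type D
    graph.each do |x, p, c|
      next unless p == RDF_TYPE
      sub_classes.each { |c2, _, d| found << [x, RDF_TYPE, d] if c == c2 }
    end

    found.subtract(graph)
    break if found.empty? # fixpoint reached
    graph.merge(found)
  end
  graph
end

expanded = rdfs_expand([
  ["ex:me", RDF_TYPE, "foaf:Person"],
  ["foaf:Person", SUB_CLASS, "foaf:Agent"],
  ["ex:me", "foaf:name", "Gregg Kellogg"],
  ["foaf:name", SUB_PROPERTY, "rdfs:label"]
])
expanded.include?(["ex:me", RDF_TYPE, "foaf:Agent"]) # => true
```

The naive nested loops are quadratic per pass, which is acceptable for the small vocabulary graphs involved in RDFa expansion but would need indexing for large inputs.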

RDF Collections (lists)

One significant RDF feature missing from RDFa was support for ordered collections, or lists. RDF models these with the special properties rdf:first and rdf:rest and the terminating resource rdf:nil, and other RDF languages have first-class syntax for this concept. For example, in Turtle, a list can be defined as follows:

@prefix schema: <http://schema.org/> .

[ a schema:MusicPlaylist;
  schema:name "Classic Rock Playlist";
  schema:numTracks 5;
  schema:tracks (
    [ a schema:MusicRecording; schema:name "Sweet Home Alabama";          schema:byArtist "Lynyrd Skynyrd"]
    [ a schema:MusicRecording; schema:name "You Shook Me All Night Long"; schema:byArtist "AC/DC"]
    [ a schema:MusicRecording; schema:name "Sharp Dressed Man";           schema:byArtist "ZZ Top"]
    [ a schema:MusicRecording; schema:name "Old Time Rock and Roll";      schema:byArtist "Bob Seger"]
    [ a schema:MusicRecording; schema:name "Hurt So Good";                schema:byArtist "John Cougar"]
  )
] .

defines a playlist with an ordered set of tracks. RDFa adds the @member attribute, which is used to identify values (object or literal) that are to be placed in a list. The same playlist might be defined in RDFa as follows:

<div vocab="http://schema.org/" typeof="MusicPlaylist">
  <span property="name">Classic Rock Playlist</span>
  <meta property="numTracks" content="5"/>

  <div rel="tracks" member="">
    <div typeof="MusicRecording">
      1.<span property="name">Sweet Home Alabama</span> -
      <span property="byArtist">Lynyrd Skynyrd</span>
    </div>

    <div typeof="MusicRecording">
      2.<span property="name">You Shook Me All Night Long</span> -
      <span property="byArtist">AC/DC</span>
    </div>

    <div typeof="MusicRecording">
      3.<span property="name">Sharp Dressed Man</span> -
      <span property="byArtist">ZZ Top</span>
    </div>

    <div typeof="MusicRecording">
      4.<span property="name">Old Time Rock and Roll</span> -
      <span property="byArtist">Bob Seger</span>
    </div>

    <div typeof="MusicRecording">
      5.<span property="name">Hurt So Good</span> -
      <span property="byArtist">John Cougar</span>
    </div>
  </div>
</div>

This produces the same triples, placing each track in an rdf:List in document order.
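What "placing each track in an rdf:List" means at the triple level can be shown with a short pure-Ruby sketch. It is not the RDF::RDFa implementation; triples are `[subject, predicate, object]` arrays, and the `build_list` name and the `_:lN` blank-node labels are assumptions. Each list cell is a blank node whose `rdf:first` is a member and whose `rdf:rest` points to the next cell, with the last cell pointing to `rdf:nil`.

```ruby
# Build the rdf:first / rdf:rest chain for an ordered set of members and
# link it to the subject via the given predicate.
def build_list(subject, predicate, members)
  # An empty list is just rdf:nil.
  return [[subject, predicate, "rdf:nil"]] if members.empty?

  cells   = members.each_index.map { |i| "_:l#{i}" } # one blank node per member
  triples = [[subject, predicate, cells.first]]
  members.each_with_index do |member, i|
    triples << [cells[i], "rdf:first", member]
    # The last cell terminates the list with rdf:nil.
    triples << [cells[i], "rdf:rest", cells[i + 1] || "rdf:nil"]
  end
  triples
end

build_list("_:playlist", "schema:tracks", ["_:track1", "_:track2"]).last
# => ["_:l1", "rdf:rest", "rdf:nil"]
```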

You can try both these and other RDF gems at the distiller.

RDF::N3 no longer accepts text/turtle or :ttl

With the release of RDF::Turtle, starting with version 0.3.5, RDF::N3 no longer asserts that it is a reader for Turtle. This covers the MIME Types text/turtle, application/turtle and application/x-turtle, as well as the .ttl extension and the :ttl and :turtle formats. Of course, N3 remains reasonably compatible with Turtle, but the recent RDF 1.1 Working Group publication of the Turtle Specification has caused some divergence.

Most notably, in Turtle, the empty prefix (‘:’) is no longer a synonym for <#>. In fact, the empty prefix is no longer defined by default.

RDF::Turtle defines MIME Types text/turtle, text/rdf+turtle, application/turtle and application/x-turtle.

The officially submitted MIME Type for Turtle is text/turtle, with UTF-8 as the default character encoding.

As usual, you can try both these and other RDF gems at the distiller. At some point, RDF::Turtle will make it into the linkeddata gem.

SPARQL Algebra

For those intrepid enough, I’ve pushed version 0.0.2 of sparql-algebra. It relies on unreleased changes to RDF.rb and sxp-ruby, so you need to use bundler with the included Gemfile.

SPARQL Algebra implements the s-expression-based SPARQL algebra described in SPARQL 1.1 and Jena. Work remains on the _describe_ operator and on query optimization. This is the base for translation from SPARQL Grammar [4], which requires just a bit more work to be fully compliant. Both of these, along with support for an HTTP endpoint and a solution serializer, will be sufficient to implement a complete SPARQL solution in pure Ruby.

SPARQL Algebra passes all but four W3C DAWG tests (data-r2), with those four not being worth implementing, in my opinion. As an example of an SSE based on the SPARQL grammar, consider the following:

PREFIX  foaf:  <http://xmlns.com/foaf/0.1/>

SELECT ?mbox ?name
 {
   ?x foaf:mbox ?mbox .
   OPTIONAL { ?x foaf:name  ?name } .
 }

which is equivalent to the following SSE:

(prefix ((foaf: <http://xmlns.com/foaf/0.1/>))
   (project (?mbox ?name)
     (leftjoin
       (bgp (triple ?x foaf:mbox ?mbox))
       (bgp (triple ?x foaf:name ?name)))))
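To show how such an SSE tree can be evaluated, here is a minimal pure-Ruby sketch over an in-memory list of triples. It is not the sparql-algebra API: the SSE is modeled as nested Arrays, variables are Symbols, solutions are Hashes, the `(prefix ...)` wrapper is dropped in favour of CURIE strings, and the function names and example data are hypothetical. `bgp` joins triple patterns, `leftjoin` keeps every left solution and extends it with compatible right solutions (the OPTIONAL semantics), and `project` keeps only the selected variables.

```ruby
# Match one triple pattern against one triple, extending a solution.
def match_pattern(pattern, triple, solution)
  result = solution.dup
  pattern.zip(triple) do |term, value|
    if term.is_a?(Symbol)
      # A bound variable must agree with the value; an unbound one is bound now.
      return nil if result.key?(term) && result[term] != value
      result[term] = value
    elsif term != value
      return nil
    end
  end
  result
end

# (bgp ...): join each triple pattern against the solutions found so far.
def eval_bgp(patterns, triples)
  patterns.reduce([{}]) do |solutions, pattern|
    solutions.flat_map do |sol|
      triples.map { |t| match_pattern(pattern, t, sol) }.compact
    end
  end
end

# Two solutions are compatible when they agree on every shared variable.
def compatible?(a, b)
  a.all? { |var, val| !b.key?(var) || b[var] == val }
end

def evaluate(op, triples)
  case op.first
  when :bgp
    eval_bgp(op.drop(1), triples)
  when :leftjoin
    left  = evaluate(op[1], triples)
    right = evaluate(op[2], triples)
    # Keep every left solution; extend it with each compatible right solution.
    left.flat_map do |l|
      joined = right.select { |r| compatible?(l, r) }.map { |r| l.merge(r) }
      joined.empty? ? [l] : joined
    end
  when :project
    vars = op[1]
    evaluate(op[2], triples).map { |sol| sol.select { |var, _| vars.include?(var) } }
  else
    raise ArgumentError, "unsupported operator: #{op.first}"
  end
end

TRIPLES = [
  ["_:a", "foaf:mbox", "mailto:alice@example.org"],
  ["_:a", "foaf:name", "Alice"],
  ["_:b", "foaf:mbox", "mailto:bob@example.org"]
].freeze

# The SSE above, without the (prefix ...) wrapper.
QUERY = [:project, [:mbox, :name],
  [:leftjoin,
    [:bgp, [:x, "foaf:mbox", :mbox]],
    [:bgp, [:x, "foaf:name", :name]]]].freeze

evaluate(QUERY, TRIPLES)
# Alice's solution binds both mbox and name; Bob's binds only mbox.
```

Evaluating the tree bottom-up like this mirrors how the algebra separates pattern matching (BGP) from the operators layered on top, which is also where a real implementation would plug in optimizations.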

There are outstanding pull requests to RDF.rb and sxp-ruby that are required to release it to RubyGems, but you’re encouraged to play with it and send feedback!

Thanks to Arto and Ben for the initial work they did on this, and other enabling projects, as well as creating an excellent executable test suite!

Update

SPARQL::Grammar is now complete and generates SPARQL::Algebra objects, allowing a complete end-to-end SPARQL solution for Ruby.

RDF::RDFa, RDF::RDFXML, and RDF::N3 0.3.0 releases

These are new releases of the Nokogiri-based reader suite for the RDF.rb environment. This version offers substantial performance gains, due to general improvements in RDF.rb as well as a number of improvements in the readers:

General improvements

  • Readers save prefix definitions in :prefixes. Writers use :prefixes, or :standard_prefixes to generate QNames.
  • Readers support :canonicalize and :validate options

RDF::N3

  • New parser based on Tim Berners-Lee’s Predictive Parser supports quoted graphs and variables.
  • Stream-based reader can process an indefinite length input file, vs. the older Treetop-based reader that was a two-pass parser.
  • Substantial performance improvement over previous version, running at about x statements/second on an iMac.
  • From History:
    • New Predictive-Parser based N3 Reader, substantially faster than previous Treetop-based parser
    • RDF.rb 0.3.0 compatibility updates
      • Remove literal_normalization and qname_hacks, add back uri_hacks (until 0.3.0)
      • Use nil for default namespace
      • In Writer
        • Use only :prefixes for creating QNames.
        • Add :standard_prefixes and :default_namespace options.
        • Use """ for multi-line quotes, or anything including escaped characters
      • In Reader
        • URI canonicalization and validation.
        • Added :canonicalize, and :intern options.
        • Added #prefixes method returning a hash of prefix definitions.
        • Change :strict option to :validate.
        • Add check to ensure that predicates are not literals, it’s not legal in any RDF variant.
    • RSpec 2 compatibility

RDF::RDFXML

    • RDF.rb 0.3.0 compatibility updates
      • Remove literal_normalization and qname_hacks, add back uri_hacks (until 0.3.0)
      • Use nil for default namespace
    • In Writer
      • Use only :prefixes for creating QNames.
      • Add :standard_prefixes and :default_namespace options.
      • Improve Writer#to_qname.
      • Don’t try to translate rdf:_1 to rdf:li due to complex corner cases.
      • Fix problems with XMLLiteral, rdf:type and rdf:nodeID serialization.
    • In Reader
      • URI canonicalization and validation.
      • Added :canonicalize, and :intern options.
      • Change :strict option to :validate.
      • Don’t create unnecessary namespaces.
      • Don’t use regexp to substitute base URI in URI serialization.
      • Collect prefixes when extracting mappings.
    • Literal::XML
      • Add all in-scope namespaces, not just those that seem to be used.
    • RSpec 2 compatibility

RDF::RDFa

    • RDF.rb 0.3.0 compatibility updates
      • Remove literal_normalization and qname_hacks, add back uri_hacks (until 0.3.0)
      • Use nil for default namespace
    • In Writer
      • Use only :prefixes for creating QNames.
      • Add :standard_prefixes and :default_namespace options.
      • Improve Writer#to_qname.
    • In Reader
      • URI canonicalization and validation.
      • Added :canonicalize, and :intern options.
      • Change :strict option to :validate.
      • Don’t create unnecessary namespaces.
      • Don’t use regexp to substitute base URI in URI serialization.
      • Collect prefixes when extracting mappings.
    • Literal::XML
      • Add all in-scope namespaces, not just those that seem to be used.
    • RSpec 2 compatibility

RdfContext and RDF::RDFa support for RDFa 1.1 2010-08-03 draft semantics

I’ve updated both the RdfContext and RDF::RDFa gems to support the latest RDFa Core 1.1 Editor’s Draft (2010-08-03) semantics. This includes support for the following:

  • Use of a Processor Graph to gather information and errors during the course of parsing. Use the :processor_graph option to specify a Graph in which to collect information. There is no published specification for the properties to use, but until there is, each event is saved with a Blank Node subject of type rdfa:UndefinedPrefixError, rdfa:UndefinedTermError, rdfa:HostLanguageMarkupError, rdfa:ProfileReferenceError, rdfa:InformationalMessage, rdfa:MiscellaneousWarning or rdfa:MiscellaneousError. Additionally, statements with literals for dc:description, dc:date, rdfa:sequence and rdfa:source (path to HTML Node) are generated.
  • RDFa Profiles allow URI mappings for terms and prefixes along with the specification of a default vocabulary.
  • RDFa 1.1 prefers the use of @prefix to create prefix mappings, but @xmlns continues to be supported.
  • XMLLiterals must be declared explicitly by setting @datatype="rdf:XMLLiteral". In 1.0, any @property element whose children included anything other than text nodes caused an XMLLiteral to be emitted.
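The XMLLiteral change amounts to a small decision rule, sketched below in pure Ruby. This is an illustration under stated assumptions, not code from either gem: the `xml_literal?` name is hypothetical, the element's children are modeled as `[:text, "..."]` or `[:element, "name"]` pairs, and `datatype` is the @datatype value (or nil when absent). An explicit `rdf:XMLLiteral` datatype yields an XMLLiteral in both versions, any other @datatype (including an empty one) suppresses it, and only RDFa 1.0 infers an XMLLiteral from element children.

```ruby
# Decide whether a @property element yields an XMLLiteral.
def xml_literal?(rdfa_version, datatype, children)
  return true if datatype == "rdf:XMLLiteral" # explicit in both versions
  return false unless datatype.nil?           # any other @datatype, even "", wins
  # RDFa 1.0 only: element children imply an XMLLiteral.
  rdfa_version == "1.0" && children.any? { |kind, _| kind == :element }
end

children = [[:text, "Hello "], [:element, "em"]]
xml_literal?("1.0", nil, children) # => true
xml_literal?("1.1", nil, children) # => false
```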

Note that until RDFa Core 1.1 is published, all features are subject to change. I will not be attempting to maintain compatibility with draft features that are obsoleted during the standardization process.

RdfContext version 0.5.4 with provisional RDFa 1.1 support

I just released version 0.5.4 of RdfContext to GitHub and GemCutter. This version is notable for including support for RDFa 1.1 parsing. This is still based on a Working Draft, so it will likely change in the future.

RDFa 1.1 includes support for profiles, vocabularies and terms, and supports using URIs, CURIEs or terms anywhere that’s legal within an HTML document. Right now, only the XHTML+RDFa profile is supported.

Default term URI using @vocab

RDFa 1.1 allows URIs to be expressed using an NCName, called a term. By using the @vocab attribute, an author can define a URI that is prepended to a bare term to turn it into a full URI. Take for example the following:

<div vocab="http://xmlns.com/foaf/0.1/">
   <p about="#me" typeof="Person" property="name">Gregg Kellogg</p>
</div>

will generate the following triples:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<#me> a foaf:Person;
  foaf:name "Gregg Kellogg" .

Profile documents for defining prefixes and terms

A Profile document allows the specification of a set of URI mappings and term mappings in a single document. These documents are RDF formatted, and may or may not be RDFa. The following shows an example Profile document:

@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
[ rdfa:prefix "foaf"; rdfa:uri "http://xmlns.com/foaf/0.1/"] .
[ rdfa:prefix "dc"; rdfa:uri "http://purl.org/dc/terms/"] .
[ rdfa:term "name"; rdfa:uri "http://xmlns.com/foaf/0.1/name"] .
[ rdfa:term "created"; rdfa:uri "http://purl.org/dc/terms/created"] .

This profile defines two prefix mappings and two bare terms. Multiple vocabularies may be used together to create a namespace composed of terms from several vocabularies, without needing to describe them explicitly. These may then be used in a document as follows:

<div profile="http://example.com/my_vocab">
  <p about="#me">
    <span property="name">Gregg Kellogg</span>
    is the author of
    <a rel="created"
        resource="http://github.com/gkellogg/rdf_context">
      RdfContext
    </a>
  </p>
</div>

Namespace definitions

RDFa 1.1 deprecates the use of @xmlns for defining namespace prefixes. The @prefix attribute defines one or more mappings between prefixes and URIs. For example:

<div prefix="foaf: http://xmlns.com/foaf/0.1/ dc: http://purl.org/dc/terms/">
  <p about="#me">
    <span property="foaf:name">Gregg Kellogg</span>
    is the author of
    <a rel="dc:created"
        resource="http://github.com/gkellogg/rdf_context">
      RdfContext
    </a>
  </p>
</div>

This defines and uses two different namespace mappings.
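The @prefix attribute value is a whitespace-separated sequence of "prefix: URI" pairs, so turning it into a prefix mapping is a short parse. The pure-Ruby sketch below is illustrative only; the `parse_prefix_attribute` name is hypothetical, and it does not attempt the prefix validation a conforming RDFa processor performs.

```ruby
# Parse an RDFa 1.1 @prefix value into a prefix => URI Hash.
def parse_prefix_attribute(value)
  # Each mapping is "name:" followed by whitespace and a URI.
  value.scan(/(\S+):\s+(\S+)/).each_with_object({}) do |(prefix, uri), mappings|
    mappings[prefix] = uri
  end
end

parse_prefix_attribute("foaf: http://xmlns.com/foaf/0.1/ dc: http://purl.org/dc/terms/")
# => {"foaf"=>"http://xmlns.com/foaf/0.1/", "dc"=>"http://purl.org/dc/terms/"}
```

Requiring whitespace after the colon is what keeps the "http:" inside each URI from being mistaken for another prefix.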

URIs Everywhere

In RDFa 1.0, certain attributes took a URI, others a CURIE, and still others either a URI or a Safe CURIE. This is confusing, so RDFa 1.1 now allows URIs, CURIEs, or Safe CURIEs to be used almost anywhere (Safe CURIEs are maintained for backwards compatibility). For example:

<div>
  <p about="#me">
    <span property="http://xmlns.com/foaf/0.1/name">
      Gregg Kellogg
    </span>
    is the author of
    <a rel="http://purl.org/dc/terms/created"
      resource="http://github.com/gkellogg/rdf_context">
      RdfContext
    </a>
  </p>
</div>
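The value forms shown in these examples (a bare term, a CURIE, a Safe CURIE and a full URI) can all be handled by one resolution function. The pure-Ruby sketch below is an assumption-laden illustration rather than RdfContext's resolver: the `resolve` name and its keyword arguments are hypothetical, term mappings (as from a profile) are consulted before @vocab, and a value whose prefix is not mapped is treated as an absolute URI unless it is a Safe CURIE.

```ruby
# Resolve an attribute value that may be a term, CURIE, [Safe CURIE] or URI.
def resolve(value, prefixes: {}, vocab: nil, terms: {})
  safe  = value.start_with?("[") && value.end_with?("]")
  value = value[1..-2] if safe

  if value.include?(":")
    prefix, reference = value.split(":", 2)
    return prefixes[prefix] + reference if prefixes.key?(prefix)
    return nil if safe # a Safe CURIE with an unknown prefix is ignored
    return value       # otherwise treat it as an absolute URI
  end

  # No colon: a term, from explicit term mappings first, then @vocab.
  return terms[value] if terms.key?(value)
  vocab ? vocab + value : nil
end

resolve("name", vocab: "http://xmlns.com/foaf/0.1/")
# => "http://xmlns.com/foaf/0.1/name"
```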

Change History

The following is the change log for this version of RdfContext. Note that one change may potentially break existing code: URIRef#namespace no longer throws an exception if a mapping is not found. Other changes are noted here:

  • RDFa 1.1 parsing supported (based on RDFa Core 1.1 W3C Working Draft 22 April 2010)
  • Fix URIRef#short_name (and consequently #base and #namespace) to not extract a non-hierarchical path as a short_name
  • Namespace no longer uses URIRef, but just acts on strings.
  • Namespace#new does not take an optional _fragment_ argument any longer.
  • Added Namespace#to_s to output “prefix: uri” format
  • Graph#qname first tries generating a QName using bound namespaces, then adds well-known namespaces.
  • URIRef#to_qname and #to_namespace no longer generate an exception. Each takes either a Hash or an Array of namespaces and tries them from longest to shortest.
  • Improved Turtle and XML serializers in use of namespaces.
  • Generate pending messages if RDFa tests skipped due to lack of Redland installation.
  • Change dcterms: prefix to dc: (fully compatible with previous /elements/ definitions)
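The longest-to-shortest QName strategy from the change log can be sketched in a few lines of pure Ruby. This is not RdfContext's `URIRef#to_qname`; the standalone `to_qname` function, the NCName-like check on the local part and the example namespaces are assumptions. Sorting namespaces by decreasing length lets the most specific prefix win, and returning nil instead of raising matches the behaviour change described above.

```ruby
# Generate a QName from a prefix => namespace Hash, longest namespace first.
def to_qname(uri, namespaces)
  namespaces.sort_by { |_prefix, ns| -ns.length }.each do |prefix, ns|
    next unless uri.start_with?(ns)
    local = uri[ns.length..-1]
    # Only accept a simple NCName-like local part (no "/" or "#").
    return "#{prefix}:#{local}" if local =~ /\A[A-Za-z_][\w.\-]*\z/
  end
  nil # no usable mapping: return nil rather than raising
end

to_qname("http://purl.org/dc/terms/created", "dc" => "http://purl.org/dc/terms/")
# => "dc:created"
```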

RdfContext version 0.5.1 brings Turtle and enhanced RDF/XML serializers

Just pushed version 0.5.1 of RdfContext to GitHub and Gemcutter. This version includes a Serializer framework, including an AbstractSerializer, a RecursiveSerializer, and Turtle and RDF/XML serializers based on these. The RDF/XML serializer is a big improvement over the previous version, including typed element names and RDF Collection folding using rdf:parseType="Collection".

RdfContext includes native Ruby parsers for RDF/XML, RDFa and N3-rdf, which includes Turtle and N-Triples. All parsers pass W3C tests (included in specs). It also includes a context-aware quad store, with in-memory and SQLite3 storage models.