Category Archives: Semantic Web

RDF.rb 0.3.5 and SPARQL 0.1.0

I added some minor updates to RDF.rb and re-issued version 0.3.5 of the rdf and linkeddata gems. These updates mostly improve support for HTTP content negotiation and for finding appropriate readers and writers based on file extension, MIME type, and content sniffing. There are also some minor fixes to aid JRuby and Ruby 1.9.3 support.

More notably, I’ve released 0.1.0 of the [SPARQL][] gem. The logical behavior is unchanged from the previous release, but it now includes Rack and Sinatra support to easily create middleware for a SPARQL endpoint. When used with the Linked Data gem, this includes a range of RDF serializations for DESCRIBE and CONSTRUCT queries. It also adds HTTP Accept headers to outgoing requests using FROM and FROM NAMED for RDF/XML and Turtle.

For instance, the Sinatra example in the README runs a simple query against a small repository:

#!/usr/bin/env ruby -rubygems
require 'sinatra'
require 'sinatra/sparql'

repository = RDF::Repository.new do |graph|
  graph << [RDF::Node.new, RDF::DC.title, "Hello, world!"]
end

get '/sparql' do
  SPARQL.execute("SELECT * WHERE { ?s ?p ?o }", repository)
end

A minimal SPARQL endpoint can be described as follows:

# Sinatra example
#
# Call as http://localhost:4567/?query=uri,
# where `uri` is the URI of a SPARQL query, or
# a URI-escaped SPARQL query, for example:
#   http://localhost:4567/?query=SELECT%20?s%20?p%20?o%20WHERE%20%7B?s%20?p%20?o%7D
require 'sinatra'
require 'sinatra/sparql'
require 'uri'

get '/' do
  settings.sparql_options.merge!(:standard_prefixes => true)
  repository = RDF::Repository.new do |graph|
    graph << [RDF::Node.new, RDF::DC.title, "Hello, world!"]
  end
  if params["query"]
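    # A parameter starting with a URI scheme is fetched; anything else is treated as an escaped query
    query = params["query"].to_s =~ /\A\w+:/ ? RDF::Util::File.open_file(params["query"]) : ::URI.decode(params["query"].to_s)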
    SPARQL.execute(query, repository)
  else
    service_description(:repo => repository)
  end
end

This can be run using ruby -rubygems example.rb, or with rackup or shotgun, for example as rackup example.rb.
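To exercise the endpoint above from a client, the query text has to be escaped before it goes into the query parameter. The following is a minimal sketch using only Ruby's standard library. The host and port are assumed to be the Sinatra defaults shown in the comment above, and `endpoint_url` is a hypothetical helper name, not part of the SPARQL gem:

```ruby
require 'uri'

# Hypothetical helper: build a request URL for the example endpoint.
# Assumes the app listens on Sinatra's default http://localhost:4567/.
def endpoint_url(sparql, base = "http://localhost:4567/")
  uri = URI.parse(base)
  # encode_www_form escapes spaces, braces and question marks in the query text
  uri.query = URI.encode_www_form("query" => sparql)
  uri.to_s
end

puts endpoint_url("SELECT ?s ?p ?o WHERE { ?s ?p ?o }")
```

The resulting URL can then be fetched with any HTTP client, for example Net::HTTP.get(URI.parse(url)).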

To query a complete repository, or a full dataset including multiple contexts, load the repository as follows:

repository = RDF::Repository.load("http://path-to-repo")

This will incur a large startup time for each request, but you can also use a persistent store such as rdf-mongo:

repository = RDF::Mongo::Repository.new()

This will instantiate a persistent MongoDB store, which can be initialized one time using RDF::Mongo::Repository.load. Subsequent instantiations will use the persistent storage, and have better query performance for larger datasets.

For a more complete implementation, see the RDF Distiller running at http://rdf.greggkellogg.net/sparql and freely available to download and modify for your own purposes.

Follow up questions to public-rdf-ruby.

SPARQL 1.0 for Ruby

I’ve just released version 0.0.2 of the Ruby sparql gem. This version is based on earlier work by Pius and Arto and incorporates code from SPARQL Grammar and SPARQL Algebra. Further documentation is available here.

This gem integrates with [RDF.rb][] and uses [rdf-xsd][] to provide additional literal semantics.

Why release SPARQL for Ruby? Probably not because of the killer performance, at least right now. However, I believe it’s important that Ruby have a complete tool chain for manipulating Linked Data (including RDF and SPARQL), and this was the remaining piece.

In spite of the 0.0.2 release number, it is a fully functioning implementation of SPARQL 1.0 semantics and passes all the DAWG data-r2 test cases. The gem makes use of RDF::Query to perform basic BGP operations on RDF::Queryable objects (such as RDF::Repository). The gem has some support for query optimization, but this remains largely unimplemented and will be addressed in future releases. I’d also like to support SPARQL 1.1 queries and updates at some point.

This is a pure Ruby implementation and does not directly rely on any native libraries (although, some RDF readers such as RDFa and RDF/XML presently do).

The basic strategy is to parse SPARQL and transform it into an S-Expression-based algebra, pretty close to that used by Jena ARQ (SPARQL S-Expressions, or SSE). This allows SSE to be used directly for performing queries, or to parse SPARQL grammar to SSE.
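To illustrate the idea of an S-Expression algebra, a basic graph pattern can be written as nested Ruby arrays and evaluated against a list of triples. This is a toy in plain Ruby, not the gem's implementation: the `[:bgp, [:triple, ...]]` shape only loosely follows SSE, and `evaluate`, `match_triple`, `TRIPLES` and the sample data are hypothetical names chosen for this sketch.

```ruby
# Toy evaluator for a BGP expressed as an S-expression (nested arrays).
# Variables are symbols beginning with "?"; everything else is a constant.
def variable?(term)
  term.is_a?(Symbol) && term.to_s.start_with?("?")
end

# Try to extend one solution (a Hash of bindings) with one triple.
# Returns the extended solution, or nil if the triple does not match.
def match_triple(pattern, triple, solution)
  result = solution.dup
  pattern.zip(triple).each do |pat, value|
    if variable?(pat)
      return nil if result.key?(pat) && result[pat] != value
      result[pat] = value
    elsif pat != value
      return nil
    end
  end
  result
end

# Evaluate [:bgp, [:triple, s, p, o], ...] against an array of triples.
def evaluate(sexp, triples)
  op, *patterns = sexp
  raise ArgumentError, "only :bgp is supported in this sketch" unless op == :bgp
  patterns.inject([{}]) do |solutions, (_tag, *pattern)|
    solutions.flat_map do |solution|
      triples.map { |t| match_triple(pattern, t, solution) }.compact
    end
  end
end

# Illustrative data only
TRIPLES = [
  ["_:a", "dc:title",   "Hello, world!"],
  ["_:a", "dc:creator", "Example Author"],
  ["_:b", "dc:title",   "Another title"]
]

# Roughly: SELECT * WHERE { ?s dc:title ?t . ?s dc:creator ?c }
query = [:bgp, [:triple, :"?s", "dc:title",   :"?t"],
               [:triple, :"?s", "dc:creator", :"?c"]]
p evaluate(query, TRIPLES)
```

Each pattern narrows the set of solutions produced by the previous one, which is the join behavior a BGP requires; the real algebra adds operators such as filters, optionals and unions on top of this.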

The linkeddata gem has also been updated to have a soft reference to SPARQL, in addition to new processors for RDF::Turtle, JSON::LD, and RDF::Microdata.

The gem is tested on Ruby 1.8.7, 1.9.2 and JRuby. (JRuby has some spec issues, probably due to Nokogiri differences.)

Many thanks to Pius Uzamere for helping to make this release happen, and to Arto Bendiken for the work in RDF.rb, SPARQL::Algebra and SPARQL::Grammar that preceded this.

rdf.rb 0.3.4 released

After several months of gathering updates for RDF.rb, we’ve released version 0.3.4 with several new features:

  • Update to BGP query model to support SPARQL semantics,
  • Expandable Literal support, to allow further implementation of XSD datatypes outside of RDF.rb (see RDF::XSD gem),
  • More advanced content type detection to allow better selection of the appropriate reader from those available on the client. (Includes selecting among HTML types, such as Microdata and RDFa)
  • Improved CLI with the rdf executable providing access to all loaded readers and writers for cross-language serialization and deserialization.

As an example of format detection, consider the following:

require 'linkeddata'
RDF::Graph.load("http://greggkellogg.net/foaf.ttl")

should load Turtle or N3 readers if installed. This becomes more important for ambiguous file types, such as HTML, which could be either RDFa or Microdata, and application/xml, which could be TriX, RDF/XML or even RDFa.
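The selection logic described here can be pictured roughly as follows. This is a plain-Ruby sketch of the idea, not RDF.rb's actual code: the `FORMATS` table, the `detect_format` helper and the sniffing patterns are simplified assumptions made for illustration.

```ruby
# Simplified picture of reader selection: narrow the candidates by content
# type or file extension, then sniff the content when more than one remains.
# The table below is illustrative and far from complete.
FORMATS = {
  :turtle    => { :types => ["text/turtle"], :exts => ["ttl"],
                  :sniff => /@prefix|@base/ },
  :rdfxml    => { :types => ["application/rdf+xml", "application/xml"], :exts => ["rdf"],
                  :sniff => /<rdf:RDF/ },
  :rdfa      => { :types => ["text/html", "application/xhtml+xml", "application/xml"], :exts => ["html"],
                  :sniff => /\b(?:property|typeof|vocab)\s*=/ },
  :microdata => { :types => ["text/html"], :exts => ["html"],
                  :sniff => /\bitemscope\b/ }
}

# options may contain :content_type, :extension and :sample (start of the content)
def detect_format(options = {})
  candidates = FORMATS.keys.select do |name|
    entry = FORMATS[name]
    entry[:types].include?(options[:content_type]) ||
      entry[:exts].include?(options[:extension])
  end
  return candidates.first if candidates.size <= 1
  sample = options[:sample].to_s
  # Prefer the first candidate whose pattern appears in the sample;
  # fall back to the first candidate if nothing matches.
  candidates.find { |name| sample =~ FORMATS[name][:sniff] } || candidates.first
end

p detect_format(:extension => "ttl")
p detect_format(:content_type => "text/html",
                :sample => '<div itemscope itemtype="http://schema.org/Person">')
```

An unambiguous extension settles the choice immediately, while text/html leaves two candidates and the sample content decides between them.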

See documentation for more specifics on this version of RDF.rb. Note that I’ve attempted to incorporate suggestions for improving the documentation.

Most of the reader/writer gems have been updated to match this release, in particular JSON::LD, RDF::Microdata, RDF::N3, [RDF::RDFa][], [RDF::RDFXML][], and RDF::Turtle. A future update to the linkeddata gem should reference the latest versions of each, but a simple gem update will work too.

There is a slight semantic change for repositories to support SPARQL: a context of false should not match a variable context. This is straight out of SPARQL semantics. Repository implementors who have provided custom implementations of #query_pattern should check behavior against rdf-spec version 0.3.4 to verify correct operation.

Next up is a release of SPARQL implemented in pure Ruby. This gem provides full support for SPARQL 1.0 queries.

RDF::RDFa update with vocabulary expansion, RDF collections and more

I’ve updated RDF::RDFa with updates from recent changes to RDF Core:

  • Deprecate explicit use of @profile
  • Add rdfa:hasVocabulary when encountering @vocab
  • Implemented Reader#expand to perform vocabulary expansion using RDFS rules 5, 7, 9 and 11.

Additionally, experimental support for RDF Collections (lists) has been added, based on RDF Webapps working group Wiki notes.

Remove RDFa Profiles

RDFa Profiles were a mechanism added to allow groups of terms and prefixes to be defined in an external resource and loaded to affect the processing of an RDFa document. This introduced a problem for some implementations needing to perform a cross-origin GET in order to retrieve the profiles. The working group elected to drop support for user-defined RDFa Profiles (the default profiles defined by RDFa Core and host languages still apply) and replace it with an inference regime using vocabularies. Parsing of @profile has been removed from this version.

Vocabulary Expansion

One of the issues with vocabularies was that they discourage re-use of existing vocabularies when terms from several vocabularies are used at the same time. As it is common (encouraged) for RDF vocabularies to form sub-class and/or sub-property relationships with well defined vocabularies, the RDFa vocabulary expansion mechanism takes advantage of this.

As an optional part of RDFa processing, an RDFa processor will perform limited RDFS entailment, specifically rules rdfs5, 7, 9 and 11. This causes sub-classes and sub-properties of type and property IRIs to be added to the output graph.

RDF::RDFa::Reader implements this using the #expand method, which looks for rdfa:hasVocabulary properties within the output graph and performs such expansion. See an example in the usage section.
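As a rough picture of what this limited entailment does, the sketch below applies the four rules to a small set of triples in plain Ruby. It does not use RDF::RDFa::Reader#expand, and it assumes the vocabulary statements are already in the same graph as the data rather than being found via rdfa:hasVocabulary. The `expand_triples` function and the `ex:` terms are hypothetical.

```ruby
require 'set'

SUB_CLASS    = "rdfs:subClassOf"
SUB_PROPERTY = "rdfs:subPropertyOf"
TYPE         = "rdf:type"

# Apply rdfs5, rdfs7, rdfs9 and rdfs11 repeatedly until no new triples appear.
def expand_triples(triples)
  graph = Set.new(triples)
  loop do
    inferred = Set.new
    graph.each do |s, p, o|
      graph.each do |s2, p2, o2|
        # rdfs5 / rdfs11: sub-property and sub-class relations are transitive
        if [SUB_PROPERTY, SUB_CLASS].include?(p) && p2 == p && o == s2
          inferred << [s, p, o2]
        end
        # rdfs7: a statement using a sub-property also holds for the super-property
        inferred << [s2, o, o2] if p == SUB_PROPERTY && p2 == s
        # rdfs9: an instance of a sub-class is also an instance of the super-class
        inferred << [s2, TYPE, o] if p == SUB_CLASS && p2 == TYPE && o2 == s
      end
    end
    break if inferred.subset?(graph)
    graph.merge(inferred)
  end
  graph.to_a
end

# Illustrative vocabulary and data; the ex: terms are made up
EXAMPLE_GRAPH = [
  ["ex:Singer",     SUB_CLASS,       "ex:Performer"],
  ["ex:Performer",  SUB_CLASS,       "ex:Agent"],
  ["ex:leadVocals", SUB_PROPERTY,    "ex:contributor"],
  ["_:artist",      TYPE,            "ex:Singer"],
  ["_:track",       "ex:leadVocals", "_:artist"]
]

(expand_triples(EXAMPLE_GRAPH) - EXAMPLE_GRAPH).each { |triple| puts triple.join(" ") }
```

The added triples are the super-class types of `_:artist`, the super-property statement for `_:track`, and the transitive sub-class link, which is the kind of output the expansion step contributes to the graph.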

RDF Collections (lists)

One significant RDF feature missing from RDFa was support for ordered collections, or lists. RDF supports this with the special terms rdf:first, rdf:rest, and rdf:nil, but other RDF languages have first-class support for this concept. For example, in Turtle, a list can be defined as follows:

[ a schema:MusicPlaylist;
  schema:name "Classic Rock Playlist";
  schema:numTracks 5;
  schema:tracks (
    [ a schema:MusicRecording; schema:name "Sweet Home Alabama";       schema:byArtist "Lynard Skynard"]
    [ a schema:MusicRecording; schema:name "Shook you all Night Long"; schema:byArtist "AC/DC"]
    [ a schema:MusicRecording; schema:name "Sharp Dressed Man";        schema:byArtist "ZZ Top"]
    [ a schema:MusicRecording; schema:name "Old Time Rock and Roll";   schema:byArtist "Bob Seger"]
    [ a schema:MusicRecording; schema:name "Hurt So Good";             schema:byArtist "John Cougar"]
  )
]

defines a playlist with an ordered set of tracks. RDFa adds the @member attribute, which is used to identify values (object or literal) that are to be placed in a list. The same playlist might be defined in RDFa as follows:

<div vocab="http://schema.org/" typeof="MusicPlaylist">
  <span property="name">Classic Rock Playlist</span>
  <meta property="numTracks" content="5"/>

  <div rel="tracks" member="">
    <div typeof="MusicRecording">
      1.<span property="name">Sweet Home Alabama</span> -
      <span property="byArtist">Lynard Skynard</span>
     </div>

    <div typeof="MusicRecording">
      2.<span property="name">Shook you all Night Long</span> -
      <span property="byArtist">AC/DC</span>
    </div>

    <div typeof="MusicRecording">
      3.<span property="name">Sharp Dressed Man</span> -
      <span property="byArtist">ZZ Top</span>
    </div>

    <div typeof="MusicRecording">
      4.<span property="name">Old Time Rock and Roll</span>
      <span property="byArtist">Bob Seger</span>
    </div>

    <div typeof="MusicRecording">
      5.<span property="name">Hurt So Good</span>
      <span property="byArtist">John Cougar</span>
    </div>
  </div>
</div>

This basically does the same thing, but places each track in an rdf:List in the defined order.
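For readers less familiar with the underlying triples, the sketch below expands an ordered Ruby array into the rdf:first/rdf:rest/rdf:nil chain that such a list stands for. It is plain Ruby for illustration only: `list_triples` and the `_:l0`-style blank node labels are made up for this example and are not what RDF::RDFa emits verbatim.

```ruby
# Expand an ordered array into RDF collection triples.
# Returns [head, triples], where head is the first list node (or rdf:nil).
def list_triples(items)
  return ["rdf:nil", []] if items.empty?
  nodes = (0...items.size).map { |i| "_:l#{i}" }
  triples = []
  items.each_with_index do |item, i|
    # Each node holds one member and points at the rest of the list
    triples << [nodes[i], "rdf:first", item]
    triples << [nodes[i], "rdf:rest", nodes[i + 1] || "rdf:nil"]
  end
  [nodes.first, triples]
end

head, triples = list_triples(["Sweet Home Alabama", "Shook you all Night Long"])
puts "_:playlist schema:tracks #{head}"
triples.each { |triple| puts triple.join(" ") }
```

The playlist points only at the head node; every other member is reachable solely by following rdf:rest from there, which is what preserves the order.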

You can try both these and other RDF gems at the distiller.

RDF::N3 no longer accepts text/turtle or :ttl

With the release of RDF::Turtle, starting with version 0.3.5, RDF::N3 no longer asserts that it is a reader for Turtle. This covers the MIME Types text/turtle, application/turtle and application/x-turtle, as well as the .ttl extension and the :ttl and :turtle formats. Of course, N3 remains reasonably compatible with Turtle, but the recent RDF 1.1 Working Group publication of the Turtle Specification has caused some divergence.

Most notably, in Turtle, the empty prefix (‘:’) is no longer a synonym for <#>. In fact, the empty prefix is no longer defined by default.

RDF::Turtle defines MIME Types text/turtle, text/rdf+turtle, application/turtle and application/x-turtle.

The officially submitted MIME Type for Turtle is text/turtle with default content coding of UTF-8.

As usual, you can try both these and other RDF gems at the distiller. At some point, RDF::Turtle will make it into the [linkeddata gem].

Things people get wrong in RDFa markup

Lately, I’ve been looking at a lot of both RDFa and Microdata formatted HTML. There are a number of things that authors (even experts) regularly get wrong:

@src and @rel attributes create reverse relation

Having code such as the following:

<img rel="image" src="image.jpg" />
...

You’d think that this would indicate that the icon for the document is

<> xhv:image <image.jpg>

but it actually says:

<image.jpg> xhv:image <> .

The why of this is lost in the haze of history, but people regularly get this wrong. To get what you need, consider something like the following markup:

<span rel="image"><img src="image.jpg" /></span>
...

@rel and @typeof and/or @about shouldn’t be on the same element

Another common mistake is markup such as the following:

<div rel="mainContentOfPage" about="#me" typeof="Person">
  <p>
    Name: <span property="name">Gregg Kellogg</span></p>
  <p>
    Knows: <a href="http://greggkellogg.net/#me" rel="knows">Myself</a></p>
</div>

Placing @rel and @about or @typeof on the same element means that the @about/@typeof identify the subject, not the object, of the relation. To get the desired effect, use @resource (or @href); however, this does not let you set the type of the object resource. Alternatively, use the following type of markup:

<div rel="mainContentOfPage">
  <div about="#me" typeof="Person">
    <p>
      Name: <span property="name">Gregg Kellogg</span></p>
    <p>
      Knows: <a href="http://greggkellogg.net/#me" rel="knows">Myself</a>
    </p>
  </div>
</div>

Another common misunderstanding concerns ordering: the document order of statements within an HTML document is not significant when creating a list of resources. Consider the following example from schema.org/MusicPlaylist:

<div itemscope="" itemtype="http://schema.org/MusicPlaylist">
  <span itemprop="name">Classic Rock Playlist</span>
  <div itemprop="tracks" itemscope="" itemtype="http://schema.org/MusicRecording">
    1. <span itemprop="name">Sweet Home Alabama</span> - <span itemprop="byArtist">Lynard Skynard</span>
  </div>
  <div itemprop="tracks" itemscope="" itemtype="http://schema.org/MusicRecording">
    2. <span itemprop="name">Shook you all Night Long</span> - <span itemprop="byArtist">AC/DC</span>
  </div>
  ...
</div>

You would think that this describes a track ordering, but it does not (at least in RDF). Doing this requires RDF List constructs missing from both Microdata and RDFa. In Turtle, you could do it as follows:

@prefix : <http://schema.org/> .
[ a :MusicPlaylist;
  :name "Classic Rock Playlist";
  :numTracks 5;
  :tracks (
    [ a :MusicRecording; :name "Sweet Home Alabama"; :byArtist "Lynard Skynard"]
    [ a :MusicRecording; :name "Shook you all Night Long"; :byArtist "AC/DC"]
    ...
  )
]

It would seem obvious that an HTML ordered list could be used to generate an RDF List, but the idea failed to attract enough interest to make it through.

These are just a couple of things that are confusing about RDFa, and offer good fodder for Microdata proponents to complain about the complexity of RDFa markup. It’s important to note that a core goal of RDFa 1.1 is to be compatible with RDFa 1.0 (RDFa in XHTML), in which these decisions were established. Perhaps a reconciliation between Microdata and RDFa could take the best of both:

  • Craft RDF friendly URIs from terms (such as schema:Person above),
  • Reduce amount of document structure needed to describe common use cases,
  • Better intuitive generation of RDF output,
  • Ability to avoid RDF generation and go straight to JSON (perhaps JSON-LD),
  • Use common URI prefixes,
  • RDF Lists,
  • Promote better HTML readability.

That’s my 2 cents (for now)

Update

The RDFa Working Group recently decided to change the behavior of @src in RDFa Core 1.1 to be the same as @href. This means that

<img rel="image" src="image.jpg" />
...

now actually generates the following:

<> xhv:image <image.jpg>

Recent updates to Microdata to RDF processing now do place multiple items in a list, but this is subject to further specification.

In RDFa, this can now be done with the @inlist attribute, which places values in an RDF Collection (rdf:List).

<div vocab="http://schema.org/" typeof="MusicPlaylist">
  <span property="name">Classic Rock Playlist</span>
  <div rel="tracks" inlist="">
    1. <div typeof="MusicRecording">
      <span property="name">Sweet Home Alabama</span> - <span property="byArtist">Lynard Skynard</span>
    </div>
    2. <div typeof="MusicRecording">
      <span property="name">Shook you all Night Long</span> - <span property="byArtist">AC/DC</span>
    </div>
    ...
  </div>
</div>

Now generates the following Turtle:

@prefix : <http://schema.org/> .
[ a :MusicPlaylist;
  :name "Classic Rock Playlist";
  :tracks (
    [ a :MusicRecording; :name "Sweet Home Alabama"; :byArtist "Lynard Skynard"]
    [ a :MusicRecording; :name "Shook you all Night Long"; :byArtist "AC/DC"]
    ...
  )
]

CME and the Semantic Web

Introduction

The Connected Media Experience is a consortium formed to promote technical standards for enhanced digital media packages such as music, movie, television and eBook releases. Its origins go back to 2007, when it was conceived as a platform for providing a rich experience for enjoying media across a variety of devices. Early on, rich semantic information to inform and enhance an Experience was recognized as an important differentiator. CME is intended to include social aspects allowing users to connect and interact with each other, so a rich means of identifying and describing elements of a release, and the release itself, is an important design consideration.

During the course of development, it became clear that the Semantic Web community had developed important technology for realizing a system such as CME. The group was introduced to the Music Ontology and RDFa, which were useful in defining releases and, along with HTML5, CSS3 and JavaScript, in creating portable presentations of them.

The Music Industry has also developed rich meta-data release descriptions in the form of DDEX (Digital Data Exchange). However, DDEX’ goals are focused on music distribution for business-to-business transactions, rather than to suit the needs of the consumer.

Ultimately, the group decided against including Semantic Web technologies to describe rich releases, in favor of a publisher-friendly model based on HTML5 and Widgets to achieve presentational objectives, without the rich semantic representational component. (The format does include limited semantic data in a proprietary format). This note attempts to discuss some of the reasons behind this decision, and lessons that the Semantic Web community might learn from the experience.

Background

In 2007, the author was contracted by Gracenote and Warner Music Group to help develop use cases, demonstrations and a high level architecture for a Connected Media Experience (CMX, as it was known at the time). Early demos made use of Adobe Flash and Flex technologies to create compelling experiences on mobile and desktop platforms, and a relational metadata representation using a proprietary (ad-hoc) XML schema. However, this was found to cause many interoperability problems and was ultimately abandoned in favor of open technologies.

As CME progressed, other major music labels including Universal Music Group and Sony Music (then Sony BMG) joined to create the Connected Media Experience Standards Setting Organization for furthering development of the specification and to solicit contributions from other interested industry members to promote such a standard. (The author served as Chairman of the Technical Working Group until March of this year.)

The Music Ontology was introduced as a rich metadata format using RDF and OWL to describe music releases and content. It is based on Friend-of-a-Friend (FOAF) and Functional Requirements of Bibliographic Records (FRBR) to describe albums, tracks, contributors, musical works, performances and releases. The needs of describing releases beyond music ultimately drove CME to create their own Ontology taking elements from FOAF, FRBR, Music Ontology, and DDEX.

For some time, package presentation was considered as a disaggregated release, including individual audio and video tracks combined with a Manifest (specified in RDF/XML using the CME Ontology) to create different user interfaces depending on the capabilities of the particular target devices. With the rise of smart-phones and HTML browsers within other consumer electronics devices, this was eventually changed to an HTML5+RDFa manifest, which could serve as both a release description and a presentation, when used in combination with JavaScript and CSS3.

CME Entity Relationship Diagram

Throughout the course of development, CME members had difficulty in accepting the advantages of the semantic technologies being used, which led to low participation and lack of involvement in generating the specifications. Fundamentally, the difficulty of working with the technologies led the group to abandon a rich semantic representation of a release and settle on more established web technologies and proprietary metadata formats.

Examples

The basic idea of the CME Vocabulary was to allow a simple hierarchical representation of Work/Production/Signal/Manifestation with relationship to a Release/Collection.

Given a particular encoding, say of Hoagy Carmichael’s “Stardust”, a simple Manifestation might be described as follows:

[ a cme:Encoding, mo:MusicalManifestation;
dc:title "Stardust"@en-us;
dc:format "audio/mpeg"^^dc:MediaType;
cme:duration "PT3M53S"^^xsd:duration;
dc:issued "1978-04-01"^^xsd:date
] .

As this represents a specific formatted manifestation of a recorded signal, we can add more information:

<#stardust> a cme:Audio, mo:Signal;
dc:title "Stardust"@en-us;
cme:displayArtist <http://dbpedia.org/data/Willie_Nelson>;
cme:lyrics <http://www.metrolyrics.com/stardust-lyrics-willie-nelson.html>;
mo:isrc "XX-XXX-XX-00000"^^cme:ISRCType;
mo:label <http://dbpedia.org/data/Columbia_Records>;
cme:encoding [ a cme:Encoding, mo:MusicalExpression;
  dc:title "Stardust"@en-us;
  dc:format "audio/mpeg"^^dc:MediaType;
  cme:duration "PT3M53S"^^xsd:duration;
  dc:issued "1978-04-01"^^xsd:date
] .

However, there’s more we can say about this recording, for instance, that it was recorded at a particular time with various performers:

[ a mo:Performance;
dc:title "Studio recording of Stardust"@en-us;
dc:created "1977-12-12"^^dc:date;
mo:producer <http://dbpedia.org/data/Booker_T._Jones>;
mo:singer <http://dbpedia.org/data/Willie_Nelson>;
mo:performer <http://dbpedia.org/data/Chris_Ethridge>,
  <http://dbpedia.org/data/Paul_English>,
  <http://dbpedia.org/data/Booker_T._Jones>;
cme:expression <#stardust>
] .
<#stardust> a cme:Audio, mo:Signal .

We also know that Hoagy Carmichael composed the song “Stardust”:

<http://dbpedia.org/data/Stardust_(song)> a mo:MusicalWork;
dc:title "Stardust"@en-us;
mo:composer <http://dbpedia.org/data/Hoagy_Carmichael>;
dc:created "1927-10-31"^^xsd:date;
mo:performed_in [ a mo:Performance;
  dc:title "Studio recording of Stardust"@en-us;
  dc:created "1977-12-12"^^dc:date;
  mo:producer <http://dbpedia.org/data/Booker_T._Jones>;
  mo:singer <http://dbpedia.org/data/Willie_Nelson>;
  mo:performer <http://dbpedia.org/data/Chris_Ethridge>,
    <http://dbpedia.org/data/Paul_English>,
    <http://dbpedia.org/data/Booker_T._Jones>;
  cme:expression <#stardust>
] .
<#stardust> a cme:Audio, mo:Signal ....

We could continue this to describe multiple performances, expressions or encodings.

A particular encoding might appear on many different albums or playlists, so we can’t encode information such as track number with the cme:Audio. This is encoded in a Collection contained within a Release:

<> a cme:PrimaryRelease;
rdfs:seeAlso <http://dbpedia.org/data/Stardust_(album)>;
dc:title "Stardust"@en-us;
cme:displayArtist <http://dbpedia.org/data/Willie_Nelson>;
cme:parentalWarning "unspecified"^^cme:ParentalWarningType;
mo:grid "A1-a1788-aaaaaaaaaa-b"^^cme:GRid;
cme:presentation <js/authored.js>;
cme:audioCollection [ a cme:AudioCollection;
  dc:title "Songs"@en-us;
  cme:item [ a cme:Item;
    cme:itemNumber "1";
    cme:expression <#stardust>
  ]
] .
<http://dbpedia.org/data/Stardust_(song)> a mo:MusicalWork ....
<#stardust> a cme:Audio, mo:Signal ....

There is much more that can be said about an album, including links to reviews, alternate performances, videos, photos and so forth. RDF provides an expressive mechanism for describing such rich metadata.

Discovery

RDF relies heavily on Uniform Resource Identifiers (URIs) to identify particular resources or concepts. Using the so-called “follow your nose” principle, an agent might use identifiers contained within a release to discover more information about a particular subject, for instance reviews of the album stored on DBPedia or elsewhere.

Extensibility

Because a release is described as an RDF Graph, additional information can be authored about it using the principle of “Anyone can say anything about anything”. This might be useful for adding premium content such as extra audio tracks, music videos or concert photos. Moreover, consumers may choose to use a CME release as a creative starting point by creating alternate user interface skins, personal pictures or anything else they might be interested in; this is one aspect of the Connected aspect of The Connected Media Experience.

Social Web

Giving CME release elements URIs allows them to be used for other social activities, such as Activity Streams, Facebook “Like” operations, or other mechanisms.

Demise of CME Semantic Releases

In many ways, the music industry is not ready for many of the open aspects of an RDF format; the concept of using existing universal identifiers (such as DBPedia URIs) that they do not directly control can be a barrier, and they are not yet prepared to maintain their own publicly available repository of unique identifiers representing their artists, musical works and releases.

Artists are naturally concerned that their product is presented in a manner consistent with their original design intentions. Understandably, they want to ensure that their intellectual product is portrayed as intended. However, this desire can come into conflict with the read/write web, where fans often make use of authored material in mashups and other derived works. Coming to a reasonable understanding of fair use, and how this can be moderated, remains an important challenge.

The industry has made great strides in improving its use of ISRC identifiers, which in the past were not always reliable. ISRC, along with GRid, ISWC and ISNI identifiers, can be useful in differentiating resources, but they typically cannot be dereferenced. They are also not URNs, so they’re not appropriate to form an owl:sameAs relationship with, for example, DBPedia URIs.

The complexity of authoring packages using RDF formats given the lack of well curated metadata was a large complicating factor in CME moving away from an RDF representation. But equally daunting was the complexity of authoring packages using an RDFa description of the package. As originally conceived, much of a package presentation could be based on the raw HTML5+RDFa description of a release. CSS and JavaScript are extremely capable of creating amazing user interfaces. However, the general capability of web designers, as well as the complexity of authoring valid packages, really gets in the way of this. Until tools emerge that allow for the simple authoring of semantically-rich dynamic HTML5 presentations, this will likely remain an opportunity for future music publishers.

It’s important to note that nothing in CME excludes RDF and a rich set of metadata, and we may yet see CME releases that use the original design principles to achieve similar objectives. What won’t be there is a base-level of metadata in every CME release that platforms can depend upon for extending the basic experience.

Lessons for the Music Industry

The concept of rich music (media) releases in an era of pervasive access to free content is an ongoing issue for the music industry. CME is an attempt to give consumers a reason to own their media, rather than obtain it by other means. Providing rich curated data about subjects of interest to consumers can be one way to secure a future for content owners who legitimately need to profit from their artistry.

Giving up control over information, including the presentation of artistic works, is a barrier for music publishers. Existing contractual obligations do not necessarily align with the expectations of consumers.

Certainly, the web is full of bad data, and relying on an external service that gives content owners no reliable way of ensuring that data is correct is a real concern. Even getting a handle on their own internal use of identifiers, for instance having a single identifier to describe performers on different releases, much less on performances that cross label boundaries, is a big challenge for legacy systems that were not intended to be used for curating publicly available information.

The major labels do work with metadata services to provide accurate information, and many retailers use these 3rd party information services, along with proprietary identifiers to provide consumers with limited metadata about music releases. For 3rd party information services, there is a cost to maintaining quality metadata, which often means that reliable information remains behind pay walls.

As mentioned above, while there is a large amount of metadata available through various open data sources, it is often of poor quality. Finding a balance that allows for curation of such data by content owners is an important step in bringing about reliable rich metadata. To some, the lesson here is not one of control, but one of clean-up and publish: “if you don’t give them what they want, they’ll get it from someplace else”.

The very lack of quality metadata about musical releases from the major labels is responsible for the rise of several services that provide such information, for example Gracenote, AMG, MusicBrainz and FreeDB. Providing curated information about music releases in standardized RDF formats is a potential business opportunity for such companies.

Lessons for the Semantic Web Community

RDF was founded by academics to be logically consistent and rich. To a large degree, it continues to be dominated by academic interests. This has led to a rich and consistent representational format with very well thought out elements (e.g., entailment, inference, semantic equivalence, etc.). However, the pace of change can be slow and outreach to the open web community is not necessarily a priority.

There is a fair appreciation within the major music labels of the value and promise of RDF as a means of providing rich metadata. The fact is, though, that proprietary metadata formats are much simpler to implement and manage. According to a key opinion maker: “I bet the average developer can get a simple XML-based music metadata system up and running in less time than it would take to read the Music Ontology document. We can get most (all?) of the benefits of RDF through simpler means.” To be fair, closed world systems are easier to implement and manage; a goal of RDF is to allow for datasets to be shared and mixed, and doing so requires shared vocabularies and representations.

RDFa is one example of an RDF technology that came from the open web community and has had astounding uptake. It is estimated that ~4% of all web content now includes some amount of RDFa description 1. But, other areas in need of standardization (e.g., JSON RDF representation) remain mired in controversy and/or apathy.

Given HTML’s strong support for lists (e.g., ol, ul, dl, …), it is amazing that RDFa has no basic markup support for RDF lists. Even if it did, RDF lists are based on linked-lists, rather than flat collections. This makes it almost impossible to query an RDF graph to determine the constituent elements of a container such as a playlist or album without using higher-level semantic constructs (see Ordered List Ontology).
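To make the linked-list point concrete, here is a small plain-Ruby sketch that recovers the members of a collection by following rdf:rest links one node at a time. The triples and the `object_of` and `list_members` helpers are hypothetical. The point is only that a single flat triple pattern cannot return all the members, so code (or a more elaborate query) has to walk the chain.

```ruby
# Illustrative triples for a two-item playlist stored as an RDF collection
LIST_TRIPLES = [
  ["_:playlist", "schema:tracks", "_:l0"],
  ["_:l0", "rdf:first", "Sweet Home Alabama"],
  ["_:l0", "rdf:rest",  "_:l1"],
  ["_:l1", "rdf:first", "Shook you all Night Long"],
  ["_:l1", "rdf:rest",  "rdf:nil"]
]

# Look up the single object for a subject/predicate pair, or nil.
def object_of(triples, subject, predicate)
  match = triples.find { |s, p, _o| s == subject && p == predicate }
  match && match[2]
end

# Follow the rdf:rest chain from head, collecting each rdf:first value.
def list_members(triples, head)
  members = []
  node = head
  while node && node != "rdf:nil"
    members << object_of(triples, node, "rdf:first")
    node = object_of(triples, node, "rdf:rest")
  end
  members
end

head = object_of(LIST_TRIPLES, "_:playlist", "schema:tracks")
p list_members(LIST_TRIPLES, head)
```

Each step needs the result of the previous one, so the number of lookups grows with the length of the list.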

1: http://tripletalk.wordpress.com/2011/01/25/rdfa-deployment-across-the-web/

Published on Sat, 07 May 2011 01:21:00 GMT

  • By zazi Thu, 05 May 2011 19:51:54 GMT

    I bet the average developer will run into the disaster of a more difficult data cleaning task with that simple XML-based music metadata system.
    Semantic Web technology-powered systems have higher initial deployment costs, but they are durable like the Web itself. So in the long term Semantic Web technologies will win!
    We cannot really make things simpler than they are in reality. The description of music can be quite complex. See, e.g., this partial description of an album of the Beatles. So we have to respect that complexity if we want to represent this knowledge.


  • By John Wright Fri, 06 May 2011 00:52:04 GMT

    Very good and informative case study on RDF/Semantic Web application/adoption (or rather the lack of it). Thanks!


  • By Gregg Kellogg Fri, 06 May 2011 02:47:25 GMT

    Clearly, I really believe that the long-term benefits of using good semantic markup are worth it, and many in the music industry do as well. We may not see this as a standard base-level markup for music releases, but I believe we will see semantically rich music, video and ebook content before too long.


  • By David F. Flanders Fri, 06 May 2011 07:52:41 GMT

    So what do you make of central stores of data like #linkedbrainz that provide a standardised store of RDFa metadata for use? Is that helpful or usable to any of the above-mentioned stakeholders? Thanks for the article, very interesting.


  • By Henri Sivonen Fri, 06 May 2011 08:10:38 GMT

    Thank you for taking the time to write this.

    One comment about a bit that makes things look better than they are:

    RDFa is one example of an RDF technology that came from the open web community and has had astounding uptake. It is estimated that ~4% of all web content now includes some amount of RDFa description

    This is rather misleading. As pointed out by your source, this is largely attributable to sites wanting to integrate with three particular big sites: Yahoo!, Google and Facebook. The Yahoo! bit (SearchMonkey) has already been discontinued. Google and Facebook don’t actually implement RDFa processing as specified even though they use syntax that looks like RDFa. This makes it appear that something standards-based is going on even though for all practical purposes, what’s happening is that sites are using Google’s service-specific markup and Facebook’s service-specific markup to be ingested by Google-written code or Facebook-written code–not by serendipitous Semantic Web agents that implement RDFa as specced.


  • By mhepp@computer.org Fri, 06 May 2011 09:53:09 GMT

    Authoring RDFa patterns loses a lot of complexity if you

    1. start modeling in Turtle syntax (http://www.w3.org/TeamSubmission/turtle/)

    and

    2. translate that into an RDFa snippet via the tool:

    http://www.ebusiness-unibw.org/tools/rdf2rdfa/

    Many advocates of RDFa don’t realize that the textbook style of teaching RDFa, closely entangled with visible content, adds a lot of unnecessary complexity.

    Martin


  • By Barry Norton Fri, 06 May 2011 13:44:54 GMT

    Thanks for the very interesting article.

    Just on one minor technical point: surely the problem with querying for membership of RDF lists, rather than sequences, is addressed by path expressions in SPARQL 1.1? I.e., it’s not a representational problem, but a problem of the richness of the query language.

    (Of course I agree on the wish for better support in RDFa, as per Turtle/n3)


  • By Gregg Kellogg Fri, 06 May 2011 16:55:38 GMT

    @david, as I note, the majors are concerned about the quality of information. Linkedbrainz doesn’t really address the quality issue, at least as far as the labels are concerned.

    I think having this data available in RDFa is great, but principally because it allows developers to find it and make use of it through mechanisms such as the forthcoming RDF API.


  • By Gregg Kellogg Fri, 06 May 2011 17:00:14 GMT

    @henri this is how standards come about (in the best circumstances): existing behavior is normalized and standardized. The fact that search engines find value in semantic markup makes it more likely people will use it to get noticed. Inevitably, there will be a move to make this data more uniform. Where service-specific markup has a problem is in the limited vocabularies they will process, assuming the values of well-known prefixes. Hopefully, this will be sorted out.


  • By Gregg Kellogg Fri, 06 May 2011 17:03:22 GMT

    @mhepp couldn’t agree more. Note my own RDF-RDFa serializer: http://rdf.greggkellogg.net/distiller. Anything beyond insertion of pre-formatted snippets requires algorithmic HTML+RDFa serialization. Thus the importance of being able to act on such markup using CSS and/or jQuery. As long as HTML is hand-crafted, the use of markup such as RDFa will have limited impact; it’s just too difficult to get it right.


  • By Gregg Kellogg Fri, 06 May 2011 17:05:00 GMT

    @barry I believe SPARQL 1.1 path expressions do provide a syntax to allow for this, but the representation remains fairly expensive, at least in triple stores without built-in support for list entailment.


  • By Kingsley Idehen Fri, 06 May 2011 17:50:44 GMT

    Imagine this scenario.

    The issue had nothing to do with RDF or RDFa and more to do with structured data about music.

    Structured data would boil down to the following:

    1. Every Data Object has a URI based Name

    2. Every Data Object has Representation — an Entity-Attribute-Value Graph

    3. Every Data Object’s Representation was accessible from an Address (e.g. an HTTP URL).

    Now based on the above, go look at any form of structured data produced by any of these music companies or broader Web 2.0 plays, and tell me if you don’t see the very pattern outlined above modulo:

    1. RDF or RDFa

    2. HTTP URI based Names — so most have a UUID or some other conventional unique identifier

    3. Entity-Attribute-Value based Graph for Object Representation.

    The most important thing we need at this juncture is structured data that’s accessible via Web Addresses.

    Until the opportunity costs of Linked Data become palpable to Music Industry decision makers, nothing will happen, and nothing should happen.

    What do I take from this post? Confirmation of practical reality 🙂


SPARQL Algebra

For those intrepid enough, I’ve pushed version 0.0.2 of sparql-algebra. It relies on unreleased changes to RDF.rb and sxp-ruby, so you need to use bundler with the included Gemfile.

SPARQL Algebra implements the s-expression-based SPARQL algebra described in SPARQL 1.1 and Jena. Work remains on the _describe_ operator and on query optimizations. This is the base for translation from SPARQL Grammar [4], which requires just a bit more work to be fully compliant. Both of these, along with support for an HTTP endpoint and solution serializer, will be sufficient to implement a complete SPARQL solution in pure Ruby.

SPARQL Algebra passes all but four W3C DAWG tests (data-r2); those four are not worth implementing, in my opinion. As an example of how a query in the SPARQL grammar maps to an SSE, consider the following query:

PREFIX  foaf:  <http://xmlns.com/foaf/0.1/>

SELECT ?mbox ?name
 {
   ?x foaf:mbox ?mbox .
   OPTIONAL { ?x foaf:name  ?name } .
 }

which is equivalent to the following SSE:

(prefix ((foaf: <http://xmlns.com/foaf/0.1/>))
   (project (?mbox ?name)
     (leftjoin
       (bgp (triple ?x foaf:mbox ?mbox))
       (bgp (triple ?x foaf:name ?name)))))
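
To show what the leftjoin and project operators in that SSE mean, here is a rough sketch in plain Ruby. It is not how SPARQL::Algebra is implemented; solutions are modeled as plain hashes, the helper names are hypothetical, the sample data is made up, and the optional filter expression that leftjoin can carry is ignored:

```ruby
# Solutions are hashes from variable name to value; all data is made up.
# Two solutions are compatible when they agree on every shared variable.
def compatible?(a, b)
  (a.keys & b.keys).all? { |var| a[var] == b[var] }
end

# leftjoin: extend each left solution with every compatible right solution,
# keeping the left solution unchanged when nothing on the right matches.
def left_join(left, right)
  left.flat_map do |l|
    matches = right.select { |r| compatible?(l, r) }
    matches.empty? ? [l] : matches.map { |r| l.merge(r) }
  end
end

# project: keep only the requested variables in each solution.
def project(solutions, vars)
  solutions.map { |s| s.select { |var, _| vars.include?(var) } }
end

# Stand-ins for the results of the two bgp patterns.
mbox_solutions = [
  { x: :alice, mbox: "mailto:alice@example.org" },
  { x: :bob,   mbox: "mailto:bob@example.org" }
]
name_solutions = [
  { x: :alice, name: "Alice" }
]

results = project(left_join(mbox_solutions, name_solutions), [:mbox, :name])
results.each { |solution| puts solution.inspect }
```

The solution for :bob survives without a name binding, which is the behavior OPTIONAL asks for; a plain join would have dropped it.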

There are outstanding pull requests to RDF.rb and sxp-ruby that are required to release it to RubyGems, but you’re encouraged to play with it and send feedback!

Thanks to Arto and Ben for the initial work they did on this, and other enabling projects, as well as creating an excellent executable test suite!

Update

SPARQL::Grammar is now complete and generates SPARQL::Algebra classes, allowing a complete end-to-end SPARQL solution for Ruby.

RDF::RDFa, RDF::RDFXML, and RDF::N3 0.3.0 releases

Version 0.3.0 of the Nokogiri-based reader suite for the RDF.rb environment has been released. This version offers substantial performance gains, due to general improvements in RDF.rb as well as a number of improvements in the readers:

General improvements

  • Readers save prefix definitions in :prefixes. Writers use :prefixes, or :standard_prefixes to generate QNames.
  • Readers support :canonicalize and :validate options.
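
As an illustration of how a prefix table can drive QName generation, here is a minimal sketch in plain Ruby. The abbreviate_uri helper and the prefix table are hypothetical and are not the gems' own API; the sketch takes the first matching namespace and does not check that the local part is a legal QName:

```ruby
# Hypothetical prefix table of the kind a reader might collect.
PREFIXES = {
  foaf: "http://xmlns.com/foaf/0.1/",
  dc:   "http://purl.org/dc/terms/"
}

# Abbreviate a URI to prefix:local when a known namespace is a proper
# prefix of it; otherwise fall back to the full URI in angle brackets.
def abbreviate_uri(uri, prefixes)
  prefix, namespace = prefixes.find do |_, ns|
    uri.start_with?(ns) && uri.length > ns.length
  end
  return "<#{uri}>" unless prefix
  "#{prefix}:#{uri[namespace.length..-1]}"
end

puts abbreviate_uri("http://xmlns.com/foaf/0.1/name", PREFIXES)
puts abbreviate_uri("http://example.org/unknown", PREFIXES)
```

A writer that receives the prefixes a reader collected can then round-trip a document with the same abbreviations the author used.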

RDF::N3

  • New parser based on Tim-BL’s Predictive Parser supports quoted graphs and variables.
  • Stream-based reader can process an indefinite length input file, vs. the older Treetop-based reader that was a two-pass parser.
  • Substantial performance improvement over previous version, running at about x statements/second on an iMac.
  • From History:
    • New Predictive-Parser based N3 Reader, substantially faster than previous Treetop-based parser
    • RDF.rb 0.3.0 compatibility updates
      • Remove literal_normalization and qname_hacks, add back uri_hacks (until 0.3.0)
      • Use nil for default namespace
      • In Writer
        • Use only :prefixes for creating QNames.
        • Add :standard_prefixes and :default_namespace options.
        • Use """ for multi-line quotes, or anything including escaped characters
      • In Reader
        • URI canonicalization and validation.
        • Added :canonicalize, and :intern options.
        • Added #prefixes method returning a hash of prefix definitions.
        • Change :strict option to :validate.
        • Add check to ensure that predicates are not literals; literal predicates are not legal in any RDF variant.
    • RSpec 2 compatibility

RDF::RDFXML

    • RDF.rb 0.3.0 compatibility updates
      • Remove literal_normalization and qname_hacks, add back uri_hacks (until 0.3.0)
      • Use nil for default namespace
    • In Writer
      • Use only :prefixes for creating QNames.
      • Add :standard_prefixes and :default_namespace options.
      • Improve Writer#to_qname.
      • Don’t try to translate rdf:_1 to rdf:li due to complex corner cases.
      • Fix problems with XMLLiteral, rdf:type and rdf:nodeID serialization.
    • In Reader
      • URI canonicalization and validation.
      • Added :canonicalize, and :intern options.
      • Change :strict option to :validate.
      • Don’t create unnecessary namespaces.
      • Don’t use regexp to substitute base URI in URI serialization.
      • Collect prefixes when extracting mappings.
    • Literal::XML
      • Add all in-scope namespaces, not just those that seem to be used.
    • RSpec 2 compatibility

RDF::RDFa

    • RDF.rb 0.3.0 compatibility updates
      • Remove literal_normalization and qname_hacks, add back uri_hacks (until 0.3.0)
      • Use nil for default namespace
    • In Writer
      • Use only :prefixes for creating QNames.
      • Add :standard_prefixes and :default_namespace options.
      • Improve Writer#to_qname.
    • In Reader
      • URI canonicalization and validation.
      • Added :canonicalize, and :intern options.
      • Change :strict option to :validate.
      • Don’t create unnecessary namespaces.
      • Don’t use regexp to substitute base URI in URI serialization.
      • Collect prefixes when extracting mappings.
    • Literal::XML
      • Add all in-scope namespaces, not just those that seem to be used.
    • RSpec 2 compatibility