Tag your website with RDFa according to Schema.org’s guidelines
RDFa (Resource Description Framework in Attributes) is a data format recommended by the World Wide Web Consortium (W3C) for embedding RDF statements in HTML, XHTML, and various XML dialects. RDF lets developers enrich web content with metadata that helps programs such as web browsers or search engine crawlers grasp its semantic context, which makes the RDF model a fundamental technology of the semantic web. RDFa was first developed as an XHTML module in 2004 and became a W3C Recommendation in 2008. Version 1.1 followed in 2012 and can be used in both XHTML and HTML documents, including HTML5. A lightweight subset called RDFa Lite was published at the same time.
RDF in HTML
RDFa is one of several ways to express RDF statements. With RDFa, RDF expressions are embedded directly in HTML, turning web content written for human readers into machine-readable structured data. This makes RDFa comparable to other data formats designed for semantic tagging, such as Microformats or Microdata. However, RDFa itself only defines the syntax for semantic tagging. Describing semantic contexts with metadata also requires a shared vocabulary, and there is a wide selection to choose from, including FOAF, SKOS, Dublin Core, and SIOC. Google, Bing, and Yahoo! recommend tagging in accordance with Schema.org, a vocabulary developed in a joint project by these search engines to standardise structured data.
RDFa markup in practice
For embedding metadata, the RDFa specifications introduce a number of attributes that extend markup languages such as HTML, XHTML, and HTML5. The following table lists the attributes of the RDFa Lite subset:
Attribute | Description |
---|---|
vocab | Defines the vocabulary (e.g. Schema.org) on which the RDFa tagging of an element and its descendants is based. |
typeof | Assigns an element a type from the selected vocabulary, e.g. a product, a person, or a postal address. |
property | Assigns a property from the vocabulary (e.g. ‘name’) to an element; the element’s content becomes the value of that property. |
resource | Gives an element a unique identifier (a URL) so that the entity it describes can be referenced elsewhere. |
prefix | Defines short prefixes for additional vocabularies, so that more than one vocabulary can be used when the primary one isn’t sufficient for the desired tagging. |
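To illustrate how ‘vocab’ and ‘prefix’ work together, here is a minimal Python sketch of term expansion: a bare term such as ‘PostalAddress’ is appended to the vocab URL, while a prefixed term such as ‘foaf:name’ is resolved through the mapping declared in the prefix attribute. The helper names `parse_prefix_attr` and `expand_term` are hypothetical, and the logic is deliberately simplified; a full RDFa 1.1 processor also handles initial contexts, default prefixes, and further edge cases.

```python
# Simplified sketch of RDFa term expansion (hypothetical helpers, not a full
# RDFa 1.1 processor: initial contexts and other edge cases are ignored).


def parse_prefix_attr(prefix_attr: str) -> dict[str, str]:
    """Parse a prefix attribute such as 'foaf: http://xmlns.com/foaf/0.1/'."""
    tokens = prefix_attr.split()
    mappings = {}
    # Tokens come in pairs: 'name:' followed by the IRI it stands for.
    for name, iri in zip(tokens[::2], tokens[1::2]):
        mappings[name.rstrip(":")] = iri
    return mappings


def expand_term(term: str, vocab: str, prefixes: dict[str, str]) -> str:
    """Resolve a typeof/property value like 'name' or 'foaf:name' to a full IRI."""
    if ":" in term:
        prefix, local = term.split(":", 1)
        if prefix in prefixes:
            return prefixes[prefix] + local
        return term  # unknown prefix: treated as an absolute IRI (simplification)
    # A bare term is appended to the vocabulary declared with vocab.
    return vocab + term


prefixes = parse_prefix_attr("foaf: http://xmlns.com/foaf/0.1/")
print(expand_term("PostalAddress", "http://schema.org/", prefixes))
# → http://schema.org/PostalAddress
print(expand_term("foaf:name", "http://schema.org/", prefixes))
# → http://xmlns.com/foaf/0.1/name
```

Keeping bare terms and prefixed terms apart is what lets a single element mix Schema.org properties with properties from a second vocabulary.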
HTML tags without semantics of their own are a good choice for RDFa attributes, which is why metadata is often found in div or span tags. In principle, however, RDFa attributes can be added to any HTML tag. The basic pattern looks like this:
Basic RDFa syntax schema:
<div vocab="http://schema.org/" typeof="Type">
<span property="Characteristic">text element</span>
</div>
Labelling a mailing address with RDFa
The following code shows a statement of contact data in classic HTML format as seen on countless web pages:
HTML code of a mailing address
<p>
Google Inc.<br>
P.O. Box 1234<br>
Mountain View, CA<br>
94043<br>
United States<br>
</p>
While human visitors instantly recognise that this paragraph contains an address, programs such as web browsers and search engine crawlers need additional metadata to pick up on the meaning of the information:
RDFa markup of a mailing address:
<p vocab="http://schema.org/" typeof="PostalAddress">
<span property="name">Google Inc.</span><br>
P.O. Box <span property="postOfficeBoxNumber">1234</span><br>
<span property="addressLocality">Mountain View</span>,
<span property="addressRegion">CA</span><br>
<span property="postalCode">94043</span><br>
<span property="addressCountry">United States</span><br>
</p>
In the first line, the <p> tag serves as the basis for the RDFa attributes ‘vocab’ and ‘typeof’. Programs that read code tagged in this way can recognise two pieces of information: the terms used within the <p> element come from Schema.org’s vocabulary, and the element describes an entity of the type ‘PostalAddress’. Schema.org defines specific properties for each type. In the example above, the RDFa attribute ‘property’ is used to make the individual address details machine-readable: ‘name’, ‘postOfficeBoxNumber’, ‘addressLocality’, ‘addressRegion’, ‘postalCode’, and ‘addressCountry’, together with their values, are tagged as properties of the type ‘PostalAddress’. This allows programs that read the HTML code to determine how information such as ‘Google Inc.’ or ‘94043’ should be interpreted.
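The following sketch shows, in simplified form, how a program might read the RDFa attributes of the address example. It uses Python’s standard `html.parser.HTMLParser`; the class name `FlatRDFaExtractor` and the variable `ADDRESS_HTML` are hypothetical, and the code assumes a single item without nested types, so it is not a complete RDFa processor.

```python
from html.parser import HTMLParser


class FlatRDFaExtractor(HTMLParser):
    """Simplified reader for a single RDFa item without nested types."""

    def __init__(self):
        super().__init__()
        self.vocab = ""
        self.item_type = None
        self.properties = {}
        self._current = None  # property whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "vocab" in attrs:
            self.vocab = attrs["vocab"]
        if "typeof" in attrs:
            # The type IRI is the vocabulary IRI plus the short type name.
            self.item_type = self.vocab + attrs["typeof"]
        if "property" in attrs:
            self._current = attrs["property"]
            self.properties[self._current] = ""

    def handle_data(self, data):
        # Text outside a property element (e.g. "P.O. Box ") is ignored.
        if self._current is not None:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        if self._current is not None:
            self.properties[self._current] = self.properties[self._current].strip()
        self._current = None


ADDRESS_HTML = """
<p vocab="http://schema.org/" typeof="PostalAddress">
<span property="name">Google Inc.</span><br>
P.O. Box <span property="postOfficeBoxNumber">1234</span><br>
<span property="addressLocality">Mountain View</span>,
<span property="addressRegion">CA</span><br>
<span property="postalCode">94043</span><br>
<span property="addressCountry">United States</span><br>
</p>
"""

extractor = FlatRDFaExtractor()
extractor.feed(ADDRESS_HTML)
extractor.close()
print(extractor.item_type)  # → http://schema.org/PostalAddress
print(extractor.properties)
```

The second line of output is a dictionary that maps each Schema.org property to its value, for example ‘postalCode’ to ‘94043’, which is exactly the information a human reader infers from the layout.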
Labelling web content with RDFa for Rich Snippets
Above all, structured data exists to make web searches more effective. Site owners who semantically tag information on their website enable search engines to extract this data and display it in features such as Rich Snippets. These are excerpts of web content shown in search engine results pages (SERPs) that help a result stand out. In this way, semantic annotation contributes to a website’s search engine optimisation. Google, the market leader, supports RDFa markup for Rich Snippets for data types such as recipes, user reviews, software, and news articles. Rich Snippets for videos, however, are only supported in other formats, such as Microdata and JSON-LD, and Rich Snippets for events must be tagged with JSON-LD. The following example shows how web content can be marked up for Rich Snippets:
Tagging product reviews with RDFa
Product reviews that appear in the SERPs as Rich Snippets generally contain the product name, a product image, a rating (usually on a scale of one to five stars), and the review itself, including its title, the author’s name, and the date. The following code shows how this information can be tagged for machine readability using RDFa:
RDFa markup of a product review:
<div vocab="http://schema.org/" typeof="Product">
<img property="image" src="productphoto.jpg" alt="image description"/>
<span property="name">product name</span>
<div property="review" typeof="Review"> Review:
<span property="reviewRating" typeof="Rating">
<span property="ratingValue">5</span> -
</span>
<b>"<span property="name">Review Title</span>"</b> by
<span property="author" typeof="Person">
<span property="name">author name</span>
</span>, written on
<meta property="datePublished" content="2006-05-04">May 4, 2006
<div property="reviewBody">review text</div>
<span property="publisher" typeof="Organization">
<meta property="name" content="publisher name">
</span>
</div>
</div>
In the first line of code, Schema.org is declared as the markup vocabulary. The ‘typeof’ attribute specifies that the outer <div> element, including everything nested inside it, describes an entity of the standard type ‘Product’. Schema.org’s vocabulary allows products to have a wide range of properties. In this example, the product is given three semantically labelled properties: a name (property=‘name’), an image (property=‘image’), and a user review (property=‘review’). RDFa syntax allows the value of a property to be an item with its own type, which can in turn be assigned further properties of its own. For example, the element carrying the property ‘review’ is also given the type ‘Review’ and then specified further:
Excerpt:
<div property="review" typeof="Review"> Review:
Many user reviews include a star rating. To make it machine-readable, the item of type ‘Review’ is given the property ‘reviewRating’. The value of this property is itself typed as ‘Rating’, and the actual score is recorded with a further property, ‘ratingValue’.
Excerpt:
<div property="review" typeof="Review"> Review:
<span property="reviewRating" typeof="Rating">
<span property="ratingValue">5</span> -
</span>
Further properties of the type ‘Review’ in this example are the title (property=‘name’), the author (property=‘author’), the date of publication (property=‘datePublished’), the review text (property=‘reviewBody’), and information about the publisher (property=‘publisher’). The author and publisher properties can in turn be given specific types (e.g. ‘Person’ or ‘Organization’) and further properties of their own (e.g. ‘name’). Note that every element carrying a secondary typeof attribute must be nested inside the element carrying the primary one: this nesting, together with the property attribute on the same element, is what links the ‘Review’, ‘Rating’, ‘Person’, and ‘Organization’ items back to the ‘Product’.
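Nesting is what makes RDFa hard to read by eye, so it may help to see how a program could reconstruct it. The sketch below, again using only Python’s standard `html.parser`, keeps a stack of open items: an element with ‘typeof’ opens a new item, and if the same element also carries ‘property’, the new item becomes that property’s value in the enclosing item. `NestedRDFaExtractor` and `PRODUCT_HTML` are hypothetical names; the markup is the product review example, using Schema.org’s spelling ‘Organization’ for the publisher type. The code covers only the features used here (‘vocab’, ‘typeof’, ‘property’, plus ‘content’ on <meta> and ‘src’ on <img>) and ignores ‘resource’, ‘prefix’, and repeated properties.

```python
import json
from html.parser import HTMLParser

# Elements that never have an end tag in HTML.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class NestedRDFaExtractor(HTMLParser):
    """Simplified RDFa Lite reader: turns vocab/typeof/property into nested dicts."""

    def __init__(self):
        super().__init__()
        self.vocab = ""
        self.root = None
        self._items = []   # stack of currently open items (dicts)
        self._frames = []  # one frame per open non-void element
        self._texts = []   # stack of [item, property, collected text]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Simplification: vocab is treated as document-wide, not element-scoped.
        self.vocab = attrs.get("vocab", self.vocab)
        prop = attrs.get("property")
        frame = {"item": False, "text": False}
        if "typeof" in attrs:
            item = {"@type": self.vocab + attrs["typeof"]}
            if prop and self._items:
                # property + typeof: the new item is the value of the property.
                self._items[-1][prop] = item
            elif self.root is None:
                self.root = item
            self._items.append(item)
            frame["item"] = True
        elif prop and self._items:
            if "content" in attrs:
                self._items[-1][prop] = attrs["content"]  # e.g. <meta content="...">
            elif tag == "img":
                self._items[-1][prop] = attrs.get("src", "")
            else:
                # The value is the element's text; collect it until it closes.
                self._texts.append([self._items[-1], prop, ""])
                frame["text"] = True
        if tag in VOID_TAGS:
            self._close(frame)  # void elements end immediately
        else:
            self._frames.append(frame)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._frames:
            return
        self._close(self._frames.pop())

    def handle_data(self, data):
        if self._texts:
            self._texts[-1][2] += data

    def _close(self, frame):
        if frame["text"]:
            item, prop, text = self._texts.pop()
            item[prop] = " ".join(text.split())  # normalise whitespace
        if frame["item"]:
            self._items.pop()


PRODUCT_HTML = """
<div vocab="http://schema.org/" typeof="Product">
<img property="image" src="productphoto.jpg" alt="image description"/>
<span property="name">product name</span>
<div property="review" typeof="Review"> Review:
<span property="reviewRating" typeof="Rating">
<span property="ratingValue">5</span> -
</span>
<b>"<span property="name">Review Title</span>"</b> by
<span property="author" typeof="Person">
<span property="name">author name</span>
</span>, written on
<meta property="datePublished" content="2006-05-04">May 4, 2006
<div property="reviewBody">review text</div>
<span property="publisher" typeof="Organization">
<meta property="name" content="publisher name">
</span>
</div>
</div>
"""

extractor = NestedRDFaExtractor()
extractor.feed(PRODUCT_HTML)
extractor.close()
product = extractor.root
print(json.dumps(product, indent=2))
```

The printed dictionaries mirror the nesting of the markup: the ‘Rating’, ‘Person’, and ‘Organization’ items sit inside the ‘Review’, which in turn sits inside the ‘Product’.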
This example shows how complex RDFa markup can become. While it enables detailed annotations, it is considerably more cumbersome to write than newer data formats such as JSON-LD.