Content versus data

Content is text; in our handbook it is structured text, which can also contain data. An example shows what I mean. Let’s say this is an extract from an encyclopedia:

“As of 1 January 2015, the population of China was estimated to be 1 300 000 000 people.”

We need to maintain this encyclopedia. We want to save time and keep it highly consistent, so we try to automate as much as possible. In this use case we identify the date and the population figure as data embedded in the text. We mark them up and add a unique identifier to each to make them addressable. Something like this:

As of <date id="china-population-date">1 January 2015</date> the population of China was estimated
to be <population-figure id="china-population-figure">1 300 000 000</population-figure> people.

The syntax is not important here; it simply demonstrates how the data can be marked up. The markup adds semantics. Before we added it, these semantics were implicit: obvious to a human reader, but not to a computer. The markup makes the implicit semantics explicit and easy for computers to process.

Now if we have an external source we trust (for example a database or an internet service that has the updated population figures), then we can keep this article up to date automatically, without any human intervention.
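A minimal sketch of such an automatic update, in Python, might look like the following. It is an illustration under assumptions, not a prescribed implementation: the wrapping <p> element, the function name update_by_id and the replacement values are all hypothetical, and the values are invented placeholders standing in for whatever the trusted source would return.

```python
import xml.etree.ElementTree as ET

# The marked-up sentence, wrapped in a hypothetical <p> element
# so that it forms a well-formed XML document.
ARTICLE = (
    '<p>As of <date id="china-population-date">1 January 2015</date> '
    'the population of China was estimated to be '
    '<population-figure id="china-population-figure">1 300 000 000'
    '</population-figure> people.</p>'
)


def update_by_id(xml_text, new_values):
    """Replace the text of every element whose id is a key in new_values."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element_id = element.get("id")
        if element_id in new_values:
            element.text = new_values[element_id]
    return ET.tostring(root, encoding="unicode")


# Invented placeholder values, NOT real statistics. In practice they
# would come from the trusted database or service.
fresh = {
    "china-population-date": "1 January 2016",
    "china-population-figure": "1 310 000 000",
}

updated = update_by_id(ARTICLE, fresh)
print(updated)
```

Because each piece of data is addressed by its identifier, the surrounding prose is left untouched; only the text inside the marked-up elements changes.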

When working with content, we very often have data mixed into it. It is usually worth marking that data up properly, since then we can

  • update
  • extract
  • validate
  • style it for publishing
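Extraction and validation of marked-up data can be sketched in Python as follows. This is only an illustration: the wrapping <p> element, the function names extract and is_valid_figure, and the validation rule (digits in groups of three separated by single spaces) are assumptions made for the example, not something the markup itself prescribes.

```python
import re
import xml.etree.ElementTree as ET

# The marked-up sentence, wrapped in a hypothetical <p> element
# so that it forms a well-formed XML document.
ARTICLE = (
    '<p>As of <date id="china-population-date">1 January 2015</date> '
    'the population of China was estimated to be '
    '<population-figure id="china-population-figure">1 300 000 000'
    '</population-figure> people.</p>'
)


def extract(xml_text):
    """Collect the text of every element that carries an id attribute."""
    root = ET.fromstring(xml_text)
    return {el.get("id"): el.text for el in root.iter() if el.get("id")}


def is_valid_figure(text):
    """Assumed rule: one to three digits, then space-separated groups of three."""
    return re.fullmatch(r"\d{1,3}( \d{3})*", text) is not None


data = extract(ARTICLE)
print(data)
print(is_valid_figure(data["china-population-figure"]))
```

The same identifiers that make the data addressable for updates also make it easy to pull out and check, for instance before publishing.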
