Single-source, multi-channel publishing
The content source is generic: it is not authored for any specific publishing platform, yet it can be published to any of them, for both human and machine consumption.
Some years back the typical data and content flow stayed within a single company: it authored the content and published it on paper or on the company website for human readers. Nowadays automated data and content flows cross company boundaries, so content is made not only for human consumption but also for machine readers.
The editorial XML content has to be converted to different publishing formats, but before we do that we usually have a publishing XML in between. The editorial XML is clean, non-redundant and independent of any publishing platform. The publishing XML is also platform-independent: it simply contains what’s common between the different publishing formats (web, voice, paper, etc.). We get the publishing XML by enriching the editorial XML with:
- explicit link information needed to generate hyperlinks
- data from external sources
- content merge from other content types
- granularity change: split or merge of the same content type
- generated text, for example indexes or tables of contents
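To make the enrichment step a bit more concrete, here is a minimal Python sketch of two of the items above: resolving link information and generating a table of contents. The element and attribute names (document, section, ref, target, toc, entry) are assumptions made up for this illustration, not part of any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical editorial XML: clean, with links expressed only as target ids.
EDITORIAL = """<document>
  <section id="s1"><title>Dosage</title><p>See <ref target="s2"/>.</p></section>
  <section id="s2"><title>Side effects</title><p>Rare.</p></section>
</document>"""


def enrich(editorial_xml: str) -> ET.Element:
    """Return a publishing tree: resolved links plus a generated table of contents."""
    root = ET.fromstring(editorial_xml)
    # Collect section titles by id, in document order.
    titles = {s.get("id"): s.findtext("title") for s in root.iter("section")}

    # Explicit link information: add an href and the target's title to each ref.
    for ref in root.iter("ref"):
        target = ref.get("target", "")
        ref.set("href", "#" + target)
        ref.text = titles.get(target, "")

    # Generated text: a table of contents built from the section titles.
    toc = ET.Element("toc")
    for section_id, title in titles.items():
        entry = ET.SubElement(toc, "entry", {"href": "#" + section_id})
        entry.text = title
    root.insert(0, toc)
    return root


publishing = enrich(EDITORIAL)
print(ET.tostring(publishing, encoding="unicode"))
```

The editorial source stays non-redundant (a reference is just an id), while the publishing XML carries the redundant but convenient information that every output format needs.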
The following diagram is a simplified view:
It may not be apparent, but paper publishing might also require slightly different XML: something we can load into FrameMaker or InDesign, which then produce the PDF, or transform with XSL-FO to create it. Quite recently people have also started to use CSS for print media.
So there is an XML content flow here with several stages, each requiring slightly different XML. The stages have a lot in common, so obviously we don’t create a separate schema for each one; instead we build a modular schema in which the different stages form the structure layers.
I don’t recommend using different schema languages for the different layers. Keeping the structure in sync across schema languages is non-trivial and quite cumbersome. In fact, there is no good way to automate it or to verify that the schemas are consistent.
Nowadays it’s not enough to expose the content in human-readable formats; there is also great demand for machine-readable access:
- other companies want to integrate and reuse the content in their workflow / publishing
- search engines want to “understand” what the content is about
So what’s the difference between the human and machine interfaces? Exactly what we’ve been talking about throughout this book: machines need explicit semantics. Human readers have the background knowledge to interpret text properly; machines are dumb.
Web search engines don’t simply want to present a list of web pages that match the search criteria; they want to answer questions. For this they have to understand as much as possible, and the simplest way to make that happen is for the content to carry semantics.
For example, a web page describes one specific pharmacy store:
- phone number
- opening hours
… as HTML of course, which is just formatted text. Still, there are several ways to add semantics to HTML, for example microformats, Microdata, RDFa or JSON-LD.
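As a minimal sketch of one way to do this, the following Python renders a small page for the pharmacy and embeds the same facts as JSON-LD using the schema.org vocabulary. The pharmacy name, phone number and opening hours are invented for illustration, and the function name to_html is hypothetical.

```python
import json
from html import escape

# Hypothetical pharmacy data; every value here is made up for illustration.
pharmacy = {
    "name": "Example Pharmacy",
    "telephone": "+47 00 00 00 00",
    "openingHours": "Mo-Fr 09:00-17:00",
}


def to_html(data: dict) -> str:
    """Render a human-readable page with the same facts embedded as JSON-LD."""
    doc = {"@context": "https://schema.org", "@type": "Pharmacy", **data}
    # Escape "</" so the JSON can never close the <script> element early.
    payload = json.dumps(doc, indent=2).replace("</", "<\\/")
    return (
        f"<h1>{escape(data['name'])}</h1>\n"
        f"<p>Phone: {escape(data['telephone'])}</p>\n"
        f"<p>Open: {escape(data['openingHours'])}</p>\n"
        f'<script type="application/ld+json">\n{payload}\n</script>'
    )


html_page = to_html(pharmacy)
print(html_page)
```

The human reader sees the heading and paragraphs; a machine reader can ignore them and parse the script element, where the type and each property are stated explicitly.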
Using this information, the search engine can then directly answer questions like:
- Pharmacy in Oslo downtown?
- Where is the nearest open pharmacy?
It probably goes without saying that if we’re on an XML editorial platform, adding semantics to our HTML pages is really easy.
Machines don’t need to read HTML; they can read our publishing XML directly. How do we expose it? One way is to use web services. SOAP is enterprise-ish, while REST is slowly becoming the preferred choice for many. The request has input parameters and we get data back as XML, JSON or whatever format the service defines. The input parameters are also defined by the service, so it’s almost like a proprietary query language.
There are also standard query languages that let third parties fetch data, and it seems the world is moving in this direction. For RDF data it’s SPARQL and for XML it’s XQuery. The great thing about these is that I don’t have to invent my own query language (... and change it every second day because it never gets perfect); instead I just expose my data. Anyone who understands my data can also query it in a very flexible way.
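To illustrate why such a service feels like a proprietary query language, here is a minimal Python sketch of the request-handling logic only, without a real HTTP server. The URL path, the topic and format parameters, and the sample XML are all assumptions invented for this example.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

# Hypothetical publishing XML that the service exposes.
PUBLISHING_XML = """<articles>
  <article id="a1" topic="pharmacy"><title>Opening hours</title></article>
  <article id="a2" topic="dosage"><title>Paracetamol</title></article>
</articles>"""


def handle_get(url: str) -> tuple:
    """Answer GET /articles?topic=...&format=xml|json with (content type, body)."""
    query = parse_qs(urlparse(url).query)
    topic = query.get("topic", [None])[0]   # optional filter, defined by the service
    fmt = query.get("format", ["xml"])[0]   # output format, defined by the service

    root = ET.fromstring(PUBLISHING_XML)
    hits = [a for a in root.iter("article") if topic in (None, a.get("topic"))]

    if fmt == "json":
        rows = [{"id": a.get("id"), "title": a.findtext("title")} for a in hits]
        return "application/json", json.dumps(rows)

    result = ET.Element("articles")
    result.extend(hits)
    return "application/xml", ET.tostring(result, encoding="unicode")


print(handle_get("/articles?topic=pharmacy&format=json"))
```

Every question a client may ask has to be anticipated as a parameter here; a client that wants to filter by title instead of topic has to wait until the service adds that parameter.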
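Python’s standard library has no XQuery or SPARQL engine, so the following sketch uses the limited XPath subset of ElementTree as a stand-in. It is not XQuery, but it shows the same idea: the provider exposes the data, and the client writes the query. The sample XML and the run_query helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical publishing XML exposed by the provider.
PUBLISHING_XML = """<articles>
  <article id="a1" topic="pharmacy"><title>Opening hours</title></article>
  <article id="a2" topic="dosage"><title>Paracetamol</title></article>
</articles>"""


def run_query(xpath: str) -> list:
    """Evaluate a client-supplied path expression against the exposed data."""
    return ET.fromstring(PUBLISHING_XML).findall(xpath)


# The provider never anticipated either of these questions as a parameter.
titles = [t.text for t in run_query("./article[@topic='pharmacy']/title")]
ids = [a.get("id") for a in run_query("./article[title='Paracetamol']")]
print(titles, ids)
```

A real XQuery or SPARQL endpoint is far more expressive than this subset, but the division of labour is the same: no service-specific parameters, only the data and a standard way to ask about it.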