Publishing pipes

Publishing resembles uptranslation, but it runs in the opposite direction: it is “downhill”, a downtranslation. We go from structured information to unstructured, formatted documents. This is the much easier direction: it needs no heuristics, and the result is exact and well controlled.

A sketch of a possible pipe:

  1. fetch content
  2. enrich content with data, link info
  3. merge / split
  4. index or table of contents generation
  5. format conversion
  6. test / validation
  7. persist content

XML content publishing pipes are conceptually similar to UNIX pipelines. Programs are chained: the output of the first program is the input of the second, the output of the second is the input of the third, and so on. This way we can split a complex process into small, relatively simple subprocesses (actions) that are easier to test and maintain, and we can reuse the common, generic actions in other pipes.
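The chaining idea can be sketched in a few lines of Python. This is only an illustration, not a real pipe framework: the step names (enrich, generate_toc, validate), the run_pipe helper and the sample <article> markup are all hypothetical, chosen to mirror a few of the steps listed above.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Callable, Sequence

# A step takes an XML tree and returns an XML tree, so steps can be
# chained the way UNIX programs are chained with "|".
Step = Callable[[ET.Element], ET.Element]


def enrich(doc: ET.Element) -> ET.Element:
    """Hypothetical enrichment step: number every section."""
    for number, section in enumerate(doc.iter("section"), start=1):
        section.set("number", str(number))
    return doc


def generate_toc(doc: ET.Element) -> ET.Element:
    """Hypothetical TOC step: insert a <toc> listing all section titles."""
    toc = ET.Element("toc")
    for section in doc.iter("section"):
        entry = ET.SubElement(toc, "entry", ref=section.get("number", ""))
        entry.text = section.findtext("title", default="")
    doc.insert(0, toc)
    return doc


def validate(doc: ET.Element) -> ET.Element:
    """Hypothetical validation step: every section needs a title."""
    for section in doc.iter("section"):
        if section.find("title") is None:
            raise ValueError("section without title")
    return doc


def run_pipe(doc: ET.Element, steps: Sequence[Step]) -> ET.Element:
    """Feed each step's output to the next step."""
    return reduce(lambda current, step: step(current), steps, doc)


# Made-up sample content, standing in for the "fetch content" step.
source = ET.fromstring(
    "<article>"
    "<section><title>Intro</title></section>"
    "<section><title>Usage</title></section>"
    "</article>"
)
result = run_pipe(source, [enrich, generate_toc, validate])
print(ET.tostring(result, encoding="unicode"))
```

Because every step has the same signature, a generic step such as validate can be dropped into any other pipe without changes, which is the reuse benefit described above.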

Usually we have one pipe per content type per publishing platform. If our CMS holds many content types (our content is diverse) and we target many publishing platforms, we have to expect many publishing pipes, which can be challenging to build and maintain. The challenge is not algorithmic complexity (provided we manage to split the process into small, basic steps); it is that the business logic consists of a large number of small details, which are hard to document properly and to keep in mind during maintenance.

The only plausible solution is to build a test framework (unit tests) in parallel with the pipeline implementation. Whenever we implement a feature, we create test data and a unit test for it. This is how we verify the implementation in the first place, and we keep running the same tests whenever we fix a bug or add a new feature.
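As a minimal sketch of this practice, the following Python example pairs one pipe step with its test data and unit tests. The to_plain_text step, its behaviour (uppercased titles, trimmed paragraphs) and the sample markup are assumptions made up for illustration; a real pipe would test its own business rules the same way.

```python
import unittest
import xml.etree.ElementTree as ET


def to_plain_text(doc: ET.Element) -> str:
    """Hypothetical format-conversion step: render sections as plain text."""
    lines = []
    for section in doc.iter("section"):
        title = section.findtext("title", default="").strip()
        lines.append(title.upper())
        for para in section.iter("para"):
            lines.append((para.text or "").strip())
    return "\n".join(lines)


class ToPlainTextTest(unittest.TestCase):
    # Test data created together with the feature it exercises.
    SAMPLE = (
        "<article><section><title>Intro</title>"
        "<para> First paragraph. </para></section></article>"
    )

    def test_title_is_uppercased(self):
        text = to_plain_text(ET.fromstring(self.SAMPLE))
        self.assertEqual(text.splitlines()[0], "INTRO")

    def test_paragraph_whitespace_is_trimmed(self):
        text = to_plain_text(ET.fromstring(self.SAMPLE))
        self.assertEqual(text.splitlines()[1], "First paragraph.")


# Run the tests explicitly so the result can be inspected afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToPlainTextTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each small business rule gets its own test method, so the tests double as documentation of the details that are otherwise hard to keep in mind during maintenance.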

Just as with uptranslation, it can be a great advantage to use a standard XML pipeline framework, such as XProc, instead of reinventing the wheel.

