5.4 Migration scenarios

5.4.1 Table archiving

This is the simplest scenario. Our source data comes from a relational database, extracted “as is” without remodelling, serialised (for example) as XML.

The whole ETL process for table archiving can be implemented generically, independent of the database vendor and the table schema. We only need to specify where the database is and, optionally, add some filters if we don't want to archive everything. The process should then work for any database and any arbitrary data.
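As an illustration of why no schema knowledge is needed, the sketch below exports any table to XML by reading the column names from the cursor metadata. It is a minimal sketch under stated assumptions. It uses Python's built-in `sqlite3` as a stand-in for an arbitrary DB-API driver, and the function `archive_table` and the `<table>/<row>/<column>` XML layout are hypothetical. Note that real drivers differ in their parameter placeholder style (`?` here).

```python
import sqlite3
import xml.etree.ElementTree as ET


def archive_table(conn, table, where=None, params=()):
    """Serialise every (optionally filtered) row of `table` as XML.

    Hypothetical helper: column names come from cursor.description,
    so the same code works for any table without schema-specific logic.
    """
    # Identifiers cannot be bound as parameters, so quote the table name.
    sql = 'SELECT * FROM "{}"'.format(table.replace('"', '""'))
    if where:
        # Optional filter; `where` must be trusted SQL, values go in `params`.
        sql += " WHERE " + where
    cur = conn.cursor()
    cur.execute(sql, params)
    columns = [d[0] for d in cur.description]

    root = ET.Element("table", name=table)
    for row in cur:
        row_el = ET.SubElement(root, "row")
        for name, value in zip(columns, row):
            col_el = ET.SubElement(row_el, "column", name=name)
            if value is None:
                col_el.set("null", "true")  # keep NULL distinct from ""
            else:
                col_el.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Alice"), (2, None)])
    print(archive_table(conn, "customer"))
    print(archive_table(conn, "customer", "id > ?", (1,)))
```

The only inputs are the connection, the table name and an optional filter, which matches the "specify where the database is, optionally add filters" workflow described above.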

The challenge starts when we want to define user access to this data, typically by implementing some searches (for example, using XQuery). For this, we need deep knowledge of the data, and we also have to know what the users expect.

Such a search then does two things:

  • filters the data according to the search criteria
  • aggregates records: collects information that is likely spread across many "table XML" documents
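To make the two steps concrete, here is a small sketch that first filters one "table XML" document and then aggregates related rows from a second one. It is written in Python with `xml.etree.ElementTree` as a stand-in for XQuery. The helper names (`rows`, `search_orders`), the `customer`/`orders` tables and the `<table>/<row>/<column>` layout are all hypothetical assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical archived tables, one XML document per source table.
CUSTOMERS = """<table name="customer">
  <row><column name="id">1</column><column name="name">Alice</column></row>
  <row><column name="id">2</column><column name="name">Bob</column></row>
</table>"""
ORDERS = """<table name="orders">
  <row><column name="id">10</column><column name="customer_id">1</column><column name="amount">25.00</column></row>
  <row><column name="id">11</column><column name="customer_id">2</column><column name="amount">40.00</column></row>
  <row><column name="id">12</column><column name="customer_id">1</column><column name="amount">7.50</column></row>
</table>"""


def rows(table_xml):
    """Yield each <row> of a table XML document as a dict column -> text."""
    root = ET.fromstring(table_xml)
    for row in root.iter("row"):
        yield {c.get("name"): c.text for c in row.findall("column")}


def search_orders(customers_xml, orders_xml, name_prefix):
    # Step 1, filter: keep only customers matching the search criterion.
    matching = {
        r["id"]: {"name": r["name"], "orders": []}
        for r in rows(customers_xml)
        if (r.get("name") or "").startswith(name_prefix)
    }
    # Step 2, aggregate: join in related rows from another table document.
    for order in rows(orders_xml):
        hit = matching.get(order.get("customer_id"))
        if hit is not None:
            hit["orders"].append(order["amount"])
    return matching


if __name__ == "__main__":
    print(search_orders(CUSTOMERS, ORDERS, "Al"))
```

The join logic in step 2 is exactly the "deep knowledge of the data" mentioned above: the search has to know that `orders.customer_id` refers to `customer.id`, and it performs that join every time the search runs.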

The good news: if the user requirements are not clear at the time of archiving, a new search can be added at any time later, provided that technical knowledge of the data is still available.

5.4.2 SIP archiving

The data source can be

  • aggregated
  • non-aggregated

Already aggregated data might require further conversion to fit a specific schema, but this should be relatively simple.

I have already detailed the different aggregation scenarios in the previous chapter.

While record aggregation is not trivial, it has to be done at some point: either before we load the data into the archive system (SIP archiving) or, if we decide to archive the relational data "as is", when we provide end-user access. The latter requires more work, since the searches have to do the aggregation on the fly and in real time, which also raises performance concerns. If the data set is small, this is of course not an issue.
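For contrast with on-the-fly aggregation, the sketch below does the aggregation once, before loading: relational rows are grouped into one self-contained XML record per customer, which could then be packaged as a SIP. This is a minimal sketch only. The input shapes (`(customer_id, name)` and `(order_id, customer_id, amount)` tuples), the function name `build_sips` and the XML element names are hypothetical assumptions, not a prescribed SIP schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_sips(customers, orders):
    """Aggregate relational rows into one self-contained XML record per customer.

    Hypothetical inputs:
      customers: iterable of (customer_id, name)
      orders:    iterable of (order_id, customer_id, amount)
    Returns a dict customer_id -> XML string (one package payload each).
    """
    # Group child rows by foreign key once, at archiving time.
    orders_by_customer = defaultdict(list)
    for order_id, customer_id, amount in orders:
        orders_by_customer[customer_id].append((order_id, amount))

    sips = {}
    for customer_id, name in customers:
        record = ET.Element("customer", id=str(customer_id))
        ET.SubElement(record, "name").text = name
        orders_el = ET.SubElement(record, "orders")
        for order_id, amount in orders_by_customer.get(customer_id, []):
            ET.SubElement(orders_el, "order", id=str(order_id), amount=str(amount))
        sips[customer_id] = ET.tostring(record, encoding="unicode")
    return sips


if __name__ == "__main__":
    customers = [(1, "Alice"), (2, "Bob")]
    orders = [(10, 1, "25.00"), (11, 2, "40.00"), (12, 1, "7.50")]
    for cid, xml in build_sips(customers, orders).items():
        print(cid, xml)
```

Because each record already contains its related rows, a later search only needs to filter; the join cost is paid once during archiving rather than on every query.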

SIP archiving has other advantages as well. Aggregated records are more self-contained, which makes incremental or live archiving easier. Data management is also easier; for example, old records can be moved to a slower storage platform. In addition, different retention policies can be applied to individual (SIP) packages.
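Per-package retention can be illustrated with a short sketch: each package carries its own retention period, so disposal can be decided package by package. The `Package` fields, the "record date plus N years" rule and the function names are hypothetical assumptions for illustration; real retention rules are usually defined by the archive's policy framework.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Package:
    package_id: str
    record_date: date     # hypothetical: date the business record was closed
    retention_years: int  # hypothetical: policy assigned to this package


def expiry_date(pkg):
    """Date from which this package may be disposed of (record date + N years)."""
    year = pkg.record_date.year + pkg.retention_years
    try:
        return pkg.record_date.replace(year=year)
    except ValueError:
        # 29 February does not exist in the target year; fall back to 28 February.
        return pkg.record_date.replace(year=year, day=28)


def disposable(packages, today):
    """IDs of packages whose individual retention period has ended."""
    return [p.package_id for p in packages if expiry_date(p) <= today]


if __name__ == "__main__":
    packages = [
        Package("A", date(2010, 5, 1), 10),
        Package("B", date(2010, 5, 1), 25),
        Package("C", date(2012, 2, 29), 1),
    ]
    print(disposable(packages, date(2021, 1, 1)))
```

Packages A and B share a record date but expire at different times because each carries its own policy, which is hard to achieve when the same data sits in shared table-level archives.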
