Tuesday, April 12, 2016

Sitecore content migration strategies

When migrating an existing website onto Sitecore, one of the biggest considerations is the migration of content. Where multiple sites are merging into one, or in a multisite environment, it can become even trickier. There are two main points to consider: the migration of the content itself and the URL structure after the fact. Even in cases where content URLs remain exactly the same, the URLs of media library items (PDFs in particular) will most likely be different.

Initial planning

The first step with content architecture in Sitecore is to identify all of the unique content types. These types will form the data templates, and early planning can help identify similarities and build an object-oriented structure. The benefit of an object-oriented template structure in Sitecore is the use of base templates to provide common fields, which means less field duplication. In terms of content migration, the fewer unique fields there are, the less tricky the migration becomes. Once you have defined your data structure (in the form of templates), it's time to plan out the architecture of the content itself on the site. A website migration such as this is a great time to hold workshops across the business to decide on the best architecture moving forward. It may mean a slight reorganisation once the content is brought across, but it can lead to easier management and a better experience for users. There are many different ways to architect the content on your website; it all comes back to the type of website being built and, quite often, the structure of the company behind it. If you get the top-level architecture right, you can even use it as a facet for search results in Sitecore.
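The base template planning described above can be sketched roughly as follows. This is only a planning aid in plain Python, not Sitecore code: the content type names and field names are made up for illustration. It shows how fields shared by every content type fall out as candidates for a base template, leaving fewer unique fields to map during migration.

```python
# Hypothetical content audit: each content type and the fields it needs.
# All type and field names here are illustrative assumptions.
content_types = {
    "News Article": {"Title", "Summary", "Body", "Publish Date"},
    "Blog Post": {"Title", "Summary", "Body", "Author"},
    "Media Release": {"Title", "Summary", "Body", "Contact"},
}


def plan_templates(types):
    """Split fields into a shared base template and per-type leftovers."""
    # Fields present on every content type are base template candidates.
    base_fields = set.intersection(*types.values())
    # Whatever remains stays on the individual data template.
    specific = {name: fields - base_fields for name, fields in types.items()}
    return base_fields, specific


base_fields, specific = plan_templates(content_types)
print("Base template fields:", sorted(base_fields))
for name, fields in specific.items():
    print(f"{name}: {sorted(fields)}")
```

In practice you may also want base templates for fields shared by only some types; the strict intersection here is just the simplest starting point.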

Migrating the content

There are multiple ways to achieve content migration, and each comes with its own pros and cons.

Manual migration

Manual migration involves content editors moving all of the content from the old site to the new site by hand. This is generally the most time-consuming method of the bunch, but it is best suited to websites with smaller amounts of content where automated migration would not save time.

  • Will often lead to a cleaner media library architecture, where only items that are used are brought over and organisation can be much tidier.
  • It gives a chance to have all content reviewed in terms of fact checking, spell checking and ensuring consistency across the website.
  • It gives content editors more exposure to Sitecore and in turn they are more confident in the system.
  • More time consuming than automated migration.
  • Can be difficult to evenly split content, so multiple people handling the same content types might lead to inconsistencies.
  • Content might be missed in these instances.
  • Delays in content creation can and will affect development.
  • Content editors using key testing servers may mean fewer test deployments for code.
  • How will content-to-content links be handled if the page being linked to hasn't been created yet?
  • Set up any URL strategies mentioned below at the same time to stop double handling of content.
  • If workflow is enabled, temporarily add a skip button which automatically sets an item to approved/published.

Automated migration

Automated migration is a no-brainer for websites which have large amounts of content. There are tools out there that specialise in this; however, the Sitecore API is simple enough that developers can easily set up custom migrators with business logic included.

  • Faster than manually creating the content.
  • Can automatically spell check all content from a single source.
  • Logic can be built to re-create all content-to-content links.
  • Any URL strategies may mean extra "content" to create and more development effort.
  • Developers can get carried away building tools (12 hours of effort for a saving of 3 hours of content editing).
  • Paid migration tools may not be customised to your needs.
  • Paid tools may cost more than development effort for custom tools.
  • When testing migration tools, set a limit of X items to be moved. Nothing worse than running one for several hours only to see fields mismatched.
  • Don't forget to take image widths and heights into account when migrating into the media library. You don't want a thumbnail appearing at full resolution because you simply moved the source and didn't record its attributes.
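The tip above about limiting a test run to X items could look something like the sketch below. It is plain Python rather than the Sitecore API, and the field mapping, field names and sample items are all hypothetical. The idea is to map only the first few items and report any mismatched fields before committing to a run that takes hours.

```python
from itertools import islice

# Hypothetical mapping from old CMS field names to new template field names.
FIELD_MAP = {"headline": "Title", "teaser": "Summary", "html": "Body"}


def trial_run(source_items, field_map, limit):
    """Map at most `limit` items and report field mismatches; writes nothing."""
    report = []
    for item in islice(source_items, limit):
        mapped = {field_map[k]: v for k, v in item.items() if k in field_map}
        report.append({
            "mapped": mapped,
            # Source fields with no destination: content that would be lost.
            "unmapped_source_fields": sorted(set(item) - set(field_map)),
            # Destination fields the source item did not supply.
            "missing_target_fields": sorted(set(field_map.values()) - set(mapped)),
        })
    return report


# Illustrative legacy content; a real run would read from the old CMS.
legacy_items = [
    {"headline": "Site launch", "teaser": "We are live", "html": "<p>...</p>"},
    {"headline": "Annual report", "html": "<p>...</p>", "legacy_id": 42},
    {"headline": "Not reached in a trial of 2", "teaser": "", "html": ""},
]

for row in trial_run(legacy_items, FIELD_MAP, limit=2):
    print(row["unmapped_source_fields"], row["missing_target_fields"])
```

Because the trial only builds a report, it can be re-run cheaply while the mapping is being corrected.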

Hybrid approach

A hybrid approach is a great way to get the best of both worlds when it comes to content migration. Large data sets such as news articles, media releases and blog posts are great candidates for automated migration, whereas key pages might benefit from the pros of a manual migration.

  • In the middle in terms of time to move the content across.
  • Gives content editors more exposure to Sitecore and in turn they are more confident in the system.
  • Potential for duplication of content.
  • Potential for content to be missed.
  • How are shared media library resources handled by both content editors and migration tools?
  • Perhaps an automated migration with manual checking on key pages is a better strategy?
  • Let the automated build run first, that way the content editors have access to any migrated media items.

Link structure changes

So now that all of your content has been migrated over to a shiny new Sitecore install, the only question remaining is how different the link structure is. If you are really lucky, you will have been able to use the same link structure in Sitecore as in the previous CMS. For those of you dropping file extensions (*gasp* .php for example), however, there's some work to be done. In fact it's almost guaranteed that some URL redirecting will be needed, not only due to site structure changes but also due to the inevitable PDF link changes. There are a number of options available to handle these URL redirects:
  • Sitecore URL rewrite/redirect modules: On the Sitecore marketplace there are quite a few rewriting/redirecting modules available of varying complexity. URL Rewrite is a favourite of mine due to the rich feature set (regular expressions and exact matches) and because it uses 301 redirects, which are recommended by Google - best for SEO.
  • Custom item resolvers: Sitecore uses an order of precedence when trying to resolve a web page or media item. So once Sitecore has attempted to find the item (and fails) and before it hits a 404 not found you can inject your own custom resolver for finding the item. Potential use cases are:
    • Business logic to find the correct page
    • Use Lucene search index to find the correct page or suggest multiple pages
    • Redirect URLs from a specific content type to the new area or item (based on name)
  • Custom media handlers: In one case I used a custom media handler to handle PDFs not found by Sitecore's media handler. It made use of a Lucene index of all PDFs and searched this index for an exact match based on the file name. PDFs were served up for any matches and 404s were raised for any not found.
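To make the order of precedence behind these options more concrete, here is a minimal sketch in plain Python. It is not the URL Rewrite module's configuration or a real Sitecore pipeline processor; every rule, path and file name is a hypothetical example, and a dictionary stands in for the Lucene index of PDFs. It tries exact matches first, then regular expression rules, then a PDF lookup by file name, and only then gives up with a 404. Note that it redirects to the matched PDF rather than serving it directly as the handler described above did, purely to keep the example short.

```python
import re
from pathlib import PurePosixPath

# All rules and paths below are hypothetical examples.
EXACT_REDIRECTS = {"/about-us.php": "/about"}
PATTERN_REDIRECTS = [
    # Drop the .php extension from old news URLs.
    (re.compile(r"^/news/([\w-]+)\.php$"), r"/news/\1"),
]
# Stand-in for a search index of PDFs, keyed by lower-cased file name.
PDF_INDEX = {
    "annual-report-2015.pdf": "/-/media/files/reports/annual-report-2015.pdf",
}


def resolve_legacy_url(path):
    """Return (status, location) for an old URL that failed to resolve."""
    # 1. Exact matches take precedence over everything else.
    if path in EXACT_REDIRECTS:
        return 301, EXACT_REDIRECTS[path]
    # 2. Regular expression rules, in the order they are listed.
    for pattern, replacement in PATTERN_REDIRECTS:
        if pattern.match(path):
            return 301, pattern.sub(replacement, path)
    # 3. PDFs: match on file name only, ignoring the old folder structure.
    file_name = PurePosixPath(path).name.lower()
    if file_name.endswith(".pdf") and file_name in PDF_INDEX:
        return 301, PDF_INDEX[file_name]
    # 4. Nothing matched, so fall through to the 404 page.
    return 404, None


print(resolve_legacy_url("/about-us.php"))
print(resolve_legacy_url("/news/site-launch.php"))
print(resolve_legacy_url("/old/docs/Annual-Report-2015.pdf"))
print(resolve_legacy_url("/nowhere"))
```

Keeping exact matches ahead of patterns means a one-off exception can always override a broad rule.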
It's always an awkward moment for new Sitecore users to see the 404 page as the highest entry page/exit page in Sitecore's experience analytics...


Migrating content to Sitecore and handling the resulting changes in link structure is no easy task. Even with automated content migration strategies, there will often be extra manual work in redirecting the old URLs to the new items. Each implementation is different, and the best generalised advice I can give is to spend more time in the early stages planning these two factors, as this is what leads to better outcomes.
