Migrating content into SharePoint 2010 from your legacy CMS

Trucking content from your legacy CMS into SharePoint 2010: here are the options and the gotchas.

Gotta love a content management system! Reminds me of when I used to go to the public library when I was a kid. Well, no, not really. But a CMS is way more convenient than a file share, and beats the living daylights out of a floppy disk tray.

But while a CMS may be the one thing that turns a large organization into a truly unified IT enterprise, it also introduces complexities of a whole new kind and on a whole new order, and those complexities are at their most frustrating when this year's CMS gets replaced by a different one next year.

Most CMS-to-CMS migrations these days are into SharePoint from something else, SharePoint having jumped out in front of the CMS pack. That suits me just fine, since SharePoint is my thing these days. But SharePoint has its issues, too, when playing catch with another CMS. No magic answers here - just a heads-up or two, if you're facing a CMS-to-CMS migration.

Metadata blues

The single biggest gotcha in CMS migration is usually metadata. The content of your legacy CMS is well-tagged with metadata that its users spent years working out, and as you get ready to scoop a few hundred thousand files out of the legacy system and into SharePoint, you find out - oopsie, the metadata isn't coming across. Or only some of it is coming across. Or (worst of all) it's coming across wrong.

This is going to happen more often than not, so get ready. You'll have a stretch of R&D time ahead of you, figuring out what works and what doesn't, and what your metadata gaps are going to be. Plan for it. And in doing so, consider these additional points:

  1. There's third-party software out there that can (sometimes) get this done for you, depending on what your legacy CMS is (AvePoint is an example);
  2. There are third-party vendors out there who will step in and get all of the metadata moved, which of course will cost you something (KnowledgeLake is an example);
  3. If your organization was really, really dumb with its data structures (say, for example, that network file shares were replicated in the legacy CMS, resulting in endless folder trees), then much of your metadata is implicitly embedded in those structures, and you're screwed; there's no way to move that implicit metadata but to encode it manually as you move (time-consuming for your users, who will certainly not appreciate it);
  4. Finally, a warning: if your organization chooses to move files out of the legacy CMS and into SharePoint manually (say, through bulk copy), SharePoint is going to plug in its own metadata; all the bulk-copied files will be tagged as owned by the person doing the copying, and their creation date will be the time of the move; in addition, you may lose the Title column.
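Whichever route you take, the R&D stretch goes faster if you take inventory of the gaps up front. Below is a minimal Python sketch, assuming you can export the legacy CMS's field names and draw up a field-to-column mapping by hand; the field names, the mapping, and the function `metadata_gap_report` are all hypothetical. The at-risk set reflects point 4: columns such as Created By, Created and Title that a plain bulk copy may overwrite or drop.

```python
# Hypothetical sketch: a metadata gap report for a legacy-CMS-to-SharePoint move.
# Column names are SharePoint display names; adjust to your own libraries.

# Columns that, per the bulk-copy warning, may be overwritten or dropped.
BULK_COPY_AT_RISK = {"Created By", "Created", "Title"}


def metadata_gap_report(legacy_fields, field_map, target_columns):
    """Sort each legacy field into one of four outcomes.

    legacy_fields:  field names exported from the legacy CMS
    field_map:      legacy field -> SharePoint column (None = no plan yet)
    target_columns: columns that actually exist in the target library
    """
    report = {"unmapped": [], "missing_target": [],
              "at_risk_on_bulk_copy": [], "ok": []}
    for field in legacy_fields:
        target = field_map.get(field)
        if target is None:
            report["unmapped"].append(field)          # nowhere to put it yet
        elif target not in target_columns:
            report["missing_target"].append(field)    # column not created yet
        elif target in BULK_COPY_AT_RISK:
            report["at_risk_on_bulk_copy"].append(field)
        else:
            report["ok"].append(field)
    return report


if __name__ == "__main__":
    legacy = ["Owner", "CreateDate", "DocTitle", "Department", "ReviewCycle", "ProjectCode"]
    mapping = {"Owner": "Created By", "CreateDate": "Created", "DocTitle": "Title",
               "Department": "Department", "ProjectCode": "Project"}
    columns = {"Created By", "Created", "Title", "Department"}
    for outcome, fields in metadata_gap_report(legacy, mapping, columns).items():
        print(f"{outcome}: {fields}")
```

In this made-up example, ReviewCycle has no home, ProjectCode points at a column nobody has created, and the owner, date and title fields are exactly the ones a bulk copy would clobber.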

Better the second time around

Any enterprise's first CMS is likely to be similar to one's first car or first romance - a starry-eyed endeavor that lacks quite a bit in execution. Put simply, a legacy CMS, if it was an organization's first, was probably not very well organized.

Note this well: SharePoint is built on SQL Server, so the rules of SQL Server efficiency apply to SharePoint. Your legacy CMS may well have been built on something else (Livelink, for instance, often runs on Oracle), in which case different rules of storage efficiency applied.

A SharePoint migration, then, is an opportunity to restructure your content's back-end storage for optimal performance after the move. How does this work? Sort what's coming over into three buckets: active content (frequently referenced files), peripheral content (less frequently referenced files, read-only content, very large media files), and archival content (almost never referenced, but required to remain available under compliance rules).
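One way to do that first sort is to run a manifest of legacy files through a few simple rules. This is a minimal Python sketch, assuming your legacy CMS can export each file's path, last-access date, size and a read-only flag; the `LegacyFile` type and the thresholds (90 days, three years, 100 MB) are hypothetical placeholders to tune against your own usage data, not SharePoint recommendations.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical thresholds: tune these to your own usage data.
ACTIVE_WINDOW_DAYS = 90                # touched within ~3 months -> active
ARCHIVE_AFTER_DAYS = 3 * 365           # untouched for ~3 years -> archival
LARGE_MEDIA_BYTES = 100 * 1024 * 1024  # 100 MB and up -> peripheral


@dataclass
class LegacyFile:
    path: str
    last_accessed: date
    size_bytes: int
    read_only: bool = False


def classify(f: LegacyFile, today: date) -> str:
    """Return 'active', 'peripheral' or 'archival' for one legacy file."""
    idle_days = (today - f.last_accessed).days
    if idle_days > ARCHIVE_AFTER_DAYS:
        return "archival"
    if f.read_only or f.size_bytes >= LARGE_MEDIA_BYTES or idle_days > ACTIVE_WINDOW_DAYS:
        return "peripheral"
    return "active"


def bucket_all(files, today):
    """Group file paths by storage bucket."""
    buckets = {"active": [], "peripheral": [], "archival": []}
    for f in files:
        buckets[classify(f, today)].append(f.path)
    return buckets
```

The archival age check runs first, so a huge video nobody has opened in years lands in archival rather than peripheral. Compliance retention isn't something access dates can tell you; if it matters, it would need its own flag in the manifest.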

You can then arrange your SQL Server storage to match: active content in core storage, peripheral content in a separate content database, and archival content on SAN storage, accessed through SQL Server Remote BLOB Storage (RBS; look it up, it rocks).

What you leave behind

Finally, a quick word about legacy CMS files: you probably don't need to bring every last one of them across. In fact, there could be thousands, or even tens of thousands, of files that could go into cold storage and never enter your SharePoint CMS at all. Why haul them along if nobody is going to use them?

On the other hand, the only ones who can decide which files are useless and unworthy of migration are, well, the legacy CMS users, and it's a time-consuming thing to sit down and root through everything. Not your problem - just something to consider. Some people just can't throw out their old magazines.