Recovering SCCM Site from a Failed Bad Backup

I recently helped a client migrate their existing SCCM environment to new hardware. We ran into some challenges, and I thought it would be good to share how we worked through the problems to get SCCM successfully migrated.

Going from:

  • Win2k3 Standard x86
  • SQL 2005 x86 SP2
  • SCCM 2007 SP2 R2

Going to:

  • Win2k8 R2 Datacenter (x64 of course)
  • SQL 2005 x64 SP4
  • SCCM 2007 SP2 R2

At a high level, the operations to migrate a site are:

  • Perform a backup, then shut everything down.
  • Replace the hardware and ensure the configuration is identical – drives, names, paths, etc.
  • Install the same software and prerequisites.
  • Install Configuration Manager using the same settings and paths.
  • Run the repair wizard from its shortcut on the menu (not from the console) and perform a restore.

However, the site repair wizard was failing on the first step, verifying the backup path. The GUI said that the SQL backup files and ConfigMgr inbox files were out of sync and that the file stamps were different. The RepairWizard.log file also had several instances of “Initializer {GUID} will no be run, unsupported application type”. Meanwhile, SMSbkup.log stated “Backup task completed successfully with zero errors but there could be some warnings, AFTERBACKUP.BAT will be started if available in its predefined location”. However, looking more closely at the earlier lines in that log, I saw line after line of errors, such as:

  • Error: Failed to backup \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\SMS\BackupTemp\SMSbkSiteRegNAL.dat up to D:\SMSBackup\Backup\SiteServer:\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\SMS\BackupTemp\SMSbkSiteRegNAL.dat is not readable.
  • Failed to copy file(s) Backup\SiteServer.
  • Error: Failed to backup \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\SMS\BackupTemp\SMSbkSiteRegSMS.dat up to D:\SMSBackup\Backup\SiteServer:\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\SMS\BackupTemp\SMSbkSiteRegSMS.dat is not readable.
  • Error: Backup Failed  for Component – Backup\SiteServer\SMSbkSiteRegSMS.dat.
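Because the summary line cannot be trusted on its own, a check like this can be scripted. The following is a minimal Python sketch, not something used during the original migration. The two patterns are taken only from the error lines quoted above, so other failure wording would slip through, and the function name is made up for illustration.

```python
import re

# Assumption: failures look like the lines quoted in this post, i.e. they
# contain "Error:" or "Failed to". A real smsbkup.log may use other wording.
FAILURE_PATTERN = re.compile(r"Error:|Failed to", re.IGNORECASE)


def find_backup_failures(log_text):
    """Return (line_number, line) pairs for log lines that look like failures."""
    failures = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if FAILURE_PATTERN.search(line):
            failures.append((number, line.strip()))
    return failures


if __name__ == "__main__":
    sample = "\n".join([
        r"Failed to copy file(s) Backup\SiteServer.",
        "Backup task completed successfully with zero errors "
        "but there could be some warnings",
    ])
    for number, line in find_backup_failures(sample):
        print(f"line {number}: {line}")
```

Running something like this against the log before shutting down the old server would flag the backup as suspect even though the final status line claims success.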

These errors showed that the site backup was truly not successful and was incomplete. Upon comparing the contents of the directory against a known good backup in another environment, I found the backup was missing the following items:

  • SMS/NAL registry key backups (as a .dat)
  • logs
  • 1/3 of the inboxes, including the site control
  • data folder
  • srvacct folder
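The comparison against a known good backup was done by eye, but it can be approximated with a short script. This is a rough Python sketch under a few assumptions: both roots point at the same level of the backup folder structure, folder names line up between the two environments (site-code-specific names would show up as false positives), and only names are compared, not file contents. The function names are illustrative.

```python
from pathlib import Path


def relative_entries(root, max_depth=2):
    """Collect lower-cased relative paths of files and folders up to max_depth levels."""
    root = Path(root)
    entries = set()
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if len(relative.parts) <= max_depth:
            entries.add(relative.as_posix().lower())
    return entries


def missing_from_backup(known_good_root, suspect_root, max_depth=2):
    """Return entries present in the known good backup but absent from the suspect one."""
    good = relative_entries(known_good_root, max_depth)
    suspect = relative_entries(suspect_root, max_depth)
    return sorted(good - suspect)
```

Keeping `max_depth` small limits the report to whole folders and top-level files, which is the level at which the gaps listed above were noticed.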

Essentially, to work around the problem and move forward, I:

  • Copied those folders straight from the old server’s installation directory (IMPORTANT: Do not recover the srvacct folder!! More info below)
  • Detached the “new” site databases in SQL and attached the “old” databases
  • Ran the site repair wizard, but chose not to restore the database

Except for the restore of the SMS/NAL registry keys, the site restore seemed to have worked at that point and the site was functioning (activity, inventory, SWD, reporting, etc.). However, it was still critical to get the registry keys imported. On the old site, I had exported those registry locations (HKLM\Software\Microsoft) and tried to import them directly on the new server. (NOTE: since I was going from a 32-bit OS to a 64-bit OS, I had to do a bulk search/replace to add Wow6432Node into the path.) The import was blocked.
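The bulk search/replace was done in a text editor, but the same rewrite can be sketched in Python. This is an illustration only, assuming the export is a standard regedit .reg file (which regedit writes as UTF-16) and that only the key header lines need to change; any registry paths stored inside value data are left alone. The function names are hypothetical.

```python
import re

# Matches the start of a key header such as [HKEY_LOCAL_MACHINE\SOFTWARE\...
# unless the key is already under Wow6432Node.
KEY_HEADER = re.compile(
    r"^(\[-?HKEY_LOCAL_MACHINE\\SOFTWARE\\)(?!Wow6432Node\\)",
    re.IGNORECASE | re.MULTILINE,
)


def redirect_to_wow6432node(reg_text):
    """Insert Wow6432Node after HKLM\\SOFTWARE in every key header of a .reg export."""
    return KEY_HEADER.sub(lambda match: match.group(1) + "Wow6432Node\\", reg_text)


def convert_file(source_path, target_path):
    """Rewrite a UTF-16 .reg export, keeping its original line endings."""
    with open(source_path, encoding="utf-16", newline="") as source:
        text = source.read()
    with open(target_path, "w", encoding="utf-16", newline="") as target:
        target.write(redirect_to_wow6432node(text))
```

The negative lookahead makes the rewrite safe to run twice, since headers that already contain Wow6432Node are skipped.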

So, the recommendation was to boot the server into safe mode and then import the registry items. We did that but continued to get access denied errors on the SMS key. So I started a process of elimination, cutting the registry file in half each time, until we finally identified the problematic key that was causing the whole file to fail to import. The guilty key? The Certificates location (HKLM\SOFTWARE\Wow6432Node\Microsoft\SMS\MP\Certificates), and in hindsight it is logical that this key would cause the entire import to fail. Leaving it out is OK because installing a new MP will generate new cert keys.
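The halving approach is a binary search over the key sections of the .reg file. Here is a minimal Python sketch of the idea, assuming exactly one section is responsible for the failure. The `import_succeeds` callback is hypothetical: in this migration each attempt was a manual import, so the callback stands in for "import this text and report whether it worked".

```python
def split_reg_sections(reg_text):
    """Split a .reg export into its version header and a list of key sections."""
    header, sections = [], []
    for line in reg_text.splitlines():
        if line.startswith("["):
            sections.append([line])      # a new key section starts here
        elif sections:
            sections[-1].append(line)    # a value line belonging to the current key
        else:
            header.append(line)          # lines before the first key
    return "\n".join(header), ["\n".join(section) for section in sections]


def find_failing_section(header, sections, import_succeeds):
    """Binary-search for the single section that makes the import fail.

    Assumes exactly one section is at fault and that import_succeeds(text)
    returns False whenever that section is part of text.
    """
    if not sections:
        raise ValueError("no key sections to search")
    low, high = 0, len(sections)         # the culprit is in sections[low:high]
    while high - low > 1:
        middle = (low + high) // 2
        candidate = "\n".join([header] + sections[low:middle])
        if import_succeeds(candidate):
            low = middle                 # first half is clean, look in the second
        else:
            high = middle                # first half contains the culprit
    return sections[low]
```

With n key sections this needs roughly log2(n) import attempts, which is why halving the file was far quicker than testing keys one at a time.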

Now the site is finally up and running. Right? Wrong! When attempting to run a task sequence, I hit the issue described in KB2509330 because I had restored the srvacct folder. The “resolution” is to rebuild the entire server from scratch, which was not an appealing option given the effort it took to get this far. Fortunately, I had a file system backup of the VM, so the original srvacct folder could be restored. That backup saved me from having to start from the very beginning! So, not overwriting the srvacct folder is critical.

Other items needing resolution after the migration:

  • Recreate any boot media afterwards with the new site certificates
  • Reinstall an SMP to fix a cert mismatch
  • Fix client certs by running “ccmsetup.exe RESETKEYINFORMATION=TRUE”

SCCM certificates are like sand: they get into everything :-)  Anyhow, this was quite a process to go through.  MORAL OF THE STORY?  Make sure you have a good and complete backup of your site before migrating to new hardware!

4 thoughts on “Recovering SCCM Site from a Failed Bad Backup”

    Smita Carneiro said:
    May 14, 2012 at 8:32 pm

    We had a similar issue with the srvacct account. We ended up calling up MS. We had to take out the accounts that were used by Configuration Mgr, do a site reset, and then re-enter the accounts.
    Task Sequences ran after that.

    […] I’ve been asked to create a site recovery wizard step by step guide with screen shots (for a child primary server restoration). It’s pretty straight forward if you’ve good backups in place (except one or two hiccups explained below). If you don’t have a good backup, it’s going to hit to you very badly and you may need to work extra hours to fix the issue. Here is the experience from Nicolas Moseley – Recovering SCCM Site from a Failed Bad Backup. […]
