Wednesday, January 2, 2008

Re-installing Operations Manager 2007

In a recent deployment, we encountered several weird problems, most of them undocumented. So we decided to reinstall Ops Manager and try again. These were the steps we took:

  1. We used the Clean Up tool provided in the Ops Manager Resource Kit, which is also available as a separate download (see my other post for the link).
  2. We also used the Clean Up tool to remove the agents.
  3. Then we used SQL Management Studio to delete the OperationsManager and OperationsManagerDW databases.

We then proceeded to reinstall OpsMgr 2007.

When we came to installing Reporting, it gave us an error saying that SRS could not be validated. This is where I remembered a tool provided with OpsMgr that you need to run to reset the Reporting Server, because a different encryption key was used.

Go to the Support Tools folder on the OpsMgr CD and copy the ResetSRS.exe utility to your hard drive. Then run either:

ResetSRS.exe MSSQLSERVER (if you installed SQL using the default instance), or

ResetSRS.exe InstanceName (the name of the instance you installed)

You will then be prompted for the credentials to use for authentication. Enter the account in this format: DOMAIN\username

After this, launch the Reporting Services Configuration tool. You should see that the Web Authentication section is showing an error. Open that page and click Apply.

Then Exit.

Proceed to restart the server.

Now you should be able to install OpsMgr Reporting.

Hope this helps

Ops Manager 2007 Agent not connecting

I was on an Ops Manager deployment last week, and when we pushed the agent to the Exchange server, it got stuck in the Pending view and refused to move. The operating system was Windows Server 2003 SP1 with Windows Installer 3.1.

Checking the event log on the agent, we noticed the following error messages:

Event ID: 21016
OpsMgr was unable to set up a communications channel to server.domain.com and there are no failover hosts. Communication will resume when server.domain.com is both available and allows communication from this computer.

Event ID: 21001
The OpsMgr Connector could not connect to MSOMHSvc/server.domain.com because mutual authentication failed. Verify the SPN is properly registered on the server and that, if the server is in a separate domain, there is a full trust relationship between the two domains.

Event ID: 20057
Failed to initialize security context for target MSOMHSvc/server.domain.com. The error returned is 0x80090303 (The specified target is unknown or unreachable). This error can apply to either the Kerberos or the SChannel package.

We verified that both servers were part of the same domain and that DNS lookups were fine.

We then checked the SPN registrations and found conflicting records.

To support mutual authentication, the server registers Service Principal Names (SPNs) that are tied to either the computer account or the user account. In this instance, the SPNs had somehow been registered under both the OpsMgr server account and the OpsAdmin user account.

To solve this, I did the following:

  1. On the domain controller, open a command prompt and run: ldifde -f domain.txt
  2. Open the text file in Notepad and search for the SPN reported in the event log, in the form ServiceClass/host.domain.com (in this case, look for MSOMHSvc/server.domain.com).
    Note the accounts under which the SPN is registered and the organizational unit each account resides in.
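
If the export is large, searching it by hand in Notepad gets tedious. Below is a minimal Python sketch of the same search, assuming domain.txt is a standard LDIF export in which entries are separated by blank lines, long lines are folded with a leading space, and base64 values use the "attr::" form. The function names parse_spn_holders and accounts_with_spn are my own, hypothetical names and are not part of any OpsMgr or Active Directory tool.

```python
import base64
from collections import defaultdict


def parse_spn_holders(ldif_text):
    """Return {spn_lowercased: [dn, ...]} for every servicePrincipalName found."""
    # Unfold continuation lines: in LDIF, a line that starts with a single
    # space continues the previous line.
    lines = []
    for raw in ldif_text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)

    holders = defaultdict(list)
    current_dn = None
    for line in lines:
        if not line.strip():
            current_dn = None  # a blank line ends the current entry
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not an "attribute: value" line
        if value.startswith(":"):
            # "attr:: ..." marks a base64-encoded value
            value = base64.b64decode(value[1:].strip()).decode("utf-8", errors="replace")
        else:
            value = value.strip()
        name = name.strip().lower()
        if name == "dn":
            current_dn = value
        elif name == "serviceprincipalname" and current_dn:
            # SPNs are compared case-insensitively here (an assumption of this sketch)
            holders[value.lower()].append(current_dn)
    return dict(holders)


def accounts_with_spn(ldif_text, spn):
    """List the DNs of all accounts that register the given SPN."""
    return parse_spn_holders(ldif_text).get(spn.lower(), [])
```

To use it, read domain.txt into a string and call accounts_with_spn(text, "MSOMHSvc/server.domain.com"). If more than one DN comes back, the SPN is registered on more than one account. You may need to adjust the file encoding when opening the export, depending on how ldifde was run.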

Use the option below to delete the SPN registrations from the accounts that should not hold a registration for ServiceClass/host.domain.com (typically, any account with an SPN registration for ServiceClass/host.domain.com that the service is not actually running under). Make sure you know which credentials you want to keep (in this case the system account or the domain administrator) and see to it that the service is running with those credentials. Delete the registration from the other account.

Using ADSIEdit

  1. Add ADSIEdit to an MMC console and bind to the domain using the Domain well-known naming context.
  2. Navigate to each user account you noted earlier as having a duplicate SPN registration (in my case, the opsadmin user account), right-click the account and select Properties.
  3. Scroll through the list of attributes until you see servicePrincipalName. Double-click it, remove the duplicate SPN registration, click OK and exit ADSIEdit.

Then I restarted the Health Service on the agent and voilà!!! Connected!

A similar explanation can be found at: http://www2.wolzak.com/index.php?option=com_content&task=view&id=15&Itemid=2