Wednesday, January 2, 2008

Ops Manager 2007 Agent not connecting

I was on an Ops Manager deployment last week, and when we pushed the Agent to the Exchange Server, it got stuck in the Pending view and refused to move. The operating system was Windows Server 2003 SP1 with Windows Installer 3.1.

Checking the logs on the Agent, we noticed the following error messages:

Event ID: 21016
OpsMgr was unable to set up a communications channel to server.domain.com and there are no failover hosts. Communication will resume when server.domain.com is both available and allows communication from this computer

Event ID: 21001
The OpsMgr Connector could not connect to MSOMHSvc/server.domain.com because mutual authentication failed. Verify the SPN is properly registered in the server and that, if the server is in a separate domain, there is a full trust relationship between two domains

Event ID: 20057
Failed to initialize security context for target MSOMHSvc/server.domain.com. The error returned is 0x80090303 (The specified target is unknown or unreachable). This error can apply to either the Kerberos or the SChannel package.

We verified that both servers were part of the same domain and that DNS lookups were fine.

We then checked the SPN registrations and found conflicting records.

To support mutual authentication, the server registers Service Principal Names that are tied to either the computer account or the user account. In this instance, SPNs had somehow been registered with both the OpsMgr server account and the OpsAdmin user account. Because an SPN must map to exactly one account, the duplicate registration caused mutual authentication to fail.

To solve this, I did the following:

  1. From the domain controller, open a command prompt and then type the following string: ldifde -f domain.txt
  2. Open the text file in Notepad and search for the SPN reported in the event log, in the form ServiceClass/host.domain.com (in this case, MSOMHSvc/server.domain.com).
    Note the accounts under which the SPN is registered and the organizational unit each account resides in.
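
If the export is large, searching it by hand in Notepad gets tedious. Below is a minimal Python sketch of the same search, assuming domain.txt is a plain LDIF export as produced by ldifde. The function names, the sample distinguished names (OPSMGR01, the OUs) and the file encoding are my own illustrative assumptions, not anything taken from the actual environment.

```python
import base64


def unfold(ldif_text):
    """Join LDIF continuation lines (those starting with a space) onto the previous line."""
    lines = []
    for raw in ldif_text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def parse_line(line):
    """Split 'name: value' into (lowercased name, value); decode 'name:: base64' values."""
    name, sep, value = line.partition(":")
    if not sep:
        return None
    if value.startswith(":"):
        # A double colon marks a base64-encoded UTF-8 value in LDIF.
        value = base64.b64decode(value[1:].strip()).decode("utf-8")
    else:
        value = value.strip()
    return name.strip().lower(), value


def find_spn_owners(ldif_text, spn):
    """Return the DN of every entry that lists spn in servicePrincipalName (case-insensitive)."""
    owners = []
    current_dn = None
    for line in unfold(ldif_text):
        if not line.strip():
            current_dn = None  # a blank line ends the current entry
            continue
        parsed = parse_line(line)
        if parsed is None:
            continue
        name, value = parsed
        if name == "dn":
            current_dn = value
        elif name == "serviceprincipalname" and current_dn is not None:
            if value.lower() == spn.lower() and current_dn not in owners:
                owners.append(current_dn)
    return owners


# Hypothetical sample data, shaped like an ldifde export.
SAMPLE = """dn: CN=OPSMGR01,OU=Servers,DC=domain,DC=com
changetype: add
servicePrincipalName: MSOMHSvc/server.domain.com
servicePrincipalName: HOST/server.domain.com

dn: CN=opsadmin,OU=Service Accounts,DC=domain,DC=com
changetype: add
servicePrincipalName: MSOMHSvc/server.domain.com
"""

for dn in find_spn_owners(SAMPLE, "MSOMHSvc/server.domain.com"):
    print(dn)
```

To run it against a real export, read the file into a string first (for example open("domain.txt", encoding="utf-8").read(), adjusting the encoding to match how the file was written) and pass that in place of SAMPLE. More than one DN in the result means the SPN is registered on more than one account.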

Next, delete the SPN registrations from the accounts that should not contain registrations for ServiceClass/host.domain.com (typically, any account holding an SPN registration for ServiceClass/host.domain.com that the service is not explicitly started with). Make sure you know which credentials you want to keep (in this case the system account or the domain administrator) and see to it that the service is running with those credentials. Delete the registration from the other account.

Using ADSIEdit

  1. Add ADSIEdit to the MMC and bind to the domain using the Domain well known naming context.
  2. Navigate to each user account you previously noted as having a duplicate SPN registration (in my case, the opsadmin user account), right-click the account and select Properties.
  3. Scroll through the list of attributes until you see servicePrincipalName. Double-click servicePrincipalName, remove the duplicate SPN registration, click OK and exit ADSIEdit.

Then I restarted the Health Service on the Agent and voilà, it connected!

A similar explanation can be found at: http://www2.wolzak.com/index.php?option=com_content&task=view&id=15&Itemid=2

1 comment:

Unknown said...

Are you familiar with setting up a gateway server through NAT routing? I have the same events in the Ops Mgr log and the perimeter folks say they see communication going on between the gateway and RMS.

I'm opening a premier support incident on Monday so I thought I'd give the www one more chance.

Thanks

Ron Hagerman
ron@rons-sandbox.com