Yesterday was a very bad day on the sysadmin front. A mistake I made the day before, while setting up and configuring an iMac, resulted in one of the longest system downtimes I have ever had to dig myself out of.
What follows is a chronological account of the cause, effect and resolution of one of the hardest problems I have ever had to solve, starting with:
- The iMac. We have acquired a batch of iMacs for use in the company, and one of the customisations we apply to every unit is integration into our own Windows Server Active Directory setup. The setup on the iMac in question was no different from all the others, except for one particular detail: by mistake I entered the Domain Controller’s name in the field where the workstation’s NetBIOS name should go. This happened at about 1600HRS. It’s also fair to note that no side effects were noticed at first. In fact, I could still work normally and users were unaware of any problems.
- Tuesday Morning. Also known as the ‘day when all hell broke loose’. Coming in to work, I found that users couldn’t connect to the shares on the server. The server itself was behaving erratically: shares were offline, logging into it took ages, and the DNS service installed on the machine was offline too.
- Issues affecting resolution. A major problem was the server’s login time. It was taking about 40 minutes from the moment you entered the admin password to the time you actually saw the desktop. This meant that every reboot, and every further troubleshooting step, took far longer than usual.
It is interesting to note that one of the symptoms had to do with the DNS forward lookup zones stored in AD, which were inaccessible for the whole duration of the problem. A known bug in Windows, specifically this one here, lists both this and the long login times as symptoms. It threw me off track completely until I realised it was not the solution I was looking for.
Anyway, to cut a long story short, the iMac had replaced the server’s SID with its own, thereby causing an endless source of problems for Active Directory’s authentication and replication mechanisms.
Eventually most of these problems were resolved by removing the offending iMac from AD and re-establishing the DC as the owner of the SID. This was accomplished by logging into the server, running netdom.exe and changing the Kerberos password for the machine’s domain account.
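For anyone who ends up in the same spot, here is a rough sketch of what the netdom step might look like. I didn't note down the exact command at the time, so this assumes the `netdom resetpwd` subcommand with its `/server`, `/userd` and `/passwordd` options, which is the usual way to reset a machine account password. The names `DC01` and `EXAMPLE\Administrator` are made-up placeholders, and the Python wrapper is only there for illustration; check the options against your own Windows Server version before running anything.

```python
import subprocess


def build_netdom_resetpwd(dc_name, admin_user, password="*"):
    """Build the argument list for `netdom resetpwd`.

    dc_name    -- domain controller to reset the password against (placeholder)
    admin_user -- domain admin account, e.g. r"EXAMPLE\\Administrator" (placeholder)
    password   -- "*" asks netdom to prompt for the password interactively
    """
    if not dc_name or not admin_user:
        raise ValueError("dc_name and admin_user are both required")
    return [
        "netdom",
        "resetpwd",
        f"/server:{dc_name}",
        f"/userd:{admin_user}",
        f"/passwordd:{password}",
    ]


def reset_machine_password(dc_name, admin_user):
    """Run the command on the affected server (Windows only, needs netdom installed)."""
    cmd = build_netdom_resetpwd(dc_name, admin_user)
    # check=True raises CalledProcessError if netdom reports a failure
    return subprocess.run(cmd, check=True)
```

Calling `build_netdom_resetpwd("DC01", r"EXAMPLE\Administrator")` only builds the command line, so you can inspect it first; `reset_machine_password` actually executes it and should be run from an elevated session on the server itself.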