Time Is An Illusion, Anyway
This week has been rough.
I’ve been working through a new strategy that promises to make our company’s deployments to production at least ten times faster than our current process. While it’s going to make things easier for us in the end, getting there is anything but easy.
I’ve gotten permission errors.
I’ve hit ACL problems.
I’ve come across registry access violations.
I’ve had writeable UNC shares turn read-only.
I’ve seen application pools die and IIS refuse to start.
I watched installers fail to run remotely, successfully run locally, then fail to run locally under the exact same circumstances two minutes later.
I even had UAC end up locking my account because it decided that I’m not trustworthy enough to enter a password for an installation tool from an unelevated command prompt.
It’s been a never-ending stream of failure and frustration.
I thought I’d seen it all. Then today happened.
I left the office last night feeling really good about where things were at. I finally worked through all of the permission problems, all of the path access issues. Everything was running under the right accounts, and stuff was flowing through the system and installing flawlessly.
Put a checkmark on the list, I’m done!
I planned on coming in this morning, updating some release instructions, then moving on to the second piece of the process. The second piece is very similar to the first, so I figured I’d breeze through the day and have everything up and running in time to leave work during rush hour. I want to preface the release instructions with a link to the progress I made, so I go to the site that installed flawlessly last night and try to pull it up.
IIS Yellow Screen Of Death
Ah crap. That sucks. I sort of expected something like that, though. We’re doing things that are wild and new for our company, and most of the problems we’re seeing are the result of inexperience with the technology. I figured the site would have died sometime in the middle of the night for some reason or another. I figured that it would be something we’d just overlooked, something easy to fix.
I didn’t figure on getting an error message that was last seen in 1997.
An error occurred loading a configuration file: Failed to start monitoring changes to ‘[filename]’ because the network BIOS command limit has been reached.
…
The network BIOS command limit has been reached…? WTF?
Am I running low on “System Resources” now, too?
What in the hell does that mean? Network BIOS command limit…
I do a search and immediately find many other people with the same problem, all of whom are saying that the KB article the error message points at is completely useless. Instead, they say you have to pull a registry setting out of thin air, and that’ll solve your problems. ((Under HKLM\Software\Microsoft\ASP.NET, create a new value called “FCNMode” and set it to 2 to change the way IIS monitors for file changes, or set it to 1 to disable file change notifications altogether. Other sites suggest the same value, but under the HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\ASP.NET key instead. Try both. Then reboot.)) Turns out that the way IIS monitors file system changes on UNC shares can clog the tubes, and that magical reg key tries to unclog them. Eventually, I get it to work and carry on with the tasks of the day.
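For anyone who would rather not click through regedit twice, here is a rough Python sketch that builds a .reg file covering both keys mentioned above. It assumes FCNMode is a DWORD value (the type isn't stated above, so treat that as an assumption), and the function name `fcnmode_reg_text` is made up for illustration.

```python
# Both registry locations suggested above; which one matters may depend on
# whether the site runs as a 32-bit or 64-bit process.
KEYS = [
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ASP.NET",
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\ASP.NET",
]


def fcnmode_reg_text(mode):
    """Return .reg file text that sets FCNMode under both keys.

    Assumption: FCNMode is stored as a DWORD.
    mode 2 changes how file changes are monitored, mode 1 disables
    file change notifications altogether.
    """
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in KEYS:
        lines.append(f"[{key}]")
        lines.append(f'"FCNMode"=dword:{mode:08x}')
        lines.append("")
    return "\n".join(lines)


print(fcnmode_reg_text(2))
```

Save the printed text as something like fcnmode.reg, look it over, and import it on the web server. As noted above, expect to reboot afterwards.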
It all goes fairly well and smoothly. As I had hoped, the experience from setting up the first piece of the deployment process translated very closely to setting up the second piece. It only took two hours to get through the same amount of the process that had taken three days the first go around. And most of that two hours was spent keeping detailed instructions so that next time only takes half an hour and the time after that can be fully automated.
All is well.
In the very last part of the setup, I had to install a Windows component, then reboot the box. ((Why they haven’t figured out the whole rebooting thing for the case where I just installed a minor application, I don’t know, but whatever…)) I give the box a minute, then try to remote back in. Terminal Services Client blinks at me, but does nothing. I try again and again it blinks. A third time, and I get a failed to connect message. Fine, maybe it’s still rebooting. I give it another minute and try again.
Remote Desktop cannot connect to the remote computer because the authentication certificate received from the remote computer is expired or invalid. In some cases, this error might also be caused by a large time discrepancy between the client and server computers.
Uh, okay… Never seen that error before. Did the network flip out, is the server confused? Maybe I just have to give it a bit more time before trying to connect again.
So I give it another minute. It gives me the error again.
The second sentence intrigues me. A large time discrepancy, eh? Hmmm…
Last year, during a brief foray into temporal mechanics, I learned all about how Windows deals with time and time synchronization. One of the things I found was a command line tool in Windows called “w32tm”. That program has a command line switch, “/stripchart”, which can show you how far off your system’s clock is from that of another system. Normally, when dealing with a pair of computers on a corporate network, tied to a domain, you’ll find that the clocks will only differ by a few seconds at most. You can run w32tm /stripchart and watch as the two machines drift around in time, speeding up, slowing down, and dancing around the time synchronization point. Try, for instance, “w32tm /stripchart /computer:time.windows.com”, and see where you are in comparison to the Windows time server.
Anyway, w32tm’s stripchart seemed like the perfect tool to investigate this potential “large time discrepancy”. What would it be, an hour, two hours, maybe even a day?
The current time is 3/3/2011 4:52:43 PM.
16:52:43 d:+00.0020000s o:-31607977.8967224s [@                         |                         ]
The current time is correct. I ran it at 4:52 PM on March 3rd, 2011. “d:” is, I believe, the “delay”, the round trip time between the servers. 2 ms response time. Not bad. And “o:” is the time offset, in seconds, between your computer and the remote computer.
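If you want to pull those two numbers out of the stripchart output yourself, a small regular expression is enough. This is only a sketch based on the one sample line from my run; the pattern and the function name `parse_stripchart_line` are my own assumptions, not anything w32tm documents.

```python
import re

# The sample line from the run above, minus the little ASCII chart at the end.
SAMPLE = "16:52:43 d:+00.0020000s o:-31607977.8967224s"

# Assumed layout: HH:MM:SS, then d:<signed seconds>s, then o:<signed seconds>s.
PATTERN = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"d:(?P<delay>[+-]\d+\.\d+)s\s+"
    r"o:(?P<offset>[+-]\d+\.\d+)s"
)


def parse_stripchart_line(line):
    """Return (time, delay_seconds, offset_seconds), or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return (
        match.group("time"),
        float(match.group("delay")),
        float(match.group("offset")),
    )


print(parse_stripchart_line(SAMPLE))
```

Lines that don't look like a sample (the header, error messages) simply come back as None.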
o:, in this case, is -31.6 million seconds.
In case you don’t feel like doing the math, 31.6 million seconds is one year, 19 hours, 59 minutes, and 37 seconds.
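And if you do feel like doing the math, here is a minimal sketch in Python. It assumes a 365-day year (there is no leap day between March 2010 and March 2011) and throws away the fractional second; the function name `split_offset` is just for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000, assuming a non-leap year


def split_offset(offset_seconds):
    """Break an offset in seconds into (years, hours, minutes, seconds)."""
    total = int(abs(offset_seconds))  # drop the sign and the fraction
    years, rest = divmod(total, SECONDS_PER_YEAR)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return years, hours, minutes, seconds


# The "o:" value from the stripchart output above.
print(split_offset(-31607977.8967224))  # → (1, 19, 59, 37)
```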
The computer rebooted and came back one year, 19 hours, 59 minutes, and 37 seconds in the past. I think I’d agree with the error’s assessment of a “large time discrepancy”.
The computer now believes that it is March 2nd, 2010, at around 8:50 PM. And I can’t convince it otherwise. If I go on the box and try to change the time manually, it immediately corrects itself to March 2nd, 2010.
Now, ordinarily, I’d say that such an error was the result of a time synchronization service gone awry, or maybe some failing system hardware. However, given the events of this past week and all of the strange behavior that I’ve seen, I believe it is equally likely that that particular computer has actually travelled back in time and is running in our datacenter exactly one year, 19 hours, 59 minutes and 37 seconds ago.
Now, if you’ll excuse me, I have a hole in time to exploit and profit from.
March 3, 2011