Every once in a while, a server performance situation comes along that can still surprise me. I just recently ran into one of these.
Our client called and reported that they were experiencing significant performance problems on their somewhat older Sitecore 6.2 site. The site has been up and running for quite a long time now, and they haven’t had problems like these before, so this was something new. There had been no recent code changes, and no major content additions, so the obvious culprits were out.
I took a look at their main site and their conference site, both of which were coming out of the same instance of Sitecore. The main site was really slow, and the conference site was so bad that it was occasionally timing out. Ok, it was time to put on Danger Zone and get down to business (the Top Gun soundtrack was the first CD I ever bought, don’t hate).
Today was the first day of one of their major conferences, which meant that lots of eyeballs were going to be on the site, which also meant that, of course, this was when a problem would occur. The conference wasn’t causing much of a traffic spike (by the numbers) though, so it didn’t even seem like a traffic increase was really to blame.
Since the client hosts the site on servers they own themselves, they first took the smart step of throwing some more resources at the problem by dialing up the RAM and CPU available to the virtual machine. This didn’t really help too much though, and the problem persisted. Eventually they called us in to take a look.
The first thing I took a look at was the worker process. After the cursory app pool reset, and seeing that the problem wasn’t clearing up, I used IIS’s “Worker Processes” tool (available from the server node inside IIS Manager) to see what was going on. Of course, the process causing the problem was the one running the site we were worried about, and drilling down into that process let me see which requests it was working on serving up.
Something looked funny here though. Most of the longest-running requests had a state of “SendResponse”.
This indicated that it wasn’t IIS having a hard time generating pages; it was having a hard time sending them to users.
A little Googling about the SendResponse state showed me that a similar circumstance occurred for a site that was being mostly used by users on poor mobile connections.
Ahh, so that’s it!
I’ve been to a number of conferences, and a constant at them is terrible wifi availability. Either connections are spotty, or you’re sharing them with 1000s of your fellow attendees. Since most of the traffic to the conference’s site was coming from the users at the conference, it stood to reason that these poor connections were causing the large number of responses stalled in the SendResponse state.
Since it was an older site, it was running under .NET 2.0, in classic pipeline mode, as a 32-bit application. That meant the modern performance niceties (like a worker process that can address more than about 1.5 gigs of RAM) weren’t available. The upshot was that responses, even for users on good connections, were getting clogged up behind the responses trickling out to the users stuck on the horrible wifi.
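To make the clogging effect concrete, here’s a minimal sketch (not the actual site code, just an illustration) of why this happens: if each worker thread stays occupied for the full duration of sending a response, a handful of slow clients can starve everyone else. The pool size, timings, and job names below are made up for the example.

```python
import heapq

def simulate(worker_count, jobs):
    """Toy model of a fixed pool of worker threads serving requests.

    jobs: list of (arrival_time, service_time, name) tuples.
    Returns {name: completion_time}. The key assumption (which matched
    our situation) is that a worker stays busy for the *entire*
    service_time -- including the time spent trickling the response
    out to a client on a bad connection.
    """
    # Min-heap of the times at which each worker next becomes free.
    free_at = [0] * worker_count
    heapq.heapify(free_at)
    completions = {}
    for arrival, service, name in sorted(jobs):
        # A job starts when it has arrived AND a worker is free.
        start = max(arrival, heapq.heappop(free_at))
        finish = start + service
        heapq.heappush(free_at, finish)
        completions[name] = finish
    return completions

# Four workers, all tied up sending responses to clients on broken
# wifi (100 time units each). A cheap 2-unit request arriving at t=1
# can't finish until a slow send completes.
jobs = [(0, 100, f"slow-{i}") for i in range(4)] + [(1, 2, "fast")]
print(simulate(4, jobs)["fast"])  # -> 102: the fast request waited ~100 units
```

The fast request itself only needs 2 units of work, but it completes at t=102 because every worker is stuck in its own SendResponse. That’s the shape of what we were seeing: well-connected users paying the price for the conference wifi.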
So, how did we solve this?
I presented this information to the client, and that information made its way back to the convention center’s management who, all day, had been insisting that “there was nothing wrong with their wireless.” They finally acquiesced to investigating it and found out that, lo and behold, their router was legitimately broken and needed to be replaced.
Once that properly working hardware was in place, everything started working right again on the server.
It really does go to show you the huge number of variables that can cause poor website performance, many of which might not even be under your control. Thank goodness IIS has some good tools for inspecting what’s going on inside it.