31. August 2008 06:37
Recently I was troubleshooting slow performance on a customer's server. The regular checks, such as the amount of RAM and processor load in Task Manager, didn't reveal anything useful, so I ran Performance Monitor (perfmon).
By default perfmon shows three counters (on Windows Server 2003), and one of them is "Avg. Disk Queue Length" (average disk queue length).
Looking at the picture below, you can see (highlighted and circled in green) that the average disk queue length was over 3 almost all of the time.
Average disk queue length indicates the average number of both read and write requests that were queued for the selected disk during the sample interval. In other words, on this particular server there were more read and write requests than the server could handle. The recommendations I found differ, but anything consistently higher than 1 should be investigated as a potential bottleneck.
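The same counter can also be sampled from the command line with the built-in `typeperf` tool instead of watching the perfmon graph. The following is only a minimal Python sketch of that idea, not something that was used on this server: the `_Total` instance, the function names and the sample counts are assumptions made for illustration, and the threshold of 1 is simply the rule of thumb mentioned above.

```python
import csv
import io
import subprocess

# Assumption: the _Total instance covers all physical disks; pick a
# specific disk instance instead if only one drive is of interest.
COUNTER = r"\PhysicalDisk(_Total)\Avg. Disk Queue Length"


def parse_typeperf_csv(text):
    """Return the numeric samples found in typeperf's CSV output.

    The header row and trailing status lines are skipped because
    their second column is not a number.
    """
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # blank line or single-column status line
        try:
            samples.append(float(row[1]))
        except ValueError:
            continue  # header row, status text or an empty sample
    return samples


def summarize(samples, threshold=1.0):
    """Return (average, share of samples above the threshold)."""
    if not samples:
        raise ValueError("no samples to summarize")
    average = sum(samples) / len(samples)
    above = sum(1 for s in samples if s > threshold) / len(samples)
    return average, above


def sample_queue_length(count=10, interval=1):
    """Collect `count` samples, `interval` seconds apart (Windows only)."""
    result = subprocess.run(
        ["typeperf", COUNTER, "-sc", str(count), "-si", str(interval)],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_typeperf_csv(result.stdout)
```

On a Windows machine, `summarize(sample_queue_length(30, 2))` would return the average queue length over one minute together with the share of samples above 1. A high share over a longer period is the kind of sustained load described here, as opposed to a short spike.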
At this point I was getting somewhere with this server. Since the disk queue length was high, I decided to check whether the hard disk was badly fragmented (picture below), and I was proven right.
To explain this a bit further: this is a dedicated mail relay server sitting in the DMZ, constantly receiving e-mail (and a lot of spam) from the internet. That means it is constantly receiving small files, writing them to the hard drive, forwarding them to the internal mail servers and then deleting them from the hard disk. This amounts to a lot of read and write requests.
Investigating further, I discovered that the "badmail" folder hadn't been cleaned out in a very long time. It contained more than 100,000 (small) files.
To fix this I did the following:
- Emptied the "badmail" folder
- Ran defrag on the hard drive, numerous times
- Created a batch job that runs several times a day and cleans out the "badmail" folder
- Created a batch job that runs defrag on the server every night
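The batch files themselves are not shown here. As a rough sketch of what the two scheduled jobs could look like, here is a Python version of the same idea. The folder path is an assumption (the default Badmail location of the IIS SMTP service), and the function names and the age limit are made up for illustration. `defrag` is the command-line defragmenter that ships with Windows Server 2003.

```python
import os
import subprocess
import time

# Assumption: default Badmail folder of the IIS SMTP service.
# Adjust to wherever the mail relay actually stores bad mail.
BADMAIL_DIR = r"C:\Inetpub\mailroot\Badmail"


def purge_folder(folder, min_age_seconds=0):
    """Delete regular files in `folder` that are at least
    `min_age_seconds` old and return how many were deleted.

    Subfolders are left alone.
    """
    cutoff = time.time() - min_age_seconds
    deleted = 0
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue  # skip subfolders and links
            if entry.stat(follow_symlinks=False).st_mtime > cutoff:
                continue  # modified too recently, keep it for now
            try:
                os.remove(entry.path)
                deleted += 1
            except OSError:
                pass  # file is in use or already gone; try next run
    return deleted


def defragment(volume="C:"):
    """Run the built-in defrag tool on a volume (Windows only)
    and return its exit code."""
    return subprocess.run(["defrag", volume], check=False).returncode
```

A script that calls `purge_folder(BADMAIL_DIR, min_age_seconds=3600)` could be scheduled several times a day, and one that calls `defragment("C:")` once a night. Keeping files younger than an hour is just one possible choice. It leaves recent bad mail around in case someone wants to look at it.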
Since I made these changes the server has been performing much better than before, even though it is still under a lot of stress.