Disk performance issues can be hard to track down, but they can also cause a wide variety of problems. The disk performance counters available in Windows are numerous, and being able to select the right counters for a given situation is a valuable troubleshooting skill. Here, we'll review two basic scenarios: measuring overall disk performance and determining whether the disks are a bottleneck.
Measuring Disk Performance
When it comes to disk performance, there are two important considerations: IOPS and byte throughput. IOPS is the raw number of disk operations performed per second. Byte throughput is the effective bandwidth the disk is achieving, usually expressed in MB/s. The two are closely related: throughput is roughly IOPS multiplied by the average IO size, so a disk that can sustain more IOPS can also deliver better throughput for a given IO size. These can be measured in perfmon with the following counters (a small measurement sketch follows the list):
- Disk Transfers/sec
- Total number of IOPS. This should be about equal to Disk Reads/sec + Disk Writes/sec
- Disk Reads/sec
- Disk read operations per second (IOPS which are read operations)
- Disk Writes/sec
- Disk write operations per second (IOPS which are write operations)
- Disk Bytes/sec
- Total disk throughput per second. This should be about equal to Disk Read Bytes/sec + Disk Write Bytes/sec
- Disk Read Bytes/sec
- Disk read throughput per second
- Disk Write Bytes/sec
- Disk write throughput per second
Here are the results from a test VM, where diskspd was used to simulate an average mixed read/write workload:
- 3,610 IOPS
- 2,872 read IOPS
- 737 write IOPS
- 17.1 MB/s total throughput
- 11.2 MB/s read throughput
- 5.9 MB/s write throughput
Disk Bottlenecks
Determining whether storage is a performance bottleneck relies on a different set of counters than the above. Instead of looking at IOPS and throughput, latency and queue lengths need to be checked. Latency is the amount of time it takes to get a piece of requested data back from the disk and is measured in milliseconds (ms). Queue length refers to the number of outstanding IO requests waiting to be sent to the disk, measured as an absolute number of requests. The specific perfmon counters are listed below, followed by a short sketch for sampling them:
- Avg. Disk sec/Transfer
- The average number of seconds it takes to get a response from the disk. This is the total latency.
- Avg. Disk sec/Read
- The average number of seconds it takes to get a response from the disk for read operations. This is read latency.
- Avg. Disk sec/Write
- The average number of seconds it takes to get a response from the disk for write operations. This is write latency.
- Current Disk Queue Length
- The current number of IO requests in the queue waiting to be sent to the storage system.
- Avg. Disk Read Queue Length
- The average number of read IO requests in the queue waiting to be sent to the storage system. The average is taken over the perfmon sample interval (default of 1 second)
- Avg. Disk Write Queue Length
- The average number of write IO requests in the queue waiting to be sent to the storage system. The average is taken over the perfmon sample interval (default of 1 second)
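These counters can also be sampled programmatically. Below is a minimal Python sketch, assuming the third-party pywin32 package (which exposes the PDH API as win32pdh), that polls total disk latency and the current queue length for the _Total physical disk instance. The counter paths match the perfmon names above; the script structure and one-second interval are my own:

```python
import time

import win32pdh  # from the third-party pywin32 package

# Counter paths mirror the perfmon counters discussed above.
PATHS = {
    "latency_sec": r"\PhysicalDisk(_Total)\Avg. Disk sec/Transfer",
    "queue_len": r"\PhysicalDisk(_Total)\Current Disk Queue Length",
}

query = win32pdh.OpenQuery()
counters = {name: win32pdh.AddCounter(query, path) for name, path in PATHS.items()}

# Rate-style counters need two collections before a formatted value is available.
win32pdh.CollectQueryData(query)
time.sleep(1)
win32pdh.CollectQueryData(query)

_, latency_sec = win32pdh.GetFormattedCounterValue(
    counters["latency_sec"], win32pdh.PDH_FMT_DOUBLE)
_, queue_len = win32pdh.GetFormattedCounterValue(
    counters["queue_len"], win32pdh.PDH_FMT_DOUBLE)

# Perfmon reports latency in seconds; convert to milliseconds for readability.
print(f"Total disk latency: {latency_sec * 1000:.1f} ms")
print(f"Current disk queue length: {queue_len:.0f}")

win32pdh.CloseQuery(query)
```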
Here are the results from a test VM, where diskspd was used to simulate an IO-intensive read/write workload:
- Total disk latency: 42 ms (perfmon reports Avg. Disk sec/Transfer in seconds, so 0.042 seconds equals 42 milliseconds)
- Read latency: 5 ms
- Write latency: 80 ms
- Total disk queue: 48
- Read queue: 2.7
- Write queue: 45
Generally speaking, these results can be interpreted with the following guidelines (a quick sanity-check sketch follows them):
- Disk latency should be below 15 ms. Disk latency above 25 ms can cause noticeable performance issues. Latency above 50 ms is indicative of extremely underperforming storage.
- Disk queues should be no greater than twice the number of physical disks serving the drive. For example, if the underlying storage is a 6-disk RAID 5 array, the total disk queue should be 12 or less. For storage that isn't mapped directly to an array (such as in a private cloud or in Azure), queues should be below 10 or so. Queue length isn't directly indicative of performance issues but can help lead to that conclusion.
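To make those rules of thumb concrete, here is a hypothetical helper that applies them to a set of measurements. The thresholds are the ones above; the function name and structure are my own:

```python
def assess_disk(latency_ms, queue_length, physical_disks=None):
    """Apply the rough latency and queue-length guidelines above.
    physical_disks is the number of spindles backing the volume, or None
    when the storage isn't directly mapped (private cloud, Azure, etc.)."""
    findings = []

    if latency_ms > 50:
        findings.append(f"{latency_ms} ms latency: extremely underperforming storage")
    elif latency_ms > 25:
        findings.append(f"{latency_ms} ms latency: likely causing noticeable issues")
    elif latency_ms > 15:
        findings.append(f"{latency_ms} ms latency: above the healthy range")

    # Queue guideline: no more than twice the number of physical disks,
    # or roughly 10 when the underlying array isn't visible.
    queue_limit = 2 * physical_disks if physical_disks else 10
    if queue_length > queue_limit:
        findings.append(f"queue length {queue_length} exceeds the rough limit of {queue_limit}")

    return findings or ["no obvious storage bottleneck"]


# Using the IO-intensive test results above (42 ms latency, total queue of 48):
for finding in assess_disk(latency_ms=42, queue_length=48):
    print(finding)
```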