See my SQL-only BLOG at http://blogs.mssqltips.com/blogs/chadboyd Fragmentation Station #5 - What it impacts (performance), and what it doesn't

Fragmentation Station #5 - What it impacts (performance), and what it doesn't

by chad 2/18/2008 12:14:00 AM

In post #4 in the series, we talked about ways to avoid each type of fragmentation - in this 5th post in the series we'll discuss how each type of fragmentation impacts performance, what to expect in terms of impacted operations, and when you possibly might not see any impact from fragmentation at all. As with earlier posts again, we will not be covering things we discussed from each of the 4 prior posts, so consider reviewing posts 1, 2, 3, and 4 if anything seems fuzzy.

So, potentially the most common concern and question I get from customers regarding fragmentation after explaining some of the internals, types, what causes it, how to avoid it, etc., is something like "Well, that's great to know, but what's the bottom line? What performance impact does it cause, and why/when should I be concerned?". A good question indeed - and the answer isn't always what many people expect, or what they've been told in the past. As with anything in technology, the real answer always depends on a given scenario, workload, infrastructure, hardware, etc. combination, but there are some generalizations that can be made as rules-of-thumb so to speak. Instead of working through each of the different types of fragmentation, I'm instead going to just point out generalizations to keep in mind for the different types of operations performed against different structures in typical workloads and how they are impacted by each of the different types of fragmentation - additionally, I'm not going to discuss file-level fragmentation much, given that it is a) hard to reproduce consistently and b) usually not an issue in the majority of server installations today. Let's get to it:

  • Most fragmentation only impacts physical IO operations, not logical operations within cache - i.e., if you have a very fragmented system, but the entire database fits in cache/buffer pool, you will not be impacted by fragmentation (with the exception of page density as we discussed in prior posts) during typical operations - obviously, you'd still see a difference for the initial cache warm-up, and possibly for some checkpoint operations. Assume for a moment that you have a system where a given index can:
    • Fit entirely in cache
    • Will remain there for the duration of the service operation

In this scenario, the difference in performance between the index being 99% fragmented vs. 0% fragmented (assuming the same or very similar page densities) would be minimal. The only times you may see a performance impact would be during the initial cache warm-up (i.e. the first and only time the index data was pulled from disk into cache) and potentially during some checkpoint operations.

  • Singleton seeks into an index are not impacted by fragmentation (even seeks that require IO operations) - take the same scenario as outlined in the bullet point above, and you'll end up with a similar result - singleton seeks against a heavily fragmented index vs. the same seeks against a minimally fragmented index would result in very similar performance results. If you think about the way a singleton lookup works, it makes perfect sense - the storage engine will need to traverse and touch the same number of pages to get to a given record within a leaf page no matter where on disk the pages reside. Scans (full and range) are however impacted very significantly - more so with logical fragmentation than extent fragmentation, however noteworthy with both
  • Allocating a single index or heap across multiple files can lead to similar fragmentation-like side effects as extent fragmentation (i.e. interleaving allocations for multiple indexes among each other). Use of multiple files should be managed appropriately.
  • Heaps are a flat structure without a linkage of any sort between pages - this means that heaps are not impacted by logical fragmentation at all, ever (they can't be since there is no logical ordering of the pages). Additionally, given that they are flat structures, they have no root/intermediate pages (like a b-tree) and hence, obviously can't incur fragmentation due to non-leaf page organization. Of course, they also don't support seek operations directly (need an associated nonclustered index for that).
  • Advanced storage systems available today (high-end SANs, large scale DAS systems) can minimize the effects of fragmentation on performance - many have large cache's where many reads are fulfilled from, generally read requests to a single LUN are spread across 10's to 100's of spindles, and the fabric/hardware that carry the data are very, very performant. In one case study I was involved with a few years back, fragmentation impact on smaller IO systems pushed upwards of 250% degradation, whereas the same workload against a large, enterprise scale SAN system only impacted performance by 30%.
  • Data Warehousing/OLAP/DSS type workloads will typically suffer performance impacts to a much greater degree than that of traditional OLTP type workloads (primarily due to the point discussed in the first bullet). DSS type workloads typically perform very large reads on large amounts of data where large performance benefits can be achieved when data in contiguous, allowing the read-ahead manager to perform many large reads ahead of the CPU processing.

That's a good start for each, in the next post, I'm going to provide a script to walk-through on your own machines that you can use to try and validate/understand some of the discussions from above with

[sig]

Currently rated 5.0 by 3 people

  • Currently 5/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags: , ,

Related posts

Powered by BlogEngine.NET 1.1.3.8
Theme by Mads Kristensen

About the author, Chad


Chad is an Architect, Administrator, and Developer with technologies such as Sql Server (and all related technologies), Windows Server, and Windows Clustering. He currently works with Gratis Internet Inc., an online marketing firm based in Washington, DC (www.gratisinternet.com), and also spends a significant amount of time writing, talking, presenting and blogging about Sql Server in the local DC area and online at mssqltips.com and his personal site, chadhoc.net. In the past, Chad has worked with companies and organizations such as Microsoft Corporation and The American Red Cross, and provided consulting/support services at companies such as Bank of America, HP, Citigroup, Qualcomm, Scottrade, TJX, SunTrust, and Zurich Financial Services. For over 2 years with Microsoft Corporation Chad was responsible for providing onsite and remote support, guidance, and advice with SQL Server products to some of Microsoft’s foremost enterprise customers running the largest, most complex SQL Server installations and configurations in the world. This included all SQL Server products and versions, including SQL Server 7.0, 2000, and 2005, the SQL Server database engine, Reporting Services, SSIS/DTS, Notification Services, and Analysis Services on both 32 and 64 bit systems. Chad's primary responsibilities today include troubleshooting critical server situations, performance tuning and monitoring, disaster recovery planning and execution, architectural guidance for new Sql Server related deployments, and delivering deep technical workshops/presentations/proof-of-concept sessions covering a variety of technologies and functionality.

E-mail me Send mail

Calendar

<<  July 2009  >>
MoTuWeThFrSaSu
293012345
6789101112
13141516171819
20212223242526
272829303112
3456789

View posts in large calendar

Pages

    Recent posts

    Recent comments

      Authors

      Categories


        Archive

        Blogroll

        Download OPML file OPML

        Disclaimer

        The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

        © Copyright 2009

        Sign in