SQL Server Storage andIndex Structures Physical Data Organization Indexes B-Trees SQL Server Data Access Clustered and Non-Clustered Creating, Altering, Dropping Indexes Choosing your Indexes Maintaining your Indexes
SQL Server Storage Hierarchy • Database • Extent • 8 contiguous 64K data pages • Once extent full, next record will take up a whole additional extent. • Pre-allocating space saves time.
SQL Server Storage Hierarchy • Page • 64K bytes • # of records/page varies with bytes/record • Types of Pages: Data and Index pages • Page Split • When page becomes full, it splits • New page allocated: ½ data from old page moved to new • Rows • 8060 bytes and 1024 columns
B-tree Key Points to Remember • Tree portion includes key attributes only • ordered as in create index statement • Keys packed in index pages • Fewer bytes per key -> more keys per page/extent -> fewer page faults per access. • Clustered indexes have records at leafs • Records are in data pages • Data pages sequentially linked • Non-Clustered indexes point into heap or tree portion of clustered index
Create Index Statement • Create [unique] [clustered | nonclustered] index <indexName> on <table or view> (col. Name> [asc|desc] [,…]) include (<col name> [,…]) (with … pad_index, fillfactor, ignore_dup_key drop_existing, statistics_norecompute sort_in_tempDB, online, allow_row_locks, allow_page_locks, maxdop
Create Index Details • Asc/Desc • Ascending & descending sort order for index • Include (cool!) • Includes col in leaf nodes of clustered index • Allows very fast access to non-key attribute • Useful with very large record – fewer page faults
Create Index “with” Details • Pad_Index= (on|off) • Initial fill-factor for index’s non-leaf pages • Fill Factor = <1% – 100%> • Default is index pages are as full as possible minus two records • Fill factor is how full after index is created • Once split goes to 50% • Ignore_dup_key • Circumvent unique key constraint somewhat • Still get error message, but no rollback • useful for storing unique values but trashing transactions
Create Index “with” Details • Drop_Existing • Any existing index with same name is dropped with this create statement • More efficient than drop index followed by create for clustered index as no need to touch non-clustered indexes or data pages • Statistics_nonrecompute • Default: sql server automates the process of updating the statistics on tables/ indexes • This option says you will maintain stats • DON’T USE THIS!
Create Index “with” Details • Sort_In_tempdb • Only useful when tempdb on physically separate drive • Reads/write for sort compete with read/writes to write data and index pages • This make sense if and only if you understand disk writes - discussion • Online • Keeps table available to users while creating index – sounds good, but ….!!
Create Index “with” Details • Allow row/page locks • Don’t use unless really good • MAXDOP • Overrides system setting for max degree of parallelism while building index • How many processes are used to construct an index. MAXDOP sets limit on how many processors per operation. • Compare and contrast these terms
Create Index “with” Details • ON • Can store index separately from data • Space for index spread across drives • I/O for indexes not compete with physical data retrieval
XML Indexes • Indexes into XML data • Xml VERY unstructured • Column can be of type xml in sql server • Create index on xml column • Page 276 for more details
Implied indexes • created by some constraints • Primary Key • Unique • Can easily end up with duplicate constraints and not realize it
Deciding what indexes go where? • Indexes speed access, but costly to maintain • Almost every update to table requires altering both data pages and every index. • All inserts and deletions affect all indexes • Many updates will affect non-clustered indexes • Sometimes less is more • Not creating an index sometimes may be best • Code for tranasaction have where clause? What columns used? Sort requried?
Deciding what indexes go where? • Selectivity • Indexes, particularly non-clustered indexes, are primarily beneficial in situations where there is a reasonably HIGH LEVEL of Selectivity within the index. • % of values in column that are unique • Higher percentage of unique values, the higher the selectivity • If 80% of parts are either ‘red’ or ‘green’ not very selective
Choosing Clustered Index • Only one per table! - Choose wisely • Default, primary key creates clustered index • Do you really want your prime key to be clustered index? • Option: create table foo myfooExample (column1 int identify primary key nonclustered column2 …. ) • Changing clustered index can be costly • How long? Do I have enough space?
Clustered Indexes Pros & Cons • Pros • Clustered indexes best for queries where columns in question will frequently be the subject of • RANGE query (e.g., between) • Group by with max, min, count • Search can go straight to particular point in data and just keep reading sequentially from there. • Clustered indexes helpful with order by based on clustered key
Clustered Indexes Pros & Cons • The Cons – two situations • Don’t use clustered index on column just because seems thing to do (e.g., primary key default) • Lots of inserts in non-sequential order • Constant page splits, include data page as well as index pages • Choose clustered key that is going to be sequential inserting • Don’t use a clustered index at all perhaps?
Column Order Matters • (P#, S#, Qty) • P# S# together are primary key • One index that includes all columns is not useful in all situations! • Only end up storing data a second time. • Clustered index of P#S# not same as S#P# • P#S# can lookup P# fairly easily, but looking up S# requires a linear search. • S#P# can lookup S# fairly easily, but not P#. • Note that even though key of S#P# means can’t lookup P# quickly, are some advantages in include P# in key.
Dropping Indexes • Sometimes makes sense to constantly re-analyze situation and add indexes • DON’T FORGET TO DROP INDEXES!! • Big overhead for inserts and deletes • Always ask yourself: “Can I get rid of any of these?” • Drop INDEX <TABLE NAME> <Index name>
Index Tuning Wizard • Hopefully you will evolve to the point you don’t need to use this gadget • But still can be quite handy • Uses workload file generated using sql server profiler (ch 19) • Not ideal to depend on this tool, but it may make some suggestions that you have not thought of.
Maintaining Indexes • Page Splits • Insert/delete order and rate critical • Fragmentation • Not OS fragementation – e.g. defrag tool • Happens when database grows, pages split, and then data eventually deleted. • Btrees great on maintaining balance on insertions, but with deletes, can end up with many pages containing small # of records.
Fragmentation Problems • Wasted space • Sql server allocates an extend at a time • Could end up with an extent, containing single page, with single record. • Thrashing (way too many disk hits) • Could end up with page 1 of data on one extend, page 2 on another, page 3 on the first, page 4 on another, …. • Records all over the place • Bit better for inserts but really bad for reads!
Identifying Fragmentation vs. page splits • DBCC SHOWCONTIG • Page 283 • Demo with northwind
DBREINDEX & Fillfactor • DBCC DBREINDEX • Can drop index and rebuild • Usually best to use drop-existing • Completely rebuilds the index • If supply table name, rebuilds all indexes on table. • Re-establishes base fillfactors etc. • Strongly recommend disallow transactions while doing this. • Rebuilding is probably better.
Summary • Clustered indexes usually faster than non-clustered • Only place non-clustered indexes on columns with high selectivity (>95% of rows are unique on that column) • All data manipulation language statements can benefit, from indexes, but inserts, deletes, and updates are slowed. • Indexes take up space and require page hits.
Summary • Index used only if first column in index is relevant to query • Indexes can hurt as much as they help • Make sure don’t add one by accident. • Indexes can provided structured data performance to unstructured XML, but overhead involved.
Summary • Is there a high level of selectivity on the data? • if yes and is frequently target of where clause, then add index • Have I dropped indexes I no longer need? • Why not? • Do I have a maintenance strategy established? • Why not?
Critical Questions • Are there lots of inserts of modifications to this table? • If yes, keep indexes to minimum • Is this a reporting table? • E.g. not many inserts but lots of reports run many different ways • If yes, more indexes are fine. • Is there a high level of selectivity on the data? • If yes and is frequently target of where clause, then add index