Index

INDEX 

Clustered Index Structure:

Non Clustered Index Structure



* It is another database objects which can be used
          * To reduce searching process
          * To enforce uniqueness
* By default SS search for the rows by following the process called table scan.
* If the table consists of huge data then table scan provides less performance.
* Index is created in tree-like structure which consists of root,node and leaf level.
* At leaf level, index pages are present by default.
* We can place max 250 indexes per table.
* Indexes are automatically placed if we place
          * Primary key (clustered)
          * Unique        (unique non clustered index)
* We can place indexes as follows
          create [unique][clustered/nonclustered] index <indexName> on <tname>/<viewName>(col1,col2,....)
          [include(.....)]

Various Types of indexes available in SQL Server
  • Clustered
  • NonClustered
  • Unique
  • Columnstore
  • Index with included columns
  • Index on computed columns
  • Filtered
  • Spatial
  • XML
  • Full-text
  • Covering Index

Cluster index:
A clustered index sorts and stores the data rows of the table or view in order based on the clustered index key. The clustered index is implemented as a B-tree index structure that supports fast retrieval of the rows, based on their clustered index key values.
EXP:
CREATE CLUSTERED INDEX IX_TestTable_TestCol1  
    ON dbo.TestTable (TestCol1);

Non-Cluster index:
A NonClustered index can be defined on a table or view with a clustered index or on a heap. Each index row in the NonClustered index contains the NonClustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value. The rows in the index are stored in the order of the index key values, but the data rows are not guaranteed to be in any particular order unless a clustered index is created on the table.
EXP:
CREATE NONCLUSTERED INDEX IX_ProductVendor_VendorID  
    ON Purchasing.ProductVendor (BusinessEntityID);

Unique Index:
A unique index ensures that the index key contains no duplicate values and therefore every row in the table or view is in some way unique. Uniqueness can be a property of both clustered and NonClustered indexes.
EXP:
CREATE UNIQUE INDEX uidx_pid
ON Persons (PersonID);

Columnstore index:
A columnstore index is a technology for storing, retrieving, and managing data by using a columnar data format, called a columnstore.
A columnstore is data that's logically organized as a table with rows and columns, and physically stored in a column-wise data format.
A columnstore index can provide a very high level of data compression, typically by 10 times, to significantly reduce your data warehouse storage cost.

Reasons why columnstore indexes are so fast:
·         Columns store values from the same domain and commonly have similar values, which result in high compression rates. I/O bottlenecks in your system are minimized or eliminated, and memory footprint is reduced significantly.
·         High compression rates improve query performance by using a smaller in-memory footprint. In turn, query performance can improve because SQL Server can perform more query and data operations in memory.
·         Batch execution improves query performance, typically by two to four times, by processing multiple rows together.
·         Queries often select only a few columns from a table, which reduces total I/O from the physical media.
EXP:
CREATE CLUSTERED COLUMNSTORE INDEX cci_Simple ON SimpleTable; 
GO

Index with Included columns:
A NonClustered index that is extended to include nonkey columns in addition to the key columns.
Exp:
CREATE NONCLUSTERED INDEX IX_Address_PostalCode 
ON Person.Address (PostalCode) 
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID); 
GO

Filtered index:
is a new feature in SQL SERVER. Filtered Index is used to index a portion of rows in a table that means it applies a filter on INDEX which improves query performance, reduces index maintenance costs, and reduce index storage costs compared with full-table indexes.
When we see an Index created with some where clause then that is a FILTERED.

Exp:
CREATE NONCLUSTERED INDEX FIBillOfMaterialsWithEndDate 
ON Production.BillOfMaterials (ComponentID, StartDate) 
WHERE EndDate IS NOT NULL ; 
GO

Spatial index:
spatial index is a type of extended index that allows you to index a spatial column. A spatial column is a table column that contains data of a spatial data type, such as geometry or geography.
EXP:
If we want to get the Employees whose Title is “Marketing Manager”, for that let’s create an INDEX on EmployeeID whose Title is “Marketing Manager” and then write the SQL Statement to retrieve Employees who are “Marketing Manager”.
CREATE NONCLUSTERED INDEX NCI_Department
ON HumanResources.Employee(EmployeeID)
WHERE Title= 'Marketing Manager'

XML index:
XML indexes can be created on xml data type columns. They index all tags, values and paths over the XML instances in the column and benefit query performance
EXP:
CREATE PRIMARY XML INDEX PXML_ProductModel_CatalogDescription 
    ON Production.ProductModel (CatalogDescription);

Full text index:
A special type of token-based functional index that is built and maintained by the Microsoft Full-Text Engine for SQL Server. It provides efficient support for sophisticated word searches in character string data.
EXP:
CREATE UNIQUE INDEX ui_ukJobCand ON HumanResources.JobCandidate(JobCandidateID); 
CREATE FULLTEXT CATALOG ft AS DEFAULT
CREATE FULLTEXT INDEX ON HumanResources.JobCandidate(Resume)  
   KEY INDEX ui_ukJobCand  
   WITH STOPLIST = SYSTEM
GO 

Covering Index:
A covering index is one which can satisfy all requested columns in a query without performing a further lookup into the clustered index.

------------------------------------------------------------------------------------------------------------------------------------------.


* It physically sorts the rows in the table.
* A table can have only ONE clustered index.
* Both data and index pages are merged and stored at third level (Leaf level).
* We can place on columns which are used to search a range of rows,

create clustered index pid_indx on prods(pid)

select * from prods

select * from prods where pid=2
insert prods values(3,'Pencils',500)

Note: A table without clustered index is called HEAP where the rows and pages of the table are not present in any order.


* It cannot sort the rows physically.
* We can place max 249 nonclustered indexes on table.
* Both data and index pages are stored seperately.
* It locates rows either from heap (Table scan) or from clustered index.
* Always we have to place first clustered index then nonclustered.
* If the table is heap the index page consists of
          IndexKeyColvalues              rowreference
* If the table consists of clustered index then index page consists of
          IndexKeyColValues             Clusteredindexkeycolvalues
* Nonclustered indexes are rebuilded when
          * Clustered index is created/droped/modified
Ex: Create nonclustered index on pname column of prods table.
          create index indx1 on prods(pname)
          select * from prods where pname='Books' -- check execution plan

* To disp indexes present on a table
          sp_helpindex <tname>
* To drop index
          drop index prods.pid_indx
* To disp space used by the index
          sp_spaceused prods


Using Included Columns in Non-Clustered Index

* We can maintain regularly used columns in non-clustered index so that no need that the SQL server should take data from heap or clustered   index.
* If the no of rows are more it provides better performance.
Ex:
--step1
USE AdventureWorks
GO
CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID)
GO
--step2
SELECT AddressLine1, AddressLine2, City, StateProvinceID, PostalCode
FROM Person.Address
WHERE PostalCode BETWEEN '98000'
AND '99999';
GO


* Percentage of space used in leaf level index pages.
* By default it is 100%.
* To reduce page splits when the data is manipulated in the base table we can set proper FillFactor.
* It allows online index processing
          * While the index rebuilding process is going on users can work with the table.

Question: Where do you find the default Index fill factor and how to change it?
·         The easiest way to find and change the default fill factor value is from Management Studio, right-click the SQL Server and choose properties. In the Server Properties, choose Database Settings, you should see the default fill factor value in the top section. You can change to a desired value there and click OK to save the changes.
·                             The other option of viewing and changing this value is using sp_configure.

Page Split

-Due to regular changes in the table if the index pages are full to allocate memory for the index key columns SS takes remaining rows into new page. This process is called Page split.
-Page split increases size of index and the index pages order changes.

Fragmentation
Fragmentation can be defined as any condition that cause more than optimal amount of disk I/O to be performed in accessing a table or cause the longer disk I/O. Optimal performance of SELECT queries occurs when the data pages of tables are contiguous as possible and pages are fully packed as possible.Fragmentation breaks this rule and reduce the performance of the queries. Fragmentation can happen in two level. One is file system level fragmentation which is called as Logical/Physical Disk Fragmentation and Index level fragmentation. Each of them are described in the below sections.

 

Logical/Physical Disk Fragmentation

Logical fragmentation is the fragmentation of database file in the file system itself like any other files.This occurs when the file system is not able allocate contiguous space for database file.As a result, disk head has to move back and forth to read from the database files. SQL server is completely unaware about this kind of fragmentation and it is not possible to measure the logical disk fragmentation using any script. Logical disk fragmentation can happen due to various reason like
·         Placing database file in the same disk where other files (like OS files and other application files) are kept.
Frequent growth of the database file in smaller chunks.
To remove logical fragmentation we can use the windows fragmentation tool but note that we need to stop the SQL server while running the de-fragmentation tools.Otherwise the de-fragmentation tool will skip database file as it is used by the SQL server. 

The best ways to to avoid the logical fragmentation are:
·         Keep the database files in a separate disk isolated from other application files and log files.
·         While creating new database, estimate the size of database file and allocate enough space to avoid the frequent growth of data files.
·         Specify the database growth option to allocate larger chunks rather than small chunks frequently.

 

Index Level Fragmentation

Index level fragmentation comes in two flavors: Internal Fragmentation and External Fragmentation. If Index level fragmentation is high, it may prevents optimizer from using the available indexes in optimal way.

 

Internal Fragmentation

Internal fragmentation is measured in average page fullness of the index (Page density). A page that is 100% full has no internal fragmentation.In other words, internal fragmentation occur when there is empty space in the index page and this can happen due to insert/update/delete DML operation.Every index page can hold a certain number of records based on the size of the index, but that does not guaranteed that the page always hold maximum number records.  Internal fragmentation is normally reported as a percentage of fullness in bytes, not in records. An index page that has internal fragmentation 90% may be full in terms of record. The remaining 10% bytes of the pages may not be enough to hold one more record. In a 8KB  pages,  maximum of  8060 bytes can be used by data.Rest of space are used by page header and row offset array.Let us assume that we have index with fixed size of 100 bytes and the index has 800 entries. So we can can store 8060/100= 80 records per page by leaving 60 bytes empty as it is not enough to hold one more records and this index requires 10 pages to store the entire index structure.If you calculate the average fullness of this index, in ideal scenario it will come as 99.26%.



Let us assume that we are deleting the half of the entries randomly of this table which reduce the total number of entries in this index to 400.Now the pages will look like as given  in Fig 2 with total of 40600 bytes free space across 10 pages. If you calculate the the average fullness as Total data size*100/Total page size = 4000*100/80600= 49.62% . It clearly says that, half of the spaces are empty and the index has internal fragmentation. 



How Internal Fragmentation will affect the performance of the SQL server?

1.  Internal Fragmentation will increase the I/O. When you run queries that scan part or complete table/index, if you have internal fragmentation on that table/index, it causes additional page reads. In our example, the entire data can be stored in 5 pages. When the query needs to do index scan it has to read 10 pages instead of 5 pages. Which means 50% more I/O.
2. Internal Fragmentation reduce the efficiency of buffer cache.When indexes has internal fragmentation, it need more space to fit in the buffer.In our case this single index will use 5 additional pages to fit into the buffer which should have used to store other index pages. This will reduce the cache hit ratio.  In turn it will increase the physical I/O. It also increase the logical reads.
3.  This also increase the size of the database file. It need more disk space to store the additional pages and reduce the performance of Backup and Restore.

External Fragmentation

External Fragmentation happens when the logical order of the pages does not match the physical order of the pages. External fragmentation refers to the lack of correlation between the logical sequence of an index and its physical sequence. It is measured as the percentage of out-of-order pages in the leaf pages of an index. An out-of-order page is a page for which the next physical page allocated to the index is not the page pointed to by the next-page pointer in the current leaf page.Let us see the Fig 3 below. It is representation of index with three pages.Data is stored in sequential page. In other terms logical order and physical order are same and it store the index keys from 1 to 16 a.

All pages are completely full except the Page 3


While inserting the value 4 into the table it has to place in the Page 1 between value 3 and 5 but unfortunately Page 1 does not have any free space to occupy one more record. The only option is perform a page split by dividing the Page 1 evenly by leaving half of the data in Page 1 and moving half of the data to new page (Page 4). From Fig 4 we can understand that the logical order of the Page 4 is not matching with the physical order. External Fragmentation can happen due to various reasons:

1.  While allocating pages for new table, SQL server allocate pages from mixed extend till it reaches the 8 pages. There is possibility of having the first 8 pages from 8 different extents. 
2.  When all records are deleted from a page, the page will be de-allocated from the index (The de-allocation of pages will not happen immediately) which create gap and increase the fragmentation.
3.  Once object reached 8 pages size, SQL server will start allocating uniform extent to the objects.While allocation uniform extent to an index, next sequential extent to the current extent might be already allocated to other objects/indexes.

 

How External Fragmentation will affect the performance of the SQL server?


While reading individual rows, external fragmentation will not affect the performance as it directly go to the page and fetch the data.Unordered scans also will not affected by the external fragmentation as it use the IAM pages to find which extents need to be fetched. In the case of ordered index scan, external fragmentation might become a degrading factor for performance. The degradation of the performance is because the disk drive's heads have to jump around on the physical disk, rather than performing just contiguous read operations.Also note that external fragmentation will not affect the performance once the pages are loaded into the buffer pool.

avg_fragmentation_in_percent can have higher value due to various reasons:
·       SQL server storage engine allocates pages from mixed extent to a table or index till the page count reaches eight.Once the page count reaches to eight SQL server storage engine starts assigning full uniform extents to the index. So there is a possibility of having higher fragmentation for small table and rebuilding indexes might increase the fragmentation.For example, let us assume that an index has 7 pages and these pages are allocated from two mixed extent, while rebuilding the index there is possibility of allocating pages from more than 2 extents and maximum of seven extents which in turn increase the fragmentation. 
·     Even the pages are allocated from uniform extent, there is possibility of fragmentation. When the size of index grow, it need more pages in the non leaf level also.If last page allocated to leaf level is 250 and to accommodate more row in the leaf lever index structure might need a page in index level 1, then SQL server storage engine allocate page 251 to the index level 1 which create fragment in the leaf level.
·      Other common reason is the page split due to the DML operations. This I have explained well in my previous post.Rebuild/Reorganize  index may not be effective to fix fragmentation happened due to the fist two reason, but it can reduce the fragmentation caused by the page split or delete operation. 

In our environment we follow the index maintenance as given below:
§  20 to 40 percentage of fragmentation is handled with reorganizing the index.
§  All index which has more 40 percentage fragmentation will considered for rebuild
§  Index which has less than 1000 pages will be ignored by the index maintenance logic.
§  Index which has more than 50K pages and fragmentation between 10 and 20 will also be considered for Reorganize.

Detecting fragmentation, which are as follows:
·   avg_fragmentation_in_percent: This is a percentage value that represents external fragmentation. For a clustered table and leaf level of index pages, this is Logical fragmentation, while for heap, this is Extent fragmentation. The lower this value, the better it is. If this value is higher than 10%, some corrective action should be taken.
·    avg_page_space_used_in_percent: This is an average percentage use of pages that represents to internal fragmentation. Higher the value, the better it is. If this value is lower than 75%, some corrective action should be taken.

Reducing fragmentation:
·      Reducing Fragmentation in a Heap: To reduce the fragmentation of a heap, create a clustered index on the table. Creating the clustered index, rearrange the records in an order, and then place the pages contiguously on disk.
·         Reducing Fragmentation in an Index: There are three choices for reducing fragmentation, and we can choose one according to the percentage of fragmentation:
·         If avg_fragmentation_in_percent > 5% and < 30%, then use ALTER INDEXREORGANIZE: This statement is replacement for DBCC INDEXDEFRAG to reorder the leaf level pages of the index in a logical order. As this is an online operation, the index is available while the statement is running.
·     If avg_fragmentation_in_percent > 30%, then use ALTER INDEX REBUILD: This is replacement for DBCC DBREINDEX to rebuild the index online or offline. In such case, we can also use the drop and re-create index method.
·         (Update: Please note this option is strongly NOT recommended)Drop and re-create the clustered index: Re-creating a clustered index redistributes the data and results in full data pages. The level of fullness can be configured by using the FILLFACTOR option in CREATE INDEX.

To find fragmentation level we can use
          dbcc showcontig
          or
          We can use sys.dm_db_index_physical_stats DMF as follows

          SELECT a.index_id, name, avg_fragmentation_in_percent
          FROM sys.dm_db_index_physical_stats
          (DB_ID('AdventureWorks'),
          OBJECT_ID('Production.Product'), NULL, NULL, NULL)
          AS a JOIN sys.indexes AS b
          ON a.object_id = b.object_id
          AND a.index_id =b.index_id;

* To control fragmentation we can either reorganize the index or rebuild the index.
1. Reorganizing Index
* It is the process of arranging the index pages according to the order of index key column values.
* If the fragmentation level is more than 5 to 8% and less than 28to 30% then we can reorganize the indexes.
* It cannot reduce the index size as well as statistics are not updated.
syn:
          ALTER INDEX <indexName>/<All> on <tname> REORGANIZE

          * It is the process of deleting and creating fresh index.
          * It reduces the size of index and updates statistics
          * If the fragmentation level is more than 30% then we can rebuild indexes.
syn:
          ALTER INDEX <indexName>/<ALL> on <tname> REBUILD
Note:
If we have mentioned ONLINE INDEX PROCESSING option then rebuilding takes space in TEMPDB.
To check consistency of a database we can use DBCC CHECKDB('dbName') it disp if any corrupted pages are present, usage space in tempdb.

Option
DBCC DBREINDEX (SQL 2000)
ALTER INDEX REBUILD
(SQL 2005)
DBCC INDEXDEFRAG (SQL 2000)
ALTER INDEX REORGANIZE (SQL 2005)
Rebuild All Indexes
Yes
Need to run for each index.  In SQL 2005 using the ALTER INDEX you can specify ALL indexes.
Online Operation
No, users will be locked out until complete. In SQL Server 2005 Enterprise Edition you can build indexes online.
Yes, users can still use the table
Transaction Log Impact
Depends on the recovery model of the database
Fully logged operation regardless of the database recovery model
Transaction Log Impact
If set to the full recovery model can consume a lot of  space for operation to complete.
If index is very fragmented this could potentially take up more transaction log space.
Can run in parallel (uses multiple threads)
Yes
No


Clustered Index :-
1. There can be only one Clustered index for a table
2. Usually made on the primary key
3. The logical order of the index matches the physical stored
order of the rows on disk
4. Cluster index is faster than non cluster index because the cluster index has to refer back to the table, if the selected column is not present in index.
6. Cluster index determines the storage order of the rows in the table, and hence does not require additional disk space. 

Non-Clustered Index
1. There can be only 249 Non-Clustered index for a table
2. usually made on the any key
3. the logical order of the index does not match the physical
4. non cluster index stored separate in the table.  additional storage space is required.

stored order of the rows on disk



No comments:

Post a Comment

Popular Posts