13. Can you explain the differences between clustered and non-clustered indexes in SQL and when you would use each?

Overview

Understanding the differences between clustered and non-clustered indexes in SQL is crucial for optimizing database queries. Indexes are used to speed up the retrieval of rows from a table by providing quick access paths to data. The choice between clustered and non-clustered indexes can significantly affect the performance, storage, and maintenance of database systems.

Key Concepts

Index Structure: How clustered and non-clustered indexes are stored and managed internally.
Performance Implications: How each index type affects query performance and data retrieval speed.
Use Cases: Appropriate scenarios for using clustered versus non-clustered indexes based on data access patterns and database design.

Common Interview Questions

Basic Level

What is the primary difference between clustered and non-clustered indexes?
How do you create a clustered index in SQL?

Intermediate Level

Can a table have multiple clustered indexes?

Advanced Level

Discuss the performance implications of using non-clustered indexes on frequently updated columns.

Detailed Answers

1. What is the primary difference between clustered and non-clustered indexes?

Answer: The primary difference lies in how they store data and affect data retrieval. A clustered index sorts and stores the data rows in the table based on the index key. There can be only one clustered index per table because the data rows themselves can only be sorted in one order. In contrast, a non-clustered index creates a separate structure from the data rows. It contains the non-clustered key values and each key value entry has a pointer to the data row that contains the key value.

Key Points:
- Clustered indexes sort the actual data rows.
- Non-clustered indexes have a separate structure which points to the data.
- Only one clustered index is allowed per table, but multiple non-clustered indexes can be created.

Example:

-- Creating a clustered index
CREATE CLUSTERED INDEX IX_EmployeeID ON Employees(EmployeeID);

-- Creating a non-clustered index
CREATE NONCLUSTERED INDEX IX_EmployeeName ON Employees(LastName, FirstName);

2. How do you create a clustered index in SQL?

Answer: To create a clustered index, you use the CREATE CLUSTERED INDEX statement, specifying the index name, the table on which to create the index, and the column(s) to include in the index.

Key Points:
- If a primary key is defined on a table without a clustered index, SQL Server automatically creates a clustered index on the primary key.
- A clustered index can be created on one or multiple columns.
- A table can have only one clustered index because the data rows can be stored in only one order.

Example:

-- Assume Employees is a table with EmployeeID as a column
CREATE CLUSTERED INDEX IX_EmployeeID ON Employees(EmployeeID);

3. Can a table have multiple clustered indexes?

Answer: No, a table cannot have multiple clustered indexes. The reason is that a clustered index determines the physical order of data in a table. Since data can only be stored in one physical order, only one clustered index can be created per table. Attempting to create a second clustered index will result in an error unless the existing clustered index is dropped first.

Key Points:
- A table can only have one clustered index.
- Multiple non-clustered indexes are allowed because they do not affect the physical storage order of the data.
- The physical order constraint is why clustered indexes significantly impact performance and should be chosen carefully.

Example:

-- Attempting to create a second clustered index without dropping the first will cause an error
-- Assuming IX_EmployeeID is a clustered index already created on the Employees table
CREATE CLUSTERED INDEX IX_EmployeeDOB ON Employees(DateOfBirth);
-- This statement will fail because IX_EmployeeID already exists as a clustered index.

4. Discuss the performance implications of using non-clustered indexes on frequently updated columns.

Answer: Using non-clustered indexes on columns that undergo frequent updates can lead to performance degradation. This is because every time a row is inserted, updated, or deleted, the non-clustered index must also be updated to reflect the change. This can increase the time it takes for write operations to complete. Additionally, frequent updates can lead to index fragmentation, which further impairs performance by causing the database engine to perform additional I/O operations during data retrieval.

Key Points:
- Non-clustered indexes on frequently updated columns increase write operation time.
- Index fragmentation caused by updates can degrade query performance.
- Careful consideration of index placement can mitigate performance issues.

Example:

-- Creating a non-clustered index on a frequently updated column
CREATE NONCLUSTERED INDEX IX_EmployeeSalary ON Employees(Salary);
-- Assume Salary column values frequently change. Each change requires an update to IX_EmployeeSalary, potentially affecting performance.

This example illustrates the importance of evaluating the impact of indexing strategy on frequently updated data columns.