How to Use Common Table Expressions (CTEs) in SQL Server for Readable Queries

Common Table Expressions (CTEs) are a powerful SQL Server feature that can dramatically improve query readability and maintainability.

Introduced in SQL Server 2005, a CTE defines a named, temporary result set that exists only for the single SELECT, INSERT, UPDATE, DELETE, or MERGE statement that immediately follows it.

Basic CTE Syntax

A CTE follows this pattern:

WITH CTE_Name AS (
    -- Your query here
)
SELECT * FROM CTE_Name;

The main components are:

  • The WITH keyword to start the CTE
  • A name for your CTE
  • The AS keyword
  • Parentheses containing your query
  • A statement that references the CTE
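The referencing statement does not have to be a SELECT. As a rough sketch, the query below updates rows through a CTE. It assumes the Orders table has an OrderDate column and an IsLatest bit column; both are hypothetical and used only for illustration. When a CTE appears inside a batch, the statement before it must end with a semicolon.

```sql
-- Hypothetical columns: OrderDate and IsLatest are assumptions for this sketch.
-- Flags the most recent order per customer by updating through the CTE.
WITH RankedOrders AS (
    SELECT
        OrderID,
        IsLatest,
        ROW_NUMBER() OVER (
            PARTITION BY CustomerID
            ORDER BY OrderDate DESC
        ) AS rn
    FROM Orders
)
UPDATE RankedOrders
SET IsLatest = CASE WHEN rn = 1 THEN 1 ELSE 0 END;
```

The update is applied to the underlying Orders rows; the CTE only supplies the row numbering that decides which value each row receives.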

Why Use CTEs?

CTEs offer several advantages:

  • Improved readability: Breaking complex queries into named, logical segments
  • Self-referencing capability: Useful for hierarchical or recursive data
  • Query simplification: Reducing nested subqueries
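  • Code reusability: Referencing the same named result several times in one query (SQL Server may re-evaluate the CTE for each reference, so this is a readability benefit, not a caching one)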
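To illustrate the reuse point, here is a minimal sketch that references one CTE twice: once as the main row source and once inside a subquery. It assumes an Orders table with CustomerID and TotalAmount columns, as used in the examples in this article.

```sql
-- Customers whose total spend is above the average customer total.
WITH CustomerTotals AS (
    SELECT
        CustomerID,
        SUM(TotalAmount) AS TotalSpent
    FROM Orders
    GROUP BY CustomerID
)
SELECT
    ct.CustomerID,
    ct.TotalSpent
FROM CustomerTotals ct
WHERE ct.TotalSpent > (SELECT AVG(TotalSpent) FROM CustomerTotals);
```

Without the CTE, the aggregation would have to be written out twice, once in the FROM clause and once in the subquery.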

Simple CTE Example

Here's a basic example that calculates average order values by customer category:

-- Without CTE
SELECT 
    c.CustomerCategory,
    SUM(o.TotalAmount) / COUNT(DISTINCT o.OrderID) AS AvgOrderValue
FROM Customers c
JOIN Orders o ON c.CustomerID = o.CustomerID
GROUP BY c.CustomerCategory;

-- With CTE
WITH OrderSummary AS (
    SELECT 
        c.CustomerCategory,
        o.OrderID,
        o.TotalAmount
    FROM Customers c
    JOIN Orders o ON c.CustomerID = o.CustomerID
)
SELECT 
    CustomerCategory,
    SUM(TotalAmount) / COUNT(DISTINCT OrderID) AS AvgOrderValue
FROM OrderSummary
GROUP BY CustomerCategory;

The CTE version clearly separates the data gathering from the aggregation logic.

Multiple CTEs in a Single Query

You can define several CTEs in one WITH clause, separated by commas, and each CTE can reference the ones defined before it:

WITH 
CustomerOrders AS (
    SELECT 
        c.CustomerID,
        c.CustomerName,
        COUNT(o.OrderID) AS OrderCount
    FROM Customers c
    LEFT JOIN Orders o ON c.CustomerID = o.CustomerID
    GROUP BY c.CustomerID, c.CustomerName
),
OrderCategories AS (
    SELECT
        CustomerID,
        CASE 
            WHEN OrderCount = 0 THEN 'Inactive'
            WHEN OrderCount BETWEEN 1 AND 5 THEN 'Regular'
            ELSE 'VIP'
        END AS CustomerCategory
    FROM CustomerOrders
)
SELECT 
    c.CustomerName,
    o.CustomerCategory
FROM CustomerOrders c
JOIN OrderCategories o ON c.CustomerID = o.CustomerID
ORDER BY o.CustomerCategory, c.CustomerName;

Recursive CTEs

One of the most powerful CTE features is recursion, which is perfect for hierarchical data like organizational charts or category trees:

WITH EmployeeHierarchy AS (
    -- Anchor member (starting point)
    SELECT 
        EmployeeID,
        EmployeeName,
        ManagerID,
        0 AS Level
    FROM Employees
    WHERE ManagerID IS NULL -- Start with top-level employees
    
    UNION ALL
    
    -- Recursive member (references itself)
    SELECT 
        e.EmployeeID,
        e.EmployeeName,
        e.ManagerID,
        eh.Level + 1
    FROM Employees e
    INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT 
    EmployeeID,
    EmployeeName,
    Level,
    REPLICATE('--', Level) + EmployeeName AS HierarchyDisplay
FROM EmployeeHierarchy
ORDER BY Level, EmployeeName;

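This query lists employees level by level, starting from the top-level managers, and indents each name according to its depth. Because the results are sorted by Level rather than by position in the tree, employees are not grouped under their own manager; a true org-chart order needs a sort key built from the path to each employee.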
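One way to get each employee listed directly under their manager is to carry a path string through the recursion and sort by it. The sketch below is a variation on the query above; the SortPath column and the '/' separator are assumptions for illustration, and it assumes EmployeeName is a character column. Both members cast the path to the same type because a recursive CTE requires matching column types.

```sql
WITH EmployeeHierarchy AS (
    -- Anchor member: top-level employees start the path with their own name
    SELECT
        EmployeeID,
        EmployeeName,
        0 AS Level,
        CAST(EmployeeName AS nvarchar(4000)) AS SortPath
    FROM Employees
    WHERE ManagerID IS NULL

    UNION ALL

    -- Recursive member: append the employee's name to the manager's path
    SELECT
        e.EmployeeID,
        e.EmployeeName,
        eh.Level + 1,
        CAST(eh.SortPath + N'/' + e.EmployeeName AS nvarchar(4000))
    FROM Employees e
    INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT
    REPLICATE('--', Level) + EmployeeName AS HierarchyDisplay
FROM EmployeeHierarchy
ORDER BY SortPath;
```

Sorting by a name-based path orders siblings alphabetically. Names that contain the separator or characters that sort before it can disturb the order, so a path built from zero-padded IDs may be a safer choice in practice.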

CTEs vs. Temporary Tables or Table Variables

Unlike temporary tables or table variables, CTEs:

  • Exist only for the single statement that follows the WITH clause
  • Don't require explicit cleanup
  • Can't have indexes added to them
  • Are primarily for improving query structure and readability
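For contrast, here is a minimal sketch of the temporary table route, which persists the intermediate result, can be indexed, and needs explicit cleanup. The table and index names are illustrative, and it assumes the Orders table used elsewhere in this article.

```sql
-- Materialize the intermediate result once
SELECT
    CustomerID,
    SUM(TotalAmount) AS TotalSpent
INTO #CustomerTotals
FROM Orders
GROUP BY CustomerID;

-- Unlike a CTE, a temp table can be indexed
CREATE CLUSTERED INDEX IX_CustomerTotals_CustomerID
    ON #CustomerTotals (CustomerID);

-- The stored rows can be reused by any later statement in the session
SELECT CustomerID, TotalSpent
FROM #CustomerTotals
WHERE TotalSpent > (SELECT AVG(TotalSpent) FROM #CustomerTotals);

-- Explicit cleanup (otherwise dropped when the session ends)
DROP TABLE #CustomerTotals;
```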

Best Practices

  1. Use meaningful names that describe what the data represents
  2. Keep individual CTEs focused on a single logical operation
  3. Comment complex CTEs to explain their purpose
  4. Consider performance: SQL Server expands a non-recursive CTE into the query much like a derived table, so it is generally no faster than the equivalent subquery
  5. Avoid excessive nesting - if your query becomes too complex, consider stored procedures or multiple queries

When Not to Use CTEs

CTEs might not be the best choice when:

  • You need to reference the same large dataset multiple times (temp tables may be more efficient)
  • You need to add indexes for performance optimization
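  • Your recursion may run deeper than the default limit of 100 levels (you can raise the limit with OPTION (MAXRECURSION n), up to 32767, or 0 for no limit, but very deep recursion is often better handled another way)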
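As a small sketch of the recursion limit, the query below generates the numbers 1 to 500 with a recursive CTE. With the default limit of 100 it would fail, so the MAXRECURSION hint raises the limit for this one statement. The Numbers CTE is purely illustrative.

```sql
WITH Numbers AS (
    SELECT 1 AS n              -- anchor member
    UNION ALL
    SELECT n + 1               -- recursive member
    FROM Numbers
    WHERE n < 500
)
SELECT n
FROM Numbers
OPTION (MAXRECURSION 500);     -- allow up to 500 levels; 0 removes the limit
```

Setting the hint to 0 removes the safety net entirely, so a recursive member without a reliable stopping condition could then loop indefinitely.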

By mastering CTEs, you can write SQL that's not only more maintainable but also easier to understand and debug.


Related

When working with SQL Server, you may often need to count the number of unique values in a specific column. This is useful for analyzing data, detecting duplicates, and understanding dataset distributions.

Using COUNT(DISTINCT column_name)

To count the number of unique values in a column, use the COUNT aggregate with the DISTINCT keyword: COUNT(DISTINCT column_name). Here’s a simple example:

SELECT COUNT(DISTINCT column_name) AS distinct_count
FROM table_name;

This query returns the number of unique non-NULL values in column_name; NULLs are ignored by COUNT(DISTINCT ...).

Counting Distinct Values Across Multiple Columns

If you need to count distinct combinations of multiple columns, you can use a subquery:

SELECT COUNT(*) AS distinct_count
FROM (SELECT DISTINCT column1, column2 FROM table_name) AS subquery;

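This approach ensures that only unique pairs of column1 and column2 are counted. Unlike COUNT(DISTINCT ...), SELECT DISTINCT treats NULL as a value, so pairs containing NULL are included in the count.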

Why Use COUNT DISTINCT?

  • Helps in identifying unique entries in a dataset.
  • Useful for reporting and analytics.
  • Quick way to check for duplicates: if COUNT(DISTINCT column_name) is lower than COUNT(column_name), the column contains repeated values.
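The duplicate check can be sketched as follows, using the same placeholder names (table_name, column_name) as the examples above. The first query compares the counts, and the second lists the values that occur more than once.

```sql
-- If distinct_count is lower than non_null_rows, duplicates exist
SELECT
    COUNT(*)                    AS total_rows,
    COUNT(column_name)          AS non_null_rows,
    COUNT(DISTINCT column_name) AS distinct_count
FROM table_name;

-- Show which values are duplicated and how often
SELECT
    column_name,
    COUNT(*) AS occurrences
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1;
```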

By leveraging COUNT(DISTINCT column_name), you can efficiently analyze your database and extract meaningful insights. Happy querying!


Reading a file line by line is useful when handling large files without loading everything into memory at once.

✅ Best Practice: Use File.ReadLines(), which reads the file lazily and is more memory efficient than File.ReadAllLines().

Example

foreach (string line in File.ReadLines("file.txt"))
{
    Console.WriteLine(line);
}

Why use ReadLines()?

Reads one line at a time, reducing overall memory usage. Ideal for large files (e.g., logs, CSVs).

Alternative: Use StreamReader (More Control)

For scenarios where you need custom processing while reading the contents of the file:

using (StreamReader reader = new StreamReader("file.txt"))
{
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        Console.WriteLine(line);
    }
}

Why use StreamReader?

Lets you handle exceptions, encoding, and buffering. Supports custom processing (e.g., search for a keyword while reading).

When to Use ReadAllLines()?

If you need all lines at once, use:

string[] lines = File.ReadAllLines("file.txt");

Caution: Loads the entire file into memory—avoid for large files!


Removing duplicates from a list in C# is a common task, especially when working with large datasets. C# provides multiple ways to achieve this efficiently, leveraging built-in collections and LINQ.

Using HashSet (Fastest for Unique Elements)

A HashSet<T> stores each value only once, so building one from a list drops the duplicates. It does not guarantee any particular element order, so sort the result if order matters. This is one of the fastest methods:

List<int> numbers = new List<int> { 1, 2, 2, 3, 4, 4, 5 };
numbers = new HashSet<int>(numbers).ToList();
Console.WriteLine(string.Join(", ", numbers)); // Output: 1, 2, 3, 4, 5

Using LINQ Distinct (Concise and Readable)

LINQ’s Distinct() method provides an elegant way to remove duplicates:

List<int> numbers = new List<int> { 1, 2, 2, 3, 4, 4, 5 };
numbers = numbers.Distinct().ToList();
Console.WriteLine(string.Join(", ", numbers)); // Output: 1, 2, 3, 4, 5

Removing Duplicates by Custom Property (For Complex Objects)

When working with objects, DistinctBy() from .NET 6+ simplifies duplicate removal based on a property:

using System;
using System.Collections.Generic;
using System.Linq;

List<Person> people = new List<Person>
{
    new Person { Name = "Alice", Age = 30 },
    new Person { Name = "Bob", Age = 25 },
    new Person { Name = "Alice", Age = 30 }
};

// Keep the first Person seen for each distinct Name
people = people.DistinctBy(p => p.Name).ToList();
Console.WriteLine(string.Join(", ", people.Select(p => p.Name))); // Output: Alice, Bob

// In a top-level program, type declarations must come after the statements
class Person
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
}

For earlier .NET versions, use GroupBy():

people = people.GroupBy(p => p.Name).Select(g => g.First()).ToList();

Performance Considerations

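  • HashSet<T> is fast and works for any type with a sensible Equals()/GetHashCode() implementation (or a custom IEqualityComparer<T>), not only simple types.
  • Distinct() is easy to use and is hash-based internally, so its cost is comparable to HashSet<T>; both are roughly linear in the list size.
  • DistinctBy() is also hash-based; GroupBy() builds a group for every key, so it allocates more than the other options.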

Conclusion

Choosing the best approach depends on the data type and use case. HashSet<T> is a good fit when you also need fast membership checks, Distinct() is simple and readable, and DistinctBy() (or GroupBy()) handles de-duplication of objects by a property.
