Skip to main content
Data Warehouse Physical Design
Data Warehouse Physical Design
Data files guidelines
- Create files with an initial size
- Based on the eventual size of the objects that will be stored on them
- This pre-allocates sequential disk blocks and helps avoid fragmentation.
- Disable autogrowth
- If you begin to run out of space in a data file, it is more efficient to explicitly
increase the file size by a large amount rather than rely on incremental
autogrowth.
Filegroups guidelines
- Create at least one filegroup in addition to the primary one, and then set it as the default filegroup so you can separate data tables from system tables.
- Create dedicated filegroups for extremely large fact tables and using them to place those fact tables on their own logical disks.
- If some tables in the data warehouse are loaded on a different schedule from others, consider using filegroups to separate the tables into groups that can be backed up independently.
- If you intend to partition a large fact table, create a filegroup for each one so that older, stable rows can be backed up, and then set as read-only.
Staging tables
- Separate staging database
- Create it on a logical disk distinct from the data warehouse files.
- Into the data warehouse database
- Create a file and filegroup for them on a logical disk
- Separate from the fact and dimension tables.
- An exception to the previous guideline is made for staging tables that will be
switched with partitions to perform fast loads.
- These must be created on the same filegroup as the partition with which they will be
switched.
TempDB
- To avoid fragmentation of data files
- Place it on a dedicated logical disk
- Set its initial size based on how much it is likely to be used.
- Set the growth increment to be quite large to ensure that
performance is not interrupted by frequent growth of
TempDB.
- Creating multiple files for TempDB to help minimize
contention during page free space (PFS) scans as temporary
objects are created and dropped.
Transaction logs
- Set the transaction mode of the Data Warehouse, Staging
Database and TempDB to Simple
- Helps to avoid having to truncate transaction logs
- Additionally, most of the inserts in a data warehouse are
typically performed as bulk load operations which are not
logged.
- To avoid disk resource conflicts between data warehouse I/O
and logging, place the transaction log files for all databases
on a dedicated logical disk.
Data Compression
- SQL Server Enterprise Edition supports data compression at
both page and row level.
- Data compression benefits in a data warehouse
- Reduced storage requirements.
- Improved query performance
- Best practices for data compression in a data warehouse
- Use page compression on all dimension tables and fact table partitions.
- If performance is CPU-bound, revert to row compression on frequently accessed partitions.
Table Partitioning
- Improved query performance
- More granular manageability
- Improved data load performance
- Best practices for partitioning in a DW
- Partition Large Fact Tables
- Partition on an incrementing date key
- Design the partition scheme for ETL and manageability.
- Maintain an empty partition at the start and end of the table
Indexes in DW
- Indexes maximize query performance
- Planning Indexes is the most important part of the database
design process
- Some inexperienced BI professionals are tempted to create
many indexes on all tables to support queries.
Dimension table indexes
- Create a clustered index on the surrogate key column.
- This column is used to join the dimension table to fact tables, and a clustered index will help the query optimizer minimize the number of reads required to filter
fact rows.
- Create a non-clustered index on the alternate key column and
include the SCD current flag, start date, and end date columns.
- This index will improve the performance of lookup operations during ETL data
loads that need to handle slowly-changing dimensions.
- Create nonclustered indexes on frequently searched attributes, and
consider including all members of a hierarchy in a single index.
Fact table indexes
- Create a clustered index on the most commonly-searched
date key.
- Date ranges are the most common filtering criteria in the most data warehouse
workloads, so a clustered index on this key should be particularly effective in
improving overall query performance.
- Create nonclustered indexes on other, frequently-searched
dimension keys.
- Columnstore index on all columns.
Using Views in a DW
- Create a view for each dimension and fact table with
NOLOCK query hint in the view definition Create views with a user-friendly view and column names
- Do not include metadata columns in views
- Create views to combine snowflake dimension tables
- Partition-align indexed views
- Use the SCHEMABINDING option
- Security
Comments
Post a Comment