Data Warehouse Physical Design
Data Warehouse Physical Design
Data files guidelines
- Create files with an initial size
- Based on the eventual size of the objects that will be stored on them
- This pre-allocates sequential disk blocks and helps avoid fragmentation.
- Disable autogrowth
- If you begin to run out of space in a data file, it is more efficient to explicitly increase the file size by a large amount rather than rely on incremental autogrowth.
Filegroups guidelines
- Create at least one filegroup in addition to the primary one, and then set it as the default filegroup so you can separate data tables from system tables.
- Create dedicated filegroups for extremely large fact tables and using them to place those fact tables on their own logical disks.
- If some tables in the data warehouse are loaded on a different schedule from others, consider using filegroups to separate the tables into groups that can be backed up independently.
- If you intend to partition a large fact table, create a filegroup for each one so that older, stable rows can be backed up, and then set as read-only.
Staging tables
- Separate staging database
- Create it on a logical disk distinct from the data warehouse files.
- Into the data warehouse database
- Create a file and filegroup for them on a logical disk
- Separate from the fact and dimension tables.
- An exception to the previous guideline is made for staging tables that will be switched with partitions to perform fast loads.
- These must be created on the same filegroup as the partition with which they will be switched.
TempDB
- To avoid fragmentation of data files
- Place it on a dedicated logical disk
- Set its initial size based on how much it is likely to be used.
- Set the growth increment to be quite large to ensure that performance is not interrupted by frequent growth of TempDB.
- Creating multiple files for TempDB to help minimize contention during page free space (PFS) scans as temporary objects are created and dropped.
Transaction logs
- Set the transaction mode of the Data Warehouse, Staging Database and TempDB to Simple
- Helps to avoid having to truncate transaction logs
- Additionally, most of the inserts in a data warehouse are typically performed as bulk load operations which are not logged.
- To avoid disk resource conflicts between data warehouse I/O and logging, place the transaction log files for all databases on a dedicated logical disk.
Data Compression
- SQL Server Enterprise Edition supports data compression at both page and row level.
- Data compression benefits in a data warehouse
- Reduced storage requirements.
- Improved query performance
- Best practices for data compression in a data warehouse
- Use page compression on all dimension tables and fact table partitions.
- If performance is CPU-bound, revert to row compression on frequently accessed partitions.
Table Partitioning
- Improved query performance
- More granular manageability
- Improved data load performance
- Best practices for partitioning in a DW
- Partition Large Fact Tables
- Partition on an incrementing date key
- Design the partition scheme for ETL and manageability.
- Maintain an empty partition at the start and end of the table
Indexes in DW
- Indexes maximize query performance
- Planning Indexes is the most important part of the database design process
- Some inexperienced BI professionals are tempted to create many indexes on all tables to support queries.
Dimension table indexes
- Create a clustered index on the surrogate key column.
- This column is used to join the dimension table to fact tables, and a clustered index will help the query optimizer minimize the number of reads required to filter fact rows.
- Create a non-clustered index on the alternate key column and include the SCD current flag, start date, and end date columns.
- This index will improve the performance of lookup operations during ETL data loads that need to handle slowly-changing dimensions.
- Create nonclustered indexes on frequently searched attributes, and consider including all members of a hierarchy in a single index.
Fact table indexes
- Create a clustered index on the most commonly-searched date key.
- Date ranges are the most common filtering criteria in the most data warehouse workloads, so a clustered index on this key should be particularly effective in improving overall query performance.
- Create nonclustered indexes on other, frequently-searched dimension keys.
- Columnstore index on all columns.
Using Views in a DW
- Create a view for each dimension and fact table with NOLOCK query hint in the view definition Create views with a user-friendly view and column names
- Do not include metadata columns in views
- Create views to combine snowflake dimension tables
- Partition-align indexed views
- Use the SCHEMABINDING option
- Security
Comments
Post a Comment