DBMSs on Modern Processors
Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, David A. Wood, Stavros Harizopoulos

Advanced Database Topics - Valia Athanasaki

Slide 2: Database characteristics
• Powerful processors
• Large main memory
• Out-of-order instruction execution and memory accesses
• Sophisticated techniques for hiding I/O latency
• Unfortunately, commercial DBMSs exploit this hardware sub-optimally

Slide 3: Related Work
• Minimizing stalls due to the memory hierarchy
  o Cache performance improvements through algorithmic improvements and data placement techniques: sorting algorithms, clustering, blocking, compression, data partitioning, coloring, loop fusion

Slide 4: Query execution on modern processors
• Pipeline execution:
  o receive an instruction
  o execute it in sequential stages
  o store its results into memory
[Figure: processor pipeline with a fetch/decode unit, instruction pool, dispatch/execute unit, and retire unit, backed by the L1 I-cache, L1 D-cache, and L2 cache; memory accesses are labeled $T_M$ and the pipeline $T_C + T_B + T_R$]

Slide 5: Execution vs. stall time
• Techniques for hiding pipeline delays (stalls):
  o Non-blocking caches
  o Out-of-order execution
  o Branch prediction
• Stalls cannot be fully overlapped
• Execution time of a query: $T_Q = T_C + T_M + T_B + T_R - T_{OVL}$

Slide 6: Execution time
• $T_Q = T_C + T_M + T_B + T_R - T_{OVL}$
  o $T_Q$: total execution time
  o $T_C$: computation time
  o $T_M$: memory stalls (L1 D/I-cache, L2 cache, and D/I-TLB misses)
  o $T_B$: branch misprediction overhead
  o $T_R$: resource-related stalls (functional unit unavailability, dependencies, platform-specific characteristics)
  o $T_{OVL}$: overlapped stall time
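
The breakdown above is plain arithmetic, so a minimal sketch can make it concrete. The function name, the range check on the overlap term, and the cycle counts are illustrative assumptions, not measurements from the presentation.

```python
def query_execution_time(t_c, t_m, t_b, t_r, t_ovl):
    """Return T_Q = T_C + T_M + T_B + T_R - T_OVL (all terms in the same unit)."""
    stalls = t_m + t_b + t_r
    # Assumption: the overlapped stall time cannot exceed the total stall time.
    if t_ovl < 0 or t_ovl > stalls:
        raise ValueError("overlap must lie between 0 and the total stall time")
    return t_c + stalls - t_ovl


# Hypothetical cycle counts, chosen only to illustrate the arithmetic.
total = query_execution_time(t_c=500, t_m=300, t_b=100, t_r=150, t_ovl=50)
print(total)  # 1000
```

With these made-up numbers, stalls account for 550 of the 1000 cycles even after 50 cycles of overlap are subtracted, which is the kind of ratio the breakdown is meant to expose.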

Slide 7: Database workload
• Single-table range selections or a two-table equijoin
• A memory-resident database
• A single command stream: this eliminates dynamic and random parameters (e.g. concurrency control among multiple transactions), isolates basic operations (e.g. sequential access and index selection), and avoids I/O interference
• One basic table:
    create table R ( a1 integer not null,
                     a2 integer not null,
                     a3 integer not null,
                     <rest of fields> )

Slide 8: Database workload
• 3 basic queries on R
  o Sequential range selection:
      select avg(a3)
      from R
      where a2 < Hi and a2 > Lo
  o Indexed range selection (index on R.a2)
  o Sequential join:
      select avg(R.a3)
      from R, S
      where R.a2 = S.a1

Slide 9: Results: Execution time breakdown

Slide 10: Results: Memory stalls

Slide 11: Results: Memory stalls
• An L1 D-cache miss that hits in the L2 cache incurs low latency (usually overlapped with other computation)
• Few ITLB misses (few instruction pages)
• L2 instruction misses are few compared to L1 I-cache misses
• L2 data cache misses: 40%-90% of the total
• L1 I-cache misses: 20% of the total
  o difficult to overlap, so they create a bottleneck in the pipeline
  o L1 caches are not expected to grow, because a larger L1 would slow down the processor clock
  o solution: store frequently accessed instructions together
  o Larger data records cause more L1 I-cache misses (inclusion with L2; interrupts due to context switching)

Slide 12: Results: Branch mispredictions
• Serial bottleneck in the pipeline
• Instruction cache misses
• Branch instructions: 20% of the total instructions
• Record size and selectivity do not cause any variations
• Branch Target Buffer (BTB): stores the targets of the last branches executed
• A larger BTB improves the BTB miss rate
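
To illustrate why a larger BTB lowers the miss rate, here is a toy model. It assumes a fully associative buffer with least-recently-used replacement and a made-up branch trace; real BTBs are organized differently, so this only shows the capacity effect.

```python
from collections import OrderedDict


def btb_misses(branch_addresses, capacity):
    """Count lookups that miss in a toy BTB holding `capacity` recent branches."""
    btb = OrderedDict()
    misses = 0
    for address in branch_addresses:
        if address in btb:
            btb.move_to_end(address)       # mark as most recently used
        else:
            misses += 1
            btb[address] = True            # real hardware would store the target
            if len(btb) > capacity:
                btb.popitem(last=False)    # evict the least recently used entry
    return misses


# Hypothetical trace: a loop body with 8 distinct branches, executed 4 times.
trace = list(range(8)) * 4
print(btb_misses(trace, capacity=4))  # 32
print(btb_misses(trace, capacity=8))  # 8
```

When the buffer is smaller than the set of branches in the loop, every lookup misses; once all 8 branches fit, only the first pass misses.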

Slide 13: Results: Branch mispredictions
• Tightly connected to instruction stalls (mispredictions affect instruction prefetching)

Slide 14: Results: Resource stalls
• Dependency stalls are the most significant (low instruction-level parallelism)

Slide 15: Storage models: NSM (N-ary Storage Model)

Slide 16: Storage models - NSM
• Stores records contiguously, in slotted disk pages
• Records start at the beginning of each disk page
• Most query operators access only a small fraction of each record
• Loading the cache with useless data wastes bandwidth and forces replacement of useful information

Slide 17: Storage models: DSM (Decomposition Storage Model)

Slide 18: Storage models - DSM
• Partitions an n-attribute relation into n sub-relations
• Saves I/O
• Increases main memory utilization
• Reconstructing a record is expensive
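
The trade-off above can be sketched in a few lines. This assumes each sub-relation pairs a record id (surrogate) with one attribute value; the function names and sample records are hypothetical.

```python
def decompose(records, attributes):
    """Split n-attribute records into n sub-relations of (record id, value) pairs."""
    return {
        attr: [(rid, record[attr]) for rid, record in enumerate(records)]
        for attr in attributes
    }


def reconstruct(sub_relations, rid):
    """Rebuild one record: one lookup in every sub-relation (the expensive part)."""
    # In this sketch the pair at position `rid` belongs to record `rid`.
    return {attr: pairs[rid][1] for attr, pairs in sub_relations.items()}


records = [
    {"a1": 1, "a2": 10, "a3": 100},
    {"a1": 2, "a2": 20, "a3": 200},
]
subs = decompose(records, ["a1", "a2", "a3"])

# A query on a3 alone reads a single sub-relation.
avg_a3 = sum(value for _, value in subs["a3"]) / len(subs["a3"])
print(avg_a3)                # 150.0
print(reconstruct(subs, 1))  # {'a1': 2, 'a2': 20, 'a3': 200}
```

The scan touches one of the three sub-relations, while rebuilding a full record touches all of them, which is why reconstruction cost grows with the number of attributes a query needs.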

Slide 19: Storage models: PAX (Partition Attributes Across)

Slide 20: Storage models - PAX
• Stores the same data on each page as NSM
• Groups all the values of each attribute together on a minipage (inter-record spatial locality)
• In a sequential scan, cache resources are fully utilized
• Implementing PAX on a DBMS that uses NSM requires changes only to the page-level data manipulation code

Slide 21: Storage models - PAX design
• Each page is partitioned into n minipages
• Fixed-length attributes
• Variable-length attributes
• Requires the same amount of space as NSM
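
A minimal sketch of the two page layouts, assuming fixed-length values only and ignoring page headers, slot tables, and variable-length attributes. The function names and sample records are hypothetical.

```python
def nsm_page(records):
    """NSM: the values of each record are stored contiguously, record after record."""
    return [value for record in records for value in record]


def pax_page(records):
    """PAX: the same values, regrouped into one minipage per attribute."""
    return [list(column) for column in zip(*records)]


records = [(1, 10, 100), (2, 20, 200), (3, 30, 300)]
print(nsm_page(records))  # [1, 10, 100, 2, 20, 200, 3, 30, 300]
print(pax_page(records))  # [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
```

Both layouts hold exactly the same nine values on the page; only their order within the page differs, which is why PAX changes page-level code but not what a page stores.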

Slide 22: NSM record structure
• NSM: fixed-length attribute values are stored first
• PAX: no need for a slot table at the end of the page
• NSM takes 4% more storage

Slide 23: Data manipulation
• Bulk-loading and insertions
  o Variable-length values: minipage boundaries may need to be adjusted (minipage sizes recalculated)
• Updates
  o Variable-length values: an update may stretch or shrink the record, requiring page reorganization (NSM) or minipage-level reorganization (PAX)
• Deletions
  o PAX: minipage contents are reorganized to minimize fragmentation, so cache utilization is not affected
  o NSM: the deleted record is marked, and its space is freed for future insertions

Slide 24: Results

Slide 25: Results
• DSM: sensitive to the number of attributes in the query
  o Performs well when the query involves fewer than 10% of the attributes
• NSM and PAX: stable performance as the number of attributes in the query increases
• PAX's cache behavior is better (1 miss per n records, where n is the number of attribute values that fit in a cache line)
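
The "1 miss per n records" figure can be checked with a toy count of the cache lines a scan of one attribute touches. The sketch assumes a cold cache, a line-aligned page, fixed 4-byte values, and a 32-byte cache line; all of these sizes are illustrative assumptions.

```python
def lines_touched(offsets, line_size):
    """Count the distinct cache lines that contain the given byte offsets."""
    return len({offset // line_size for offset in offsets})


def scan_misses(num_records, num_attrs, value_size=4, line_size=32, attr=1):
    """Return (NSM, PAX) cache lines touched when scanning one attribute."""
    record_size = num_attrs * value_size
    # NSM: the scanned attribute sits at a fixed offset inside every record.
    nsm = [r * record_size + attr * value_size for r in range(num_records)]
    # PAX: the scanned attribute's values are packed back to back in a minipage.
    pax = [r * value_size for r in range(num_records)]
    return lines_touched(nsm, line_size), lines_touched(pax, line_size)


print(scan_misses(num_records=64, num_attrs=8))  # (64, 8)
```

With 8 values per cache line, the PAX scan touches one line for every 8 records, while the NSM scan touches one line per record because each 32-byte record fills a line of its own.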

Slide 26: Results
• The L1 data cache miss penalty is small (10 processor cycles)
• L2 cache miss stall: 70-80 cycles
• Overall processor stall time is 75% lower with PAX (4 attribute values fit in one cache line/block)
• PAX brings only useful data into the cache (it occupies less space and does not evict data or instructions that will be useful later)

Slide 27: Sensitivity Analysis

Slide 28: Sensitivity Analysis
• Selectivity maintained at 50%
• DSM is even slower than NSM
• PAX is insensitive to changes in query selectivity
• NSM incurs more data stalls as more records qualify
• PAX incurs about 4 times fewer data cache misses than NSM when scanning records to apply a predicate to an attribute

Slide 29: Sensitivity Analysis

Slide 30: Context-switching
• Uncontrolled context switching can lead to poor performance

Slide 31: Problems in the current design
• Thread-based execution model (pool of threads): poor cache performance
• Too many threads waste resources and too few restrict concurrency (no single number of preallocated worker threads is optimal)
• Context switching in the middle of a logical operation evicts a large working set from the cache
• Round-robin thread scheduling does not exploit cache contents that a set of threads has in common

Slide 32: A staged approach

Slide 33: A staged approach
• Each stage has its own queue and thread support, and communicates and interacts with the other stages
  o New queries queue up in the first stage
  o Each query is encapsulated into a packet (each packet carries its state and private data)
  o Packets pass through the 5 stages
• Inside the execution engine, a query can issue multiple packets (parallelism)
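
The packet flow described above can be sketched as a chain of queues. This is a single-threaded illustration, not the authors' implementation: the stage names are assumptions (the slides only say there are 5 stages), and the `Packet` and `Stage` classes are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

# Assumed stage names for illustration; the slides only say there are 5 stages.
STAGE_NAMES = ["connect", "parse", "optimize", "execute", "disconnect"]


@dataclass
class Packet:
    """A query's packet: carries the query's state and private data."""
    query: str
    state: dict = field(default_factory=dict)
    visited: list = field(default_factory=list)


class Stage:
    def __init__(self, name):
        self.name = name
        self.queue = deque()      # each stage owns its queue
        self.next_stage = None

    def enqueue(self, packet):
        self.queue.append(packet)

    def run(self):
        """Drain this stage's queue, forwarding each packet to the next stage."""
        processed = []
        while self.queue:
            packet = self.queue.popleft()
            packet.visited.append(self.name)  # stand-in for the stage's real work
            if self.next_stage is not None:
                self.next_stage.enqueue(packet)
            processed.append(packet)
        return processed


def build_pipeline(names):
    stages = [Stage(name) for name in names]
    for stage, following in zip(stages, stages[1:]):
        stage.next_stage = following
    return stages


def run_queries(queries):
    stages = build_pipeline(STAGE_NAMES)
    for query in queries:
        stages[0].enqueue(Packet(query))  # new queries queue up in the first stage
    finished = []
    for stage in stages:
        finished = stage.run()            # packets leaving the last stage are done
    return finished


finished = run_queries(["q1", "q2"])
print([packet.visited == STAGE_NAMES for packet in finished])  # [True, True]
```

Because each stage drains its whole queue before the next stage runs, both packets are handled back to back by the same stage code, which is the access pattern the staged design aims for.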

Slide 34: Benefits of staged DBMS design
• Each stage allocates worker threads based on its functionality and I/O frequency (not on the number of concurrent clients), which makes tuning easy
• A stage contains DBMS code with one or more logical operators
• The thread scheduler repeatedly executes tasks queued up in the same stage: stage affinity to the processor caches
• Shared-memory systems: a query's state and private data remain in one copy as the packets are routed through different processors
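
Stage affinity can be illustrated with a toy count of how often the processor switches from one stage's code to another. The model below is an assumption made for illustration: every query visits the same stages in order, and each switch stands in for reloading a stage's code and shared data into the cache.

```python
def count_stage_switches(schedule):
    """Count how often consecutive tasks belong to different stages."""
    return sum(1 for prev, cur in zip(schedule, schedule[1:]) if prev != cur)


def query_at_a_time(num_queries, stages):
    """Each query runs through all stages before the next query starts."""
    return [stage for _ in range(num_queries) for stage in stages]


def stage_batched(num_queries, stages):
    """All queued tasks of one stage run back to back before moving on."""
    return [stage for stage in stages for _ in range(num_queries)]


stages = ["s1", "s2", "s3", "s4", "s5"]  # hypothetical stage labels
print(count_stage_switches(query_at_a_time(3, stages)))  # 14
print(count_stage_switches(stage_batched(3, stages)))    # 4
```

In this model the number of switches under stage batching depends only on the number of stages, while under query-at-a-time ordering it grows with the number of queries.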

Slide 35: References
• A. Ailamaki, D.J. DeWitt, M.D. Hill, and D.A. Wood. DBMSs on a Modern Processor: Where Does Time Go? In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), Edinburgh, UK, September 1999.
• A. Ailamaki, D.J. DeWitt, and M.D. Hill. Data Page Layouts for Relational Databases on Deep Memory Hierarchies. The VLDB Journal 11(3), 2002.
• S. Harizopoulos and A. Ailamaki. A Case for Staged Database Systems. CIDR 2003.
