DBMSs on Modern Processors. Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, David A. Wood, Stavros Harizopoulos.

Presentation transcript:

1 DBMSs on Modern Processors. Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, David A. Wood, Stavros Harizopoulos

2 Database characteristics
Advanced Topics in Databases - Valia Athanasaki
- Powerful processors
- Large main memory size
- Out-of-order instruction execution and memory accesses
- Sophisticated techniques for hiding I/O latency
- Unfortunately: sub-optimal hardware behavior of commercial DBMSs

3 Related Work
- Minimizing stalls due to the memory hierarchy
  o Cache performance improvements:
    Algorithmic improvements: sorting algorithms, clustering, blocking, compression, data partitioning, coloring, loop fusion
    Data placement techniques

4 Query execution on modern processors
- Pipeline execution, in sequential stages:
  o receive an instruction
  o execute it
  o store its results into memory
[Figure: processor block diagram with L2 cache, L1 I-cache, L1 D-cache, fetch/decode unit, dispatch/execute unit, retire unit, and instruction pool; annotated with T_M and T_C + T_B + T_R]

5 Execution vs. stall time
- Hiding the delay of the pipeline (stalls):
  o Non-blocking caches
  o Out-of-order execution
  o Branch prediction
- Stalls cannot be fully overlapped
- Execution time of a query: T_Q = T_C + T_M + T_B + T_R - T_OVL

6 Execution time
- T_Q = T_C + T_M + T_B + T_R - T_OVL
  o T_Q: total execution time
  o T_C: computation time
  o T_M: memory stalls (L1 D/I-cache, L2 cache, D/I-TLB misses)
  o T_B: branch misprediction overhead
  o T_R: resource-related stalls (functional unit unavailability, dependencies, platform-specific characteristics)
  o T_OVL: overlapped stall time
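
The breakdown above can be written out as a small calculation. The following is a minimal Python sketch, not part of the presentation: the class name `QueryTime`, its field names, and the cycle counts in the example are assumptions chosen only to exercise the formula.

```python
from dataclasses import dataclass


@dataclass
class QueryTime:
    """Cycle counts for one query (all names here are illustrative)."""
    t_c: float    # computation time
    t_m: float    # memory stalls (L1 D/I-cache, L2 cache, D/I-TLB misses)
    t_b: float    # branch misprediction overhead
    t_r: float    # resource-related stalls
    t_ovl: float  # stall time overlapped with other work

    def total(self) -> float:
        # T_Q = T_C + T_M + T_B + T_R - T_OVL
        return self.t_c + self.t_m + self.t_b + self.t_r - self.t_ovl

    def stall_fraction(self) -> float:
        # Share of T_Q that is not spent on computation.
        return 1.0 - self.t_c / self.total()


# Made-up cycle counts, not measurements from the slides.
example = QueryTime(t_c=400, t_m=350, t_b=100, t_r=200, t_ovl=50)
```

With these made-up numbers, `example.total()` adds the four components and subtracts the overlapped stall time once, which is the reason the components cannot simply be summed.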

7 Database workload
- Single-table range selections or two-table equijoin
- A memory-resident database
- Running a single command stream: eliminates dynamic and random parameters (e.g. concurrency control among multiple transactions), isolates basic operations (e.g. sequential access and index selection), and avoids I/O interference
- One basic table:
    create table R ( a1 integer not null,
                     a2 integer not null,
                     a3 integer not null,
                     )

8 Database workload
- 3 basic queries on R
  o Sequential range selection:
      select avg(a3)
      from R
      where a2 < Hi and a2 > Lo
  o Indexed range selection (index on R.a2)
  o Sequential join:
      select avg(R.a3)
      from R, S
      where R.a2 = S.a1
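
To make the three queries concrete, here is a minimal sketch that runs them against an in-memory SQLite database using Python's standard `sqlite3` module. Several things are assumptions rather than details from the slides: the choice of SQLite, the schema of S (taken to mirror R), the synthetic rows, the range bounds `LO` and `HI`, the range predicate in the form `a2 < Hi and a2 > Lo`, and the index name `R_a2`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # memory-resident database, no disk I/O
cur = conn.cursor()

# R uses the three columns named on the slide; S mirroring R is an assumption.
cur.execute("create table R (a1 integer not null, a2 integer not null, a3 integer not null)")
cur.execute("create table S (a1 integer not null, a2 integer not null, a3 integer not null)")

# Tiny synthetic data set: R.a2 runs over 0..99, S.a1 over 0..9.
cur.executemany("insert into R values (?, ?, ?)", [(i, i, 2 * i) for i in range(100)])
cur.executemany("insert into S values (?, ?, ?)", [(i, i, i) for i in range(10)])

LO, HI = 10, 20  # assumed range bounds
range_sql = "select avg(a3) from R where a2 < ? and a2 > ?"

# Sequential range selection: no user-created index exists yet.
seq_avg = cur.execute(range_sql, (HI, LO)).fetchone()[0]

# Sequential join, also run before any index is created.
join_avg = cur.execute("select avg(R.a3) from R, S where R.a2 = S.a1").fetchone()[0]

# Indexed range selection: the same range query after indexing R.a2.
cur.execute("create index R_a2 on R (a2)")
idx_avg = cur.execute(range_sql, (HI, LO)).fetchone()[0]
```

The sequential and indexed selections return the same answer by construction; what differs between them in the study is the access path, which this sketch does not measure.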

9 Results: Execution time breakdown

10 Results: Memory stalls

11 Results: Memory stalls
- An L1 D-cache miss that hits in the L2 cache incurs low latency (usually overlapped with other computation)
- Few ITLB misses (few instruction pages)
- L2 instruction misses are few compared to L1 I-cache misses
- L2 data cache misses: 40%-90% of the total
- L1 I-cache misses: 20% of the total
  o difficult to overlap; they create a bottleneck in the pipeline
  o L1 caches are not expected to grow, since a larger L1 would slow down the processor clock
  o solution: store frequently accessed instructions together
  o larger data records cause more L1 I-cache misses (inclusion with L2, interrupts due to context switching)

12 Results: Branch mispredictions
- Serial bottleneck in the pipeline
- Instruction cache misses
- 20% of the total instructions
- Record size and selectivity do not cause any variation
- Branch Target Buffer (BTB): stores the targets of the last branches executed
- A larger BTB improves the BTB miss rate
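
The effect of BTB size on its miss rate can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the hardware studied in the presentation: it models the BTB as a fully associative table with LRU replacement, and the function name `btb_miss_rate`, the branch addresses, and the capacities are all hypothetical.

```python
from collections import OrderedDict


def btb_miss_rate(branch_addresses, capacity):
    """Fraction of branches whose target is not found in an LRU-managed BTB."""
    btb = OrderedDict()  # branch address -> placeholder for the stored target
    misses = 0
    for addr in branch_addresses:
        if addr in btb:
            btb.move_to_end(addr)        # hit: mark as most recently used
        else:
            misses += 1                  # miss: no target stored for this branch
            btb[addr] = True
            if len(btb) > capacity:
                btb.popitem(last=False)  # evict the least recently used entry
    return misses / len(branch_addresses)


# Synthetic trace: a loop containing 8 distinct branches, executed 100 times.
trace = [0x1000 + 16 * i for i in range(8)] * 100

small = btb_miss_rate(trace, capacity=4)  # BTB smaller than the set of branches
large = btb_miss_rate(trace, capacity=8)  # BTB large enough for every branch
```

In this toy trace the small table misses on every branch because each entry is evicted before it is reused, while the large table only misses on the first pass through the loop.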

13 Results: Branch mispredictions
- Tightly connected to instruction stalls (affects instruction prefetching)

14 Results: Resource stalls
- Dependency stalls: the most important (low instruction-level parallelism)

15 Storage models: NSM (N-ary Storage Model)

16 Storage models - NSM
- Stores records contiguously, in slotted disk pages
- Records start at the beginning of each disk page
- Most query operators access only a small fraction of each record
- Loading the cache with useless data wastes bandwidth and forces replacement of useful information

17 Storage models: DSM (Decomposition Storage Model)

18 Storage models - DSM
- Partitions an n-attribute relation into n sub-relations
- Saves I/O
- Increases main memory utilization
- Expensive reconstruction of a record

19 Storage models: PAX (Partition Attributes Across)

20 Storage models - PAX
- Stores the same data on each page as NSM
- Groups all the values of each attribute together on a minipage (inter-record spatial locality)
- In a sequential scan, full utilization of cache resources
- Implementing PAX on a DBMS that uses NSM requires changes to the page-level data manipulation code only
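
The difference between the three storage models can be shown by laying out the same records in each. This is a minimal sketch using Python lists in place of real disk pages: the function names, the sample records, and the use of a list position as the DSM record identifier are assumptions for illustration, and page headers, slots, and variable-length values are ignored.

```python
# Three records of a relation R(a1, a2, a3); the values are made up.
records = [(1, 10, 100), (2, 20, 200), (3, 30, 300)]


def nsm_page(rows):
    # NSM: whole records are stored one after another on the page.
    return [value for row in rows for value in row]


def dsm_subrelations(rows):
    # DSM: one sub-relation per attribute, pairing a record id with a value.
    n_attrs = len(rows[0])
    return [[(rid, row[i]) for rid, row in enumerate(rows)] for i in range(n_attrs)]


def pax_page(rows):
    # PAX: same records on the page as NSM, grouped into one minipage per attribute.
    return [list(column) for column in zip(*rows)]


def pax_record(minipages, slot):
    # Rebuilding a record only touches minipages of the same page.
    return tuple(minipage[slot] for minipage in minipages)


nsm = nsm_page(records)
dsm = dsm_subrelations(records)
pax = pax_page(records)
```

A scan of `a2` reads `pax[1]` as one contiguous run, whereas in `nsm` the same values are interleaved with `a1` and `a3`; reconstructing a record from `dsm` needs a lookup in every sub-relation.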

21 Storage models - PAX design
- Each page partitioned in n minipages
- Fixed-length attributes
- Variable-length attributes
- The same amount of space as NSM

22 NSM record structure
- NSM: fixed-length attribute values stored first
- PAX: no need for a slot table at the end of a page
- NSM takes 4% more storage

23 Data manipulation
- Bulk-loading, insertions
  o Variable-length values: minipage boundaries may need to be adjusted (minipage sizes recalculated)
- Updates
  o Variable-length values stretch or shrink the record: page reorganization (NSM), minipage-level reorganization (PAX)
- Deletions
  o PAX: minipage contents are reorganized to minimize fragmentation, so cache utilization is not affected
  o NSM: the deleted record is marked, freeing space for future insertions

24 Results

25 Results
- DSM: sensitive to the number of attributes in the query
  o When fewer than 10% of the attributes are involved, it performs well
- NSM and PAX: stable performance with an increasing number of attributes in the query
- PAX has better cache behavior (1 miss per n records, where n is the number of attribute values that fit in a cache line)
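
The "1 miss per n records" figure can be checked with simple address arithmetic. The sketch below counts the distinct cache lines touched when scanning one attribute, assuming a cold cache; the line size, value size, record size, and record count are assumed values picked so that 4 values fit in a line, and page headers are ignored.

```python
def lines_touched(addresses, line_size):
    """Number of distinct cache lines covering the given byte addresses."""
    return len({addr // line_size for addr in addresses})


LINE = 32    # assumed cache line size in bytes
VALUE = 8    # assumed attribute value size in bytes
RECORD = 32  # assumed NSM record size in bytes (4 attributes of 8 bytes)
N = 1000     # number of records scanned

# NSM: the scanned attribute sits at a fixed offset inside each record.
nsm_addresses = [r * RECORD + VALUE for r in range(N)]

# PAX: values of the scanned attribute are adjacent inside one minipage.
pax_addresses = [r * VALUE for r in range(N)]

nsm_misses = lines_touched(nsm_addresses, LINE)
pax_misses = lines_touched(pax_addresses, LINE)
```

Under these assumptions NSM touches one line per record while PAX touches one line per `LINE // VALUE` records, so the ratio between the two counts equals the number of values that fit in a cache line.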

26 Results
- L1 data cache miss penalty is small (10 processor cycles)
- L2 cache miss stall: cycles
- Overall processor stall time is 75% less with PAX (4 attributes fit in one cache line/block)
- PAX brings only useful data into the cache (it occupies less space, so it does not evict data or instructions that will be useful later)

27 Sensitivity Analysis

28 Sensitivity Analysis
- Selectivity maintained at 50%
- DSM even slower than NSM
- PAX is insensitive to changes in query selectivity
- NSM incurs more data stalls as more records qualify
- PAX incurs about 4 times fewer data cache misses than NSM when scanning records to apply a predicate to an attribute

29 Sensitivity Analysis

30 Context-switching
- Uncontrolled context-switching can lead to poor performance

31 Problems in the current design
- Thread-based execution model (pool of threads): poor cache performance
- Too many threads waste resources, too few restrict concurrency (no preallocated number of worker threads)
- Context-switching in the middle of a logical operation evicts a large working set from the cache
- Round-robin thread scheduling does not exploit cache contents common to a set of threads

32 A staged approach

33 A staged approach
- Each stage has its own queue and thread support, and communicates and interacts with other stages
  o New queries queue up in the first stage
  o Each query is encapsulated into a packet (each packet carries its state and private data)
  o The packet passes through the 5 stages
- Inside the execution engine a query can issue multiple packets (parallelism)

34 Benefits of staged DBMS design
- Each stage allocates worker threads based on its functionality and I/O frequency (not on the number of concurrent clients), which makes tuning easy
- A stage contains DBMS code with one or more logical operators
- The thread scheduler repeatedly executes tasks queued up in the same stage: stage affinity to the processor caches
- Shared-memory systems: a query's state and private data remain in one copy as the packets are routed through different processors
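
The queue-per-stage idea and the scheduler's stage affinity can be sketched as a single-threaded simulation. This is only an illustration under stated assumptions: the stage names in `STAGE_NAMES`, the `Packet` and `Stage` classes, and `run_queries` are hypothetical, and a real staged system would run worker threads per stage rather than one loop.

```python
from collections import deque

# Stage names are illustrative; the slides only say there are 5 stages.
STAGE_NAMES = ["connect", "parse", "optimize", "execute", "disconnect"]


class Packet:
    """A query travelling through the system with its own state."""
    def __init__(self, query_id):
        self.query_id = query_id
        self.history = []  # private data: names of the stages visited so far


class Stage:
    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.next_stage = None

    def run_all(self, log):
        # Serve every packet queued in this stage before the scheduler moves on.
        while self.queue:
            packet = self.queue.popleft()
            packet.history.append(self.name)
            log.append((self.name, packet.query_id))
            if self.next_stage is not None:
                self.next_stage.queue.append(packet)


def run_queries(query_ids):
    stages = [Stage(name) for name in STAGE_NAMES]
    for current, following in zip(stages, stages[1:]):
        current.next_stage = following
    packets = [Packet(q) for q in query_ids]
    stages[0].queue.extend(packets)  # new queries queue up in the first stage
    log = []
    for stage in stages:  # visit stages in order, draining each queue
        stage.run_all(log)
    return packets, log


packets, log = run_queries([1, 2, 3])
```

The resulting `log` groups work by stage rather than by query, which is the property that lets a stage's code and data stay in the cache while it serves several packets in a row.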

35 References
- A. Ailamaki, D.J. DeWitt, M.D. Hill, and D.A. Wood. DBMSs on a Modern Processor: Where Does Time Go? In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), Edinburgh, UK, September
- A. Ailamaki, D.J. DeWitt, and M.D. Hill. Data Page Layouts for Relational Databases on Deep Memory Hierarchies. The VLDB Journal 11(3),
- S. Harizopoulos and A. Ailamaki. A Case for Staged Database Systems. CIDR 2003.
