Published by Αρσένιος Καψής. Modified 9 years ago.
1
DBMSs on Modern Processors
Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, David A. Wood, Stavros Harizopoulos
2
Advanced Database Topics - Βάλια Αθανασάκη
Database characteristics
Powerful processors
Large main memory size
Out-of-order instruction execution and memory accesses
Sophisticated techniques for hiding I/O latency
Unfortunately: sub-optimal hardware behavior of commercial DBMSs
3
Related Work
Minimizing stalls due to the memory hierarchy
  o Cache performance improvements:
      Algorithmic improvements: sorting algorithms, blocking, data partitioning, loop fusion
      Data placement techniques: clustering, compression, coloring
4
Query execution on modern processors
Pipeline execution:
  o receive an instruction
  o execute it in sequential stages
  o store its results into memory
[Figure: processor pipeline with L2 cache, L1 I-cache, L1 D-cache, fetch/decode unit, dispatch/execute unit, retire unit and instruction pool; stall labels T_M and T_C + T_B + T_R]
5
Execution vs. stall time
Techniques for hiding pipeline delays (stalls):
  o Non-blocking caches
  o Out-of-order execution
  o Branch prediction
Stalls cannot be fully overlapped
Execution time of a query: T_Q = T_C + T_M + T_B + T_R - T_{OVL}
6
Execution time
T_Q = T_C + T_M + T_B + T_R - T_{OVL}
  o T_Q: total execution time
  o T_C: computation time
  o T_M: memory stalls (L1 D/I-cache, L2 cache, D/I-TLB misses)
  o T_B: branch misprediction overhead
  o T_R: resource-related stalls (functional unit unavailability, dependencies, platform-specific characteristics)
  o T_{OVL}: overlapped stall time
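The decomposition above is plain arithmetic, so it can be checked in a few lines. This is a minimal sketch that assumes the per-component cycle counts have already been measured. The function names `query_time` and `breakdown` and the sample numbers are hypothetical and do not come from the presentation.

```python
def query_time(t_c, t_m, t_b, t_r, t_ovl):
    """Return T_Q = T_C + T_M + T_B + T_R - T_OVL (all in the same unit)."""
    return t_c + t_m + t_b + t_r - t_ovl


def breakdown(t_c, t_m, t_b, t_r):
    """Share of each component in the non-overlapped sum, as fractions."""
    total = t_c + t_m + t_b + t_r
    return {
        "computation": t_c / total,
        "memory": t_m / total,
        "branch": t_b / total,
        "resource": t_r / total,
    }


# Hypothetical cycle counts, chosen only to illustrate the formula.
t_q = query_time(t_c=500, t_m=300, t_b=100, t_r=200, t_ovl=100)
shares = breakdown(500, 300, 100, 200)
print(t_q)  # 1000
```

Because overlapped stall time is subtracted once from the sum, the four shares describe the non-overlapped components and add up to 1, while `t_q` itself is smaller than their sum.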
7
Database workload
Single-table range selections or a two-table equijoin
A memory-resident database
Running a single command stream
  o eliminates dynamic and random parameters, e.g. concurrency control among multiple transactions
  o isolates basic operations, e.g. sequential access and index selection
  o no I/O interference
One basic table:
  create table R ( a1 integer not null,
                   a2 integer not null,
                   a3 integer not null,
                   <rest of fields> )
8
Database workload
3 basic queries on R
  o Sequential range selection:
      select avg(a3)
      from R
      where a2 < Hi and a2 > Lo
  o Indexed range selection (index on R.a2)
  o Sequential join:
      select avg(R.a3)
      from R, S
      where R.a2 = S.a1
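The three queries can be tried on a toy in-memory database. This is a minimal sketch using Python's standard `sqlite3` module, not the commercial systems the presentation measured. It makes several assumptions: the range predicate has the form `a2 < Hi and a2 > Lo`, table S has the same three columns as R, and the row counts, the bounds `lo` and `hi`, and the index name `R_a2` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # memory-resident, as in the workload
cur = conn.cursor()
cur.execute("create table R (a1 integer not null, a2 integer not null, a3 integer not null)")
# Assumption: S mirrors R's columns; the slides do not define S.
cur.execute("create table S (a1 integer not null, a2 integer not null, a3 integer not null)")

# Hypothetical toy data: 100 rows in R, 10 rows in S.
cur.executemany("insert into R values (?, ?, ?)", [(i, i % 20, i) for i in range(100)])
cur.executemany("insert into S values (?, ?, ?)", [(i, i, i) for i in range(10)])

lo, hi = 4, 10  # hypothetical range bounds

# Sequential range selection: no index exists yet, so R has to be scanned.
seq = cur.execute("select avg(a3) from R where a2 < ? and a2 > ?", (hi, lo)).fetchone()[0]

# Indexed range selection: the same query after creating an index on R.a2.
cur.execute("create index R_a2 on R (a2)")
idx = cur.execute("select avg(a3) from R where a2 < ? and a2 > ?", (hi, lo)).fetchone()[0]

# Two-table equijoin.
join = cur.execute("select avg(R.a3) from R, S where R.a2 = S.a1").fetchone()[0]

print(seq, idx, join)
conn.close()
```

Both range selections return the same average. Whether the second one actually uses the index is the query planner's decision, so this sketch only reproduces the shape of the workload, not its access paths.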
9
Results: Execution time breakdown
10
Results: Memory stalls
11
Results: Memory stalls
An L1 D-cache miss that hits in the L2 cache incurs low latency (usually overlapped with other computation)
Few ITLB misses (few instruction pages)
L2 instruction misses are few compared to L1 I-cache misses
L2 data cache misses: 40%-90% of the total
L1 I-cache misses: 20% of the total
  o difficult to overlap: they cause a bottleneck in the pipeline
  o L1 caches are not expected to grow, since a larger L1 would slow down the processor clock
  o solution: store frequently accessed instructions together
  o larger data records cause more L1 I-cache misses (inclusion with L2, interrupts due to context switching)
12
Results: Branch mispredictions
Serial bottleneck in the pipeline
Branch instructions: 20% of the total instructions
Record size and selectivity do not cause any variation
Branch Target Buffer (BTB): stores the targets of the most recently executed branches
A larger BTB improves the BTB miss rate
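The effect of BTB size on the miss rate can be illustrated with a toy simulation. This is a minimal sketch that assumes a direct-mapped buffer indexed by branch address modulo its size. Real BTBs differ in associativity and indexing, and the function name `btb_miss_rate` and the trace are hypothetical. It models BTB misses only, not full branch prediction.

```python
def btb_miss_rate(trace, num_entries):
    """Simulate a direct-mapped BTB; return the fraction of lookups that miss.

    trace: iterable of (branch_address, target_address) pairs.
    A lookup misses when the slot holds no entry for this branch address.
    """
    table = [None] * num_entries  # each slot: (branch_address, target) or None
    misses = 0
    lookups = 0
    for branch, target in trace:
        slot = branch % num_entries
        entry = table[slot]
        lookups += 1
        if entry is None or entry[0] != branch:
            misses += 1
        table[slot] = (branch, target)  # remember the most recent target
    return misses / lookups if lookups else 0.0


# Hypothetical trace: 64 distinct branches visited round-robin, 10 passes.
trace = [(addr, addr + 100) for _ in range(10) for addr in range(64)]
small = btb_miss_rate(trace, 16)
large = btb_miss_rate(trace, 64)
print(small, large)
```

With 16 entries, four branches compete for every slot and each lookup misses. With 64 entries, only the first pass misses.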
13
Results: Branch mispredictions
Tightly connected to instruction stalls (mispredictions affect instruction prefetching)
14
Results: Resource stalls
Dependency stalls are the most important (low instruction-level parallelism)
15
Storage models
NSM (N-ary Storage Model)
16
Storage models - NSM
Stores records contiguously in slotted disk pages
Records start at the beginning of each disk page
Most query operators access only a small fraction of each record
Loading the cache with useless data wastes bandwidth and forces replacement of useful information
17
Storage models
DSM (Decomposition Storage Model)
18
Storage models - DSM
Partitions an n-attribute relation into n sub-relations
Saves I/O
Increases main memory utilization
Record reconstruction is expensive
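The trade-off between cheap single-attribute scans and costly record reconstruction can be seen in a toy decomposition. This is a minimal sketch that assumes each sub-relation stores (record id, value) pairs and that the record id is simply the record's position. The function names and sample data are hypothetical.

```python
def decompose(records, attributes):
    """Split row-wise records into one sub-relation per attribute.

    Each sub-relation is a list of (record_id, value) pairs; record_id is the
    position of the record and serves as the key used for reconstruction.
    """
    return {
        attr: [(rid, rec[i]) for rid, rec in enumerate(records)]
        for i, attr in enumerate(attributes)
    }


def scan(sub_relations, attr):
    """Read a single attribute: touches only that attribute's sub-relation."""
    return [value for _, value in sub_relations[attr]]


def reconstruct(sub_relations, attributes, rid):
    """Rebuild one full record: needs a lookup in every sub-relation."""
    return tuple(dict(sub_relations[attr])[rid] for attr in attributes)


# Hypothetical three-attribute relation.
attributes = ["a1", "a2", "a3"]
records = [(1, 10, 100), (2, 20, 200), (3, 30, 300)]
subs = decompose(records, attributes)
print(scan(subs, "a3"))
print(reconstruct(subs, attributes, 1))
```

A scan of `a3` reads one sub-relation, while `reconstruct` has to visit all n of them, which is the join cost the slide refers to.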
19
Storage models
PAX (Partition Attributes Across)
20
Storage models - PAX
Stores the same data on each page as NSM
Groups all the values of each attribute together on a minipage (inter-record spatial locality)
A sequential scan fully utilizes cache resources
Implementing PAX on a DBMS that uses NSM requires changes only to the page-level data manipulation code
21
Storage models - PAX design
Each page is partitioned into n minipages
Fixed-length attributes
Variable-length attributes
The same amount of space as NSM
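The minipage idea can be sketched as a small in-memory structure. This is a minimal illustration covering fixed-length attributes only. It ignores page headers, presence bits and variable-length offsets, and the class name `PaxPage` and its methods are hypothetical, not the layout code of any real system.

```python
class PaxPage:
    """Toy PAX page for fixed-length attributes: one minipage per attribute."""

    def __init__(self, num_attributes):
        self.minipages = [[] for _ in range(num_attributes)]

    def insert(self, record):
        # The i-th value of the record goes to the i-th minipage.
        if len(record) != len(self.minipages):
            raise ValueError("record does not match the page schema")
        for minipage, value in zip(self.minipages, record):
            minipage.append(value)
        return len(self.minipages[0]) - 1  # position of the record in the page

    def scan_attribute(self, attr_index):
        # A scan of one attribute reads a single minipage contiguously.
        return list(self.minipages[attr_index])

    def record(self, position):
        # Reconstruction stays inside the page: same offset in each minipage.
        return tuple(minipage[position] for minipage in self.minipages)


page = PaxPage(3)
for rec in [(1, 10, 100), (2, 20, 200), (3, 30, 300)]:
    page.insert(rec)
print(page.scan_attribute(1))
print(page.record(2))
```

Unlike a full decomposition into separate sub-relations, all values of a record stay on the same page, so rebuilding a record never leaves the page.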
22
NSM record structure
NSM: fixed-length attribute values are stored first
PAX: no need for a slot table at the end of a page
NSM takes 4% more storage
23
Data manipulation
Bulk-loading and insertions
  o Variable-length values: minipage boundaries may need to be adjusted (minipage sizes recalculated)
Updates
  o Variable-length values may stretch or shrink the record: page reorganization (NSM), minipage-level reorganization (PAX)
Deletions
  o PAX: minipage contents are reorganized to minimize fragmentation, so cache utilization is not affected
  o NSM: the deleted record is marked, leaving free space for future insertions
24
Results
25
Results
DSM: sensitive to the number of attributes in the query
  o performs well when the query uses fewer than 10% of the attributes
NSM and PAX: stable performance as the number of attributes in the query increases
PAX has better cache behavior (1 miss per n records, where n is the number of attribute values that fit in a cache line)
26
Results
L1 data cache miss penalty is small (10 processor cycles)
L2 cache miss stall: 70-80 cycles
Overall processor stall time is 75% less with PAX (4 attribute values fit in one cache line/block)
PAX brings only useful data into the cache (it occupies less space and does not evict data or instructions that will be useful later)
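The cache-line argument can be reproduced with a small counting model. This is a rough sketch under stated assumptions: records and minipages start on a cache-line boundary, page headers are ignored, and the sizes (8-byte attribute, 128-byte record, 32-byte line) are hypothetical values picked so that 4 attribute values fit in one line. It counts distinct lines touched and is not a measurement.

```python
import math


def lines_touched_nsm(num_records, record_size, attr_offset, attr_size, line_size):
    """Distinct cache lines read when scanning one attribute of contiguous records."""
    lines = set()
    for r in range(num_records):
        start = r * record_size + attr_offset
        end = start + attr_size - 1
        lines.update(range(start // line_size, end // line_size + 1))
    return len(lines)


def lines_touched_pax(num_records, attr_size, line_size):
    """Distinct cache lines read when the attribute values are packed in a minipage."""
    return math.ceil(num_records * attr_size / line_size)


# Hypothetical sizes: 1000 records, 128-byte records, 8-byte attribute, 32-byte lines.
nsm = lines_touched_nsm(1000, record_size=128, attr_offset=8, attr_size=8, line_size=32)
pax = lines_touched_pax(1000, attr_size=8, line_size=32)
print(nsm, pax, nsm / pax)
```

Under these assumptions NSM touches one line per record and PAX one line per four records. Multiplying the difference by an L2 miss penalty of 70-80 cycles gives a feel for where the stall-time gap comes from.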
27
Sensitivity Analysis
28
Sensitivity Analysis
Selectivity maintained at 50%
DSM even slower than NSM
PAX is insensitive to changes in query selectivity
NSM incurs more data stalls as more records qualify
PAX incurs about 4 times fewer data cache misses than NSM when scanning records to apply a predicate to an attribute
29
Sensitivity Analysis
30
Context-switching
Uncontrolled context-switching can lead to poor performance
31
Problems in the current design
Thread-based execution model (pool of threads): poor cache performance
Too many threads waste resources, too few restrict concurrency (no single preallocated number of worker threads works well)
Context-switching in the middle of a logical operation evicts a large working set from the cache
Round-robin thread scheduling does not exploit cache contents that are common to a set of threads
32
A staged approach
33
A staged approach
Each stage has its own queue and thread support, and communicates and interacts with other stages
  o New queries queue up in the first stage
  o Each query is encapsulated into a packet (each packet carries its state and private data)
  o The packet passes through the 5 stages
Inside the execution engine, a query can issue multiple packets (parallelism)
34
Benefits of staged DBMS design
Each stage allocates worker threads based on its functionality and I/O frequency (not on the number of concurrent clients), which makes tuning easy
A stage contains DBMS code with one or more logical operators
The thread scheduler repeatedly executes tasks queued up in the same stage: stage affinity to the processor caches
Shared-memory systems: a query's state and private data remain in one copy as the packets are routed through different processors
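Stage affinity can be illustrated by counting how often the processor moves from one stage's code to another. This is a toy, single-threaded sketch, not the scheduler of a staged DBMS. The five stage names are an assumption (the slides only say there are 5 stages), the per-query baseline is a simplification of thread-pool execution, and the number of stage switches is only a proxy for cache reloads.

```python
from collections import deque

# Assumed stage names; the slides do not list them.
STAGES = ["connect", "parse", "optimize", "execute", "disconnect"]


def run_staged(num_queries):
    """Drain each stage's queue fully before moving to the next stage."""
    queues = {stage: deque() for stage in STAGES}
    queues[STAGES[0]].extend({"query": q, "trace": []} for q in range(num_queries))
    order = []  # sequence of (stage, query) executions
    for i, stage in enumerate(STAGES):
        while queues[stage]:
            packet = queues[stage].popleft()
            packet["trace"].append(stage)  # the packet carries its own state
            order.append((stage, packet["query"]))
            if i + 1 < len(STAGES):
                queues[STAGES[i + 1]].append(packet)
    return order


def run_per_query(num_queries):
    """Baseline: each query runs all stages before the next query starts."""
    return [(stage, q) for q in range(num_queries) for stage in STAGES]


def stage_switches(order):
    """Count how often consecutive executions belong to different stages."""
    return sum(1 for a, b in zip(order, order[1:]) if a[0] != b[0])


staged = stage_switches(run_staged(4))
baseline = stage_switches(run_per_query(4))
print(staged, baseline)
```

With 4 queries the staged order changes stage 4 times, while the per-query order changes stage 19 times, so each stage's code and data are reused across more consecutive tasks.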
35
References
A. Ailamaki, D.J. DeWitt, M.D. Hill, and D.A. Wood. DBMSs on a Modern Processor: Where Does Time Go? In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), Edinburgh, UK, September 1999.
A. Ailamaki, D.J. DeWitt, and M.D. Hill. Data Page Layouts for Relational Databases on Deep Memory Hierarchies. The VLDB Journal 11(3), 2002.
S. Harizopoulos and A. Ailamaki. A Case for Staged Database Systems. CIDR 2003.