Databases 2001-2002, Ευαγγελία Πιτουρά: Query Optimization

1. Query Optimization

2. Example

Schema:
emp(name, age, sal, dno)
dept(dno, dname, floor, budget, mgr, ano)
acnt(ano, type, balance, bno)
bank(bno, bname, address)

Query:
select name, floor
from emp, dept
where emp.dno = dept.dno and sal > 100K

3. Example

Parameter                              Value
Number of emp pages                    20000
Number of emp tuples
Number of emp tuples with sal > 100K   10
Number of dept pages                   10
Number of dept tuples                  100
Indices of emp                         clustered B+ tree on emp.sal (3 levels deep)
Indices of dept                        clustered hashing on dept.dno (average bucket length of 1.2 pages)
Number of buffer pages                 3
Cost of one disk access                20 ms

4. Example: Plan 1

Use the B+ tree to find all tuples of emp that satisfy the selection. For each one, use the hashing index to find the corresponding dept tuples (nested loops, using the index on both relations).

Cost: ( * 1.2) blocks * 20 ms/block = 0.32 sec

5. Example: Plan 2

For each dept page, scan the entire emp relation. If an emp tuple agrees on the dno attribute with a tuple on the dept page and satisfies the selection on emp.sal, then the emp-dept tuple pair appears in the result (page-level nested loops, using no index).

Cost: ( * 20000) blocks * 20 ms/block ~ 1 h

6. Example: Plan 3

For each dept tuple, scan the entire emp relation and store all emp-dept pairs. Then scan this set and, for each pair, check whether it has the same values in the two dno attributes and satisfies the selection on emp.sal (tuple-level formation of the cross product, with a subsequent scan to test the join and the selection).

Cost: (3 * 100 * 20000) blocks * 20 ms/block > 1 day
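
The arithmetic behind the three plan costs can be checked with a short script. This is only a sketch: the Plan 1 formula on the slide is incomplete, so the script merely recovers the block count implied by the stated 0.32 sec, and for Plan 2 it assumes that the factor missing from the formula is the number of dept pages (10). The names `blocks_to_seconds`, `plan1_blocks`, and so on are illustrative.

```python
# Figures taken from the parameter slide.
EMP_PAGES = 20_000
DEPT_PAGES = 10
DEPT_TUPLES = 100
DISK_ACCESS_MS = 20  # cost of one disk access, in milliseconds


def blocks_to_seconds(blocks):
    """Convert a number of block accesses to seconds."""
    return blocks * DISK_ACCESS_MS / 1000


# Plan 1: the slide gives only the total (0.32 sec), so recover the block count.
plan1_blocks = 320 // DISK_ACCESS_MS

# Plan 2: ASSUMPTION: the missing factor is the number of dept pages.
plan2_blocks = DEPT_PAGES * EMP_PAGES

# Plan 3: formula exactly as printed on the slide.
plan3_blocks = 3 * DEPT_TUPLES * EMP_PAGES

for name, blocks in [("Plan 1", plan1_blocks),
                     ("Plan 2", plan2_blocks),
                     ("Plan 3", plan3_blocks)]:
    secs = blocks_to_seconds(blocks)
    print(f"{name}: {blocks} blocks, {secs:.2f} s = {secs / 3600:.2f} h")
```

Under these assumptions Plan 2 comes to 4000 seconds (a little over an hour) and Plan 3 to 120000 seconds (about 1.4 days), which agrees with the rounded figures on the slides.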

7. Overview

[Figure: query flow through a DBMS. Components: Query Parser, Query Optimizer, Code Generator/Interpreter, Query Processor. Representations: query language (SQL), relational calculus, relational and physical algebra, record-at-a-time calls. Other labels: embedded queries, compile time, run time.]

8. Overview of Query Optimizer

Rewriting stage (declarative):
- Rewriter: applies transformations (static).

Planning stage (procedural):
- Planner: examines all possible plans for each query produced in the previous stage, using a search strategy to examine the space of execution plans.
- Algebraic Space: the execution orders to be considered by the planner.
- Method-Structure Space: the implementation choices for the execution of each ordered series of actions.
- Cost Model: specifies the arithmetic formulas used to estimate the cost of execution plans.
- Size-Distribution Estimator

9. Overview of Query Optimizer

[Figure: query optimizer architecture. Modules: Rewriter, Planner, Algebraic Space, Method-Structure Space, Cost Model, Size-Distribution Estimator.]

10. Algebraic Space

A select-project-join (SPJ) query is represented as a tree. The number of possible trees is enormous.

11. Algebraic Space

select name, floor
from emp, dept
where emp.dno = dept.dno and sal > 100K

Trees of Plan 1, Plan 2, Plan 3 (pg 10)

12. Algebraic Space: Restriction 1

Selections and projections are processed on the fly and almost never generate intermediate relations. Selections are processed as relations are accessed for the first time. Projections are processed as the results of other operators are generated.

Note: P1 satisfies R1.

R1 eliminates only suboptimal query trees; thus the algebraic space module specifies only alternative query trees with join operations, with selections and projections being implicit.

13. Algebraic Space: Restriction 2

Cross products are never formed, unless the query itself asks for them. Relations are always combined through the joins in the query.

The remaining alternatives come from two properties of joins (here R1, R2, R3 denote relations, not restrictions):
- R1 join R2 = R2 join R1: which relation is the inner and which is the outer.
- (R1 join R2) join R3 = R1 join (R2 join R3): the order in which joins are executed.

The number of alternatives is still large, so the space needs to be restricted further.

14. Algebraic Space

select name, floor, balance
from emp, dept, acnt
where emp.dno = dept.dno and dept.ano = acnt.ano

3 Trees (pg 11)

R2 almost always eliminates suboptimal query trees; thus the algebraic space module specifies only alternative query trees that do not involve cross products.

15. Algebraic Space: Restriction 3

The inner operand of each join is a database relation, never an intermediate result.

select name, floor, balance, address
from emp, dept, acnt, bank
where emp.dno = dept.dno and dept.ano = acnt.ano and acnt.bno = bank.bno

3 Trees (pg 13)

T1 satisfies R3.

16. Algebraic Space: Why left-deep

- Having original database relations as inners increases the use of any pre-existing indices.
- Having intermediate relations as outers allows sequences of nested-loops joins to be executed in a pipelined fashion (although right-deep trees favor sequences of hash joins).

Tree shapes:
- left-deep: the inner of every join is a database relation
- right-deep: the outer of every join is a database relation
- bushy: at least one join is between two intermediate results

17. Algebraic Space

R3 significantly reduces the number of alternative join trees; thus the algebraic space module of the typical query optimizer specifies only join trees that are left-deep.

In summary, typical query optimizers apply restrictions R1, R2 and R3 to reduce the size of the space they explore.
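
To get a feel for how much restrictions R2 and R3 shrink the space, the sketch below enumerates the left-deep join orders for the four-relation query (emp, dept, acnt, bank) and keeps only those that never form a cross product. The representation is an assumption made for illustration: a left-deep tree is written as a tuple (r1, r2, r3, r4) meaning ((r1 join r2) join r3) join r4, and `JOIN_EDGES` and `left_deep_orders` are hypothetical names.

```python
from itertools import permutations

# Join predicates of the four-relation query, as undirected edges.
JOIN_EDGES = {("emp", "dept"), ("dept", "acnt"), ("acnt", "bank")}


def joined(a, b):
    """True if the query contains a join predicate between relations a and b."""
    return (a, b) in JOIN_EDGES or (b, a) in JOIN_EDGES


def left_deep_orders(relations):
    """All left-deep join orders in which no join is a cross product.

    Each relation after the first must share a join predicate with at least
    one relation that is already part of the intermediate result.
    """
    orders = []
    for order in permutations(relations):
        connected = all(
            any(joined(order[i], earlier) for earlier in order[:i])
            for i in range(1, len(order))
        )
        if connected:
            orders.append(order)
    return orders


relations = ["emp", "dept", "acnt", "bank"]
valid = left_deep_orders(relations)
print(len(valid), "of", len(list(permutations(relations))), "left-deep orders avoid cross products")
for order in valid:
    print(" join ".join(order))
```

For this chain-shaped query, 8 of the 24 possible left-deep orders survive. Bushy and right-deep trees are not generated at all, which is the effect of R3.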

18. Planner

The planner explores the set of alternative execution plans, as specified by the algebraic space and the method-structure space, and finds the cheapest one, as determined by the cost model and the size-distribution estimator.

19. Planner: Dynamic Programming

A dynamically pruning, exhaustive search algorithm: it constructs all alternative join trees (that satisfy R1-R3) by iterating on the number of relations joined so far, always pruning trees that are known to be suboptimal.

Merge scan: if an input is already sorted on its join attribute, the sorting step may be skipped. The planner must therefore take into account the sorted order (if any) in which each result comes out.

Interesting orders: orders of intermediate results on any relation attributes that participate in joins.

20. Planner: Dynamic Programming, Step 1

For each relation in the query, all possible ways to access it (i.e., via all existing indices plus a sequential scan) are obtained.

These partial (single-relation) plans are partitioned into equivalence classes based on any interesting order in which they produce their result.

The cost of each plan is estimated (by the cost model module) and the cheapest plan in each equivalence class is retained (except for the plan of the no-order class, which is dropped if it is not the cheapest overall).

21. Planner: Dynamic Programming, Step 2

For each pair of relations joined in the query, all possible ways to evaluate their join using the relation access paths retained after Step 1 are obtained. These partial (two-relation) plans are partitioned and pruned as above.

22. Planner: Dynamic Programming, Step i

For each set of i-1 relations joined in the query, the cheapest plans to join them for each interesting order are known from the previous step.

For each such set (of i-1 relations joined), all possible ways to join one more relation with it without creating a cross product are evaluated.

For each set of i relations joined, all generated (partial) plans are partitioned and pruned as before.

23. Planner: Dynamic Programming, Step N

All possible plans to answer the query (the unique set of N relations in the query) are generated from the plans retained in the previous step. The cheapest plan is the final output of the optimizer, to be used to process the query.
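
Steps 1 through N can be sketched in a few lines. The version below is deliberately simplified and is not the algorithm exactly as described on the slides: it ignores access paths and interesting orders, so it keeps a single cheapest plan per set of relations. The cardinalities, selectivities, and the cost function (sum of estimated intermediate result sizes) are hypothetical values chosen only so the example runs.

```python
from fractions import Fraction
from itertools import combinations

# HYPOTHETICAL cardinalities and join selectivities, for illustration only.
CARD = {"emp": 1000, "dept": 100, "acnt": 500, "bank": 20}
SEL = {
    frozenset({"emp", "dept"}): Fraction(1, 100),
    frozenset({"dept", "acnt"}): Fraction(1, 500),
    frozenset({"acnt", "bank"}): Fraction(1, 20),
}


def size(rels):
    """Estimated number of tuples in the join of the relations in `rels`."""
    n = Fraction(1)
    for r in rels:
        n *= CARD[r]
    for edge, sel in SEL.items():
        if edge <= rels:  # both endpoints of the predicate are present
            n *= sel
    return n


def best_left_deep_plan(relations):
    """Return (cost, join order) of the cheapest left-deep, cross-product-free plan."""
    # Step 1: one (trivial) plan per single relation.
    best = {frozenset([r]): (Fraction(0), (r,)) for r in relations}
    # Step i: extend retained plans for i-1 relations by one more relation.
    for i in range(2, len(relations) + 1):
        for subset in combinations(relations, i):
            s = frozenset(subset)
            candidates = []
            for inner in subset:
                outer = s - {inner}
                if outer not in best:
                    continue  # outer cannot be built without a cross product
                if not any(frozenset({inner, o}) in SEL for o in outer):
                    continue  # joining `inner` would be a cross product
                cost, order = best[outer]
                candidates.append((cost + size(s), order + (inner,)))
            if candidates:
                best[s] = min(candidates)  # prune: keep only the cheapest
    return best[frozenset(relations)]


cost, order = best_left_deep_plan(["emp", "dept", "acnt", "bank"])
print(cost, order)
```

Adding interesting orders would mean keeping one entry per (set of relations, order) pair instead of one entry per set, which is where the extra storage mentioned in the complexity discussion comes from.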

24. Planner: Dynamic Programming

- Finds the optimal plan among those satisfying restrictions R1-R3.
- In general, exponential in the number of joins (N), since in the worst case all viable partial plans must be stored in each step.
- In practice, usually O(N^3).
- Many systems limit the number of joins (~15).

See the detailed example in the paper.

25. Planner: Randomized Algorithms

Randomized algorithms (algorithms that flip coins to make decisions) operate by searching a graph whose nodes are all the alternative execution plans that can be used to answer a query. Each node has a cost associated with it, and the goal of the algorithm is to find a node with the globally minimum cost.

26. Planner: Randomized Algorithms

Randomized algorithms perform random walks in the graph via a series of moves.
- The nodes that can be reached in one move from a node S are called the neighbors of S.
- A move is uphill (resp. downhill) if the cost of the source node is lower (resp. higher) than the cost of the destination node.
- A node is a global minimum if it has the lowest cost among all nodes.
- A node is a local minimum if, in all paths starting at the node, any downhill move comes after at least one uphill move.

27. Planner: Randomized Algorithms

Simulated Annealing performs a continuous random walk, always accepting downhill moves and accepting uphill moves with some probability, trying to avoid being caught in a high-cost local minimum. It returns the node with the lowest cost visited.

Iterative Improvement performs a large number of local optimizations. Each one starts at a random node and repeatedly accepts random downhill moves until it reaches a local minimum. It returns the local minimum with the lowest cost found.
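
Both strategies can be illustrated on a toy search graph. In this sketch, a node is a left-deep join order, a move swaps two relations in the order, and the cost of a node is the sum of its estimated intermediate result sizes. All of these choices are assumptions made for the example, as are the cardinalities, selectivities, and annealing parameters (`temperature`, `cooling`, `steps`); the slides do not prescribe them.

```python
import math
import random
from fractions import Fraction
from itertools import combinations

# HYPOTHETICAL statistics, for illustration only.
CARD = {"emp": 1000, "dept": 100, "acnt": 500, "bank": 20}
SEL = {
    frozenset({"emp", "dept"}): Fraction(1, 100),
    frozenset({"dept", "acnt"}): Fraction(1, 500),
    frozenset({"acnt", "bank"}): Fraction(1, 20),
}


def size(rels):
    """Estimated result size; relations with no predicate form a cross product."""
    n = Fraction(1)
    for r in rels:
        n *= CARD[r]
    for edge, sel in SEL.items():
        if edge <= rels:
            n *= sel
    return n


def cost(order):
    """Sum of the estimated sizes of all intermediate results of a join order."""
    return sum(size(frozenset(order[:k])) for k in range(2, len(order) + 1))


def neighbors(order):
    """Nodes reachable in one move: swap any two positions of the order."""
    result = []
    for i, j in combinations(range(len(order)), 2):
        swapped = list(order)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        result.append(tuple(swapped))
    return result


def iterative_improvement(relations, rng, restarts=10):
    """Run several local optimizations and return the best local minimum found."""
    best = None
    for _ in range(restarts):
        current = tuple(rng.sample(relations, len(relations)))  # random start
        while True:
            downhill = [n for n in neighbors(current) if cost(n) < cost(current)]
            if not downhill:
                break  # no cheaper neighbor: local minimum reached
            current = rng.choice(downhill)  # accept a random downhill move
        if best is None or cost(current) < cost(best):
            best = current
    return best


def simulated_annealing(start, rng, temperature=1000.0, cooling=0.95, steps=200):
    """Random walk that sometimes accepts uphill moves; returns the best node visited."""
    current = best = start
    for _ in range(steps):
        candidate = rng.choice(neighbors(current))
        delta = float(cost(candidate) - cost(current))
        # Non-uphill moves are always accepted; uphill moves with probability e^(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
            if cost(current) < cost(best):
                best = current
        temperature *= cooling  # uphill moves become less likely over time
    return best


rng = random.Random(0)
relations = ["emp", "dept", "acnt", "bank"]
ii_plan = iterative_improvement(relations, rng)
sa_plan = simulated_annealing(tuple(relations), rng)
print("iterative improvement:", ii_plan, cost(ii_plan))
print("simulated annealing:  ", sa_plan, cost(sa_plan))
```

Iterative improvement here inspects every neighbor before declaring a local minimum, which is affordable only because the toy graph has 24 nodes; on a realistic plan space the neighbors would have to be sampled instead.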

28. Planner

For up to 10 joins, dynamic programming works better than the randomized algorithms.

29. Size-Distribution Estimator

Given a query, it estimates the sizes of the results of (sub)queries and the frequency distributions of values in attributes of these results.

30. Size-Distribution Estimator: Example

Name        Salary   Department
Zeus        100K     General Manager
Poseidon    80K      Defense
Pluto       80K      Justice
Aris        50K      Defense
Ermis       60K      Commerce
Apollo      60K      Energy
Hefestus    50K      Energy
Hera        90K      General Manager
Athena      70K      Education
Aphrodite   60K      Domestic Affairs
Demeter     60K      Agriculture
Hestia      50K      Domestic Affairs
Artemis     60K      Energy

Department         Frequency
General Manager    2
Defense            2
Education          1
Domestic Affairs   2
Agriculture        1
Commerce           1
Justice            1
Energy             3

Frequency distributions of combinations of an arbitrary number of attributes can be discussed similarly.

Attribute value independence assumption
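
The frequency table, and what the attribute value independence assumption implies for a combination of attributes, can be reproduced directly from the example relation. This is a sketch: the variable names are illustrative, and the estimate f(salary) * f(department) / N is simply what independence of the two attributes gives for a relation of N tuples.

```python
from collections import Counter

# The example relation from the slide: (name, salary, department).
EMPLOYEES = [
    ("Zeus", "100K", "General Manager"), ("Poseidon", "80K", "Defense"),
    ("Pluto", "80K", "Justice"), ("Aris", "50K", "Defense"),
    ("Ermis", "60K", "Commerce"), ("Apollo", "60K", "Energy"),
    ("Hefestus", "50K", "Energy"), ("Hera", "90K", "General Manager"),
    ("Athena", "70K", "Education"), ("Aphrodite", "60K", "Domestic Affairs"),
    ("Demeter", "60K", "Agriculture"), ("Hestia", "50K", "Domestic Affairs"),
    ("Artemis", "60K", "Energy"),
]

# Single-attribute frequency distributions.
dept_freq = Counter(dept for _, _, dept in EMPLOYEES)
sal_freq = Counter(sal for _, sal, _ in EMPLOYEES)

# Joint frequency distribution of (salary, department).
joint_freq = Counter((sal, dept) for _, sal, dept in EMPLOYEES)

# Under attribute value independence, the joint frequency is estimated
# from the two single-attribute distributions alone.
n = len(EMPLOYEES)
actual = joint_freq[("60K", "Energy")]
estimated = sal_freq["60K"] * dept_freq["Energy"] / n

print(dict(dept_freq))
print(f"(60K, Energy): actual {actual}, estimated under independence {estimated:.2f}")
```

For the pair (60K, Energy) the relation contains 2 tuples, while the independence estimate is 5 * 3 / 13, roughly 1.15, which shows the kind of error the assumption can introduce.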

31. Size-Distribution Estimator: Histograms

In a histogram on attribute a of relation R, the domain of a is partitioned into buckets, and a uniform distribution is assumed within each bucket. That is, for any bucket b in the histogram, if the value $u_i \in b$, then the frequency $f_i$ of $u_i$ is approximated by $\sum_{u_j \in b} f_j / |b|$.

- Trivial histogram: a single bucket, i.e., a uniform distribution over the whole domain.
- Any subset of the attribute's domain may form a bucket.

34. Size-Distribution Estimator: Histograms

Department         Frequency in Bucket   Approximate Frequency
General Manager    2                     1.75
Defense            2                     1.5
Education          1                     1.75
Domestic Affairs   2                     1.5
Agriculture        1                     1.5
Commerce           1                     1.5
Justice            1                     1.75
Energy             3                     1.75

Equi-width: the number of consecutive attribute values, or the size of the range of attribute values, associated with each bucket is the same.
First bucket: 4 values, A-D. Second bucket: 4 values, E-Z.

35. Size-Distribution Estimator: Histograms

Department         Frequency in Bucket   Approximate Frequency
General Manager    2                     1.33
Defense            2                     1.33
Education          1                     1.33
Domestic Affairs   2                     2.5
Agriculture        1                     1.33
Commerce           1                     1.33
Justice            1                     1.33
Energy             3                     2.5

Serial: the frequencies of the attribute values associated with each bucket are either all greater or all less than the frequencies of the attribute values associated with any other bucket.
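
The approximate frequencies in both tables follow from averaging the true frequencies inside each bucket. The sketch below recomputes them; the function names are illustrative, the serial buckets are copied from the table rather than derived, and the serial check treats equal frequencies in different buckets as acceptable, which is an assumption needed because the example contains ties.

```python
# True department frequencies from the example relation.
FREQ = {
    "General Manager": 2, "Defense": 2, "Education": 1, "Domestic Affairs": 2,
    "Agriculture": 1, "Commerce": 1, "Justice": 1, "Energy": 3,
}


def approximate(buckets):
    """Replace each frequency by the average frequency of its bucket."""
    approx = {}
    for bucket in buckets:
        average = sum(FREQ[value] for value in bucket) / len(bucket)
        for value in bucket:
            approx[value] = average
    return approx


def equi_width(num_buckets):
    """Split the sorted attribute values into buckets of equal size.

    Assumes the number of values is divisible by num_buckets.
    """
    values = sorted(FREQ)
    width = len(values) // num_buckets
    return [values[i:i + width] for i in range(0, len(values), width)]


def is_serial(buckets):
    """True if, for every pair of buckets, one holds only the lower frequencies.

    ASSUMPTION: ties between buckets are tolerated (<= instead of <).
    """
    ranges = [(min(FREQ[v] for v in b), max(FREQ[v] for v in b)) for b in buckets]
    return all(
        hi_a <= lo_b or hi_b <= lo_a
        for i, (lo_a, hi_a) in enumerate(ranges)
        for (lo_b, hi_b) in ranges[i + 1:]
    )


equi_buckets = equi_width(2)  # A-D and E-Z
serial_buckets = [
    ["Domestic Affairs", "Energy"],
    ["General Manager", "Defense", "Education", "Agriculture", "Commerce", "Justice"],
]

equi_approx = approximate(equi_buckets)
serial_approx = approximate(serial_buckets)
for dept in FREQ:
    print(f"{dept:18} {FREQ[dept]}  {equi_approx[dept]:.2f}  {serial_approx[dept]:.2f}")
```

For Energy, the value with the highest true frequency (3), the serial histogram gives 2.5 while the equi-width histogram gives 1.75.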

36. Parallel Databases

- Intra-operator parallelism
- Inter-operator parallelism (pipelining and independent parallelism)

37. Distributed Databases

- Communication cost
- Various forms of joins

38. Advanced Query Optimization

Semantic query optimization: use integrity constraints to rewrite a given query.

Global query optimization: multiple queries become available for optimization at the same time (queries with unions, multiple concurrent users, etc.). The goal is to derive a query plan that is optimal for the execution of all of them as a group.