Extended Operators and Bags vs. Sets
CS 4604: Introduction to Database Management Systems
B. Aditya Prakash
Lecture #6: Extended Operators in Relational Algebra
Prakash 2013, VT CS 4604

Announcements
• Homework 1 is due today.
• Project Assignment 2 has been released.
  – Due on Friday, Feb 22nd, at the beginning of class.
• No class and no office hours on Wednesday, Feb 20th.

Specific Project Guidelines
• We will create an account and a database for each student.
• A database for each project group will be created.
  – The name of the database is the name of your group.
  – Only the members of each project group will be able to access the database for their project.
• A webpage detailing how you can access the database is maintained.
• You can create as many tables within a database as you want.

General Project Guidelines
• The database schema is not something that should change often.
  – Think long and hard about your schema.
  – DROP may be better than ALTER TABLE.
• Do not delete the files containing raw data.
• Read the documentation for the RDBMS you are using.

Bags
• A bag (or multi-set) is like a set, but an element may appear more than once.
• Example: {1,2,1,3} is a bag.
• Example: {1,2,3} is also a bag that happens to be a set.

Why Bags?
• So far, we said RA and SQL operate on sets.
• Real RDBMSs treat relations as bags of tuples.
  – SQL is actually a bag language.
• Performance is one of the main reasons; duplicate elimination is expensive since it typically requires sorting.
  – Some operations, like projection, are much more efficient on bags than on sets.
• If we use bag semantics, we may have to redefine the meaning of each relational algebra operator.

Operations on Bags
• Selection applies to each tuple, so its effect on bags is like its effect on sets.
• Projection also applies to each tuple, but as a bag operator, we do not eliminate duplicates.
• Products and joins are done on each pair of tuples, so duplicates in bags have no effect on how we operate.

Bag Semantics: Projection and Selection
• Projection: process each tuple independently; a tuple might occur multiple times.
• Selection: process each tuple independently…

Bag Union
• An element appears in the union of two bags as many times as the sum of the number of times it appears in each bag.
• R ∪ S: if tuple t appears k times in R and l times in S, then t appears k + l times in R ∪ S.

Bag Intersection
• An element appears in the intersection of two bags the minimum of the number of times it appears in either.
• R ∩ S: if tuple t appears k times in R and l times in S, then t appears min{k, l} times in R ∩ S.

Bag Difference
• An element appears in the difference R − S of bags as many times as it appears in R, minus the number of times it appears in S.
  – But never fewer than 0 times.
• R − S: if tuple t appears k times in R and l times in S, then t appears max{0, k − l} times in R − S.

Bag Semantics: Products and Joins
• Product (×): If a tuple r appears k times in a relation R and a tuple s appears l times in a relation S, then the tuple <r, s> appears kl times in R × S.
• Theta-join and natural join (⋈): Since both can be expressed as applying a selection followed by a projection to a product, use the semantics of selection, projection, and the product.

Extended Operators
• Powerful operators based on the basic relational operators and bag semantics.
• Sorting: convert a relation into a list of tuples.
• Duplicate elimination: turn a bag into a set by eliminating duplicate tuples.
• Grouping: partition the tuples of a relation into groups, based on their values for specified attributes.
• Aggregation: used by the grouping operator and to manipulate/combine attributes.
• Extended projection: projection on steroids.
• Outerjoin: an extension of joins that makes sure every tuple is in the output.

Sorting
• τ_L(R) converts the relation R into a list of tuples, sorted on the attributes in the list L.

Example: Sorting

R =
A  B
1  2
3  4
5  2

τ_B(R) = [(5,2), (1,2), (3,4)]
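
The sorting example can be sketched in Python as follows. This is only an illustration under assumptions: the function name `tau` and the use of a column index instead of an attribute name are invented here, not course notation. Tuples that tie on B may come out in either order, so this sketch returns (1,2) before (5,2), while the slide lists (5,2) first; both are valid results of τ_B(R).

```python
# Relation R from the example, as a list of (A, B) tuples.
R = [(1, 2), (3, 4), (5, 2)]

def tau(relation, key_index):
    """Hypothetical helper: return the tuples as a list ordered by one attribute."""
    # sorted() is stable, so tuples that tie on the key keep their input order.
    return sorted(relation, key=lambda t: t[key_index])

sorted_by_b = tau(R, 1)  # index 1 is attribute B
print(sorted_by_b)
```

The result is a list rather than a set or a bag, which is why the order of the tuples now carries meaning.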

Duplicate Elimination
• δ(R) turns the bag R into a set by eliminating duplicate tuples.

Example: Duplicate Elimination

R =
A  B
1  2
3  4
1  2

δ(R) =
A  B
1  2
3  4
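
A minimal sketch of duplicate elimination, assuming a bag is represented with Python's `collections.Counter` (tuple mapped to multiplicity). The helper name `delta` and the second bag `S` are invented for illustration. The same representation also lets the earlier multiplicity rules for bag union, intersection, and difference be checked, because `Counter`'s `+`, `&`, and `-` operators add counts, take the minimum, and subtract while dropping non-positive counts.

```python
from collections import Counter

# Bag R from the example: (1, 2) occurs twice, (3, 4) once.
R = Counter([(1, 2), (3, 4), (1, 2)])

def delta(bag):
    """Hypothetical helper: keep each distinct tuple exactly once."""
    return set(bag)

# A second bag S, made up for this sketch only.
S = Counter([(1, 2), (5, 6)])

bag_union = R + S         # multiplicities add: k + l
bag_intersection = R & S  # multiplicities take the minimum: min(k, l)
bag_difference = R - S    # max(0, k - l); tuples with count 0 are dropped

print(delta(R))
print(bag_union[(1, 2)], bag_intersection[(1, 2)], bag_difference[(1, 2)])
```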

Extended Projection
• Using the same π_L operator, we allow the list L to contain arbitrary expressions involving attributes, for example:
  – Arithmetic on attributes, e.g., A+B.
  – Duplicate occurrences of the same attribute.

Example: Extended Projection

R =
A  B
1  2
3  4

π_{A+B,A,A}(R) =
A+B  A1  A2
3    1   1
7    3   3
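
The extended projection example can be sketched as below. The function `extended_project` and the representation of tuples as dictionaries are assumptions made for this illustration. Each output column is described by a name and an expression evaluated on one input tuple.

```python
# Relation R from the example; each tuple is a dict keyed by attribute name.
R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]

def extended_project(relation, expressions):
    """Hypothetical helper: evaluate every (name, expression) pair on each tuple.

    Bag semantics: one output tuple per input tuple, duplicates are kept.
    """
    return [{name: fn(t) for name, fn in expressions} for t in relation]

result = extended_project(R, [
    ("A+B", lambda t: t["A"] + t["B"]),  # arithmetic on attributes
    ("A1", lambda t: t["A"]),            # first copy of A
    ("A2", lambda t: t["A"]),            # second copy of A
])
print(result)
```

The two copies of A need distinct output names (A1 and A2 here, as on the slide) so that the result still has unambiguous attributes.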

Aggregation Operators
• Operators that summarize or aggregate the values in a single attribute of a relation.
• Operators are the same in relational algebra and SQL.
• All operators treat a relation as a bag of tuples.
• SUM: computes the sum of a column with numerical values.
• AVG: computes the average of a column with numerical values.
• MIN and MAX:
  – for a column with numerical values, compute the smallest or largest value, respectively.
  – for a column with string or character values, compute the lexicographically smallest or largest value, respectively.
• COUNT: computes the number of values in a column (duplicates are counted).
• In SQL, COUNT(*) counts the number of tuples in a relation.

Example: Aggregation

R =
A  B
1  3
3  4
3  2

SUM(A) = 7
COUNT(A) = 3
MAX(B) = 4
AVG(B) = 3
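
The four results above can be reproduced with a short sketch that treats each column as a Python list, so duplicates are kept as bag semantics require. The variable names are chosen for this illustration only.

```python
# Relation R from the example, as a bag of (A, B) tuples.
R = [(1, 3), (3, 4), (3, 2)]

# Project each column into a list; duplicates (the two 3s in A) are kept.
col_a = [a for a, _ in R]
col_b = [b for _, b in R]

sum_a = sum(col_a)                # SUM(A)
count_a = len(col_a)              # COUNT(A): counts values, duplicates included
max_b = max(col_b)                # MAX(B)
avg_b = sum(col_b) / len(col_b)   # AVG(B), a float in Python

print(sum_a, count_a, max_b, avg_b)
```

Had the column been turned into a set first, SUM(A) would come out as 4 and COUNT(A) as 2, which is why the operators are defined on bags.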

Grouping Operator
• How do we answer the query “Count the number of classes and the total enrollment of the classes each department teaches”?
• Can we answer the query using the operators discussed so far?
• We need to group the tuples of Teach by DeptName and then aggregate within each group.
• Use the grouping operator.

Applying γ_L(R)
• γ_L(R) partitions the tuples of R into groups based on their values for the grouping attributes in L, then applies the aggregation operators in L within each group.

Example: Grouping/Aggregation

R =
A  B  C
1  2  3
4  5  6
1  2  5

γ_{A,B,AVG(C)}(R) = ??

First, group R by A and B:
A  B  C
1  2  3
1  2  5
4  5  6

Then, average C within groups:
A  B  AVG(C)
1  2  4
4  5  6
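
The two steps of the example, grouping and then averaging, can be sketched as follows. The function name `gamma_avg` is invented here, and the sketch hard-codes the grouping attributes (A, B) and the aggregate AVG(C) rather than handling a general list L.

```python
from collections import defaultdict

# Relation R from the example, as a bag of (A, B, C) tuples.
R = [(1, 2, 3), (4, 5, 6), (1, 2, 5)]

def gamma_avg(relation):
    """Hypothetical helper: group by (A, B), then average C within each group."""
    groups = defaultdict(list)
    # Step 1: partition the tuples by their (A, B) values.
    for a, b, c in relation:
        groups[(a, b)].append(c)
    # Step 2: emit one tuple per group with the average of its C values.
    return [(a, b, sum(cs) / len(cs)) for (a, b), cs in groups.items()]

result = gamma_avg(R)
print(result)
```

The output has one tuple per distinct (A, B) pair, matching the final table of the example.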

Outerjoin
• Suppose we join R ⋈_C S.
• A tuple of R that has no tuple of S with which it joins is said to be dangling.
  – Similarly for a tuple of S.
• Outerjoin preserves dangling tuples by padding them with a special NULL symbol in the result.

Example: Outerjoin

R =
A  B
1  2
4  5

S =
B  C
2  3
6  7

(1,2) joins with (2,3), but the other two tuples are dangling.

R OUTERJOIN S =
A     B  C
1     2  3
4     5  NULL
NULL  6  7
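
A minimal sketch of the outerjoin example, assuming the join condition is equality on the shared attribute B and using Python's `None` to stand in for NULL. The function name `full_outerjoin` is made up for this illustration, and the nested loop is chosen for clarity, not efficiency.

```python
R = [(1, 2), (4, 5)]  # tuples (A, B)
S = [(2, 3), (6, 7)]  # tuples (B, C)

def full_outerjoin(r, s):
    """Hypothetical helper: natural outerjoin of r(A, B) and s(B, C) on B."""
    result = []
    matched_s = set()  # positions of S tuples that joined with some R tuple
    for a, b in r:
        partners = [(i, c) for i, (sb, c) in enumerate(s) if sb == b]
        if partners:
            for i, c in partners:
                matched_s.add(i)
                result.append((a, b, c))
        else:
            # Dangling tuple of R: pad the C attribute with NULL.
            result.append((a, b, None))
    for i, (b, c) in enumerate(s):
        if i not in matched_s:
            # Dangling tuple of S: pad the A attribute with NULL.
            result.append((None, b, c))
    return result

result = full_outerjoin(R, S)
print(result)
```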