The relational algebra is a set of operations that
manipulate relations as they are defined in the relational model and as such describes part of the data manipulation aspect
of this data model. Because of their algebraic properties
these operations are often used in database query optimization
as an intermediate representation of a query to which certain
rewrite rules can be applied to obtain a more efficient version of
the query.
The exact set of operations may differ per definition and also
depends on whether the unlabeled relational model (that uses mathematical relations) or the labeled relational model (that uses
the labeled generalization of mathematical relations) is used. We
will assume the labeled case here as this is the most common way
the relational model is defined. That means that we assume that
tuples are partial functions from
attribute names to values. The attribute a of a tuple t is
denoted in this article as t(a).
The Basic Operations
The six basic operations of the algebra are the selection, the
projection, the cartesian product, the set union, the
set difference, and the rename. These six operations are
fundamental in the sense that (1) all other operations can be
expressed as combinations of these operations and (2) none of
these six operations can be omitted without losing expressive
power.
The Selection
The selection is a unary operation that is written as
σaθb(R) or
σaθv(R) where a and b
are attribute names, θ a binary operation in the set {<,
≤, =, >, ≥}, v is a value constant and R is a
relation. The selection σaθb(R)
selects all those tuples in R for which θ holds between
the a and the b attribute. The selection
σaθv(R) selects all those
tuples in R for which θ holds between the a
attribute and the value v.
For an example consider the following tables where the first table
gives the relation Person, the second table gives the result
of σAge≥34(Person) and the third table
gives the result of σAge=Weight(Person).
Person
| Name | Age | Weight
|
| Harry | 34 | 80
|
| Sally | 28 | 64
|
| George | 29 | 70
|
| Helena | 54 | 54
|
| Peter | 34 | 80
|
|
σAge≥34(Person)
| Name | Age | Weight
|
| Harry | 34 | 80
|
| Helena | 54 | 54
|
| Peter | 34 | 80
|
|
σAge=Weight(Person)
| Name | Age | Weight
|
| Helena | 54 | 54
|
|
More formally the semantics of the selection is defined as
follows:
- σaθb(R) = { t : t ∈ R, t(a) θ t(b) }
- σaθv(R) = { t : t ∈ R, t(a) θ v }
The result of the selection is only defined if the attribute names
that it mentions are in the header of the relation that it
operates upon.
The Projection
The projection is a unary operation that is written as
πa1,..,an(R)
where a1,..,an is a set of
attribute names. The result of such a projection is defined as the
set that is obtained when all tuples in R are restricted to
the set {a1,..,an}. For an
example consider the following two tables which are the relation
Person and its projection on the attributes Age and
Weight:
Person
| Name | Age | Weight
|
| Harry | 34 | 80
|
| Sally | 28 | 64
|
| George | 29 | 70
|
| Helena | 54 | 54
|
| Peter | 34 | 80
|
|
πAge,Weight(Person)
| Age | Weight
|
| 34 | 80
|
| 28 | 64
|
| 29 | 70
|
| 54 | 54
|
|
Note that Harry and Peter have the same age and weight, but
since the result is a relation and therefore a set this
combination only appears once in the result.
More formally the semantics of the projection is defined as
follows:
- πa1,..,an(R) = { t[a1,..,an] : t ∈ R }
where t[a1,..,an] is the restriction of the tuple t to the set {a1,..,an}, i.e., t[a1,..,an] = { (a' , v) | (a' , v) ∈ t, a' ∈ a1,..,an} }.
The result of a projection
πa1,..,an(R) is
only defined if {a1,..,an} is a
subset of the header of R.
The Cartesian Product
The cartesian product (also called cross product or
cross join) is a binary operation that is very similar to the
Cartesian product in set theory. It is written as R
× S where R and S are relations. The result of
the cartesian product is the set of all combinations of tuples in
R and S. For an example consider the tables Employee
and Dept and their Cartesian product:
Employee
| Name | EmpId
|
| Harry | 3415
|
| Sally | 2241
|
|
Dept
| DeptName | Manager
|
| Finance | George
|
| Sales | Harriet
|
|
Employee × Dept
| Name | EmpId | DeptName | Manager
|
| Harry | 3415 | Finance | George
|
| Harry | 3415 | Sales | Harriet
|
| Sally | 2241 | Finance | George
|
| Sally | 2241 | Sales | Harriet
|
|
More formally the semantics of the Cartesian product is
defined as follows:
- R × S = { t ∪ s : t ∈ R, s ∈ S }
To ensure that the result of the cartesian product is again a
relation it is required that the headers of R and S are
disjoint, i.e., do not contain the same attribute.
The Set Union
The set union is a binary operation that is written as R ∪
S and is defined as the set union in set theory. For an
example consider the tables Engineer, Manager and their
union:
Engineer
| Name | EmpId
|
| Harry | 3415
|
| Sally | 2241
|
| Harriet | 2202
|
|
Manager
| Name | EmpId
|
| George | 3401
|
| Harriet | 2202
|
|
Engineer ∪ Manager
| Name | EmpId
|
| Harry | 3415
|
| Sally | 2241
|
| George | 3401
|
| Harriet | 2202
|
|
Note that Harriet only appears once in the result because
the result is a set.
Formally, the union is defined as follows:
- R ∪ S = { t : t ∈ R ∨ t ∈ S }
The result of the set union is only defined when the two relations
have the same headers.
The Set Difference
The set difference is a binary operation that is written as R
- S and is defined as the usual set difference in set theory.
For an example consider the relations Engineer, Manager
and their difference:
Engineer
| Name | EmpId
|
| Harry | 3415
|
| Sally | 2241
|
| Harriet | 2202
|
|
Manager
| Name | EmpId
|
| George | 3401
|
| Harriet | 2202
|
|
Engineer - Manager
| Name | EmpId
|
| Harry | 3415
|
| Sally | 2241
|
|
Formally, the semantics of the union is defined as follows:
- R - S = { t : t ∈ R, ¬ t ∈ S }
The result of the set difference is only defined when the two
relations have the same headers.
The Rename
The rename operation is a unary operation that is written as
ρa/b(R) where a and b are
attribute names and R is a relation. The result is identical
to R except that the b field in all tuples is renamed to
an a field. For an example consider the following Employee
relation and its renamed version:
Employee
| Name | EmpId
|
| Harry | 3415
|
| Sally | 2241
|
|
ρEmpName/Name(Employee)
| EmpName | EmpId
|
| Harry | 3415
|
| Sally | 2241
|
|
Formally the semantics of the rename operator is defined as
follows:
- ρa/b(R) = { t[a/b] : t ∈ R }
where t[a/b] is defined as the tuple t with the
b attribute renamed to a, i.e., t[a/b] = {
(c, v) | (c, v) ∈ t, c ≠ b }
∪ { (a, t(b)) }. The result of the rename is only
defined when the attribute a did not appear already in the
header of the operand.
The Expressive Power
These operations are enough to express all queries that can be
expressed in the safe parts of tuple calculus and domain calculus which is
essentially the same as first-order logic.
- insert a sketch of proof here
Other Operations expressible with the Basic Operations
Next to the six basic operations some other operations are also
often included even though they can be expressed by combinations
of the basic operations. This is because they either have
interesting algebraic properties or because they can be
implemented more efficiently than their simulations. These
operations are the set intersection, the natural join and
the division.
The Set Intersection
The set intersection is a binary operation that is written as
R ∩ S and is defined as the usual set intersection in
set theory. For an example consider the tables Engineer,
Manager and their intersection:
Engineer
| Name | EmpId
|
| Harry | 3415
|
| Sally | 2241
|
| Harriet | 2202
|
|
Manager
| Name | EmpId
|
| George | 3401
|
| Harriet | 2202
|
|
Engineer ∩ Manager
| Name | EmpId
|
| Harriet | 2202
|
|
Formally, the union is defined as follows:
- R ∩ S = { t : t ∈ R, t ∈ S }
The result of the set intersection is only defined when the two
relations have the same headers. This operation can be simulated
in the basic operations as follows:
- R ∩ S = R - (R - S)
The Natural Join
The natural join is a binary operation that is written as R
|X| S where R and S are relations. The result of the
natural join is the set of all combinations of tuples in R and
S that are equal on their common attribute names. For an
example consider the tables Employee and Dept and their
natural join:
Employee
| Name | EmpId | DeptName
|
| Harry | 3415 | Finance
|
| Sally | 2241 | Sales
|
| George | 3401 | Finance
|
| Harriet | 2202 | Sales
|
|
Dept
| DeptName | Manager
|
| Finance | George
|
| Sales | Harriet
|
| Production | Charles
|
|
Employee [X] Dept
| Name | EmpId | DeptName | Manager
|
| Harry | 3415 | Finance | George
|
| Sally | 2241 | Sales | Harriet
|
| George | 3401 | Finance | George
|
| Harriet | 2202 | Sales | Harriet
|
|
The natural join is arguably one of the most important
operators since it allows the combination of relations that are
associated by a foreign key. For example, in the above example
there holds probably a foreign key from Employee.DeptName
to Dept.DeptName and then the natural join of Employee
and Dept combines all employees with their departments. Note
that this works because the foreign key holds between columns with
the same name. If this is not the case such as in the foreign key
from Dept.manager to Emp.emp-number then we have
to rename these columns before we take the natural join. Such a
join is sometimes also referred to as an equijoin (see θ-join).
More formally the semantics of the natural join is defined as
follows:
- R |X| S = { t ∪ s : t ∈ R, s ∈ S, fun(t ∪ s) }
where fun(r) is a predicate that is true for a binary relation
r iff r is a functional binary relation. It is
usually required that R and S must have at least one
common attribute, but if this constraint is omitted then in that
special case the natural join becomes exactly the Cartesian
product as defined above.
The natural join can be simulated with the basic operations as
follows. Assume that a1,...,an are the
attribute names unique to R,
b1,...,bm are the attribute names common
to R and S and c1,...,cm are the
attribute unique to S. Furthermore assume that the attribute
names d1,...,dm are neither in R
nor in S. In a first step we can now rename the common
attribute names in S: : S' :=
ρd1/b1(...ρdm/bm(
S)...) Then we take the Cartesian product and select the
tuples that are to be joined: : T :=
σb1=d1(...σbm=dm(R
× S')...) Finally we take a projection to get rid of
the renamed attributes: : U :=
πa1,...,an,b1,...,bm,c1,...,cm(T)
The Division
The division is a binary operation that is written as R
÷ S. The result consists of the restrictions of tuples
in R to the attribute names unique to R, i.e., in the
header of R but not in the header of S, for which it holds
that all their combinations with tuples in S are present in
R. For an example see the tables CompletedTask,
DBProject and their division:
Completed
| Student | Task
|
| Fred | Database1
|
| Fred | Database2
|
| Fred | Compiler1
|
| Eugene | Database1
|
| Eugene | Compiler1
|
| Sara | Database1
|
| Sara | Database2
|
|
DBProject
| Task
|
| Database1
|
| Database2
|
|
Completed ÷ DBProject
| Student
|
| Fred
|
| Sara
|
|
If DBProject contains all the tasks of the Database
project then the result of the division above contains exactly all
the students that have completed the Database project.
More formally the semantics of the division is defined as follows:
- R ÷ S = { t[a1,...,an] : ∀ s ∈ S ( (t[a1,...,an] ∪ s) ∈ R) }
where {a1,...,an} is the set of
attribute names unique to R and
t[a1,...,an] is the restriction of
t to this set. It is usually required that the attribute names
in the header of S are a subset of those of R because
otherwise the result of the operation will always be empty.
The simulation of the division with the basic operations is as
follows. We assume that a1,...,an are
the attribute names unique to R and
b1,...,bm are the attribute names of
S. In the first step we project R on its unique attribute
names and construction all combinations with tuples in S:
- T := πa1,...,an(R) × S
In the next step we subtract R from this
relation:
- U := T - R
Note that in U we have the
combinations that "should have been in R but weren't". So if
we now take the projection on the attribute names unique to R
then we have the restrictions of the tuples in R for which not
all combinations with tuples in S were present in R:
- V := πa1,...,an(U)
So what remains to be done is take the projection of R on its
unique attribute names and subtract those in V:
- W := πa1,...,an(R) - V
The Generalized Selection
The generalized selection is a unary operation that is written as
σφ(R) where φ is a propositional
formula that consists of atoms as allowed in the normal selection
and the logical operators ∧ (and), ∨ (or) and ¬
(negation). This selection selects all those tuples in R for
which φ holds. For an example consider the following tables
where the first table gives the relation Person and the second
the result of σAge≥30 ∧
Weight≤60(Person).
Person
| Name | Age | Weight
|
| Harry | 34 | 80
|
| Sally | 28 | 64
|
| George | 29 | 70
|
| Helena | 54 | 54
|
| Peter | 34 | 80
|
|
σAge≥30∧Weight≤60(Person)
| Name | Age | Weight
|
| Helena | 54 | 54
|
|
More formally the semantics of the generalized selection is
defined as follows:
- σφ(R) = { t : t ∈ R, φ(t) }
The result of the selection is
only defined if the attribute names that it mentions are in the
header of the relation that it operates upon.
The simulation of a generalized selection that is not a
fundamental selection with the fundamental operators is defined by
the following rules:
- σφ∧ψ(R) = σφ(R) ∩ σψ(R)
- σφ∨ψ(R) = σφ(R) ∪ σψ(R)
- σ¬φ(R) = R - σφ(R)
The θ-Join and Equijoin
If we want to combine tuples from two relations where the combination condition is not simply the equality of shared columns then it is conventient to have a more general form of join operator, which is the θ-join. The θ-join is a binary operation that is written as R
|X|aθb S or R
|X|aθv S where a and b
are attribute names, θ a binary operation in the set {<,
≤, =, >, ≥}, v is a value constant and R and S are relations. The result if this
operation consists of all combinations of tuples in R and
S that satisfy the equation. The result of the θ-join is only defined if the headers of S and R are disjoint, i.e., do not contain a common attribute.
The simulation of this operation in the fundamental operations is therefore as follows:
- R |X|φ S = σφ(R × S)
In case the operator θ is the equality operator (=) then this join is also called an equijoin.
The Semijoin
The semijoin is similar to the natural join and written as R
|X S where R and S are relations. The result of the
semijoin is the set of all tuples in R for which there is a tuple in S that is equal on their common attribute names. For an
example consider the tables Employee and Dept and their
natural join:
Employee
| Name | EmpId | DeptName
|
| Harry | 3415 | Finance
|
| Sally | 2241 | Sales
|
| George | 3401 | Finance
|
| Harriet | 2202 | Sales
|
|
Dept
| DeptName | Manager
|
| Sales | Harriet
|
| Production | Charles
|
|
Employee |X Dept
| Name | EmpId | DeptName
|
| Sally | 2241 | Sales
|
| Harriet | 2202 | Sales
|
|
More formally the semantics of the semijoin is defined as
follows:
- R |X S = { t : t ∈ R, s ∈ S, fun(t ∪ s) }
where fun(r) is as in the definition of natural join.
The semijoin can be simulated using the natural join as
follows. Assume that a1,...,an are the
attribute names of R, then it holds that:
- R |X S = πa1,..,an(R |X| S)
Since we can simulate the natural join with the basic operators it follows that this also holds for the semijoin.
Operations for null values
The outer join is a binary operation that combines tuples from two relations. There are three types of outer joins.
Left Outer Join
The left outer join is written as R =X S where R and S are relations. The result of the left outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in R that have no matching tuples in S.
For an consider the tables Employee and Dept and their
left outer join:
Employee
| Name | EmpId | DeptName
|
| Harry | 3415 | Finance
|
| Sally | 2241 | Sales
|
| George | 3401 | Finance
|
| Harriet | 2202 | Sales
|
| Tim | 1123 | Executive
|
|
Dept
| DeptName | Manager
|
| Sales | Harriet
|
| Production | Charles
|
|
Employee =X Dept
| Name | EmpId | DeptName | DeptName | Manager
|
| Harry | 3415 | Finance | null | null
|
| Sally | 2241 | Sales | Sales | Harriet
|
| George | 3401 | Finance | null | null
|
| Harriet | 2202 | Sales | Sales | Harriet
|
| Tim | 1123 | Executive | null | null
|
|
In the resulting relation, tuples in S which have no common values in common attribute names with tuples in R take a null value, ω.
Since there are no tuples in Dept with a DeptName of Finance or Executive, ωs occur in the resulting relation where tuples in DeptName have tuples of Finance or Executive.
...(maths)...
The left outer join can be simulated using the natural join and set union as
follows:
R =X S = R ∪ (R |X| S)
Right Outer Join
The right outer join behaves almost identically to the left outer join, above, with the exception that all the values from the "other" relation appear in the resulting relation.
The right outer join is written as R X= S where R and S are relations. The result of the right outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no matching tuples in R.
For an consider the tables Employee and Dept and their
right outer join:
Employee
| Name | EmpId | DeptName
|
| Harry | 3415 | Finance
|
| Sally | 2241 | Sales
|
| George | 3401 | Finance
|
| Harriet | 2202 | Sales
|
| Tim | 1123 | Executive
|
|
Dept
| DeptName | Manager
|
| Sales | Harriet
|
| Production | Charles
|
|
Employee X= Dept
| Name | EmpId | DeptName | DeptName | Manager
|
| Sally | 2241 | Sales | Sales | Harriet
|
| Harriet | 2202 | Sales | Sales | Harriet
|
| ω | ω | ω | Production | Charles
|
|
In the resulting relation, tuples in R which have no common values in common attribute names with tuples in S take a null value, ω.
Since there are no tuples in Employee with a DeptName of Production, ωs occur in the Name attribute of the resulting relation where tuples in DeptName had tuples of Production.
...(maths)...
The right outer join can be simulated using the natural join and set union as
follows:
RX=S = S ∪ (R |X| S)
Outer Join
The outer join or full outer join in effect combines the results of the left and right outer joins.
The full outer join is written as R =X= S where R and S are relations. The result of the full outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no matching tuples in R and tuples in R that have no matching tuples in S in their common attribute names.
For an consider the tables Employee and Dept and their
full outer join:
Employee
| Name | EmpId | DeptName
|
| Harry | 3415 | Finance
|
| Sally | 2241 | Sales
|
| George | 3401 | Finance
|
| Harriet | 2202 | Sales
|
| Tim | 1123 | Executive
|
|
Dept
| DeptName | Manager
|
| Sales | Harriet
|
| Production | Charles
|
|
Employee =X= Dept
| Name | EmpId | DeptName | DeptName | Manager
|
| Harry | 3415 | Finance | ω | ω
|
| Sally | 2241 | Sales | Sales | Harriet
|
| George | 3401 | Finance | ω | ω
|
| Harriet | 2202 | Sales | Sales | Harriet
|
| Tim | 1123 | Executive | ω | ω
|
| ω | ω | ω | Production | Charles
|
|
In the resulting relation, tuples in R which have no common values in common attribute names with tuples in S take a null value, ω. Tuples in S which have no common values in common attribute names with tuples in R also take a null value, ω.
The full outer join can be simulated using the left and right outer joins (and hence the natural join and set union) as follows:
R=X=S = (R=XS) ∪ (RX=S)
or
R=X=S = R ∪ S ∪ (R |X| S)
Operations for Domain Computations
The Aggregation Operation
how is the aggregation opperation performed?
The Extend Operation
Algebraic properties
Use of Algebraic Properties for Query Optimization
Related External Links