Categories: Information technology | Data management

Database

A database is a collection of information with a regular structure. Its front end provides data access through searching and sorting routines; its back end handles data entry and updating. A database is usually, but not necessarily, stored in a machine-readable format and accessed by a computer. There is a wide variety of databases, from simple tables stored in a single file to very large databases with many millions of records, stored in rooms full of disk drives or other peripheral electronic storage devices.

Databases resembling modern versions were first developed in the 1960s. A pioneer in the field was Charles Bachman.

The most useful way of classifying databases is by the programming model associated with the database. Several models have been in wide use for some time. Historically, the hierarchical model was implemented first, then the network model; the relational model then displaced both, with the so-called flat model accompanying it for low-end usage. The hierarchical, network and flat models were never formalised as theories. They were called data models only by contrast with the relational model and have no conceptual underpinnings of their own; they arose from physical constraints and from programming models rather than data models.

Contents

1 Database models

2 Implementations and indexing

3 Mapping objects into databases

4 Applications of databases

4.1 Database application

5 Transactions and concurrency

6 See also

7 References

Database models

The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another. For instance, columns for name and password might be used as a part of a system security database. Each row would have the specific password associated with a specific user. Columns of the table often have a type associated with them, defining them as character data, date or time information, integers, or floating point numbers. This model is the basis of the spreadsheet.
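
The name and password example above can be sketched in a few lines of Python. This is only an illustration: the `created` and `failed_logins` columns, the sample rows and the helper names `check_types` and `lookup` are assumptions added here to show typed columns, not part of any particular system.

```python
from datetime import date

# Column names and the Python type each column is expected to hold
# (the last two columns are hypothetical additions for illustration).
COLUMNS = [("name", str), ("password", str), ("created", date), ("failed_logins", int)]

# The flat table itself: one list of rows, each row a tuple of related values.
ROWS = [
    ("alice", "s3cret", date(2004, 1, 15), 0),
    ("bob", "hunter2", date(2004, 3, 2), 2),
]

def check_types(rows):
    """Return True if every value matches the type declared for its column."""
    return all(
        isinstance(value, col_type)
        for row in rows
        for value, (_, col_type) in zip(row, COLUMNS)
    )

def lookup(rows, column, wanted):
    """Scan the whole table and return the rows whose `column` equals `wanted`."""
    position = [name for name, _ in COLUMNS].index(column)
    return [row for row in rows if row[position] == wanted]
```

Calling `lookup(ROWS, "name", "bob")` returns the single row for that user, from which the associated password can be read. Every lookup scans the whole table, which is acceptable only for small data sets.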

The network model allows multiple tables to be used together through the use of pointers (or references). Some columns contain pointers to different tables instead of data. Thus, the tables are related by references, which can be viewed as a network structure. A particular subset of the network model, the hierarchical model, limits the relationships to a tree structure, instead of the more general directed graph structure implied by the full network model.
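
A minimal Python sketch of this idea follows, assuming records with unique names and a hypothetical `links` field that holds the pointers. The `navigate` helper follows an access path from pointer to pointer, and `is_hierarchy` checks whether a set of records forms a tree (one root, one parent per record) rather than a general graph. All names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """A record whose `links` field holds pointers to other records."""
    name: str
    links: list = field(default_factory=list)

def navigate(start, path):
    """Follow an access path of record names, one pointer at a time."""
    current = start
    for name in path:
        current = next(r for r in current.links if r.name == name)
    return current

def is_hierarchy(records):
    """True if the records form a tree: one root, one parent for every other record."""
    parents = {r.name: 0 for r in records}
    for record in records:
        for target in record.links:
            parents[target.name] += 1
    roots = [r for r in records if parents[r.name] == 0]
    if len(roots) != 1 or any(count > 1 for count in parents.values()):
        return False
    # Every record must be reachable from the single root.
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        seen.add(node.name)
        stack.extend(node.links)
    return len(seen) == len(records)

# A small hierarchy: company -> departments -> employee.
alice = Record("alice")
sales = Record("sales", [alice])
research = Record("research")
company = Record("company", [sales, research])
hierarchy = [company, sales, research, alice]

# Adding a second record that also points at alice makes it a network.
project = Record("project", [alice])
network = hierarchy + [project]
```

Here `navigate(company, ["sales", "alice"])` reaches the employee record only because the program knows the path in advance, which is the property the relational model was designed to remove.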

Relational databases consist of three components:

relations of n-tuples (tables of rows) of data elements (or attributes or columns); each n-tuple (row) is a collection of data elements (attributes/columns) of the entity represented by that particular n-tuple (row);
a collection of operators, the relational algebra and calculus;
and a collection of integrity constraints, defining the set of consistent database states and changes of state. The integrity constraints can be of four types: domain (AKA type), attribute, relvar and database constraints.

Unlike the hierarchical and network models, the relational model holds no explicit pointers in the data. In the hierarchical and network models, the programmer accesses data by specifying an access path that follows pointers embedded in the data. In the relational model, data is accessed using relational algebra: subsets of n-tuples (rows) in different relations (tables) are joined in cross-products, intersected and differenced using the values of any of the attributes (columns). This flexibility allows users (and programmers) to write queries that were not anticipated by the database designers. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for decades. This has made relational databases very popular with businesses.
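
The join, intersection and difference operations can be illustrated with a short sketch. It assumes SQL run through Python's standard `sqlite3` module as the query language, and the `employee`, `manager` and `department` tables are hypothetical sample data. Note that no query mentions a pointer or an access path; rows are matched purely on attribute values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee   (name TEXT, dept TEXT);
    CREATE TABLE manager    (name TEXT, dept TEXT);
    CREATE TABLE department (dept TEXT, city TEXT);
    INSERT INTO employee VALUES ('alice', 'sales'), ('bob', 'research'), ('carol', 'sales');
    INSERT INTO manager VALUES ('carol', 'sales');
    INSERT INTO department VALUES ('sales', 'Leeds'), ('research', 'York');
""")

# Join: combine rows from two relations wherever the dept values match.
joined = conn.execute("""
    SELECT e.name, d.city
    FROM employee AS e JOIN department AS d ON e.dept = d.dept
    ORDER BY e.name
""").fetchall()

# Intersection: rows that appear in both relations.
both = conn.execute("""
    SELECT name, dept FROM employee
    INTERSECT
    SELECT name, dept FROM manager
""").fetchall()

# Difference: rows of the first relation that are absent from the second.
only_staff = conn.execute("""
    SELECT name, dept FROM employee
    EXCEPT
    SELECT name, dept FROM manager
    ORDER BY name
""").fetchall()
```

A new question, such as which cities have managers, needs only a new query over the same tables; the stored data does not have to be restructured.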

Any number of declarative programming languages could be invented to give users the means of specifying the relational algebra needed to access and manipulate the data in relational databases. The de facto standard is Structured Query Language (SQL), although every RDBMS has its own dialect of this English-like declarative programming language.

The relational model applies the relational algebra and set theory branches of mathematics to the design and working of databases. Perhaps the most important pioneer in this field was Ted Codd. Although this model is the basis for relational database management systems (RDBMSs), very few RDBMSs implement the model entirely rigorously or completely, and many have extra features which, if used, violate the theory. Some so-called RDBMSs are not relational enough to be worthy of the term; they are DBMSs with relational features.

Implementations and indexing

All of these kinds of database can take advantage of indexing to increase their speed. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with each value. An index allows a set of table rows matching some criterion to be located quickly. Commonly used indexing techniques include B-trees, hashes and linked lists.
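
The sorted-list index described above can be sketched as follows. This is a simplified illustration using Python's standard `bisect` module, not how any particular DBMS stores its indices; row numbers stand in for pointers, and the sample rows and function names are assumptions.

```python
import bisect

# An unordered table of (name, department) rows.
ROWS = [
    ("carol", "sales"),
    ("alice", "sales"),
    ("dave", "research"),
    ("bob", "research"),
]

def build_index(rows, column):
    """Return a sorted list of (value, row_number) pairs for one column."""
    return sorted((row[column], number) for number, row in enumerate(rows))

def find(rows, index, value):
    """Binary-search the index and return every row whose column equals value."""
    # (value,) sorts before any (value, row_number) pair, so this finds
    # the first index entry for the value, if there is one.
    start = bisect.bisect_left(index, (value,))
    matches = []
    for key, number in index[start:]:
        if key != value:
            break
        matches.append(rows[number])
    return matches

name_index = build_index(ROWS, 0)
dept_index = build_index(ROWS, 1)
```

A lookup such as `find(ROWS, dept_index, "sales")` touches only the index entries for the matching value instead of scanning every row, at the cost of keeping the index sorted whenever the table changes.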

Relational DBMSs have the advantage that indices can be created or dropped without changing existing applications, because applications don't use the indices directly. Instead, the database software decides on behalf of the application which indices to use. The database chooses between many different strategies based on which one it estimates will run the fastest.
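
This independence can be demonstrated with a small sketch, assuming SQLite (through Python's `sqlite3` module) as the DBMS. The `person` table and the index name `person_city` are hypothetical. The application's query text never changes; only the plan chosen by the database does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("alice", "Leeds"), ("bob", "York"), ("carol", "Leeds")],
)

QUERY = "SELECT name FROM person WHERE city = ? ORDER BY name"

def run_query(city):
    """The 'application': it names no index, only the data it wants."""
    return [row[0] for row in conn.execute(QUERY, (city,))]

def query_plan(city):
    """Ask SQLite how it would execute QUERY; the last column describes each step."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (city,))]

before = run_query("Leeds")

conn.execute("CREATE INDEX person_city ON person (city)")
with_index = run_query("Leeds")
plan_with_index = query_plan("Leeds")

conn.execute("DROP INDEX person_city")
after = run_query("Leeds")
```

The three result lists are identical. The exact wording of the plan varies between SQLite versions, but while the index exists the plan refers to `person_city`, showing that the choice was made by the database rather than by the application.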

Mapping objects into databases

In recent years, the object-oriented paradigm has been applied to databases as well, creating a new programming model known as object databases. These databases attempt to overcome some of the difficulties of using objects with the SQL DBMSs. An object-oriented program allows objects of the same type to have different implementations and behave differently, so long as they have the same interface (polymorphism). This doesn't fit well with a SQL database where user-defined types are difficult to define and use, and where the Two Great Blunders prevail: the identification of classes with tables (the correct identification is of classes with types, and of objects with values), and the usage of pointers.

A variety of ways have been tried for storing objects in a database, but there is little consensus on how this should be done. Object database implementations undo the benefits of the relational model by introducing pointers and making ad hoc queries more difficult. This is because they are essentially adaptations of obsolete network and hierarchical databases to object-oriented programming. As a result, object databases tend to be used for specialized applications and general-purpose object databases have not been very popular. Instead, objects are often stored in SQL databases using complicated mapping software. At the same time, SQL DBMS vendors have added features to allow objects to be stored more conveniently, drifting even further away from the relational model.
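
To give a sense of what mapping software does, here is a deliberately tiny sketch, assuming SQLite through Python's `sqlite3` module and a hypothetical `Customer` class. It uses the simple one-class-to-one-table mapping that the previous section criticises; real mapping layers also have to handle inheritance, references between objects and change tracking, which is where the complexity arises.

```python
import sqlite3
from dataclasses import dataclass, astuple

@dataclass
class Customer:
    customer_id: int
    name: str
    city: str

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

def save(customer):
    """Flatten the object into one row, inserting or overwriting by primary key."""
    conn.execute("INSERT OR REPLACE INTO customer VALUES (?, ?, ?)", astuple(customer))

def load(customer_id):
    """Rebuild an object from its row, or return None if no such row exists."""
    row = conn.execute(
        "SELECT customer_id, name, city FROM customer WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return Customer(*row) if row else None
```

Saving `Customer(1, "alice", "Leeds")` and then calling `load(1)` yields an equal object, while the same data remains open to ordinary SQL queries.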

Applications of databases

Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, though, and many electronic mail programs and personal organizers are based on standard database technology.

Database application

A database application is a type of computer application dedicated to managing a database. Database applications span a huge variety of needs and purposes, from small user-oriented tools such as an address book, to huge enterprise-wide systems for tasks like accounting.

The term "database application" usually refers to software providing a user interface to a database. The software that actually manages the data is usually called a database management system (DBMS) or (if it is embedded) a database engine.

Examples of database applications include MySQL, Microsoft Access, dBASE, FileMaker, Oracle, Informix, and (to some degree) HyperCard.

In March 2004, AMR Research (as cited in the CNET News.com article listed in the "References" section) predicted that open source database applications would come into wide acceptance in 2006.

Transactions and concurrency

In addition to their data model, most practical databases attempt to enforce a database transaction model that has desirable data integrity properties. Ideally, the database software should enforce the ACID rules, summarised here:

Atomicity - either all or no operations are completed. (Transactions that can't be finished must be completely undone.)
Consistency - all transactions must leave the database in consistent state.
Isolation - transactions can't interfere with each other's work and incomplete work isn't visible to other transactions.
Durability - successful transactions must persist through crashes.
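
Atomicity and consistency from the list above can be sketched as follows, assuming SQLite through Python's `sqlite3` module. The `account` table, its non-negative balance rule and the `transfer` function are hypothetical. The credit is deliberately applied before the debit so that a failed debit leaves half-finished work for the rollback to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(amount, source, target):
    """Move money between accounts as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE owner = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE owner = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False

def balances():
    """Return the current balance of every account as a dictionary."""
    return dict(conn.execute("SELECT owner, balance FROM account"))
```

A transfer that would overdraw the source account returns `False` and leaves both balances exactly as they were, even though the credit to the target had already been executed inside the transaction.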

In practice, many DBMSs allow most of these rules to be relaxed for better performance.

Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.
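
One common way to test a schedule for serializability is the precedence graph test for conflict serializability, sketched below. This criterion is an assumption chosen for illustration (it is stricter than serializability in general and says nothing about recoverability), and the schedule format of `(transaction, operation, item)` triples with `"r"` and `"w"` operations is hypothetical.

```python
def precedence_edges(schedule):
    """Return edges (earlier_txn, later_txn) between conflicting operations.

    Two operations conflict if they belong to different transactions,
    touch the same item, and at least one of them is a write.
    """
    edges = set()
    for i, (txn_a, op_a, item_a) in enumerate(schedule):
        for txn_b, op_b, item_b in schedule[i + 1:]:
            if txn_a != txn_b and item_a == item_b and "w" in (op_a, op_b):
                edges.add((txn_a, txn_b))
    return edges

def is_conflict_serializable(schedule):
    """True if the precedence graph of the schedule has no cycle."""
    edges = precedence_edges(schedule)
    nodes = {txn for txn, _, _ in schedule}
    # Repeatedly remove transactions with no incoming edge (a topological sort).
    while nodes:
        free = {n for n in nodes if not any(dst == n for _, dst in edges)}
        if not free:
            return False  # every remaining transaction follows another: a cycle
        nodes -= free
        edges = {(src, dst) for src, dst in edges if src not in free}
    return True

# T1 finishes with x before T2 starts: equivalent to running T1 then T2.
one_after_another = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]

# Both read x before either writes it: the classic lost update.
lost_update = [("T1", "r", "x"), ("T2", "r", "x"), ("T1", "w", "x"), ("T2", "w", "x")]
```

The first schedule passes the test and the second fails it, because each transaction in the lost-update schedule would have to come both before and after the other.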

References