Online Encyclopedia

Categories: Object-oriented programming | Data management

Abstraction (computer science)

In computer science, abstraction refers to two related, but different concepts.

Contents

1 Software development

1.1 Control abstraction

1.1.1 Structured programming

2 Data abstraction

2.1 Abstraction in object oriented programming
2.2 Further reading
2.3 See also

3 Semantics

Software development

In software development, abstraction is the process of combining multiple smaller operations into a single unit that can be referred to by name. It is a technique to factor out details and ease use of code and data. It is by analogy with abstraction in mathematics. The mathematical technique of abstraction begins with mathematical definitions; this has the fortunate effect of finessing some of the vexing philosophical issues of abstraction.

Abstraction allows programmers to think simply about a problem, by deferring unimportant detail for later, while still allowing thought about more important goals, in stages of thinking, not all-at-once. For example, in both computing and in mathematics, numbers are concepts in the programming languages, as founded in mathematics. Implementation details depend on the hardware and software, but this is not a restriction because the computing concept of number is still based on the mathematical concept.

The concept of abstraction is itself a declarative statement in programming languages such as C++ or Java, using the keywords virtual or abstract, respectively. After such a declaration, it is the responsibility of the programmer to implement a class to instantiate the object of the declaration. Or, if the specification language is UML, for example, the abstract classes are simply left abstract during the architecture and specification phase of the project.

Abstraction can be either that of control or data. Roughly speaking, control abstraction is the abstraction of actions while data abstraction is that of data structures. Control abstraction, seen in structured programming, is use of subprograms and control flows. Data abstraction is primary motivation of introducing datatype and subsequently abstract data types.

Object-oriented programming can be seen as an attempt to abstract both data and code.

Control abstraction

Control abstraction is one of the main purposes of using programming languages. Computer machines understand operations at the very low level such as moving some bits from one location of the memory to another location and producing the sum of two sequences of bits. Programming languages allow this to be done in the higher level. For example,

a = (1 + 2) * 5

Structured programming

Structured programming involves the splitting of complex program tasks into smaller pieces with clear flow control and interfaces between components, with reduction of the complexity potential for side-effects.

In a simple program, this may be trying to ensure that loops have single or obvious exit points and trying, where it's most clear to do so, to have single exit points from functions and procedures.

In a larger system, it may involve breaking down complex tasks into many different modules. Consider a system handling payroll on ships and at shore offices:

The uppermost level may be a menu of typical end user operations.
Within that could be standalone executables or libraries for tasks such as signing on and off employees or printing checks.
Within each of those standalone components there could be many different source files, each containing the program code to handle a part of the problem, with only selected interfaces available to other parts of the program. A sign on program could have source files for each data entry screen and the database interface (which may itself be a standalone third party library or a statically linked set of library routines).
Either the database or the payroll application also has to initiate the process of exchanging data with between ship and shore and that data transfer task will often contain many other components.

These layers produce the effect of isolating the implementation details of one component and its assorted internal methods from the others. This concept was embraced and extended in object-oriented programming.

Data abstraction

Data abstraction is the enforcement of a clear separation between the abstract properties of a data type and the concrete details of its implementation. The abstract properties are those that are visible to client code that makes use of the data type--the interface to the data type--while the concrete implementation is kept entirely private, and indeed can change, for example to incorporate efficiency improvements over time. The idea is that such changes are not supposed to have any impact on client code, since they involve no difference in the abstract behaviour.

For example, one could define an abstract data type called lookup table, where keys are uniquely associated with values, and values may be retrieved by specifying their corresponding keys. Such a lookup table may be implemented in various ways: as a hash table, a binary search tree, or even a simple linear list (which is actually quite efficient for small data sets). As far as client code is concerned, the abstract properties of the type are the same in each case.

Of course, this all relies on getting the details of the interface right in the first place, since any changes there can have major impacts on client code. Another way to look at this is that the interface forms a contract on agreed behaviour between the data type and client code; anything not spelled out in the contract is subject to change without notice.

Languages that implement data abstraction include Ada and Modula-2. Object-oriented languages are commonly claimed to offer data abstraction; however, their inheritance concept tends to put information in the interface that more properly belongs in the implementation; thus, changes to such information ends up impacting client code, leading directly to the fragile base class problem.

Abstraction in object oriented programming

In object-oriented programming theory, abstraction is the facility to define objects that represent abstract "actors" that can perform work, report on and change their state, and "communicate" with other objects in the system. The term encapsulation refers to the hiding of state details, but extending the concept of data type from earlier programming languages to associate behavior most strongly with the data, and standardizing the way that different data types interact, is the beginning of abstraction. When abstraction proceeds into the operations defined, enabling objects of different types to be substituted, it is called polymorphism. When it proceeds in the opposite direction, inside the types or classes, structuring them to simplify a complex set of relationships, it is called delegation or inheritance. These terms are very often used in contradictory ways by users of various object-oriented progamming languages, which offer similar facilities for abstraction.

Various object-oriented progamming languages offer similar facilities for abstraction, all to support a general strategy of polymorphism in object-oriented programming, which includes the substitution of one type for another in the same or similar role. Although it is not as generally supported, a configuration or image or package may predetermine a great many of these bindings at compile-time, link-time, or load-time . This would leave only a minimum of such bindings to change at run-time.

For example, Linda abstracts the concepts of server and shared data-space to facilitate distributed programming.

In CLOS or self, for example, there is less of a class-instance distinction, more use of delegation for polymorphism, and individual objects and functions are abstracted more flexibly to better fit with a shared functional heritage from Lisp.

Another extreme is C++, which relies heavily on templates and overloading and other static bindings at compile-time, which in turn has certain flexibility problems.

Although these are alternate strategies for achieving the same abstraction, they do not fundamentally alter the need to support abstract nouns in code - all programming relies on an ability to abstract verbs as functions, nouns as data structures, and either as processes.

In Java, a simple and widely used language, abstraction is achieved most commonly with an extended data type called a class, while objects of that type are called instances of that class. Other languages facilitate more complex abstractions, and relying overmuch on Java terminology is quite misleading, but Java examples proliferating on the Internet tend to be one of the easiest ways to understand how abstraction applies in different problem domains .

For example, here is a sample Java fragment to represent some common farm "animals" to a level of abstraction suitable to model simple aspects of their hunger and feeding. It defines an Animal class to represent both the state of the animal and its functions:

  class Animal extends LivingThing {
    Location m_loc;
    double m_energy_reserves;
    
    boolean is_hungry() {
      if (m_energy_reserves < 2.5) { return true; }
      else { return false; }
    }
    void eat(Food f) {
      // Consume food
      m_energy_reserves += f.getCalories();
    }
    void moveto(Location l) {
      // Move to new location
      m_loc = l;
    }
  }

With the above definition, one could create objects of type Animal and call their methods like this:

  thePig = new Animal();
  theCow = new Animal();
  if (thePig.is_hungry()) { thePig.eat(table_scraps); }
  if (theCow.is_hungry()) { theCow.eat(grass); }
  theCow.move(theBarn);

In the above example, the class animal is an abstraction used in place of an actual animal, LivingThing is a further abstraction (in this case a generalisation) of animal.

If a more differentiated hierarchy of animals is required to differentiate, say, those who provide milk from those who provide nothing except meat at the end of their lives, that is an intermediary level of abstraction, probably DairyAnimal (cows, goats) who would eat foods suitable to giving good milk, and Animal (pigs, steers) who would eat foods to give the best meat quality.

Such an abstraction could remove the need for the application coder to specify the type of food, so s/he could concentrate instead on the feeding schedule. The two classes could be related using inheritance or stand alone, and varying degrees of polymorphism between the two types could be defined. These facilities tend to vary drastically between languages, but in general each can achieve anything that is possible with any of the others. A great many operation overloads, data type by data type, can have the same effect at compile-time as any degree of inheritance or other means to achieve polymorphism. The class notation is simply a coder's convenience.

Decisions regarding what to abstract and what to keep under the control of the coder are the major concern of object-oriented design and domain analysis - actually determining the relevant relationships in the real world is the concern of object-oriented analysis or legacy analysis .

In general, to determine appropriate abstraction, one must make many small decisions about scope, domain analysis, determine what other systems one must cooperate with, legacy analysis, then perform a detailed object-oriented analysis which is expressed within project time and budget constraints as an object-oriented design. In our simple example, the domain is the barnyard, the live pigs and cows and their eating habits are the legacy constraints, the detailed analysis is that coders must have the flexibility to feed the animals what is available and thus there is no reason to code the type of food into the class itself, and the design is a single simple Animal class of which pigs and cows are instances with the same functions. A decision to differentiate DairyAnimal would change the detailed analysis but the domain and legacy analysis would be unchanged - thus it is entirely under the control of the programmer, and we refer to abstraction in object-oriented programming as distinct from abstraction in domain or legacy analysis.

Semantics

When discussing formal semantics of programming languages, formal methods or abstract interpretation, abstraction refers to the act of considering a less accurate, but safe, definition of the observed program behaviors. For instance, one may observe only the final result of program executions instead of considering all the intermediate steps of executions. Abstraction is defined to a concrete (more precise) model of execution.

Abstraction may be exact or faithful with respect to a property if it is possible to answer a question about the property equally well on the concrete or abstract model. For instance, if we wish to know what the result of the evaluation of a mathematical expression involving only integers +,-,×, is worth modulo n, it is sufficient to perform all operations modulo n (a familiar form of this abstraction is casting out nines).

Abstractions, however, are not necessarily exact, but one requires that they should be sound. That is, it should be possible to get sound answers from them — even though the abstraction may simply yield a result of undecidability. For instance, we may abstract the students in a class by their minimal and maximal ages; if one asks whether a certain person belongs to that class, one may simply compare that person's age with the minimal and maximal ages; if his age lies outside the range, one may safely answer that the person does not belong to the class; if it does not, one may only answer "I don't know".

Abstractions are useful when dealing with computer programs, because non-trivial properties of computer programs are essentially undecidable (see Rice's theorem). As a consequence, automatic methods for deriving information on the behavior of computer programs either have to drop termination (on some occasions, they may fail, crash or never yield out a result), soundness (they may provide false information), or precision (they may answer "I don't know" to some questions).

Abstraction is the core concept of abstract interpretation. Model checking is generally performed on abstract versions of the studied systems.

The opposite of abstraction is concretisation .

This article was originally based on material from the Free On-line Dictionary of Computing and is used under the GFDL.

Categories: Object-oriented programming | Data management

Last updated: 12-13-2004 17:14:50

Your Online Encyclopedia

Online Encylopedia and Dictionary Research Site