Sunday, April 22, 2007

What is abstraction?

In previous posts I discussed computer science, mathematics, and abstraction, arguing that computer science was the best discipline for teaching abstraction. I didn't attempt to define what I meant by abstraction. Jeff Kramer characterized abstraction as "the process of selection and removal of detail." Here are some dictionary definitions that take a similar position.
to consider something as a general quality or characteristic, apart from concrete realities, specific objects, or actual instances. —

to consider theoretically or separately from something else — Oxford Compact Dictionary

to consider (a quality, for example) without reference to a particular example or object. — American Heritage Dictionary

to consider a concept without thinking of a specific example; consider abstractly or theoretically —
The problem with these definitions is that they assume that one already knows what it is that needs to be abstracted, i.e., what the unnecessary details are that should be stripped away, what the general qualities or characteristics are that one is considering, what it is that should be considered theoretically or separately or without reference to a particular example.

But that's not the sense in which abstraction is used in computer science, which I believe is the more important usage. The ability to abstract is not the ability to consider a quality without reference to particular instances. It is the ability to get to the essence of something, to decide what's more and less fundamental in a situation. Abstraction in this sense is a creative activity. It involves understanding a situation deeply and seeing the essential features. In the process one does strip away the contingent elements and the characteristics that appear because of a particular instance. But the stripping away is done creatively as one determines what is essential and what is superficial (for some particular problem).

In important cases, abstraction involves creating the abstract ideas that characterize the essential elements. That's the hardest but most fulfilling part of abstracting. When one finds a way to see something that makes it clear what's really going on and how all the pieces fit together, that, I would say, is the real process of abstraction.

A personal experience illustrated this for me. About a year ago I wrote a paper which I called "Open at the Top; Open at the Bottom; and Continually (but Slowly) Evolving." In it, I talked about the US Postal System as an early example of a communication infrastructure that fostered and enabled innovation. The USPS was, in effect, an early version of the world wide web. Postal addresses were like web sites: it was easy to create one, and having one allowed individuals and companies to interact with each other. Mail order businesses flourished. Etc.

In the paper I struggled to characterize just what it was about the postal system (and current systems like the web) that allowed this kind of creativity to grow up around them. The title was my answer. Like the web, the postal system defined a protocol for interaction. It didn't matter how that protocol was implemented: the pony express, hand delivery, post office boxes, use of commercial airlines, copper wire, fiber optics, satellites, etc. What was important was that the protocol worked no matter how it was implemented. That's what was meant by open at the bottom.

Also, the protocol allowed users to be creative with it. One could establish a mail order business or a personal address just as now one can establish a web business or a personal website. That's what was meant by open at the top.

The protocols can even change, e.g., the price of postage increases; zip codes are introduced; new versions of HTML become standardized. But as long as the change happens slowly enough, the infrastructure persists and people can continue to use it.

I thought this was an important idea and that I had a pretty good sense of what was going on. But it wasn't really crisp.

Late last year, I came across the idea from economics of multi-sided platforms. See, for example, this interview with Andrei Hagiu, which I didn't see until after his book Invisible Engines: How Software Platforms Drive Innovation and Transform Industries was published last Fall. (It can be downloaded free.) The notion of a multi-sided platform was what I needed to complete the abstraction.

From the economic perspective, a multi-sided platform is a business that sells a mechanism that brings two groups of users together. A shopping center is a multi-sided platform. The users are buyers and sellers. Note how different this is from a traditional business in which one sells either a service that one performs or a product that one either (a) constructs from components that one buys from suppliers or (b) buys at wholesale and makes available at retail. The owner of a multi-sided platform sells access to one group of users to the other group of users. Any advertising medium is a multi-sided platform. The insight of recent economics researchers is that the postal service and the world wide web are also multi-sided platforms. The book by Hagiu et al. has the further insight that many software systems are multi-sided platforms. An operating system, for example, brings software developers and software users together.

My next step was to conceptualize multi-sided platforms in computer science terms. From a software perspective, a multi-sided platform is a level of abstraction that has multiple kinds of users. We all know that in Computer Science a level of abstraction is a set of operations and data types that are defined independently of their implementation. In computer science we frequently refer to a level of abstraction as a platform. A level of abstraction is, of course, open at the bottom. One doesn't care how the abstractions it defines are implemented as long as they perform as advertised.
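
To make "open at the bottom" concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the MailService interface, the two implementations (PonyExpress and AirMail), and the round_trip helper are not taken from either paper. The point is only that code written against the operations gets the same result whichever implementation sits underneath.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class MailService(ABC):
    """Hypothetical level of abstraction: operations with no implementation."""

    @abstractmethod
    def send(self, address: str, message: str) -> None:
        """Accept a message for delivery to an address."""

    @abstractmethod
    def collect(self, address: str) -> List[str]:
        """Return, and remove, the messages waiting at an address."""


class PonyExpress(MailService):
    """One implementation: a mailbox (list) per address."""

    def __init__(self) -> None:
        self._boxes: Dict[str, List[str]] = {}

    def send(self, address: str, message: str) -> None:
        self._boxes.setdefault(address, []).append(message)

    def collect(self, address: str) -> List[str]:
        return self._boxes.pop(address, [])


class AirMail(MailService):
    """A different implementation: one shared list of (address, message) pairs."""

    def __init__(self) -> None:
        self._in_transit: List[Tuple[str, str]] = []

    def send(self, address: str, message: str) -> None:
        self._in_transit.append((address, message))

    def collect(self, address: str) -> List[str]:
        delivered = [m for a, m in self._in_transit if a == address]
        self._in_transit = [(a, m) for a, m in self._in_transit if a != address]
        return delivered


def round_trip(service: MailService) -> List[str]:
    """Client code: it sees only the abstract operations."""
    service.send("12 Elm St", "hello")
    service.send("12 Elm St", "catalog")
    return service.collect("12 Elm St")
```

Calling round_trip(PonyExpress()) and round_trip(AirMail()) returns the same list of messages, even though the two classes store mail in entirely different ways.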

A level of abstraction is valuable to the extent that the operations it defines can be used creatively, i.e., open at the top.

Levels of abstraction (or standards, which are specifications of levels of abstraction) may change. But they can't change too fast or too frequently or they will lose their user base.

A level of abstraction that has two or more distinct sets of users is a multi-sided platform. It is the multi-sided platforms that enable communication among different groups of users. The browser is a multi-sided platform in that it brings together web site providers and web surfers. The protocol is HTML.
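
As a rough sketch of the same idea in code, the Python below models a platform whose operations serve two distinct groups of users. The Platform class, the run_bookshop and visit functions, and the books.example address are all hypothetical names chosen for illustration. It is a toy, not a model of how browsers or HTML actually work.

```python
class Platform:
    """Hypothetical two-sided platform: one set of operations, two kinds of users."""

    def __init__(self):
        self._pages = {}

    # Operation used by the first group (site providers).
    def publish(self, address, content):
        self._pages[address] = content

    # Operation used by the second group (surfers).
    def fetch(self, address):
        return self._pages.get(address, "404 Not Found")


def run_bookshop(platform):
    """A site provider being creative with the platform: opening a shop."""
    platform.publish("books.example", "<h1>Books for sale</h1>")


def visit(platform, address):
    """A surfer using the platform to reach whatever providers have published."""
    return platform.fetch(address)
```

Neither group needs to know anything about the other, or about how the platform stores pages. The two groups meet only through the operations the platform defines.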

So finally, the right abstraction for the postal system and the web is a level of abstraction that is used by multiple sets of users, i.e., a multi-sided platform. That's the essence of what I was searching for in the paper about the postal service. I discuss this clearer characterization in section 6 of "Putting Complex Systems to Work," a paper I wrote last January. If you compare the two papers, you will see how far the abstraction has developed.

So I think it is the finding and clarifying of some essential qualities that characterizes what's really important about the process of abstraction. Of course this is what creative mathematicians do as well. It is what creative thinkers in any discipline do. But I believe that it is this sort of abstraction that the study and practice of Computer Science teaches better than the study of any other discipline.
