WHAT IS THE NEXT PROGRAMMING PARADIGM?

ACCEPTED FOR THE SECOND INTERNATIONAL SOFTWARE TECHNOLOGY EXCHANGE WORKSHOP, KISTA, SWEDEN, 2012. PRELIMINARY VERSION.

Johan Glimming Thorsten Altenkirch Patrik Jansson
Functor AB University of Nottingham Chalmers University of Technology
j@functor.se txa@cs.nott.ac.uk patrikj@chalmer.se

ABSTRACT

One of the design novelties in C++ over C was its stronger type system. Recently, the designer of C++ has suggested that more advanced type systems are needed than before due to changes in the world as a result of our increasing dependency on software and the limitations of power and performance that we are approaching. A notable technology has emerged over the last decades in parallel with software engineering advances. As yet, it has not made it into the industrial realm.

This technology is known as “dependent types” and is largely founded on over forty years of Swedish world-leading research in the foundations of mathematics and philosophy. Dependent types are types that depend on values allowing us to talk about sequences of a fixed length, square matrices and many other things we want to say about a computation. As a consequence computations themselves and their way into the types.

As we find ourselves becoming more and more dependent on software, we here conjecture that we will also see programs that depend on types in the industry. The implication of such a shift in programming can have dramatical consequences especially in an economy where trust has increasing importance. This new constructive programming paradigm has the potential of changing the methods by which we develop software, offer a workaround for hardware performance boundaries by moving run-time checks to compile-time while it also can improve our trust in software. With this paradigm shift, software will be significantly more reliable in the future.

INTRODUCTION

At the beginning of this year, the creator of the C++ programming language, Bjarne Stroustrup, explained his vision of a future of programming language. While doing so, he capitalized on two key problems in the current world of information technology with respect to computer software [3] The first problem he mentions is the limitation of computing power in the sense that we are reaching the physical limits of how fast a processor can compute data. The second one is the need to make software more reliable.

In this brief exposition we will survey one new technology that we argue can address both these problems. This technology forms what we here call the constructive programming paradigm since is firmly linked to an area of mathematics known as constructive mathematics. Significant progress in the area of constructive mathematics comes from the Swedish mathematician and philosopher Per Martin-Löf, who during the last forty years has developed this research area in a form known simply as (Martin-Löf) type theory. It is well known in theoretical research world-wide. Martin-Lof’s research implies a departure from ordinary classical mathematics (and its foundations), with vast consequences within both philosophy of language and within mathematics and mathematical logic. It remains a very active area of research also within these fields. The insight that this theoretical work in fact constitutes a novel way of programming was made clear by Martin-Löf in the pioneering paper “Constructive Mathematics and Computer Programming” as early as in 1982. During the decades that then followed, substantial theoretical research efforts were devoted to take this approach to programming further.

Only recently, it has become clear that the constructive programming paradigm is viable also in the industry supporting general-purpose programming languages [2] which can solve the problems highlighted this year by Stroustrup. However, this new technology need not, at least not initially, meet the industry’s requirements in its full blown theoretical power. Rather, the technology can take the form of a tool and an extension of the programming language C, see e.g. Stroustrup [3] which is also the first product offered by Functor AB. We can now begin to leverage an enormous investment in basic research and as far as we can see give it a very important role in the future of our information technology society.

FROM TYPES TO TRUST

Types matter! That is what they are for: to classify data with respect to criteria which matter: how they should be stored in memory, whether they can be safely passed as inputs to a given operation, even who is allowed to see them. Dependent types are types expressed in terms of data, explicitly relating their inhabitants to that data. As such, they enable you to express more of what matters about data. While conventional type systems allow us to validate our programs with respect to a fixed set of criteria, dependent types are much more flexible, they realize a continuum of precision from the basic assertions we are used to expect from types up to a complete specification of the program’s behaviour [1].

It becomes, then, the programmer’s choice to what degree he wants to exploit the expressiveness of such a powerful type discipline. While the price for formally certified software may be high, it is good to know that we can pay it in installments and that we are free to decide how far we want to go. Dependent types reduce certification to type checking, hence they provide a means to convince others that the assertions we make about our programs are correct. Dependently typed programs are, by their nature, proof carrying code.

CONSEQUENCES

Dependent types can be used to express schemes of recursion in the form of induction principles, such as constructor-guarded recursion for a given inductive datatype. In the academic prototypes such as the Epigram language, programs invoke recursion schemes explicitly each recursive call must be translatable to an inductive hypothesis arising from such a scheme. Any program that employs only constructor-guarded recursion, or some fancier scheme derived therefrom, is guaranteed to be total.

However, general recursion also takes the form of an induction principle (with a particularly generous inductive hypothesis). We can make general recursion available, but only by explicit appeal to this principle. If we suppress the compile-time computational behaviour of general recursion, we can preserve decidability of type checking, leaving computation by explicitly total programs activated. This opens a range of possibilities. And it means that programs can be proved correct, can have interfaces which are enriched with invariants and constraints that we otherwise would not be able enforce even at run-time since formerly many properties where assumed to be correctly coded by the programmer, perhaps with inserted program comments suggesting what the programmer actually had intended, but we where and still are forced to rely only on testing and static analysis tools to eliminate the range of errors that are associated with today’s practice of programming including in leading industry companies.

The merit of a paradigm based on dependent types, or more specifically on the “type theory-approach” to programming, is that we can enforce properties at a very early stage in the software engineering and programming activity. We consider this to be a long awaited step forward since other engineering disciplines are already firmly founded on such principles. Furthermore, the programmer who actually constructs the code will increasingly also be responsible for the correctness and performance of that code. This means that we can reduce the difficult and costly testing activity. Instead of the black-box “lost and found”-methodology to discover errors, we can get things right by focusing on the most important part: the actual construction phase of the software. In one extreme case, dependent types can even enforce total correctness of a component or even an entire program. On the other extreme, we are able to continue programming exactly in the same manner as we do today. Of course, the optima of these two extremes will be determined in each individual development project.

The task of identifying how much of the dependent type-technology should be exercised in a project, and where, suggests that also new methodologies are called for. Depending on the risks and requirements of a given project, a project manager will certainly have to make new decisions. The new technology will thus have to be integrated or embedded in the methods of program construction in order for us to leverage all of its potential. For example, in one project it may help simply to exclude exception handling at run-time while in another, say a computer game for example, it more than half the number of bugs (as reported by the CEO and CTO of EPIC Games referring to this particular technology) by merely using some rudiments of what the new paradigm has to offer.

PARADIGM MATTER

The constructive programming paradigm seems to revolutionize the nature of programming by reaching a limit of expressibility allowing us to say whatever we want to say about a program in its type; and checkable by a compiler instead of being hidden in comments inaccessible by the machine. This is achieved without artificially putting on a logical superstructure (as in e.g. Spec#) on a programming language but by just extending the use of types in a natural way. Thus we can freely move along the axis from prototype to verified system and pay as we go. In this sense, this is, to our minds, an entirely new paradigm of programming which now lies before our feet to be deployed.

The programming language is, after all, the most important tool of all in programming. All other tools are fundamentally linked to the language or the languages we use, bounded by their expressive power etc. Even the methodologies we exercise are firmly anchored in the programming language. For example, the Uni ed Modeling Language (UML) is often used in mainstream software development in the industry as of today. UML is a language for expressing architecture and design and embodies even a logic, the Object Constraint Language (viz. a logical superstructure above the language above the programming language), to aid various development methods, including so-called agile development methods. Furthermore, UML itself is highly dependent on the underlying programming language itself and it is easy to trace much of UML’s notation directly to the features that are currently available in e.g. C++ and Java.

The emergence of new languages also meant changes in UML e.g. from UML 1 to UML 2. This suggests that even if we use a language above the programming language to express parts of our system, design or otherwise, and even if we generate target code inside the programming language directly from, let’s say, a graphical design tool (or tool for developing software for control systems for that matter), the expressive power is bounded by the underlying programming language. The relationship between programming languages and programming tools or domain-specific languages are in fact even more important and problematic than we sketched in the previous paragraph.

In today’s industry practice the generation of so-called test cases is often linked to the language (tool) which is in the hands of the persons that design (construct, develop) the system rather than the target programming language. The gap between the tool and the programming language means that testing often becomes extremely difficult. In particular, test cases expressed in a domain-specific language frequently has to be interpreted in the target programming language in order to eliminate an error which otherwise cannot be remedied. One cornerstone of so-called agile development methods is the focus on code rather than on other artifacts. This means that programming languages are given an even more prominent role. Thus, the rationale that lies behind agile techniques implies that a new programming paradigm with technological advantages can have substantial impact on productivity, while the same arguments also seem again to say that programming languages are indeed of paramount importance, which after all is perhaps not that surprising despite the often held belief to the contrary.

THE FUTURE

Opportunity doesn’t come without challenges: the design space for programming languages is huge. Academic programming languages such as Twelf, Coq, Agda, Epigram and Idris are vehicles for experiments within this new constructive programming paradigm.

In the talk that is associated to this brief introduction to the technology, we will use the language Idris to illustrate the potential in expressing complex data dependency, eliminating run-time errors of kinds that have had dramatic consequences even in the safety-critical domain, and to integrate logic and programming using the fundamental relationship between constructive mathematics and constructive programming.

At the same time we shall have a glimpse at the huge design space for this new era of strongly typed languages which we will have to navigate if we want to make the full power of the paradigm available to the software industry. We will moreover show how domain-specific languages, embedded in the programming language itself (i.e. not a superstructure like UML) while comparing to Fowler’s corresponding C++ code, can be supported in new ways within the paradigm, again with examples from Idris [2] an academic experimental language with some six years of history behind it already.

We will also show that the new paradigm includes imperative, functional as well as object-oriented programming and that it supports low-level code where needed. Finally, and based on examples from Functor’s collaboration with the EFDA CCFE JET fusion reactor outside Oxford, we will show that even rudiments of the paradigm can bring significant value to software development and even in highly safety-critical environments. Moreover, we will show that the tools can in this way scale up to full-blown constructive programming tools.

Functor’s first product is deployed in a safety-critical environment and surprisingly takes the form of a tool for the C programming language which is typically used for embedded real-time systems as of today’s practice. In this way, we see a path to the future of programming and software development methodology normally being laid out.

REFERENCES

[1] Thorsten Altenkirch, Conor McBride, and James McKinna. Why dependent types matter. Manuscript for an invited talk given at POPL 2006, April 2005.
[2] Edwin Brady. Idris: systems programming meets full dependent types. In Rannjit Jhala and Wouter Swierstra, editors, Proceedings of the 5th ACM Workshop Programming Languages meets Program Verification, at POPL 2011, pages 43{54. ACM, 2011.
[3] Bjarne Stroustrup. Software development for infrastructure. IEEE Computer, 45(1):47{58, January 2012.