
Posts Tagged ‘type-systems’

Static typing: the presumption of guilt

April 14th, 2010

I was recently listening to the Software Engineering Radio interview with Gilad Bracha on Newspeak. In it, Gilad talks at length about dynamic typing and why he considers it superior to static typing. It was the best argument for dynamic typing I’d heard, and it slotted in perfectly with my growing respect for dynamic typing.

Some background information: Gilad Bracha co-wrote the second and third editions of the Java Language Specification. He is the creator of Newspeak, a dynamically typed language influenced by Self and Smalltalk. While a statically typed language has metadata in the form of type information, Newspeak types can have rich and arbitrary metadata, but its use is not enforced by a type-checker. Tools can read this metadata and give the programmer useful information about the state of the program, but there are no rules governing its use.

It’s best if I just let Gilad do the talking. I could paraphrase his points, but I wouldn’t put them as well.

Gilad Bracha: I have a long-standing religious war going on this thing [typing]. I’ve spent a lot of my time working with types; unlike most proponents of dynamic languages I’ve spent a lot of work building static type systems, and I know what they are. And I know what they are good for: they’re mainly good for documentation, documentation for both man and machine. The idea that they make your programs more reliable is a myth. They also have the advantage that it is easier to optimise, but that’s not essential, that’s been proven time and again. Most of the HotSpot technology that makes Java fast was developed for Self and Smalltalk. It doesn’t necessarily rely on the typing. Some things do, because the typing’s there and people tend to use that. But the real value is that humans who read this can quickly get an idea of what this thing does, and machines can do that, so you can do better name completion, better refactoring, etc., etc.

Markus Völter: So with machines you don’t necessarily mean the compiler, but the tooling for the language.

GB: Yes. And so I’m all for having what I call pluggable, or optional type systems. The difference is the conventional approach is the type system is part of the language. If you don’t pass the type-checker your program isn’t legal; it won’t run; the compiler will spit it back in your face. The idea with an optional type system is, OK, you write these types; you can run a type-checker; it can tell you what it thinks; it does not affect your semantics; it doesn’t change what the program does, and it certainly doesn’t prevent you from running the program just as if it had no types at all.

MV: Why would you run a program whose type-checker, even if it’s optional, tells you your types are wrong?

GB: Because what type-checkers do is they attempt to prove that your program conforms to the type system. The type system guarantees you that, if you conform to it, certain types of bad thing won’t happen. That doesn’t mean that if you don’t conform to it bad things will happen. They do not prove that your program is bad; they prove that your program isn’t good, according to their definition of good. And their definition of good is always too restrictive inherently, because the only way to tell if it’s really good is the runtime semantics.

MV: But if I try to add 2 to a string, that will never make sense, right?

GB: If you add 2 to a string, it may never make sense … but that isn’t really the problem. This is the kind of toy example that theoreticians give you, but that really isn’t the problem. The problems are much more subtle than that. And there are a lot of things that do make sense. It’s sort of like the Anglo-Saxon legal system versus I believe the Napoleonic one: are you innocent until proven guilty or are you guilty until proven innocent? We generally don’t assume that someone should be arrested because he can’t prove that he didn’t do anything wrong. A type system is essentially Napoleonic law. It arrests you because you can’t show that you’re worthy. That’s one issue, the flexibility. There are other issues, which really boil down to documentation.

One of the amusing arguments that people bring up often nowadays: they’ve learned to appreciate IDEs — something that comes from the Smalltalk world, essentially, and the early Lisp world, and so forth — and now they tell you “our IDEs can do such a wonderful job refactoring, and without type information you probably can’t do as well”. And they sort of neglect the fact that where did refactoring come from? It came from Smalltalk. And the Java world, they didn’t invent this, because one of the things these type systems do is it makes it hard to invent things. It makes it hard to do meta-programming, it makes it hard to do things that the language designer didn’t anticipate as typical use-cases. Because then your type-checker says “no, I can’t prove that that’s OK” and it’s very hard to be creative. The wonderful thing about dynamic typing is it lets you express anything that is computable. And type systems don’t — type systems are typically decidable, and they restrict you to a subset. People who favour static type systems say “it’s fine, it’s good enough; all the interesting programs you want to write will work as types”. But that’s ridiculous — once you have a type system, you don’t even know what interesting programs are there.
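To make the idea of an optional type system a little more concrete, here is a minimal sketch in Python rather than Newspeak. This is my own illustration, not something from the interview: Python's annotations are one mainstream example of type information that tools can read but that the interpreter does not enforce. The function name double is made up for the example.

```python
def double(value: int) -> int:
    # The annotations are metadata; the interpreter does not enforce them.
    return value * 2


# Tools can read the declared types as ordinary data.
print(double.__annotations__)

# A call that matches the annotations.
print(double(21))

# A call that contradicts them. An external type-checker would likely flag
# this line, but the program still runs, because str supports * with an int.
print(double("ab"))
```

The last call is the kind of thing Gilad is describing: a checker cannot prove it is fine, yet nothing bad happens at run-time.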

Categories: Programming

Static vs. dynamic typing: an introduction

February 24th, 2010

[Image: wood type alphabet, via Flickr. Caption: A different kind of type ...]

This is the first of a series of posts on the differences between static and dynamic typing in programming languages. This post is intended as an introduction to the topic.

What is a type?

The following definition comes from artima.com:

My definition is that a type is metadata about a chunk of memory that classifies the kind of data stored there. This classification usually implicitly specifies what kinds of operations may be performed on the data.

Types are a way of classifying data in computer programs. Getting down to the nitty gritty, the computer’s processor doesn’t care whether a given variable is a number or a string; it’s just a bunch of bits. In fact, in C, the type of a single character and the type of a one-byte integer are the same: char. It is interesting to note that C does not even have a type for strings; instead it uses a run of chars laid out one after another in memory. The only way the program knows where the string ends is the null byte that terminates it, written as \0.
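As a rough illustration of "it's just a bunch of bits", the following Python sketch reads the same bytes in different ways. The byte values are picked by me for the example, and the struct module stands in for what a C programmer would do with casts.

```python
import struct

raw = bytes([72, 105, 0])  # three bytes: 'H', 'i' and a terminating null

# Read as one-byte integers (what C calls char).
as_numbers = list(raw)

# Read as text: stop at the first null byte, the way C string functions do.
as_text = raw.split(b"\x00", 1)[0].decode("ascii")

# Four bytes written as a 32-bit integer, then read back two ways.
four_bytes = struct.pack("<i", 1065353216)
as_int = struct.unpack("<i", four_bytes)[0]
as_float = struct.unpack("<f", four_bytes)[0]

print(as_numbers)
print(as_text)
print(as_int, as_float)
```

Nothing in the bytes themselves says which reading is the right one; that knowledge lives in the type.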

C++ is a newer language, and it has a type for strings, called string. This gives you certain benefits. In C, you need to allocate enough memory to hold a given string yourself; otherwise errors and security vulnerabilities such as buffer overflows may occur. The C++ string class manages that memory for you, which makes this kind of mistake much harder to make.

Types are a way of telling the computer the purpose of your data. Multiplying two floats together works differently to multiplying two ints – the processor must know whether the values it is multiplying are floats or ints in order to correctly calculate the result. They are also a way of telling other programmers (and yourself in a few days) the purpose of your code. For example, using a date type instead of using an int to hold a datestamp makes it very obvious that the value being manipulated represents a date, and not just an arbitrary number. Of course, while the computer couldn’t care less whether you are incrementing an integer by one or going forward by a day, to a human being this is a crucial distinction.
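Here is a small sketch of the date example, using Python's standard datetime module. The YYYYMMDD integer encoding is just an assumption I've made for the sake of illustration; the point is only that the integer knows nothing about calendars.

```python
from datetime import date, timedelta

# As a bare integer (assumed YYYYMMDD encoding): nothing says this is a
# date, and adding 1 happily produces a day that does not exist.
datestamp = 20100228
next_stamp = datestamp + 1

# As a date: the type documents the intent and knows the calendar.
day = date(2010, 2, 28)
next_day = day + timedelta(days=1)

print(next_stamp)
print(next_day.isoformat())
```

Both versions "add one", but only the second one says what that is supposed to mean.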

What is static typing?

Static typing is used by programming languages as diverse as C, Java, F# and Pascal. In a statically typed language, if x is declared as an int, x will always be an int (in the current scope at least). In other words, types are associated with variables. Older statically typed languages require the programmer to declare the type of every variable, which leads to repetition like the following, where Button is spelled out twice:

Button myButton = new Button();

Newer statically typed languages use type inference to allow the compiler to ascertain the type of a variable implicitly. Note that while the following is possible in a statically typed language with type inference:

myButton = new Button(); //myButton is inferred to be a Button

this is not:

myThing = new Button(); //myThing is inferred to be a Button
myThing = new Frame();  //myThing is a Button, not a Frame

A statically typed language will check types at compile-time. When the compiler gets to the second line in the snippet above, it will report an error, because an object of type Frame cannot be assigned to a variable of type Button (assuming Frame is not a kind of Button). The compiler will check types wherever they are used, and ensure that no inconsistencies occur in the program.
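To give a feel for what the compiler is doing, here is a toy checker sketched in Python. It is nowhere near a real type-checker (it ignores subtyping, scopes and everything else), and check_assignments and its input format are invented for this post. What it does show is the key property: the "program" is inspected without ever being run.

```python
def check_assignments(assignments):
    """Toy checker: infer each variable's type from its first assignment.

    `assignments` is a list of (variable_name, type_name) pairs standing in
    for source lines; nothing is executed. Returns a list of error messages.
    """
    inferred = {}
    errors = []
    for line_number, (name, type_name) in enumerate(assignments, start=1):
        if name not in inferred:
            inferred[name] = type_name  # first assignment fixes the type
        elif inferred[name] != type_name:
            errors.append(
                f"line {line_number}: cannot assign {type_name} "
                f"to {name}, which is a {inferred[name]}"
            )
    return errors


# The two-line snippet from above, reduced to (variable, type) pairs.
program = [("myThing", "Button"), ("myThing", "Frame")]
errors = check_assignments(program)
print(errors)
```

The first assignment fixes the type of myThing, so the second one is reported as an error before any Button or Frame is ever created.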

What is dynamic typing?

Amongst the better known dynamically typed languages are Python, Ruby, JavaScript, Erlang and Smalltalk. In a dynamically typed language, any value can be assigned to any variable. There are still types, but the type is associated with the value rather than the variable. Using Python as an example, we could do the following:

myFile = open('foo.txt', 'r') #myFile is a File object
myFile = myFile.read() #myFile is now a string object
myFile = len(myFile) #myFile is now a number
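If you want to watch the type follow the value around, the sketch below records type() after each rebinding. I've swapped open('foo.txt', 'r') for io.StringIO so that it runs without a file on disk; that substitution is mine, and the reported class name differs from a real file object's.

```python
import io

# io.StringIO stands in for open('foo.txt', 'r') so the sketch needs no file.
my_file = io.StringIO("hello")
seen = [type(my_file).__name__]

my_file = my_file.read()   # the same name now refers to a str
seen.append(type(my_file).__name__)

my_file = len(my_file)     # and now to an int
seen.append(type(my_file).__name__)

print(seen)
```

The variable name never changes, but the type reported for it does, because the type belongs to whatever value the name currently refers to.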

In a dynamically typed language, type checking occurs at run-time. Before every operation is carried out, the interpreter must check to see if it is possible. Here’s an example in Python:

x = 10
y = 5
z = x * y

When the interpreter gets to the third line, it must first check to see if x and y are compatible with the * operator. If the above example had been in C, both x and y would have had to be declared with a type, such as int, or the program would not have compiled. For ints, the statement x * y would typically be compiled to a single multiply instruction, which is carried out at run-time with no type checking.
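The run-time check is easy to see if you feed the same operation different kinds of values. This is a minimal sketch; the function name multiply is just for illustration.

```python
def multiply(x, y):
    return x * y  # the check for * happens here, each time this line runs


print(multiply(10, 5))      # two ints
print(multiply("ab", 3))    # a str and an int: repetition is defined

try:
    multiply("ab", "cd")    # two strs: * is not defined for this pair
except TypeError as error:
    failure = str(error)
    print("TypeError:", failure)
```

Note that the function itself is accepted without complaint. The error only appears when the offending call is actually executed.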

Summary

So, types are basically just metadata about memory being used by your program. When the program is compiled or interpreted, the compiler or interpreter can enforce strict rules about how types can interact, or it can be a bit more laissez-faire. There are good reasons for both approaches, and generally the choice is a trade-off between safety and flexibility.

The only way to get a proper idea of the different approaches and feels is to start coding. Python and Ruby are both great examples of modern dynamically typed languages. Have a look at C++, Java or Haskell for statically typed languages.

My next posts on this topic are going to look more deeply into some of the issues surrounding static and dynamic typing.

Categories: Programming

Static vs. dynamic typing

February 19th, 2010

Over the next few weeks, I’m going to be writing a number of posts on the subject of static vs. dynamic typing. I think it’s a fascinating area to explore, and it’s one that affects our lives as programmers every day, yet it seems that a lot of people don’t really understand the distinction.

I won’t pretend to be an expert on the subject – I see this as being as much a learning experience for me as for the reader.

I’m also not going to pretend to be completely neutral – although I started programming in C++, at the moment I mainly program in Python and JavaScript. I’m aiming to keep as neutral as possible, as I don’t believe religion has any place in software development, and I truly believe that there are advantages to both systems.


Categories: Programming