Static vs. dynamic typing: an introduction

July 29, 2010

A different kind of type ...

This is the first of a series of posts on the differences between static and dynamic typing in programming languages. This post is intended as an introduction to the topic.

What is a type?

The following definition comes from artima.com:

My definition is that a type is metadata about a chunk of memory that classifies the kind of data stored there. This classification usually implicitly specifies what kinds of operations may be performed on the data.

Types are a way of classifying data in computer programs. Getting down to the nitty gritty, the computer’s processor doesn’t care whether a given variable is a number or a string; it’s just a bunch of bits. In fact, in C, the type of a single character and the type of an integer using one byte are the same: char. It is interesting to note that C does not even have a type for strings; instead it uses a sequence of chars arranged sequentially in memory. The only way the computer knows that the string has finished is by ending it with a null byte, written as \0.

C++ is a newer language, and so it has a type for strings, called string. This gives you certain benefits. In C, you need to allocate enough memory to hold a given string; otherwise errors and security vulnerabilities may occur. The C++ string class abstracts this away, meaning that these security holes can be a thing of the past.

Types are a way of telling the computer the purpose of your data. Multiplying two floats together works differently to multiplying two ints – the processor must know whether the values it is multiplying are floats or ints in order to correctly calculate the result. They are also a way of telling other programmers (and yourself in a few days) the purpose of your code. For example, using a date type instead of using an int to hold a datestamp makes it very obvious that the value being manipulated represents a date, and not just an arbitrary number. Of course, while the computer couldn’t care less whether you are incrementing an integer by one or going forward by a day, to a human being this is a crucial distinction.

What is static typing?

Static typing is used by programming languages as diverse as C, Java, F# and Pascal. In a statically typed language, if x is declared as an int, x will always be an int (in the current scope at least). In other words, types are associated with variables. Older statically typed languages require the programmer to declare the type of every variable, which leads to wastes of typing like the following:

Button myButton = new Button();

Newer statically typed languages use type inference to allow the compiler to ascertain the type of a variable implicitly. Note that while the following is possible in a statically typed language with type inference:

myButton = new Button(); //myButton is inferred to be a Button

this is not:

myThing = new Button(); //myThing is inferred to be a Button
myThing = new Frame();  //myThing is a Button, not a Frame

A statically typed language will check types at compile-time. When the compiler gets to the second line in the snippet above, it will fail, because it will not be able to assign an object of type Frame to a variable of type Button. The compiler will check types wherever they are used, and ensure that no inconsistencies occur in the program.

What is dynamic typing?

Amongst the better known dynamically typed languages are Python, Ruby, JavaScript, Erlang and Smalltalk. In a dynamically typed language, any value can be assigned to any variable. There are still types, but the type is associated with the value rather than the variable. Using Python as an example, we could do the following:

myFile = open('foo.txt', 'r') #myFile is a File object
myFile = myFile.read() #myFile is now a string object
myFile = len(myFile) #myFile is now a number

In a dynamically typed language, type checking occurs at run-time. Before every operation is carried out, the interpreter must check to see if it is possible. Here’s an example in Python:

x = 10
y = 5
z = x * y

When the interpreter gets to the third line, it must first check to see if x and y are compatible with the * operator. If the above example had been in C, both x and y would have had to be declared as ints, or the program would not have compiled. The statement x * y would have been compiled to a single multiply instruction, so at run-time, this multiply instruction is carried out with no type checking.

Summary

So, types are basically just metadata about memory being used by your program. When the program is compiled or interpreted, the compiler or interpreter can maintain strict rules about how types can interact, or it can be a bit more laissez faire. There are good reasons for both approaches, and generally the choice is a trade-off between safety and flexibility.

The only way to get a proper idea of the different approaches and feels is to start coding. Python and Ruby are both great examples of modern dynamically typed languages. Have a look at C++, Java or Haskell for statically typed languages.

My next posts on this topic are going to look more deeply into some of the issues surrounding static and dynamic typing.