Archive

Archive for February, 2011

A short word on nomenclature

February 26th, 2011 15 comments

These symbols have accepted names (in the context of programming):

  • [ ] These are opening and closing brackets. They are square, so are sometimes called square brackets.
  • { } These are opening and closing braces. You could call them curly braces, but there’s no need.
  • ( ) These are parentheses. One is a parenthesis. If those are a bit of a mouth- or keyboardful, call them parens. Don’t call them brackets or I’ll destroy your keyboard.
  • ~ This is a tilde.
  • | This is a pipe.

These symbols have multiple accepted names.

  • # I call this a hash because I’m British. Americans call it a pound sign. Pedants call it an octothorpe.
  • * Asterisk, star, splat. Definitely not an “asterix”.
  • ! Exclamation mark/point, bang.
  • < > Less-than and greater-than, but may also be called angle brackets.
  • . Full-stop, period, dot.
  • ? Question mark, query.
Categories: Programming

Why you should learn brainfuck (or: learn you a brainfuck for great good!)

February 20th, 2011 18 comments

Before I begin, a little disclaimer: there’s no way to write about brainfuck without offending someone. Some people will be upset about the swearing, others about the censorship. I’ve come up with a partial solution: a “Decensor this article” option (the decensor plugin is available on GitHub). If you’re reading this in a feed reader, then sorry for the swearing.

What is brainfuck?

Brainfuck is close to being the simplest programming language possible, with only 8 instructions:

> < + - , . [ ]

These instructions move an internal data pointer, increment and decrement the value at the data pointer, input and output data, and provide simple looping.

As an aside: brainfuck was written with the intent of having a language with the smallest-possible compiler. Many compilers for brainfuck are smaller than 200 bytes! See Wikipedia for more details.

Why would I want to learn such a stupid language?

It’s been suggested that you should learn at least 6 programming languages to be a good programmer. The more programming languages you know, the more perspectives you’ll have on problems. Brainfuck is such a simple language to learn, there’s almost no reason not to learn it. If you read the whole of this blog post now, and try out all the code samples in the interpreter, in half-an-hour to an hour you’ll have a new programming language under your belt. Tell me that’s a waste of time :)

Another good reason to learn brainfuck is to understand how basic a Turing-complete programming language can be. A common argument when programmers compare languages is “well they’re all Turing-complete”, meaning that anything you can do in one language you can do in another. Once you’ve learnt brainfuck, you’ll understand just how difficult it can be to use a Turing-complete language, and how that argument holds no water.

Ok, you’ve convinced me. Teach me!

Good. I knew you were one of the smart ones.

Brainfuck doesn’t have variable names – just one long array of cells, each of which contains a number:

address: 0  1  2  3  4  5  6  ...
value:  [0][0][0][0][0][0][0][...]

Each cell is initialized to zero. A data pointer starts off pointing to the first cell:

pointer: v
address: 0  1  2  3  4  5  6  ...
value:  [0][0][0][0][0][0][0][...]

Moving the data pointer and modifying the data

A brainfuck program is a stream of the 8 commands (other characters are allowed in this stream but are ignored). Here’s a basic brainfuck program:

>>>>++

The greater-than command moves the data pointer one cell to the right. The plus command increases the value of the cell under the data pointer by 1. Here’s what the data looks like after we run the above program:

pointer:             v
address: 0  1  2  3  4  5  6  ...
value:  [0][0][0][0][2][0][0][...]

The data pointer moved four cells to the right, and the value of that cell was increased by 2. Pretty simple.

You can probably guess what the less-than and minus commands do:

program: >>>>++<<++-

pointer:       v
address: 0  1  2  3  4  5  6  ...
value:  [0][0][1][0][2][0][0][...]

So, move four cells right, increment twice, move two cells left, increment twice and decrement once.

Well done, you’ve already learnt half of the syntax of brainfuck.
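If it helps to see the same thing outside brainfuck, here is a rough Ruby sketch of those four commands. The run_moves helper and its seven-cell tape are assumptions made purely for illustration; real interpreters use a much longer tape.

```ruby
# Hypothetical helper: understands only > < + - on a small fixed-size tape.
def run_moves(program, cells = 7)
  tape = Array.new(cells, 0) # every cell starts at zero
  pointer = 0                # the data pointer starts at the first cell

  program.each_char do |command|
    case command
    when ">" then pointer += 1
    when "<" then pointer -= 1
    when "+" then tape[pointer] += 1
    when "-" then tape[pointer] -= 1
    end
  end

  [tape, pointer]
end

tape, pointer = run_moves(">>>>++<<++-")
p tape    # => [0, 0, 1, 0, 2, 0, 0]
p pointer # => 2
```

The result matches the diagram above: cell 4 holds 2, cell 2 holds 1, and the pointer rests on cell 2.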

I/O

A programming language isn’t much use without the ability to input and output data. Brainfuck has two commands for I/O – , (comma) and . (full-stop/period). The comma inputs a character from the input into the current cell, and the period outputs the character in the current cell. The ASCII code of the character is used on input and output. You may not be used to interchanging characters and integers, so it’s worth having a look at this chart to see how they map.

Here’s a simple program that inputs a character, increments it once, then outputs it:

program: ,+.
input:   a

pointer:  v     
address:  0  1  2  3  4  5  6  ...
value:  [98][0][0][0][0][0][0][...]

The program takes a character from the input, in this case the letter ‘a’, and puts it into the first data cell. It then increments it, and outputs the value of the cell. A lowercase ‘a’ has the ASCII code 97, and ‘b’ has the ASCII code 98.

Try it out. Go to my brainfuck interpreter, put the string ,+. into the “Code” box and the letter ‘a’ into the “Input” box (or use this link), then press “Run”. Try it with different input, then try using more pluses or minuses.
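If swapping between characters and integers feels odd, the same three steps can be sketched in Ruby, where ord and chr do the conversion explicitly. The increment_char name is made up for this example.

```ruby
# Ruby counterpart of the brainfuck program ,+. for one input character.
def increment_char(char)
  cell = char.ord # ',' stores the character's code in the cell ("a" -> 97)
  cell += 1       # '+' increments the cell (97 -> 98)
  cell.chr        # '.' outputs the cell as a character (98 -> "b")
end

puts increment_char("a") # prints "b"
```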

Every time you use the comma command you remove a character from the input stream:

program: ,>,>,.<.<.
input:   abc

pointer:  v     
address:  0   1   2  3  4  5  6  ...
value:  [97][98][99][0][0][0][0][...]

This program puts the first character of input in the first cell, the second character in the second cell and the third character in the third cell (although it looks like an evil dragon smiley). Try it out to see what happens.

Looping

Ok, that’s three-quarters of the language covered now. All that’s left is looping. This is a bit trickier, but fairly simple once you’ve got the hang of it.

[ and ] (left and right brackets) are used for looping. Anything in between pairs of brackets will loop (you can have nested loops – the pairs match like parentheses (i.e. this is the equivalent of an inner loop) and this is the outer loop (did you see what I did there? (inner inner!))). Of course there’s no point looping forever, so brainfuck needs a way of knowing when to stop. It does this by checking the value of the current data cell to see if it is zero. If it is, execution will skip to after the end of the loop.

To illustrate this I’m going to need to show the position of the current instruction. This is called the instruction pointer, or i_ptr in the figure below.

To start with, the ‘z’ is read into the first cell:

i_ptr:   v
program: ,[.-]
input:   z

pointer:  v     
address:  0   1  2  3  4  5  6  ...
value:  [122][0][0][0][0][0][0][...]

The instruction pointer moves to the next instruction in the program. It checks to see if the value in the current data cell is zero. It isn’t, so the loop is entered.

i_ptr:    v
program: ,[.-]
input:   z

pointer:  v     
address:  0   1  2  3  4  5  6  ...
value:  [122][0][0][0][0][0][0][...]

The value in the current data cell is output (‘z’), then decremented:

i_ptr:      v
program: ,[.-]
input:   z

pointer:  v     
address:  0   1  2  3  4  5  6  ...
value:  [121][0][0][0][0][0][0][...]

The instruction pointer reaches the end of the loop and jumps back to the beginning:

i_ptr:    v
program: ,[.-]
input:   z

pointer:  v     
address:  0   1  2  3  4  5  6  ...
value:  [121][0][0][0][0][0][0][...]

The current cell is still not zero, so the loop is entered again, the new value is output (‘y’), and decremented again. This keeps on going until the value of the first data cell is zero. At this point, the instruction pointer jumps to after the end of the loop, which also happens to be the end of the program:

i_ptr:        v
program: ,[.-]
input:   z

pointer: v     
address: 0  1  2  3  4  5  6  ...
value:  [0][0][0][0][0][0][0][...]

Try it out to see what happens.
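For comparison, here is roughly what ,[.-] does, written as a Ruby while loop. This is a sketch with a made-up method name; it collects the output in a string instead of printing it.

```ruby
# Ruby sketch of ,[.-] : read one character, then output and decrement
# the cell until it reaches zero.
def countdown_output(char)
  cell = char.ord         # ','
  output = String.new     # collected output (binary-safe, mutable)
  while cell != 0         # '[' and ']' both test the current cell
    output << cell.chr    # '.'
    cell -= 1             # '-'
  end
  output
end

puts countdown_output("z")[0, 3]    # prints "zyx"
p countdown_output("\x03").bytes    # => [3, 2, 1]
```

With ‘z’ as input the loop runs 122 times, so the output starts with the alphabet in reverse and carries on down through punctuation and control characters.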

Refresher

And that’s the entire language. Here’s a quick recap:

  • > Move the data pointer one cell to the right
  • < Move the data pointer one cell to the left
  • + Increment the value of the cell at the data pointer
  • - Decrement the value of the cell at the data pointer
  • , Take a character from the input and place its value into the current data cell
  • . Output the value of the current data cell as a character
  • [ If the current data cell is zero, skip to after the closing bracket, otherwise continue
  • ] Skip back to the matching opening bracket (a common optimization is to skip over this instruction if the current cell is zero, rather than going back to the opening bracket and checking)
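To tie the recap together, here is a minimal interpreter sketch in Ruby. It is not the interpreter linked from this post. The 30,000-cell tape, the 8-bit wrapping cells and storing 0 at end of input are all assumptions that the language leaves open, and the sketch does not guard against moving left of the first cell.

```ruby
# Minimal brainfuck interpreter sketch (assumptions: 30,000 cells,
# 8-bit wrapping values, 0 stored when the input runs out).
def brainfuck(program, input = "")
  code = program.chars

  # Pre-compute the matching position for every [ and ].
  jumps = {}
  stack = []
  code.each_with_index do |command, index|
    if command == "["
      stack.push(index)
    elsif command == "]"
      opening = stack.pop
      raise ArgumentError, "unmatched ] at #{index}" if opening.nil?
      jumps[opening] = index
      jumps[index] = opening
    end
  end
  raise ArgumentError, "unmatched [" unless stack.empty?

  tape = Array.new(30_000, 0)
  pointer = 0
  input_bytes = input.bytes
  output = String.new
  ip = 0 # instruction pointer

  while ip < code.length
    case code[ip]
    when ">" then pointer += 1
    when "<" then pointer -= 1
    when "+" then tape[pointer] = (tape[pointer] + 1) % 256
    when "-" then tape[pointer] = (tape[pointer] - 1) % 256
    when "," then tape[pointer] = input_bytes.shift || 0
    when "." then output << tape[pointer].chr
    when "[" then ip = jumps[ip] if tape[pointer].zero?
    when "]" then ip = jumps[ip] unless tape[pointer].zero?
    end
    ip += 1 # any other character falls through and is ignored
  end

  output
end

puts brainfuck(",+.", "a")          # prints "b"
puts brainfuck(",>,>,.<.<.", "abc") # prints "cba"
```

The ] branch uses the optimization mentioned in the recap: it only jumps back when the current cell is non-zero, so the [ check is not repeated.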

The end

If you’ve read the whole blog post and tried out all the code, you now know brainfuck. Well done, that’s another item for your CV! If you try making anything relatively complex you’ll realise the true value of the abstractions we build software on. All languages have different levels of abstraction, and you should be working at the highest level you can afford. It’s worth dropping down to the level of brainfuck occasionally so you can really appreciate the abstractions you’re given.

Categories: Programming

Zen and the art of statefulness

February 12th, 2011 18 comments

The venerable master Qc Na was walking with his student, Anton. Hoping to prompt the master into a discussion, Anton said “Master, I have heard that objects are a very good thing – is this true?” Qc Na looked pityingly at his student and replied, “Foolish pupil – objects are merely a poor man’s closures.”

Chastised, Anton took his leave from his master and returned to his cell, intent on studying closures. He carefully read the entire “Lambda: The Ultimate…” series of papers and its cousins, and implemented a small Scheme interpreter with a closure-based object system. He learned much, and looked forward to informing his master of his progress.

On his next walk with Qc Na, Anton attempted to impress his master by saying “Master, I have diligently studied the matter, and now understand that objects are truly a poor man’s closures.” Qc Na responded by hitting Anton with his stick, saying “When will you learn? Closures are a poor man’s object.” At that moment, Anton became enlightened.

Anton van Straaten

When I first read the above koan some time ago, I didn’t really understand it. I had a very basic idea of closures, but at the time they were just a syntactic oddity to me – something you could do a few cool things with, but not particularly useful. Since then I’ve worked through quite a bit of Structure and Interpretation of Computer Programs and delved into functional programming in JavaScript, which has given me a much deeper understanding of closures. On the other side of the divide, I’ve been doing a lot of Ruby programming, which has helped me grok objects a lot better. I now feel like I can begin to comprehend what the quote was getting at.

My moment of enlightenment came today while reading Test-Driven JavaScript Development and looking at the code for a JavaScript strftime function (abbreviated here):

Date.prototype.strftime = (function () {
  function strftime(format) {
    var date = this;

    return (format + "").replace(/%([a-zA-Z])/g, function (m, f) {
      //format date based on Date.formats
    });
  }

  // Internal helper
  function zeroPad(num) {
    return (+num < 10 ? "0" : "") + num;
  }

  Date.formats = {
    // Formatting methods
    d: function (date) {
      return zeroPad(date.getDate());
    },
    //...
    //various other format methods
    //...

    // Format shorthands
    F: "%Y-%m-%d",
    D: "%m/%d/%y"
  };

  return strftime;
}());

The above code uses an IIFE (immediately-invoked function expression) to produce a function with additional data (if Date.formats was instead declared as a local variable, this would be a better example). If it doesn’t make sense, I thoroughly recommend Ben Alman’s post on IIFEs for an overview of the technique. The code executes, and returns a function. The important thing is that one function is used to define another function and its context.

In Ruby, when you define a class, the class definition is executed as Ruby code, unlike in Java, for example, where a class definition is just a syntactic construct read at compile-time, but not executed in the way other Java code is. A Ruby class definition is read at run-time, and builds up a new class as it is interpreted.
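A small sketch makes this concrete. The Robot class is invented for the example; the point is only that a loop placed in the class body runs while the class is being defined.

```ruby
class Robot
  SELF_IN_BODY = self # inside the class body, self is the class being built

  # An ordinary loop in the class body: it runs once, at definition time,
  # and defines one method per direction.
  %w[left right].each do |direction|
    define_method("turn_#{direction}") { "turning #{direction}" }
  end
end

puts Robot::SELF_IN_BODY == Robot # prints "true"
puts Robot.new.turn_left          # prints "turning left"
```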

In a lot of ways, Ruby class definitions and JavaScript function-defining functions are equivalent. I’ll give you a little example to illustrate:

zen.js

var Cat = function () {
  var age = 1;

  function catYears() {
    return age * 7;
  }

  function birthday() {
    age++;
  }

  return {
    catYears: catYears,
    birthday: birthday
  }
};

var powpow = Cat();
var shorty = Cat();

//yo shorty, it's your birthday:
shorty.birthday();

alert(powpow.catYears()); // => 7
alert(shorty.catYears()); // => 14

zen.rb

class Cat
  def initialize
    @age = 1
  end

  def cat_years
    @age * 7
  end

  def birthday
    @age += 1
  end
end

powpow = Cat.new
shorty = Cat.new

#yo shorty, it's your birthday:
shorty.birthday

puts powpow.cat_years # => 7
puts shorty.cat_years # => 14

Strictly speaking, in zen.js I’m returning an object, but the private data and methods of that object are saved in a closure. So, zen.js stores its state in a closure, while zen.rb stores its state in an object. Every time Cat() is called in zen.js, a new closure is created with a unique age. Every time Cat.new is called in zen.rb, a new object is created with a unique @age. (These two examples aren’t strictly equivalent – each cat in zen.js gets a new copy of the functions, whereas in zen.rb they share the same methods. It’s possible to make the JavaScript version function more like a Ruby class definition, but it takes a bit more code.)
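Ruby has closures too, so the zen.js approach can be mirrored in Ruby itself. This is a hypothetical sketch (make_cat is not from either file above): the state lives in a local variable captured by two lambdas rather than in an instance variable.

```ruby
# Closure-based counterpart to zen.rb, mirroring zen.js.
def make_cat
  age = 1 # private state, reachable only through the lambdas below

  {
    cat_years: -> { age * 7 },
    birthday:  -> { age += 1 }
  }
end

powpow = make_cat
shorty = make_cat

# yo shorty, it's your birthday:
shorty[:birthday].call

puts powpow[:cat_years].call # prints 7
puts shorty[:cat_years].call # prints 14
```

Each call to make_cat creates a fresh age, just as each call to Cat() does in zen.js.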

Of course, there’s a lot you can do with JavaScript closures that you can’t do with Ruby objects. And there’s a lot you can do with Ruby objects that can’t be emulated using JavaScript closures. Any time you decide that one is better than the other, just imagine Qc Na hitting you with his stick.

Really really simple Ruby metaprogramming

February 5th, 2011 11 comments

Metaprogramming in Ruby has the reputation of being something only the true zen Ruby masters can even hope to understand. They say the only true way to learn metaprogramming is to train with Dave Thomas in his mountain retreat for five years – the Ruby equivalent of this XKCD:

But it’s not that bad. In fact, I’m going to go so far as to say that it’s possible to learn to metaprogram in Ruby without even meeting Dave Thomas. Yeah, I know, it doesn’t sound possible. But just you wait…

What is metaprogramming?

Metaprogramming is what makes Ruby awesome. It’s writing code to write code. It’s dynamic code generation. It’s the move from imperative to declarative. It’s a rejection of Java’s endless repetition of boilerplate code. It’s the living embodiment of DRY. Here’s an example:

class Monkey
  def name
    @name
  end

  def name= n
    @name = n
  end
end

I can see you there, in the middle of the classroom, with one arm straining up, the other one holding it up because it’s so damn hard to hold up. OK, Jimmy, what is it? Oh, you don’t need to write all that code in Ruby? You can just use attr_accessor?

Jimmy’s right. That code snippet above could be written like so:

class Monkey
  attr_accessor :name
end

So, attr_accessor’s magic, right? Well, actually, it’s not. Just like in Scooby-Doo, where what looked like magic to start off with turned out to be an elaborate hoax, attr_accessor is just an instance method of Module, which means every class can call it at the class level. It defines the name and name= methods on Monkey, as if you’d manually defined them.

And that’s all metaprogramming is. Just a way of adding arbitrary code to your application without having to write (or copy-paste) that code.
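To show there’s no trickery involved, here is a rough hand-rolled equivalent. The real attr_accessor is built into Ruby, so this is only a sketch of the idea; my_attr_accessor is a made-up name chosen so that nothing built in gets overwritten.

```ruby
class Module
  # Hypothetical imitation of attr_accessor, for illustration only.
  def my_attr_accessor(*names)
    names.each do |name|
      # Reader: returns the instance variable @name
      define_method(name) { instance_variable_get("@#{name}") }
      # Writer: assigns the instance variable @name
      define_method("#{name}=") { |value| instance_variable_set("@#{name}", value) }
    end
  end
end

class Monkey
  my_attr_accessor :name
end

monkey = Monkey.new
monkey.name = "Bubbles"
puts monkey.name # prints "Bubbles"
```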

An (extended) example

The source code for this example is available in my StringifyStuff plugin on github, which in turn was adapted from Ryan Bates’ Railscast Making a Plugin. Incidentally, if you feel you can improve the plugin, fork me on github!

I’m assuming basic knowledge of Ruby, and some familiarity with Rails would be helpful as well, but not essential.

Have a quick peek at the github repo now – the important files to look at for now are init.rb and lib/stringify_time.

First, stringify_time. This file defines a module called StringifyTime, which defines a method called stringify_time:

module StringifyTime
  def stringify_time *names
    #crazy stuff going on in here
  end
end

The other two files, stringify_stuff and stringify_money, are similar.

Now, init.rb:

class ActiveRecord::Base
  extend StringifyStuff
  extend StringifyTime
  extend StringifyMoney
end

This extends ActiveRecord::Base with the three modules listed. As a result, any class that inherits from ActiveRecord::Base (e.g. your models) has the methods defined in each of those modules as class methods.
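You can see the effect of extend without Rails. In this sketch, FakeBase and StringifyDemo are hypothetical stand-ins for ActiveRecord::Base and one of the plugin’s modules.

```ruby
module StringifyDemo # hypothetical stand-in for a plugin module
  def stringify_demo(*names)
    "would stringify: #{names.join(', ')}"
  end
end

class FakeBase # hypothetical stand-in for ActiveRecord::Base
  extend StringifyDemo
end

class Project < FakeBase
end

# The module's method is available on the subclass as a class method...
puts Project.stringify_demo(:start_date, :end_date)
# prints "would stringify: start_date, end_date"

# ...but not on instances of the class.
puts Project.new.respond_to?(:stringify_demo) # prints "false"
```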

The StringifyStuff plugin is used like so:

class Project < ActiveRecord::Base
  stringify_stuff
  stringify_time :start_date, :end_date
end

stringify_time is passed a list of symbols representing the relevant model attributes. It will provide useful accessors for those attributes. Let’s have a look at a simplified version of stringify_time:

module StringifyTime
  def stringify_time *names
    names.each do |name|

      define_method "#{name}_string" do
        read_attribute(name) &&
          read_attribute(name).strftime("%e %b %Y").strip()
      end

    end
  end
end

stringify_time is passed a list of symbols. It iterates over these symbols, and for each one calls define_method. define_method is the first clever Ruby metaprogramming trick we’re going to look at. It takes a method name and a block representing the method arguments and body, and magically adds an instance method with that name, arguments and body to the class in which it was called. In this case, it was called from Project, so Project will gain a new method with the name "#{name}_string". In this example, we passed in :start_date and :end_date, so two methods will be added to Project: start_date_string and end_date_string.

So far so good. Now though, it gets a little bit hairy, so hold onto your horses (or equivalent equid).

In a normal model instance method, you’d access an attribute using a method of the same name. So if you wanted to get the start_date converted to a string, you’d write:

def my_method
  start_date.to_s
end

The problem with doing this in define_method is that we don’t have start_date as an identifier – it’s held as a symbol. There are two ways to accomplish the above if start_date was passed in as a symbol:

def my_method attr_sym
  read_attribute(attr_sym).to_s #This is a Rails method
end

or:

def my_method attr_sym
  send(attr_sym).to_s #Ruby built-in method
end

For simplicity, I’m using read_attribute, but send is useful to know about too.
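Since send is plain Ruby, it’s easy to try outside Rails. In this sketch, Task and attribute_to_s are invented for the example.

```ruby
# Hypothetical plain-Ruby stand-in for a model with a start_date attribute.
Task = Struct.new(:start_date)

def attribute_to_s(object, attr_sym)
  # send calls the method whose name is held in attr_sym.
  object.send(attr_sym).to_s
end

task = Task.new("2011-02-05")
puts attribute_to_s(task, :start_date) # prints "2011-02-05"
```

Note that send will also call private methods; public_send is the stricter variant if that matters to you.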

So, back to define_method:

      define_method "#{name}_string" do
        read_attribute(name) &&
          read_attribute(name).strftime("%e %b %Y").strip()
      end

So, each time round the loop, name will be one of the symbols passed into stringify_time. Let’s go with start_date to see what happens. define_method will define a new instance method on the Project class called start_date_string. This method checks whether the start_date attribute is non-nil and, if so, calls strftime on it and strips the surrounding whitespace (%e pads single-digit days with a space). The formatted date is the method’s return value; if the attribute is nil, the method returns nil.
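If you’d like to poke at this without a Rails app, here is a self-contained sketch. The module body is the simplified stringify_time from above; FakeProject and its hash-backed read_attribute are assumptions standing in for a real ActiveRecord model.

```ruby
require "date"

module StringifyTime
  def stringify_time(*names)
    names.each do |name|
      define_method("#{name}_string") do
        read_attribute(name) &&
          read_attribute(name).strftime("%e %b %Y").strip
      end
    end
  end
end

# Hypothetical stand-in for a model: attributes live in a plain hash.
class FakeProject
  extend StringifyTime
  stringify_time :start_date, :end_date

  def initialize(attributes)
    @attributes = attributes
  end

  # Imitates the Rails method of the same name, for this sketch only.
  def read_attribute(name)
    @attributes[name]
  end
end

project = FakeProject.new(start_date: Date.new(2011, 2, 5), end_date: nil)
puts project.start_date_string # prints "5 Feb 2011"
p project.end_date_string      # prints nil
```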

Wrap up

That’s more than enough for one day (I’ve been told off for writing kilo-word posts before). I’ll explain the workings of the rest of the plugin in a future post. If you want to learn more, I highly recommend David Flanagan’s The Ruby Programming Language.

Metaprogramming is writing code that gives you more code. It’s a bit like the Sorcerer’s Apprentice; in fact, I think that should be recommended watching for this subject.

Be the Sorcerer. Don’t be Mickey.

Categories: Programming