Ruby vs JavaScript: functions, Procs, blocks and lambdas

January 15th, 2011 103 comments

In my post Why JavaScript is AWESOME I compared Ruby to JavaScript like this:

A language like Ruby is a toolbox with some really neat little tools that do their job really nicely. JavaScript is a leather sheath with a really really sharp knife inside.

What I was partly getting at was how the two languages handle the passing around of code. Both have their own way of working with anonymous functions. They’ve taken very different approaches, so if you’re moving from one language to the other it can be confusing. I’d like to try to explain the differences. Let’s look at JavaScript first.

JavaScript functions

JavaScript has two ways of defining functions – function expressions (FE) and function declarations (FD). The distinction can be confusing because they look almost the same. The difference is that a function declaration always starts its statement with the word function, whereas a function expression appears wherever a value is expected, such as the right-hand side of an assignment.

//Function declaration:
function doSomething() {
    alert("Look ma, I did something!");
}

//Function expression:
var somethingElse = function () {
    alert("This is a different function");
};

The FE can optionally contain a name after the function keyword so it can call itself. I’m not going to go into depth on FD vs FE, but if you’re interested in learning more read Ben Alman’s piece on Immediately-Invoked Function Expressions, which has a good description and some useful links.

You can think of an FE as a function literal, just like {a: 'cat', b: 'dog'} is an object literal. So they don’t just have to be assigned to variables – they can be passed as function arguments, returned from functions, and stored in data structures (objects and arrays). In the end though, there’s only one function type – doSomething and somethingElse are the same type of object.
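
Here’s a minimal sketch of a function expression being used as a value in each of those positions. All the names in it (square, applyTwice, makeAdder, toolbox, factorial) are made up for illustration.

```javascript
// 1. Assigned to a variable
var square = function (n) {
    return n * n;
};

// 2. Passed as an argument to another function
function applyTwice(fn, value) {
    return fn(fn(value));
}

// 3. Returned from a function
function makeAdder(amount) {
    return function (n) {
        return n + amount;
    };
}

// 4. Stored in data structures (an object and an array)
var toolbox = {
    double: function (n) { return n * 2; },
    steps: [square, makeAdder(1)]
};

// A named function expression can call itself through its inner name
var factorial = function fact(n) {
    return n <= 1 ? 1 : n * fact(n - 1);
};

var sixteen = applyTwice(square, 2);              // square(square(2))
var ten = toolbox.steps[1](toolbox.steps[0](3));  // square(3), then add 1
```

Whichever way they were created, typeof reports 'function' for all of them, which is the “only one function type” point in practice.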

Ruby

Ruby is at the opposite end of the scale to JavaScript. Instead of having just the one function type, it has multiple types: blocks, Procs, and lambdas. (There are also methods and method objects but that’s a different story.)

Blocks

A block in Ruby is a chunk of code. It can take arguments via its funky pipe syntax, and its return value is whatever the last line evaluates to. Blocks aren’t first-class citizens in Ruby like functions are in JavaScript – they can only exist in one place: as the last argument to a method call. They’re very much a special syntactic construct baked right into the language. All methods can receive blocks whether they include a parameter for one or not – the block will be received anyway, and you can interact with it without it having a name. It can be called with the yield keyword, and you can check whether a block was supplied with block_given?. Robert Sosinski gives a good example of using blocks from the point of view of the caller and that of the receiver in his post Understanding Ruby Blocks, Procs and Lambdas:

class Array
  def iterate!
    self.each_with_index do |n, i|
      self[i] = yield(n)
    end
  end
end

array = [1, 2, 3, 4]

array.iterate! do |n|
  n ** 2
end

Procs

If you need a block of code to act as a first-class citizen, you need to turn it into a Proc. That can be achieved with Proc.new. This takes a block and returns a Proc. Pretty cool eh? Procs are useful where you’d want to use a block but you’re not passing it as the last argument to a method. Rails provides a good example:

class Order < ActiveRecord::Base
  before_save :normalize_card_number,
    :if => Proc.new { |order| order.paid_with_card? }
end

The :if => syntax shows that we’re passing a hash to before_save. We can’t put a block as a hash value, so we need to make a Proc instead. Other callbacks can just take a raw block:

class User < ActiveRecord::Base
  validates_presence_of :login, :email

  before_create do |user|
    user.name = user.login.capitalize if user.name.blank?
  end
end

For more discussion on Procs and blocks have a look at Eli Bendersky’s post: Understanding Ruby blocks, Procs and methods.

Lambdas

A lambda in Ruby is probably the closest thing to a function expression in JavaScript. A lambda is created similarly to Proc.new:

lambda { |x| x ** 3 }

Both lambda and Proc.new take a block and return an object. However, there are important differences in how they deal with their arguments and how they deal with return and break. To talk about that though I’ll need to go back to basics to discuss the reasons for using these self-contained chunks of code.

The reasons for using anonymous functions

There are a lot of good reasons to support anonymous functions in a programming language. In Ruby and JavaScript, the uses generally fall under two rough categories: iteration and deferred execution.

Iteration with anonymous functions

Iteration has a lot of uses – you can sort a collection, map one collection to another, reduce a collection down into a single value, etc. It basically comes down to going through a collection and doing something with each item, somehow. I’m going to show a very basic example of iteration – going through a list of numbers and printing each one.

In Ruby, there are two common ways to work through an array or other enumerable object, doing something to each item. The first is to use the for/in loop, which works like this:

arr = [1, 2, 3, 4]
for item in arr
  puts item
end

The second is to use the each iterator and a block:

arr = [1, 2, 3, 4]
arr.each do |item|
  puts item
end

Using iterators and blocks is much more common, so you’ll usually see it done like this.

JavaScript has a for/in loop as well, but it iterates over an object’s property names rather than an array’s values, so it doesn’t work very well on arrays. The standard way to iterate over an array is with the humble for loop:

var arr = [1, 2, 3, 4];
for (var i = 0, len = arr.length; i < len; i++) {
    console.log(arr[i]);
}
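
To see why for/in is awkward on arrays, here’s a small sketch comparing it with the plain for loop. The extra label property is an assumption added purely to show the problem.

```javascript
var arr = [10, 20, 30];
arr.label = 'not an element'; // arrays are objects, so extra properties are allowed

// for/in visits property names (as strings), including non-index ones
var keys = [];
for (var key in arr) {
    keys.push(key);
}

// The plain for loop only visits the indexed elements
var values = [];
for (var i = 0, len = arr.length; i < len; i++) {
    values.push(arr[i]);
}
```

After this runs, keys contains the strings '0', '1', '2' and 'label', while values contains only the three numbers.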

Many libraries offer support for something like Ruby’s each iterator – here is the jQuery version:

var arr = [1, 2, 3, 4];
$.each(arr, function(index, value) { 
    console.log(value); 
});

There’s an important difference here between the way Ruby and JavaScript handle the each iterator. Blocks in Ruby are specially designed for this kind of application. That means that they’ve been designed to work more like a looping language construct than an anonymous function. What does this mean practically? I’ll illustrate with an example:

arr = [1, 2, 3, 4]
for item in arr
  break if item > 2 #for non Rubyists, this is just like a compact if statement
  puts item
end

arr.each do |item|
  break if item > 2
  puts item
end

These two code snippets do the same thing – if item is greater than 2 iteration stops. On one level, this makes perfect sense – the each iterator takes a block of code, and if you want to stop iterating you break, as you would in any other language’s native foreach loop. On another level though, this doesn’t make any sense – break is used for loops, not anonymous functions. If you want to leave a function, you use return, not break.

Interestingly, if you use return inside a Ruby block, you don’t just return from the block, but the containing method. Here’s an example:

def find_first_positive_number(arr)
  arr.each do |x|
    if x > 0
      return x
    end
  end
end

numbers = [-4, -2, 3, 7]
first = find_first_positive_number(numbers)

When arr.each gets to 3, find_first_positive_number will return 3. This demonstrates that in blocks in Ruby, the break and return keywords act as if the block was just a looping construct.

jQuery.each, on the other hand, is working with normal functions (there’s nothing else to work with in JavaScript). If you return inside the function passed to each, you’ll just return from that function, and each will move onto the next value. It’s therefore the equivalent of continue in a loop. To break out of the each entirely, you must return false.

So, in Ruby, break, next (Ruby’s equivalent of continue) and return work in the same way whether you’re using the for/in looping construct or using an iterator with a block. With a jQuery iterator, return is the equivalent of continue, return false is the equivalent of break, and there’s no equivalent of return without putting extra logic outside the loop. It’s easy to see why Ruby blocks were made to work this way.
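
The jQuery behaviour can be sketched without jQuery itself. The each function below is a hypothetical stand-in that follows the same convention (the callback receives index and value, and returning false stops the iteration). It isn’t jQuery’s actual implementation.

```javascript
// Hypothetical jQuery-style each: stops when the callback returns false
function each(arr, callback) {
    for (var i = 0, len = arr.length; i < len; i++) {
        if (callback(i, arr[i]) === false) {
            break;
        }
    }
}

var printed = [];
each([1, 2, 3, 4], function (index, value) {
    if (value === 2) {
        return; // like continue: skip this item, carry on with the next
    }
    if (value > 3) {
        return false; // like break: stop iterating
    }
    printed.push(value);
});
// printed is [1, 3]

// "Returning" a result to the caller needs a variable outside the callback
function findFirstPositiveNumber(arr) {
    var found;
    each(arr, function (index, value) {
        if (value > 0) {
            found = value;
            return false; // stop looking
        }
    });
    return found;
}

var first = findFirstPositiveNumber([-4, -2, 3, 7]); // 3
```

The found variable is the “extra logic outside the loop”: the callback can’t return from findFirstPositiveNumber directly, so it stashes the value and stops the iteration.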

Deferred execution

This is where a function is passed to another function for later use (this is often called a callback). For example, in jQuery.getJSON:

$.getJSON('ajax/test.json', function(data) {
    console.log(data);
});

The anonymous function isn’t executed until the JSON data comes back from the server. Similarly, with Rails:

class User < ActiveRecord::Base
  validates_presence_of :login, :email

  before_create do |user|
    user.name = user.login.capitalize
  end
end

The block won’t be executed immediately – it’ll be saved away, ready to be executed as soon as a new User is created. In this case, break no longer makes sense – there’s no loop to break out of. return generally also doesn’t make sense in this case, as it may try to return to a function that has already returned. For more details on this see my question on Stack Overflow. This is where lambdas come to the rescue. A lambda in Ruby works much more like a Ruby method or a JavaScript function: return just leaves the lambda; it doesn’t try to exit the containing method.

So why use a block as a callback? Probably the best reason is that Ruby makes blocks so easy, so it’s simpler just to pass a block to the before_create method than to create a lambda and pass that (the before_create internals will be simpler as well).
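
For comparison, here’s roughly what the same deferred-execution pattern looks like in plain JavaScript. This is only a sketch: beforeCreate, createUser and the callback list are hypothetical names loosely modelled on the Rails example, not a real API.

```javascript
// Callbacks are stored now and run later, when a user is created
var beforeCreateCallbacks = [];

function beforeCreate(callback) {
    beforeCreateCallbacks.push(callback);
}

function createUser(attributes) {
    var user = { login: attributes.login, name: attributes.name };
    for (var i = 0; i < beforeCreateCallbacks.length; i++) {
        beforeCreateCallbacks[i](user);
    }
    return user;
}

beforeCreate(function (user) {
    if (user.name) {
        return; // leaves only this callback; createUser carries on
    }
    // Upper-case the first letter of the login and use it as the name
    user.name = user.login.charAt(0).toUpperCase() + user.login.slice(1);
});

var nick = createUser({ login: 'skilldrick' });
var named = createUser({ login: 'nick', name: 'Nick Morgan' });
```

Because the callback is an ordinary function, return inside it behaves like return inside a Ruby lambda: it ends the callback and nothing else.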

Pros and cons

As I said at the beginning, Ruby and JavaScript have taken two extremely different approaches to functions. There’s a whole different discussion about methods across the two languages as well (JavaScript uses the same function type for methods, whereas Ruby has yet another type) but I’m not going to go there. Ruby has something different for every situation, and each one is optimised for its primary use-case. This can lead to confusion but can also make code easier to read. On the other hand, in JavaScript, a function is a function is a function. Once you know how they work, it’s easy.

There’s no point arguing about which of these two approaches is better – there’s no answer to that. Some people will prefer one over the other, but I like them both. I love the way blocks in Ruby make certain kinds of functional-like programming just roll off the fingers, but I also love the way JavaScript doesn’t complicate things, and makes proper functional programming much more possible.

If you made it this far, good on ya. I didn’t mean to write such a mammoth post, but sometimes these things happen. Cheers!

2011

January 5th, 2011 37 comments

Ok, I’m going to write one of those self-indulgent “what am I going to do with the next year?” posts. If you don’t like the sound of that then I won’t be offended.

I’m working full-time for a publishing project management company based in Stroud, which is where I’ve been since I graduated in 2007. It was never my plan to stay there forever, but I didn’t know what I wanted to do instead. Then I discovered programming, and found that not only did I really enjoy it, but I was actually quite good at it. I agreed with my employer that I would work until the end of 2011, and from that point on I would become a professional software developer/web developer/programmer/whatever.

So, here I am at the beginning of 2011, and I’ve got exactly a year to establish myself as someone who knows what they’re doing and people will pay to code stuff. I don’t currently know whether I want to be freelance or employed – I’ll see how it goes. There are certain things I want to achieve this year, and I’m hoping putting them in writing will help focus my mind.

Finish SICP

I’ve been trying to read Structure and Interpretation of Computer Programs for over a year, but other things have always come up to get in the way. This year I’d really like to finish it. I don’t have a computer science background, and I don’t want to be at a disadvantage to those who do, so this is my first step towards that goal. It won’t make me a computer scientist, but it will help me to understand things at a deeper level.

Get a decent online portfolio

I’d like to create a solid portfolio to show to potential clients/employers. That includes websites/apps/games, screenshots and downloads of desktop applications, and a lot of decent (and forked) Github projects.

Keep blogging

I started this blog a year ago. My first post with any significant views was in August, and since then I’ve had a fairly steady trickle of visitors with the odd spike when Reddit likes my posts. I want to get into a regular posting schedule, so that’s another aim. I’d really like to get to the stage where I have some posts saved up ready for future posting, but we’ll see.

Get some more freelance work

I was really lucky to pick up a bit of freelance work through Twitter towards the end of 2010 which slots well into my free time, pays well, and involves working on what could turn out to be a fairly significant project (the NDA means I can’t go further :S). I’d like some more of that for a range of clients so I have some security for 2012. As my spare time is the only time I have for learning new things and doing freelance work, I need to find a sensible balance between the two.

Do something good in node.js

I’m very excited by node.js. I haven’t done anything with it yet, but I really enjoy working in JavaScript, and I’ve heard a lot of good things about node. I’d like to make some node apps – hopefully enough to get to a position where people will actually pay me to do it.

Learn a functional language

At the moment I’m thinking Haskell, but I’d like to give Erlang a go as well. I want to do both, and more, but I’m very aware that a year is actually quite a short time, and I need to focus. So I’ll pick one and learn it, and see where that takes me.

Stack Overflow

I think a good Stack Overflow reputation is a useful string to the bow. At one point I wanted to be on the front page of Stack Overflow users, but the amount of rep needed per day just to stand still at that level is ridiculous. I’m currently on the 16th page, and I think I’d like to get to somewhere in the top 10 pages. I think any higher than that just takes too much time to maintain (and I’ve got other things I need to spend my time on – see above).

Nick Morgan, not just Skilldrick

I’ve been known online as Skilldrick for over a decade. For a long time I didn’t want to make it easy for people to associate my online identity with my real-life identity – the web was a lot more anonymous back then. Now I’d like potential employers to be able to find me online by my name, not my handle. Unfortunately there isn’t any mention of me on Google for pages and pages. So I’d like to get my real name out there a bit more.


Clearing up the confusion around JavaScript references

December 27th, 2010 8 comments

One question that seems to come up everywhere in discussions of JavaScript is:

Is JavaScript pass-by-reference or pass-by-value?

This is often asked by people who don’t really understand what pass-by-reference or pass-by-value really mean. One of the stock answers is:

Objects are passed by reference; primitives are passed by value.

I really don’t think this is the best way to describe the situation, so I’m going to try a new way.

What is a reference? (with cats)

I’m going to explain references with cats. Don’t worry, it’ll make purrfect sense.

Let’s say you have two cats, Bloopy and Floopy. Bloopy is a boy cat and Floopy is a girl cat. One day you announce to the world:

My favourite cat is Bloopy.

Let’s say you live your life as if you were inside a JavaScript interpreter. So you declare a new variable to hold your cat:

var fav_cat = Bloopy;

Imagine that this concept of “favourite” is like an invisible wire that links you to Bloopy. As long as Bloopy is your favourite cat you’ll be linked.

You can use this link to send messages to Bloopy. Say you bought Bloopy a new collar for Christmas. If you wanted him to wear it, you could say:

fav_cat.wear_collar();

You’re sending a message down your invisible wire telling Bloopy to wear his new collar. Because you’re living inside a JavaScript interpreter that works out fine.

One day, Bloopy makes a mess on your carpet. This upsets you, to the point that you decide that Floopy is now your favourite cat:

fav_cat = Floopy;

You’ve now re-assigned your favouritism. In terms of the invisible wire, you’ve detached it from Bloopy and attached it to Floopy. You can now ask Floopy to try on the new slippers you got her for her birthday:

fav_cat.wear_slippers();

One day, Floopy does something unmentionable. You now have no favourite cat:

fav_cat = null;

Mutability

So, what does that story tell us? Well, fav_cat is a JavaScript variable. At the beginning, it holds a reference to Bloopy. Later on, it holds a reference to Floopy. Don’t worry, nothing happened to Bloopy, but there wasn’t a reference to him stored in fav_cat any more. At two points in the story, we called methods on fav_cat. These methods mutated the underlying cat, in the first case to make it wear a collar, and in the second to make it wear slippers.

So, there are two completely different things happening here: mutation and assignment. Mutation modifies the underlying object without affecting the link between the variable and the object, whereas assignment changes the link to a different object, but doesn’t modify any objects.
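
The whole cat story can be written as a short runnable sketch. The wearing arrays are an assumption standing in for the wear_collar and wear_slippers methods, so that the effect of each step can be inspected.

```javascript
var bloopy = { name: 'Bloopy', wearing: [] };
var floopy = { name: 'Floopy', wearing: [] };

var fav_cat = bloopy;             // the wire is attached to Bloopy
fav_cat.wearing.push('collar');   // mutation: Bloopy himself changes

fav_cat = floopy;                 // assignment: the wire moves, no cat changes
fav_cat.wearing.push('slippers'); // mutation: this time it's Floopy who changes

fav_cat = null;                   // the wire is detached; both cats still exist
```

At the end, bloopy is still wearing his collar and floopy her slippers, even though fav_cat no longer refers to either of them.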

Immutability

Let’s continue the story. One day, you look up at the blue sky, and it inspires you to declare to the world that your favourite colour is blue:

var fav_col = "blue";

The next day you look at the grass, and declare that, in fact, this is the most beautiful colour:

fav_col = "green";

You can still think of this as there being an invisible wire between you and your favourite colour, but as colours (and JavaScript strings) are immutable, you can’t modify them using the wire – all you can do is re-assign your favourite colour to a new string.
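
A quick sketch of what that immutability looks like in practice: string methods hand back new strings rather than changing the one you called them on. The variable names shouted and before are just for illustration.

```javascript
var fav_col = 'blue';

var shouted = fav_col.toUpperCase(); // returns a new string, 'BLUE'
var before = fav_col;                // fav_col itself is still 'blue'

// The only way to change what fav_col holds is to re-assign it
fav_col = fav_col.replace('blue', 'green');
```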

So what does pass-by-reference mean?

In JavaScript, you never hold an object in a variable, you hold a reference to that object. It’s that invisible wire. You can send messages along the wire and you can tell the wire to attach to a different object, but that’s all. Let’s now look at passing a variable to a function:

function mutate(obj) {
    obj.name = 'Mutated';
}

var my_cat = { name: 'Floopy' };
mutate(my_cat);
alert(my_cat.name);

So, what’s happening here? When my_cat is passed into mutate, the parameter obj is given a reference to the same object that my_cat contains a reference to. So now my_cat and obj both have an invisible wire that links them to the object named ‘Floopy’. Inside the function, the name is changed to ‘Mutated’. The object that my_cat contained a reference to has been mutated by the function, so the last line will alert ‘Mutated’.

So, if that’s what happens with mutation, what happens with assignment? Here’s an example:

function reassign(obj) {
    obj = {};
}

var my_cat = { name: 'Floopy' };
reassign(my_cat);
alert(my_cat.name);

Ok, what’s going on here? When my_cat is passed into reassign, both obj and my_cat hold an invisible wire connecting them to the object named ‘Floopy’. Inside reassign, obj is reassigned to an empty object. So reassign has severed the invisible wire connecting obj to Floopy, and reattached the wire to a new, empty object. reassign doesn’t have any control over the link between my_cat and its object, so my_cat.name remains ‘Floopy’.

But what about pass-by-value?

Here’s what I say: forget what anyone ever told you about pass-by-value.

Another example:

function changeColour(col) {
    col = 'green';
}

var fav_col = 'blue';
changeColour(fav_col);
alert(fav_col);

I don’t think you’re going to be surprised that fav_col remains ‘blue’. That’s because changeColour reassigned – it didn’t mutate. And why is this called pass-by-value when the object example was called pass-by-reference? Because strings are immutable. You can’t mutate a string inside a function so there’s no way to change fav_col from inside changeColour.

And finally, the conclusion (with no cats)

The point is, it doesn’t matter whether the JavaScript interpreter holds a reference to the string ‘blue’, and passes a reference into changeColour or whether it passes the actual string into the function – either way the string can’t be mutated. The only way to change fav_col is to reassign fav_col.

There’s no point making a distinction between objects and primitive types when you’re passing into a function. The important distinction to make is between mutable and immutable types. A function can mutate a mutable object passed in (making changes to an object the caller passed in) but it can’t mutate an immutable object. A function can reassign a mutable or an immutable object, but the caller won’t see these changes, because all reassignment does is move the invisible wire attached to the parameter.

Further reading

Ecma-262-5 in detail: Name binding Dmitry Soshnikov talks about the same issues here but in a lot more detail. He uses the more strictly correct term “rebinding” where I have said reassignment, as the variable name is “bound” to an object. Definitely worth reading if you want a more in-depth discussion.

ECMA-262-3 in detail: Evaluation strategy A discussion of all the different types of evaluation strategy, including call by value, call by reference, and call by sharing. JavaScript is noted to be call by sharing, or “call by value where value is the reference copy”.

Call by sharing on Wikipedia Wikipedia notes that the Python community is the only one to have widespread use of this term, even though the same semantics are shared by Java, Ruby, Scheme, JavaScript etc.


JS1k: Making a very small game in JavaScript part 2 – optimisations

December 2nd, 2010 4 comments

This is part 2 of a series. See part 1 here.

Three types of optimisation

There are three types of optimisation when you’re trying to reduce the size of the code. The first two are “defactoring” techniques: they change the code but not the functionality. The third changes code and functionality.

1. Optimising for the compiler

One of the major optimisations the Closure compiler carries out is the renaming of identifiers. Closure will, wherever possible, rename your variables and functions with one letter names. This means you can keep using the nice long descriptive names and know that in the final file they won’t take up any more space than if you called everything x and y. So, how can you help with this?

One tiny optimisation I discovered was that I was using the literal 50 in a number of places. I defined a new variable fifty in the global scope and assigned it the value 50. Replacing all appearances of 50 with fifty increased the size of my un-minified code, but when minified, fifty became a single letter identifier and saved me a byte each time. Of course, there’s some extra code overhead to allow this, so there have to be enough occurrences for it to be worth it. Even a single byte’s worth it though.
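
The break-even point can be estimated with some rough byte accounting. This sketch assumes the minifier renames the alias to a single character and that the declaration is tacked onto an existing var statement as something like ",a=50". Both are assumptions for illustration, and netBytesSaved is a made-up helper.

```javascript
// Estimate the bytes saved by aliasing a numeric literal to a one-letter name
function netBytesSaved(literal, uses) {
    var declarationCost = ',a='.length + literal.length; // e.g. ",a=50" costs 5 bytes
    var savedPerUse = literal.length - 1;                // "50" becomes "a"
    return uses * savedPerUse - declarationCost;
}

var threeUses = netBytesSaved('50', 3); // -2: not worth it yet
var fiveUses = netBytesSaved('50', 5);  //  0: break-even
var eightUses = netBytesSaved('50', 8); //  3: a small win
```

Under those assumptions a two-digit literal needs more than five occurrences before the alias pays for itself. Longer literals pay off sooner.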

I discovered I was using a.lineTo(x,y); a lot, where a is the canvas context. Closure can’t rename a.lineTo, so I needed to give it something it could rename:

function lineTo(x,y) {
  a.lineTo(x,y);
}

Again, there’s some overhead involved in setting this up, but with enough use of a.lineTo it’s well worth it.

I had to work round a bug with the compiler here. When it gets to the definition of lineTo it makes the assumption that inlining it will save space. So every occurrence of lineTo(x,y) in the code is replaced by a.lineTo(x,y), and it removes the lineTo definition. To stop it doing this, I had to add the following line:

window['lineTo'] = lineTo;

This line forces the compiler to keep the lineTo definition, and when compiled produces window.x=x;. This is good and bad news – lineTo is now kept (in minified form) but there’s a useless line added. The way I got round that was to add a post-compilation step to my Makefile:

    sed -i 's/window\.[a-zA-Z]*=.;//g' kave-min.js

(Yes, I learned sed for this. That’s dedication.)
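
For anyone who’d rather not learn sed, the same substitution can be sketched in JavaScript with the same pattern. The compiled string here is an invented stand-in for the minified output, not the real file.

```javascript
// Made-up example of minified output containing the unwanted export line
var compiled = 'function x(b,c){a.lineTo(b,c)}window.x=x;x(1,2);';

// Same pattern as the sed command: "window.", some letters, "=", one character, ";"
var cleaned = compiled.replace(/window\.[a-zA-Z]*=.;/g, '');
// cleaned is 'function x(b,c){a.lineTo(b,c)}x(1,2);'
```

Note that the pattern only matches when the right-hand side is a single character, which relies on the minifier having shortened the name.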

2. Optimising the code

Some code optimisations just make the code size smaller, full stop. One example is replacing true and false. In a lot of cases, 1 and 0 will do just fine, at a quarter of the size.
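
Here’s a small sketch of the 1 and 0 trick. The flag names and the status function are hypothetical. The point is only that conditionals treat 1 and 0 just like true and false.

```javascript
var gameOver = 0; // instead of false
var snowing = 1;  // instead of true

function status() {
    if (gameOver) {
        return 'over';
    }
    return snowing ? 'snowing' : 'clear';
}

// Where a genuine boolean is required, !0 and !1 are still shorter than the keywords
var realTrue = !0;  // true
var realFalse = !1; // false
```

This only works where the value is tested for truthiness. A strict comparison such as flag === true will not match 1.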

Code organisation matters too. You may be used to writing nice modular code with no global variables and loose coupling between each module. All those good intentions need to be completely suppressed. Make everything a global unless it absolutely has to be local. Everything should know about everything else. Couple wherever possible.

An example of coupling comes in the rendering stage of the game loop. The context fillStyle is set to white. After this, the snowball is rendered, followed by the snow. Then the fillStyle is set to blue, and the icicles and walls are rendered. These rendering functions are tightly coupled by their order of execution. Trying to render the icicles before the snow would make the icicles white, unless we changed the fillStyle to blue then back to white for the snow.

3. Changing the behaviour

I said that the first two techniques “defactored” the code, leaving the behaviour unchanged. Sometimes the current behaviour is essentially complex, and needs to be simplified to reduce its code size. The simplest solution is to just leave out functionality, but sometimes that’s a compromise too far. Following are some less drastic options.

Here’s a really simple example. I picked a nice blue for the icicles with the following line:

a.fillStyle = "rgb(190,230,255)";

Unfortunately, the syntax for writing arbitrary rgb colours is relatively verbose. Much shorter are the colour names, like “yellow” or “blue”. “blue” was the wrong colour, but using Doug Crockford’s nice CSS colour chart I was able to find “lightblue”, which was pretty close to the original colour, but saved me 7 valuable bytes over “rgb(190,230,255)”.

The snow falls diagonally, but with a little bit of added randomness. I wanted it to follow a sinusoidal curve as it fell, but the additional code to enable this was too long, so I cut it. The snow’s not as natural as I’d like, but I had to compromise.

The collision detection algorithm’s pretty simple. I used the context.isPointInPath(x,y) method, passing the front middle point of the snowball. I could then call the detectCollision() function every time I drew a new path that I wanted to check (i.e. the walls and the icicles). It would be nice to check the top, front and bottom points of the snowball, but that would have been too much code.
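
The shape of that check is roughly as follows. This is a sketch rather than the code from the submission: the snowball object, its fields and the collided flag are hypothetical, and ctx stands for the canvas 2D context. It also assumes the snowball travels to the right, so the “front” is x plus the radius.

```javascript
var collided = 0;

// Call this straight after drawing each path that should be checked
function detectCollision(ctx, snowball) {
    // Test the front middle point of the snowball against the current path
    if (ctx.isPointInPath(snowball.x + snowball.radius, snowball.y)) {
        collided = 1;
    }
}
```

Checking the top and bottom points as well would mean two more isPointInPath calls per path, which is where the extra bytes would have gone.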

Don’t try this at home

Coding is usually about weighing things up: readability/maintainability, execution speed, memory use, code size, etc. When you’re doing a JS1k submission, the only one of these that matters is code size (well, execution speed can matter as well, but only as a secondary concern). The important thing to remember though, is that this kind of optimisation is wrong, just plain wrong. It’s almost never a good idea to sacrifice maintainability for code size. If it were, we’d all be writing Perl. I’ll leave you with that thought.


JS1k: Making a very small game in JavaScript part 1 – tools

December 2nd, 2010 30 comments

Well, that was fun. I’ve just submitted my JS1k Xmas edition demo. I only heard about the first JS1k late on, and I wasn’t up to coding anything decent at that stage anyway. So when I heard about this new one, I jumped on the task, working all weekend on it (except when I had to help move a piano). By the end of Sunday it was done, and I was mostly happy with it.

Before I started, I wanted to see if there were any good resources out there for this kind of thing, so I asked on Stack Overflow. I got a few good tips there, but the most useful one was probably:

You need to have the extreme small file size in mind when you write the code. So in part, you need to learn all the tricks yourself.

Show me the code!

Here’s the minified source of my submission. Trying to read minified JavaScript is a bit like reading assembly language when you’re used to C, or bytecode when you’re used to Java, so don’t try to understand it too much. Have a look at the un-minified code to see what it originally looked like.

It’s all up on github as well if you want to see the commit history.

Tools of the trade

I realised early on that I needed to have a quick way of minifying my code and checking its size. I started off playing with the Google Closure Compiler web interface but it was too slow to continually copy-paste my code in there. I downloaded it as a CLI app. Next off, I wrote this Makefile:

default:
    java -jar compiler.jar --compilation_level ADVANCED_OPTIMIZATIONS --js kave.js --js_output_file kave-min.js
    stat -c%s kave-min.js

The first line uses Closure to compile with advanced optimisations, and the second outputs the size of the minified file. With this, all I had to do was type make to see what the minified filesize would be. Being able to do this after every tiny optimisation was extremely useful, as I could quickly experiment with new techniques and see if they were worthwhile.

Learning the tools

It’s important to understand what the compiler is actually doing, and what it’s able to do. Reading through the Advanced Topics in the Closure docs was essential. Closure will do some pretty advanced optimisations, but it needs help. For example, it doesn’t touch strings (thankfully). That means if you refer to a method in one place as obj.meth and in another as obj['meth'], it can’t rename obj.meth.
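
To make the obj.meth problem concrete, here’s a sketch in plain JavaScript. The renamed object is hand-written to illustrate what would go wrong if only the dot form were shortened. It is not real compiler output.

```javascript
var obj = {
    meth: function () { return 'called'; }
};

var viaDot = obj.meth();       // a property name the compiler can see
var viaString = obj['meth'](); // a string, which the compiler leaves alone

// Hand-written illustration: the property has been shortened to "a",
// but the string 'meth' elsewhere in the code has not
var renamed = {
    a: function () { return 'called'; }
};

var stillThere = typeof renamed.a;   // 'function'
var broken = typeof renamed['meth']; // 'undefined'
```

Sticking to the dot form everywhere gives the compiler the freedom to shorten the name.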

Another big no-no is with, which is good, because it just confuses things anyway. Using with means Closure can’t distinguish between properties and local variables, so there are a lot of name shortenings it just can’t do.


In Part 2, I’ll look at some of the specific optimisations that I used.


A brief introduction to closures

November 22nd, 2010 17 comments

In this post, I’m going to attempt to explain what closures are and how to use them. Many modern (and some not-so-modern) programming languages contain support for closures, but for the purposes of this article I’m going to be using JavaScript. I’ve chosen JavaScript for a few reasons:

  • Ubiquity: If you have a web browser then you have a JavaScript interpreter
  • Simplicity: JavaScript is conceptually a fairly simple language (especially if you limit yourself to its Good Parts), compared to other dynamic scripting languages such as Python and Ruby
  • Familiarity: If you’ve used any of the C family of languages (e.g. C++, Java or C#) then JavaScript will look fairly familiar.

There may be some differences between languages with the mechanics, but deep down, closures are the same across all languages which allow them, so if you can understand the concept in JavaScript, you’ll understand it in any capable language.

Functional programming

Before I can get onto closures, I need to give a very brief introduction to functional programming.

Functional programming is all about functions (I doubt that comes as much of a surprise). In a language that supports functional programming, functions are generally first-class objects. That means they can be assigned to variables, stored in data structures, passed into functions and returned from functions.

In JavaScript, it’s important to distinguish between referencing a function and calling a function. To reference a function, just use the function name. To call a function, append parentheses (with optional arguments). Here’s an example:

function f() {
    alert('f called!');
}

f(); // calls f
var x = f; // f is a reference to the function, and now x is too
x(); // calls f (or x - they're the same thing)

It’s also possible to create anonymous functions in JavaScript (aka lambdas). If you want to create an anonymous function, leave out the name:

// Wrapped in parentheses so it parses as an expression when it stands alone
(function () {
    alert('I have no name');
});

Another name for an anonymous function is a function literal. If you imagine it like a string literal you’ll see it’s an expression just like any other literal, and so it can be assigned to a variable:

var str = "string"; //string literal
var arr = [1, 2, 3]; //array literal
var obj = { 'JS': 1,
            'is': 2,
            'awesome': 3 }; //object literal
var f = function () {
    alert("I've been assigned to f");
}; //function literal

Note that there is very little difference between assigning an anonymous function to a variable called f and defining a function called f – the end result of both is a function that can be referred to by the name f, and can be called with f().

Another capability of functional languages is the ability to nest functions. It’s perfectly valid in JavaScript to define one function within another. Here’s another example:

function outer() {
    function inner() {
        alert('In ur function');
    }
    inner(); // we can call inner here as it is defined in outer's scope
}
inner(); // ReferenceError - inner is not defined in this scope

It’s this ability, coupled with the first-class nature of functions, that enables closures.

A real-life closure

A closure is a function with access to variables in its containing scope (the function “closes over” the variables). The thing that can be tricky to wrap your head round is that the inner function still has access to the outer function’s variables after the outer function has returned. Here’s an example:

function outer() {
    var counter = 0;
    function inner() {
        alert(counter);
        counter++;
    }
    return inner;
}

var x = outer(); // As we're calling outer here, x is a reference to inner 
x(); // alerts 0
x(); // alerts 1
x(); // alerts 2

In this code, outer is called once, and returns inner. x is a reference to inner. Because inner is a closure, it has access to outer’s local variable, counter. Even though outer has returned, inner still has access to outer’s variables. Be careful though – if outer were called again, we’d get a new version of inner. To continue the previous example:

var y = outer(); // Call outer again
y(); // alerts 0 - this is a different closure to the previous one.
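The independence of the two closures can be checked directly. This is only a sketch: makeCounter is a hypothetical variant of outer that returns the count instead of alerting it, so it can run outside a browser.

```javascript
// makeCounter is a hypothetical variant of outer: it returns the
// current count rather than alerting it.
function makeCounter() {
    var counter = 0;
    return function () {
        var current = counter;
        counter++;
        return current;
    };
}

var first = makeCounter();
var second = makeCounter();

console.log(first());  // 0
console.log(first());  // 1
console.log(second()); // 0 - second has its own counter
console.log(first());  // 2 - first is unaffected by second
```

Each call to makeCounter creates a fresh counter variable, and the function it returns closes over that variable alone.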

The closure has access to any arguments in the containing scope as well:

function outer(x) {
    function inner() {
        alert(x);
    }
    return inner;
}

var func = outer(5);
func(); //alerts 5 - inner has access to the argument
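Closing over an argument is a handy way to build families of related functions. The sketch below uses a hypothetical makeAdder; each returned function remembers the amount it was created with.

```javascript
// makeAdder is a hypothetical example: the returned function
// closes over the argument `amount`.
function makeAdder(amount) {
    return function (n) {
        return n + amount;
    };
}

var addFive = makeAdder(5);
var addTen = makeAdder(10);

console.log(addFive(1)); // 6
console.log(addTen(1));  // 11
```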

A more advanced example

You should now know enough about the basics of closures to understand a more complex example. Here’s a way to give a JavaScript object private variables:

function CatMaker(name) {
    var age = 10;

    //construct an object on the fly with three methods.
    //All methods have access to age, but age cannot be
    //directly accessed outside of this function.
    return { 
        "sayHello": function () { //first method
            alert("Miaow");
        },
        "getAge": function (inCatYears) { //second method
            if (inCatYears) {
                return age * 7;
            }
            else {
                return age;
            }
        },
        "makeOlder": function () { //third method
            age++;
        }
    };
}

var mycat = CatMaker('Snuffles');
mycat.getAge(true); //returns 70
mycat.makeOlder();
mycat.getAge(true); //returns 77

The ONLY way to make changes to the private variable age is through the method makeOlder. All the methods share the same age variable, because they were all made in the same call of CatMaker. If we called CatMaker again to produce a new cat, it would have its own age variable.
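Both claims can be tried out with a small sketch. makeCat is a trimmed-down, hypothetical version of CatMaker (the getName method is added purely for illustration), and console.log replaces alert so it runs anywhere.

```javascript
// makeCat is a trimmed-down, hypothetical version of CatMaker.
function makeCat(name) {
    var age = 10; // private: only reachable through the methods below
    return {
        getName: function () { return name; },
        getAge: function () { return age; },
        makeOlder: function () { age++; }
    };
}

var snuffles = makeCat('Snuffles');
var tiddles = makeCat('Tiddles');

snuffles.makeOlder();

console.log(snuffles.getAge()); // 11
console.log(tiddles.getAge());  // 10 - a separate age variable
console.log(snuffles.age);      // undefined - age is not a property

// Assigning to snuffles.age only creates a new, unrelated property
snuffles.age = 2;
console.log(snuffles.getAge()); // 11 - the private variable is untouched
```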

The infamous lambda-in-a-loop problem

Consider the following example:

function attachListeners() {
    for (var i = 0; i < 10; i++) {
        $('#id-' + i).click(function () {
            alert("I am element number " + i);
        });
    }
}

It’s using jQuery. To understand this example you’ll need to know a tiny amount of how jQuery works. jQuery creates a global function called jQuery, aliased to $. The jQuery function takes (among other things) a CSS selector. It returns a jQuery object representing the HTML elements matching that selector. A click handler can be added by calling .click() on the returned object. .click() takes a function to be executed when the matched elements are clicked on. $('#myid').click(function () { alert('hello'); }); will show an alert when the HTML element with an id of ‘myid’ is clicked on.

The example selects 10 elements on the page, with the ids id-0, id-1 and so on up to id-9. Each time round the loop, a new click handler is created. When you click on these elements, an alert box comes up. Some would think that each click handler has its own version of i, but you know better. All the click handlers share the same i. By the time any handler runs, the loop has finished, so this value is one past the end of the loop, i.e. 10. So each element proudly announces that it is element number 10, which is clearly incorrect.

The problem here is that although a new closure is being produced each time round the loop, each closure shares the same environment, so i in each closure is the same i as in all the others.

Take a look at a working demonstration of the above code using jsFiddle. Have a play with it so you can get a feel for what’s going on.
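If you’d rather see the problem without jQuery or a browser, here is a minimal sketch. It assumes a hypothetical makeHandlers function that collects the closures in an array instead of attaching them to elements, and it uses three iterations rather than ten.

```javascript
// makeHandlers is a hypothetical stand-in for attachListeners: it
// stores the closures in an array rather than attaching them to
// page elements.
function makeHandlers() {
    var handlers = [];
    for (var i = 0; i < 3; i++) {
        handlers.push(function () {
            return "I am element number " + i;
        });
    }
    return handlers;
}

var handlers = makeHandlers();

// Every closure sees the same i, which is 3 once the loop has finished
console.log(handlers[0]()); // I am element number 3
console.log(handlers[2]()); // I am element number 3
```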

The classic way around this is to use another function:

function addOneListener(i) { //Each time i is bound to a different value
    $('#id-' + i).click(function () {
        alert("I am element number " + i);
    });
}

function addEventListeners() {
    for (var i = 0; i < 10; i++) {
        addOneListener(i); 
    }
}

Here’s the above as a fiddle. emehrkay in the comments has an alternative implementation of the above using raw DOM methods – it’s often useful to see different ways of achieving the same thing.

The key to understanding the new example is that every time addOneListener is called, a new closure is produced, and each of these closures has a different i. When you start using closures in JavaScript, this will bite you eventually, so beware. It’s such a common issue that bobince of HTML-parsing regex fame put it at number three in his list of common questions on Stack Overflow:

At number three it’s a new entry for “Why is my (Python, JavaScript, …) function getting the same value of the variable every time around the loop?”, by Clint Forloop and the Closures
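The extra-function fix can also be tried without jQuery. In this sketch, makeHandler and makeFixedHandlers are hypothetical names; makeHandler plays the role of addOneListener, but returns the closure rather than attaching it to an element.

```javascript
// makeHandler plays the role of addOneListener: each call gets
// its own i, so each returned closure remembers a different value.
function makeHandler(i) {
    return function () {
        return "I am element number " + i;
    };
}

function makeFixedHandlers() {
    var handlers = [];
    for (var i = 0; i < 3; i++) {
        handlers.push(makeHandler(i));
    }
    return handlers;
}

var fixed = makeFixedHandlers();

console.log(fixed[0]()); // I am element number 0
console.log(fixed[2]()); // I am element number 2
```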

Vim exercises for the beginner

November 8th, 2010 1 comment

Vim comes with a great little thing called vimtutor. This walks you through the basic Vim commands, so you’re able to get up and running. Each lesson has a small exercise to try out the command or commands taught.

Going through these lessons, I wished that there were more exercises for each lesson, so the key strokes would sink in better. So, without further ado …

Introducing Vim exercises!

I’ve only done a few lessons so far, but eventually I’d like to do exercises for all the lessons in vimtutor. They’re designed for the complete newbie to Vim, but would hopefully be useful to anyone learning Vim.

The exercises are available on github, but if you’re not using git (and if not, why not?!) you can download the file directly.

Here’s a sneak-peek:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Lesson 1.1: Moving the cursor
Instructions: Follow the line around the screen using h, j, k and l.
|             Hint: if you accidentally start typing text, press Esc          
|       .-------.   to leave insert mode and u to undo any changes.           
|       |       |                                                             
\_______/       |                                                             
                |                                                             
                |                                                             
                \____.                                                        
                     |    .-----------------.                                 
                     |    |                 |                                 
           .---------+----+----.            |                                 
           \_________/    |    |            |                                 
                          |    |            |                                 
                          |    |            |                                 
                          |    \________.   \________ Well done! Now scroll   
                          |             |             down to lesson 1.3      
                          |             |             with j.                 
                          |             |                                     
                          \_____________/                                     



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Lesson 1.3: Text editing - deletion
Instructions: Correct the lines of text below by deleting the
unnecessary characters with  x . Hint: use 0 to return to the left.

Faar ouxt inn trhe unchttarteed bakckwataers oof the unfashionable end of the 
Weeestern sSpiral arrm ohf thze Gallaxy lies ae sm*all unregarded yellow syun.

Iti ijs aaa trufth universalllly acknowledgegeged, that a sxingle manr iin 
possessibon of aw gopod fort^une muost bee in wantxxx of a) wife.&&

Int waes lourvve at first sighght.. The feeirst time Yossarian sa9w th$e 
chaplain heee fell maaadly in lov*ve with himmmm. 

Aaas Gregor Samsa awokke onne moorning frrom uneeasy ddreams hhe founnd 
himmself trransformed inn hiis bbed intoo a giiigantic insecct.

It was  a bright cold day in  April, and the clocks  were striking   thirteen.

Twwo householllds, bothe alikee inf dignityyyy,
IIIn fair% Verrrrona, whereq wiie lay# ou987r sc"ene,
Frrom ++ancient grudddge brrrreak tto #new mutiny,,,
Where7 ciiviil blo*od mak^es civil h=ands uncclean.
Categories: Linux, Programming

Why classes are confusing in Ruby

November 8th, 2010 1 comment

I wrote before about how Object and Class are confusing in Ruby. I think I know what’s so confusing about it now.

In most other OO languages there are two concepts, objects and classes. Objects are instances of classes, and classes can be subclasses of other classes. Basically, each object or class has what could be described as a pointer. An object has a pointer to its class, and a class has a pointer to its superclass. In Java, where every class inherits directly or indirectly from Object, there is a clear line from each individual object to the Object class, via the object’s class and its class’s superclasses. Let’s take Bob, an employee:

      class          superclass      superclass
bob    ->    Employee    ->    Person    ->    Object

There’s a clear line from bob to Object, via bob’s class, and his class’s superclasses.

In Ruby, there’s still this line, but because classes are objects as well, there’s an extra line. There isn’t one pointer – there are two. Each class has a class and it has a superclass. That’s how you get to the point where Class is_a Object, and Object is an instance_of Class. Here’s some beautiful ASCII art to illustrate:

  Class -------> Module -------> Object
    ^
    |            Class --------> Module -------> Object
    |              ^
    |              |
  Class -------> Module -------> Object
    ^
    |            Class --------> Module -------> Object
    |              ^
    |              |
Employee ------> Person -------> Object
    ^
    |              
   bob

Every vertical pointer describes an instance_of relationship. Every horizontal pointer describes an is_a relationship. I haven’t drawn all the instance_of relationships of the classes, because if I did I’d be drawing an infinite tree of Class, Module and Object nodes, and I don’t have time tonight to do that.

That’s what makes Object and Class so confusing in Ruby. In most other OO languages that use Object as the overall base class, there is a straight line between an object and Object (with multiple inheritance you can get diamonds, but it all gets back to the one Object eventually). With Ruby, if you follow all the relationships you end up with an infinite tree.

Of course, this is only because I’m insisting on following all the instance_of relationships. In most circumstances you can just forget that classes are objects, and only follow the is_a relationships. The only instance_of relationships that usually matter are between non-class objects and their classes. But if you follow the rabbit hole, you’ll find just how deep it goes …


I’ve now very much clarified my thinking on Ruby classes. See: Understanding the Ruby object model.

Categories: Programming

Something confusing about Ruby: Object and Class

November 8th, 2010 1 comment

In Ruby, everything is an object, every object has a class, and all classes inherit from Object. Three very simple statements.

Here’s where it gets complicated:

Classes are objects

If a class is an object, then it has to have a class. That class is Class. Here’s an example:

class Example; end  #an empty class
Example.class  # => Class

So, if Class is a class, what is its class?

Class.class  #=> Class

Ok, fine. What is Object’s class?

Object.class  #=> Class

No surprises there.

So far we’ve seen that Example, Class and Object are all instances of the Class class. That’s one part of OO – objects are instances of classes. In Ruby, where everything is an object, then even classes are instances of classes. And the class of a class is Class.

The other part of OO is inheritance. Classes inherit from other classes. I’ve already stated that everything inherits from Object (either directly or indirectly). Let’s test that:

Example.superclass  # => Object
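Object.superclass   # => nil in Ruby 1.8 (it's not turtles all the way down); BasicObject from 1.9 on
Class.superclass    # => Module
Module.superclass   # => Object

So, every class inherits from Object except for Object itself, which in Ruby 1.8 doesn’t inherit from anything (from 1.9 onwards it inherits from BasicObject, which sits at the very top).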

Where does this lead then? Well, have a gander:

Object.instance_of? Class  # => true
Class.is_a? Object  # => true

Object is an “instance” of Class, but Class “is” an Object.

Conclusion

Ruby really does take this “everything is an object” thing seriously. Wow. And my brain hurts.

Categories: Programming

Ruby post-Python: second impressions (or: how I learned to stop worrying and love the implicit)

October 17th, 2010 16 comments

So, it’s three months since I wrote Ruby post-Python: first impressions. I’ve got a few small things to say, and a few bigger things. I’ll start with the small things:

  • I was worried before about the change from elif to elsif, but it turns out this hasn’t been a problem. Having a decent case statement means I haven’t had to write a single elsif since I’ve been using Ruby.

  • The case statement is AWESOME. Being able to match against ranges, regexes etc. is just pure genius.

  • I talked about parsed and unparsed strings before – at that point I didn’t know about string interpolation. More awesomeness.

  • I’m still on the fence with 0 not evaluating to false, but I can see more and more why you’d want it. For example, instead of saying if myArray.length you’d use if myArray.empty? which is obviously more readable.

  • Question marks in method names are brilliant. I first came across the idea in Scheme, and I love it.

Implicitness (implicity?)

So, here’s my main second impression. It’s all to do with the implicitness of the two languages.

In Python, there’s a preference for explicitness. Indeed, line two of The Zen of Python reads:

Explicit is better than implicit.

Ruby is very different in this respect. Ruby’s syntax is full of implicitness. One good example is self in methods. Every method in Python needs self as its first argument (I know it’s only called self by convention). And when you call a method inside an object, you call it on self. In Ruby, the self is implicit, both as an argument and as the receiver of the method call.

I’m not going to get into an argument here about which is better. My current thinking is that requiring self as an argument to every method is completely redundant, but I quite like the clarity of using self as the receiver of all internal method calls. It still makes me nervous not prepending self. to method calls in Ruby – they look like global function calls to me. I’m getting used to it though.

Another example is method invocations. In Python, you need the parentheses. In Ruby you don’t. This gives an advantage to both languages. On the Python side, it means that myObj.meth returns the method, whereas myObj.meth() returns whatever meth() returns. That gives handy functional expressivity. I use this all the time in JavaScript, where it’s essential to know the difference between func and func().
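In JavaScript terms, the difference looks like this. It’s only a sketch, with greet and callTwice as made-up names for the illustration.

```javascript
// greet and callTwice are hypothetical names for this illustration.
function greet() {
    return "hello";
}

// Takes a function and calls it itself, twice
function callTwice(fn) {
    return [fn(), fn()];
}

var reference = greet; // no parentheses: the function itself
var result = greet();  // parentheses: whatever greet returns

console.log(typeof reference); // function
console.log(typeof result);    // string

// Pass the function itself, not the result of calling it
console.log(callTwice(greet).join(", ")); // hello, hello
```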

On the Ruby side, the ability to call methods without parentheses has some really nice consequences. Coupled with method_missing, it gives the ability to write really nice internal DSLs. And because direct access to object attributes is impossible, you’re able to write code that looks like direct attribute access with all the benefits of accessor methods. That is a huge win for me. Nobody likes having millions of getters and setters, but if the alternative is having to rewrite all your client code when you realise you need to do more than simple assignment they’re a necessary evil. Thanks Ruby for helping me avoid that!

Something else that’s nice with Ruby’s implicitness is the fact that arrays and hashes often don’t need brackets and braces, for example when a hash is the last argument to a method call. I mean, what else could :a => 5 be other than a hash? The syntax is unambiguous, so there’s no need to insist.

Special cases

Here’s another line from The Zen of Python:

Special cases aren’t special enough to break the rules.

It strikes me that this is another big philosophical difference between the two languages. In Python, whatever you’re doing, you’d better be doing it consistently and by the rules. It’s rare that special syntax is added to do something that could be achieved with normal code. Ruby, on the other hand, tends to optimise for the common usage, which might mean in certain cases a different syntax may be needed. Eli Bendersky, in his excellent blog, explained this philosophy in his description of Ruby’s blocks:

… there is one common case that stands high above all other cases – passing a single block of code to a method that makes something useful out of it, for example iteration. And as a very talented designer, Matz decided that it is worthwhile to emphasize this special case, and make it both simpler and more efficient.

Is it worth keeping all syntax consistent, or is it OK to introduce special forms for certain common circumstances? Before I started using Ruby, I was probably in the former camp, but I’ve been convinced of the advantages of the latter now.

Metaprogramming

Every time the question of differences between Ruby and Python comes up (or more often, the issue of superiority), the Pythonistas insist that all the metaprogramming magic possible in Ruby is possible in Python as well. But the fact of the matter is, it’s just not used that often in Python. With Ruby though, metaprogramming is just what’s done. In a lot of ways, this goes back to the implicit/explicit divide, in the sense that dynamically created methods aren’t visible in your source code. Personally, I don’t see the problem with that, but I can see why this would upset others.

What does Python have?

It’s obvious that I’ve fallen for Ruby, and Python’s lost its shine a bit for me. But it’s worth acknowledging the bits that Python does well. Interestingly, these all seem to be related more to the community than the language (although the language selects for the community pretty heavily I guess).

  • Solid libraries, especially NumPy/SciPy. There’s not really an equivalent in Ruby.

  • Stable libraries. In Ruby, and especially Rails, the best way to do something today will be the old way to do it tomorrow. There’s always a better way to do it, and it can be difficult to keep up. Because most of my experience in Ruby has been in Rails so far, it’s hard to say whether this is actually just a feature of the Rails community.

  • Mature community. This might be controversial, but it feels to me like the Python community likes to just get stuff done, whereas the Ruby community wants to be playing with the newest, shiniest things out there.

  • Decent GUI. By this, I mean PyQt. QtRuby looks like it’s miles behind PyQt, and FxRuby doesn’t look very advanced. PyQt, however, is great, even if there is a bit of impedance mismatch between a C++ framework and Python.

Categories: Programming