Monday, June 15, 2009

Premature optimization in language choice

Donald Knuth famously said, "Premature optimization is the root of all evil." This is one of the most important things for a developer to learn. Many hours of programmer time are wasted on optimizations that save milliseconds. Even if your code is run enough to get back that time, the time is likely to be split up between so many users that none of them care about the tiny difference. Worse, time taken to optimize is time not spent on adding features or fixing bugs, each of which can save the user a lot more time than minor optimizations. Optimization before profiling is a choice of a bad trade-off: optimized code takes longer to write, and is harder to understand and maintain.

Choosing a programming language is often the most important optimization choice your team will make. Almost universally, compiled languages run faster than JIT-ed languages, which in turn run faster than traditionally interpreted languages. This comes with the same trade-off as most other optimization choices: the closer the code is to the machine, the longer it takes to write and the harder it is to understand and maintain.

This leads me to my main assertion of this post: language performance should be a very minor consideration in language choice. Since the language is one of the first choices made in a project, choosing a language based on its performance is inherently a premature optimization.

Of course, I'm not saying that optimization isn't important. Optimization is merely an area in which it is very important to choose your battles wisely; only optimize when and where you have to. Performance testing and code profiling can help you to know when and where that is.

The Pareto Principle is often applied to optimization with the ratio that 80% of your performance bottlenecks can be removed by optimizing 20% of your code. However, my experience that the ratio is much more disparate; 99% of your performance bottlenecks can be removed by optimizing 1% of your code.

Just as choosing a language for your project based on performance is a premature optimization, rewriting performance bottleneck code in lower-level language can be a positive optimization. Python, Perl, and Tcl support calls to C or C++ code using SWIG. Java supports call to lower-level languages through the JNI. Nearly every high-level language has similar tools that allow calls to lower-level interfaces. While choosing a language based on its performance is a premature optimization, choosing a language based on its ability to be optimized in a lower-level language is not premature, it's wise.

Some notes:

Performance scalability is a different issue from performance. One need only look to Twitter's issues with Ruby to see why this is an important distinction. Languages with high performance may not have lower performance scalability (see DARCS' issues with Haskell) and languages with lower performance may have high scalability (see Google's use of Python).

Lastly, I said that language performance should be a very minor consideration in language choice. However, sometimes high-performance languages are still good choices, not because of their performance, but because they are most expressive for the problem domain. Operating systems are a good example of this; memory management and filesystems require a lot of low-level addressing that is best represented in a systems language like C or C++.

2 comments:

  1. Good article, but that premature optimization quote is due to C.A.R. Hoare, not Knuth.

    ReplyDelete
  2. Interesting. I've always attributed that quote to Knuth, but some research on the internet shows that there's some disagreement on who coined the quote. Knuth attributed the quote to Hoare, but Hoare disclaims ever having said it.

    ReplyDelete