Fast languages

The same old boring questions

I've been programming for some time now. The year 2005 was pretty much 20 years ago. Lots of things change, others not so much, and yet another set of things seems to never change at all. One of the topics that keeps coming back again and again is language performance. To be more specific, I'm talking about this kind of benchmark:

Language Performance

I'll be honest: this conversation is as boring as it is useless. It is boring because the question is pointless and ill-formed. And it is useless because most of these benchmarks are poorly done and prove mostly nothing.

What I find amazing is that it is 2025 and by now the reason why this is pointless should be obvious to everyone. Yet the question remains: why do these comparisons keep coming up again and again?

Instead of software engineering

Before I spit out too many opinions, let me explain a bit better why I make these apparently unfair statements. Let me start by arguing what a programming language is. In simple terms, a programming language is a tool to solve problems. Stripped of all the nonsense, personal bias and religious thinking, a programming language boils down to this simple idea. Notice that there are two fundamental concepts at stake here: the concept of a problem, or family of problems, and the concept of a tool. Let's take a step back, look at other technical areas, substitute their own tools and problems, and see what pattern emerges.

  • Photography:
    • Let's consider the tool to be a kind of lens and the problem to be taking the best photograph.
  • Mechanics:
    • Let's consider the tool to be a screwdriver and the problem to be fixing a car.

We could go on with more examples, but let's apply an analogous line of reasoning to these two particular tool/problem pairs. If we follow the same line of thought we see being applied in software engineering, we end up with benchmarks like:

  • Benchmarking lenses by focal length: which one is the widest?
  • Benchmarking screwdrivers by weight: which one is the lightest?

If we did this kind of benchmarking we would be called idiots, right? The reason is simple. Different lenses target different kinds of photographic perspectives. Different screwdrivers weigh differently because they need to work on different kinds of screws. In the end they are all justified and all good for what they were designed for.

Yet, in software engineering we keep seeing the classic: benchmarking the fastest language.

Computer languages are different, and they are all useful precisely because they target different trade-offs. Some give priority to performance, others to lowering cognitive overhead. Some give priority to versatility, others to correctness. The space of software engineering problems is so big that in the end all of them are justified.

Now that it should be clear that comparing languages on one attribute, without mentioning the problem at hand, is a bit silly, we can dig a bit deeper and question the implicit false dichotomy lingering around. Above I mentioned that the benchmark idea is an ill-posed question/exercise. There is a reason for that: it may induce people to conclude, erroneously, that by using a slower programming language we end up limited to the subset of problems that are not performance intensive. And this, my friends, is wrong. A software project is composed of a huge number of small problems, and the performance-critical part is usually isolated, representing a very small portion of the project. If 90% of the problem is not performance sensitive, we can use a slower language that prioritizes correctness and has a lower cognitive overhead instead of a faster one that is less safe and carries a bigger cognitive overhead. In fact, that is exactly what happens throughout the industry.

Let's put this in code

Let's consider the following example in PHP.

<?php  
echo "Hello, heavy stuff here:\n";

function someReallyHeavyStuff($lima, $limb){  
  $result=0;
  for($i=0;$i<$lima;$i++){
    $sum=0;
    for($j=0;$j<$limb;$j++){
      $sum+=$i+$j;
    }
    $result+=$sum;
  }
  return $result;
}

$r = someReallyHeavyStuff(10000,100000);

echo "The heavy result: $r\n"

?>

You can download the folder and run the following:

make run-heavy-php

time docker run -it --rm -v "${PWD}:/usr/src/myapp" -w /usr/src/myapp php/ffi php heavy_stuff.php

Hello, heavy stuff here:  
The heavy result: 54999000000000

0.01user 0.00system 0:10.32

You can see that the PHP script takes about 10 seconds to execute. Now, let's write the same thing in C.

unsigned long some_heavy_stuff(int lima, int limb) {  
  unsigned long result = 0;
  int i, j = 0;
  int sum = 0;
  for (i = 0; i < lima; i++) {
    sum = 0;
    for (j = 0; j < limb; j++) {
      sum += i + j;
    }
    result += sum;
  }
  return result;
}
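
The function above is just the computational kernel; to time it we need a small driver compiled into a binary. Here is a minimal sketch of what such a main.c could look like, where the file name and compiler invocation are assumptions on my part (the messages match the output below):

#include <stdio.h>

/* declaration of the kernel defined above (e.g. in heavy_stuff.c) */
unsigned long some_heavy_stuff(int lima, int limb);

int main(void) {
  printf("Hello from blistering C:\n");
  printf("Doing stuff:\n");
  unsigned long result = some_heavy_stuff(10000, 100000);
  printf("Result is %lu\n", result);
  return 0;
}

Built with something along the lines of gcc -o main main.c heavy_stuff.c; the repository's Makefile presumably has an equivalent target.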

Let's see how long it takes to run this time:

time ./main  
Hello from blistering C:  
Doing stuff:  
Result is 12049327040000

./main  2,45s user 0,00s system 99% cpu 2,454 total

You can see that the time dropped to 2.45 seconds. You may also notice that the result is not the same. The reason is the inner accumulator: the C version sums into a 32-bit int, which overflows for these loop bounds (in practice the value wraps around, giving the smaller number), while PHP integers are 64-bit on a 64-bit build and only fall back to floats when those overflow, so the script keeps the exact value. For this example the values are not the important part, the execution times are.
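
To make that concrete, here is a tiny standalone check, not part of the repository, that runs only the first pass of the outer loop (i = 0) with both a 32-bit and a 64-bit accumulator:

#include <stdio.h>

int main(void) {
  int sum32 = 0;       /* same accumulator type as the C example above */
  long long sum64 = 0; /* 64-bit, comparable to PHP's native integers  */
  int j;
  for (j = 0; j < 100000; j++) {
    sum32 += 0 + j;    /* i + j with i == 0; the running total exceeds INT_MAX and overflows */
    sum64 += 0 + j;    /* stays comfortably within 64-bit range */
  }
  printf("32-bit accumulator: %d\n", sum32);
  printf("64-bit accumulator: %lld\n", sum64);
  return 0;
}

The 64-bit total is 4,999,950,000, already past the 32-bit limit of 2,147,483,647, which is why the full C run produces a different (and smaller) number than the PHP run.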

So what do we conclude? That we need to throw PHP away? Just rewrite everything in C? In many problem domains that could be the best trade-off, but in others it would be a bad one. Is this the end of the road? No, and the reason is that we are asking the wrong question. This is not a matter of PHP vs C. We can use both, hence the false dichotomy. In the following example we do just that.

<?php  
$ffi = FFI::cdef("
    unsigned long some_heavy_stuff(int lima, int limb);", 
    "./heavy_stuff.so"
);

echo "Hello, also heavy stuff here:\n";

$result = $ffi->some_heavy_stuff(10000,100000);

echo "The heavy result: $result\n";

?>

Here we use FFI, which stands for Foreign Function Interface (spoiler alert: this approach is supported by pretty much every language), to load the Linux shared object and call the some_heavy_stuff function that was defined earlier and compiled into native code.
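
For this to work, the C function has to be compiled into the heavy_stuff.so shared object first. The repository's Makefile presumably takes care of that; done by hand it would look something like the following (the exact flags are an assumption):

gcc -shared -fPIC -o heavy_stuff.so heavy_stuff.c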

By running

make run-heavy-php-c

time docker run -it --rm -v "${PWD}:/usr/src/myapp" -w /usr/src/myapp php/ffi php heavy_stuff_c.php

Hello, also heavy stuff here:  
The heavy result: 12049327040000

0.01user 0.01system 0:02.94  

We end up running the same thing from PHP in just 2.94 seconds. This is much less than the first example, so the performance limitation is mitigated. Instead of using one language instead of the other, we use both languages according to the problem at hand.

Wrong optimizations

This PHP vs C story does not end here. There is another important takeaway: we can make PHP faster than C. Sounds stupid?

make run-lesser-heavy-php

time docker run -it --rm -v "${PWD}:/usr/src/myapp" -w /usr/src/myapp php/ffi php lesser_heavy.php  
Hello,not so heavy stuff here:  
The not so heavy result: 54999000000000  
0.01user 0.01system 0:00.51

Now we have PHP code running in less than a second. PHP code faster than C code? Can we conclude that PHP is faster than C? No, of course not. What happened here is that good PHP code beats bad C code, and that is exactly what is going on.

The reason why this PHP code is faster than the C code is that it is not the same algorithm. Here it is, written as a C function:

unsigned long lesser_heavy_stuff(int lima, int limb) {
  unsigned long result = 0;
  int i;
  for (i = 0; i < lima; i++) {
    // The old inner loop summed (i + j) for j = 0..limb-1.
    // Factor out the outer index and use Gauss' formula for the sum of the first limb integers.
    // The casts keep the intermediate products in 64 bits so they do not overflow.
    result += (unsigned long)i * limb + (unsigned long)(limb - 1) * limb / 2;
  }
  return result;
}

The new version uses a different algorithm. If we pay a bit of attention we easily notice that the inner loop is not necessary: its result can be written as a closed-form expression. By factoring out the outer index and using Gauss' formula we reduce the complexity from quadratic, O(lima × limb), to linear, O(lima).
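
For reference, the algebra behind that rewrite is just the classic Gauss sum applied to the inner loop, nothing specific to this code base:

$$\sum_{j=0}^{\mathrm{limb}-1} (i + j) \;=\; i \cdot \mathrm{limb} \;+\; \sum_{j=0}^{\mathrm{limb}-1} j \;=\; i \cdot \mathrm{limb} \;+\; \frac{(\mathrm{limb}-1)\,\mathrm{limb}}{2}$$

The single remaining loop only accumulates this expression over i, so the work drops from lima × limb additions (10,000 × 100,000 = 10^9 inner iterations) to lima = 10,000 iterations.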

This leads us to the second point I mentioned earlier. Benchmarks are silly and usually done wrong because not enough attention is paid to computational complexity, and we end up optimizing a slow algorithm, which will always be slower than implementing a faster one.