Monoids and Code
Functions and methods
There are lots of strange names when the topic is programming. People are immensely creative, and as a result we have names for virtually anything you can think of in the computer industry. You have names like object oriented programming, futures, callbacks, TCP, UDP. You name it.
Computer programming languages are a subset of the computer world and as such they have lots of specific keywords and very specific semantics. Programming languages are a broad concept, and within them we find several schools of thought, different paradigms. You have the object oriented approach, the procedural, the logic, the functional and even multiparadigm ones.
In this particular post I will delve into one particular programming philosophy and a specific construct commonly associated with it: functional programming and the monoid.
If you do a quick search you'll find a lot about functional programming (and honestly there are lots of things we could talk about), but if we must define it in short terms I would say:
A paradigm where a program is built upon functions (in the mathematical sense, not procedures) and their composability.
This may seem somewhat simplistic, but it sums up the whole idea. In functional programming you try to model all your solutions by defining functions, and you compose them to achieve higher levels of abstraction and polymorphism. This seems kind of trivial, right?
Let's explore a bit more what we mean by function in functional programming. You see, when we talk about functions in the world of programming languages, sometimes what we really mean is a procedure. A function has slightly different semantics.
Consider the following method (since we are talking about functional programming, we choose Scala as the example):
def divide(numerator: Double, denominator: Double): Double = {
  numerator / denominator
}
Many people call the above method a function. But is it? How do you define a function? According to Wikipedia:
In mathematics, a function is a relation between a set of inputs and a set of permissible outputs with the property that each input is related to exactly one output.
We can describe a function as a mapping (this is called a relation in algebra) from values in one set A to values in another set B. That's why we use the mathematical notation f: A -> B. These mappings obey a simple constraint: you cannot map the same value a from A to two different values b, c in B. So if f(a) = b and f(a) = c, then b = c; otherwise f is not a function. Hence, functions like random(a, b) that return a random value between a and b are not mathematical functions but, instead, procedures or methods.
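To make the distinction concrete, here is a minimal sketch (the names square and randomBetween are purely illustrative):

import java.util.Random;

public class FunctionsVsProcedures {

    // A mathematical function: the same input always maps to the same output.
    static double square(double x) {
        return x * x;
    }

    // A procedure: calling it twice with the same arguments can yield
    // different results, so it is not a function in the mathematical sense.
    static double randomBetween(double a, double b) {
        return a + new Random().nextDouble() * (b - a);
    }

    public static void main(String[] args) {
        System.out.println(square(3.0));             // always 9.0
        System.out.println(randomBetween(0.0, 1.0)); // different on (almost) every run
    }
}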
Side effects and composability
So, looking at the previous definition, we should conclude that divide is indeed a function, right? At first glance it seems reasonable to describe it as a mapping from the cartesian product of Double with itself to Double:
Double x Double --> Double
But what about divide(1, 0)? Does its result belong to the Double set? Unfortunately it turns out that division by 0 is not defined, so by this definition divide is not a mathematical function.
But does this detail really matter once we step outside the imaginary world of mathematics?
Maybe in many programming paradigms it is not something you need to take into account. The same does not apply if you want to make use of functional programming and achieve polymorphism by composing functions.
Let's make this a little clearer with the following example.
println(divide(divide(10.0,2),divide(4.0,2)))
It is not hard to see that the previous piece of code will print the value 2.5 on the screen. You see, here we use the principle of mathematical function composition to leverage the power of our building block, the function.
What about the following piece of code?
println(divide(divide(10.0,0),divide(4.0,2)))
It turns out that Scala is smart enough to produce Infinity. This is because Scala's Double follows the IEEE 754 floating point standard, which extends conventional real-number division with special values so that division is a closed operation on the set Double. What we mean here is that if a, b belong to Double then a/b also belongs to Double. Why is this closure important?
The answer is pretty simple.
public static int divide(int numerator, int denominator) {
    return numerator / denominator;
}

public static void main(String[] args) {
    // throws ArithmeticException: / by zero
    System.out.println(divide(divide(10, 0), divide(4, 2)));
}
The previous code, unlike the Scala Double case, works over the primitive type int. An int has no special values such as Infinity or NaN to represent the result of a division by zero, so Java is forced to throw a runtime ArithmeticException telling us that what we just did is not cool.
When we program we define return types for our methods. Unlike the mathematical case, our methods don't automatically guarantee closure over the return set. When that happens we end up with a method that may or may not return a value of the promised data type. And this is a core problem: by letting null pointers, exceptions and other side effects escape, we break function composability and hence all the power that comes with it.
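The same breakage happens without any exception at all. The sketch below (parsePrice is a hypothetical method, not part of the examples above) promises a Double but sometimes returns null, and the moment we compose it the promise is broken:

public class BrokenComposition {

    // Promises a Double, but for bad input it returns null,
    // a value that is not really a member of the Double set.
    static Double parsePrice(String s) {
        try {
            return Double.parseDouble(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    static double divide(double numerator, double denominator) {
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Composing as if parsePrice were a total function:
        // unboxing the null throws a NullPointerException at runtime.
        System.out.println(divide(parsePrice("not a price"), parsePrice("2")));
    }
}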
Closure over composition
To be able to take advantage of composition, methods (or procedures) need to become pure mathematical functions. As such, we need to remove all the edge cases that break closure. Polymorphism through functional composition needs, first of all, closure over the operation. In other words, we need that
For f, g : A x A -> A and a, b, c in A, f(a, g(c, b)) belongs to A
But how do we get this in the first place?
The idea is pretty simple and widely used in mathematical theories: you just redefine your functions to work over a different set. So initially we have the set of plain, finite floating point values, let's call it D. We had our function divide, defined in the traditional arithmetical way by simply invoking the division operator, but we hit a roadblock: there is no way that divide(1.0, 0.0) belongs to D. So what do we do? Well, we cheat. We devise a superset that includes all the previous values plus all the aberrations such as 1/0. But first, what does 1/0 even mean? In plain arithmetic it is simply undefined, but by mathematical convention we can treat it as Infinity. So let's create a new set called C (for closure) that includes these edge cases. First, let's abstract a little and identify all the edge cases we can encounter.
Over the set of floating point values the aberrations are the following:
0/0 --> this is not defined, let's call it not a number (NaN)
a/0 where a > 0 --> by mathematical convention this is just +Infinity
a/0 where a < 0 --> by mathematical convention this is just -Infinity
Yeah, this is pretty fun. But hey, those are just things you invented.
Well, that's absolutely true. But that doesn't mean this invented stuff doesn't work. It's true that it will not work on the set D, but if we include these cases in C, we just need to devise C.
Sure, but how?
// The superset C: a wrapper type that can represent ordinary values
// as well as the edge cases that plain division produces.
public static abstract class DoubleClosure {
    protected double value;
    public abstract boolean isInfinity();
}

// An ordinary, finite value.
public static class DoubleVal extends DoubleClosure {
    public DoubleVal(double val) {
        this.value = val;
    }
    @Override
    public boolean isInfinity() {
        return false;
    }
}

public static class PlusInfinity extends DoubleClosure {
    @Override
    public boolean isInfinity() {
        return true;
    }
}

public static class MinusInfinity extends DoubleClosure {
    @Override
    public boolean isInfinity() {
        return true;
    }
}
So the trick is to create a custom type DoubleClosure that includes all the possible cases and then, voilà, we just operate over these types. So instead of
public static double divide(double numerator, double denominator)
we build, instead
public static DoubleClosure divide(DoubleClosure numerator, DoubleClosure denominator)
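One possible body for that method, just a sketch built on the hypothetical DoubleClosure hierarchy above (a fuller version would also track the sign of infinite operands and add a proper NotANumber case):

public static DoubleClosure divide(DoubleClosure numerator, DoubleClosure denominator) {
    // Simplified handling of infinite operands: anything divided by an
    // infinity collapses to 0, and an infinite numerator stays infinite.
    if (denominator.isInfinity()) return new DoubleVal(0);
    if (numerator.isInfinity()) return numerator;

    double n = numerator.value;
    double d = denominator.value;
    if (d == 0) {
        if (n == 0) return new DoubleVal(Double.NaN); // 0/0: "not a number"
        return n > 0 ? new PlusInfinity() : new MinusInfinity();
    }
    return new DoubleVal(n / d); // the ordinary, well-behaved case
}

Now a composition like divide(divide(new DoubleVal(10), new DoubleVal(0)), divide(new DoubleVal(4), new DoubleVal(2))) always yields a DoubleClosure; no exception can escape and break the chain.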
Yeah sure, these ideas are all made up by you. No real application would use this.
Well, this is essentially what Scala exposes for floating point doubles. If you look at the Double companion object in Double.scala you'll find this:
object Double extends scala.AnyRef with scala.AnyValCompanion {
  final val MinPositiveValue : scala.Double
  final val NaN : scala.Double
  final val PositiveInfinity : scala.Double
  final val NegativeInfinity : scala.Double
  final val MinValue : scala.Double
  final val MaxValue : scala.Double
  def box(x : scala.Double) : java.lang.Double
  def unbox(x : java.lang.Object) : scala.Double
  override def toString() : java.lang.String
}
Which is pretty much the same idea, just with a different syntax: the special values NaN, PositiveInfinity and NegativeInfinity are first-class members of the Double set.
Associativity, Composition and Identity
Wrapping a data type in a superset in such a way that we get closure is a neat way of removing the need to constantly deal with edge cases. The key idea is encapsulating the edge case logic inside pure mathematical functions. But we can raise the level of abstraction further with other mathematical features, namely associativity of the operation we are composing. Associativity is not mandatory to implement pure functions; for that we have the closure property. Associativity is a nice property if, for instance, we want to program in a way that lets us abstract away parallelism. The key idea behind parallel optimization consists in breaking your code into small functions and spreading those functions over a set of processing units.
Let's pick an example. Imagine you have the function max(a, b), which simply takes two numbers and returns the biggest one. Now imagine we want to compute the max of a very, very big list. Well, one idea would be to pairwise compute the max until all the numbers have been compared. How can we do this?
Well, we can just take the first element and the second and compute c = max(a_1, a_2), then c = max(c, a_3), and keep going until no more elements are available.
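A minimal sketch of that sequential, pairwise reduction (the array contents are arbitrary):

public class SequentialMax {
    public static void main(String[] args) {
        double[] numbers = {3.5, 7.2, 1.0, 9.9, 4.4};

        // c = max(a_1, a_2), then c = max(c, a_3), and so on
        double c = Math.max(numbers[0], numbers[1]);
        for (int i = 2; i < numbers.length; i++) {
            c = Math.max(c, numbers[i]);
        }
        System.out.println(c); // prints 9.9
    }
}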
Is there a better approach? Yes, you guessed it! You can parallelize it.
If we had a_1, a_2, a_3, a_4 we could do something like max(max(a_1, a_2), max(a_3, a_4)), where the two inner max operations could be done in parallel.
But why can we do this? The answer is associativity. The reason this works is that the max operation has the nice property of
max(max(a,b),c) = max(a,max(b,c))
which is, by definition, the mathematical property of associativity. It's this independence from the order of evaluation that gives us the guarantee we need to abstract away the grouping and parallelize immediately.
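As a sketch of how this is exploited in practice, Java's parallel streams lean on exactly this property: reduce is free to split the data into chunks, reduce each chunk on a different thread and combine the partial results in whatever grouping it likes, precisely because max is associative:

import java.util.Arrays;

public class ParallelMax {
    public static void main(String[] args) {
        double[] numbers = {3.5, 7.2, 1.0, 9.9, 4.4, 8.8};

        // reduce demands an associative operation so that the result is
        // deterministic regardless of how the work is partitioned.
        double max = Arrays.stream(numbers)
                           .parallel()
                           .reduce(Math::max)
                           .getAsDouble();

        System.out.println(max); // prints 9.9
    }
}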
There is yet a catch, however. Associativity helps us by removing the need to worry about the order of operations, but we still need to deal with how the elements are split among the workers. Sounds strange? Let's imagine that we have the max operation running on 3 different threads and we want to avoid dealing directly with the low level work of distributing all the elements evenly.
Imagine that I have A = [a, b, c, d] and max_1, max_2, max_3 are the max function running on three threads. We could compute the final result as
res = max(max(max_1(A1),max_2(A2)),max_3(A3)) where A1=[a,b], A2=[c,d], A3=[]
Notice that max_3(A3) doesn't make much sense as it stands and would force us to break the function composition. But notice that, under these semantics, max([]) is equivalent to computing max(0, 0), assuming all the elements are positive (in general the identity for max is -Infinity; 0 only works when every value is non-negative). So the result is the same as
res = max(max(max_1(A1),max_2(A2)),max_3(A3)) where A1=[a,b], A2=[c,d], A3=[0,0]
The same goes if A3 = [f]: this would be equivalent to computing max(f) = max(f, 0).
So what are we talking about here? The presence of an identity element for the operation gives us a way to abstract over the number of elements, even zero, that are fed into the composition. So we add yet another layer of abstraction to the toolbox of functional programming.
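A small sketch of this: each chunk is folded starting from the identity, so an empty chunk simply contributes the identity and never breaks the composition (the chunk contents are arbitrary):

import java.util.Arrays;
import java.util.List;

public class MaxWithIdentity {

    // Fold one chunk starting from the identity element;
    // an empty chunk just yields the identity itself.
    static double maxOfChunk(List<Double> chunk, double identity) {
        double result = identity;
        for (double x : chunk) {
            result = Math.max(result, x);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Double> a1 = Arrays.asList(3.5, 7.2);
        List<Double> a2 = Arrays.asList(1.0, 9.9);
        List<Double> a3 = Arrays.asList(); // the awkward empty chunk from the text

        // The identity for max over doubles is -Infinity
        // (0 works only if all values are known to be non-negative).
        double identity = Double.NEGATIVE_INFINITY;

        double result = Math.max(
                Math.max(maxOfChunk(a1, identity), maxOfChunk(a2, identity)),
                maxOfChunk(a3, identity));

        System.out.println(result); // prints 9.9
    }
}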
Ok. But in the beginning you talked about monoids. What does all this mumbo jumbo have to do with them?
Well, by definition a monoid is just that: a set equipped with a closed binary operation that is associative and has an identity element.
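Put in code, the abstraction could look something like the following sketch (the Monoid interface and its instances here are illustrative, not a standard library type): the closed operation is combine, associativity is the contract it must obey, and identity gives us the starting point for any fold, even over an empty list.

import java.util.Arrays;
import java.util.List;

public class Monoids {

    // A monoid: a type T, an associative binary operation on T,
    // and an identity element for that operation.
    interface Monoid<T> {
        T identity();
        T combine(T a, T b); // must satisfy combine(combine(a, b), c) == combine(a, combine(b, c))
    }

    static final Monoid<Double> MAX = new Monoid<Double>() {
        public Double identity() { return Double.NEGATIVE_INFINITY; }
        public Double combine(Double a, Double b) { return Math.max(a, b); }
    };

    static final Monoid<String> CONCAT = new Monoid<String>() {
        public String identity() { return ""; }
        public String combine(String a, String b) { return a + b; }
    };

    // Any monoid gives us a fold for free: safe on empty lists thanks to the
    // identity, and free to be chunked and parallelized thanks to associativity.
    static <T> T fold(List<T> values, Monoid<T> m) {
        T result = m.identity();
        for (T v : values) {
            result = m.combine(result, v);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fold(Arrays.asList(3.5, 7.2, 1.0, 9.9), MAX)); // 9.9
        System.out.println(fold(Arrays.asList("mono", "id"), CONCAT));    // monoid
        System.out.println(fold(Arrays.<Double>asList(), MAX));           // -Infinity
    }
}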