F# Basics – From loops to folds
Instead of writing about Christmas trees this year, like I have for the last few years of FsAdvent, I thought I’d do a post about a topic that seems to frequently arise on the F# Software Foundation slack #beginners channel – looping and folds.
People coming to F# from other languages frequently find themselves copying the idioms of their past, and struggling with how to have things make sense in F#. Loops are a very common example.
A common thread in these discussions which seem to cause difficulty is “looping to generate a result”. This can be seen easily in the C# tutorial’s looping code. Here, a mutable value is created to hold some result (sum), a for loop iterates through a bunch of numbers, logic is placed in the loop body that mutates the value, and the result is used afterwards. While this can be ported to F# directly, it tends to be ugly in the end, as it relies on looping and on mutation.
I frequently suggest that people new to F# should strive to reduce mutation as well as code which relies on side effects (every loop body does), but confusion frequently occurs with cases like this. Recursion is frequently suggested as a solution, but its far from simple in many cases, especially when people are just starting out.
So let’s rework this a bit – and lets start small by reducing mutation. First off, we can refactor the “logic” portion of this into a function and make it pure. In this case, we want a function which will not mutate a value, but instead, return a new value each time it’s run, passing the existing input into it.
Doing this allows the “logic” to be much more easily tested, but is also an important first step. We can then remove another point of mutation – the for loop itself, by switching to a foreach loop over a sequence:
It’s typically at this point where people get stuck. Its far less obvious to see how we remove the mutable value which holds the result. Before we move forward with removing the mutation, lets port this to F#:
While this is obviously more succinct, it does exactly the same thing, using the same imperative techniques as before, at least in the core loop. In preparation to eliminate the mutation, I’m going to make this less succinct, so the intention behind what is occuring becomes more obvious.
While this does the same steps as the previous version, each temporary value is broken out into a binding, so the intent and process becomes more clear. However, we still are using a mutable to hold and mutate state each step.
Enter fold.
If we look at Seq.fold, we see that it has the following signature:
This function takes a function as input, which takes the previous state, the current value, and returns the new state. That is remarkably similar looking to our factoring of addNewConditionally – it takes the last value, the current value, and returns the new value. It also then takes a ‘State value – which is the initial state used for the first item being processed. Finally, it takes a sequence to operate over, and then returns a result at the end after processing each item.
In our case, we have all three of these, so we can easily rewrite our loop as a fold:
Note that this eliminates the need for a mutable value – each step works on the previous result, and returns a new value. We can now remove all of the temporaries, and rewrite this much more simply:
This gives us a simple, clean way to handle this example.
On a side note, in this particular case, the result type and input collection are all int values, and the initial value is the default value for int (0), so we have other, potentially simpler options. We could use Seq.reduce instead, or a Seq.filter piped into a Seq.sum. However, looking back at fold, one benefit is that the ‘State type is separate from the input type ‘T. This means we can use a fold to replace any loop that is used to generate results based off the items being looped over, even if the types don’t match.
As a simple example, lets say our requirements change. Instead of just needing the sum of the values that meet our criteria, suppose we want to generate a list of the values as strings. All we need to do is change our function to work on and return a string list, and to change our initial ‘State value.
This works the same way, but now we go from int seq to string list instead of to an int as our result.
While these are fairly simple and contrived examples, the approach taken here works for moving from any loop that generates “results” to a fold:
- Make core logic a pure function (which can be a lambda expression) – which takes the previous result, the current value, and return a new state
- Rework items being “processed” into a collection or sequence if using a “for” style loop
- Remove the mutable “running value” and the loop, and replace with a call to fold