Refactoring

Over the years, I’ve come to realise that code is more of an inter-human communication device than a human-machine one. Machines only need electric current to perform their tasks. Logic gates, binary digits, you know what I’m talking about…

We, on the other hand, have evolved sophisticated mechanisms to relate to our surrounding environment. Even if, underneath, our own signals are still electrical, we neither know how nor want to communicate with each other at that level. One of the most advanced of these mechanisms is human language: a very complex construction, based not only on words, but also on sounds, visual cues and, most importantly, abstract concepts.

It’s difficult enough to understand each other perfectly through human language (not to mention the differences between human languages and how they might map to different abstract concepts and all that…), let alone to translate what we mean through an enormously oversimplified mechanism: a computer programming language.

So, if we think of writing a program, in a programming language, as something we do mostly for our own and our peers’ understanding, it makes sense to make that program as clear as possible. And yes, I know “clear” is a subjective concept, but that doesn’t mean we shouldn’t try to achieve clarity. What I mean is that it’s much better for readers to be able to grasp the intent of the program as easily as possible if they are to do something based on it (modify it, learn from it, fix it, etc.). As always, don’t forget that you are also one of those readers, even when the program is your own.

Compare the following snippets of code and see which one you prefer:

  def exec(list) do
    res1 = []
    res1 = for x <- list, x.age > 18 do
      x
    end
    res2 = []
    res2 = for x <- res1 do
      x.weight
    end
    Enum.sum(res2) / length(res2)
  end

vs.

  def avg_adult_weight(people) do
    adult_weights = people 
      |> Enum.filter(&(&1.age > 18))
      |> Enum.map(&(&1.weight))
    Enum.sum(adult_weights) / length(adult_weights)
  end

The first snippet means nothing to us until we stumble upon x.age > 18, which hints that x is probably a person. A person over 18 years of age. This looks like some kind of filtering, ok. Next there’s some kind of transformation and, in the end, some math. Sum / length looks like an average. Ok, I think I understand what they meant.

But wait! Why do they iterate over the list twice, once to filter and once to pick out the weights? Isn’t that wasteful? Sure it is.

I left the multiple-iteration pattern in there on purpose (it is, unfortunately, very common in production code) to emphasise that the authors were just drafting their intention. Once they tried it and saw that it “worked”, they moved on.
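As a rough illustration of what the less wasteful version could look like: the two comprehensions in the first snippet can be merged into a single `for` with a filter. If you also want to avoid the extra passes made by `Enum.sum/1` and `length/1` (which the pipeline version above makes too, after its `filter` and `map`), `Enum.reduce/3` can accumulate the total and the count in one pass. The module name is hypothetical, and returning `nil` when there are no adults is an assumption of this sketch; both original snippets would raise an `ArithmeticError` there, since they end up computing `0 / 0`.

```elixir
defmodule AdultWeightSinglePass do
  # Hypothetical module: two ways of avoiding the double iteration.

  # 1. One comprehension: the filter (age > 18) and the transformation
  #    (take the weight) happen in the same pass.
  def adult_weights(people) do
    for person <- people, person.age > 18, do: person.weight
  end

  # 2. One Enum.reduce/3: accumulate {total, count} in a single pass,
  #    then divide. Returns nil when there are no adults (an assumption
  #    of this sketch, to avoid dividing by zero).
  def avg_adult_weight(people) do
    {total, count} =
      Enum.reduce(people, {0, 0}, fn person, {total, count} ->
        if person.age > 18 do
          {total + person.weight, count + 1}
        else
          {total, count}
        end
      end)

    if count == 0, do: nil, else: total / count
  end
end
```

Whether the single pass is worth the loss in readability depends on list sizes; the comprehension version keeps most of the clarity.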

The second snippet should be clear to every programmer and even to non-programmers. I deliberately didn’t mention the programming language, because the reader shouldn’t need to know it to understand the code. Even if you don’t get the weird syntax with those ‘&’ signs, you should be able to read past them and see what the author meant. This is the essence of code clarity.
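For readers who do want to decode the ‘&’ signs: the code is Elixir, and `&(&1.age > 18)` is the capture operator’s shorthand for an anonymous function whose first argument is `&1`. The sketch below, using made-up sample data, shows that the shorthand and an explicit `fn` behave the same way.

```elixir
# Hypothetical sample data, for illustration only.
people = [
  %{name: "Ana", age: 34, weight: 60},
  %{name: "Bob", age: 12, weight: 40}
]

# Capture shorthand: &1 stands for the function's first argument.
adult_filter = &(&1.age > 18)

# The same predicate written as an explicit anonymous function.
adult_filter_fn = fn person -> person.age > 18 end

# Both filter the list in exactly the same way.
IO.inspect(Enum.filter(people, adult_filter) == Enum.filter(people, adult_filter_fn))
# prints true
```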

For even more clarity, I would also add typespecs describing the function signature:

  @type person :: %{name: String.t(), age: integer(), weight: integer()}
  @spec avg_adult_weight([person()]) :: float()
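Put together, a self-contained version might look like the module below. The module name `People` and the sample data are hypothetical; the `@type` and `@spec` are the ones shown above, written with the parentheses idiomatic typespecs use. Typespecs serve as documentation and as input for tools such as Dialyzer; the compiler does not enforce them at runtime.

```elixir
defmodule People do
  # A person, as described by the typespec in the text.
  @type person :: %{name: String.t(), age: integer(), weight: integer()}

  # Average weight of the people older than 18.
  @spec avg_adult_weight([person()]) :: float()
  def avg_adult_weight(people) do
    adult_weights =
      people
      |> Enum.filter(&(&1.age > 18))
      |> Enum.map(&(&1.weight))

    Enum.sum(adult_weights) / length(adult_weights)
  end
end

# Hypothetical sample data.
people = [
  %{name: "Ana", age: 34, weight: 60},
  %{name: "Bob", age: 12, weight: 40},
  %{name: "Cy", age: 50, weight: 81}
]

IO.inspect(People.avg_adult_weight(people))
# prints 70.5
```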

I think code in the style of the first snippet comes from authors thinking in an artificial way first, trying to explain human concepts in computer science terms (data structures, loops…). Refactoring, then, is the process of translating that expression into a more human-friendly form. To me, this is a very low-level process that we could avoid by approaching coding from a human-language perspective. For example, this is how I would approach the average adult weight function (ok, in this case I would write the Elixir code directly, but the process is valuable when you don’t know exactly how you would implement it):

I know I want a function to tell me the average adult weight for a bunch of people:

  def avg_adult_weight(people) do
   
  end

Ok. Now I take that bunch of people, keep just the adults, and then do some math with their weights:

  def avg_adult_weight(people) do
    # get just the adults… some filtering
    # do some math with the weights… sum / count probably
  end

You see, by taking notes (as comments) on what I want to achieve, I also get implementation ideas (after the ‘…’). This is the reverse of the process described above: a transition from human language to programming language. Now, actually replacing the comments with code is straightforward.
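Carried out literally, each note becomes one step of code, and the notes can stay as comments if they still help. A minimal sketch, wrapped in a hypothetical `Drafts` module so that it compiles on its own:

```elixir
defmodule Drafts do
  def avg_adult_weight(people) do
    # get just the adults… some filtering
    adults = Enum.filter(people, &(&1.age > 18))

    # do some math with the weights… sum / count probably
    weights = Enum.map(adults, &(&1.weight))
    Enum.sum(weights) / length(weights)
  end
end
```

From here, folding the intermediate variables into a pipeline gives the second snippet.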

NOTE: Don’t assume this function is written without some verification mechanism in place (tests, REPL sessions…).
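As one example of such a mechanism, a few ExUnit tests can pin the behaviour down while the function takes shape. This is a sketch meant to run as a single script (for instance `elixir avg_test.exs`); the module names and the test data are hypothetical.

```elixir
# Start ExUnit before defining test modules; the tests run when the script exits.
ExUnit.start()

defmodule Weights do
  def avg_adult_weight(people) do
    adult_weights =
      people
      |> Enum.filter(&(&1.age > 18))
      |> Enum.map(&(&1.weight))

    Enum.sum(adult_weights) / length(adult_weights)
  end
end

defmodule WeightsTest do
  use ExUnit.Case, async: true

  test "averages only the people older than 18" do
    people = [
      %{name: "Ana", age: 34, weight: 60},
      %{name: "Bob", age: 12, weight: 40},
      %{name: "Cy", age: 50, weight: 80}
    ]

    assert Weights.avg_adult_weight(people) == 70.0
  end

  test "an 18-year-old does not count as an adult here" do
    people = [%{name: "Di", age: 18, weight: 55}, %{name: "Ed", age: 19, weight: 75}]
    assert Weights.avg_adult_weight(people) == 75.0
  end

  test "with no adults, the division by zero raises" do
    assert_raise ArithmeticError, fn -> Weights.avg_adult_weight([]) end
  end
end
```

The last test documents a current limitation rather than a desired behaviour; it is the kind of case a test run tends to surface early.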

Too many (maybe most) refactoring debates take place at the low level we discussed a few paragraphs above. In my experience, this is really counterproductive. I see refactoring as a process that starts once you already have human-readable code (like the second snippet) and want to reshape it to accommodate, and play nicely with, similar or new concepts. I’m talking about things like creating a higher-order function to abstract an algorithm or, similarly, in OOP, creating a template method in a superclass.
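To make the higher-order function idea concrete, here is one possible sketch (not a prescribed design): the “filter, then average a measurement” algorithm is pulled out into a hypothetical `average_of/3`, which takes the selection rule and the measured quantity as functions. `avg_adult_weight/1` becomes one instance of it, and a similar concept, the hypothetical `avg_adult_age/1`, reuses it. The module name `Averages` is also made up.

```elixir
defmodule Averages do
  # Generic algorithm: keep the items selected by `keep`, measure each
  # kept item with `measure`, and return the average of those measurements.
  def average_of(items, keep, measure) do
    values =
      items
      |> Enum.filter(keep)
      |> Enum.map(measure)

    Enum.sum(values) / length(values)
  end

  # The original function, now expressed through the abstraction.
  def avg_adult_weight(people) do
    average_of(people, &(&1.age > 18), &(&1.weight))
  end

  # A similar concept reuses the same algorithm with a different measurement.
  def avg_adult_age(people) do
    average_of(people, &(&1.age > 18), &(&1.age))
  end
end
```

The design choice here is that the algorithm is named once and the variations are passed in as functions, which is roughly what a template method achieves in OOP by letting subclasses override the variable steps.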