Erlang encourages poor Functional Design

2016/10/17

Thesis

While Erlang may be a functional programming language, it encourages poor functional design and ingrains the habit of excessive coupling, which I’ve witnessed programmers take with them to other languages.

Definitions

Loose coupling is a programming principle:

in which each of its components has, or makes use of, little or no knowledge of the definitions of other separate components.[1]

Functional design helps create loose coupling; it is:

A design method in which the system is seen from the functional viewpoint. The design concentrates on isolating high-level functions that can then be decomposed into and synthesized from lower-level functions. Development proceeds as a series of stepwise refinements of functionality.[2]

Such designs have a number of advantages: it's easier to understand and reason about small, clearly defined functions; it's easier to test such functions; it increases code reuse; and it decreases the blast radius when changes need to be made. It's the first of the unix philosophies: make each program do one thing well. It's also the S in SOLID, the single responsibility principle: a class should have only a single responsibility.

Error handling by returned values

Functional programming languages tend to deal with errors either by throwing an exception or by returning an error value. Many functional languages, including erlang, support both. For the purposes of this thesis, it is the returned errors that are problematic in erlang.

The basic convention for returned errors in erlang is to return either the tuple {ok, Value} for a successful computation or {error, Details} for a failed one. This is just a convention: sometimes impure things just return ok, and sometimes failures return {error, Reason, Details} or something else altogether.
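
For instance, the standard library's file:read_file/1 returns exactly these shapes. Here is a minimal sketch of a caller (print_file/1 is just a hypothetical name used for illustration):

print_file(Path) ->
  %% file:read_file/1 returns {ok, Binary} or {error, Reason}
  case file:read_file(Path) of
    {ok, Binary}    -> io:format("~s~n", [Binary]);
    {error, Reason} -> {error, Reason}
  end.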

A similar mechanism in Haskell is the Either type, which can be either Right value or Left error, or the similar Try in Scala, which can be either Success(value) or Failure(throwable).

Composition of fallible functions

The nice part about Haskell's Either and Scala's Try is that they support functional composition over their failure cases which, much like exceptions, avoids continuing the computation after a failure.

To show what this looks like, let’s use an example of a hypothetical session service, which takes a json http post with a username and password, decodes it, looks up the user details from the database, checks the password, and generates a jwt token to respond with.

In scala, our top-level code might look like the following:

@Json case class LoginRequest(username: String, password: String)

def login(dbConn: DbConnection, req: Request): Try[Response] =
  for {
    reqJsonStr <- req.body
    loginReq <- json.decode[LoginRequest](reqJsonStr)
    userDetails <- db.findUser(dbConn, loginReq.username)
    _ <- checkPassword(userDetails.hashedPassword, loginReq.password)
    jwtToken <- jwt.generateToken()
  } yield Response(jwtToken)

Each of the above methods returns a Try type and the for block conceptually desugars to something like:

def login(dbConn: DbConnection, req: Request): Try[Response] =
  req.body match {
    case Failure(err) =>
      Failure(err)
    case Success(reqJsonStr) =>
      json.decode[LoginRequest](reqJsonStr) match {
        case Failure(err) =>
          Failure(err)
        case Success(loginReq) =>
          db.findUser(dbConn, loginReq.username) match {
            ...
          }
      }
  }

So if findUser() can’t find the user in the database, it will return an appropriate Failure(), which will be immediately returned and checkPassword() and the rest of the function won’t be executed. While there is a learning curve to writing code this way, it’s pretty obvious without knowing all the details that the result is fairly concise and readable.

What happens if we try writing the same thing in erlang? Unfortunately erlang doesn’t have built-in monadic blocks, so we end up needing to write code like the following:

validate(_DbConn, #{body := <<>>}) ->
  {error, "No body provided"};
validate(DbConn, #{body := ReqJsonStr}) ->
  case json:decode(ReqJsonStr) of
    {error, Reason} -> {error, Reason};
    {ok, #login_request{username = Username, password = Password}} ->
      case db:find_user(DbConn, Username) of
        {error, Reason} -> {error, Reason};
        {ok, #user_details{hashed_password = HashedPassword}} ->
          ...
      end
  end.

As you can imagine, this nesting can get quite deep and quickly becomes hard to read, so the urge (and typical next step) is to split it into multiple functions:

validate(_DbConn, #{body := <<>>}) ->
  {error, "No body provided"};
validate(DbConn, #{body := ReqJsonStr}) ->
  decode_json(DbConn, ReqJsonStr).

decode_json(DbConn, ReqJsonStr) ->
  case json:decode(ReqJsonStr) of
    {error, Reason} -> {error, Reason};
    {ok, #login_request{username = ReqUsername, password = ReqPassword}} ->
      find_user(DbConn, ReqUsername, ReqPassword)
  end.

find_user(DbConn, ReqUsername, ReqPassword) ->
  case db:find_user(DbConn, ReqUsername) of
    {error, Reason} -> {error, Reason};
    {ok, #user_details{hashed_password = HashedPassword}} ->
      check_password(ReqUsername, ReqPassword, HashedPassword)
  end.

...

At this point we’ve thrown functional design out the window. We are now passing a database connection to a “function” called decode_json() which doesn’t need a database connection. decode_json() is now directly calling find_user() which makes no sense because finding a user really has nothing to do with decoding json. We’re passing a password field around to code that really has no business having a raw password. We have all the coupling problems: code that’s harder to reuse, harder to read, harder to test, harder to refactor.

Unfortunately there’s not a lot we can do about it. To keep the functions clean we’d want the caller to handle the error, but that takes us right back to the first version we were trying to refactor.

Evidence

To show that this is not just a hypothetical problem, here are a few examples taken from projects listed at or near the top of trending erlang projects. This is not exhaustive, and it doesn’t represent all erlang code (plenty of erlang code uses exceptions to side-step the problem), but it doesn’t take much hunting to find code like this:

disco_web.erl

validate_payload is a very simple example. One could argue that it could be fixed just by renaming it to run_if_payload_valid, but it’s still coupling, even if very vanilla coupling. There is no good way to reduce this coupling without making the call sites more complex (or using macros).

ejabberd_auth_sql.erl

This is an example with all the logic in a single function. It doesn’t have the coupling problem, but it is getting hard to read, and it is not doing nearly as much as business logic often has to do. Note how it’s already hard to see what the “false” on line 100 refers to.

At line 119 there’s a variation with almost exactly the same logic except for the innermost guts. While there isn’t a coupling problem here, avoiding one has caused extensive duplication of code, which is also problematic.

boss_web_controller.erl

Here execute_action_check_for_circular_redirect is being passed arguments that it doesn’t care about. It isn’t even actually doing a “check”; the check is forced onto the caller, and the function just pattern matches on the true or false result of the check. The check is strongly coupled to execute_action_inner, so there’s no easy way to test the circular-redirect logic in isolation.

What does this mean?

Does this mean I should not use erlang?

Not at all! It just has weaknesses; every language has weaknesses. This particular weakness should be kept in mind when switching between languages, to avoid carrying unfortunate habits over unnecessarily. Libraries like erlando, or Monad for elixir, can help avoid the problem. Unfortunately, given the dynamic type system and the ad-hoc use of error return values in existing libraries, monadic frameworks will never be as pleasant to work with as the equivalent types in haskell or scala, and such libraries may not be allowed in projects that aim to avoid dependencies.
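
To give a flavour of what such a library buys you, here is a minimal hand-rolled sketch (not erlando’s actual API; the login pipeline names below are hypothetical): a pipe/2 function that threads an {ok, _} value through a list of funs and stops at the first {error, _}:

pipe(Value, []) ->
  {ok, Value};
pipe(Value, [F | Rest]) ->
  %% Run the next step; keep going on {ok, _}, stop on {error, _}.
  case F(Value) of
    {ok, Next}      -> pipe(Next, Rest);
    {error, Reason} -> {error, Reason}
  end.

%% Hypothetical usage for the session-service example:
%% login(DbConn, Req) ->
%%   pipe(Req, [fun get_body/1,
%%              fun decode_login_request/1,
%%              fun(LoginReq) -> find_and_check(DbConn, LoginReq) end,
%%              fun generate_token/1]).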

Exceptions can also be used, with the downsides that they bring.

While I personally would not rule out using erlang, I also wouldn’t recommend it as a first language for learning functional programming, as I fear it encourages the formation of suboptimal coding habits.

Strengthened thesis

While code written in this style in erlang is technically still made up of functions (mappings to a single output for any given input), I feel we’ve lost the essence of functional programming: writing equations that individually represent truths and composing them into larger and larger pieces until they form programs. Instead we’re telling the computer to do A, then do B, then do C: imperative programming.

References

  1.  wikipedia: loose coupling
  2. “functional design.” A Dictionary of Computing. Encyclopedia.com. 11 Oct. 2016.

sse vector programming with gcc

2012/01/06

GCC supports vector programming against the SSE ALU, but the documentation was scattered and a little tedious to piece together. This will hopefully get you up to speed more quickly.

First off, you need to make a typedef to represent the vector you’d like to operate on:

typedef double v2df __attribute__ ((vector_size (16)));

The above is saying that we want a 16 byte vector of doubles, meaning we’ll have a vector of 2 doubles, the most that pre-AVX supports in the 128 bit xmm registers.

The Variable Attributes documentation on vector_size provides useful clues as to usage:

This attribute is only applicable to integral and float scalars, although arrays, pointers, and function return values are allowed in conjunction with this construct.

Aggregates with this attribute are invalid, even if they are of the same size as a corresponding scalar. For example, the declaration:

          struct S { int a; };
          struct S  __attribute__ ((vector_size (16))) foo;

is invalid even if the size of the structure is the same as the size of the int.

What it doesn’t mention is that the newly created type will be aligned, in the case of our example above, on a 16-byte boundary. That means if you use it in a struct you’ll need to treat it differently than an array of doubles to avoid extra padding:

struct state {
        uint64_t        id;
        v2df            pair;
};

such as the 8 bytes of padding between id and pair above (and a warning if you compile with -Wpadded, which is highly suggested).
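
You can see the padding directly with offsetof. A small check program (the numbers in the comments assume the 16-byte alignment described above):

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

typedef double v2df __attribute__ ((vector_size (16)));

struct state {
        uint64_t        id;
        v2df            pair;
};

int main(void)
{
        /* pair is 16-byte aligned, so 8 bytes of padding follow id */
        printf("offsetof pair: %zu\n", offsetof(struct state, pair)); /* 16 */
        printf("sizeof state:  %zu\n", sizeof(struct state));         /* 32 */
        return 0;
}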

GCC provides some nice syntax for working with vectors. For starters, you can use initializer and array-subscript syntax (at least with GCC 4.6, possibly earlier):

v2df n = {3.1415926, 2.71828183};
printf("e: %.4f\n", n[1]);

You can also use the familiar scalar operators for addition, multiplication, bit-shifting, comparison, etc.
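
For example, using the v2df typedef from above (a minimal sketch; the values are arbitrary):

#include <stdio.h>

typedef double v2df __attribute__ ((vector_size (16)));

void example(void)
{
        v2df a = {1.0, 2.0};
        v2df b = {3.0, 4.0};

        v2df sum  = a + b;  /* element-wise addition:       {4.0, 6.0} */
        v2df prod = a * b;  /* element-wise multiplication: {3.0, 8.0} */

        printf("sum:  {%.1f, %.1f}\n", sum[0], sum[1]);
        printf("prod: {%.1f, %.1f}\n", prod[0], prod[1]);
}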

GCC 4.7 looks to be allowing scalars to be used directly:

v2df n = {...};
v2df m = n * 2.5;

Until then you have to build a vector out of the scalar yourself:

v2df n = {...};
v2df f = {2.5, 2.5};
v2df m = n * f;

SSE has a number of richer operations that you can’t access directly through syntax. Instead you can use them as x86 built-in functions. For instance, the following takes the pairwise maximum of two vectors:

v2df c = __builtin_ia32_maxpd(a, b);

There’s no documentation for these, so I’d use an opcode reference to find what you want and how it works and then use the corresponding built-in. There are some builtins that will load and store individual values into and out of the vector. I wouldn’t advise using these as GCC will do things like copy your value into a register, then onto the stack and then call the operator on the stack value. If you can use the GCC syntax to get what you need, you’ll be better off using that.

You can read more about the syntax in the GCC vector programming documentation.

Finally, you may need to conditionally use built-ins. For instance, roundpd was only introduced in SSE4.1; if you’re compiling for a target that only has up to SSE3, GCC will give you an undefined function error. However, GCC provides macros such as __SSE4_1__ that you can use to #ifdef around any such cases.
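
A sketch of that pattern for roundpd (here __builtin_ia32_roundpd is the builtin behind SSE4.1’s roundpd, and the 0x01 immediate selects round-toward-negative-infinity, i.e. floor):

#include <math.h>

typedef double v2df __attribute__ ((vector_size (16)));

/* Round each element down to the nearest integer.  Uses the SSE4.1
   roundpd instruction when the target has it, otherwise falls back
   to scalar floor(). */
static v2df floor_pd(v2df x)
{
#ifdef __SSE4_1__
        return __builtin_ia32_roundpd(x, 0x01);
#else
        v2df r;
        r[0] = floor(x[0]);
        r[1] = floor(x[1]);
        return r;
#endif
}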

sous vide without an appliance

2011/12/04

I’ve been yearning to try sous vide cooking for a few years but haven’t liked any of the options. This one ($50 DIY) is perfect: low cost of entry, it uses a stock pot so it doesn’t add yet another appliance to cramp the kitchen, and it has a minimal part count.

I spent a little more and got a waterproof PT-100 temperature probe, which has the added advantage of higher temperature-reading resolution. I also used the auto-tune feature to set up the PID controller. The manual wasn’t very clear on how the auto-tune works, but a forum thread cleared it up.

After a lot of debate I decided not to get a vacuum sealer. For now I’m using ziploc freezer/microwavable bags. By submerging them in the water just below the top I can get out enough air to allow good thermal transfer.

I used a rib eye for the maiden run, with promising results. The texture was excellent but I’d been too conservative with the spices, having read that with everything locked in a bag the flavors come out more intense. I also missed the nice smoky flavor from the grill; next time I’m using it to sear rather than a pan.

If I can get used to my kitchen sounding like a turn signal at an aquarium, I think I’ll be getting a lot of use out of this. No sooner had the steak been scarfed than spare ribs were in for a 24-hour slow cook.

quick stab at monads

2011/02/04

This is very high level, and not generic enough, but let’s say I’m writing a parser for some binary data in Erlang. Assume it doesn’t have any binary/bit syntax. We want to parse something like this c struct:

struct quote {
       double price;
       uint32_t  size;
       enum side side;
       int64_t tv_sec;
       uint64_t tv_nsec;
};

In a language where we can’t mutate anything, our parser would have to look something like this (a lot of erlang code looks like this):

   {Price, Data1}   = parse_double(Data0),
   {Size, Data2}    = parse_uint32_t(Data1),
   {Side, Data3}    = parse_enum_side(Data2),
   {Tv_sec, Data4}  = parse_int64_t(Data3),
   {Tv_nsec, Data5} = parse_uint64_t(Data4)

Monads allow you to abstract out the Data part. It’s still there and passed around, but it’s abstracted away so you don’t have to explicitly deal with it. So our code can effectively look like:

    Price = parse_double()
    Size = parse_uint32_t()
    Side = parse_enum_side()
    Tv_sec = parse_int64_t()
    Tv_nsec = parse_uint64_t()

And you’ll have to trust me on this, but the abstraction works with normal programming constructs, so we can conditionally call a parse_blah() function, or we can write our own higher-level parse functions that work exactly like those shown above, all without ever having to touch the Data manually.
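
To make that slightly more concrete, here is one way such an abstraction can be hand-rolled in erlang (a sketch; the parse_double/1 and parse_uint32_t/1 functions are the hypothetical parsers from above, each returning {Value, RestOfData}):

%% A parser is a fun(Data) -> {Value, RestOfData}.
%% bind/2 runs a parser, hands its value to Next (which returns the
%% following parser), and threads the leftover Data automatically.
bind(Parser, Next) ->
  fun(Data0) ->
    {Value, Data1} = Parser(Data0),
    (Next(Value))(Data1)
  end.

%% return/1 wraps a plain value as a parser that consumes nothing.
return(Value) ->
  fun(Data) -> {Value, Data} end.

%% Building a quote parser without ever naming the intermediate Data:
parse_quote() ->
  bind(fun parse_double/1,   fun(Price) ->
  bind(fun parse_uint32_t/1, fun(Size)  ->
  return({quote, Price, Size}) end) end).

Applying (parse_quote())(Data0) yields {{quote, Price, Size}, Rest}; haskell’s do-notation is essentially sugar for exactly this chain of bind calls.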

Now all we’ve done is abstract away that “state” (Data); this is still completely pure code that isn’t allowed to do any IO or mutate anything. This example isn’t hypothetical: haskell has a binary parser that works exactly this way (my mongodb driver uses it to parse bson).

The trick for doing IO with monads is that we can do something similar, except that instead of Data, which in the parser case probably contained the string and maybe an index of where the parser left off, the Data represents the state of the world. Now we’re passing the state of the world around between functions, but it’s abstracted away so you don’t have to explicitly deal with it. The functions remain perfectly pure; the magic lies only in the state-of-the-world variable that, while abstracted, you are passing around.

In a sense this is one step further than erlang. Since erlang (basically) has no global state, no module state, and no function-static state (the only state you have is the variables you pass to a function), you pretty much have to pass around _everything_ you need so you have access to it. For example, if you need a file handle deep down in some function (e.g. the logger), you’ll have to pass that open file handle down through each function call so you have it available to call the logger with. The exception is that you can create file descriptors out of thin air at any time, and you don’t have to pass around stdin, stdout, and friends.

Haskell takes it one step further and says you can’t do that (well, there are internal functions that break this rule for efficiency’s sake, but they are rarely used). Haskell says that if you want stdin or stdout, or want the state of the world so that you can open a brand new file descriptor, you have to pass those variables down to where they will be needed. But having created that restriction, it turns around and provides abstractions that maintain these properties while abstracting away the messy manual plumbing for you.

Monads in themselves are really simple. I think the hard part isn’t the monads themselves but rather the type notation and haskell’s function syntax and implicit currying. It takes a while to internalize these, and it’s really hard to read monad implementations until you have a solid grasp of them.

Extensible parsing via types

2009/05/17

The fact that c’s type system is so sloppy has been greatly annoying me lately. Sure, you can give gcc flags such as -Wsign-conversion, but if I create a typedef of int32_t named apples and another called oranges, I’ll be at liberty to apply any math operator to them, all while gcc keeps a straight face.

On the other side of the spectrum is caml, where you have to use different syntax for addition depending on whether you are adding two integers or two floats.

While in reality I much prefer a good pie to most of the cakes you may offer, if cake is what’s on the menu, I’d like to go ahead and eat it.

Haskell uses a concept of type classes to allow the same addition operator to be used for multiple different types, but in a type safe fashion. Gulp.

Lisp’s parser is extensible through what it calls reader macros; basically you can specify your own parser which will be called whenever your syntax is spotted. Using this you can define new sub-languages with completely different syntaxes.

This got me thinking: what interesting things could we do if Lisp had a static type system? One idea is type-based parsing: when you define a typeclass, you can specify a parser to be used for it. For example, if I create a price typeclass, I might drop in a parser that lets me write “$123,456.78”. That’s the sugar; the substance is that the language could avoid potholes other languages have, such as “123UL” in c, or the auto-typecasting and auto-promotion used widely.

type unification

2009/02/22

Several times I’ve tried poking into what exactly type inference is and how it works, even working through several chapters of the math in the brick book. It was still entirely unclear what type inference was.

After poking around some more I finally picked up enough hints to get it. It’s incredibly simple if you explain how it works before you go nuts with the math behind it: unification. Yes, that’s the same unification as seen in languages like prolog.

It’s obvious. It makes perfect sense.

Playing around with the Real World Haskell book, I got an extra confirmation:


[1]/tmp$ cat > occ.hs   
occ xs = occ (head xs)                                                           
[0]/tmp$ ghci
GHCi, version 6.8.2: http://www.haskell.org/ghc/  :? for help
Loading package base ... linking ... done.
Prelude> :l occ
[1 of 1] Compiling Main             ( occ.hs, interpreted )

occ.hs:1:9:
    Occurs check: cannot construct the infinite type: a = [a]
      Expected type: a -> t
      Inferred type: [a] -> t
    In the expression: occ (head xs)
    In the definition of `occ': occ xs = occ (head xs)
Failed, modules loaded: none.

Yep, you can get the occurs check from the type system.

Hindley and Milner are probably great chaps, but it would save a lot of confusion if we just called it type unification already.