## Some thoughts on building software

Lately I have been busy reading some new books on [Domain Driven Design](https://www.oreilly.com/library/view/domain-driven-design-distilled/9780134434964/) (DDD) and [software architecture](https://www.thoughtworks.com/books/fundamentals-of-software-architecture) -- including [a short yet eye-opening one in Python](https://www.cosmicpython.com/) and [a great one in F#](https://fsharpforfunandprofit.com/books/). At the same time, it seems that more people in the Functional Programming world are looking at more formal approaches to modelling -- [some](https://www.hillelwayne.com/) [examples](https://www.youtube.com/watch?v=dtevi8lI4I8) [here](https://wickstrom.tech/writing.html). This has pulled some thoughts from the back of my mind about how we should model, organize, and architect software using the lessons we've learnt from functional programming.

Before moving on, let me be clear: this is just a dump of some thoughts, not always well-defined, and definitely not always right. My goal is to get feedback from our broad community, spawn discussion, and hopefully end up with a bit more knowledge than before. So feel free to comment below!

### DDD ❤️ FP

Whenever I read about DDD, I feel like it is a great match for functional programming. Ideas like [_bounded contexts_](https://martinfowler.com/bliki/BoundedContext.html) have a strong resemblance to [_domain specific languages_](https://en.wikipedia.org/wiki/Domain-specific_language), that is, the idea that instead of directly solving a problem, you should create a small language which makes it easier to talk about and specify your solution. Racketeers have even coined the term [_language-oriented programming_](https://www.youtube.com/watch?v=z8Pz4bJV3Tk)!

DDD also teaches us that we should distinguish between [_entities_ and _value objects_](https://enterprisecraftsmanship.com/posts/entity-vs-value-object-the-ultimate-list-of-differences/): the former have an inherent identity, whereas the latter do not. Value objects are immutable pieces of data, and we should consider any two value objects with the same data as equal. This sounds pretty much like the kind of data types we usually define in FP languages, right? In fact, I find it quite interesting that in OOP languages entities are often easier to define -- reference comparison comes for free, after all -- and one needs enough willpower to use value objects correctly; whereas in languages like Haskell value objects are the default, and entities are harder to implement. I would argue that the latter is the better default, since once databases enter the game entities need that special handling anyway.

At this point I always ask myself: OK, where do my awesome sum types enter the game? Since most books assume an OOP-like language, where they are not as directly available as in Haskell or OCaml, we have few examples of how to model using them. However, many value objects lend themselves to such a description; my favorite example is the set of events or commands that may arise in a React-like application, which are modelled as a big data type in [Elm](https://guide.elm-lang.org/architecture/).
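As a rough sketch of what such a command type might look like in Haskell -- the to-do application and all the constructor names are invented for illustration, in the spirit of Elm's `Msg` type:

```haskell
import Data.Text (Text)

-- Hypothetical identifier for a to-do item; a value object itself.
newtype ItemId = ItemId Int
  deriving (Eq, Show)

-- One constructor per command the UI may emit, Elm-style.
data Command
  = AddItem Text          -- create a new item with the given description
  | CompleteItem ItemId   -- mark an existing item as done
  | RemoveItem ItemId     -- delete an item
  | ClearCompleted        -- drop every completed item
  deriving (Eq, Show)
```

The nice property is that the whole vocabulary of the application lives in one closed type, so pattern matching forces you to handle every case.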
Right now, this is where I think we should stop: modelling entities as sums seems somehow wrong, even though I cannot really express why.

Another important lesson from DDD is that we should think about our integrity boundaries: [_aggregates_](https://martinfowler.com/bliki/DDD_Aggregate.html) force us to define which objects change together, while the _unit of work_ (UoW) pattern brings the idea of a transaction into our models. UoW seems to be a [contested topic](https://stackoverflow.com/questions/14696568/avoid-unit-of-work-pattern-in-domain-driven-design) in DDD, but the essence is that we should think hard about the guarantees we need at each moment, and about how to handle different consistency models when our systems become distributed.

Here is where I think that formal modelling could shed some light. The current state of affairs is that we develop models mostly on whiteboards, but never really explore or formalize them. Tools like [Alloy](https://alloytools.org/) are great for documenting those invariants and for figuring out possible scenarios we hadn't thought of. You might think "hey, are you proposing to go back to waterfall?" Not at all! The fact that the model is documented means that we can _update_ it whenever our understanding of the domain changes, and get clues about where our actual software needs to be updated. If your system works in a distributed fashion, [TLA+](https://lamport.azurewebsites.net/tla/tla.html) can help you detect possible race conditions, deadlocks, or violations of eventual consistency. These two tools are examples of _lightweight formal methods_, which do not require a big learning investment.

Up to now I've discussed how DDD and FP have many things in common. Something which I feel is unique to the (typed) FP community is the treatment of effects, that is, the idea that we should not only care about properties of _values_ but also of _computations_. The sharpest distinction can be found in Haskell, where pure and side-effectful values get completely different types (and even different syntax!), but even there we often talk about making a more fine-grained hierarchy of effects. How should we translate this idea to our modelling table? Or is this something which does not belong in the model at all?

### Modelling and coding

The role of models is to help us better understand the domains we are talking about. The end goal, however, is to produce a (working) software artifact. I firmly believe that you should choose a language which allows you to translate as many invariants as possible from your model into proper checks in your code. For this matter, our FP community has come up with several powerful techniques:

- I have already mentioned [Racket](https://racket-lang.org/) as part of the Lisp tradition of creating linguistic abstractions to develop software.
- Clojure bundles a [spec](https://clojure.org/guides/spec) module to specify the structure of data and its invariants.
- Strong static types, as found in [Haskell](https://www.haskell.org/). Those can be taken even further with [refinement types](https://ucsd-progsys.github.io/liquidhaskell-blog/).

By the way, it's quite amusing to see how both [Clojure](https://clojure.org/reference/refs) and [Haskell](http://hackage.haskell.org/package/stm) saw the importance of dealing with transactions at the code level, and have mature Software Transactional Memory libraries.
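To give a taste of the Haskell side, here is a minimal STM sketch; the two-account transfer scenario is made up for illustration, but `atomically`, `modifyTVar'`, and friends are the actual `stm` API:

```haskell
import Control.Concurrent.STM

-- Move an amount between two balances inside a single transaction:
-- either both updates happen, or neither does, even under concurrency.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  modifyTVar' from (subtract amount)
  modifyTVar' to   (+ amount)

main :: IO ()
main = do
  checking <- newTVarIO 100
  savings  <- newTVarIO 0
  atomically (transfer checking savings 40)
  balances <- (,) <$> readTVarIO checking <*> readTVarIO savings
  print balances  -- prints (60,40)
```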
Even though in some communities we stress the importance of abstractions like functors and monads, in the grand scheme of things those are just one particular way to ensure effect tracking and the integrity of our data. For example, my colleagues at [47 Degrees](https://www.47deg.com/) working on [Arrow](https://arrow-kt.io/) use a [completely different approach](https://github.com/arrow-kt/arrow-fx/pull/169) towards the same goal.

### Architecture

If you dive further into DDD, you will surely end up reading about hexagonal, onion, and ports-and-adapters architectures (spoiler: [they are all variations on the same theme](https://blog.ploeh.dk/2013/12/03/layers-onions-ports-adapters-its-all-the-same/)). On the other side we find the [functional core, imperative shell](https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell) (FCIS) architecture. If you are like me, you'll be very confused. This is another place where the language and techniques of the typed FP community can help us understand what is going on, by talking about [initial and final encodings](http://okmij.org/ftp/tagless-final/course/index.html).

Very condensed: we can represent both data and computations either as data types we construct and manipulate (_initial_ encoding):

```haskell
-- the shape of the data, as constructors we can build and pattern match on
data Tree a = Leaf a | Node (Tree a) (Tree a)

-- a stateful computation, reified as a data type describing each step
data StatefulComputation s a
  = Get (s -> StatefulComputation s a)
  | Set s (StatefulComputation s a)
  | Return a
```

or as a set of methods we can call (_final_ encoding):

```haskell
-- the same two "languages", this time as operations a carrier type must provide
class Tree t where
  leaf :: a -> t a
  node :: t a -> t a -> t a

class Monad m => Stateful s m where
  get :: m s
  set :: s -> m ()
```

It is quite common to use the initial style when thinking about _data_, and the final style when thinking about _computations_. Architecting a whole software system using the initial approach looks pretty much like the FCIS architecture. Stealing an example from [_Architecture Patterns with Python_](https://www.cosmicpython.com/), if we need to develop a system which performs some operations over our file system, we can divide the task into the logic which decides _what_ to do and the part which actually _performs_ the operations. To bridge those parts we have to reify the operations as data:

```haskell
data FSOperation = Move Path Path | Copy Path Path | NewFolder Path
```

This data type is an initial encoding in disguise. The second part, which performs the operations, can be thought of as an [_interpreter_](https://softwareengineering.stackexchange.com/questions/242795/what-is-the-free-monad-interpreter-pattern) for that data type.

Final encoding is related to [_dependency injection_](https://en.wikipedia.org/wiki/Dependency_injection) (DI). Passing a bunch of functions as arguments could be considered a very primitive form of DI; the type class mechanism alleviates the need for manual handling and makes the functionality available wherever it's required.

Correctly used, both techniques lead to good modularity -- swapping an implementation means writing another interpreter or another instance -- which in turn leads to good testability -- you can easily create fake handlers which check that the behavior is correct. Unfortunately, apart from the technical details, I am not aware of guidelines on when to use one approach over the other.

### Conclusion

We need to talk about how to build software using "native FP" approaches -- most books and models seem to follow the older OOP tradition. Let's do it!