Pellucid

Scala's Modular Roots

Written by Daniel James on August 15, 2014

Earlier this year Martin Odersky went on a whistle-stop tour of conferences and meetups with his latest talk, ‘Scala — The Simple Parts’. The first stop was the flatMap(Oslo) conference in Norway, and then one might have observed a blur resembling Martin across Europe and the US, with a grand finale at this years’s Scala Days.

Similar to his 2013 talk ‘Scala with Style’, which also concluded its tour at that year’s Scala Days, Odersky’s 2014 talk appeared as a mix of advocacy, evangelism, and an attempt by a language designer to make explicit his authorial intent—as much as that is possible. Due to the scheduling, ‘Scala — The Simple Parts’ had an air of a roadshow (he is a founder of Typesafe after all); due to the content, it also had an air of defensiveness. This latter aspect is, while understandable, rather unfortunate. I will not attempt to describe or diagnose the nature and intricacies of the Scala community, but I have the sense that Odersky’s attempt to highlight the essence of Scala–-the simple parts, as he sees them—was ultimately received with some skepticism. While Odersky seemed successful in drawing out seven core aspects of Scala, it remains a language that, as a whole, is powerful, expressive, unopinionated, and complicated. And what’s more, I also have the sense that many ‘Scala people’ are “ok with that”.

The purpose of this article, however, is not to critique Odersky’s talk and vision, but to explore just a single slide. Naturally, he refined his slide deck over the course of the tour, but one constant and core idea in his talk was Scala as a ‘modular language’. The original academic agenda for Scala was to create a hybrid language that combines object-oriented and functional programming. In his talk, Odersky laments that Scala is seen as precariously balancing between OOP and FP, and argues that rather than call Scala an ‘object/functional’ language—thus setting itself up to be caught in the crossfire of perpetual holy wars—we should instead call it a ‘modular’ language. With this context, the slide I’d like to discuss in depth is titled “Scala’s Modular Roots”. Here, Odersky points out the inspiration he drew from Modula-2, Modula-3, Haskell, and Standard ML; of particular interest is the equivalence he draws between Scala and Standard ML’s module system.

Object ≅ Structure
Class ≅ Functor
Trait ≅ Signature
Abstract Type ≅ Abstract Type
Refinement ≅ Sharing Constraint

Scala à la ML modules

For the rest of this article we’re going to walk through an example to aid our exploration of the above equivalence between Scala and Standard ML. I’m going to use a set data structure adapted from chapter 2.2 of Purely Functional Data Structures by Chris Okasaki. This is a handy source as all the implementations in this book are in Standard ML (and Haskell in the appendix), so we can compare our Scala translations against the original.

Before we proceed much further we should make sure we have an understanding of the ML module system.

An ML structure combines related types, values and other structures, with a uniform naming discipline. An ML signature specifies a class of structures by listing the name and type (or other attributes) of each component. […] ML also provides functors—structures taking other structures as parameters—[…]

L.C. Paulson, ML for the Working Programmer, 2nd Edition, (1996) pp.59

Signatures as Traits

We will come back to ML functors in due course, but let’s start with a signature. To implement a set, we’re going to need to be able to compare the elements, so we’ll demand that our elements are ordered.

signature ORDERED =
sig
  type T

  val compare : T × T -> int
end

ORDERED is our ML signature, which specifies a type T and a function named compare that takes a pair of Ts to an int. The contract is that the result of compare is negative if the first is less than the second, zero if equal, otherwise, positive. This is not the signature listed by Okasaki, but the signature here is closer to the Ordering trait in Scala standard library:

trait Ordering[T] {
  def compare(x: T, y: T): Int
}

This is the first case of Odersky’s equivalence: signatures and traits. There are some obvious differences. Leaving aside the fact that ML signatures are pure interface and Scala traits can contain implementation, the Ordering trait is a parameterized type—it has a type parameter T. In contrast, ORDERED declares a type T as part of the signature in the same manner as it declares the function compare. But, this is no concern, as in Scala we can trade positional type parameters for abstract type members, as follows:

trait Ordering {
  type T

  def compare(x: T, y: T): Int
}

Now we really have a exact correspondence. There is a larger discussion to be had for type parameters versus type members, but we’ll shelve it for now.

Structures as Objects

Now let’s see a structure. The following is a structure that implements the signature for integers. (There is an obvious overflow bug here, but we’ll gloss over that for the sake of brevity.)

structure IntOrdered : ORDERED =
struct
  type T = Int

  fun compare(x, y) = x - y
end

Odersky’s equivalence says we should translate an ML structure as a Scala object. We saw two traits above, so we’ll show both objects.

object IntOrdering extends Ordering[Int] {
  def compare(x: Int, y: Int): Int = x - y
}

object IntOrdering extends Ordering {
  type T = Int

  def compare(x: Int, y: Int): Int = x - y
}

The former is roughly what appears in the standard library. The latter clearly corresponds, modulo syntax, exactly to the ML structure.

Functors as Classes

Now that we have an abstraction for comparable elements in place, we can move on to implementing a set data structure. Let’s see an adaptation of Okasaki’s set in ML in its entirely first. First, the signature:

signature SET =
sig
  type Elem
  type Set

  val empty : Set
  val insert : Elem × Set -> Set
  val member : Elem × Set -> bool
end

This signature declares types for both the collection type and the element type, as well as functions to construct an empty set, insert a element, and test membership.

We’ve reached the next component of Odersky’s equivalence—Scala classes and ML functors—so we need a little more detail on what exactly an ML functor is.

If structure B depends on structure A, and we wish to replace A by another structure A', […] ML lets us declare B to take a structure as a parameter. We can then invoke B(A) and B(A'), possibly at the same time. A parametric structure, such as B, is called a functor. Functors let us treat A and A' as interchangeable parts.

L.C. Paulson, ML for the Working Programmer, 2nd Edition, (1996) pp.257

Okasaki’s implementation of a set is as a binary tree, and it is formulated as a functor so that it is parametric in the element and its ordering—the following is our adaptation.

functor UnbalancedSet(Element : ORDERED): SET =
struct
  type Elem = Element.T
  datatype Tree = E | T of Tree × Elem × Tree
  type Set = Tree

  val empty = E

  fun member (x, E) = false
    | member (x, T (a, y, b)) =
        if Element.compare (x, y) < 0 then member (x, a)
        else if Element.compare (x, y) > 0 then member (x, b)
        else true

  fun insert (x, E) = T (E, x, E)
    | insert (x, s as T (a, y, b)) =
        if Element.compare (x, y) < 0 then T (insert (x, a), y, b)
        else if Element.compare (x, y) > 0 then T (a, y, insert (x, b))
        else s
end

The UnbalancedSet functor take a structure that is an instance of the ORDERED signature, and produces a structure that is an instance of the SET signature. The Elem type of the set is implemented as the T of the ordering structure Element; the Set type is implemented as a basic binary tree and in terms of the Elem type. The rest of the SET signature is implemented as one would expect, making use of the compare function that comes from the Element parameter. To construct a concrete structure, we simply apply the functor to a chosen ordering. The following constructs a structure of integer trees using the ordering structure we defined for integers earlier.

structure UnbalancedIntSet = UnbalancedSet(IntOrdered)

Even if you are unfamiliar with Standard ML, I’m sure the above is clear and comprehensible.

Scala, the object/functional language

Let’s turn our attention back to Scala. While I’m cautious to claim that the following implementation of a set is ‘idiomatic’ Scala, it certainly hits all the facets of Scala that make it an ‘object/functional’ language. It also stakes out a point in the spectrum that is far away from the ML implementation above, which we will attempt to reconcile.

sealed abstract class Set[+A] {
  import Set._

  def insert[B >: A](x: B)(implicit o: Ordering[B]): Set[B] = this match {
    case Leaf => Branch(Leaf, x, Leaf)
    case b@Branch(l,y,r) =>
      if (o.compare(x, y) < 0)
        Branch(l.insert(x), y, r)
      else if (o.compare(x, y) > 0)
        Branch(l, y, r.insert(x))
      else b
  }

  def member[B >: A](x: B)(implicit o: Ordering[B]): Boolean = this match {
    case Leaf => false
    case Branch(l,y,r) =>
      if (o.compare(x, y) < 0)
        l.member(x)
      else if (o.compare(x, y) > 0)
        r.member(x)
      else
        true
  }

  def isEmpty: Boolean
}

object Set {
  def empty[A]: Set[A] = Leaf

  case object Leaf extends Set[Nothing] {
    val isEmpty = true
  }

  final case class Branch[+A](
      left:  Set[A],
      elem:  A,
      right: Set[A]) extends Set[A] {
    val isEmpty = false
  }
}

This implementation is quite different in many regards, and it also highlights many of key features of Scala, some of which made Odersky’s list of ‘simple parts’.

First off, it follows an object-oriented style of methods over functions. For example, set insertion will used as mySet.insert(x) rather than SetStruct.insert(x, mySet)—the set data structure is the receiver of the methods. Related to this is the use of a ‘companion object’, where we provided the static method empty to construct an empty set. In some sense, empty here is much closer to the ML implementation, than insert or member. I’ve chosen to nest Leaf and Branch inside the companion object, for no reason other than to highlight Scala’s flexibility in allowing one to nest most things in most other things.

The set is a parametric type, like our first declaration of the Ordering trait. Moreover, we are using declaration-site variance and we have declared our set to be covariant with the +. (As an aside, see this Stackoverflow question and Scala-user thread for a discussion on the invariance of Set in the Scala standard library.) This still requires use-site variance for the insert and member methods where the parametric type appears in contravariant position for the method parameters. Note that Leaf <: Set[Nothing], and because of Set’s covariance, Leaf <: Set[A] for all A.

In his talk, Odersky made a point of highlighting that ifthenelse is an expression in Scala, not a statement. Another expression is match, which mirrors ML’s pattern matching and function definition by cases. In ML the binary search tree is defined as an algebraic data type.

    datatype Tree = E | T of Tree × Elem × Tree

This is a sum-of-products, recursive representation of a tree. There are two cases E and T, the former being a nullary constructor and the latter a product of a left tree, an element, and a right tree. This is expressed in Scala using the case keywords,

case object Leaf extends Set[Nothing]

final case class Branch[+A](
    left:  Set[A],
    elem:  A,
    right: Set[A]) extends Set[A]

as well as the sealed keyword on the abstract Set class. The combination of these two aspects means that we can pattern match over a closed, finite set of cases (the consequence of a sealed type hierarchy) and as well as extract the contents of each case (using the unapply methods generated for us by case, cf. Chapter 15 of Programming in Scala, First Edition).

Scala being a hybrid object/functional language, there is often more than one way of doing things. What was just described is the functional approach—the approach taken by ML as well as Haskell and other related languages. The object-oriented approach, is demonstrated with the isEmpty method. The method signature is given in the supertype Set and left abstract; the two subtypes, Leaf and Branch, provide an implementation. The correct code path is determined by dynamic dispatch instead of by pattern matching. We could have implemented member and insert this way as well, and there would be no need for sealed or case—indeed, this would be the typical implementation strategy in Java (although the visitor pattern is also an option). In contrast to closed world of the functional approach, this is an open world, where we are free to expand the cases at some later point. For example, we could turn the binary tree into a 2-3 tree, by introducing

final case class Branch4[+A](
    left:   Set[A],
    elem1:  A,
    middle: Set[A],
    elem2:  A,
    right:  Set[A]) extends Set[A] {
  val isEmpty = false
}

Due to openness, we can provide the appropriate implementation for isEmpty here; due to closedness, we would have to go back and rewrite the implementations of member and insert, as there is a now new case. The flip side is that with the closed-world assumption, we can freely add new methods based on pattern matching without having to touch Leaf or Branch. This tradeoff is the essence of The Expression Problem. However, this is a much larger discussion, and we will leave this aside before we get side-tracked any further.

Direct ML style

Returning to the topic at hand, let’s now jump to the other end of the spectrum and attempt to replicate the ML set signature and the ML functor for sets implemented as unbalanced binary trees. The signature is straightforward:

trait SetSig {
  type Elem
  type Set

  def empty: Set
  def insert(e: Elem, s: Set): Set
  def member(e: Elem, s: Set): Boolean
}

There should be nothing surprising here: this is close to a transliteration of the ML signature. However, this becomes a tad tricky when we attempt to do the same for the functor. The following is a first stab, which doesn’t quite work out.

class UnbalancedSet(val Element: Ordering) extends SetSig {
  type Elem = Element.T

  sealed trait Set
  case object Leaf extends Set
  case class Branch(left: Set, elem: Elem, right: Set) extends Set

  val empty = Leaf

  def member(x: Elem, s: Set): Boolean = s match {
    case Leaf => false
    case Branch(l, y, r) =>
      if (Element.compare(x, y) < 0)
        member(x, l)
      else if (Element.compare(x, y) > 0)
        member(x, r)
      else
        true
  }

  def insert(x: Elem, s: Set): Set = s match {
    case Leaf => Branch(Leaf, x, Leaf)
    case Branch(l, y, r) =>
      if (Element.compare(x, y) < 0)
        Branch(insert(x, l), y, r)
      else if (Element.compare(x, y) > 0)
        Branch(l, y, insert(x, r))
      else
        s
  }
}

The primary constructor of the class takes a instance of Ordering (the one we declared earlier) in much the same way that the ML functor took a parameter. The first nuance is that the constructor parameter must be turned into a public member. If we were to instead only do,

class UnbalancedSet(Element: Ordering) extends SetSig {
  

we would get the following compilation error:

[error] private value Element escapes its defining scope as part of type UnbalancedSet.this.Element.T
[error]   type Elem = Element.T
[error]                       ^

The issue here is that we are refining the abstract type member Elem that was declared in the signature to be equal to the type member T of the value Element of type Ordering. Therefore, as Elem is public, Element must also be.

Now we encounter the next issue. While the code above will compile, it will not be very useful to us. Observe the following console session.

scala> object S extends UnbalancedSet(IntOrdering)
defined module S

scala> S.insert(1, S.empty)
 error: type mismatch;
 found   : Int(1)
 required: S.Element.T
              S.insert(1, S.empty)
                       ^

Unfortunately we can’t get very far. This is because the type checker is unable to prove that the type T of the Ordering value Element for this module S is in fact Int. This would require it to “track values in types”. To help us understand the workaround, let’s first expand the shorthand that is occurring in the primary constructor.

class UnbalancedSet(val Element: Ordering) extends SetSig { 

is equivalent to

class UnbalancedSet(o: Ordering) extends SetSig {
  val Element: Ordering = o
  

When object S extends this, the type of Element is still just Ordering, so it has no idea what the concrete type of Element.T is as Ordering#T is abstract. To help the type checker we’re going to need to do some type refinement. The first step is to leave Element abstract.

abstract class UnbalancedSet extends SetSig {
  val Element: Ordering
  

The second step is the refinement.

object S extends UnbalancedSet {
  val Element = IntOrdering
}

And now everything will work as expected. Why? Well, it may be instructive to consider the following. If we were to adjust the above to,

object S extends UnbalancedSet {
  val Element: Ordering = IntOrdering
}

we’d be back where we started—we’ve thrown away type information. If we specify all the types, we get:

object S extends UnbalancedSet {
  val Element: IntOrdering.type = IntOrdering
}

So, when the type checker tries to unify Int(1) with S.Element.T, it now knows that the type of the value Element is IntOrdering.type, and IntOrdering.T is Int, so Element.T is Int. Problem solved!

Mixed ML and Scala style

So, after a little type-gymnastics to ensure that abstract types are fully refined, we have successfully translated ML signatures, structures, and functors into equivalents in Scala. Often, the translation was simple enough to be more akin to a transliteration; however, the style of using our sets,

S.insert(1, S.empty)

is function-like, not method-like. To adhere to the object style of Scala, we would prefer to write something like,

S.empty.insert(1)

where the set data structure is the receiver for methods like insert and member (while empty still makes sense as a constructor function).

To set this up we need something akin to a nested signature:

trait SetModule {
  type Elem
  type Set <: SetSig

  trait SetSig {
    def insert(e: Elem): Set
    def member(e: Elem): Boolean
  }

  def empty: Set
}

The outer trait defines two abstract type members, Elem and Set, same as before, however, now Set is constrained by the inner trait SetSig, which declares the signature of set-like objects. Finally there is the empty constructor. Note that we could have done something like,

trait SetModule {
  type Elem

  trait SetSig {
    def insert(e: Elem): SetSig
    def member(e: Elem): Boolean
  }

  def empty: SetSig
}

however, the types here are slightly less precise. Before, we could say that whatever refines Set by implementing the SetSig signature, that’s the type that insert and empty will return. As an aside, the naming convention to draw the parallels with ML is breaking down here: I have chosen the suffix Module for the outer trait and Sig for the inner trait.

Finally, we can implement an abstract class that implements SetModule using the same unbalanced binary tree as before.

abstract class UnbalancedSet extends SetModule {
  val Element: Ordering
  type Elem = Element.T

  sealed abstract class Set extends SetSig {
    def member(x: Elem): Boolean = this match {
      case Leaf => false
      case Branch(l, y, r) =>
        if (Element.compare(x, y) < 0)
          l.member(x)
        else if (Element.compare(x, y) > 0)
          r.member(x)
        else
          true
    }

    def insert(x: Elem): Set = this match {
      case Leaf => Branch(Leaf, x, Leaf)
      case Branch(l, y, r) =>
        if (Element.compare(x, y) < 0)
          Branch(l.insert(x), y, r)
        else if (Element.compare(x, y) > 0)
          Branch(l, y, r.insert(x))
        else
          this
    }
  }
  case object Leaf extends Set
  case class Branch(left: Set, elem: Elem, right: Set) extends Set

  val empty = Leaf
}

This presents something that is definitely inspired by ML’s module system, but retains a bit more of Scala’s character.

Conclusions

I should make it clear that none of what has been presented here is novel: the techniques for expressing modular systems in Scala is clearly near and dear to Martin Odersky’s heart. In fact, the final formulation of SetModule follows the same formulation as a module for a graph data structure that Odersky shows towards the end of his ‘Simple Parts’ talk (and in more detail in a CACM article ‘Unifying functional and object-oriented programming with Scala’). Abstractly providing the element type of the set via the Ordering signature does go a little beyond what Odersky shows with his graph module; however, the issue of helping the type checker along with type refinement should be familiar to those who have written bigger macros, as that is exactly what is needed when passing the path-dependent macro context around components of a large macro.

Scala is clearly a powerful language: powerful enough that you can craft your own module system, and emulate a very well regarded one, such as ML’s, as we have seen here. What I have presented, I am not necessarily advocating or endorsing—although Odersky has used something similar to promote Scala, the ‘modular’ language—I am simply attempting to explore the design space and hope to encourage others to do the same. Another powerful language feature in Scala are implicits. There are plenty of use cases, but probably the best known is using them to emulate Haskell’s type classes. Projects such as Miles Sabin’s Shapeless are at the cutting edge of what is possible with Scala, and implicits are used there to great effect. Operations on Shapeless HLists take implicit parameters that amount to proof objects—proofs of properties such as the length of the list. This is commonly seen in fully dependently-typed languages such as Coq, Agda, and Idris; on the latter, one should watch Miles Sabin and Edwin Brady’s talk ‘Scala vs. Idris’. Implicits were barely used in this article, only when showing the initial, ‘idiomatic’ Scala implementation of a set; however, it seems reasonable to suspect that implicits could be used to greater effect when crafting a module system inspired by ML’s—a language feature that ML does not itself possess.

When first talking about the Ordering trait we briefly discussed the choice between type parameters and abstract type members, but set aside any detailed discussion. This was alluding to another talk by Oderky, ‘The Trouble With Types’ at Strange Loop ’13, where he contrasted parameters with abstract members, characterizing the former as positional and functional, and the latter as nominal and object-oriented (or modular). We have seen some of that contrast in this article. The larger news from the talk at Strange Loop was the future of Scala in DOT, a calculus for ‘Dependent Object Types’, which is proposed as a type-theoretic foundation for Scala. Should a future Scala have DOT as its core, type parameters would certainly remain, given their convenience, but they would be syntactic sugar on top of abstract type members. This is the modular language on the horizon.

Finally, I should highlight what has been left out here. As objects are values in Scala, that makes what we have show a first-class module system. While some members of the ML family have added extensions to cover this, an ML module is not first-class as standard. It would be interesting to discuss the consequences of this further. Also, we have left two aspects of Odersky’s equivalence between Scala and the ML module system, namely abstract types in an ML sense and sharing constraints. This will also need to be left for another time.

The following are some links for further reading.

Scala — The Simple Parts

Purely Functional Data Structures

Scala and Modules

Bio

Dan James is an API-Lead at Pellucid Analytics. He has a PhD in Computer Science from Oxford University. He’s a major contributor to our open source project Datomisca, a Scala library for Datomic, as well as our Framian and AWS-Wrap libraries. Rich Content