Pellucid

Software Transactional Memory

Written by Dustin Whitney on April 1, 2014

Lightning Talk on Software Transactional Memory in Scala given by Dustin Whitney at Scala Sunday. The Actors Pattern has gotten a lot of attention in the Scala ecosphere, but STM is often a good first solution for solving concurrency problems. It’s odd that it hasn’t had as much attention in the Scala world, so this talk aims to show how easy it is to use, and compare it to typical lock-based synchronization by showing how easy it is to make errors with lock-based synchronization.

Presentation notes

Hello, I’m Dustin Whitney. I’m one of the organizers of the New York Scala Meetup, and I’m the CTO of Pellucid Analytics. We’re hiring.

This talk is about Software Transactional Memory (STM). I feel that in the Scala world STM gets less attention than it deserves, which is odd given its popularity in other programming languages, and I honestly think it’s a great first choice when attacking concurrency issues with Scala. So I felt it would be worthwhile to give a lightning talk about it, especially given how simple it is. In fact 30 minutes is probably too much time — let’s get started

What is STM? It’s an alternative to lock-based synchronization, which has been the standard for doing concurrency in Java since its inception.

What is STM?

Alternative to lock-based synchronization

Analogous to database transactions

ACI (ACID with out the ‘D’)

Simple! (well… simplier)

STM is analogous to database transactions in that when a transaction is started, the ACID properties are adhered to, with the exception of the ‘D’ in ACID. I’ll get back to what the ACID properties and how they are manifested in STM in later slides, but it suffices to say that this is a good thing. Also, STM is simple and easy to reason about. What does it look like in Scala?

What does it look like?

libraryDependencies += ("org.scala-stm" %% "scala-stm" % "0.7")

import scala.concurrent.stm._

val protectedInt = Ref(0)

atomic{ implicit transaction=>
  val currentValue = protectedInt.get
  protectedInt.set(currentValue + 1)
}

Eventually STM will be included in the Scala Standard Library, but for now you must include it as a dependency in your sbt build file.

This is the import statement, and after you’ve imported it, there are really only two things you need to know:

  1. There is a ‘Ref’, which protects a piece of shared memory. In this case it is protecting an Integer.
  2. To access the shared memory protected by the ‘Ref’, you must be inside of an atomic block, where a transaction is created. When you’re inside of the atomic block you can access the shared memory with the get method and modify the shared memory with the set method.

Actions taken within the atomic block are ‘atomic’, the ‘A’ from ACID.

Atomic

val protectedInt        = Ref(0)
val anotherProtectedInt = Ref(0)

atomic{ implicit transaction =>
  val currentValue = protectedInt.get
  protectedInt.set(currentValue + 1)

  val anotherCurrentValue = anotherProtectedInt.get
  anotherProtectedInt.set(anotherCurrentValue + 1)
}

Being atomic means that the actions inside of the block are ‘all or nothing’, meaning if any part of an atomic block fails in committing a successful transaction, then none of the values in a Ref will be updated. The example I have here shows two Refs. It’s possible during the execution of this block, one or both of the Refs could be updated by another thread before the block is done executing. Should this occur, no change to either Ref will occur, and the atomic block will then retry.

How is it determined that a success or failure has occurred? STM uses optimistic concurrency control. What that means is, from a high level, when a block is executed a transaction object is created with a unique number related to the state of the Ref. At the end of the block an attempt to commit is made, and if the number of the transaction does not match the current state of the Ref, the transaction is a failure, and the atomic block is retried. Retries will occur until either a success occurs or too many retries occurs. The ‘too many retries’ number is configurable.

Picking up from there, atomic blocks are consistent, the ‘C’ in ACID.

Consistent

val protectedInt = Ref(0)

atomic{ transaction =>
    val currentValue = protectedInt.get(transaction)
    protectedInt.set(currentValue + 1)(transaction)
}

Your transaction acts as a way to always retrieve the same state of the world throughout your transaction. In this example, I’ve removed the implicit keyword from the front of the transaction and passed it into the get and set methods manually to make explicit the fact that it is used to fetch a consistent view of the world. It would not be good if another thread were to update the ref and when fetching it you were fetching those changes, which leads to the next letter in ACID

Isolated

val protectedInt = Ref(0)

atomic{ implicit transaction =>
    val currentValue = protectedInt.get
    protectedInt.set(currentValue + 1)
}

Modifications to the Ref are isolated from the rest of the world until a successful commit is made. Also when a successful commit is made those changes will not be visible to transactions that began before the successful commit.

Durable

STM is not Durable

STM is not durable since it is protecting memory, which is inherently non-durable.

To make clear some of the advantages of STM over lock-based synchronization, I thought I’d highlight a few things.

Advantages: DeadLock / LIVELOCK

val protectedInt = Ref(0)

atomic{ implicit transaction =>
    val currentValue = protectedInt.get
    protectedInt.set(currentValue + 1)
}

STM does not suffer from deadlock or livelock, because there are no locks. To explain what deadlock is, imagine a process that much obtain two locks. One thread has one lock and another thread has the other, and both are waiting for the other to release the lock so they can continue processing. They will continue to wait until the program is terminated. With STM, you simply start a transaction, and go. If something goes wrong, the atomic block will execute again.

Another disadvantage of lock-based synchronization is priority inversion.

Advantages: Priority Inversion

val protectedInt = Ref(0)

atomic{ implicit transaction =>
    val currentValue = protectedInt.get
    protectedInt.set(currentValue + 1)
}

This happens when lots of low priority threads take ahold of a lock and keep a high priority thread blocking. Again, since there are no locks with STM, high priority threads can gain access to the shared resource at will. It should be noted that with STM there can be issues with competition over committing transactions, but resolving these issues are pretty easy since they are easy to track down and debug.

I think the biggest advantage STM has over lock-based synchronization is composability.

Advantages: Composability

val protectedInt        = Ref(0)
val anotherProtedtedInt = Ref(0)

atomic{ implicit transaction =>
  val currentValue = protectedInt.get
  protectedInt.set(currentValue + 1)

  atomic{ implicit transaction =>
    val anotherCurrentValue = anotherProtectedInt.get
    val anotherProctedInt.set(anotherCurrentValue + 1)
  }
}

Locks simply don’t compose. Locks are chosen more or less arbitrarily, and when you wish to have multiple parts of a program work together in a threadsafe manner where locks are involved, you must know what all of the locks are, and what their policies are, and it’s really tough to make happen. With STM, you can nest atomic blocks, and everything works out the way you’d expect.

Gotchas: Immutability!

// bad!
val badMap= Ref(new java.util.HashMap[String, Int]())
val mutableMap = atomic{ implicit transaction => badMap.get }
mutableMap.put(Wrong!, 666)

// good
val goodMap = Ref(Map(Good -> 7))
atomic{ implicit transaction =>
  val tempMap = goodMap.get
  goodMap.set(tempMap + (Good -> 777))
}

Gotchas: Referential Transparency

//bad
val protectedString = Ref(This is a string)
val time = System.currentTimeMillis
atomic{ implicit transaction =>
  if(time % 2 == 0) protectedString.set(Time was even)
  else protectedString.set(Time was odd)
}

//good
atomic{ implicit transaction =>
  val time = System.currentTimeMillis
  if(time % 2 == 0) protectedString.set(Time was even)
  else protectedString.set(Time was odd)
}
Extended example: STM
case class Account(number: Int, balance: Int)
  val accounts = Ref(Map(
    1 -> Account(1, 100),
    2 -> Account(2, 100)
  ))

  def transfer(to: Int, from: Int, amount: Int){
    atomic{ implicit transaction =>
      val map = accounts.get
      val toAccount = map(to)
      val fromAccount = map(from)
      accounts.set(
        map
        + (to -> (toAccount.copy(balance = toAccount.balance + amount)))
        + (from -> (fromAccount.copy(balance = fromAccount.balance - amount)))
      )
    }
  }

Extended example: Synchronized

import java.util._
private val accounts = new HashMap[Int, Account]()
accounts.put(1, Account(1, 100))
accounts.put(2, Account(2, 100))

def transfer(to: Int, from: Int, amount: Int){
  accounts.synchronized{
    val toAccount = accounts.get(to)
    val fromAccount = accounts.get(from)
    accounts.put(to, (toAccount.copy(balance = toAccount.balance + amount)))
    accounts.put(from, (fromAccount.copy(balance = fromAccount.balance - amount)))
  }
}

Extended example: Synchronized2

Can you find the errors?

import java.util.concurrent._
private val accounts = ConcurrentHashMap[Int, Account]()
accounts.put(1, Account(1, 100))
accounts.put(2, Account(2, 100))

def transfer(to: Int, from: Int, amount: Int){
  val toAccount = accounts.get(to)
  val fromAccount = accounts.get(from)
  toAccount.synchronized{
    fromAccount.synchronized{
      accounts.put(to, (toAccount.copy(balance = toAccount.balance + amount)))
      accounts.put(from, (fromAccount.copy(balance = fromAccount.balance - amount)))
    }
  }
}

Extended example: Synchronized3

Can you find the errors?

import java.util.concurrent._
private val accounts = ConcurrentHashMap[Int, Account]()
accounts.put(1, Account(1, 100))
accounts.put(2, Account(2, 100))

def transfer(to: Int, from: Int, amount: Int){
  val toAccount = accounts.get(to)
  val fromAccount = accounts.get(from)
  val (firstLock, secondLock) = if(to > from) (toAccount, fromAccount)
                                else (fromAccount, toAccount)
  firstLock.synchronized{
    secondLock.synchronized{
      accounts.put(to, (toAccount.copy(balance = toAccount.balance + amount)))
      accounts.put(from, (fromAccount.copy(balance = fromAccount.balance - amount)))
    }
  }
}