Understanding Kotlin Coroutines via a Historical Review of Concurrency
The goal of this post is to help you understand Kotlin coroutines.
Unfortunately, to truly understand coroutines, you need to understand the problems they solve. And the problem coroutines solve was actually introduced by a concurrency solution we came up with to solve a different, earlier problem. And that problem was introduced by yet another concurrency solution we came up with to solve yet an earlier problem. And so on. This pattern forms a chain that goes back to the beginning of concurrency in computing. So to truly understand coroutines, you must understand all the prior attempts at solving concurrency.
The history of concurrency spans multiple programming languages, operating systems, and computing paradigms. So I’m going to be a bit handwavy at times because I don’t want the discussion to require the reader to know the specifics of all the relevant pieces of technology. The goal of this post is to get the concepts across. So some aspects may be simplified out, if they’re irrelevant to communicating the core concepts.
In particular, all provided code examples are in a pseudocode reminiscent of the appropriate programming environment. For example, when I’m talking about C, the code examples will be in pseudo-C. When I’m talking about Java, the code examples will be in pseudo-Java, and so on. Again, the goal is to get the general concepts across. These are not intended to be snippets you can copy and paste into your IDE and expect to compile.
The Call Stack
As part of the history of concurrency, we will talk about threads. And it’s challenging to understand threads unless you understand the call stack.
The call stack is a data structure that contains something called “stack frames”. Each stack frame contains any variables local to the currently executing function, and a pointer to the code to execute after the function returns. Informally, you can think of each stack frame as representing one function call. Unless you’re using a low-level language like assembly, the call stack is managed automatically by the programming environment. For example, you typically do not directly manipulate the call stack in C, Java, nor Kotlin.
Whenever you invoke a function, immediately before jumping to the new function, the environment pushes all the local variables and the current instruction pointer address onto the call stack. We then jump to the newly-invoked function. This newly-invoked function has a fresh stack frame to work with, allowing it to use its own local variables and invoke other functions in turn. Once our function is done and ready to return, its stack frame is popped off the call stack. Then the local variables are restored, and the program counter is adjusted to proceed back to the caller.
void function_a() {
    x = 3;
    /*
    When the function on the following line is called, the programming environment stores the value of the local variable `x` and the program counter. Storing the program counter allows execution to resume at the correct spot once function_b() finishes executing. All of this data is stored on the call stack.
    */
    function_b();
    // at this point, `x` is still 3 regardless of what function_b does
}
The call stack is independent of the “heap”, a different section of memory where objects live if they persist across function calls.
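To make the distinction concrete, here's a small, compilable Java sketch; the class and method names are mine, chosen just for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class StackVsHeap {
    static List<Integer> makeList() {
        int n = 3;                              // lives in makeList's stack frame
        List<Integer> list = new ArrayList<>(); // the list object itself lives on the heap
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list; // the stack frame (and `n`) disappears, but the heap object survives
    }

    public static void main(String[] args) {
        List<Integer> survived = makeList();
        System.out.println(survived); // prints [0, 1, 2]
    }
}
```

The local variable `n` is gone once `makeList` returns, but the `ArrayList` persists because it was allocated on the heap and is still reachable from the caller.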
No Concurrency/Batch Jobs
In the beginning, there was no concurrency. Instead, you had “batch jobs”, where a “job” is a program you wrote that does some specific thing. You’d write out your program on punch cards, and then physically take it over to the room where the computer is, and add it to the pile of existing programs that everyone else has submitted ahead of you. You’d then go out and have dinner, and come back a couple of hours later, or perhaps the next day, to find out if your program ran correctly (and what the output was), or if there was a bug, in which case you had to fix your program and try again. The computer very literally ran one program at a time with no interruption, and you waited in line for your turn.
Multiple Processes
Eventually, operating systems added the concept of “processes”. Users could now run multiple programs at the same time [1], with each program running in its own process. However, each process had its own address space, so it was impossible to directly share memory values between two processes.
From the developer’s perspective, the primary tool for concurrency was “forking”. In your source code, you would invoke a special OS-provided API, fork(), which would create an exact clone of your program, copying the call stack and the heap into a new, isolated address space [2]. Unfortunately, the cloned process still couldn’t directly communicate with the parent process, but it “remembered” everything the parent process did, because all of that data was copied over to the new process.
string text = "Hello world!";
bool isClone = fork();
if (isClone) {
    println("I'm the clone AKA child process");
    println("And I remember this data: " + text);
} else {
    println("I'm the parent process.");
    println("And I still have access to this data: " + text);
}
Forking was the main mechanism for concurrency used in early C programs.
To get these two processes to communicate, you’d arrange for some rendezvous point ahead of time. For example, your program might create a temporary file [3] with a random name like /tmp/eb536a7d and then fork. Now both the parent and the clone know the name of that file, so they can communicate by reading and writing to that same file.
string path = Files.createNewTempFile();
bool isClone = fork();
if (isClone) {
    int x = 1 + 1; // perform expensive computation in parallel
    NamedPipe pipe = Files.openNamedPipe(path);
    pipe.write(x); // send result to parent
} else {
    int y = 2 + 2; // perform expensive computation in parallel
    NamedPipe pipe = Files.openNamedPipe(path);
    int x = pipe.read(); // read result from child
    int total = x + y;
    println("Combined result: " + total);
}
Multiple Threads
The next innovation was “threads”. These were initially marketed as “lightweight processes”: Creating a new process consumed a lot of computing resources, and coordinating work between multiple processes was a hassle. In contrast, with threads, each thread had its own call stack, but they all shared the same heap space. That meant that threads could directly communicate by reading and writing to the same location in memory.
Threads were the primary mechanism for concurrency in early Java [4].
// shared memory
boolean thread1Done = false;
int thread1Result = 0;
boolean thread2Done = false;
int thread2Result = 0;

void main() {
    new Thread(() -> {
        int x = 1 + 1; // expensive computation in parallel
        thread1Result = x;
        thread1Done = true;
    }).start();
    new Thread(() -> {
        int y = 2 + 2; // expensive computation in parallel
        thread2Result = y;
        thread2Done = true;
    }).start();
    while (!(thread1Done && thread2Done)) {
        Thread.yield(); // busy-wait until both threads are done
    }
    int total = thread1Result + thread2Result;
    println("Combined result: " + total);
}
Developers soon discovered two problems with threading, however.
The first was that it was very easy to accidentally create race conditions: bugs that manifest depending on the random timing with which two or more threads run. For example, one thread might be performing calculations that depend on a value in memory while another thread was in the middle of writing to that location in memory, causing chaos. There were several tools for trying to manage the chaos, including locks, mutexes, semaphores, latches, and so on. Using too few synchronization primitives would lead to race conditions. Using too many would lead to deadlock, where the program freezes up and can no longer make progress. So concurrency was declared “very difficult”, and reserved for experts, or for the times when the performance penalty of avoiding it felt too dire.
// shared mutable state
int x = 0;
long y = 0;

void function() {
    /*
    Incrementing x like this is not thread-safe. The increment happens as two access operations: first you read x, and then you write x + 1. A possible race condition: you read x, compute x + 1, someone else writes a different value to x, and then you overwrite their value with yours, leading to corrupt data.
    */
    x++;
    /*
    Writing to y like this is not safe either: a long is written in two steps, the upper half and then the lower half. Some code could read the value in between the two write steps, leading to corrupt data.
    */
    y = 3_000_000_000;
}
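For completeness, here's a sketch of how you'd repair both bugs above in real Java, using the java.util.concurrent.atomic classes (one flavor of the synchronization tools mentioned earlier); the class name is mine:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class SafeCounter {
    static final AtomicInteger x = new AtomicInteger(0);
    static final AtomicLong y = new AtomicLong(0);

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    x.incrementAndGet();   // read-modify-write as one atomic step
                }
                y.set(3_000_000_000L);     // the 64-bit write happens atomically
            });
            threads[t].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(x.get()); // always 4000, never a lost update
    }
}
```

With plain `int x` and `x++`, this program could print anything up to 4000 depending on thread interleaving; the atomic version always prints 4000.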
The second problem was that threads could only scale to a few hundred or a few thousand. From a logical design perspective, you often want to assign one thread to one task. For example, if you were implementing a web service, you might want to assign one thread per incoming user request. If you went with this strategy, however, you could only handle up to a few hundred requests simultaneously. The constraint was primarily memory: each thread required a couple of megabytes for its call stack, so with a thousand threads you were already at a couple of gigabytes just for the call stacks, and you still needed room for the heap.
Various Multithreading Strategies
Many, many different strategies were explored in an attempt to solve the first problem: avoiding race conditions. We’ll do a very brief survey of them, without going too deep into any one of them.
One of the first realizations is that if you have immutable data, you won’t have issues with race conditions. If your data never changes, there’s no risk of seeing the data in different states depending on thread scheduling. However, many people found complete immutability to be too restrictive.
In response to this, various techniques were developed such as the Actor model and Software Transactional Memory.
In the Actor model, you define one or more Actors which “own” some mutable data. An Actor runs in a single thread, so race conditions are impossible within its data domain. If other threads want to interact with the data in some way, they post a message to the Actor’s “mailbox”, which the Actor checks and responds to at its own rate.
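Here's a minimal sketch of the Actor idea in plain Java, using a BlockingQueue as the mailbox. This is my own toy illustration, not any particular actor library:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CounterActor {
    // The actor's mailbox: other threads post messages here instead of
    // touching the actor's state directly.
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private int count = 0; // mutable state, owned by the actor thread only

    void start() {
        new Thread(() -> {
            try {
                while (true) {
                    String msg = mailbox.take(); // block until a message arrives
                    if (msg.equals("increment")) count++;
                    else if (msg.equals("print")) System.out.println("count = " + count);
                    else if (msg.equals("stop")) return;
                }
            } catch (InterruptedException ignored) {}
        }).start();
    }

    void send(String msg) { mailbox.add(msg); }

    public static void main(String[] args) {
        CounterActor actor = new CounterActor();
        actor.start();
        actor.send("increment"); // any thread may send; only the actor mutates count
        actor.send("increment");
        actor.send("print");     // prints "count = 2"
        actor.send("stop");
    }
}
```

Because `count` is only ever touched by the single actor thread, no locks are needed; the queue itself handles the cross-thread handoff.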
In Software Transactional Memory, you provide lambdas that know how to mutate data in a transaction, and the lambdas are run optimistically until they can commit successfully without conflict (ideally on the first run; if not, they are re-run with an updated view of the mutable data).
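Here's a toy sketch of that optimistic run-and-retry loop in Java, over a single AtomicReference cell. A real STM coordinates many variables per transaction; this only shows the "run optimistically, commit if no conflict, otherwise retry" shape:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

public class OptimisticUpdate {
    // Run the transaction against a snapshot; commit only if nobody else
    // committed in the meantime, otherwise retry with a fresh snapshot.
    static <T> void atomically(AtomicReference<T> ref, UnaryOperator<T> transaction) {
        while (true) {
            T before = ref.get();                 // take a snapshot
            T after = transaction.apply(before);  // run the lambda optimistically
            if (ref.compareAndSet(before, after)) return; // commit succeeded
            // conflict detected: loop and re-run the transaction
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicReference<Integer> balance = new AtomicReference<>(0);
        Thread a = new Thread(() -> atomically(balance, v -> v + 10));
        Thread b = new Thread(() -> atomically(balance, v -> v + 5));
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(balance.get()); // prints 15, no locks involved
    }
}
```

Note there are no locks anywhere, so deadlock is impossible; the cost is that a transaction may run more than once under contention, so it must be free of side effects.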
In both these cases, the insight they’re relying on is that you can’t have deadlocks if you don’t have any locks. So the problem is simply to devise a model for safely performing mutations. This is called structured concurrency, analogous to how structured programming — forgoing the flexibility of goto statements in exchange for if statements, for loops, and other more restrictive constructs — makes it easier to reason about the possible behaviors of your program.
While these techniques addressed the complexity of shared mutable data, they did nothing to address the limited scaling.
Node.js Style Event Loop and Callbacks
One of the reasons Node.js became very popular very quickly was that it introduced a concurrency model that avoided both problems with the threading model mentioned earlier. Node.js is single-threaded, so you generally do not have to worry about race conditions: whenever you are executing sequential lines of code (i.e. not calling into the library APIs), you don’t have to worry about another piece of code executing in between. [5]
// shared mutable state
var x;
var y;
var z;

function foo() {
    x = 1;
    y = 2;
    z = x + y;
    /*
    x is guaranteed to be equal to 1, y is guaranteed to be equal to 2, and z is guaranteed to be equal to 3. There is no way for any other piece of code to execute while this function executes.
    */
}
And whatever it was that Node.js was using, “that thing” was lightweight: the Node.js architecture can scale to handle millions of requests simultaneously.
The secret behind Node.js was that its entire library API was designed to be non-blocking [6]. “Blocking” here refers to anything that causes the CPU to have to wait for some other piece of hardware before proceeding. The two most common forms of blocking are waiting for the hard drive to read or write some information, and waiting for the network device to finish uploading or downloading some data. Together, these are referred to as “IO”, or Input/Output.
In the programming environments that were common before Node.js, you could not directly perform IO. Instead, you would invoke some library API, like the InputStream class in Java. In those pre-Node.js environments, those APIs were blocking: when you invoked such a method, a new stack frame was pushed onto the call stack, the code inside the method executed, and the thread was blocked until the IO operation completed. Perhaps a different thread would get scheduled in, but there was no way for the original thread to make progress.
String path = "/tmp/" + "temp.txt"; // compute the path to the file
try {
    Reader r = new FileReader(path);
    String contents = r.readLine();
    /*
    The thread is blocked until the call to `readLine` completes. That means all the resources associated with the thread, such as the call stack, are locked up, even though the CPU sits idle while waiting for the disk read to finish.
    */
    println("Contents: " + contents);
} catch (IOException e) {
    println("Error: " + e);
}
In the Node.js architecture, any API function that performs IO takes a callback lambda which specifies what to do once the IO operation finishes. When you invoke such a function, the environment starts the IO operation, and saves your lambda to be scheduled when the IO operation finishes. Then, it frees up that thread to be used by any other code that wants to run (for example, other lambdas you may have scheduled to run after their respective IOs have completed).
var path = "/tmp/" + "temp.txt"; // compute the path to the file
read(path, (err, contents) => {
    if (err) {
        console.log("Error: " + err);
        return;
    }
    console.log("Contents: " + contents);
});
/*
Invoking the `read` function above immediately returns. The read is scheduled in the background, and once the read is done, the lambda will be invoked. We can continue doing work here if we want, but in this example, we simply reach the end of our code, releasing the thread to allow other code to perform any work they need to do.
*/
While this solution did solve the two problems introduced by threads, it introduced a new problem of its own, called “Callback hell”. Node.js’s callbacks in theory could compose indefinitely, but in practice, each composition introduced an additional level of nesting which made the code harder to read.
Consider the problem of reading two paths from the file /tmp/temp.txt: the first path is the file whose content you would read, and the second path is the destination to copy that content to. Compare how you would go about implementing this with normal blocking IO…
try {
    Reader r = new FileReader("/tmp/temp.txt");
    String sourcePath = r.readLine();
    String targetPath = r.readLine();
    r = new FileReader(sourcePath);
    String contents = r.readLine();
    Writer w = new FileWriter(targetPath);
    w.writeLine(contents);
} catch (IOException e) {
    println("Error: " + e);
}
println("Copying done.");
… versus the equivalent code using non-blocking callbacks:
read("/tmp/temp.txt", (err, contents) => {
    if (err) {
        console.log("Error: " + err);
        return;
    }
    var [sourcePath, targetPath] = contents.split("\n");
    read(sourcePath, (err, contents) => {
        if (err) {
            console.log("Error: " + err);
            return;
        }
        write(targetPath, contents, (err) => {
            if (err) {
                console.log("Error: " + err);
                return;
            }
            console.log("Copying done.");
        });
    });
});
Furthermore, it’s difficult to use your traditional control structures, such as for-loops, across callback boundaries. Consider the following blocking code:
int numFiles;
try {
    Reader fr = new FileReader("/tmp/control.txt");
    numFiles = fr.readLine().parseToInt();
} catch (IOException e) {
    println("Failed to read control file: " + e);
    return;
}
StringBuilder totalContents = new StringBuilder();
for (int i = 0; i < numFiles; i++) {
    try {
        Reader r = new FileReader("/tmp/file" + i + ".txt");
        totalContents.append(r.readLine());
    } catch (IOException e) {
        println("Failed to read from file #" + i + ": " + e);
        println("Trying to read remaining files...");
    }
}
println("Contents from all files (that succeeded): " + totalContents.toString());
Since you don’t know how many read operations you will perform ahead of time, you can’t hardcode a fixed number of callbacks. Instead, you would probably need to design a lambda that passes itself as a callback depending on some condition:
read("/tmp/control.txt", (err, contents) => {
    if (err) {
        console.log("Failed to read control file: " + err);
        return;
    }
    var numFiles = contents.parseToInt();
    var i = 0;
    var totalContents = "";
    function functionA() {
        if (i < numFiles) {
            read("/tmp/file" + i + ".txt", functionB);
        } else {
            console.log("Contents from all files (that succeeded): " + totalContents);
        }
    }
    function functionB(err, contents) {
        if (err) {
            console.log("Failed to read from file #" + i + ": " + err);
            console.log("Trying to read remaining files...");
        } else {
            totalContents += contents;
        }
        i++;
        functionA();
    }
    functionA();
});
Promise Objects
“Promises” were the initial attempt to resolve the callback hell problem, by providing a user library that could be used in existing programming environments (i.e. they did not require any changes to the language itself). The idea was that rather than design all your APIs to take a callback lambda as a parameter, your APIs should return a Promise object. The caller would then register their callbacks by calling methods on the Promise object.
Recall our simple example of reading a single file, in the callback style:
var path = "/tmp/" + "temp.txt"; // compute the path to the file
read(path, (err, contents) => {
    if (err) {
        console.log("Error: " + err);
        return;
    }
    console.log("Contents: " + contents);
});
Using a promise-style API, the solution would look something like this:
var path = "/tmp/" + "temp.txt"; // compute the path to the file
var promise = read(path);
promise
    .then((contents) => {
        console.log("Contents: " + contents);
    }).catch((err) => {
        console.log("Error: " + err);
    });
The Promise API allowed you to chain then calls as siblings of each other, rather than nesting them. It also allowed you to put a single catch call at the end, mimicking the blocking style of wrapping a whole bunch of code in a try block with a single catch at the end.
Recall our callback example for reading a file to find out what two files to copy between:
read("/tmp/temp.txt", (err, contents) => {
    if (err) {
        console.log("Error: " + err);
        return;
    }
    var [sourcePath, targetPath] = contents.split("\n");
    read(sourcePath, (err, contents) => {
        if (err) {
            console.log("Error: " + err);
            return;
        }
        write(targetPath, contents, (err) => {
            if (err) {
                console.log("Error: " + err);
                return;
            }
            console.log("Copying done.");
        });
    });
});
Here’s what the promise version would look like:
read("/tmp/temp.txt")
    .then((contents) => {
        var [sourcePath, targetPath] = contents.split("\n");
        return Promise.all([read(sourcePath), targetPath]);
    }).then(([sourceContents, targetPath]) => {
        return write(targetPath, sourceContents);
    }).then(() => {
        console.log("Copying done.");
    }).catch((err) => {
        console.log("Error: " + err);
    });
Using control flow was also a little easier with promises, although you still could not cross lambda boundaries. Here’s our callback example for reading from some number of files, where the number is only known at runtime:
read("/tmp/control.txt", (err, contents) => {
    if (err) {
        console.log("Failed to read control file: " + err);
        return;
    }
    var numFiles = contents.parseToInt();
    var i = 0;
    var totalContents = "";
    function functionA() {
        if (i < numFiles) {
            read("/tmp/file" + i + ".txt", functionB);
        } else {
            console.log("Contents from all files (that succeeded): " + totalContents);
        }
    }
    function functionB(err, contents) {
        if (err) {
            console.log("Failed to read from file #" + i + ": " + err);
            console.log("Trying to read remaining files...");
        } else {
            totalContents += contents;
        }
        i++;
        functionA();
    }
    functionA();
});
And here is the Promise version:
var promise = read("/tmp/control.txt")
    .then((contents) => {
        var numFiles = contents.parseToInt();
        var innerPromise = Promise.resolve(["", ""]);
        for (let i = 0; i < numFiles; i++) {
            innerPromise = innerPromise
                .then(([totalContents, newContent]) => {
                    return Promise.all([
                        totalContents + newContent,
                        read("/tmp/file" + i + ".txt")
                    ]);
                }).catch((err) => {
                    console.log("Failed to read from file #" + i + ": " + err);
                    console.log("Trying to read remaining files...");
                });
        }
        return innerPromise;
    }).then(([totalContents, newContent]) => {
        console.log("Contents from all files (that succeeded): " + totalContents + newContent);
    }).catch((err) => {
        console.log("Failed to read control file: " + err);
    });
It’s much more straightforward than the callback version, and we can leverage for-loops a little bit, but it’s still more complicated than the blocking version, because there are still some boundaries that control-flow statements can’t act across.
Async/Await
Promises were a promising solution (no pun intended), but faced some limitations with what could be done purely at the library level. So languages like C# and JavaScript introduced new keywords that act as syntactic sugar for Promises, except that because there were no explicit lambdas created, there were no boundaries restricting control flow.
Consider our simple Promise example that reads from a single file:
var path = "/tmp/" + "temp.txt"; // compute the path to the file
var promise = read(path);
promise
    .then((contents) => {
        console.log("Contents: " + contents);
    }).catch((err) => {
        console.log("Error: " + err);
    });
Using the new keywords we have available, it would look like this:
var path = "/tmp/" + "temp.txt"; // compute the path to the file
try {
    var contents = await read(path);
    console.log("Contents: " + contents);
} catch (err) {
    console.log("Error: " + err);
}
In this paradigm, we have two new keywords: async and await. The async keyword (not pictured above) is a modifier on functions, and basically tells the compiler to wrap the function’s return value in a Promise behind the scenes. So if you were to implement your own read function, you would probably annotate it with the async keyword.
Meanwhile, the await keyword basically means that the expression to its right is a Promise, and we’re going to wrap all the code that comes after it into an implicit lambda that gets put into the then callback of that Promise.
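Java's CompletableFuture plays the same role as a Promise, so the transformation await performs can be sketched by hand in Java: everything after the await becomes the then callback. The read function here is a stand-in I made up for illustration, not a real API:

```java
import java.util.concurrent.CompletableFuture;

public class AwaitDesugared {
    // Stand-in for a non-blocking read: completes on another thread.
    static CompletableFuture<String> read(String path) {
        return CompletableFuture.supplyAsync(() -> "contents of " + path);
    }

    public static void main(String[] args) {
        // Roughly what `var contents = await read(path); println(contents);`
        // desugars to: the code after the await becomes the thenAccept callback,
        // and the catch block becomes the exceptionally handler.
        read("/tmp/temp.txt")
            .thenAccept(contents -> System.out.println("Contents: " + contents))
            .exceptionally(err -> {
                System.out.println("Error: " + err);
                return null;
            })
            .join(); // only so this demo waits before the JVM exits
    }
}
```

The compiler does this rewriting for you; the await syntax just lets you write the callback chain as if it were straight-line code.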
We can skip the example where we read the source and target paths from one file and copy data between the two named files, because it’s just as straightforward as you would expect. Instead, let’s skip straight to the example where we don’t know ahead of time how many files we’re reading from:
var numFilesStr;
try {
    numFilesStr = await read("/tmp/control.txt");
} catch (err) {
    console.log("Failed to read control file: " + err);
    return;
}
var numFiles = numFilesStr.parseToInt();
var totalContents = "";
for (var i = 0; i < numFiles; i++) {
    try {
        totalContents += await read("/tmp/file" + i + ".txt");
    } catch (err) {
        console.log("Failed to read from file #" + i + ": " + err);
        console.log("Trying to read remaining files...");
    }
}
console.log("Contents from all files (that succeeded): " + totalContents);
Again, except for the occasional await keyword here and there, this looks almost identical to the blocking version. Its readability is nearly maximal.
There are, unfortunately, two problems with these solutions. One of the problems is newly introduced by the async/await paradigm, while the other one was silently lurking ever since we started going down the road of Node.js style callbacks.
The new problem is called the “Function Coloring” problem. The Function Coloring problem refers to the idea that functions are separated into different categories (or colors). By default, functions can only be called by other functions of the same category. So, for example, red functions can only be called by other red functions, blue functions can only be called by other blue functions, and so on. Uncolored functions can be called by any function.
“By default” here means that there’s often some back door or language construct that allows you to bridge the worlds of different colors. But this usually adds restrictions on how you can invoke functions of specific colors, which can add a bit of complexity to the design of your programs. For each additional color, the complexity goes up a little more. [7]
In the Async/Await paradigm, async functions can only be called by other async functions. In contrast, with other solutions such as Threads, you don’t have to worry about the introduction of any new “colors”.
The second problem, the one that’s been lurking in the background ever since we started with the Node.js paradigm, is that all of these solutions are single-threaded. This wasn’t a huge deal when most computers had a single core or a low core count, but nowadays, computers have sufficiently many cores that if you have a compute-intensive program that isn’t multithreaded, you’re not taking full advantage of the hardware available to you.
What we want is a solution that can take full advantage of the multiple cores of a CPU, while maintaining all the progress we’ve made with ease of development so far. And if we could solve the function coloring problem, that’d be great too.
Kotlin Coroutines (finally)
Kotlin Coroutines essentially take the Async/Await paradigm and add two pieces of API: Dispatchers and CoroutineScopes.
Dispatchers decide which thread a coroutine will run on. In a typical setup, there are two or three dispatchers: one dispatcher that owns a thread pool sized to the number of CPU cores, used for computation, and one dispatcher with a dynamically growing thread pool, used for blocking operations. The optional third dispatcher comes up in UI frameworks, where the “UI” thread (sometimes also called the “main” thread) is unique; that dispatcher always schedules the coroutines assigned to it onto the UI thread specifically.
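Conceptually, the typical two-dispatcher setup is just two thread pools with different sizing policies. Here's that idea in plain Java executor terms; the names are mine, not Kotlin's:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DispatchersSketch {
    // Analogous to Dispatchers.Default: one thread per CPU core, for computation.
    static final ExecutorService defaultPool =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // Analogous to Dispatchers.IO: grows on demand, for blocking operations.
    static final ExecutorService ioPool = Executors.newCachedThreadPool();

    public static void main(String[] args) {
        defaultPool.submit(() -> System.out.println("crunching numbers"));
        ioPool.submit(() -> System.out.println("waiting on a disk read"));
        defaultPool.shutdown(); // let submitted tasks finish, then stop
        ioPool.shutdown();
    }
}
```

The computation pool is fixed-size because adding more compute threads than cores doesn't help; the IO pool grows because threads blocked on IO aren't using a core anyway.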
CoroutineScopes are a mechanism to make sure that Coroutines stop running when they are no longer needed. In a typical GUI application, for example, you might use coroutines to perform some background computation whose results will then be displayed in a specific window on the screen. If the user closes that window, then we no longer need to perform the computation. Kotlin’s CoroutineScopes are used to model this: There would be a CoroutineScope associated with the GUI window; the coroutine would be launched within that scope; when the GUI window is closed, the CoroutineScope is also closed (or “canceled” in the Kotlin terminology); and any coroutines associated with that scope get canceled.
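The lifecycle idea can be sketched in plain Java: a "scope" object owns the tasks launched within it, and canceling the scope cancels them all. This is my own toy analogy, not the kotlinx.coroutines API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WindowScope {
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final List<Future<?>> tasks = new ArrayList<>();

    // Analogous to launching a coroutine in this scope.
    void launch(Runnable work) { tasks.add(pool.submit(work)); }

    // Called when the window closes: cancel everything tied to this scope.
    void cancel() {
        for (Future<?> t : tasks) t.cancel(true);
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        WindowScope scope = new WindowScope();
        scope.launch(() -> {
            try {
                Thread.sleep(60_000); // long background computation
            } catch (InterruptedException e) {
                return; // cancellation is observed here; stop working
            }
        });
        scope.cancel(); // user closed the window: work stops promptly
        System.out.println("scope canceled");
    }
}
```

The key property, in Kotlin as in this sketch, is that work is never orphaned: every task belongs to a scope, and tearing down the scope tears down its tasks.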
In terms of syntax, Kotlin coroutines drop the await keyword, making it implicit. Instead of putting the async modifier on functions, Kotlin has you use the suspend modifier. Otherwise, conceptually, this is basically identical to what you’re used to with the Async/Await paradigm.
For completeness, here’s the example where we read from a control file to determine what other files to read. It’s essentially identical to the Async/Await version, just without the await keyword (and with some Kotlin-specific syntax that is irrelevant to the topic of concurrency):
val numFilesStr =
    try {
        read("/tmp/control.txt")
    } catch (e: Exception) {
        println("Failed to read control file: " + e)
        return
    }
val numFiles = numFilesStr.toInt()
var totalContents = ""
repeat(numFiles) { i ->
    totalContents += try {
        read("/tmp/file$i.txt")
    } catch (e: Exception) {
        println("Failed to read from file #$i: " + e)
        println("Trying to read remaining files...")
        "" // evaluate to the empty string
    }
}
println("Contents from all files (that succeeded): $totalContents")
The above code example assumes that we’re already in some CoroutineScope (e.g. provided by the UI framework we’re using) and that the read function is non-blocking (a “suspend function”, in the Kotlin terminology). If the read function is blocking, then the thread that runs this coroutine will also block. If that thread is the UI thread, that can cause the UI to hang. In those cases, you may need to manually manage which thread your coroutines run on, using the Dispatchers mentioned earlier.
Here’s the same example, but now we’re going to start from a “normal function”. We use the runBlocking helper function to transition into a suspend function (this is our “backdoor” into the colored function space). Then, right before we do the read, we explicitly state that we want to switch from the default dispatcher to the IO dispatcher:
runBlocking {
    val numFilesStr =
        try {
            read("/tmp/control.txt")
        } catch (e: Exception) {
            println("Failed to read control file: " + e)
            return@runBlocking
        }
    val numFiles = numFilesStr.toInt()
    var totalContents = ""
    repeat(numFiles) { i ->
        totalContents +=
            withContext(Dispatchers.IO) {
                try {
                    read("/tmp/file$i.txt")
                } catch (e: Exception) {
                    println("Failed to read from file #$i: " + e)
                    println("Trying to read remaining files...")
                    "" // evaluate to the empty string
                }
            }
    }
    println("Contents from all files (that succeeded): $totalContents")
}
The Kotlin solution thus gives us the niceties of the Async/Await paradigm, while also giving us the flexibility of manually controlling thread scheduling to take full advantage of all the cores on modern CPUs. This is in contrast to solutions like JavaScript’s implementation of Async/Await, which gave us nice syntax, but restricted us to using only a single thread at a time [8].
This is actually very important on the JVM, because unlike Node.js, most of the IO APIs provided on the JVM are blocking, and thus we often need to switch to the IO dispatcher manually.
And unfortunately, Kotlin’s solution does not solve the function coloring problem.
The Future and Java 19’s Virtual Threads
Java 19 introduces “Virtual Threads”, another attempt at solving the concurrency problems we’ve seen so far. At the time of writing this post, Java 19 is still brand new, so we unfortunately haven’t had enough time to see how well it solves the problem in practice. Still, we can examine the feature and talk about how it might play out theoretically.
Recall that in plain old Java, the primary tool for concurrency was the thread: you would create a new instance of the Thread class, passing in a Runnable with the code you wanted to run on the thread, and then you would invoke start() on the Thread.
new Thread(() -> {
    System.out.println("I'm running in a new thread.");
}).start();
In Java 19, you use builder-style entry points to produce a thread factory. Depending on which entry point you start from, the factory produces traditional (“platform”) threads or virtual threads:

ThreadFactory traditional = Thread.ofPlatform().factory();
ThreadFactory virtual = Thread.ofVirtual().factory();

traditional.newThread(() -> {
    System.out.println("I'm running in a traditional thread.");
}).start();
virtual.newThread(() -> {
    System.out.println("I'm running in a virtual thread.");
}).start();
The intent here is that you would migrate all your existing thread code to use this builder pattern, and then at that point, it would be trivial to switch between the traditional threads and virtual threads.
Java 19 also made significant changes to its standard library, so that when you invoke IO operations from a virtual thread, that operation is non-blocking. These changes make virtual threads behave very similarly to the Async/Await paradigm with a Node.js style non-blocking standard library: you simply invoke your IO function, and your virtual thread gets descheduled to allow other virtual threads to run, up until the point where the IO operation completes, at which point your virtual thread gets scheduled to continue execution.
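As a tiny illustration (this requires a JDK where virtual threads are available; they were a preview feature in Java 19 and were finalized in Java 21):

```java
public class VirtualSleep {
    public static void main(String[] args) throws InterruptedException {
        // Thread.sleep from a virtual thread unmounts it from its carrier
        // thread instead of pinning an OS thread, much like awaiting a Promise,
        // yet the code reads like ordinary blocking code.
        Thread vt = Thread.ofVirtual().start(() -> {
            try {
                Thread.sleep(100); // looks blocking, but only suspends the virtual thread
                System.out.println("virtual thread resumed");
            } catch (InterruptedException ignored) {}
        });
        vt.join();
    }
}
```

Note there is no callback, no then chain, and no suspend keyword anywhere; the scheduling happens entirely inside the runtime.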
The big differences between Java 19’s solution and Node.js’s Async/Await and Kotlin’s coroutines are: no new keywords (no await, async, or suspend), and no function coloring.
If this actually pans out in practice, then this will be the holy grail of concurrency: you just write your code as usual (though you should still be careful with shared mutable state), and you magically get all the benefits of Node.js’s Async/Await style of programming.
The biggest mystery in my mind is whether, in real-world projects, we’ll see interoperability problems when you need to work with a combination of virtual threads and traditional threads.
[1] I won’t get deep into the difference between concurrency and parallelism here, as it’s not super relevant to understanding coroutines. Very briefly, we started with OSes that provided “time slicing” to give the illusion of parallelism. Then multicore processors became mainstream, and the typical user was able to get true parallelism. But from both the user’s perspective and the developer’s perspective, there was no significant difference between the “mere” concurrency provided by time slicing and the “true” parallelism provided by multiple cores.
[2] fork() would also return a different value (e.g. true versus false) for the parent versus the clone, so that you could check the return value to find out which one you were, and then behave differently depending on the answer.
[3] If your operating system supported named pipes, you would probably prefer to use that functionality rather than a plain old file. https://en.wikipedia.org/wiki/Named_pipe
[4] Threading in Java changed radically in Java 19, something I’ll get to towards the end of this post.
[5] Note that this doesn’t mean you don’t need to worry about shared mutable state at all. Even in single-threaded programs, it can be hard to reason about shared mutable state; that’s one of the reasons people advise you not to use global variables. But shared mutable state that’s only accessed from within a single thread is much easier to reason about than shared mutable state that’s accessed from multiple threads.
[6] “Non-Blocking IO” is sometimes also called “Async IO” or “Evented IO”.
[7] For example, Java has checked exceptions. By default, a method that throws IOException can only be called by other methods that also throw IOException (or some broader exception like Exception). The “backdoor” is the try-catch statement. It allows an “uncolored” method (one that doesn’t throw any exception) to invoke a method that does throw IOException. You could argue that in Java, every distinct exception type is its own distinct color.

In this context, function coloring is a feature, rather than a price to be paid: the whole point of checked exceptions is to ensure that the caller is aware of what exceptions the function they’re calling might throw.
[8] JavaScript provides no multithreading support at the language level. Many browsers supply a WebWorker API, which lets you perform multithreading at the library level, but this is outside the scope of this article. See https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers for more information.