Designing APIs for success

Posted on Aug 30, 2022 by Andrew January

APIs should make doing the right things obvious, and the wrong things hard.

One of the API design principles we try to adhere to at Evertz is to help developers fall into the pit of success.

He admonished us to think about how we can build platforms that lead developers to write great, high performance code such that developers just fall into doing the “right thing”. Rico called this the Pit of Success. That concept really resonated with me. More generalized, it is the key point of good API design. We should build APIs that steer and point developers in the right direction.

The Pit of Success - Brad Abrams

When designed well, APIs should make doing the right things obvious, and the wrong things hard.

A rainbow emanating from the pit of a waterfall

It was with this in mind that we recently redesigned the API for one of the internal caches used in evertz.io Stream.

The previous API looked something like this:

interface CacheManager {
  void acquireReadLock();
  void releaseReadLock();

  void acquireWriteLock();
  void releaseWriteLock();

  // CacheItem CRUD methods
  CacheItem getItem(String id);
  void updateItem(CacheItem item);
  void deleteItem(String id);

  // CacheBlock CRUD methods
  CacheBlock getBlock(String id);
  void updateBlock(CacheBlock block);
  void deleteBlock(String id);
}

CacheItems can reference other CacheItems in the cache, and CacheBlocks reference CacheItems. We need to maintain consistency and prevent, for example, a CacheBlock referencing a deleted CacheItem. To achieve this, we use a mutex, specifically a ReadWriteLock. This ensures that only one thread is modifying the cache at any one time, and that no threads can read from the cache while it is being modified.

The old design mirrors that of the ReadWriteLock, providing pairs of acquire and release methods for the read and write locks. Unfortunately this has a few downsides:

  1. You have to remember to pair the acquire and release calls.
  2. You have to manually ensure the CRUD methods are only ever called in-between acquire and release calls.

It turns out in practice it is pretty easy to remember to pair up the acquire and release calls. You very quickly get used to seeing this pattern, and anything else immediately jumps out as strange:

cacheManager.acquireReadLock();
try {
  // Read from cache
} finally {
  cacheManager.releaseReadLock();
}

That said, it’s hardly nudging developers into the pit of success, is it?

The second problem is a little more pernicious. Calls to the CRUD methods can be buried deep in the call hierarchy. It isn’t always immediately obvious whether it is safe to call one of the methods on the CacheManager or not. And from the other side, it isn’t clear if you need to acquire a lock before calling a method. Does it need the read lock? Does it call a CRUD method? Or does one of the dozen methods it calls? Or one of the dozen they call? You have to meticulously document the lock requirements in the Javadocs.

A developer looking puzzled at some code on screen

A quick win would be to check inside the CRUD methods whether we are currently hold the appropriate lock. However, runtime checks like these are low down on the hierarchy of problem identification.

Compile time is better than startup time and test time, which are all better than runtime

It would be much better if the compiler could tell us that we don’t hold the correct lock. Enter the object capabilities model. Java doesn’t let us create a foolproof object capabilities model, but it does let us approximate one. It might not let us pick the developer up and throw them down the pit of success, but we can at least give them a good nudge.

The core idea is to move the CRUD methods onto a separate class, and only allow access to instances of that class by calling the acquire method. That way, in order to call a CRUD method, you have to call the acquire method.

interface CacheReader {
  CacheItem getItem(String id);
  CacheBlock getBlock(String id);
}

interface CacheWriter extends CacheReader {
  void updateItem(CacheItem item);
  void deleteItem(String id);

  void updateBlock(CacheBlock block);
  void deleteBlock(String id);
}

interface CacheManager {
  <T> T doInReadLock(Function<CacheReader, T> action);
  void doInWriteLock(Consumer<CacheWriter> action);
}

You can then replace the try/catch blocks with lambdas:

cacheManager.doInWriteLock(cacheWriter -> {
  CacheItem item = cacheWriter.getItem(id);
  if (isExpired(item)) {
    cacheWriter.deleteItem(item);
  }
});

And those deeply nested methods? Rather than having to try and use Javadocs you can accept a CacheReader/CacheWriter as an argument. Now the only way they can be called is if someone called doInReadLock/doInWriteLock and passed the instance down.

boolean isSequence(CacheReader cacheReader, String id) {
  CacheItem item = cacheReader.getItem(id);
  return item != null && item.getType() == CacheItemType.SEQUENCE;
}

There is a way that a particularly determined developer can break the model by leaking the reference:

CacheWriter leakedCacheWriter;
cacheManager.doInWriteLock(cacheWriter -> {
  leakedCacheWriter = cacheWriter;
});
// This happens outside of the write lock!
leakedCacheWriter.deleteItem(item);

Fortunately this anti-pattern isn’t particularly natural, and jumps out in code-reviews. That said, we can provide a little extra protection. We might not be able to do anything at compile time, but we can pick it up at runtime by adding a flag to the CacheReader/CacheWriter that we clear when we release the lock.

By putting a little bit of extra thought into the API design, we have been able to use the object capability model pattern to make it easy and natural to do the right thing: only call CRUD methods inside of a lock. These sort of ergonomic improvements not only reduce bugs, but also give the developer peace of mind that the obvious solution is the right solution.

Andrew January
Senior Principal Engineer