A Time Traveling Key-Value Store

2019-04-16

Never Give Up

Once upon a time I interviewed for an ML/DS/software engineering position. I didn’t get the gig (as the inimitable Time Hopper says, never give up), but came away with a fun programming exercise. In the room, I put together some reasonable Python to solve and test it, but I wanted to try my hand at using cats-effect to work through this in some functional Scala.

What is Your Problem and Why is it Interesting?

Your mission, should you choose to accept it, is to implement (and test) a data structure that looks a bit like a dictionary/map/hash/associative array/whatever your language calls it, but with a twist. In pseudo-Scala, your Time Traveling Key-Value Store (TTKV from now on) should have two methods that look a bit like the following (note that we change this a little in our implementation for functional programming goodness):

def put(key: K, value: V): Unit
def get(key: K, time: Option[Long] = None): Option[V]

Calling put should store the (key, value) pair in your TTKV. Calling get, however, has some interesting behavior. If you don’t supply an optional time, it returns the latest value you stored with the key. If you do supply a time, TTKV returns the value it had for key at that time. For instance, if you first call TTKV.put("a", 1) at time 3, then later at time 7 you call TTKV.put("a", 2), then

My Python in-the-room implementation relied on side-effects, reading a global nanosecond clock. This was less than ideal, and though it met the remit, it wasn’t easily testable—which I suspect was the point of the exercise.

By its nature, put involves an effect: reading a time. To do things functionally, I’ll turn to cats-effect and explicitly make note of this effectful computation. This involves a slight change to TTKV’s API; while get stays the same, we’ll change put to the following:

def put[F[_]](key: K, value: V)
             (implicit sync: Sync[F], clock: Clock[F]): F[TTKV[K, V]]

First, rather than mutating TTKV in place, put returns a new one with the (key, value) pair added for a specific time. Second, we’re explicitly noting that this work takes place inside a Clock effect, from which we grab the current time. I like how we can still get the same way as before, since querying a TTKV shouldn’t have to concern itself with effects, but this design specifically highlights that put is different.

Implementation

Here’s my implementation in its entirety (again, head to GitHub to see it in action):

package ttkv

import cats.effect.{Clock, Sync}
import cats.implicits._
import alleycats.std.iterable._
import scala.concurrent.duration.NANOSECONDS
import scala.collection.immutable.LongMap

case class TTKV[K, V](inner: Map[K, LongMap[V]]) extends AnyVal {
  def put[F[_]](key: K, value: V)
               (implicit sync: Sync[F], clock: Clock[F]): F[TTKV[K, V]] =
    clock.monotonic(NANOSECONDS).map { t =>
      val m = inner.getOrElse(key, LongMap.empty) + (t -> value)
      TTKV(inner + (key -> m))
    }

  def get(key: K, time: Option[Long] = None): Option[V] =
    for {
      m <- inner.get(key)
      keys = m.keys.filter(_ <= time.getOrElse(Long.MaxValue))
      t0 <- keys.maximumOption
      value <- m.get(t0)
    } yield value

  def times(key: K): List[Long] =
    inner.get(key).map(_.keys.toList.sorted).getOrElse(Nil)

  def times: List[Long] =
    inner.keys.toList.flatMap(times).sorted
}

object TTKV {
  def empty[K, V]: TTKV[K, V] = TTKV(Map.empty)
}

Will it Blend?

In the room, I wrote some simple unit tests for my Python implementation. For this functional version, I put together some property-based tests with ScalaCheck. You can find the entirety on GitHub; I’ll copy some highlights here.

First off, calling get on an empty TTKV should return None:

property("initially empty") = forAll {
  (a: Char) =>
    TTKV.empty.get(a) == None
}

Similarly, calling get immediately after a put should return the same thing you put:

property("single get") = forAll {
  (a: Char, x: Int) =>
    TTKV.empty.put(a, x).map(_.get(a) == Some(x)).unsafeRunSync
}

More importantly, calling get with any time prior to when you put a thing should return None:

property("get before time") = forAll {
  (a: Char, x: Int, t: Long) =>
    (t > 0) ==>
      TTKV.empty
        .put(a, x)
        .map { m =>
          val t0 = m.times(a).head
          m.get(a, Some(t0 - t)) == None
        }.unsafeRunSync
}

And for two puts with the same key, calling get with any time between the two times should return the value from the first put:

property("middle get") = forAll {
  (a: Char, x: Int, y: Int, d: Double) =>
    ((0 <= d) && (d < 1)) ==>
      TTKV.empty
        .put(a, x)
        .flatMap(_.put(a, y))
        .map { m =>
          val ts = m.times(a)
          val t0 = ts.head
          val t1 = ts.last
          val δ = (t1 - t0) * d
          m.get(a, Some(t0 + δ.toLong)) == Some(x)
        }
        .unsafeRunSync
}

Appendix

Here’s the flake.nix file to specify the W O R L D:

{
  description = "Scala TTKV";

  inputs = {
    flake-utils.url = "github:numtide/flake-utils";
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-22.11";
  };

  outputs = {
    self,
    flake-utils,
    nixpkgs,
  }:
    flake-utils.lib.eachDefaultSystem (system: let
      pkgs = import nixpkgs {inherit system;};
    in {
      packages.default = pkgs.writeShellApplication {
        name = "ttkv";
        runtimeInputs = [pkgs.sbt];
        text = ''
          ${pkgs.sbt}/bin/sbt test
        '';
      };
    });
}

The build.sbt is similarly minimal:

lazy val root = (project in file("."))
  .settings(
    name := "ttkv",
    libraryDependencies ++= Seq(
      "org.typelevel" %% "cats-effect" % "1.3.0",
      "org.typelevel" %% "alleycats-core" % "1.6.0",
      "org.scalacheck" %% "scalacheck" % "1.14.0" % "test"
    ),
    scalacOptions ++= Seq(
      "-feature",
      "-deprecation",
      "-unchecked",
      "-language:postfixOps",
      "-language:higherKinds",
      "-Ypartial-unification"
    )
  )

To run the tests, poke around in the code, etc. feel free to copy the whole ttkv directory from my programming workbench on GitHub.