The Tropical (min-plus) semiring is one of my favorite examples of how changing one’s perspective can make difficult problems much simpler.^{1} In this semiring, instead of using the usual addition and multiplication, we replace them with minimum and addition, so, for instance, \( 1 \oplus 2 = 1 \) and \( 3 \otimes 2 = 5 \). I’ve written and presented about this before.
Recently, someone on the APL Farm posted
an excellent
article
by sigfpe
(AKA Dan
Piponi), which prompted me to post a
different
article where they work with the min-+ semiring.
I also posted this article by
Russell O’Connor, which is one of my all-time favorites.
In it, there’s an example where the distances between nodes in a graph get
turned into a Tropical matrix, and algorithms for computing shortest distances
become iterated linear algebra over this different semiring.
And then I wanted to work in APL instead of Haskell…
After a bit of head scratching, I realized that Dyalog
APL is really good for manipulating matrices over the
min-+ semiring.
Taking exampleGraph2
from O’Connor’s blog post, and letting \( 10^{20} \)
stand in for \( \infty \)^{2}:
inf←1e20
dist←6 6⍴0 7 9 inf inf 14 7 0 10 15 inf inf 9 10 0 11 inf 2 inf 15 11 0 6 inf inf inf inf 6 0 9 14 inf 2 inf 9 0
we have the following distance matrix
dist
0.0E0 7.0E0 9.0E0 1.0E20 1E20 1.4E1
7.0E0 0.0E0 1.0E1 1.5E1 1E20 1.0E20
9.0E0 1.0E1 0.0E0 1.1E1 1E20 2.0E0
1.0E20 1.5E1 1.1E1 0.0E0 6E0 1.0E20
1.0E20 1.0E20 1.0E20 6.0E0 0E0 9.0E0
1.4E1 1.0E20 2.0E0 1.0E20 9E0 0.0E0
With this matrix, representing distances between vertices in our graph, we can
use the APL matrix product operator .
with different operations to perform
our calculations; instead of +
and ×
for addition and multiplication, we
turn to ⌊
(min) and +
; then dist ⌊.+ dist
gives us the two-hop distances:
dist ⌊.+ dist
0 7 9 20 23 11
7 0 10 15 21 12
9 10 0 11 11 2
20 15 11 0 6 13
23 21 11 6 0 9
11 12 2 13 9 0
We can make this more succinct with ⍨
, telling the interpreter to apply our
function with dist
as both arguments:
⌊.+⍨ dist
0 7 9 20 23 11
7 0 10 15 21 12
9 10 0 11 11 2
20 15 11 0 6 13
23 21 11 6 0 9
11 12 2 13 9 0
I was very excited that I could write down the steps of the
Gauss-Jordan-Floyd-Warshall-Kleene algorithm in so few characters!
Moreover, iterating that step until convergence is similarly succinct using the
power operator ⍣
; we can keep running ⌊.+
until the output matches (≡
) the input:
⌊.+⍨⍣≡dist
0 7 9 20 20 11
7 0 10 15 21 12
9 10 0 11 11 2
20 15 11 0 6 13
20 21 11 6 0 9
11 12 2 13 9 0
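For readers who don't have an APL interpreter handy, here is a rough Rust sketch of the same two steps: one min-plus product, and repeated squaring until nothing changes. The function names are mine, it uses `f64::INFINITY` rather than `1e20`, and it assumes a non-empty square matrix with nonnegative weights and a zero diagonal.

```rust
/// One min-plus "matrix product": out[i][j] = min over k of a[i][k] + b[k][j].
/// Assumes non-empty matrices with compatible shapes.
pub fn min_plus(a: &[Vec<f64>], b: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let (rows, inner, cols) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![f64::INFINITY; cols]; rows];
    for i in 0..rows {
        for j in 0..cols {
            for k in 0..inner {
                // Semiring "addition" is min; semiring "multiplication" is +.
                out[i][j] = out[i][j].min(a[i][k] + b[k][j]);
            }
        }
    }
    out
}

/// Square the matrix until it stops changing, mirroring `⌊.+⍨⍣≡`.
pub fn shortest_paths(mut dist: Vec<Vec<f64>>) -> Vec<Vec<f64>> {
    loop {
        let next = min_plus(&dist, &dist);
        if next == dist {
            return dist;
        }
        dist = next;
    }
}

/// The example graph from above, with `f64::INFINITY` in place of 1e20.
pub fn example_graph() -> Vec<Vec<f64>> {
    let inf = f64::INFINITY;
    vec![
        vec![0.0, 7.0, 9.0, inf, inf, 14.0],
        vec![7.0, 0.0, 10.0, 15.0, inf, inf],
        vec![9.0, 10.0, 0.0, 11.0, inf, 2.0],
        vec![inf, 15.0, 11.0, 0.0, 6.0, inf],
        vec![inf, inf, inf, 6.0, 0.0, 9.0],
        vec![14.0, inf, 2.0, inf, 9.0, 0.0],
    ]
}
```

Calling `shortest_paths(example_graph())` should reproduce the converged matrix above; the APL one-liner is, of course, a good deal shorter.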
Imagine my surprise when I searched for “distance” in Iverson’s notation as a tool of thought:
…and if p gives distances from a source to transhipment points and q gives distances from the transhipment points to the destination, then p⌊.+q gives the minimum distance possible.
I’m excited to announce that our paper on using classification performance on imbalanced datasets as a proxy for measuring generative model quality is up on the arXiv! It combines a lot of interesting techniques, like post-hoc testing, the Bayesian bootstrap, and explainable AI techniques, all to get an idea of how well a quantum generative model performs. Please check it out!
Robert Smith, a.k.a. stylewarning
, has a lovely blog
post that walks
through implementing an interpreter for a “general-purpose quantum programming
language called \( \mathscr{L} \).”
In only 150 lines of Common Lisp, the
implementation is
featureful, self-contained, and a delight to read.
At first I was content with only reading the post and code, but then I saw someone else had put together an OCaml implementation as well. The game was afoot!
Eventually I put together a Rust
implementation.
This version weighs in at more than twice the line count of the
original—even though it relies on
ndarray
for linear
algebra instead of implementing it by hand!
However, it does have a couple of features not present in the original (or
OCaml) implementation(s):
Machine takes an unsigned 64-bit seed for repeatable PRNG behavior; and
with the peg crate, I added parsing, so one can pass "H 0\nMEASURE" as a string and get an interpretable quantum program.
The implementation is here in my catch-all “workbench” repository.
It’s definitely not production-worthy code; it’s got an assert!
that will
panic if not met, and it uses expect
to pave over some errors.
Please feel free to take a look and let me know what you think!
In playing around with OCaml, I’ve spent some spare time perusing more programming language resources. There are a bunch out there, especially in this era of online learning; for instance, Cornell has made some great resources available. In looking specifically for more ML-ish^{1} flavored ones, I came across this fun undergrad homework set. In the second problem, it asks you to implement a tiny (recursively defined) language of expressions encoding functions from the unit square to the unit interval, \( [0, 1]^2 \to [0, 1] \).
type expr = VarX
| VarY
| Sine of expr
| Cosine of expr
| Average of expr * expr
| Times of expr * expr
| Thresh of expr * expr * expr * expr
After you learn to build them up randomly, you can evaluate them across the unit square to generate pictures—have each output value encode either a greyscale value, or one of an RGB triple. Note to undergrads: please don’t cheat off me; you’ll only rob yourself of a fun learning experience.
This was a fun exercise in OCaml, but I also found it instructive to tackle it in Rust and compare. Besides the superficial differences, Rust’s concern with and ability to control memory usage is apparent. I defined the core enumeration thusly:
pub enum Expr {
X,
Y,
SinPi(Box<Expr>),
CosPi(Box<Expr>),
Mul(Box<Mul>),
Avg(Box<Avg>),
Thresh(Box<Thresh>),
}
I renamed some variants to more accurately express their intent; e.g.
SinPi(expr)
evaluates to \( \sin(\pi \cdot \mathtt{expr}) \).
More importantly, I reworked the recursive variants to each contain (at most) a
single Box
(i.e. unique pointer to a heap allocated value); for instance,
Expr::Mul
contains a Box<Mul>
, where Mul
is defined as
pub struct Mul {
e1: Expr,
e2: Expr,
}
I took up this trick after
learning
more about it from the inimitable matklad.
As the linked response says, this keeps the size of the base enum smaller.
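To make the variants concrete, here is a minimal evaluation sketch. Only Mul's definition appears above, so the fields of Avg and Thresh are assumed to mirror it, and the Thresh rule (pick e3 when e1 < e2, otherwise e4) is an assumption about the intended semantics; `eval` is a hypothetical name, not necessarily what the repository uses.

```rust
use std::f64::consts::PI;

pub enum Expr {
    X,
    Y,
    SinPi(Box<Expr>),
    CosPi(Box<Expr>),
    Mul(Box<Mul>),
    Avg(Box<Avg>),
    Thresh(Box<Thresh>),
}

pub struct Mul {
    e1: Expr,
    e2: Expr,
}

// Assumed to mirror `Mul`; the original definition isn't shown.
pub struct Avg {
    e1: Expr,
    e2: Expr,
}

// Assumed layout: four sub-expressions, as in the OCaml `Thresh`.
pub struct Thresh {
    e1: Expr,
    e2: Expr,
    e3: Expr,
    e4: Expr,
}

/// Evaluate an expression at the point (x, y).
pub fn eval(e: &Expr, x: f64, y: f64) -> f64 {
    match e {
        Expr::X => x,
        Expr::Y => y,
        Expr::SinPi(inner) => (PI * eval(inner, x, y)).sin(),
        Expr::CosPi(inner) => (PI * eval(inner, x, y)).cos(),
        Expr::Mul(m) => eval(&m.e1, x, y) * eval(&m.e2, x, y),
        Expr::Avg(a) => (eval(&a.e1, x, y) + eval(&a.e2, x, y)) / 2.0,
        // Assumed semantics: compare the first two, then pick a branch.
        Expr::Thresh(t) => {
            if eval(&t.e1, x, y) < eval(&t.e2, x, y) {
                eval(&t.e3, x, y)
            } else {
                eval(&t.e4, x, y)
            }
        }
    }
}
```

Note that each recursive variant costs one pointer dereference to reach its struct, after which the children sit inline next to each other.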
In the source code, I’ve got a property
test
that asserts the size of an arbitrary^{2} Expr
is exactly 16 bytes^{3}:
proptest! {
    #[test]
    fn size_of(e in arb_expr()) {
        assert_eq!(size_of_val(&e), 16);
    }
}
If we instead went with a more direct translation of the OCaml code, like
diff --git a/src/lib.rs b/src/lib.rs
index 3b04a74..68c600c 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -7,9 +7,9 @@ pub enum Expr {
Y,
SinPi(Box<Expr>),
CosPi(Box<Expr>),
- Mul(Box<Mul>),
- Avg(Box<Avg>),
- Thresh(Box<Thresh>),
+ Mul { e1: Box<Expr>, e2: Box<Expr> },
+ Avg { e1: Box<Expr>, e2: Box<Expr> },
+ Thresh { e1: Box<Expr>, e2: Box<Expr>, e3: Box<Expr>, e4: Box<Expr> },
}
then this property test would no longer pass, as the enum needs to be large enough to contain its largest variant.
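Here is a small self-contained sketch of that size difference, using trimmed-down stand-ins for the two layouts. `Slim`, `Wide`, and `sizes` are illustrative names only, and since Rust doesn't guarantee enum layout, the sketch only compares the two sizes rather than pinning exact numbers.

```rust
use std::mem::size_of;

// Boxed-struct layout: every recursive variant holds exactly one Box.
#[allow(dead_code)]
enum Slim {
    X,
    Y,
    SinPi(Box<Slim>),
    Mul(Box<SlimMul>),
    Thresh(Box<SlimThresh>),
}

#[allow(dead_code)]
struct SlimMul {
    e1: Slim,
    e2: Slim,
}

#[allow(dead_code)]
struct SlimThresh {
    e1: Slim,
    e2: Slim,
    e3: Slim,
    e4: Slim,
}

// Direct translation: the boxes live inline in the variants, so the enum
// must be at least as large as the four pointers in `Thresh`.
#[allow(dead_code)]
enum Wide {
    X,
    Y,
    SinPi(Box<Wide>),
    Mul { e1: Box<Wide>, e2: Box<Wide> },
    Thresh { e1: Box<Wide>, e2: Box<Wide>, e3: Box<Wide>, e4: Box<Wide> },
}

/// Sizes in bytes of the two layouts, as (slim, wide).
pub fn sizes() -> (usize, usize) {
    (size_of::<Slim>(), size_of::<Wide>())
}
```

The trade-off is an extra allocation and indirection per node in exchange for a smaller footprint wherever an expression is stored or moved.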
Once I coded up the library, I also put together a small
executable to
take in some parameters and generate greyscale or RGB pictures, similar to the
original exercise.
With that defined, I could run it a bunch of times using
jot
to
generate random seeds and parallel
to use multiple cores:
jot -w %i -r 20 0 1000000000 | parallel --bar ./target/release/ra -s {} -d 7 -r
Here are some gems I found combing through the PNGs:
I really like property-based testing, in this case with
proptest
. ↩︎
Since
size_of_val
returns the size of the pointed-to value in bytes, this test returns 16 on
my 64-bit laptop; it will probably fail on a 32-bit architecture. Be sure
to check out
this awesome
article for more on Rust enums. ↩︎
I’ve wanted to level up my OCaml understanding for a while now, but between doing a fair amount of work in Rust at my current job and already having a bit of Haskell, I wasn’t sure I could justify the “distraction.” But between some excellent resources available online, some interesting reading (more on that in a moment), and the newest 5.0.0 release, I decided it was time to dig in!
The first reading topic I’ve stumbled across recently comes from an extremely prolific CS researcher, Oleg Kiselyov, who has written a lot of amazing papers and code. One such project is the tagless-final style of embedding domain-specific languages in a typed functional language like Haskell or OCaml. I can’t do it justice; it’s truly amazing stuff, and you should go read it yourself. But! Having seen some of the work before in Haskell, it was intriguing to look through this work and compare and contrast the Haskell and OCaml versions.
The second topic is modules, as in “module systems” and “modular programming”.^{1} They’re really interesting; indeed, some say modules matter most. OCaml (like SML and other related languages) allows for some really interesting flexibility around designing and implementing interfaces. I won’t be able to explain them as well as the above link or this more verbose and less esoteric explanation.
Which leads me to the point of my little
project named kompreni
.
While it is certainly nowhere near as complete as, say,
bastet
,^{2} I wanted to use
abstract algebra as a way to explore expressing things in
OCaml.
We’ll start with a semigroup; a semigroup consists of a set \( S \) and an associative binary operation \( \cdot \) on that set; that is, for any \(x\), \(y\), and \(z\) in \(S\), it must be the case that \( x \cdot (y \cdot z) = (x \cdot y) \cdot z \).
In kompreni
, I’ve written a Signature
for semigroups:
module type Signature = sig
type t
val ( +& ) : t -> t -> t
end
along with an accompanying Laws
functor, a function which takes any module
implementing the above Signature
and creates a new module:
module Laws (S : Signature) = struct
open S
let associative x y z = x +& (y +& z) = x +& y +& z
end
One thing I miss compared to Haskell is being able to specify the precedence
and associativity of user-defined operators like +&
.
Since it begins with the plus symbol, OCaml treats it like a plus; ah well.
That means that this expression above, when formatted with dune
fmt
, looks like it’s missing some parentheses;
OCaml says +
is left-associative, so it says +&
is too.
With that signature and functor together, I’ve next turned to
qcheck
for property-based testing in
OCaml.
For instance, in kompreni
’s test suite I’ve got the following.
First we define a base signature for testing:
module type Testable = sig
type t
val gen : t QCheck2.Gen.t
end
Then we combine it with the Signature
above to actually test the associative
property:
module SemigroupLaws (X : Testable) (S : Semigroup.Signature with type t = X.t) =
struct
include Semigroup.Laws (S)
let tests =
[
make_test
QCheck2.Gen.(triple X.gen X.gen X.gen)
"associative" (uncurry3 associative);
]
end
With other scaffolding (like uncurry3
) defined elsewhere, this takes two
modules—one implementing the testable interface and another that is a
Semigroup
with the same internal type t
—and defines a list of qcheck
tests.
kompreni
also contains monoids,
semigroups with an identity element we’ll call zero
:
module type Signature = sig
include Semigroup.Signature
val zero : t
end
Note that include
statement, which says “bring in the body of the Semigroup
signature, too.”
We also have more Laws
for monoids to fulfill, in addition to associative
:
module Laws (M : Signature) = struct
open M
include Semigroup.Laws (M)
let left_id x = zero +& x = x
let right_id x = x +& zero = x
end
which we can also test:
module MonoidLaws (X : Testable) (M : Monoid.Signature with type t = X.t) =
struct
include Monoid.Laws (M)
let tests =
let module SL = SemigroupLaws (X) (M) in
List.map (uncurry2 (make_test X.gen))
[ ("left id", left_id); ("right id", right_id) ]
@ SL.tests
end
This post is already getting too long, so I’ll finish with my favorite example. I think this setup shows the power of OCaml modules, but also shows a place where either I don’t know how to use them well enough (very plausible!) or they fall just a tad short of what I’d like.
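Consider semirings, which consist of a set of elements \( S \), two operations \( + \) and \( \cdot \), and two special elements \(0\) and \(1\) such that:
\( (S, +, 0) \) is a commutative monoid;
\( (S, \cdot, 1) \) is a monoid;
\( 0 \) annihilates, i.e. \( 0 \cdot x = x \cdot 0 = 0 \); and
\( \cdot \) distributes over \( + \) on both the left and the right.
My absolute favorite semiring is the Tropical semiring, also known as the min-plus semiring, which consists of:
the set \( \mathbb{R} \cup \{\infty\} \);
\( \min \) in the role of \( + \), with \( \infty \) as its identity \(0\); and
ordinary addition in the role of \( \cdot \), with the number zero as its identity \(1\).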
Here’s kompreni
’s implementation^{3}:
module MinPlus = struct
type t = Finite of Q.t | Infinite
let ( +& ) a b =
match (a, b) with
| Finite x, Finite y -> Finite (Q.min x y)
| Finite _, _ -> a
| _, Finite _ -> b
| _, _ -> Infinite
let zero = Infinite
let ( *& ) a b =
match (a, b) with
| Finite x, Finite y -> Finite (Q.add x y)
| _, _ -> Infinite
let one = Finite Q.zero
end
And here are the laws that a Semiring
must abide by in kompreni
:
module Laws (S : Signature) = struct
open S
include Commutative_monoid.Laws (S)
(* (S, 1, *&) is also a Monoid *)
let times_associative x y z = x *& (y *& z) = x *& y *& z
let times_left_one x = one *& x = x
let times_right_one x = x *& one = x
(* Zero annihilates *)
let zero_annihilates_left x = zero *& x = zero
let zero_annihilates_right x = x *& zero = zero
(* Distributivity *)
let left_distributive x y z = x *& (y +& z) = (x *& y) +& (x *& z)
let right_distributive x y z = (x +& y) *& z = (x *& z) +& (y *& z)
end
I really like being able to include previous law definitions in new ones; for
instance, having defined the Commutative_monoid.Laws
functor, the line
include Commutative_monoid.Laws (S)
ensures that I include all those laws
into my tests for almost free—ensuring that I test \((S, +, 0)\) is indeed
a commutative monoid.
However, I don’t see a way to cleverly check \((S, \cdot, 1)\) is also a monoid
without writing out the times_
rules explicitly.
Maybe I need more OCaml module & functor goodness?
bastet
forgoes wrapping the monoid definitions and laws into the semiring one, for what it’s worth.
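For comparison, here is a rough Rust sketch of the same structure, written so that the monoid laws take the operation and identity as arguments and can therefore be reused for both \(+\) and \(\cdot\). Everything here is illustrative: it uses i64 rather than rationals, the names are mine, and it checks the laws exhaustively over a small slice instead of using a property-testing library.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MinPlus {
    Finite(i64),
    Infinite,
}

impl MinPlus {
    pub const ZERO: MinPlus = MinPlus::Infinite; // identity for `plus`
    pub const ONE: MinPlus = MinPlus::Finite(0); // identity for `times`

    /// Semiring "addition": the minimum, with `Infinite` as the largest element.
    pub fn plus(self, other: MinPlus) -> MinPlus {
        match (self, other) {
            (MinPlus::Finite(x), MinPlus::Finite(y)) => MinPlus::Finite(x.min(y)),
            (MinPlus::Finite(_), MinPlus::Infinite) => self,
            (MinPlus::Infinite, _) => other,
        }
    }

    /// Semiring "multiplication": ordinary addition, absorbing at `Infinite`.
    pub fn times(self, other: MinPlus) -> MinPlus {
        match (self, other) {
            (MinPlus::Finite(x), MinPlus::Finite(y)) => MinPlus::Finite(x + y),
            _ => MinPlus::Infinite,
        }
    }
}

pub type Op = fn(MinPlus, MinPlus) -> MinPlus;

/// Monoid laws for any operation/identity pair, checked over all triples of `xs`.
pub fn is_monoid(op: Op, id: MinPlus, xs: &[MinPlus]) -> bool {
    for &x in xs {
        if op(id, x) != x || op(x, id) != x {
            return false;
        }
        for &y in xs {
            for &z in xs {
                if op(x, op(y, z)) != op(op(x, y), z) {
                    return false;
                }
            }
        }
    }
    true
}

/// The remaining semiring laws, reusing `is_monoid` for both operations.
pub fn is_semiring(xs: &[MinPlus]) -> bool {
    let (zero, one) = (MinPlus::ZERO, MinPlus::ONE);
    if !is_monoid(MinPlus::plus, zero, xs) || !is_monoid(MinPlus::times, one, xs) {
        return false;
    }
    for &x in xs {
        if zero.times(x) != zero || x.times(zero) != zero {
            return false;
        }
        for &y in xs {
            if x.plus(y) != y.plus(x) {
                return false;
            }
            for &z in xs {
                let left = x.times(y.plus(z)) == x.times(y).plus(x.times(z));
                let right = x.plus(y).times(z) == x.times(z).plus(y.times(z));
                if !(left && right) {
                    return false;
                }
            }
        }
    }
    true
}
```

The OCaml analogue of passing the operation in would presumably be a small adapter module that renames \( \cdot \) and \(1\) to the monoid signature's names before applying the monoid laws functor, though I haven't tried that here.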
While I miss the usability of cargo
from Rust and some fun things from
Haskell, OCaml is fun and eye-opening!
Please feel free to head to GitHub to
check out the code.
Not to be confused with a module over a ring or one of the many other definitions of “module.” ↩︎
Nor will it likely ever be. ↩︎
Rather than floats, this uses rational numbers from zarith
for finite values. Using \(\mathbb{Q}\) is not the same as using \(\mathbb{R}\), but this still forms a semiring! Also it’s really, really hard to represent arbitrary real numbers in software. For instance, the associativity tests fail if you use OCaml floats, because floating point numbers aren’t real. ↩︎
With it being my birthday today, I took a little bit of time to noodle on a pet project.
For all the computer science classes I’ve taken, books I’ve read, course notes I’ve looked through, and code I’ve written, reviewed, or perused, I had yet to implement any sort of Lisp-like language myself.
Having decided to change that, I’ve been toying around for the last month or so on scum
, a Scheme-ish, Lisp-like language implementation.
It’s by no means good^{1} in any sense of the word; it’s incomplete, under-commented, probably loaded with bugs, undoubtedly slow, and profligate with its memory usage.
However, as of today, it’s cleared the bar I set for myself for being a reasonable-ish implementation: I can define and call lambdas from the REPL!
There’s not a lot of code currently, so feel free to peruse the repo yourself:
~/github/scum ∃ l
Permissions Size User Date Modified Git Name
.rw-r--r--@ 25k genos 13 Apr 14:58 -- Cargo.lock
.rw-r--r--@ 94 genos 13 Apr 14:58 -- Cargo.toml
.rw-r--r--@ 1.1k genos 13 Apr 14:58 -- LICENSE
.rw-r--r--@ 56 genos 13 Apr 14:58 -- README.md
drwxr-xr-x@ - genos 20 Apr 15:18 -- scum-lib
drwxr-xr-x@ - genos 13 Apr 14:58 -- scum-repl
drwxr-xr-x@ - genos 20 Apr 13:22 -I target
~/github/scum ∃ tokei
===============================================================================
Language Files Lines Code Comments Blanks
===============================================================================
Markdown 1 3 0 2 1
Pest 1 38 26 8 4
Rust 8 628 577 9 42
TOML 3 27 23 0 4
===============================================================================
Total 13 696 626 19 51
===============================================================================
By no means as succinct as, say, tinylisp
, nor as complete as the mal
implementations, scum
consists of a library, scum-lib, and a REPL, scum-repl, which uses
rustyline
to provide an interactive prompt:
λ> (define square (lambda (x) (* x x)))
(lambda (x) (* x x))
λ> (square 39)
1521
I originally started in Haskell, but decided to shift to Rust because I kind of wanted this to be an exercise in satisfying an annoying pedant; Haskell provides so much already, it almost felt like cheating to use it.
Besides the aforementioned rustyline
crate in scum-repl
, I’ve used pest
and pest_derive
for parsing and thiserror
for organizing the sundry reasons things can go wrong.
While pest
was new to me and has been fun, I’ve used thiserror
on other projects at $WORK
before; I absolutely would not write any substantial Rust project without it at this point.
I wouldn’t have gotten even this far on scum
^{2} without the help of
mal
,👋 to the former coworker who enjoined me to blog more! I miss working with you, friend.
It’s the exact opposite of raganwald
’s README section, inspiration here. ↩︎
I didn’t consume all of this recently in my quest to put scum
together, but I would almost certainly be remiss in not mentioning these; in fact, I’m probably leaving lots out! ↩︎
I’ve been noodling with a Rust implementation of the ideas from this talk, and decided I should probably share it—or at least put it up online in case my laptop crashes. The repo is available here.
We’ve talked previously about implementing a time-traveling key-value store. Having worked more with Rust in the interim, I tried my hand at a Rust implementation. Rust’s type system, standard library, and attention to pedantic details make this my favorite version yet.
The standard library provides everything we need:
use std::collections::BTreeMap;
use std::time::Instant;
pub struct Ttkv<K, V> {
started: Instant,
mapping: BTreeMap<u128, (K, V)>,
}
Recording the std::time::Instant
when we created our store allows us to
monotonically record insertion timestamps—on most hardware, at least—and we
can still check if this assumption is violated.
We build a new store via implementing Default
:
impl<K, V> Default for Ttkv<K, V> {
fn default() -> Self {
Self {
started: Instant::now(),
mapping: BTreeMap::default(),
}
}
}
Digging into the implementation (the following snippets are all wrapped in an
impl
block), it’s straightforward to check for emptiness:
pub fn is_empty(&self) -> bool {
self.mapping.is_empty()
}
Adding a pair to our store gives us an opportunity to assert that time is monotonic:
pub fn put(&mut self, key: K, value: V, timestamp: Option<u128>) {
let t = timestamp.unwrap_or_else(|| {
Instant::now()
.checked_duration_since(self.started)
.unwrap_or_else(|| panic!("non-monotonic insertion"))
.as_nanos()
});
self.mapping.insert(t, (key, value));
}
Retrieval from the store is similar to previous implementations:
pub fn get(&self, key: &K, timestamp: Option<u128>) -> Option<&V>
where
K: PartialEq,
{
self.mapping
.range(0..timestamp.unwrap_or(u128::MAX))
.filter(|(_, (k, _))| k == key)
.last()
.map(|(_, (_, v))| v)
}
Finally, collecting the insertion times in order:
pub fn times(&self) -> Vec<u128> {
self.mapping.keys().cloned().collect()
}
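To see the time travel in action without waiting on the clock, here is a compact sketch that restates the store with explicit timestamps only. It drops the Instant bookkeeping, so `new` and the non-optional `timestamp` argument to `put` are simplifications of mine, as is the `time_travel_demo` helper; `get` is copied from above.

```rust
use std::collections::BTreeMap;

/// Simplified store: same mapping as above, but callers always supply timestamps.
pub struct Ttkv<K, V> {
    mapping: BTreeMap<u128, (K, V)>,
}

impl<K: PartialEq, V> Ttkv<K, V> {
    pub fn new() -> Self {
        Self { mapping: BTreeMap::new() }
    }

    pub fn put(&mut self, key: K, value: V, timestamp: u128) {
        self.mapping.insert(timestamp, (key, value));
    }

    /// Latest value for `key` strictly before `timestamp` (or before u128::MAX).
    pub fn get(&self, key: &K, timestamp: Option<u128>) -> Option<&V> {
        self.mapping
            .range(0..timestamp.unwrap_or(u128::MAX))
            .filter(|(_, (k, _))| k == key)
            .last()
            .map(|(_, (_, v))| v)
    }
}

/// Insert "a" twice and "b" once, then query at a few points in time.
pub fn time_travel_demo() -> [Option<i32>; 4] {
    let mut t = Ttkv::new();
    t.put("a", 1, 10);
    t.put("b", 5, 15);
    t.put("a", 2, 20);
    [
        t.get(&"a", None).copied(),     // latest value of "a"
        t.get(&"a", Some(15)).copied(), // "a" as of just before time 15
        t.get(&"a", Some(10)).copied(), // half-open range: excludes time 10 itself
        t.get(&"b", Some(12)).copied(), // "b" did not exist yet
    ]
}
```

One detail worth noticing: because the range is half-open, a query at exactly an insertion's timestamp does not see that insertion. Switching to an inclusive range would change that, if that's the behavior you want.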
Rust’s (and cargo
’s) approach to testing is a pleasure to use.
We can do regular unit-style tests:
#[test]
fn initially_empty() {
let t = Ttkv::<String, String>::default();
assert!(t.is_empty());
assert!(t.times().is_empty());
}
And with the help of
proptest
we can do
property-based testing:
#[test]
fn two_gets_different_keys(a: String, b: String, x: String, y: String) {
prop_assume!(a != b);
let mut t = Ttkv::default();
t.put(a.clone(), x.clone(), None);
t.put(b.clone(), y.clone(), None);
prop_assert_eq!(t.times().len(), 2);
prop_assert_eq!(t.get(&a, None), Some(&x));
prop_assert_eq!(t.get(&b, None), Some(&y));
}
As a side note, this test originally failed for me in an early draft where the timestamping mechanism wasn’t monotonic.
This was a fun exercise, and again is probably my favorite implementation. See the repo for the full code!
I’m excited to announce that our paper on using quantum ML to predict weather radar products is up on the arXiv! What’s more, per the press release, we’ll be presenting the paper at the Artificial Intelligence for Humanitarian Assistance and Disaster Response Workshop at NeurIPS 2021.
One of the best uses of the type system I’ve seen is the Build systems a la carte paper. In it, the authors use Haskell’s types to outline and explore the problem domain in really novel ways. I wanted to understand CRDTs more, so I made a thing. It’s nowhere near the level of Build systems a la carte, but I found it useful.
CRDTs are important to modern distributed programming; as the above Wikipedia article explains, they are
data structure[s] which can be replicated across multiple computers in a network, where the replicas can be updated independently and concurrently without coordination between the replicas, and where it is always mathematically possible to resolve inconsistencies that might come up.
In reading about CRDTs, I came across the phenomenal paper A comprehensive study of Convergent and Commutative Replicated Data Types. With this paper, Wikipedia, and some other references as my guide, I made a Rust crate to understand state-based CRDTs (a.k.a. convergent replicated data types or CvRDTs). I really like how this crate uses two traits with associated types to describe CvRDTs, fitting them into a common framework.
The first trait is for CvRDTs that can only grow, i.e. only add items. Here’s a slightly modified version of it; the code (on GitHub, with documentation on docs.rs) has more:
pub trait Grow: Clone {
type Payload: Eq; // Internal state
type Update; // Message to update internal state
type Query; // Query message
type Value; // Query response
fn new(payload: Self::Payload) -> Self; // Create a new version of our CvRDT from Payload
fn payload(&self) -> Self::Payload; // Retrieve Payload
fn add(&mut self, update: Self::Update); // Add item, mutating in-place
fn le(&self, other: &Self) -> bool; // Is this ≤ other in semilattice's partial order?
fn merge(&self, other: &Self) -> Self; // Merge this and other into new CvRDT
fn query(&self, query: &Self::Query) -> Self::Value; // Query to get some Value
}
Our rigorous typing does make for one difference from the aforementioned paper and Wikipedia.
When implementing GCounter
s and PNCounter
s, we no longer have an unspecified myId()
function giving the current node’s index.
In order for the Payload
type to fully specify their internal state, these types require that index be part of the payload.
Other CvRDTs can also shrink, i.e. remove items.
Here’s a slightly modified version of cvrdt-exposition
’s trait:
pub trait Shrink: Grow {
fn del(&mut self, update: Self::Update); // Delete item, mutating in-place
}
We have the Eq
bound on the Payload
type for verification and testing.
In order to verify that my CvRDT implementations behave as required, i.e. that their merge functions are commutative, associative, and idempotent, I turned to the proptest
crate for property-based testing, via assert_eq!
calls.
I turned to Rust’s macro system to generate some of these tests for me; see properties.rs
in the GitHub source and the test
sub-modules of the implementation modules for more.