Combining previous work with new things
In a previous post, we explored an example from Persi Diaconis using the Fourier Transform to change a collection of permutations into a table where the (i, j) entry is the number of voters who ranked candidate i in place j; this transformation consists of first mapping each permutation to its matrix representation, then adding the matrices together.
There’s more to explore here, however; since counts are just integers and
addition of integers is both associative and commutative, this matrix addition
is also commutative—that is, we can reorder and group the summands as much as
we like.
Thus, this problem is well suited to map-reduce-like approaches, since
associativity implies parallelism.
With commutativity, we can include a “shuffle” step, too.
This is what the Clojure library tesser
provides.
Combined with core.matrix for
matrices, tesser allows us to
- map the vote vectors to their permutation matrices
- add them together
but, contrasted with core Clojure’s map and reduce, we can take advantage
of the mathematical structure to do all of step 1 in parallel, then all of
step 2 in parallel with arbitrary shuffling.
Clojure Code
Here’s Clojure code that, after simulating a collection of votes, calculates
the desired table in two different ways:
- in serial, with the default
mapandreduce, and - using
tesserto take full advantage of associativity and commutativity.
It then checks that the two answers (imaginatively called a and b) are
equal, then prints out a.
(require '[tesser.core :as t])
(require '[clojure.core.matrix :as m])
(let
[k 7
n 10000
zero (m/zero-matrix k k)
vecs (for [_ (range n)] (shuffle (range k)))
a (->> vecs
(map m/permutation-matrix)
(reduce m/add zero))
b (->> (t/map m/permutation-matrix)
(t/reduce m/add zero)
(t/tesser [vecs]))
matrices-equal (if (= a b) "Yes" "No")]
(println (str "\n\nAre a and b equal? " matrices-equal "\n"))
(m/pm a))
Which, on one run, gave the following output:
Are a and b equal? Yes
[[1448.000 1419.000 1434.000 1505.000 1444.000 1389.000 1361.000]
[1428.000 1429.000 1430.000 1404.000 1384.000 1475.000 1450.000]
[1466.000 1471.000 1367.000 1458.000 1443.000 1408.000 1387.000]
[1427.000 1369.000 1434.000 1429.000 1452.000 1434.000 1455.000]
[1393.000 1388.000 1481.000 1444.000 1380.000 1439.000 1475.000]
[1412.000 1429.000 1447.000 1389.000 1415.000 1486.000 1422.000]
[1426.000 1495.000 1407.000 1371.000 1482.000 1369.000 1450.000]]
Nix Expressions
Code demos are fun, but too often they fall victim to bit-rot and stop working.
To combat this, we can strive for reproducibility, an important goal of data
science.
One tool that can help is nix.
Nix is a package manager, like homebrew for OS X,
that, through a novel approach, goes a very long way towards reproducibility.
By specifying everything in a declarative, functional language, we can be sure
we’re building the same thing twice.
With nix, we can specify the exact versions of tesser and core.matrix
we used, along with how to build them.
Then, we can provide a function to reproducibly run the above example.
The nix expression to do so is rather long—though that’s a small price to
pay for the assurance we get.
Here’s the content of flake.nix.
Save this in a file called flake.nix, and save the above Clojure code in diaconis.clj in the same directory.
Then run nix runt to (eventually) see the results.
{
description = "Fourier Transform in Clojure";
inputs = {
flake-utils.url = "github:numtide/flake-utils";
nixpkgs.url = "github:NixOS/nixpkgs/nixos-22.11";
};
outputs = {
self,
flake-utils,
nixpkgs,
}:
flake-utils.lib.eachDefaultSystem (system: let
pkgs = import nixpkgs {inherit system;};
leinFromGitHub = {
name,
owner,
repo,
rev,
sha256,
cd ? ".",
}:
pkgs.stdenv.mkDerivation {
name = name;
src = pkgs.fetchFromGitHub {
owner = owner;
repo = repo;
rev = rev;
sha256 = sha256;
};
buildInputs = [pkgs.leiningen];
# https://blog.jeaye.com/2017/07/30/nixos-revisited/
buildPhase = ''
export LEIN_HOME=$PWD/.lein
mkdir -p $LEIN_HOME
echo "{:user {:local-repo \"$LEIN_HOME\"}}" > $LEIN_HOME/profiles.clj
cd ${cd}
LEIN_SNAPSHOTS_IN_RELEASE=1 ${pkgs.leiningen}/bin/lein uberjar
'';
installPhase = ''
cp target/${repo}*-standalone.jar $out
'';
};
tesser = leinFromGitHub {
name = "tesser";
owner = "aphyr";
repo = "tesser";
rev = "d82895390264d8f6bf012fa4053a2003a500b574"; # release 1.0.4
sha256 = "0k2shy6ccig00lc9y6y8lyh7bnd250xxp5nh8fz254f2kq8ib4mn";
cd = "core";
};
matrix = leinFromGitHub {
name = "matrix";
owner = "mikera";
repo = "core.matrix";
rev = "07cb88b1b43ccc07f3f7e8634e1eccdb6986049b"; # develop branch as of 2021-11-19
sha256 = "1dn5hy5mdgl2bcmav8qll3qxqbscb340c462kkj728sfpk4ch5xd";
};
in {
packages.default = pkgs.writeShellApplication rec {
name = "diaconis.clj";
runtimeInputs = [pkgs.clojure pkgs.jdk];
text = ''
java -cp ${tesser}:${matrix} clojure.main ${name}
'';
};
});
}