About
I’m a cyclist in snowy Kitchener, Ontario. I enjoy functional programming and mucking with data.
Education
2003–2008. Bachelor of Software Engineering, University of Waterloo.
Work
2021–present. TypeScript developer at SyncWith. Founder building data tools to enable non-technical business staff to get their key business data into Google Sheets and Looker Studio.
2014–2019. Scala/Clojure developer at Sortable. Member of the founding team of 5, helped the company grow to ~80 people, of which ~40 were technical staff. Ad tech integrations with all the major vendors. Big data ingestion, storage and query frameworks built on Hive, Spark and Presto.
2011–2014. Scala developer at Snapsort Inc. Consumer recommendation engine website. Web crawling at scale. Running infrastructure on AWS. Data modelling.
2008–2010. C#/C++ developer at Microsoft Corp. Contributed authentication frameworks to the Windows Identity Framework in .NET and Active Directory Federation Services 2.0.
2006–2008 (internships). C#/C++ developer at Microsoft Corp.
2005. VB.NET software consulting to school boards.
2003–2004 (internships). Java developer at Arius Software Corp.
Open Source
tile-smush (2024): tile-smush, a specialized tile-join to merge .mbtiles files 5x faster. Read more in this blog post.
Hiker Atlas (2024): Like AllTrails, but worse.
tilemaker (2023): Contributed a series of performance and functional improvements to an OpenStreetMap vector tile generator. Runtime and memory usage were dramatically improved, enabling people to build planet-scale basemaps on consumer-grade computers.
datasette-parquet (2023): Read Parquet files and DuckDB database files in Datasette.
datasette-ui-extras (2023): Opinionated improvements to the UI of Datasette, and support for additional kinds of facets.
datasette-scraper (2022): Manage small crawl and extract jobs in Datasette, with an extensible plugin system.
Parquet support in SQLite (2018): I wrote a virtual table extension for SQLite that enables reading data from Parquet files. Read more in this blog post.
csv2parquet (2018): A very thin wrapper around pyarrow to make it easy to convert CSV files to Parquet files.
Manu (2018): A Java library, command-line tools and web server for manipulating and serving timeseries data.
Stanford NLP (2013): I contributed a patch to the Stanford part-of-speech tagger that improved its tagging throughput by 50% for English, and 300–400% for non-English languages.
Miscellaneous
Patents. Software patents are a blight on the industry for the most part. The one I was granted is definitely part of the blight.