1. Managing multiple AWS credentials with pass

    Tue 21 January 2020

    A significant part of my day job at Faculty involves the administration of cloud resources, predominantly on AWS.

    We have many AWS accounts: for development, for isolating parts of our infrastructure for specific customers or business lines, etc. We also have restricted access to some of our customers' accounts. A …

  2. Scala in production — making your codebase more approachable

    Wed 09 January 2019

    This is the last post in a two-part series describing conventions that we have adopted when developing the Faculty platform.

    In the first post, we looked at conventions that reduced the likelihood of introducing bugs. In this post, we will look at conventions that make our codebase more approachable for …

  3. Scala in production — four conventions for safer programs

    Sun 06 January 2019

    The backend of the Faculty platform is written in Scala. Practically, this means we have spent most of the last three years writing Scala microservices with the Play framework.

    When we first started development in Scala, I had some misgivings about whether it was the right choice: would it be …

  4. Flow types for generators and coroutines

    Tue 28 August 2018

    Since ECMAScript 6 introduced the yield keyword, coroutines have become more common. The best-known example is probably the async/await framework for concurrency, but coroutines also form the backbone of redux-saga and have made their way into bluebird.

    There seems to be little documentation on how to add Flow …

  5. Scraping APIs with Akka streams

    Sat 25 February 2017

    Getting data out of APIs sounds easy. After all, that's the point of APIs. And yet, this is often one of the more challenging aspects of building an ETL pipeline. In this post, I demonstrate how to use Akka streams to extract large quantities of data from the Yelp API …

  6. Scraping APIs with Akka streams — part 2

    Fri 24 February 2017

    This is the second half of a two-part series on Akka streams. In the first post, we used Akka streams to query the Yelp API for thousands of postcodes. In this post, we extend this program into a robust scraper that will work for several days, unsupervised, within …

  7. How frustrating is your programming language?

    Sun 31 July 2016

    The internet is rife with bad metrics for comparing programming languages. This blog post adds another dubious measure: how often developers swear in open source projects. This may serve as a proxy for how frustrated people get when developing in a particular language.

    First of all, credit where it's due …

  8. The right programming language for data science

    Sun 20 March 2016

    At ASI Data Science, we run an eight-week data science fellowship that takes the brightest PhDs and postdocs and trains them in data science. The fellows spend seven weeks working closely with a company as part of the fellowship. This is the story of one of the fellows, John …

  9. Web APIs with Scala and Plotly

    Tue 23 February 2016

    We often need to keep track of quantities that evolve over time, like stock prices or weather data. This post describes how to use Plotly as a lightweight, append-only database for keeping track of this data. We query the price of Google stock every hour and send the results to Plotly.

  10. From academia to data science

    Thu 28 January 2016

    You're finishing a PhD in physics, engineering or another quantitative subject, and you're wondering what to do next. You might have heard the term "data science" bandied around your research group, so you decide to look into it. You realize that, as a data scientist, you will be solving difficult …

  11. From academia to data science — part 2

    Thu 28 January 2016

    This is the second half of a two-part series on negotiating the transition from a PhD in a quantitative subject to data science. In the first part, I talked about acquiring the basic coding and algorithms skills that companies are interested in. It's now time to talk about databases, machine …

  12. IPython notebooks and git

    Thu 25 December 2014

    IPython notebooks are becoming increasingly popular, but they don't play well with version control. This post offers a customisable recipe for including IPython notebooks in git repositories in a sensible manner.

  13. Monte Carlo integration

    Fri 19 December 2014

    Monte Carlo integration is an extremely powerful method for evaluating high-dimensional integrals. It's a method that, I find, is not nearly as well known as it should be. This post is an attempt at fixing this.