  1. Scraping APIs with Akka streams

    Sat 25 February 2017

    Getting data out of APIs sounds easy. After all, that's the point of APIs. And yet, this is often one of the more challenging aspects of building an ETL pipeline. In this post, I demonstrate how to use Akka streams to extract large quantities of data from the Yelp API …

  2. Scraping APIs with Akka streams — part 2

    Fri 24 February 2017

    This is the second half of a two-part series on Akka streams. In the first post, we used Akka streams to query the Yelp API for thousands of postcodes. In this post, we extend that program into a robust scraper that will work for several days, unsupervised, within …

  3. How frustrating is your programming language?

    Sun 31 July 2016

    The internet is rife with bad metrics for comparing programming languages. This blog post adds another dubious measure: how often developers swear in open source projects. This may serve as a proxy for how frustrated people get when developing in a particular language.

    First of all, credit where it's due …

  4. The right programming language for data science

    Sun 20 March 2016

    At ASI Data Science, we run an eight-week data science fellowship that takes the brightest PhDs and postdocs and trains them in data science. The fellows spend seven weeks working closely with a company as part of the fellowship. This is the story of one of the fellows, John …

  5. Web APIs with Scala and Plotly

    Tue 23 February 2016

    We often need to keep track of quantities that evolve over time, like stock prices or weather data. This post describes how to use Plotly as a lightweight, append-only database for keeping track of this data. We query the price of Google stock every hour and send the results to Plotly.

  6. From academia to data science

    Thu 28 January 2016

    You're finishing a PhD in physics, engineering or another quantitative subject, and you're wondering what to do next. You might have heard the term "data science" bandied around your research group, so you decide to look into it. You realize that, as a data scientist, you will be solving difficult …

  7. From academia to data science — part 2

    Thu 28 January 2016

    This is the second half of a two-part series on negotiating the transition from a PhD in a quantitative subject to data science. In the first part, I talked about acquiring the basic coding and algorithms skills that companies are interested in. It's now time to talk about databases, machine …

  8. IPython notebooks and git

    Thu 25 December 2014

    IPython notebooks are becoming increasingly popular, but they don't play well with version control. This post offers a customisable recipe for including IPython notebooks in git repositories in a sensible manner.

  9. Monte Carlo integration

    Fri 19 December 2014

    Monte Carlo integration is an extremely powerful method for evaluating high-dimensional integrals. It's a method that, I find, is not nearly as well known as it should be. This post is an attempt at fixing this.