“You can have data without information, but you cannot have information without data.” – Daniel Keys Moran

Selected Publications

Recent Posts


Getting set up If there is one realisation in life, it is that you will never have enough CPU or RAM for your analytics. Luckily for us, cloud computing is becoming cheaper each year. One of the more established providers of cloud services is AWS. If you don’t know yet, they provide a free, yes free, tier. Their t2.micro instance is a 1 CPU, 1 GB machine, which doesn’t sound like much, but I am running an RStudio and Docker instance on one of these for a small project.


Join Andrew Collier and Hanjo Odendaal for a workshop on using R for Web Scraping. Who should attend? This workshop is aimed at beginner and intermediate R users who want to learn more about using R for data acquisition and management, with a specific focus on web scraping. What will you learn? You will learn: data manipulation with dplyr, tidyr and purrr; tools for accessing the DOM; scraping static sites with rvest; scraping dynamic sites with RSelenium; and setting up an automated scraper in the cloud.
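As a taste of the static-site scraping covered in the workshop, here is a minimal rvest sketch (the URL and CSS selector are placeholders, not taken from the workshop material):

```r
library(rvest)  # static-site scraping, part of the tidyverse ecosystem

# Read the page once, then extract nodes with a CSS selector.
page <- read_html("https://example.com")  # placeholder URL

page %>%
  html_elements("h1") %>%  # any CSS selector works here
  html_text2()             # trimmed, human-readable text
```

Dynamic sites that render content with JavaScript need the RSelenium approach instead, since `read_html()` only sees the raw HTML response.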


The problem The tidyverse API is one of the easiest APIs in R to follow, thanks to its grammar-based design. But every now and again you come across an error that boggles your mind; my personal one is the error spread throws when I have no index for my data: Error: Duplicate identifiers for rows (2, 3), (1, 4, 5).... I am sure we have all seen this at some time or other.
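To illustrate, a minimal sketch of the error and the usual fix, using toy data rather than anything from the post: the error appears when spread cannot tell two rows apart, and adding a unique row index resolves it.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  key   = c("a", "b", "a", "b"),
  value = 1:4
)

# spread(df, key, value) fails here: rows 1 & 3 (and 2 & 4) are
# indistinguishable, so spread complains about duplicate identifiers.

# Fix: add a unique id within each key so every row is identifiable.
df %>%
  group_by(key) %>%
  mutate(id = row_number()) %>%
  ungroup() %>%
  spread(key, value)
```

After the index is added, each (id, key) pair is unique and spread widens the data without complaint.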


As with most data scientists, there comes a time in your life when you have enough false confidence to try to build production systems using Docker on AWS. This past weekend, I fell into this trap. Having recently attended an R-Ladies event in Cape Town that dealt with the topic of Docker for Data Science, I achieved a +5 Docker confidence bonus - and you know what they say about having a new hammer… you look for anything that looks like a nail!


Moveme! I recently worked with a dataset that had over 100 columns and had to keep reordering the columns so that I could conduct my analysis more easily. For example, whenever you conduct a multiple-factor analysis (FactoMineR::MFA), the function requires a specific grouping of your variables. This meant that after feature engineering, I was left with the problem of having to reorder my columns before the analysis could be run.
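The same reordering idea can be sketched with dplyr::relocate (not the moveme helper from the post; column names here are made up for illustration):

```r
library(dplyr)

df <- tibble(x1 = 1, x2 = 2, grp_a = "a", x3 = 3, grp_b = "b")

# Pull the grouping variables to the front so that contiguous
# blocks of columns can be handed to FactoMineR::MFA.
df %>%
  relocate(starts_with("grp"), .before = x1)
```

The tidyselect helpers (starts_with, ends_with, matches) make this scale well past 100 columns, which is exactly where reordering by hand breaks down.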




A collection of analytical/helper functions that I have collected over the years of coding in R. This package has now become my base library whenever I start a project. Use at your own risk ;-)


RInno makes it easy to install local shiny apps by providing an interface between R and Inno Setup, an installer for Windows programs (sorry Mac and Linux users). It is designed to be simple to use (two lines of code at a minimum), yet comprehensive.
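The minimal two-line workflow looks something like this (the app name and directory are placeholders, not a real project):

```r
library(RInno)

# Point RInno at a local Shiny app, then compile the Inno Setup
# script into a Windows installer.
create_app(app_name = "myapp", app_dir = "path/to/app")  # placeholder paths
compile_iss()
```

The result is a standalone .exe installer, so end users never need to touch R or RStudio themselves.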

Pareto's Playground

A blog that I helped to build and maintain while at Eighty20 as a Data Scientist. This was before the days of Hugo and blogdown.



Teaching forms an integral part of what I enjoy.

I am an assistant lecturer for the following courses at the University of Stellenbosch:


  • Introduction to Algorithm Based Typography
  • Economic and Development Problems in Sub-Saharan Africa
  • Introduction to Excel: Basic data analytics and tools for efficiency
  • Introduction to R: What is programming and data analytics
  • Reproducible Research: Integrating R, RStudio, GitHub and Markdown into your research


University of Cape Town

  • BUS4053H - Quantitative Finance Project

University of Stellenbosch