AWS

Scaling H2O analytics with AWS and p(f)urrr (Part 3)

H2O + AWS + purrr (Part III) This is the final installment of a three part series that looks at how we can leverage AWS, H2O and purrr in R to build analytical pipelines. In the previous posts I looked at starting up the environment through the EC2 dashboard on AWS’ website. The other aspect we looked at, in Part II, was how we can use purrr to train models using H2O’s awesome api.

Scaling H2O analytics with AWS and p(f)urrr (Part 2)

H2O + AWS + purrr (Part II) This is the second installment in a three part series on integrating H2O, AWS and p(f)urrr. In Part II, I will showcase how we can combine purrr and h2o to train and stack ML models. In the first post we looked at starting up an AMI on AWS which acts as the infrastructure upon which we will model. Part one of the post can be found here

Scaling H2O analytics with AWS and p(f)urrr (Part 1)

H2O + AWS + purrr (Part I) In these small tutorials to follow over the next 3 weeks, I go through the steps of using an AWS1 AMI Rstudio instance to run a toy machine learning example on a large AWS instance. I have to admit that you have to know a little bit about AWS to follow the next couple of steps, but be assured it doesn’t take too much googling to find your way if you get confused at any stage by some of the jargon.

Interacting with AWS from R

Getting set up If there is one realisation in life, it is the fact that you will never have enough CPU or RAM available for your analytics. Luckily for us, cloud computing is becoming cheaper and cheaper each year. One of the more established providers of cloud services is AWS. If you don’t know yet, they provide a free, yes free, option. Their t2.micro instance is a 1 CPU, 500MB machine, which doesn’t sound like much, but I am running a Rstudio and Docker instance on one of these for a small project.