Deploying Storm on GCE

In this post I'll describe how to set up and use Storm on Google Compute Engine (GCE). I'll break this down into the following parts:

  • I. Manually Deploying Storm on GCE
  • II. Automated, Script-supported Deployment
  • III. Submitting a Topology to the Cluster
  • IV. Further Reading

Below, I assume you're familiar with GCE basics and have gcutil set up.

Note that all scripts and resources are available via the GitHub repo mhausenblas/storm-on-gce.

I. Manually Deploying Storm on GCE

What follows is a step-by-step walkthrough on how to manually deploy Storm on GCE, based on Michael Noll's post Running a Multi-Node Storm Cluster.

I.A Selecting the Machine Types

The first thing to do is to select appropriate machine types. To make things simple, I've reduced the choice to the following two types, as I want affordable machines with RAM in the 8 GB to 16 GB range:

  • n1-standard-2 with 7.50 GB RAM at 0.207 USD/h
  • n1-highmem-2 with 13 GB RAM at 0.244 USD/h

Given that both types offer 5.50 GCEU of CPU capacity, I'm happy to pay the extra ~4 ¢/h for the n1-highmem-2, which yields almost double the RAM. Note that machines are charged for a minimum of 10 minutes and in 1-minute increments after that.
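
A note for later: the machine type is picked when an instance is provisioned. To the best of my knowledge gcutil's addinstance accepts a --machine_type parameter for this, so the provisioning commands in section I.C could be extended along these lines:

$ gcutil --project=storm-simple addinstance zk --machine_type=n1-highmem-2 --image=projects/centos-cloud/global/images/centos-6-v20131120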

I.B Creating and Authenticating the Project

In the GCE console I created a new project with the ID storm-simple and enabled billing for it (see the Settings menu item on the left-hand side; expect a confirmation mail after a few minutes).

To add and set up the instances I first needed to do some (one-time) authentication:

$ gcutil --project=storm-simple auth

which asks me to open a long URL in the Web browser and then paste the resulting authentication code back into the CLI. Now we're good to go and can build the cluster by adding and configuring instances.

I.C Building the Cluster

To set up the instances I first needed to decide which Linux flavour to use, so I had a look at what's available:

$ gcutil --project=storm-simple listimages

and I decided to go with the CentOS 6 image available at projects/centos-cloud/global/images/centos-6-v20131120.

For the Storm cluster we'll use four instances with the following cluster layout:

192.158.31.148 zk
23.251.128.67  nimbus
192.158.29.83  slave1
23.251.128.41  slave2

Note that these are reserved/static IP addresses and that they will differ in your case. You might want to update your local /etc/hosts with the respective values, for example as shown below.
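
For example, assuming the external IPs from the layout above, the entries could be appended to /etc/hosts on your local machine like so (adjust the addresses to whatever your instances got):

$ sudo sh -c 'cat >> /etc/hosts <<EOF
192.158.31.148 zk
23.251.128.67  nimbus
192.158.29.83  slave1
23.251.128.41  slave2
EOF'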

ZooKeeper Instance Setup

To set up the ZooKeeper (ZK) instance we need to do the following things:

  1. Provision an instance called zk and ssh into it
  2. Install Java 6 (OpenJDK) on the zk instance
  3. Install ZooKeeper
  4. Configure the firewall
  5. Check if everything works

1. Provisioning

$ gcutil --project=storm-simple addinstance zk --image=projects/centos-cloud/global/images/centos-6-v20131120 --service_account_scopes=storage-rw

Whoa! That's a lot to take in at once, so let's dissect it:

  • Pretty much all project-related CLI interactions start out with gcutil --project=storm-simple as this sets the scope to operate in.
  • The addinstance zk unsurprisingly adds an instance with the node name zk to your project.
  • The --image parameter lets you pick a prepared VM image; you can use custom images, though.
  • And last but not least we specify the --service_account_scopes parameter, allowing us to copy files to and from Google Cloud Storage, which is Google's equivalent to AWS S3.

Then, to log into the ZK node, do the following:

$ gcutil --project=storm-simple ssh zk

Later, when we're done, you might want to shut down and remove the instance for good, like so:

$ gcutil --project=storm-simple listinstances
+------+----------------+---------+----------------+----------------+
| name | zone           | status  | network-ip     | external-ip    |
+------+----------------+---------+----------------+----------------+
| zk   | europe-west1-a | RUNNING | 10.240.143.134 | 192.158.31.148 |
+------+----------------+---------+----------------+----------------+

$ gcutil --project=storm-simple deleteinstance zk

Note that strictly speaking you only need the second command, but I typically do listinstances first, just to get a feeling for what is running in this project.

2. Install Java

Once you're logged into the instance with gcutil --project=storm-simple ssh zk you can install OpenJDK like so:

[mhausenblas@zk ~]$ sudo yum install -y java-1.6.0-openjdk.x86_64
[mhausenblas@zk ~]$ export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre
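
Note that the export above only lives for the current shell session; if you want JAVA_HOME to survive re-logins, one option is to append it to your ~/.bashrc:

[mhausenblas@zk ~]$ echo 'export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre' >> ~/.bashrc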

3. Install ZooKeeper

Download the ZooKeeper 3.4.5 tarball and configure it according to this recipe; also make sure that ZK_HOME is set:

  [mhausenblas@zk ~]$ export ZK_HOME=/usr/local/zookeeper/bin
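
For reference, a minimal sketch of that recipe, assuming the tarball from the Apache archive and the shipped sample configuration (single-node ZK listening on client port 2181; the download URL and install paths are my assumptions, not part of the recipe):

[mhausenblas@zk ~]$ cd /tmp
[mhausenblas@zk tmp]$ wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
[mhausenblas@zk tmp]$ cd /usr/local
[mhausenblas@zk local]$ sudo tar xzf /tmp/zookeeper-3.4.5.tar.gz
[mhausenblas@zk local]$ sudo ln -s zookeeper-3.4.5 zookeeper
[mhausenblas@zk local]$ sudo cp zookeeper/conf/zoo_sample.cfg zookeeper/conf/zoo.cfg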

4. Configure Firewalls

By default, ZK will listen on port 2181, so we need to tell GCE that traffic to this port is allowed by configuring a firewall rule for it:

$ gcutil --project=storm-simple addfirewall zk --description="Allow ZK" --allowed=":2181" --print_json

Further, we need to make sure that the built-in firewall doesn't interfere:

[mhausenblas@zk ~]$ sudo service iptables save
[mhausenblas@zk ~]$ sudo service iptables stop

5. Check ZK

Now the ZK instance is ready to be used. Try it out:

[mhausenblas@zk ~]$ $ZK_HOME/zkServer.sh start
[mhausenblas@zk ~]$ $ZK_HOME/zkCli.sh -server localhost:2181

Note that if you're not familiar with ZK and its CLI interface you might want to consult the documentation.
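
For a quick smoke test you can, for example, create and read back a throwaway znode from within the ZK CLI (the znode name below is arbitrary):

[zk: localhost:2181(CONNECTED) 0] ls /
[zk: localhost:2181(CONNECTED) 1] create /smoke-test hello
[zk: localhost:2181(CONNECTED) 2] get /smoke-test
[zk: localhost:2181(CONNECTED) 3] delete /smoke-test
[zk: localhost:2181(CONNECTED) 4] quit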

Nimbus Instance Setup

Follow steps 1 and 2 (provision the instance and install Java) from the ZooKeeper instance setup above. Then we need to install some more Storm dependencies and do some additional configuration, such as creating a dedicated Storm user, on the Nimbus instance:

1. Install ZeroMQ and JZMQ

Do the following:

[mhausenblas@nimbus ~]$ cd /tmp
[mhausenblas@nimbus tmp]$ wget https://github.com/downloads/saltstack/salt/zeromq-2.1.7-1.el6.x86_64.rpm
[mhausenblas@nimbus tmp]$ sudo yum install -y zeromq-2.1.7-1.el6.x86_64.rpm
[mhausenblas@nimbus tmp]$ wget https://s3.amazonaws.com/cdn.michael-noll.com/rpms/jzmq-2.1.0.el6.x86_64.rpm
[mhausenblas@nimbus tmp]$ sudo yum install -y jzmq-2.1.0.el6.x86_64.rpm

2. Add Storm User

Do the following:

[mhausenblas@nimbus ~]$ sudo groupadd -g 53001 storm
[mhausenblas@nimbus ~]$ sudo mkdir -p /app/home
[mhausenblas@nimbus ~]$ sudo useradd -u 53001 -g 53001 -d /app/home/storm -s /bin/bash storm -c "Storm service account"
[mhausenblas@nimbus ~]$ sudo chmod 700 /app/home/storm
[mhausenblas@nimbus ~]$ sudo chage -I -1 -E -1 -m -1 -M -1 -W -1 storm

3. Install Storm

Do the following:

[mhausenblas@nimbus ~]$ cd /tmp
[mhausenblas@nimbus ~]$ curl -O https://dl.dropboxusercontent.com/s/dj86w8ojecgsam7/storm-0.9.0.1.zip
[mhausenblas@nimbus ~]$ cd /usr/local
[mhausenblas@nimbus ~]$ sudo unzip /tmp/storm-0.9.0.1.zip
[mhausenblas@nimbus ~]$ sudo chown -R storm:storm storm-0.9.0.1
[mhausenblas@nimbus ~]$ sudo ln -s storm-0.9.0.1 storm
[mhausenblas@nimbus ~]$ sudo mkdir -p /app/storm
[mhausenblas@nimbus ~]$ sudo chown -R storm:storm /app/storm
[mhausenblas@nimbus ~]$ sudo chmod 750 /app/storm

And finally, edit /usr/local/storm/conf/storm.yaml so that it contains the following:

storm.zookeeper.servers:
    - "zk"

nimbus.host: "nimbus"
nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"

ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"

supervisor.childopts: "-Djava.net.preferIPv4Stack=true"
worker.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"

storm.local.dir: "/app/storm"
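
The troubleshooting section below refers to $STORM_HOME; since the release is symlinked to /usr/local/storm, it's handy (though not part of the original recipe) to export that too, for instance in the storm user's ~/.bashrc:

[storm@nimbus ~]$ export STORM_HOME=/usr/local/storm
[storm@nimbus ~]$ echo 'export STORM_HOME=/usr/local/storm' >> ~/.bashrc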

4. Configure Firewalls

By default, Nimbus will listen on port 6627 and the Web UI is on port 8080:

$ gcutil --project=storm-simple addfirewall nimbus --description="Allow Nimbus" --allowed=":6627" --print_json 
$ gcutil --project=storm-simple addfirewall nimbusui --description="Allow Nimbus UI" --allowed=":8080" --print_json

And again, we need to make sure that the built-in firewall doesn't interfere:

[mhausenblas@nimbus ~]$ sudo service iptables save
[mhausenblas@nimbus ~]$ sudo service iptables stop

5. Check Nimbus

A little test drive to see if everything works out fine:

[mhausenblas@nimbus ~]$ sudo su - storm
[storm@nimbus ~]$ cd /usr/local/storm
[storm@nimbus storm]$ bin/storm nimbus &
[storm@nimbus storm]$ bin/storm ui &
[storm@nimbus storm]$ ps -ef | grep storm

Note that the ps command should list two Storm daemons, one for backtype.storm.daemon.nimbus (the Nimbus daemon itself) and one for backtype.storm.ui.core (the Web UI).

Now we can also have a look at the Nimbus WebUI at http://nimbus:8080, which should look as follows:

Nimbus WebUI screen shot


Slave Instances Setup

Follow steps 1 to 3 (up to and including the Storm installation) from the Nimbus setup above for each of the slave nodes. Then, in order to launch a Supervisor on a slave instance, do the following:

1. Check if Slave works fine

[mhausenblas@slave1 ~]$ sudo su - storm
[storm@slave1 ~]$ cd /usr/local/storm
[storm@slave1 storm]$ bin/storm supervisor

To make things easier, you can use the Storm launch scripts available via Google Cloud Storage (see gs://mhausenblas_storm/cluster). After doing gsutil config once per instance, you can simply copy them to a local directory on the instance:

[mhausenblas@nimbus ~]$ sudo su - storm
[storm@nimbus ~]$ mkdir ~/cluster
[storm@nimbus ~]$ gsutil cp gs://mhausenblas_storm/cluster* ~/cluster
[storm@nimbus ~]$ chmod 755 ~/cluster/*

Note that if you're unfamiliar with gsutil you might want to consult the documentation.
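
In case you prefer to roll your own rather than pulling from my bucket, such a launch script doesn't need to be more than a thin wrapper around the storm binary; a hypothetical sketch of cluster_launch_supervisor.sh (the actual scripts in gs://mhausenblas_storm/cluster may differ):

#!/bin/bash
# Hypothetical sketch of cluster_launch_supervisor.sh -- the actual script may differ.
# Starts the Storm Supervisor daemon in the background and logs its output.
cd /usr/local/storm
nohup bin/storm supervisor > /app/home/storm/supervisor.out 2>&1 &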

When both slave instances are added and the Supervisors are running, head over to http://nimbus:8080 again, which should look as follows:

Supervisor WebUI screen shot

OK, we've successfully set up all the instances and the Storm cluster is ready to be used now!

Troubleshooting

If things go south, check out the following log files:

  • Nimbus: $STORM_HOME/logs/nimbus.log
  • UI: $STORM_HOME/logs/ui.log
  • Supervisor: $STORM_HOME/logs/supervisor.log

Also, make sure that the processes are up and running using ps -ef | grep xxx and are listening on their respective ports with lsof -i TCP:YYYY. See also the Storm Troubleshooting guide for more details.
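
Concretely, on the Nimbus instance that could look like this, using the ports configured above (you may have to install lsof first via sudo yum install -y lsof):

[mhausenblas@nimbus ~]$ ps -ef | grep backtype.storm
[mhausenblas@nimbus ~]$ sudo lsof -i TCP:6627
[mhausenblas@nimbus ~]$ sudo lsof -i TCP:8080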


II. Automated, Script-supported Deployment

For an automated, script-supported deployment I took the following repositories and recipes as inspiration:

Simply create a GCE project in the GCE console (note: the project ID defaults to storm-simple), enable billing for it (see the Settings menu item on the left-hand side), and authenticate yourself. If unsure, consult section I.B above.

Then you can launch the following script, which provisions the four instances and installs the respective software on each, depending on its role (ZooKeeper, Nimbus, slaves):

./install_storm_on_gce.sh

After this you can ssh into any instance of the four-node cluster, for example the nimbus instance:

$ gcutil --project=storm-simple ssh nimbus

Once in the instance, you need to start the respective services (the setup script also downloads the necessary scripts into ~/cluster):

[mhausenblas@zk ~]$ cluster/cluster_launch_zk.sh

[mhausenblas@nimbus ~]$ cluster/cluster_launch_nimbus.sh
[mhausenblas@nimbus ~]$ cluster/cluster_launch_nimbus_ui.sh

[mhausenblas@slave1 ~]$ cluster/cluster_launch_supervisor.sh

[mhausenblas@slave2 ~]$ cluster/cluster_launch_supervisor.sh

And finally, you can also have a look at the Nimbus WebUI at http://nimbus:8080, which should look something like this:

Nimbus WebUI screen shot


III. Submitting a Topology to the Cluster

To submit a topology we first need to prep the machine we submit it from. I assume you've checked out the sources into a directory ~/storm-on-gce; then you'd do the following:

$ mkdir ~/.storm
$ cp ~/storm-on-gce/templates/template_storm.yaml ~/.storm/storm.yaml
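
In essence, this file tells the local Storm client where to find the cluster; assuming the hostnames used throughout this post, the copied ~/.storm/storm.yaml would contain something along these lines:

nimbus.host: "nimbus"
storm.zookeeper.servers:
    - "zk"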

With this, and assuming you've updated your local /etc/hosts with the respective values for your instances, the local machine is aware of the Storm cluster and ready to receive a topology. We will use nathanmarz/storm-starter for a test drive:

$ cd /tmp
$ git clone git@github.com:nathanmarz/storm-starter.git
$ cd storm-starter
$ lein deps
$ lein compile
$ lein uberjar
$ cd target/
$ ls -al *.jar
-rw-r--r--  1 mhausenblas  staff  718162 30 Dec 05:18 storm-starter-0.0.1-SNAPSHOT-standalone.jar

And this last JAR file, storm-starter-0.0.1-SNAPSHOT-standalone.jar, is exactly what we need for our test drive:

$ storm jar target/storm-starter-0.0.1-SNAPSHOT-standalone.jar storm.starter.ExclamationTopology exclamation-topology
...
0    [main] INFO  backtype.storm.StormSubmitter  - Jar not uploaded to master yet. Submitting jar...
73   [main] INFO  backtype.storm.StormSubmitter  - Uploading topology jar target/storm-starter-0.0.1-SNAPSHOT-standalone.jar to assigned location: /app/storm/nimbus/inbox/stormjar-ef3376c5-85d4-4a65-bbde-a266f17609d1.jar
2973 [main] INFO  backtype.storm.StormSubmitter  - Successfully uploaded topology jar to assigned location: /app/storm/nimbus/inbox/stormjar-ef3376c5-85d4-4a65-bbde-a266f17609d1.jar
2973 [main] INFO  backtype.storm.StormSubmitter  - Submitting topology exclamation-topology in distributed mode with conf {"topology.workers":3,"topology.debug":true}
3256 [main] INFO  backtype.storm.StormSubmitter  - Finished submitting topology: exclamation-topology

To see if it works, check out http://nimbus:8080 again, which should look as follows:

Exclamation Topology screen shot

Alternatively, you can simply check on the command line, like so:

$ storm list
0    [main] INFO  backtype.storm.thrift  - Connecting to Nimbus at nimbus:6627
Topology_name        Status     Num_tasks  Num_workers  Uptime_secs
-------------------------------------------------------------------
exclamation-topology ACTIVE     18         3            401
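
When you're done with the test drive you can remove the topology again; the standard Storm CLI has a kill command for that:

$ storm kill exclamation-topology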

And now have fun writing and submitting your own topologies!


IV. Further Reading

You might want to check out the following resources:
