Getting Started

First, install git, Maven 3 and Eclipse on your machine. On Debian-based systems such as Ubuntu:

sudo apt-get install git maven eclipse

Next, clone the skeleton for the exercises:

git clone https://github.com/stratosphere/bigdataclass.org.git
cd bigdataclass.org

Inside this folder there is one subfolder per exercise. Currently, exercise1 contains the Java exercise and exercise2 contains the Scala exercise. Each exercise is a self-contained Maven project that you can import into Eclipse using the "Import -> Import as Maven Project" menu. The import can take a while, because Maven downloads all the dependencies.

Before starting to work on the exercises, you will need the following plugins:

Eclipse 4.x:

Eclipse 3.7:

Stratosphere Introduction

Stratosphere Intro (Java and Scala Interface) from Robert Metzger

FAQ

I'm getting an OutOfMemoryError on Mac OS X

It seems that OS X does not allocate enough memory for Stratosphere's LocalExecutor. Open the Run Configuration (via the drop-down menu to the right of the "Run" button) and add -Xms400m -Xmx800m to the VM arguments on the Arguments tab. Note: this memory issue has been fixed in later releases (0.6-SNAPSHOT).
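To check whether the VM arguments were actually picked up, a small sketch like the one below prints the heap limits of the running JVM. HeapCheck is a hypothetical helper, not part of the exercises; it only uses the standard java.lang.Runtime API. Run it with the same VM arguments as the exercise: the reported maximum heap should be close to, and possibly slightly below, 800 MiB, since the JVM may exclude some internal space from maxMemory().

```java
// HeapCheck.java (hypothetical helper, not part of the exercises):
// prints the heap limits of the running JVM so you can confirm that
// -Xms400m -Xmx800m from the Run Configuration took effect.
class HeapCheck {

    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will try to use (roughly the -Xmx value), in MiB. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently reserved by the JVM (starts roughly at -Xms), in MiB. */
    static long committedHeapMb() {
        return Runtime.getRuntime().totalMemory() / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB):       " + maxHeapMb());
        System.out.println("committed heap (MiB): " + committedHeapMb());
    }
}
```

Running it once without the extra arguments shows the JVM's default limit, which makes the effect of -Xmx800m easy to compare.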

How can I generate more test data?

We have written a small tool that converts a Wikipedia dump into the file format used in the BigDataClass. Remember to change the document count in the util class before running it with more documents.


