First, install Git, Maven 3 and Eclipse on your machine. On Debian or Ubuntu, for example:
sudo apt-get install git maven eclipse
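To confirm that the tools ended up on the PATH, a small check along these lines may help. It is only a sketch: check_tool is a made-up helper name, and it tests only that a command exists, not its version (running mvn -version shows whether Maven 3 is installed).

```shell
# check_tool is a hypothetical helper, not part of the exercise skeleton.
# It returns 0 if the named command is found on the PATH, 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# The three tools named in the install step.
for tool in git mvn eclipse; do
    check_tool "$tool" || true
done
```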
Next, clone the skeleton for the exercises:
git clone https://github.com/stratosphere/bigdataclass.org.git
cd bigdataclass.org
Inside the folder there is one subfolder per exercise. Right now there is exercise1, which contains the Java exercise, and exercise2, which contains the Scala exercise. Each exercise is a self-contained Maven project that you can import into Eclipse using the "Import -> Import as Maven Project" menu.
This can take a while, as Maven downloads all the dependencies.
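Because each exercise is an ordinary Maven project, the dependency download can also be triggered from a terminal before opening Eclipse. The following is a sketch under assumptions: it expects to be run from inside the cloned bigdataclass.org folder, list_maven_projects is a hypothetical helper name, and it assumes that each exercise folder has a pom.xml at its top level.

```shell
# list_maven_projects is a hypothetical helper: it prints every direct
# subdirectory of the given folder that contains a pom.xml.
list_maven_projects() {
    for dir in "$1"/*/; do
        if [ -f "${dir}pom.xml" ]; then
            basename "$dir"
        fi
    done
}

# Assumes the current directory is the cloned bigdataclass.org folder.
for project in $(list_maven_projects .); do
    echo "building $project"
    # 'mvn compile' resolves the dependencies and compiles the sources.
    # It is skipped when Maven is not installed.
    if command -v mvn >/dev/null 2>&1; then
        (cd "$project" && mvn compile) || echo "build failed: $project"
    fi
done
```

If a build fails here, the same error will most likely show up in Eclipse after the import, so this is a quick way to separate Maven problems from Eclipse problems.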
Before starting to work on the exercises, you will need the following plugins:
Eclipse 4.x:
Eclipse 3.7:
I'm getting an OutOfMemoryException on Mac OS X
It seems that OS X does not allocate enough memory for Stratosphere's LocalExecutor.
Open the Run Configuration (the drop-down menu to the right of the "Run" button) and add the following to the JVM arguments:
-Xms400m -Xmx800m
Note: The memory issue has been fixed in later releases (0.6-SNAPSHOT).
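For reference, -Xms sets the initial heap size of the JVM and -Xmx sets the maximum heap size. To see what these flags resolve to outside Eclipse, something like the following can be run in a terminal. It is a sketch that assumes java is on the PATH and that the JVM is HotSpot-based (the -XX:+PrintFlagsFinal option is HotSpot-specific); the variable name JVM_ARGS is only illustrative.

```shell
# The flags suggested for the Run Configuration; JVM_ARGS is an illustrative name.
JVM_ARGS="-Xms400m -Xmx800m"

if command -v java >/dev/null 2>&1; then
    # Print the JVM's resolved flags and keep only the heap size lines.
    # $JVM_ARGS is intentionally unquoted so it splits into two arguments.
    java $JVM_ARGS -XX:+PrintFlagsFinal -version 2>/dev/null \
        | grep -E 'InitialHeapSize|MaxHeapSize' || true
else
    echo "java not found on PATH"
fi
```

The reported values are in bytes. If they do not reflect the requested sizes, the flags are probably not reaching the JVM.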
How can I generate more test data?
We have written a little tool to convert a Wikipedia dump into the file format used in the BigDataClass. Remember to change the document count in the util class before running with more documents.