
How to build FS-C

Introduction

This page describes how to build FS-C, e.g. for extending or patching it.

Prerequisites

Environment Variables

  • HADOOP_ROOT - Path to Hadoop installation
  • PIG_ROOT - Path to PIG installation

Initial installation

Scala 2.9.0 and sbt (plus their dependencies) have to be installed before fs-c can be compiled. Here is a possible setup process:

wget http://downloads.typesafe.com/typesafe-stack/1.0.1/typesafe-installer-1.0.1.jar
java -jar typesafe-installer-1.0.1.jar
cd $HOME/bin
wget http://simple-build-tool.googlecode.com/files/sbt-launch-0.7.7.jar
Create $HOME/bin/sbt with the following contents:
java -Xmx512M -jar `dirname $0`/sbt-launch-0.7.7.jar "$@"
chmod u+x $HOME/bin/sbt
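
To verify the toolchain, the fs-c sources can be fetched and compiled. A possible check, assuming the main fs-c repository on GitHub as the source:

git clone git://github.com/dmeister/fs-c.git
cd fs-c
sbt update
sbt compile

sbt update downloads the declared dependencies before the first compile.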

Details

  • sbt package - builds the fs-c jar file
  • sbt compile - compiles the sources
  • sbt test - compiles and runs the test cases

If sbt is called without a target, it is started in interactive mode, so that multiple targets can be called without restarting a JVM.
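
An interactive session might look like this:

$ sbt
> compile
> test
> package
> quit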

  • ./project/build/release.py creates a new fs-c release tar file.
  • ./project/build/set_version.py updates the version information in all fs-c source files.
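
A release might then be cut as follows; the version argument passed to set_version.py is a hypothetical example, so check the script itself for its exact interface:

./project/build/set_version.py 0.3.10
./project/build/release.py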

Notes on Pig and Hadoop

Hadoop and Pig are used for the distributed analysis of the trace files. We use Hadoop 0.20 and Pig 0.7, but other versions should also work.

Hadoop is usually set up as a cluster of multiple machines. A setup manual can be found in the Hadoop documentation.

Two configuration settings must be adjusted in the Pig installation: fs.default.name must be set to the Hadoop name node, e.g. hdfs://namenode1.pc2.de:9000/, and mapred.job.tracker must be set to the job tracker, e.g. jobtracker1.pc2.de:9001.
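
Assuming a standard Pig layout, both settings can be placed in $PIG_ROOT/conf/pig.properties; the host names below are the examples from above:

fs.default.name=hdfs://namenode1.pc2.de:9000/
mapred.job.tracker=jobtracker1.pc2.de:9001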

If Hadoop runs without replication, the node on which fs-c import is called should not be a Hadoop data node: due to limitations of the data placement algorithm used by Hadoop, all imported data would then be stored on the local machine, even if multiple data nodes are available.
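
For reference, the replication factor is controlled by the dfs.replication property in conf/hdfs-site.xml of the Hadoop installation (Hadoop 0.20 layout); a value of 1 disables replication:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>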