RGhost 0.8.8 with ruby-2.0-preview1

RGhost 0.8.8 has been released; it is compatible with ruby-2.0-preview1.

Patch for Nginx Upload Module

This is a simple patch that adds PUT support to the original nginx-upload-module.

Compiling and Installing

Nginx

curl -LO http://nginx.org/download/nginx-1.3.4.tar.gz
tar xf nginx-1.3.4.tar.gz

Upload module (with PUT support)

git clone git://github.com/shairontoledo/ngx-upload-module.git nginx-upload-module

Compiling

cd nginx-1.3.4

On a Mac, install PCRE first (brew install pcre) and add these options to ./configure:

--with-pcre --with-cc-opt=-I/usr/local/include --with-ld-opt=-L/usr/local/lib

export NGX_PREFIX=/path/to/install

./configure --prefix=$NGX_PREFIX \
  --add-module=../nginx-upload-module  \
  --with-debug \
  --with-http_stub_status_module \
  --with-http_flv_module \
  --with-http_ssl_module \
  --with-http_dav_module \
  --with-http_gzip_static_module \
  --with-http_realip_module \
  --with-mail \
  --with-mail_ssl_module \
  --with-ipv6

make && make install

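Prepare directories

With upload_store /tmp/uploads 1 (used in the configuration below), the module spreads uploads over one level of subdirectories named 0 to 9, which must exist before nginx starts:

for ((i=0;i<10;i++)); do sudo mkdir -p /tmp/uploads/$i; done
sudo chmod -R 777 /tmp/uploads # set permissions properly for your system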

Sample nginx.conf with Unicorn for Rails

worker_processes  1;

error_log  logs/error.log  debug;

events {
    worker_connections  1024;
}

http {

    upstream upstream_upload {
        server localhost:3000;
    }

    include       mime.types;
    default_type  application/octet-stream;

    sendfile        on;

    keepalive_timeout  65;

    #gzip  on;

    server {
        listen       80;
        server_name  localhost;

        location / {
            root   html;
            index  index.html index.htm;
        }

        location @unicorn_for_upload {
            proxy_pass http://upstream_upload;
        }

        location /files {

            # emulate try_files: serve existing static files directly
            if (-f $request_filename) {
                break;
            }

            if (-f $request_filename/index.html) {
                rewrite (.*) $1/index.html break;
            }

            if (-f $request_filename.html) {
                rewrite (.*) $1.html break;
            }

            # client size
            client_max_body_size 5632m;
            client_body_timeout 120;
            client_body_buffer_size 1m;
            send_timeout 120;

            #end point settings
            proxy_read_timeout 120;
            proxy_max_temp_file_size 6144m;
            proxy_send_timeout 120;

            # Plain form posts (no file) go straight to the web app
            if ($http_content_type = 'application/x-www-form-urlencoded') {
                proxy_pass http://upstream_upload;
                break;
            }

            if ($request_method ~* "POST|PUT") {
                upload_store_access user:rw group:rw all:rw;
                upload_store /tmp/uploads 1;
                upload_max_file_size 6G;
                upload_cleanup 400 404 499 500-505;

                upload_aggregate_form_field "file_entry[filename]" "$upload_file_name";
                upload_aggregate_form_field "file_entry[size]" "$upload_file_size";
                upload_aggregate_form_field "file_entry[file_path]" "$upload_tmp_path";
                upload_aggregate_form_field "file_entry[type]" "$upload_content_type";
                upload_aggregate_form_field "file_entry[md5]" "$upload_file_md5";
                upload_pass_form_field ".*";
                upload_pass_args on;
                upload_pass @unicorn_for_upload;
                break;
            }

            if (!-f $request_filename) {
                proxy_pass http://upstream_upload;
                break;
            }
        }

        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }
    }
}
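On the Rails side, nginx-upload-module stores the file itself and passes only the aggregated fields configured above, so the action receives params["file_entry"] with filename, size, file_path, type and md5. The following minimal Ruby sketch shows one way to consume them; store_upload and the destination directory are hypothetical (not part of the patch or of Rails), and params is assumed to behave like a plain Hash with string keys. The file is moved out of /tmp/uploads because, with upload_cleanup as configured, nginx removes stored files only when the backend answers with one of the listed error statuses.

```ruby
require "digest/md5"
require "fileutils"

# Hypothetical helper: moves a file that nginx-upload-module already stored
# under /tmp/uploads/<digit>/ to a permanent directory, using the fields
# added by the upload_aggregate_form_field directives.
def store_upload(params, dest_dir)
  entry = params.fetch("file_entry")
  tmp_path = entry.fetch("file_path")   # $upload_tmp_path

  # Compare with the MD5 nginx computed ($upload_file_md5, lowercase hex)
  actual_md5 = Digest::MD5.file(tmp_path).hexdigest
  unless actual_md5 == entry.fetch("md5")
    raise ArgumentError, "MD5 mismatch for #{entry['filename']}"
  end

  FileUtils.mkdir_p(dest_dir)
  # Keep only the base name so a crafted filename cannot escape dest_dir
  target = File.join(dest_dir, File.basename(entry.fetch("filename")))
  FileUtils.mv(tmp_path, target)
  { "path" => target, "size" => entry["size"].to_i, "type" => entry["type"] }
end
```

In a controller action you would pass params and a destination directory of your choice.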

Testing through curl

curl -XPUT localhost/files -F file=@localfile.txt
curl -XPOST localhost/files -F file=@localfile.txt

Parsing JSON to model in Play! Framework

First of all, use Jerkson directly (it is already included in the Play! classpath) instead of Play!’s native wrapper, play.api.libs.json.

import com.codahale.jerkson.{Json => json}

Now define the Invoice model with Person and Document

case class Person(name: String)

case class Document(name: String, size: Int)

case class Invoice(
  code: String,
  person: Option[Person],
  documents: Option[List[Document]] )

Option[Type] marks person and documents as optional. If you declare just person: Person and the person entry is missing from the JSON, parsing raises an exception; wrapping the field in Option avoids that.

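In the action, use the raw body parser (RawBuffer) instead of the JSON parser, since we want the raw bytes.

def create = Action(parse.raw) { request =>
  // read the raw body bytes and turn them into a String (platform default charset)
  val body = new String(request.body.asBytes().get)
  // Jerkson maps the JSON fields onto the Invoice case class
  val invoice = json.parse[Invoice](body)
  Ok("yes")
}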

Test it with curl; the JSON structure maps directly onto the model.

curl -XPOST http://localhost:9000/invoices -d '
{
  "code": "6003a1bd7b3c59f69f",
  "person": {
    "name": "John Smith"
  },
  "documents": [
    {
      "name": "receipt A 2012.pdf",
      "size": 5555
    },
    {
      "name": "contract.doc",
      "size": 4444444
    }
  ]
}
'

You can improve the code by reading the charset from the request and using it to decode the body bytes.
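Sketched in Ruby rather than Scala (so this is not Play! code), the idea looks like this: read the charset parameter from the Content-Type header, fall back to UTF-8 when none is sent (an assumption, not a Play! default), and decode the raw bytes with it before parsing. decode_body is a hypothetical helper name.

```ruby
require "json"

# Hypothetical helper: decode a raw request body using the charset declared
# in the Content-Type header, defaulting to UTF-8 when none is given.
def decode_body(raw_bytes, content_type)
  charset = content_type.to_s[/charset=["']?([\w.:-]+)/i, 1] || "UTF-8"
  encoding = (Encoding.find(charset) rescue Encoding::UTF_8)  # unknown charset -> UTF-8
  raw_bytes.dup.force_encoding(encoding).encode(Encoding::UTF_8)
end

# Bytes as they would arrive from a client sending ISO-8859-1
raw = '{"code":"1","person":{"name":"José"}}'.encode("ISO-8859-1").b
body = decode_body(raw, "application/json; charset=ISO-8859-1")
invoice = JSON.parse(body)
puts invoice["person"]["name"]
```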

Posting JSON data to Rails Application

I’ve been playing around with a lot of RESTful requests to Riak, Elasticsearch, MongoDB and Rails. In my scripts I’ve used curb, cURL or even jQuery.

As you may know, Rails exposes request data inside an action as a hash, aka params. Let’s start with some background before talking about posting JSON.

Basic scenario

Say you have a model called Product and want to create a new product with some fields. The field names must be formatted so Rails can build a Ruby Hash from them (or you can use the Rails form helpers); the usual format is model[field]. In an HTML form that would be:

<form action="index" method="POST">
  <input type="text" name="product[name]" value="Responsive Web Design with HTML5 and CSS3"/>
  <input type="text" name="product[sku]" value="1849693188"/>
  <input type="text" name="product[publisher]" value="Packt Publishing (April 10, 2012)"/>
</form>

When you post a form without specifying the enctype attribute on the form tag, the default content type is application/x-www-form-urlencoded. Behind the scenes, the content type is sent as an HTTP header, and in the HTTP body each field name and value is escaped, joined with =, and the pairs are concatenated with &:

product%5Bname%5D=Responsive+Web+Design+with+HTML5+and+CSS3&product%5Bsku%5D=1849693188&product%5Bpublisher%5D=Packt+Publishing+%28April+10%2C+2012%29
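Ruby’s standard library produces this same encoding, which is a quick way to reproduce the body above. A short sketch with URI.encode_www_form:

```ruby
require "uri"

# Field names use the model[field] convention Rails expects
fields = [
  ["product[name]", "Responsive Web Design with HTML5 and CSS3"],
  ["product[sku]", "1849693188"],
  ["product[publisher]", "Packt Publishing (April 10, 2012)"]
]

# Escapes each name and value (space becomes "+", brackets become %5B/%5D),
# joins name and value with "=" and the pairs with "&"
body = URI.encode_www_form(fields)
puts body
# => product%5Bname%5D=Responsive+Web+Design+with+HTML5+and+CSS3&product%5Bsku%5D=1849693188&product%5Bpublisher%5D=Packt+Publishing+%28April+10%2C+2012%29
```

URI.decode_www_form(body) reverses it and returns the original name/value pairs.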

Using curl to post the same fields (note that curl -d sends the string exactly as typed, without escaping it; use --data-urlencode if you want curl to escape values):

curl "localhost:3000/mytest" -d "product[name]=Responsive Web Design with HTML5 and CSS3&product[sku]=1849693188&product[publisher]=Packt Publishing (April 10, 2012)"

The raw HTTP looks like this

POST /mytest HTTP/1.1
User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
Host: localhost:3000
Accept: */*
Content-Length: 132
Content-Type: application/x-www-form-urlencoded

product[name]=Responsive Web Design with HTML5 and CSS3&product[sku]=1849693188&product[publisher]=Packt Publishing (April 10, 2012)

On the Rails side, the data arrives in params:

{
  "product"=> {
    "publisher"=>"Packt Publishing (April 10, 2012)",
      "name"=>"Responsive Web Design with HTML5 and CSS3",
      "sku"=>"1849693188"
  }
}
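Rails delegates this parsing to Rack. The following is a simplified, hypothetical re-implementation (nested_params is not a Rails or Rack method) that handles only one level of model[field] nesting, just to show how the encoded body turns into the hash above:

```ruby
require "uri"

# Hypothetical helper: decode an urlencoded body and group "model[field]"
# keys under the model name. Only one level of nesting is supported.
def nested_params(body)
  URI.decode_www_form(body).each_with_object({}) do |(key, value), params|
    if (m = key.match(/\A([^\[\]]+)\[([^\[\]]+)\]\z/))
      (params[m[1]] ||= {})[m[2]] = value   # "product[sku]" -> params["product"]["sku"]
    else
      params[key] = value                   # plain keys stay at the top level
    end
  end
end

body = "product%5Bname%5D=Responsive+Web+Design+with+HTML5+and+CSS3" \
       "&product%5Bsku%5D=1849693188" \
       "&product%5Bpublisher%5D=Packt+Publishing+%28April+10%2C+2012%29"
p nested_params(body)
```

Rack’s real parser also handles deeper nesting and arrays (product[tags][]), which this sketch deliberately leaves out.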

Posting as JSON

Only two things are needed to post data as JSON: set the Content-Type header to application/json and, of course, send the data in JSON format. For example, the same product data:

curl -XPOST -H "Content-Type: application/json" "localhost:3000/mytest" -d '
{
  "product": {
    "publisher": "Packt Publishing (April 10, 2012)",
    "name": "Responsive Web Design with HTML5 and CSS3",
    "sku": "1849693188"
  }
}'
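The same request can be built from Ruby with the standard library’s Net::HTTP (used here instead of curb). This sketch assumes the app from the curl example at localhost:3000/mytest and only builds the request, leaving the network call commented out:

```ruby
require "json"
require "net/http"
require "uri"

# Assumption: the Rails app from the curl example listens on localhost:3000
uri = URI("http://localhost:3000/mytest")

payload = {
  "product" => {
    "publisher" => "Packt Publishing (April 10, 2012)",
    "name" => "Responsive Web Design with HTML5 and CSS3",
    "sku" => "1849693188"
  }
}

request = Net::HTTP::Post.new(uri.request_uri)
request["Content-Type"] = "application/json"  # the header that makes Rails parse the body as JSON
request.body = JSON.generate(payload)         # the JSON document itself

# Sending needs a running server, so it is left commented out here:
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# puts response.code
```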

I find this very useful when you have a tool that serializes a model to JSON, and also for testing: it’s more readable and avoids unnecessary conversions.

You can choose what’s easier for you.

JLine is not loading in maven-scala-plugin with scala-console

For some unknown reason, maven-scala-plugin does not load scala:console properly. After compiling my app and running

mvn scala:console

It raises

Failed to created JLineReader: java.lang.NoClassDefFoundError: scala/tools/jline/console/completer/Completer
Falling back to SimpleReader.
Welcome to Scala version 2.9.1.final (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_29).
Type in expressions to have them evaluated.
Type :help for more information.

scala>

Even though the application context works, the side effect is that you can neither recall previous commands with the up-arrow key nor get tab autocompletion.

I fixed it by simply adding the jline dependency to my pom.xml:

<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>jline</artifactId>
  <version>2.9.0-1</version>
</dependency>

It’s not elegant, but it works. I’m still looking for an environment-wide fix instead of a per-project one.

Indexing files with Scala and Elasticsearch

While load testing Elasticsearch, I wrote some simple Scala code to walk a directory tree recursively and index the files into Elasticsearch.

The code uses the Java Mime Magic Library as a helper to detect each file’s description.

Let’s get started by installing Elasticsearch.

Installing and starting Elasticsearch

curl -O http://cloud.github.com/downloads/elasticsearch/elasticsearch/elasticsearch-0.18.7.tar.gz
tar zxf elasticsearch-0.18.7.tar.gz
cd elasticsearch-0.18.7
bin/elasticsearch -f

Drop -f if you don’t want to run it in the foreground.

Scala code

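Our Scala code has three functions: one to list all files recursively,

def fetchFiles(path:String)(op:File => Unit){
  // listFiles returns null for unreadable directories, so guard against it
  for (file <- Option(new File(path).listFiles).getOrElse(Array.empty[File]) if !file.isHidden){
    op(file)
    if (file.isDirectory){
      fetchFiles(file.getAbsolutePath)(op)
    }
  }
}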
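For comparison, the same traversal can be written in Ruby with the standard Find module. This is a sketch, not part of the indexer: like fetchFiles it visits both files and directories and skips hidden entries without descending into them; "hidden" here means the name starts with a dot, which is what File.isHidden means on Unix-like systems.

```ruby
require "find"

# Yields every non-hidden file and directory below root (root itself excluded),
# pruning hidden directories so their contents are skipped too.
def fetch_files(root)
  return enum_for(:fetch_files, root) unless block_given?
  Find.find(root) do |path|
    next if path == root
    if File.basename(path).start_with?(".")
      Find.prune            # skip this entry and, for directories, everything below it
    else
      yield path
    end
  end
end
```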

one to build the JSON document for a file,

def document(file:File) = {

  val json = jsonBuilder.startObject
  .field("name",file.getName)
  .field("parent",file.getParentFile.getAbsolutePath)
  .field("path",file.getAbsolutePath)
  .field("last_modified",new Date(file.lastModified))
  .field("size",file.length)
  .field("is_directory", file.isDirectory)

  if (file.isFile) {
    try{
      val m = Magic.getMagicMatch(file, true)
      json.field("description",m.getDescription)
      .field("extension",m.getExtension)
      .field("mimetype",m.getMimeType)
    }catch {
      case _ => json.field("description","unknown")
        .field("extension",file.getName.split("\\.").last.toLowerCase)
        .field("mimetype","application/octet-stream")
    }
  }
  json.endObject
}

Only regular files are passed to Magic detection, with a fallback in case the detector fails to parse the file. The result is the document to be indexed:

  {
      "name": "pragmatic-guide-to-git_p1_0.pdf",
      "parent": "/Users/shairon/Reference",
      "path": "/Users/shairon/Reference/pragmatic-guide-to-git_p1_0.pdf",
      "last_modified": "2010-11-26T18:55:43.000Z",
      "size": 1358963,
      "is_directory": false,
      "description": "PDF document",
      "extension": "pdf",
      "mimetype": "application/pdf"
  }

And finally, main:

def main(args: Array[String]) = {
    val dir = new File(args(0))
    if (!dir.exists || dir.isFile || dir.isHidden) {
      printf("Directory not found %s\n",dir)
      System.exit(1)
    }

    val client = new TransportClient()
    client addTransportAddress(
      new InetSocketTransportAddress("0.0.0.0",9300)
    )
    fetchFiles(dir.getAbsolutePath) { file =>
      printf("Indexing %s\n", file)
      client.prepareIndex("files", "file", DigestUtils.md5Hex(file.getAbsolutePath))
        .setSource(document(file))
        .execute.actionGet
    }
    client.close
}

As you may notice, we’re running Elasticsearch and the program on the same machine (0.0.0.0). If Elasticsearch runs on another machine, change the IP/hostname in

client addTransportAddress(new InetSocketTransportAddress("ip/host-name-here",9300))

The index name and type are set in this line:

client.prepareIndex("files", "file", DigestUtils.md5Hex(file.getAbsolutePath))

It’s equivalent to this curl call:

curl -XPUT 'http://0.0.0.0:9200/files/file/4cdb168a80e2adc397f44353b3223494' -d '...'

The only difference is the port: 9300 is used by the Java transport client, while 9200 is the HTTP port used by other clients.
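To make the equivalence concrete, here is a Ruby sketch (not part of the Scala indexer) that builds the REST request for one file: PUT /files/file/<id> on port 9200, with the id computed as the MD5 hex digest of the absolute path, like DigestUtils.md5Hex. index_request is a hypothetical name; it leaves out the Magic-based description, extension and mimetype fields and only builds the request.

```ruby
require "digest/md5"
require "json"
require "net/http"
require "time"

# Hypothetical helper: REST equivalent of
# client.prepareIndex("files", "file", md5(path)).setSource(document(file))
def index_request(path)
  abs = File.expand_path(path)
  id = Digest::MD5.hexdigest(abs)   # same id scheme as DigestUtils.md5Hex
  doc = {
    "name" => File.basename(abs),
    "parent" => File.dirname(abs),
    "path" => abs,
    "last_modified" => File.mtime(abs).utc.iso8601(3),  # e.g. 2010-11-26T18:55:43.000Z
    "size" => File.size(abs),
    "is_directory" => File.directory?(abs)
  }
  req = Net::HTTP::Put.new("/files/file/#{id}")
  req["Content-Type"] = "application/json"
  req.body = JSON.generate(doc)
  req
end

# To actually send it (requires a running node on the HTTP port):
# Net::HTTP.start("0.0.0.0", 9200) { |http| puts http.request(index_request(path)).body }
```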

Indexing

Indexing files is also simple. All the code above is put together in a repository, so just clone it: https://github.com/shairontoledo/elasticsearch-filesystem-indexer

git clone git://github.com/shairontoledo/elasticsearch-filesystem-indexer.git
cd elasticsearch-filesystem-indexer

Install the dependencies and compile it with Maven

mvn install

Running

mvn exec:java -Dexec.mainClass=net.hashcode.fsindexer.Main -Dexec.args=/Users/me/directory/path

Set exec.args to a directory that you want to index.

Searching

After indexing some files, you can search with

curl -XGET 'http://0.0.0.0:9200/files/file/_search?q=pdf&pretty=true'

You should see a response similar to

"took": 86,
"timed_out": false,
"_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
},
"hits": {
    "total": 767,
    "max_score": 0.34046465,
    "hits": [
        {
            "_index": "files",
            "_type": "file",
            "_id": "a277bda1f97f8ffa6885347b1c76b8d3",
            "_score": 0.34046465,
            "_source": {
                "name": "agile-web-development-with-rails_p1_0.pdf",
                "parent": "/Users/shairon/Reference",
                "path": "/Users/shairon/Reference/agile-web-development-with-rails_p1_0.pdf",
                "last_modified": "2011-11-03T09:59:09.000Z",
                "size": 6700177,
                "is_directory": false,
                "description": "PDF document",
                "extension": "pdf",
                "mimetype": "application/pdf"
            }
        },

      ...

Now that you have data, you can improve your queries and get started with Elasticsearch in your Scala application.
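Beyond curl, the response is easy to consume from code. A Ruby sketch that builds the same search URL and pulls the indexed paths out of hits.hits[]._source; the sample string is a trimmed copy of the response shown above, and hit_paths is a hypothetical helper name.

```ruby
require "json"
require "uri"

# Same query as the curl example: q=pdf on the files/file index and type
search_uri = URI("http://0.0.0.0:9200/files/file/_search")
search_uri.query = URI.encode_www_form(q: "pdf", pretty: true)

# Hypothetical helper: extract the "path" field of each hit's _source
def hit_paths(response_body)
  JSON.parse(response_body).fetch("hits").fetch("hits").map { |hit| hit["_source"]["path"] }
end

# Trimmed copy of the response above
sample = '{"took": 86, "timed_out": false,
  "hits": {"total": 767, "max_score": 0.34046465, "hits": [
    {"_index": "files", "_type": "file", "_id": "a277bda1f97f8ffa6885347b1c76b8d3",
     "_source": {"name": "agile-web-development-with-rails_p1_0.pdf",
                 "path": "/Users/shairon/Reference/agile-web-development-with-rails_p1_0.pdf",
                 "mimetype": "application/pdf"}}]}}'

puts search_uri
puts hit_paths(sample)
# Against a live node: hit_paths(Net::HTTP.get(search_uri)) after require "net/http"
```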

A Placeholder for Ruby Splat

I was reading language/splat_spec.rb in the rubyspec/rubyspec repository on GitHub, and also the post The Strange Ruby Splat.

I’ve written about the asterisk (or star) symbol in the post Asterisco (in Portuguese). Here I’m complementing those posts by introducing (*), which, for Unix users, is like /dev/null for a variable. I’ve been using it in my code, though I don’t remember where I first saw it.
It’s very simple: use it when you don’t want to assign a value to a variable in a block parameter list or in a parallel assignment.

Parallel Assignment

A simple array

langs = ["java", "csharp", "ruby", "haskell" ]

Imagine that you want just the first and third values of the array. Normally you could do it this way:

l1,l2,l3 = *langs

It assigns "java" to l1, "csharp" to l2 and "ruby" to l3; the remaining values of the array are simply discarded. So far so good, but in our case we don’t need the second value, so l2 will never be used.

To avoid that, use (*) to skip the second position.

l1,(*),l3 = *langs

It works for any position. Another example, picking the first and the last element:

l1,(*),(*),l4 = *langs
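A runnable version of the assignments above, with the more common _ placeholder added for comparison:

```ruby
langs = ["java", "csharp", "ruby", "haskell"]

# Plain parallel assignment: extra values on the right are dropped
l1, l2, l3 = *langs

# (*) is a nested, anonymous splat: it consumes one element and binds nothing
a, (*), c = *langs               # skips "csharp"
first, (*), (*), last = *langs   # keeps only the first and the last

# The more common idiom uses _ as a throwaway variable
x, _, z = langs

p [l1, l2, l3]
p [a, c]
p [first, last]
p [x, z]
```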

In block

In a block it’s similar; let’s create a class to illustrate it.

class Person
  attr_accessor :name, :email, :phone

  def initialize(name,email,phone)
    @name, @email, @phone = name, email, phone
  end

  def attributes
    yield @name, @email, @phone
  end
end

An instance of Person

person = Person.new("joe", "joe@domain.com", "555-5555")

Person#attributes yields name, email and phone to the block. We only want the name and phone:

person.attributes do |name, (*), phone|
  puts name, phone
end
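The same trick works when a block receives arrays, for example map over a list of records, because each 3-element array is spread over the block parameters. A small sketch; the contact data is made up for illustration:

```ruby
contacts = [
  ["joe", "joe@domain.com", "555-5555"],
  ["ann", "ann@domain.com", "555-1234"]
]

# Each array is spread over the block parameters;
# (*) swallows the email without binding it to a name
with_splat = contacts.map { |name, (*), phone| "#{name}: #{phone}" }

# Equivalent, using _ as a throwaway parameter
with_underscore = contacts.map { |name, _, phone| "#{name}: #{phone}" }

puts with_splat
```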

That’s all. It’s more interesting than useful, isn’t it? :)

Netbeans 7.1 + Ruby, JRuby and Ruby on Rails installation issues

If you are having problems installing the Ruby, JRuby and Ruby on Rails plugin in Netbeans 7.1, there’s a workaround from Tom Enebo.

For now, the plugin should be installed from the zip file. Read Tom Enebo’s post for more details.

Language detection using pdf documents

This is a set of tests that detect the language of documents extracted with Tika. It uses the Language Detection Library for Java (langdetect.jar). Add more files to docs/{lang} and the tests will load them automatically.

Current support

  • Portuguese
  • English
  • French
  • German
  • Dutch
  • Italian
  • Spanish
  • Korean
  • Simplified Chinese
  • Traditional Chinese
  • Japanese

Install

Install Language Detection Library for Java manually

mvn install:install-file -Dfile=lib/langdetect.jar -DgroupId=com.cybozu.labs.langdetect -DartifactId=langdetect -Dversion=1.0 -Dpackaging=jar

The other dependencies are installed by Maven.

Two steps to add your language

Create a directory docs/{langX}, add some documents to it, and then create a new test method:

@Test
public void LanguageX() throws Exception {
  genericTest("Language X", "lang-x");
}

Running

export JAVA_OPTS=-Djava.awt.headless=true
mvn test

Log Output

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running TestSuite
[INFO ][LangMatcherTest] Setup
[INFO ][LangMatcherTest] Loading lang detector
[INFO ][LangMatcherTest] Loading tika
[INFO ][LangMatcherTest] Language Dutch(nl): loading
[INFO ][LangMatcherTest] Language Dutch(nl): 9 files
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/bereikbaarkaart-nl_print.pdf
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/de_bodem_onder_amsterdam.pdf
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/Economische verkenningen MRA2011_tcm14-228419.pdf
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/How To Reach Amsterdam RAI.pdf
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/IBM_Amsterdam_HDK.pdf
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/kwaliteitswijzer_291111_web_def.pdf
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/mas0.pdf
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/route-oracle-amsterdam1-157087-nl.pdf
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/{00C1E905-406C-490B-8079-5B9DCA9927BA}_C_NED.pdf
[INFO ][LangMatcherTest] Language English(en): loading
[INFO ][LangMatcherTest] Language English(en): 8 files
[INFO ][LangMatcherTest] Language English(en): docs/en/188741.pdf
[INFO ][LangMatcherTest] Language English(en): docs/en/Article33.pdf
[INFO ][LangMatcherTest] Language English(en): docs/en/BOSTmap.pdf
[INFO ][LangMatcherTest] Language English(en): docs/en/ff-boston_tcm7-4572.pdf
[INFO ][LangMatcherTest] Language English(en): docs/en/file84471.pdf
[INFO ][LangMatcherTest] Language English(en): docs/en/InformingTheDebate_Final.pdf
[INFO ][LangMatcherTest] Language English(en): docs/en/preven_code_tcm3-4039.pdf
[INFO ][LangMatcherTest] Language English(en): docs/en/STATEOFBLACKBOSTON_000.pdf
[INFO ][LangMatcherTest] Language French(fr): loading

Dump files

When the test for a language passes, you can find the extracted text in the dump directory at the root of the project.

dump/nl
dump/nl/bereikbaarkaart-nl_print.pdf.txt
dump/nl/de_bodem_onder_amsterdam.pdf.txt
dump/nl/Economische verkenningen MRA2011_tcm14-228419.pdf.txt
dump/nl/How To Reach Amsterdam RAI.pdf.txt
dump/nl/IBM_Amsterdam_HDK.pdf.txt
dump/nl/kwaliteitswijzer_291111_web_def.pdf.txt
dump/nl/mas0.pdf.txt
dump/nl/route-oracle-amsterdam1-157087-nl.pdf.txt
dump/nl/{00C1E905-406C-490B-8079-5B9DCA9927BA}_C_NED.pdf.txt
dump/en
dump/en/188741.pdf.txt
dump/en/Article33.pdf.txt
dump/en/BOSTmap.pdf.txt
dump/en/ff-boston_tcm7-4572.pdf.txt
dump/en/file84471.pdf.txt
dump/en/InformingTheDebate_Final.pdf.txt
dump/en/preven_code_tcm3-4039.pdf.txt
dump/en/STATEOFBLACKBOSTON_000.pdf.txt

Scala in a Mavenized Netbeans project

I like the Netbeans development environment; with it I can work with Java, C++, Ruby and now Scala. While researching how to get Netbeans working with Scala, I found the Scala NetBeans Plugin and came across some outdated documentation: in some of it the plugin works only with Scala 2.8.x + Netbeans 6.9, there are issues with OSX Lion, and so on.
Another problem (maybe not for you, but for me) is that the Scala NetBeans Plugin is based on Apache Ant build projects, and I prefer Apache Maven. I’ve taken some heat for this preference. To save you time, I’ve written down the steps to get Java, Scala, Maven and Netbeans working together.

Scala 2.9.1

To install Scala, just download it and uncompress it to a directory. In this example I’m going to use /Applications/CustomApps, so my SCALA_HOME is

/Applications/CustomApps/scala-2.9.1.final

Download Netbeans 7.1

Install it. The installer will probably place it in /Applications/NetBeans/NetBeans 7.1.app. Now let’s make SCALA_HOME visible to Netbeans by editing its startup script:

/Applications/NetBeans/NetBeans\ 7.1.app/Contents/Resources/NetBeans/bin/netbeans

After the comment lines, export the SCALA_HOME environment variable:

export SCALA_HOME=/Applications/CustomApps/scala-2.9.1.final

P.S. There used to be a way to set environment variables by adding ~/.MacOSX/environment.plist, but it’s deprecated in OSX Lion; editing the Netbeans startup script works on all OSX versions.

Scala NetBeans Plugin(nbscala)

Download the Scala plugin for Netbeans 7.1 and Scala 2.9.x. Uncompress it anywhere, but don’t forget the path :)

Installing the plugin

Open Netbeans, go to Tools -> Plugins -> Downloaded, click Add Plugins, and select the *.nbm files of the nbscala plugin. Answer yes/agree/allow in every dialog.

Maven vs Ant

The Netbeans Scala plugin uses Ant as its build tool. If you don’t mind using Ant, this tutorial ends here. If you want Maven (yes! Maven!), go ahead to the next topic.

Maven based Scala project

Let’s create a new ‘mavenized’ project: File -> New Project -> Maven -> Java Application, and place the project wherever you want. We’re going to use maven-scala-plugin; there’s no need to download anything, since Maven takes care of that. You just need to add some lines to your pom.xml.

The Maven repository for Scala

<repositories>
  <repository>
    <id>scala-tools.org</id>
    <name>Scala-tools Maven2 Repository</name>
    <url>http://scala-tools.org/repo-releases</url>
  </repository>
</repositories>

The build entry

<build>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <version>2.15.2</version>
        <executions>
          <execution>
            <phase>process-resources</phase>
            <goals>
              <goal>add-source</goal>
              <goal>compile</goal>
            </goals>
          </execution>
          <execution>
            <id>scala-test-compile</id>
            <phase>process-test-resources</phase>
            <goals>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
          <sendJavaToScalac>true</sendJavaToScalac>
          <args>
            <arg>-target:jvm-1.5</arg>
            <arg>-g:vars</arg>
            <arg>-deprecation</arg>
            <arg>-dependencyfile</arg>
            <arg>${project.build.directory}/.scala_dependencies</arg>
          </args>
        </configuration>
      </plugin>
    </plugins>
</build>

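And the scala-library dependency, with a scala.version property so that the ${scala.version} reference in the plugin configuration resolves

<properties>
  <scala.version>2.9.0</scala.version>
</properties>

<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>${scala.version}</version>
</dependency>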

Before running the project, create the Scala source directory src/main/scala and the test directory src/test/scala.

A piece of Scala code

Under src/main/scala, create a directory mytest; it will be our package. Now create the Main object in src/main/scala/mytest/Main.scala with this content:

package mytest
import java.io.File

object Main {

  def main(args: Array[String]):Unit = {
    println("Hello this is my /tmp")
    for(file <- new File("/tmp").listFiles){
      println(file)
    }
  }
}

By default Netbeans still points to App.java as the main class. To change it, right-click the project, choose Properties -> Run and type mytest.Main in Main Class.

Press F6 and be happy.