Category: Maven

JLine is not loading in maven-scala-plugin with scala-console

For unknown reason maven-scala-plugin is not loading properly scala:console. After compile my app and run

mvn scala:console

It raises

Failed to created JLineReader: java.lang.NoClassDefFoundError: scala/tools/jline/console/completer/Completer
Falling back to SimpleReader.
Welcome to Scala version 2.9.1.final (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_29).
Type in expressions to have them evaluated.
Type :help for more information.

scala>

Even though application context works, the collateral effect for this issue is you cannot neither get typed commands by up-arrow key nor get autocomple working by tab key.

I’ve fixed just including in my pom.xml the jline dependency

<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>jline</artifactId>
  <version>2.9.0-1</version>
</dependency>

It’s not elegant but works. I’m looking for an environmental fix for this instead of a fix per project.

Indexing files with Scala and Elasticsearch

I was doing load testing in Elasticsearch., I’ve created a simple code in Scala to fetch files recursively and index them to Elasticsearch.

The code uses Java Mime Magic Library as a helper to get file description.

So let’s get started installing Elasticsearch

Installing and start Elasticsearch

curl -O http://cloud.github.com/downloads/elasticsearch/elasticsearch/elasticsearch-0.18.7.tar.gz
tar zxf elasticsearch-0.18.7.tar.gz
cd elasticsearch-0.18.7
bin/elasticsearch -f

Remove -f out if you don’t want start it in foreground.

Scala code

Our Scala code have 3 functions, one to list all files

def fetchFiles(path:String)(op:File => Unit){
  for (file <- new File(path).listFiles if !file.isHidden){
    op(file)
    if (file.isDirectory){
      fetchFiles(file.getAbsolutePath)(op)
    }
  }
}

A function to create the JSON.

def document(file:File) = {

  val json = jsonBuilder.startObject
  .field("name",file.getName)
  .field("parent",file.getParentFile.getAbsolutePath)
  .field("path",file.getAbsolutePath)
  .field("last_modified",new Date(file.lastModified))
  .field("size",file.length)
  .field("is_directory", file.isDirectory)

  if (file.isFile) {
    try{
      val m = Magic.getMagicMatch(file, true)
      json.field("description",m.getDescription)
      .field("extension",m.getExtension)
      .field("mimetype",m.getMimeType)
    }catch {
      case _ => json.field("description","unknown")
        .field("extension",file.getName.split("\\.").last.toLowerCase)
        .field("mimetype","application/octet-stream")
    }
  }
  json.endObject
}

Only files will be passed to Magic detection, there’s a treatment in case detector gets issue parsing the file. It’ll generate the final format to be indexed.

  {
      "name": "pragmatic-guide-to-git_p1_0.pdf",
      "parent": "/Users/shairon/Reference",
      "path": "/Users/shairon/Reference/pragmatic-guide-to-git_p1_0.pdf",
      "last_modified": "2010-11-26T18:55:43.000Z",
      "size": 1358963,
      "is_directory": false,
      "description": "PDF document",
      "extension": "pdf",
      "mimetype": "application/pdf"
  }

And finally the main

def main(args: Array[String]) = {
    val dir = new File(args(0))
    if (!dir.exists || dir.isFile || dir.isHidden) {
      printf("Directory not found %s\n",dir)
      System.exit(1)
    }

    val client = new TransportClient()
    client addTransportAddress(
      new InetSocketTransportAddress("0.0.0.0",9300)
    )
    fetchFiles( dir.getAbsolutePath){
      file => {
        printf("Indexing %s\n",file)
      client.prepareIndex("files", "file", DigestUtils.md5Hex(file.getAbsolutePath))
        .setSource(document(file))
        .execute.actionGet
      }
    }
    client.close
}

As you may notice, we’re running Elasticsearch and the program in the same machine(0.0.0.0), if you want to run Elasticsearch in other machine, change the ip/hostname at

client addTransportAddress(new InetSocketTransportAddress("ip/host-name-here",9300))

The index name and type is in the line

client.prepareIndex("files", "file", DigestUtils.md5Hex(file.getAbsolutePath))

it’s equivalent of a curl call

curl -XPUT 'http://0.0.0.0:9200/files/file/4cdb168a80e2adc397f44353b3223494' -d '...'

The only difference is port 9300, it’s used by Java Transport Client and 9200 is used straightforward by others clients.

Indexing

Indexing files is also simple. All we have to do is get this code put together so clone it https://github.com/shairontoledo/elasticsearch-filesystem-indexer

git clone git://github.com/shairontoledo/elasticsearch-filesystem-indexer.git
cd elasticsearch-filesystem-indexer

Install dependencies and compile it by maven

mvn install

Running

mvn exec:java -Dexec.mainClass=net.hashcode.fsindexer.Main -Dexec.args=/Users/me/directory/path

Set exec.args to a directory that you want to index.

Searching

After to index some files, you can search by

curl -XGET 'http://0.0.0.0:9200/files/file/_search?q=pdf&pretty=true'

You should see a response similar to

"took": 86,
"timed_out": false,
"_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
},
"hits": {
    "total": 767,
    "max_score": 0.34046465,
    "hits": [
        {
            "_index": "files",
            "_type": "file",
            "_id": "a277bda1f97f8ffa6885347b1c76b8d3",
            "_score": 0.34046465,
            "_source": {
                "name": "agile-web-development-with-rails_p1_0.pdf",
                "parent": "/Users/shairon/Reference",
                "path": "/Users/shairon/Reference/agile-web-development-with-rails_p1_0.pdf",
                "last_modified": "2011-11-03T09:59:09.000Z",
                "size": 6700177,
                "is_directory": false,
                "description": "PDF document",
                "extension": "pdf",
                "mimetype": "application/pdf"
            }
        },

      ...

Now you have data, you can improve your queries and get started with Elasticsearch in your Scala application.

Language detection using pdf documents

This is a set of test for detect tika extracted documents.. It uses Language Detection Library for Java (langdetect.jar). Add more files to /docs/{lang}, the test will load them automatically.

Current support

  • Portuguese
  • English
  • French
  • German
  • Dutch
  • Italian
  • Spanish
  • Korean
  • Simplified Chinese
  • Traditional Chinese
  • Japanese

Install

Install Language Detection Library for Java manually

mvn install:install-file -Dfile=lib/langdetect.jar -DgroupId=com.cybozu.labs.langdetect -DartifactId=langdetect -Dversion=1.0 -Dpackaging=jar

The others dependencies will be installed by Maven.

Two steps to add your language

Create a directory under docs/{langX}, add some document to it and then create a new test method.

@Test
public void LanguageX() throws Exception {
  genericTest("Language X", "lang-x");
}

Running

export JAVA_OPTS=-Djava.awt.headless=true
mvn test

Log Output

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running TestSuite
[INFO ][LangMatcherTest] Setup
[INFO ][LangMatcherTest] Loading lang detector
[INFO ][LangMatcherTest] Loading tika
[INFO ][LangMatcherTest] Language Dutch(nl): loading
[INFO ][LangMatcherTest] Language Dutch(nl): 9 files
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/bereikbaarkaart-nl_print.pdf
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/de_bodem_onder_amsterdam.pdf
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/Economische verkenningen MRA2011_tcm14-228419.pdf
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/How To Reach Amsterdam RAI.pdf
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/IBM_Amsterdam_HDK.pdf
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/kwaliteitswijzer_291111_web_def.pdf
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/mas0.pdf
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/route-oracle-amsterdam1-157087-nl.pdf
[INFO ][LangMatcherTest] Language Dutch(nl): docs/nl/{00C1E905-406C-490B-8079-5B9DCA9927BA}_C_NED.pdf
[INFO ][LangMatcherTest] Language English(en): loading
[INFO ][LangMatcherTest] Language English(en): 8 files
[INFO ][LangMatcherTest] Language English(en): docs/en/188741.pdf
[INFO ][LangMatcherTest] Language English(en): docs/en/Article33.pdf
[INFO ][LangMatcherTest] Language English(en): docs/en/BOSTmap.pdf
[INFO ][LangMatcherTest] Language English(en): docs/en/ff-boston_tcm7-4572.pdf
[INFO ][LangMatcherTest] Language English(en): docs/en/file84471.pdf
[INFO ][LangMatcherTest] Language English(en): docs/en/InformingTheDebate_Final.pdf
[INFO ][LangMatcherTest] Language English(en): docs/en/preven_code_tcm3-4039.pdf
[INFO ][LangMatcherTest] Language English(en): docs/en/STATEOFBLACKBOSTON_000.pdf
[INFO ][LangMatcherTest] Language French(fr): loading

Dump files

When the test passes to the language you can get extracted text in dump directory in the root of the project.

dump/nl
dump/nl/bereikbaarkaart-nl_print.pdf.txt
dump/nl/de_bodem_onder_amsterdam.pdf.txt
dump/nl/Economische verkenningen MRA2011_tcm14-228419.pdf.txt
dump/nl/How To Reach Amsterdam RAI.pdf.txt
dump/nl/IBM_Amsterdam_HDK.pdf.txt
dump/nl/kwaliteitswijzer_291111_web_def.pdf.txt
dump/nl/mas0.pdf.txt
dump/nl/route-oracle-amsterdam1-157087-nl.pdf.txt
dump/nl/{00C1E905-406C-490B-8079-5B9DCA9927BA}_C_NED.pdf.txt
dump/en
dump/en/188741.pdf.txt
dump/en/Article33.pdf.txt
dump/en/BOSTmap.pdf.txt
dump/en/ff-boston_tcm7-4572.pdf.txt
dump/en/file84471.pdf.txt
dump/en/InformingTheDebate_Final.pdf.txt
dump/en/preven_code_tcm3-4039.pdf.txt
dump/en/STATEOFBLACKBOSTON_000.pdf.txt

Scala in a Mavenized Netbeans project

I like Netbeans development environment, with it I can handle Java, C++, Ruby and now Scala. I found Scala NetBeans Plugin, in my research about how to bring netbeans working with Scala I came across some outdated documentation, in some of those the plugin working only with Scala 2.8.x + Netbeans 6.9, issues with OSX Lion and so on.
Another problem(no problem for you but for me) is the Scala NetBeans Plugin is based on Apache Ant build project, I prefer to use Apache Maven instead. I’ve got some karma by this liking. To save your time, I’ve written some steps to put Java, Scala, Maven and Netbeans working together.

Scala 2.9.1

To install Scala just download it, uncompress it to a directory, in this example, I’m going to use /Application/CustomApps so my SCALA_HOME is

/Applications/CustomApps/scala-2.9.1.final

Download Netbeans 7.1

Download Netbeans 7.1

Install it, probably, the installer will place it to /Applications/NetBeans/NetBeans7.1.app, let’s bind SCALA_HOME to be accessible in Netbeans. Edit Netbeans start up script

/Applications/NetBeans/NetBeans\ 7.1.app/Contents/Resources/NetBeans/bin/netbeans

After commented lines export SCALA_HOME environment variable, this way

export SCALA_HOME=/Applications/CustomApps/scala-2.9.1.final

Ps. There was a way to bind environment variables by adding a .plist in ~/.MacOSX/environment.plist. it’s deprecated in OSX Lion then you can edit Netbeans start up script that will work for all OSX versions.

Scala NetBeans Plugin(nbscala)

Download Scala plugin for Netbeans 7.1 and Scala 2.9.x. Uncompress it anywhere, but don’t forget the path :)

Installing the plugin

Open Netbeans, menu Tools -> Plugins -> Downloaded, button Add Plugins, select *.nbm files of nbscala plugin. Say yes/agree/allow for every dialogs.

Maven vs Ant

The Netbeans Scala plugin uses ant as builder if you don’t care about using ant, this tutorial ends up here. If you want to Maven(yes! Maven!) go ahead in the next topic.

Maven based Scala project

Let’s create a new ‘mavenized’ project, File -> New Project -> Maven -> Java Application go there and place the project in where do you want to. There is the maven-scala-plugin we’re going to use it, it is not needed download anything, Maven takes care about that, do you need to add some lines to your pom.xml.

The Maven repository for Scala

<repositories>
  <repository>
      <id>scala-tools.org</id>
      <name>Scala-tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
</repositories>

The build entry

<build>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <version>2.15.2</version>
        <executions>
          <execution>
            <phase>process-resources</phase>
            <goals>
              <goal>add-source</goal>
              <goal>compile</goal>
            </goals>
          </execution>
          <execution>
            <id>scala-test-compile</id>
            <phase>process-test-resources</phase>
            <goals>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
          <sendJavaToScalac>true</sendJavaToScalac>
          <args>
            <arg>-target:jvm-1.5</arg>
            <arg>-g:vars</arg>
            <arg>-deprecation</arg>
            <arg>-dependencyfile</arg>
            <arg>${project.build.directory}/.scala_dependencies</arg>
          </args>
        </configuration>
      </plugin>
    </plugins>
</build>

And scala-lang dependency

<dependency>
     <groupId>org.scala-lang</groupId>
     <artifactId>scala-library</artifactId>
     <version>2.9.0</version>
 </dependency>

Before run the project, you may create scala source directory src/main/scala and also the test src/test/scala.

A peace of scala code

Under src/main/scala create a directory mytest it will be our package. Now theMain.scala object in src/main/scala/mytest/Main.scala with content:

package mytest
import java.io.File

object Main {

  def main(args: Array[String]):Unit = {
    println("Hello this is my /tmp")
    for(file <- new File("/tmp").listFiles){
      println(file)
    }
  }
}

By default Netbeans continues point to App.java, you may change the main class in right-click in Project -> Properties -> Run in the Main Class type mytest.Main.

F6 and be happy.