A quotation-based Scala DSL for scalable data analysis.
Our goal is to improve developer productivity by hiding parallelism aspects behind a high-level, declarative API which maximises reuse of native Scala syntax and constructs.
DSLs for scalable data analysis are typically embedded through types. In contrast, Emma is based on quotations (similar to Quill). This approach has two benefits.
First, it allows Scala-native, declarative constructs to be reused in the DSL.
Quoted Scala syntax is thereby lifted to an intermediate representation called Emma Core.
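To give a feel for the kind of Scala-native, declarative code this refers to, here is a minimal sketch that uses only the standard library. It does not call the Emma API: `Seq` stands in for a distributed collection type, and the `Review` schema and `wordCounts` function are hypothetical names chosen for illustration. It combines a case class, pattern matching, and a for-comprehension.

```scala
// Plain Scala only: Seq stands in for a distributed collection type.
// Nothing in this sketch calls the Emma API.
object QuotableSyntaxSketch {
  // A case class modelling one input record (hypothetical schema).
  final case class Review(user: String, text: String, stars: Int)

  // Word frequencies over reviews with at least four stars.
  def wordCounts(reviews: Seq[Review]): Map[String, Int] = {
    val words = for {
      Review(_, text, stars) <- reviews          // pattern match in the generator
      if stars >= 4                              // declarative filter
      word <- text.toLowerCase.split("\\W+").toSeq
      if word.nonEmpty                           // drop empty tokens from the split
    } yield word
    // Group equal words and count each group.
    words.groupBy(identity).map { case (w, ws) => w -> ws.size }
  }

  def main(args: Array[String]): Unit = {
    val reviews = Seq(
      Review("ann", "Great great book", 5),
      Review("bob", "Not great", 2),
      Review("cy", "A great read", 4)
    )
    wordCounts(reviews).toSeq.sortBy(_._1).foreach(println)
  }
}
```

The point of the sketch is that nothing in `wordCounts` mentions parallelism; a quotation-based DSL can take code written in this style as its input.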
Second, it allows Emma Core terms to be analyzed and optimized holistically.
Subterms of type `DataBag[A]` are thereby transformed and off-loaded to a parallel dataflow engine such as Apache Flink or Apache Spark.
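As a rough illustration of why having the whole term available helps, the following toy sketch represents a small pipeline as data and rewrites it before running it. This is not Emma Core and not Emma's optimizer: the `Term` type, the operator names, and the map-fusion rule are all assumptions made up for this example, and the IR is restricted to `Int` collections to keep it short.

```scala
// A toy IR, NOT Emma Core: every name here is hypothetical.
object ToyCoreSketch {
  sealed trait Term
  final case class Source(data: Seq[Int]) extends Term
  final case class MapT(in: Term, f: Int => Int) extends Term
  final case class FilterT(in: Term, p: Int => Boolean) extends Term

  // Rewrite rule: two adjacent maps become one map over the composed function.
  def fuse(t: Term): Term = t match {
    case MapT(MapT(in, f), g) => fuse(MapT(in, f andThen g))
    case MapT(in, f)          => MapT(fuse(in), f)
    case FilterT(in, p)       => FilterT(fuse(in), p)
    case s: Source            => s
  }

  // Count the operators in a term (sources are not counted).
  def operators(t: Term): Int = t match {
    case Source(_)      => 0
    case MapT(in, _)    => 1 + operators(in)
    case FilterT(in, _) => 1 + operators(in)
  }

  // A local interpreter; a real system would hand the term to a dataflow engine.
  def eval(t: Term): Seq[Int] = t match {
    case Source(d)      => d
    case MapT(in, f)    => eval(in).map(f)
    case FilterT(in, p) => eval(in).filter(p)
  }

  def main(args: Array[String]): Unit = {
    val plan = MapT(MapT(Source(Seq(1, 2, 3)), _ + 1), _ * 2)
    val optimized = fuse(plan)
    println(s"operators before: ${operators(plan)}, after: ${operators(optimized)}")
    println(eval(optimized))
  }
}
```

Because the rewrite sees the entire term rather than one operator call at a time, it can change the plan's shape while `eval` still produces the same result as the unoptimized plan.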
The emma-examples module contains examples from various fields.
- Graph Analysis
- Supervised Learning
- Unsupervised Learning
- Text Processing
Check emma-language.org for further information.
Building Emma requires:
- JDK 7+ (preferably JDK 8)
- Maven 3
Run `mvn clean package -DskipTests` to build Emma without running any tests.
For more advanced build options, including integration tests for the target runtimes, please see the "Building Emma" section in the Wiki.