Java Application on Apache Spark using Eclipse
- Open Eclipse.
- Create a new project and select Other.
- Go to Maven and select Maven Project.
- Click Next a couple of times.
- The wizard asks for a Group Id and an Artifact Id. For example, Group Id: anshul and Artifact Id: anshulProject.
- Then click Finish.
- Open the App.java file.
- Open pom.xml and add the Spark dependency inside the <dependencies> tag:
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>
- In App.java, add the following imports:
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
- If Eclipse reports errors on these imports, hover the mouse over the unresolved packages and choose Fix project setup.
- The code below counts the lines that contain the character "a" and the lines that contain the character "b".
- Here is the complete Java class:
package anshul.anshulProject;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class App {
    public static void main(String[] args) {
        String logFile = "/home/anshul/Downloads/demo.java";
        SparkConf conf = new SparkConf().setMaster("local[4]").setAppName("My App");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> logData = sc.textFile(logFile).cache();

        long numberA = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("a");
            }
        }).count();

        long numberB = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("b");
            }
        }).count();

        System.out.println("Lines with A : " + numberA + " Lines with B : " + numberB);
    }
}
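
To check what the two counts mean without starting Spark, here is a minimal plain-Java sketch of the same filter-and-count logic. It is only an illustration, not part of the project: the class name LineCountSketch, the helper countLinesContaining, and the sample lines are all made up for this example, and it assumes Java 8 or later for streams.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the Spark job above; no Spark classes are used.
class LineCountSketch {

    // Counts the lines that contain the given text, mirroring
    // logData.filter(...).count() in the Spark version.
    static long countLinesContaining(List<String> lines, String text) {
        return lines.stream().filter(line -> line.contains(text)).count();
    }

    public static void main(String[] args) {
        // Hypothetical sample lines standing in for the contents of the input file.
        List<String> lines = Arrays.asList("alpha", "beta", "gamma", "xyz");

        long numberA = countLinesContaining(lines, "a");
        long numberB = countLinesContaining(lines, "b");

        // Prints: Lines with A : 3 Lines with B : 1
        System.out.println("Lines with A : " + numberA + " Lines with B : " + numberB);
    }
}
```

A line is counted once per filter however many times the character appears in it, and a line containing both characters is counted in both totals ("beta" in the sample).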
And the complete pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>anshul</groupId>
    <artifactId>anshulProject</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>anshulProject</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.3.1</version>
        </dependency>
    </dependencies>
</project>
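The program prints a line of the following form, with the two counts for your input file:
Lines with A : <numberA> Lines with B : <numberB>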
Thank You :)
Anshul Shrivastava