

Apache Spark Hadoop Word Count Example

Let's create a simple word count example using Spark and the Hadoop file system (HDFS) in a Linux environment. I hope you have already installed Apache Spark.

First, create a simple text file. Let's name it "input.txt", and let the content of the file be:

people are not as beautiful as they look, as they walk or as they talk. they are only as beautiful as they love, as they care as they share.

Start Spark by typing spark-shell on your Linux terminal:

[cloudera@quickstart ~]$ spark-shell

After some time you should be logged into the Scala REPL. To get the word count we need to load the file content, split it into words, and count them. The only command you need to get the word count of the file is:

scala> sc.textFile("input.txt").flatMap(_.split(" ")).count

When you try to execute this command, you will get an error:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:8020/user/input.txt

This is because Hadoop's HDFS, not the local file system, is Spark's default file system in this setup, so Spark looks for input.txt in HDFS rather than in your local home directory. Upload the file to the path Spark is looking for, then rerun the command:

[cloudera@quickstart ~]$ hdfs dfs -put input.txt /user/input.txt
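To see what the flatMap(_.split(" ")).count pipeline actually computes, here is a plain-Scala collections sketch that runs without Spark. The lines value is a stand-in for the RDD that sc.textFile would return (an assumption for illustration only); Scala collections happen to offer the same flatMap operation, so the logic carries over:

```scala
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for sc.textFile("input.txt"): one element per line of the file.
    val lines = Seq(
      "people are not as beautiful as they look, as they walk or as they talk.",
      "they are only as beautiful as they love, as they care as they share."
    )

    // flatMap(_.split(" ")): flatten the lines into one element per
    // whitespace-separated word (punctuation stays attached to words).
    val words = lines.flatMap(_.split(" "))

    // count: the total number of words across all lines.
    println(words.length)

    // Going one step further than the post's command, a per-word tally --
    // the collections analogue of map(w => (w, 1)).reduceByKey(_ + _) on an RDD.
    val counts = words.groupBy(identity).map { case (w, ws) => (w, ws.length) }
    println(counts("as"))
  }
}
```

Note that count on an RDD gives only the total number of words; the groupBy step at the end shows how you would get a count per distinct word, which is what "word count" usually means in the classic Hadoop example.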