Introduction to three methods to implement WordCount
This article introduces three ways to implement WordCount: a shell pipeline, a Hadoop MapReduce job, and a Spark one-liner.
1. Streamlined Shell
cat /home/sev7e0/access.log | tr -s ' ' '\n' | sort | uniq -c | sort -rn | awk '{print $2, $1}'
# cat prints the whole file to standard output
# tr -s ' ' '\n' turns every space into a newline and squeezes repeated newlines, so each word ends up on its own line
# sort sorts the lines so that identical words become adjacent
# uniq -c collapses each run of adjacent identical lines into a single line, prefixed with the number of times it occurred
# sort | uniq -c together count the occurrences of each word; uniq only merges adjacent duplicates, which is why the input is sorted first
# sort -rn sorts the results by count, largest first; -n compares the counts as numbers rather than as text, so the order does not depend on how uniq pads the counts or on the locale
# awk '{print $2, $1}' prints each result with the word first and the count second
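To make each stage of the pipeline concrete, here is a minimal in-memory sketch in Java that performs the same steps on a string instead of a file. The class name `PipelineWordCount` is hypothetical, and the order of words with equal counts may differ from what GNU `sort` produces.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch mirroring the shell pipeline stage by stage.
class PipelineWordCount {
    static List<String> count(String text) {
        // tr -s ' ' '\n': break the text on runs of spaces/newlines, one word per element
        List<String> words = new ArrayList<>();
        for (String w : text.split("[ \n]+")) {
            if (!w.isEmpty()) {
                words.add(w); // a leading separator yields one empty token; skip it
            }
        }
        // sort: identical words become adjacent (Java compares UTF-16 code units, roughly like LC_ALL=C sort)
        words.sort(Comparator.naturalOrder());
        // uniq -c: collapse each run of equal adjacent words into a (word, count) entry
        List<Map.Entry<String, Integer>> counts = new ArrayList<>();
        for (String w : words) {
            int last = counts.size() - 1;
            if (last >= 0 && counts.get(last).getKey().equals(w)) {
                Map.Entry<String, Integer> e = counts.get(last);
                e.setValue(e.getValue() + 1);
            } else {
                counts.add(new AbstractMap.SimpleEntry<>(w, 1));
            }
        }
        // sort -rn: order by count, largest first (List.sort is stable, so ties stay alphabetical)
        counts.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        // awk '{print $2, $1}': word first, count second
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts) {
            lines.add(e.getKey() + " " + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        count("the cat and the hat\nthe end").forEach(System.out::println);
    }
}
```

For the input in `main`, the first line printed is `the 3`, followed by the four words that occur once, in alphabetical order.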
2. Counterintuitive MapReduce
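// MapReduce version (WordCountMapper and WordCountReduce are separate classes not shown here)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Uncomment to point at a specific HDFS NameNode and YARN ResourceManager
        // conf.set("fs.defaultFS", "hdfs://spark01:9000");
        // conf.set("yarn.resourcemanager.hostname", "spark01");
        Path out = new Path(args[1]);
        FileSystem fs = FileSystem.get(conf);
        // Check whether the output path exists: MapReduce fails if it already does, so delete it first
        if (fs.exists(out)) {
            fs.delete(out, true);
            System.out.println("Output path exists, deleting it");
        }
        // Create the job
        Job job = Job.getInstance(conf, "wordcountDemo");
        // Set the job's main class, which Hadoop uses to locate the jar
        job.setJarByClass(WordCount.class);
        // Set the job's input path
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        // Map-side settings: mapper class and the types of its output key/value
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // Reduce-side settings: reducer class and the types of the final output
        job.setReducerClass(WordCountReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // Set the job's output path
        FileOutputFormat.setOutputPath(job, out);
        // Two reduce tasks, so the output directory will contain two part files
        job.setNumReduceTasks(2);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}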
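The driver only wires the job together; the article does not show `WordCountMapper` or `WordCountReduce`. The following single-process simulation (plain Java, not Hadoop code) sketches what a job configured this way does: the mapper emits (word, 1) pairs, the framework partitions and groups them by key, and each of the two reducers sums the values for its own keys. All names here are hypothetical, and splitting on whitespace in the mapper is an assumption. The partition formula is the one used by Hadoop's default `HashPartitioner`, but it is applied to `String.hashCode()` rather than to `Text`'s own hash, so which reducer receives a given word will differ from a real job. Requires Java 9+ (`List.of`, `Map.entry`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process simulation of map -> partition/shuffle -> reduce.
// It does NOT use the Hadoop API.
class MapReduceSimulation {
    // Map phase: emit (word, 1) for every whitespace-separated token in a line (assumed splitting rule)
    static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1L));
            }
        }
        return out;
    }

    // Partition: same formula as Hadoop's HashPartitioner, applied here to String.hashCode()
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Reduce phase: sum all values emitted for one key
    static long reduce(String key, List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    static List<Map<String, Long>> run(List<String> lines, int numReduceTasks) {
        // Shuffle: group values by key inside each partition; TreeMap keeps keys sorted,
        // as the framework sorts keys before handing them to a reducer
        List<TreeMap<String, List<Long>>> shuffled = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            shuffled.add(new TreeMap<>());
        }
        for (String line : lines) {
            for (Map.Entry<String, Long> kv : map(line)) {
                shuffled.get(partition(kv.getKey(), numReduceTasks))
                        .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
            }
        }
        // Reduce each partition independently; each result stands for one part file
        List<Map<String, Long>> outputs = new ArrayList<>();
        for (TreeMap<String, List<Long>> group : shuffled) {
            Map<String, Long> result = new TreeMap<>();
            group.forEach((k, vs) -> result.put(k, reduce(k, vs)));
            outputs.add(result);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Map<String, Long>> parts = run(List.of("the cat and the hat", "the end"), 2);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("reducer " + i + ": " + parts.get(i));
        }
    }
}
```

Because every occurrence of a word hashes to the same partition, each word appears in exactly one reducer's output, which is why the total counts stay correct even though the results are split across two files.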
3. Easy-to-use Spark
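// Spark version of WordCount, as typed into spark-shell (sc is the SparkContext the shell predefines)
// collect() brings the counts back to the driver before printing; calling foreach(println) directly on the RDD prints on the executors, so in cluster mode the output would not appear in the shell
sc.textFile("/home/sev7e0/access.log").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect().foreach(println)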
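To relate the Spark chain back to plain Java, the same steps can be mirrored with Java Streams on an in-memory list of lines. This is only an analogy: it is not Spark's Java API, it runs in a single JVM, and the class name `StreamWordCount` is hypothetical. `Collectors.toMap` with `Integer::sum` as its merge function plays the role of `reduceByKey(_ + _)`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical Java Streams analogue of the Spark one-liner (not Spark code).
class StreamWordCount {
    static Map<String, Integer> count(List<String> lines) {
        return lines.stream()                                     // like sc.textFile(...): one element per line
                .flatMap(line -> Arrays.stream(line.split(" ")))  // like flatMap(_.split(" "))
                .collect(Collectors.toMap(
                        word -> word,    // key, as in map((_, 1))
                        word -> 1,       // value 1 per occurrence
                        Integer::sum,    // merge equal keys, like reduceByKey(_ + _)
                        TreeMap::new));  // sorted map only to make the printed order deterministic
    }

    public static void main(String[] args) {
        // Print pairs in the same (word,count) shape that Scala's println gives a tuple
        count(List.of("the cat and the hat", "the end"))
                .forEach((word, n) -> System.out.println("(" + word + "," + n + ")"));
    }
}
```

Like `_.split(" ")` in the Scala version, `split(" ")` produces empty tokens when words are separated by more than one space, so both versions count an empty "word" in that case; filtering out empty strings after the split avoids it.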