We have long looked forward to lambda bringing the concept of closures to Java, but much of its value would be lost if we could not use it with collections. The problem of migrating the existing collection interfaces to a lambda-friendly style was solved with default methods. In this article, we take a close look at the bulk data operations on Java collections and unravel the most powerful use of lambdas.
1. About JSR 335
JSR is the abbreviation of Java Specification Request. The major improvement in Java 8 is Project Lambda (JSR 335), which aims to make it easier to write Java code for multi-core processors.
2. External vs. internal iteration
In the past, Java collections could not express internal iteration; the only option was external iteration, that is, a for or while loop.
```java
List<Person> persons = asList(new Person("Joe"), new Person("Jim"), new Person("John"));
for (Person p : persons) {
    p.setLastName("Doe");
}
```
The example above shows the traditional approach, so-called external iteration, where the loop always runs in a fixed sequential order. In today's multi-core era, if we want to iterate in parallel we have to rewrite this code ourselves; how much that would actually improve performance is uncertain, and it introduces risks such as thread-safety issues.
Internal iteration requires library support. Let's rewrite the loop above using a lambda and Iterable.forEach:
```java
persons.forEach(p -> p.setLastName("Doe"));
```
Now the JDK library controls the loop. We no longer need to care about how the last name gets set on each Person object; the library can decide, based on the runtime environment, whether to run the iteration in parallel, out of order, or lazily. This is internal iteration: the client passes the behavior p.setLastName into the API as data. Internal iteration itself is not actually closely tied to bulk operations on collections; it mainly lets us feel the change in expressiveness. The really interesting part, where bulk operations come in, is the new Stream API in the java.util.stream package added in JDK 8.
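To see what internal iteration buys us, here is a minimal runnable sketch (this Person class is a hypothetical stand-in for the one in the article's examples): switching from stream() to parallelStream() changes how the library iterates, without changing the behavior we pass in.

```java
import java.util.ArrayList;
import java.util.List;

public class InternalIterationDemo {
    // Hypothetical Person class matching the article's examples.
    static class Person {
        final String firstName;
        String lastName;
        Person(String firstName) { this.firstName = firstName; }
        void setLastName(String n) { lastName = n; }
    }

    static List<Person> renameAll(List<Person> persons) {
        // Internal iteration: the stream library drives the loop. Using
        // parallelStream() changes *how* it iterates, not *what* we pass in.
        persons.parallelStream().forEach(p -> p.setLastName("Doe"));
        return persons;
    }

    public static void main(String[] args) {
        List<Person> persons = new ArrayList<>();
        persons.add(new Person("Joe"));
        persons.add(new Person("Jim"));
        renameAll(persons);
        System.out.println(persons.get(0).lastName); // prints "Doe"
    }
}
```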
3. Stream API
A Stream represents only a stream of data and has no data structure of its own, so once it has been traversed it cannot be traversed again (be careful about this when programming; unlike a Collection, which still holds its data no matter how many times you iterate over it). Its source can be a Collection, an array, I/O, and so on.
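The one-traversal rule is easy to demonstrate: a second terminal operation on the same stream throws an IllegalStateException.

```java
import java.util.stream.Stream;

public class StreamReuseDemo {
    public static void main(String[] args) {
        Stream<String> names = Stream.of("Joe", "Jim", "John");
        System.out.println(names.count()); // first traversal: prints 3
        try {
            names.count(); // second traversal: the stream is already consumed
        } catch (IllegalStateException e) {
            // "stream has already been operated upon or closed"
            System.out.println("cannot traverse a stream twice");
        }
    }
}
```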
3.1 Intermediate and terminal methods
The purpose of Stream is to provide an interface for operating on large amounts of data, making data manipulation easier and faster. It offers methods for filtering, mapping, reducing the number of traversals, and so on. These methods fall into two kinds: intermediate methods and terminal methods. The "stream" abstraction is by nature continuous: intermediate methods always return a Stream, so to obtain a final result we must use a terminal operation to collect what the stream produces. To tell the two kinds apart, look at the return type: if it is a Stream, the method is intermediate; otherwise it is terminal.
Below we briefly introduce a few intermediate methods (filter, map) and terminal methods (count, collect).
3.1.1 Filter
Filtering a data stream is the most natural operation we would think of first. The Stream interface exposes a filter method, which accepts a Predicate implementation representing the operation, so we can use a lambda expression to define the filter condition.
```java
List<Person> persons = …
Stream<Person> personsOver18 = persons.stream().filter(p -> p.getAge() > 18); // keep only people over 18
```
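A filter condition can also be pulled out into a named Predicate and combined with and(), or(), and negate(). A minimal self-contained sketch, using plain ages instead of the article's hypothetical Person class:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilterDemo {
    // Keep only the ages that satisfy the predicate.
    static List<Integer> over18(List<Integer> ages) {
        Predicate<Integer> isAdult = age -> age > 18; // a Predicate can be named and reused
        return ages.stream().filter(isAdult).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(over18(Arrays.asList(12, 20, 35, 17))); // [20, 35]
    }
}
```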
3.1.2 Map
Suppose that after filtering we want to transform the data, for example by converting each object. The map operation lets us execute an implementation of Function<T, R> (the type parameters T and R represent the input type and the result type respectively), which accepts an input argument and returns a result. First, let's look at how to write it as an anonymous inner class:
```java
Stream<Adult> adult = persons
    .stream()
    .filter(p -> p.getAge() > 18)
    .map(new Function<Person, Adult>() {
        @Override
        public Adult apply(Person person) {
            return new Adult(person); // convert each person over 18 into an Adult
        }
    });
```
Now, convert the above example into a lambda expression:
```java
Stream<Adult> map = persons.stream()
    .filter(p -> p.getAge() > 18)
    .map(person -> new Adult(person));
```
3.1.3 Count
The count method is a terminal method of a stream; it produces the final tally of the stream's result and returns a long. For example, let's count the number of people over 18:
```java
long countOfAdult = persons.stream()
    .filter(p -> p.getAge() > 18)
    .map(person -> new Adult(person))
    .count();
```
3.1.4 Collect
The collect method is also a terminal method of a stream; it gathers the final results:
```java
List<Adult> adultList = persons.stream()
    .filter(p -> p.getAge() > 18)
    .map(person -> new Adult(person))
    .collect(Collectors.toList());
```
Or, if we want to use a specific implementation class to collect the results:
```java
List<Adult> adultList = persons
    .stream()
    .filter(p -> p.getAge() > 18)
    .map(person -> new Adult(person))
    .collect(Collectors.toCollection(ArrayList::new));
```
Space is limited, so we will not cover the other intermediate and terminal methods one by one. The examples above should make the difference between the two kinds clear; later you can decide which to use according to your needs.
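For the curious, here is a small sketch of a few of those other methods: distinct and sorted as intermediate methods, and reduce plus a Collectors helper as terminal operations.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MoreStreamOps {
    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(3, 1, 2, 3);

        // distinct and sorted are intermediate methods: both return a Stream.
        List<Integer> cleaned = nums.stream().distinct().sorted().collect(Collectors.toList());

        // reduce is a terminal method that folds the stream into one value.
        int sum = nums.stream().reduce(0, Integer::sum);

        // Collectors.partitioningBy splits the elements by a predicate.
        Map<Boolean, List<Integer>> byParity =
            nums.stream().collect(Collectors.partitioningBy(n -> n % 2 == 0));

        System.out.println(cleaned);            // [1, 2, 3]
        System.out.println(sum);                // 9
        System.out.println(byParity.get(true)); // [2]
    }
}
```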
3.2 Sequential streams vs. parallel streams
Every Stream has two execution modes: sequential and parallel.
Sequential stream:
```java
List<Person> people = list.stream().collect(Collectors.toList());
```
Parallel stream:
```java
List<Person> people = list.stream().parallel().collect(Collectors.toList());
```
As the names suggest, when traversing sequentially, each item is read only after the previous one has been read. When traversing in parallel, the data is split into several segments, each of which is processed on a different thread, and the results are then combined.
3.2.1 How parallel streams work:
```java
// pseudocode
List originalList = someData;
split1 = originalList(0, mid);        // split the data into smaller parts
split2 = originalList(mid, end);
new Runnable(split1.process());       // process each part on its own thread
new Runnable(split2.process());
List revisedList = split1 + split2;   // merge the results
```
If you know a little about Hadoop, you know that MapReduce is itself a software framework for processing large data sets in parallel. Its core idea is divide and conquer: the data is distributed to different machines to run map, and reduce finally combines the results from all the machines into one final result. Unlike MapReduce, Stream uses multi-core hardware to process large data sets in parallel on a single machine, whereas MapReduce can be distributed across machines.
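The split-and-combine idea is easiest to see with a reduction such as sum(): the range is divided into chunks, each chunk is summed on its own core, and the partial sums are combined, loosely mirroring the map and reduce phases described above.

```java
import java.util.stream.IntStream;

public class ParallelReduceDemo {
    public static void main(String[] args) {
        // Summing is safe to parallelize: partial sums from each chunk
        // are combined into the same final result as the sequential run.
        long serial   = IntStream.rangeClosed(1, 100).sum();
        long parallel = IntStream.rangeClosed(1, 100).parallel().sum();
        System.out.println(serial + " " + parallel); // 5050 5050
    }
}
```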
3.2.2 Sequential vs. parallel performance comparison
On a multi-core machine, a parallel stream should in theory be roughly twice as fast as a sequential one. Here is the test code:
```java
long t0 = System.nanoTime();
// build a stream over the range 0..1,000,000 and keep the numbers divisible by 2; toArray() is a terminal method
int[] a = IntStream.range(0, 1_000_000).filter(p -> p % 2 == 0).toArray();
long t1 = System.nanoTime();
// the same computation, this time on a parallel stream
int[] b = IntStream.range(0, 1_000_000).parallel().filter(p -> p % 2 == 0).toArray();
long t2 = System.nanoTime();
// on my machine the result was serial: 0.06s, parallel 0.02s, showing the parallel stream is indeed faster
System.out.printf("serial: %.2fs, parallel %.2fs%n", (t1 - t0) * 1e-9, (t2 - t1) * 1e-9);
```
3.3 About the Fork/Join framework
Hardware parallelism has been available to applications since Java 7: one of the additions to the java.util.concurrent package was a fork/join style framework for parallel decomposition, which is likewise powerful and efficient. Interested readers can study it on their own; we will not go into detail here. Between hand-written fork/join code and Stream.parallel(), I lean toward the latter.
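For comparison, here is a minimal fork/join sketch that sums an array by recursively splitting it; this is roughly the kind of work Stream.parallel() does for us behind the scenes.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;
    private final long[] values;
    private final int from, to; // half-open range [from, to)

    ForkJoinSum(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {          // small enough: sum directly
            long sum = 0;
            for (int i = from; i < to; i++) sum += values[i];
            return sum;
        }
        int mid = (from + to) / 2;             // otherwise fork two subtasks
        ForkJoinSum left = new ForkJoinSum(values, from, mid);
        ForkJoinSum right = new ForkJoinSum(values, mid, to);
        left.fork();                           // run the left half asynchronously
        return right.compute() + left.join();  // compute the right half, then join
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        long sum = ForkJoinPool.commonPool().invoke(new ForkJoinSum(data, 0, data.length));
        System.out.println(sum); // 50005000
    }
}
```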
4. Summary
Without lambdas, Stream would be quite awkward to use, producing a lot of anonymous inner classes, as in the map example in 3.1.2 above. Without default methods, changing the collection framework would have forced large-scale modifications. So lambdas plus default methods make the JDK libraries more powerful and more flexible, and the Stream API together with the improved collection framework is the best proof of that.
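As a closing illustration of why default methods matter, here is a sketch with a hypothetical interface: the default method gives every existing implementation the new behavior for free, which is how methods like Iterable.forEach and Collection.stream were retrofitted onto the old interfaces.

```java
import java.util.function.UnaryOperator;

public class DefaultMethodDemo {
    // Hypothetical pre-Java-8 style interface. Adding decoratedName() as a
    // default method does not break any existing implementation.
    interface Named {
        String name();
        default String decoratedName(UnaryOperator<String> decorator) {
            return decorator.apply(name());
        }
    }

    public static void main(String[] args) {
        // Named has a single abstract method, so a lambda can implement it.
        Named n = () -> "joe";
        System.out.println(n.decoratedName(String::toUpperCase)); // JOE
    }
}
```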
The above is the detailed content of "Interpreting the new features of Java 8: the role of lambda", originally published on the PHP Chinese website.