C# pitfalls you may not know: the IEnumerable interface explained with sample code

黄舟 (original)
2017-03-09 15:05:30

The importance of the IEnumerable enumerator interface can hardly be overstated. Almost every collection implements it, and the core of LINQ relies on it as well. A for loop in C is tedious to write; foreach reads much more smoothly.

I like this interface a lot, but I have also run into many questions while using it. Do you share any of these?

(1) What is the difference between IEnumerable and IEnumerator?

(2) Can an enumeration run out of bounds? What happens if it does? Why can't a collection be modified while it is being enumerated?

(3) How is LINQ actually implemented? For example, Skip skips some elements; are those elements still visited?

(4) What is the essence of IEnumerable?

(5) Do closures form while enumerating an IEnumerable? Do multiple enumerations interfere with each other? Can I dynamically change the elements of an enumeration while enumerating it?

….

If you are interested, read on.

Before we begin, some terminology for this article: an enumeration is an IEnumerable, an iterator is an IEnumerator, and something that has been materialized (for example with ToList()) is a collection.

1. IEnumerable and IEnumerator

IEnumerable has a single method, GetEnumerator(). IEnumerator is the iterator that actually does the work of walking a collection; it has one property, Current, and two methods, MoveNext and Reset.

A small question: wouldn't a single accessor interface be enough? Why two interfaces that are easy to confuse, one called an enumerator and the other an iterator? Because:

(1) Implementing IEnumerator is tedious: it adds two methods and a property, and the two methods are not easy to implement correctly (more on this later).

(2) It has to maintain its initial state and know how to MoveNext, how to finish, and how to return to an earlier state of the iteration, none of which is easy.

(3) Iteration is clearly not thread-safe. Each call to GetEnumerator() on an IEnumerable produces a new IEnumerator, so several iterations can run without affecting each other. The collection being iterated must not be modified during iteration; otherwise the result is unsafe.

So you normally only implement IEnumerable and let the compiler generate the IEnumerator for you (through yield return). Moreover, in most cases the type derives from or wraps an existing collection, so there is generally no need to write MoveNext and Reset yourself. IEnumerable also has a generic version, IEnumerable<T>, which does not change the discussion.
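To make the division of labor concrete, here is a minimal sketch that writes the same sequence twice: once with a hand-written enumerator, once with an iterator block. The type names CountingSequence and CountingSequenceYield are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical example type: produces 0, 1, ..., Count - 1.
public sealed class CountingSequence : IEnumerable<int>
{
    public int Count { get; }

    public CountingSequence(int count) { Count = count; }

    // Every call hands out a fresh enumerator with its own position.
    public IEnumerator<int> GetEnumerator() { return new Enumerator(Count); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    private sealed class Enumerator : IEnumerator<int>
    {
        private readonly int _count;
        private int _index = -1;          // positioned before the first element

        public Enumerator(int count) { _count = count; }

        // Only meaningful after MoveNext() has returned true.
        public int Current { get { return _index; } }
        object IEnumerator.Current { get { return Current; } }

        public bool MoveNext()
        {
            if (_index + 1 >= _count) return false;
            _index++;
            return true;
        }

        public void Reset() { _index = -1; }
        public void Dispose() { }
    }
}

public static class CountingSequenceYield
{
    // The same sequence as an iterator block; the compiler writes the enumerator.
    public static IEnumerable<int> Create(int count)
    {
        for (int i = 0; i < count; i++)
            yield return i;
    }
}
```

Both versions can be consumed with foreach in the same way; the second one simply leaves the bookkeeping to the compiler.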

IEnumerable brings a singly linked list to mind. In C, a pointer field stores the location of the next node. So in IEnumerable, who stores this information? Does it take up memory? Does it live in the program area or on the heap?

IEnumerable also has shortcomings. It cannot go backward or jump (it can only advance one element at a time), Reset is hard to implement, and there is no index access. Think about it: when enumerating a materialized collection, Reset can simply go back to element 0, but if the IEnumerable is a long chain of operations, finding the original root is very hard. That is why the author of CLR via C# tells you that many Reset implementations are simply lies. Just know that it exists, and don't rely on it.
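The point about Reset can be checked directly. In the sketch below (ResetDemo is a hypothetical name), Reset() is called on the enumerator that the C# compiler generates for an iterator block, which throws NotSupportedException. The second method shows the usual way to start over: ask the enumerable for a new enumerator.

```csharp
using System;
using System.Collections.Generic;

public static class ResetDemo
{
    public static IEnumerable<int> Numbers()
    {
        yield return 1;
        yield return 2;
    }

    // Returns true if Reset() on the compiler-generated enumerator throws.
    public static bool ResetThrows()
    {
        using (IEnumerator<int> e = Numbers().GetEnumerator())
        {
            e.MoveNext();
            try
            {
                e.Reset();
                return false;
            }
            catch (NotSupportedException)
            {
                return true;
            }
        }
    }

    // Starting over in practice: request a fresh enumerator instead of calling Reset().
    public static int FirstElementOfSecondPass()
    {
        IEnumerable<int> numbers = Numbers();
        using (IEnumerator<int> first = numbers.GetEnumerator())
        {
            first.MoveNext();
            first.MoveNext();
        }
        using (IEnumerator<int> second = numbers.GetEnumerator())
        {
            second.MoveNext();
            return second.Current;
        }
    }
}
```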

2. Is there any difference between foreach and MoveNext?

The biggest feature of IEnumerable is that it puts the access process under the control of the object being visited. In C, an array is traversed entirely from the outside. This interface moves the traversal logic inside, which improves encapsulation further. For example:

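    public class People  // a simple entity class
    {
        public int ID { get; set; }  // inferred from the output printed later in the article
        public string Name { get; set; }
        public int Age { get; set; }

        // Inferred from the output format shown later, e.g. "ID:2,NameP2,Age32".
        public override string ToString()
        {
            return "ID:" + ID + ",Name" + Name + ",Age" + Age;
        }
    }

    public class PeopleList
    {
        private readonly List<People> peoples;

        public PeopleList()  // for convenience, elements are inserted in the constructor
        {
            peoples = new List<People>();
            for (int i = 0; i < 5; i++)
            {
                peoples.Add(new People {ID = i, Name = "P" + i, Age = 30 + i});
            }
        }

        public int OldAge = 31;
        public IEnumerable<People> OlderPeoples
        {
            get
            {
                foreach (People people in peoples)
                {
                    if (people.Age > OldAge)
                        yield return people;
                }
                yield break;
            }
        }
    }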

The essence of IEnumerable is a state machine, somewhat similar to the concept of events: it hands the implementation to the outside and lets execution travel back and forth between pieces of code (think interstellar travel). This is the foundation of LINQ. But are these cool iterators really as simple as we think?

In C, an array is an array: a real block of memory. So what does an IEnumerable stand for? If it is backed by a real collection (such as List), it is real memory too. But what about the example above? The filtered yield return only hands back elements; the collection it describes may never exist as such. If you decompile a simple yield return enumerator, you will find that it is really a set of switch-case branches; the compiler does a lot of work for us behind the scenes.
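The switch-case shape can be sketched by hand. The class below is only an approximation of what a compiler might generate for a two-step iterator: the real generated type has different names, also implements IEnumerable, and handles threading details. TwoSteps and TwoStepsEnumerator are hypothetical names for this illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class TwoSteps
{
    // The iterator block as you would normally write it.
    public static IEnumerable<int> WithYield()
    {
        yield return 10;
        yield return 20;
    }
}

// A hand-written approximation of the enumerator behind WithYield().
public sealed class TwoStepsEnumerator : IEnumerator<int>
{
    // 0 = not started, 1 = after the first yield, 2 = after the second, -1 = finished
    private int _state;
    private int _current;

    public int Current { get { return _current; } }
    object IEnumerator.Current { get { return _current; } }

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:
                _current = 10;
                _state = 1;
                return true;
            case 1:
                _current = 20;
                _state = 2;
                return true;
            default:
                _state = -1;      // finished; _current is left untouched
                return false;
        }
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }
}
```

Note that the finished branch does not clear the stored value. In this sketch, that is why Current keeps returning the last element once MoveNext() has returned false.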

On a freshly created iterator, Current is actually empty until MoveNext has been called. Why? Why doesn't the iterator point directly at the first element?

(Thanks to a reader for the answer: it is like the head pointer of a singly linked list in C. It lets you represent an enumeration that contains no elements, which makes programming more convenient.)

foreach moves forward one step at a time and stops when it reaches the end. Wait, are you sure it stops at the end? Let's do an experiment:

        public IEnumerable<People> Peoples1   // returns the collection directly
        {
            get { return peoples; }
        }

        public IEnumerable<People> Peoples2  // contains yield break;
        {
            get
            {
                foreach (var people in peoples)
                {
                    yield return people;
                }
                yield break;  // this line is optional
            }
        }

These two are the common ways of writing it. Note the second implementation: ReSharper greys out the yield break as redundant.

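Now we write the following test code. The peopleList collection has only five elements, but we try to call MoveNext 8 times. You can replace peopleList.Peoples1 with 2 or 3 and test each in turn.

            var peopleList = new PeopleList();  // the constructor inserts five elements
            IEnumerator<People> e1 = peopleList.Peoples1.GetEnumerator();
            if (e1.Current == null)
            {
                Console.WriteLine("Current is null right after the iterator is created");
            }
            int i = 0;
            while (i < 8)  // there are only five elements; see what happens if we keep iterating
            {
                e1.MoveNext();
                if (e1.Current == null)
                {
                    Console.WriteLine("After iteration {0}: null", i);
                }
                else
                {
                    Console.WriteLine("After iteration {0}: {1}", i, e1.Current.Name);
                }
                i++;
            }

//PeopleEnumerable1   (returns the collection directly)
Current is null right after the iterator is created
After iteration 0: P0
After iteration 1: P1
After iteration 2: P2
After iteration 3: P3
After iteration 4: P4
After iteration 5: null
After iteration 6: null
After iteration 7: null

//PeopleEnumerable2 (without yield break)
Current is null right after the iterator is created
After iteration 0: P0
After iteration 1: P1
After iteration 2: P2
After iteration 3: P3
After iteration 4: P4
After iteration 5: P4
After iteration 6: P4
After iteration 7: P4

//PeopleEnumerable2 (with yield break)
Current is null right after the iterator is created
After iteration 0: P0
After iteration 1: P1
After iteration 2: P2
After iteration 3: P3
After iteration 4: P4
After iteration 5: P4
After iteration 6: P4
After iteration 7: P4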

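Results of the out-of-bounds enumeration test

This is surprising: when the original collection is returned, Current is null after running past the end, but with the yield-based version, with or without yield break, Current still returns the last element after running past the end! Perhaps this is what we mentioned in section 1: the iterator only keeps its previous state, and since it cannot move forward it keeps returning the same value. Then why doesn't a List behave this way? I leave the question to you.

(Thanks to a reader for the answer: whether Current is null or the last element after running past the end is not actually specified; it depends on the .NET implementation. In .NET Framework, it is still the last element.)

But rest assured: under the standard foreach pattern, the enumeration is guaranteed to finish. This shows how calling MoveNext by hand differs from foreach, and foreach is clearly safer. Also note that a yield return cannot be placed inside a try block that has a catch clause. Why? Because the yield pattern stitches together code and logic from different places; how could the compiler wrap every participating block in try-catch? That would be too complex.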
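The difference between foreach and manual MoveNext calls is easier to see once foreach is written out. The sketch below shows roughly what a foreach over an IEnumerable<int> expands to; ForeachExpansion is a hypothetical name, and the exact code the compiler emits varies with the static type of the source.

```csharp
using System.Collections.Generic;

public static class ForeachExpansion
{
    public static int SumWithForeach(IEnumerable<int> source)
    {
        int sum = 0;
        foreach (int x in source)
            sum += x;
        return sum;
    }

    // Roughly what the loop above turns into.
    public static int SumByHand(IEnumerable<int> source)
    {
        int sum = 0;
        IEnumerator<int> e = source.GetEnumerator();
        try
        {
            // Current is read only after MoveNext() has returned true,
            // so the loop never looks at Current past the end.
            while (e.MoveNext())
            {
                int x = e.Current;
                sum += x;
            }
        }
        finally
        {
            e.Dispose();   // runs even if the loop body throws or exits early
        }
        return sum;
    }
}
```

Because the loop condition is the return value of MoveNext(), the out-of-bounds state from the experiment above is never observed by foreach.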

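This property of enumerations helps a lot when processing large data, precisely because they are stateful. For a huge file, I only need to read a part at a time and can keep reading sequentially until the end of the file; since no collection has to be materialized, memory usage stays low. The same goes for databases: reading a part at a time handles many otherwise hard situations.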
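As a minimal sketch of this idea, the iterator below hands out one line at a time from any TextReader, so only the current line needs to be in memory. LazyLines is a hypothetical helper written for this illustration; for files on disk, the framework's File.ReadLines offers the same kind of lazy enumeration.

```csharp
using System.Collections.Generic;
using System.IO;

public static class LazyLines
{
    // Yields one line at a time; nothing is read until the caller starts enumerating.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
            yield return line;
    }

    // Counts non-empty lines without building a list of all lines.
    public static int CountNonEmpty(TextReader reader)
    {
        int count = 0;
        foreach (string line in ReadLines(reader))
        {
            if (line.Length > 0)
                count++;
        }
        return count;
    }
}
```

A caller could pass a StreamReader over a large file, or a StringReader in a test; the iterator does not care where the text comes from.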

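3. Modifying an enumerator's parameters during enumeration?

A collection cannot be modified while it is being enumerated. For example, inserting or deleting an element inside a foreach loop is certain to throw a runtime exception. Experienced programmers will tell you to use a for loop in that case. So what is the essential difference between for and foreach?

What happens if, between MoveNext calls, I suddenly change a parameter of the enumeration so that it produces more or fewer elements?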

            Console.WriteLine("OldAge not modified");
            foreach (var olderPeople in peopleList.OlderPeoples)
            {
                Console.WriteLine(olderPeople);
            }

            Console.WriteLine("OldAge modified");
            i = 0;
            foreach (var olderPeople in peopleList.OlderPeoples)
            {
                Console.WriteLine(olderPeople);
                i++;
                if (i == 1)
                    peopleList.OldAge = 33;  // change OldAge after the first element has been enumerated
            }

The test result is:

OldAge not modified
ID:2,NameP2,Age32
ID:3,NameP3,Age33
ID:4,NameP4,Age34

OldAge modified
ID:2,NameP2,Age32
ID:4,NameP4,Age34

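As you can see, changing the value that controls the enumeration while it is running changes its behavior dynamically. The above changes a variable inside a yield structure; trying the same with an iterator and with a lambda expression (code omitted) gives:

Modifying the variable in an iterator
ID:2,NameP2,Age32
ID:4,NameP4,Age34
Modifying the variable in a lambda expression
ID:2,NameP2,Age32
ID:4,NameP4,Age34

As you can see, modifying a variable from the outside controls the iteration inside and dynamically changes the "elements of the collection". This is a good thing, because the behavior is indeed correct. It is also a bad thing: the variable changed during iteration, so the context changed, and processing the results as if the old context still held can lead to serious mistakes. This has nothing to do with closures.

So if an enumeration needs to keep its original behavior while the context changes, you have to save a copy of the variable by hand.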
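A minimal sketch of saving a copy, using plain ages instead of People objects. AgeFilter and its members are hypothetical names for this illustration. OlderLive reads the field on every step, while OlderSnapshot copies it once when enumeration begins.

```csharp
using System.Collections.Generic;

public sealed class AgeFilter
{
    private readonly List<int> _ages = new List<int> { 30, 31, 32, 33, 34 };

    public int OldAge = 31;

    // Reads OldAge on every step, so changes made during enumeration take effect.
    public IEnumerable<int> OlderLive()
    {
        foreach (int age in _ages)
        {
            if (age > OldAge)
                yield return age;
        }
    }

    // Copies OldAge once and keeps using that copy for the rest of the enumeration.
    public IEnumerable<int> OlderSnapshot()
    {
        // The iterator body starts on the first MoveNext(), not when the method
        // is called, so the copy is taken at the start of enumeration.
        int threshold = OldAge;
        foreach (int age in _ages)
        {
            if (age > threshold)
                yield return age;
        }
    }
}
```

If the value must be fixed at the moment the method is called instead, one option is a regular (non-iterator) method that reads the field and passes it as a parameter to a private iterator method.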

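If you join two collections A and B with Concat, that is A followed by B, without materializing the result, will modifying the elements of collection B while A is still being enumerated throw an error? Why?

Consider the following test code:

            List<People> peoples = new List<People>() { new People() { Name = "PA" } };
            Console.WriteLine("Concatenate a virtual enumeration A with collection B, and modify B while still enumerating A");
            var e8 = peopleList.Peoples1.Concat(peoples);
            i = 0;
            foreach (var people in e8)
            {
                Console.WriteLine(people);
                i++;
                if (i == 1)
                    peoples.Add(new People() { Name = "PB" });  // still in the Peoples1 phase at this point
            }

If you want to know, run the experiment yourself (the example is also in my attachment). I leave it for discussion.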
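For readers who want to try the experiment in isolation, here is a reduced sketch with integers; ConcatDemo is a hypothetical name. It rests on an assumption worth checking on your own runtime: in the LINQ to Objects implementations I am aware of, Concat only requests the enumerator of B after A is exhausted, so a change made to B during the A phase happens before B's enumerator exists.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ConcatDemo
{
    // Enumerates a.Concat(b) and adds an element to b while still inside the A phase.
    public static List<int> Run()
    {
        var a = new List<int> { 1, 2 };
        var b = new List<int> { 10 };
        var seen = new List<int>();
        int i = 0;
        foreach (int x in a.Concat(b))
        {
            seen.Add(x);
            i++;
            if (i == 1)
                b.Add(20);   // b has not started enumerating yet
        }
        return seen;
    }
}
```

Modifying A during the A phase is a different matter, since the List enumerator for A is already live at that point.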

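4. More on LINQ

You can put any code inside a yield block. This is what deferred (lazy) execution means: code runs only when it is needed. It is not hard to imagine how many LINQ functions are implemented. One of the more interesting ones is Concat, which joins two collections together, like this: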

public static IEnumerable<T> Concat<T>(this IEnumerable<T> source, IEnumerable<T> source2)
       {
           foreach (var r in source)
           {
               yield return r;
           }
           foreach (var r in source2)
           {
               yield return r;
           }
       }

Select and Where are just as easy to implement, so I won't discuss them.
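For completeness, a minimal sketch of how such operators could be written with yield. These are illustrations in the spirit of the Concat sketch above, not the framework's actual implementation. They are named MyWhere and MySelect (hypothetical names) so they do not clash with System.Linq.

```csharp
using System;
using System.Collections.Generic;

public static class MyLinq
{
    // Passes along only the elements for which the predicate returns true.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
                yield return item;
        }
    }

    // Transforms each element as it flows through.
    public static IEnumerable<TResult> MySelect<T, TResult>(this IEnumerable<T> source, Func<T, TResult> selector)
    {
        foreach (T item in source)
            yield return selector(item);
    }
}
```

Chaining them, for example MySelect(MyWhere(numbers, isEven), timesTen), builds a pipeline in which no predicate or selector runs until the result is enumerated.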

How is Skip implemented? It skips part of the elements in a collection. My guess is something like this:

public static IEnumerable<T> Skip<T>(this IEnumerable<T> source, int count)
       {
           int t = 0;
           foreach (var r in source)
           {
               t++;
               if(t<=count)
                   continue;
               yield return r;
           }
       }

So, are the skipped elements actually visited? Does their code run?

 Console.WriteLine("Are skipped elements still visited?");
 IEnumerable<People> e6 = peopleList.Peoples1.Select(d =>
       {
              Console.WriteLine(d);
              return d;
       }).Skip(3);
 Console.WriteLine("Enumerate only, doing nothing:");
 foreach (var r in e6) { }
 Console.WriteLine("Convert to a materialized collection, then enumerate again");
 IEnumerable<People> e7 = e6.ToList();
 foreach (var r in e7) { }

The test results are:

Enumerate only, doing nothing:
ID:0,NameP0,Age30
ID:1,NameP1,Age31
ID:2,NameP2,Age32
ID:3,NameP3,Age33
ID:4,NameP4,Age34
Convert to a materialized collection, then enumerate again
ID:0,NameP0,Age30
ID:1,NameP1,Age31
ID:2,NameP2,Age32
ID:3,NameP3,Age33
ID:4,NameP4,Age34

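As you can see, although Skip skips elements, it still "visits" them, so extra work such as the lambda expression is still executed, whether the source is an enumerator or a materialized collection. From this angle, to optimize an expression you should Skip and Take as early as possible in the LINQ chain to reduce the extra side effects.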
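The effect of skipping early can be measured with a small sketch. It uses a hand-written pass-through iterator, Counted, in place of Select, because some runtime versions may special-case Select over a list. That possibility, and the names SkipOrderDemo and Counted, are assumptions of this illustration, not part of the original test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipOrderDemo
{
    // Hypothetical helper: passes elements through and reports each one it hands out.
    public static IEnumerable<T> Counted<T>(IEnumerable<T> source, Action onVisit)
    {
        foreach (T item in source)
        {
            onVisit();
            yield return item;
        }
    }

    // Extra work first, Skip afterwards: the work runs for skipped elements too.
    public static int VisitsWhenSkippingLate(IEnumerable<int> source)
    {
        int visits = 0;
        foreach (int unused in Counted(source, () => visits++).Skip(3)) { }
        return visits;
    }

    // Skip first, extra work afterwards: the work runs only for the remaining elements.
    public static int VisitsWhenSkippingEarly(IEnumerable<int> source)
    {
        int visits = 0;
        foreach (int unused in Counted(source.Skip(3), () => visits++)) { }
        return visits;
    }
}
```

With a five-element source, the first method does the extra work for every element, while the second does it only for the two elements that survive the Skip.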

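In LINQ to SQL, however, Skip has clearly been given extra optimization. Could we also optimize our Skip implementation so that upper layers get the best possible Skip performance on massive data?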

5. More questions about IEnumerable enumeration

(1) How do you pause an enumeration? Is there even such a thing as pausing? How do you cancel one?

(2) How does PLINQ work? Which property of the IEnumerable interface does it actually change? Does it produce out-of-order enumeration, and if so, how is that implemented?

(3) IEnumerable gives us a chain structure, which is the foundation of LINQ. But what is this chain, really?

(4) Because IEnumerable embodies state and deferral, it is not hard to see that many asynchronous operations are IEnumerable at heart. In one interview I was asked about the essence of asynchrony. What do you think it is? Async is not multithreading! The beauty of async is essentially a reorganization of code, because long-running asynchronous operations are state machines... The CCR library is one example. I won't expand on it here, as it is currently beyond my knowledge; I will talk about it next time.

(5) If the same enumerator were implemented in C, could an equally cool LINQ be built without relying on the compiler? Leaving the lambda trick aside, let's use function pointers.

(6) Writing MapReduce with IEnumerable? LINQ for MapReduce?

(7) How do you sort an IEnumerable? Should it be materialized into a collection and then sorted? If it is a very large virtual collection, how can this be optimized?
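As a starting point for question (1), here is one possible reading, sketched under the assumption that "pausing" simply means not calling MoveNext for a while and "cancelling" means leaving the loop early. PauseCancelDemo and its members are hypothetical names. The log records what the iterator actually executed.

```csharp
using System.Collections.Generic;

public static class PauseCancelDemo
{
    // A conceptually endless sequence that records what it does.
    public static IEnumerable<int> Numbers(List<string> log)
    {
        try
        {
            for (int i = 0; ; i++)
            {
                log.Add("produce " + i);
                yield return i;   // execution is suspended here until the next MoveNext()
            }
        }
        finally
        {
            log.Add("cleanup");   // runs when the enumerator is disposed
        }
    }

    // Consumes elements until it sees 1, then leaves the loop.
    public static List<string> TakeUntilOne()
    {
        var log = new List<string>();
        foreach (int n in Numbers(log))
        {
            if (n == 1)
                break;            // leaving foreach disposes the enumerator
        }
        return log;
    }
}
```

Between two MoveNext calls the iterator does nothing at all, so the consumer decides the pace; breaking out of foreach disposes the enumerator, which gives the iterator a chance to clean up in its finally block.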
