Reposted

Understanding Network I/O Models (Part 4)

First, a quick recap of select/poll from the earlier parts.

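By changing the way file descriptors are tracked from an array to a list, poll removed the limit on how many files can be monitored, but the other problems remain: 1) every select/poll call has to copy the fd set; 2) events on multiple devices are detected by traversal, and every select/poll call performs at least one full traversal. When many devices are monitored and most of them are inactive, both of these costs badly hurt efficiency.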
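To make the per-call cost concrete, here is a minimal sketch using select() on two pipes, only one of which has data. The function name select_overwrites_set is made up for this illustration. It relies on the documented behavior that select() overwrites the set it is given with the result, which is why the caller has to rebuild and resubmit the full set on every call.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper for illustration.
 * Returns 1 if select() removed the idle descriptor from the set it was
 * given (so the caller must rebuild the set before the next call),
 * 0 if it did not, and -1 on error. */
int select_overwrites_set(void)
{
    int busy[2], idle[2];
    int result = -1;

    if (pipe(busy) < 0)
        return -1;
    if (pipe(idle) < 0) {
        close(busy[0]);
        close(busy[1]);
        return -1;
    }
    /* select() can only track descriptors below FD_SETSIZE. */
    assert(busy[0] < FD_SETSIZE && idle[0] < FD_SETSIZE);

    /* Make only the "busy" pipe readable. */
    if (write(busy[1], "x", 1) == 1) {
        fd_set rfds;
        struct timeval tv = {0, 0};   /* poll without blocking */
        int maxfd = busy[0] > idle[0] ? busy[0] : idle[0];

        FD_ZERO(&rfds);
        FD_SET(busy[0], &rfds);
        FD_SET(idle[0], &rfds);

        /* The whole set is handed to the kernel and scanned on every call. */
        if (select(maxfd + 1, &rfds, NULL, NULL, &tv) == 1)
            result = FD_ISSET(busy[0], &rfds) && !FD_ISSET(idle[0], &rfds);
    }

    close(busy[0]);
    close(busy[1]);
    close(idle[0]);
    close(idle[1]);
    return result;
}
```

After the call only the ready descriptor is left in rfds, so a loop around select() has to run FD_ZERO/FD_SET again for every descriptor before each iteration.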

So how can efficiency be improved in this scenario? This is exactly what epoll was created for:

Epoll is designed to function somewhat like select() or poll(), but with more options and with higher performance when large numbers of file descriptors are in use. Each call to select() or poll() can involve an entirely new set of file descriptors, so the kernel must go through the process of validating each one, checking for I/O readiness, and adding the polling process to the appropriate wait queue. But the actual list of file descriptors tends not to change much between calls, so much of that work is unnecessary duplicated effort. The epoll calls get around this problem by separating that setup work from the act of waiting for a file descriptor to become ready.
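The separation of setup from waiting can be seen from userspace in a small sketch. The helper names watch_fd, wait_one and epoll_demo are made up for this example; epoll_create1, epoll_ctl and epoll_wait are the real Linux calls. The descriptor is registered once, and each later wait passes only the epoll fd, never the list of watched descriptors.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Setup phase, done once per descriptor (hypothetical helper). */
int watch_fd(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };

    assert(epfd >= 0 && fd >= 0);
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait phase, repeated as often as needed: no fd list is passed in.
 * Returns the ready descriptor, or -1 if none is ready or on error. */
int wait_one(int epfd, int timeout_ms)
{
    struct epoll_event out;
    int n = epoll_wait(epfd, &out, 1, timeout_ms);

    return n == 1 ? out.data.fd : -1;
}

/* Returns 1 if the pipe is reported only after data was written,
 * 0 otherwise, and -1 on setup failure. */
int epoll_demo(void)
{
    int p[2];
    int epfd;
    int ok = -1;

    if (pipe(p) < 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (watch_fd(epfd, p[0]) == 0) {
        int before = wait_one(epfd, 0);        /* nothing written yet */
        if (write(p[1], "x", 1) == 1) {
            int after = wait_one(epfd, 0);     /* same call, no re-registration */
            ok = (before == -1 && after == p[0]);
        }
    }

    close(epfd);
    close(p[0]);
    close(p[1]);
    return ok;
}
```

Compared with a select() loop, the registration cost is paid in watch_fd only, and wait_one can be called any number of times afterwards.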

Let's first look at the overall flow:

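In epoll, the entity that gets added and deleted is the epitem, which represents an fd; for efficiency, epitems are managed in an rbtree. ep_pqueue is the counterpart of poll_wqueues in select/poll: it serves as the bridge between the epoll framework and the concrete device implementation. The bridging is done by the callback ep_ptable_queue_proc, acting on the epitem that ep_pqueue carries. It creates a new eppoll_entry and links it into two places at once: the epitem (so that epoll_ctl can delete it later) and the device's wait queue (which plays the same role as before and is just as important). It also registers the wakeup callback ep_poll_callback, which puts the epitem on the rdllist of the eventpoll (created by epoll_create) before the process is woken up.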
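The flow can be mimicked in a heavily simplified userspace model. This is not kernel code: every toy_ name below is invented for illustration, the rbtree and all locking are left out, and a plain function call stands in for a device wakeup. It only shows the key idea that a device event runs a callback which queues the item on a ready list, so collecting results touches ready items only.

```c
#include <assert.h>
#include <stddef.h>

struct toy_epitem;

/* Stands in for struct eventpoll: owns the ready list. */
struct toy_eventpoll {
    struct toy_epitem *ready_head, *ready_tail;
};

/* Stands in for struct epitem: one per watched fd. */
struct toy_epitem {
    int fd;
    int on_ready_list;
    struct toy_epitem *ready_next;
    struct toy_eventpoll *ep;
};

/* Stands in for eppoll_entry: the hook left on a device's wait queue. */
struct toy_wait_entry {
    void (*wake)(struct toy_wait_entry *self);
    struct toy_epitem *epi;
};

/* Plays the role of ep_poll_callback: queue the item as ready, once. */
static void toy_poll_callback(struct toy_wait_entry *we)
{
    struct toy_epitem *epi = we->epi;
    struct toy_eventpoll *ep = epi->ep;

    if (epi->on_ready_list)
        return;
    epi->on_ready_list = 1;
    epi->ready_next = NULL;
    if (ep->ready_tail)
        ep->ready_tail->ready_next = epi;
    else
        ep->ready_head = epi;
    ep->ready_tail = epi;
}

/* Plays the role of ep_ptable_queue_proc: hook an item up to a device. */
void toy_register(struct toy_eventpoll *ep, struct toy_epitem *epi,
                  struct toy_wait_entry *we, int fd)
{
    epi->fd = fd;
    epi->on_ready_list = 0;
    epi->ready_next = NULL;
    epi->ep = ep;
    we->wake = toy_poll_callback;
    we->epi = epi;
}

/* What a device would do when data arrives: run the registered hook. */
void toy_device_event(struct toy_wait_entry *we)
{
    we->wake(we);
}

/* Plays the role of epoll_wait: drains the ready list only. */
int toy_collect(struct toy_eventpoll *ep, int *fds, int max)
{
    int n = 0;

    assert(max > 0);
    while (ep->ready_head && n < max) {
        struct toy_epitem *epi = ep->ready_head;

        ep->ready_head = epi->ready_next;
        if (!ep->ready_head)
            ep->ready_tail = NULL;
        epi->on_ready_list = 0;
        epi->ready_next = NULL;
        fds[n++] = epi->fd;
    }
    return n;
}

/* Three watched fds, events on two of them; returns 1 if only those two
 * come back, in arrival order. */
int toy_demo(void)
{
    struct toy_eventpoll ep = { NULL, NULL };
    struct toy_epitem items[3];
    struct toy_wait_entry hooks[3];
    int fds[3];
    int n;

    for (int i = 0; i < 3; i++)
        toy_register(&ep, &items[i], &hooks[i], 10 + i);

    toy_device_event(&hooks[2]);
    toy_device_event(&hooks[0]);
    toy_device_event(&hooks[2]);   /* already queued, ignored */

    n = toy_collect(&ep, fds, 3);
    return n == 2 && fds[0] == 12 && fds[1] == 10;
}
```

The idle descriptor (11 in toy_demo) is never visited by toy_collect, which is the contrast with the full traversal that select/poll performs on every call.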

From the flow above, we can draw the following conclusions:

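1) select/poll never separates out the entity that represents a monitored fd, so every call has to rebuild all of those entities. epoll does separate it: new entries are no longer allocated directly from the all-in-one poll_wqueues but from the standalone epitem. You could say this is the contribution of epoll_ctl! It also eliminates the copy of the fd set between the kernel and userspace that select/poll performs on every call.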

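2) From a high-level view, eventpoll is the first-level fd, epitem is the second-level fd, and the concrete device's fd is the third-level fd, so an event is propagated upward level by level. Of course, the first-level fd of one epoll (that is, its eventpoll) may itself be a third-level fd in another epoll, and so on recursively (think of file systems mounted on top of one another). This also makes it clear which structure's wait queue epoll_wait hangs the wait_queue_t representing the current process on: the eventpoll's own.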

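3) Because no structure persists between select/poll calls, every call starts from scratch, the result set cannot carry over, and new requirements such as ET (edge-triggered mode) simply cannot be implemented. epoll, by contrast, has a standalone eventpoll that contains the rdllist cache, so one epoll_wait call and the next can communicate. As a result, implementing ET takes essentially one line of code, the check made after an event has been reported: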

/* Non-ET (level-triggered) items go back on the ready list; ET items do not. */
if (!(epi->event.events & EPOLLET)) {
    list_add_tail(&epi->rdllink, &ep->rdllist);
}

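You could say this is the contribution of epoll_create! To sum up: compared with select/poll, epoll on the surface merely adds the epoll_create and epoll_ctl system calls, but in essence its strength comes from separation. (End)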
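The effect of the ET check in point 3 can be observed from userspace with a short sketch. The function name reports_without_reading is made up for this example. It writes one byte to a pipe, never reads it, and calls epoll_wait twice: in the default level-triggered mode the descriptor is expected to be reported both times, while with EPOLLET it is expected to be reported only once, because nothing puts the item back on the ready list.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper for illustration.
 * extra_flags is either 0 (level-triggered) or EPOLLET (edge-triggered).
 * Returns how many of two consecutive epoll_wait calls reported the
 * unread pipe, or -1 on setup failure. */
int reports_without_reading(uint32_t extra_flags)
{
    int p[2];
    int epfd;
    int count = -1;

    assert((extra_flags & ~(uint32_t)EPOLLET) == 0);

    if (pipe(p) < 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN | extra_flags, .data.fd = p[0] };
    struct epoll_event out;

    /* Register first, then make the pipe readable with a single write. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {
        count = 0;
        for (int i = 0; i < 2; i++) {
            /* Timeout 0: just look at what is on the ready list right now. */
            if (epoll_wait(epfd, &out, 1, 0) == 1)
                count++;
        }
    }

    close(epfd);
    close(p[0]);
    close(p[1]);
    return count;
}
```

Calling reports_without_reading(0) and reports_without_reading(EPOLLET) side by side shows why edge-triggered code has to drain a descriptor until EAGAIN: the unread byte is not announced a second time.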
