为什么 python 的 select 分不清文件的可读可写?
使用 python 的 select.select
为了简单,就先没有使用 socket,而是使用 file
因为官方文档中说,可以是 python 文件对象,也可以是 socket 套接字
? 但是发现了和预期不符合的情况:
from pathlib import Pathimport select
from loguru import logger
BASE_DIR = Path(__file__).resolve().parent
with open(BASE_DIR/'run.log', 'r', encoding='utf-8') as fr, open(BASE_DIR/'run.log', 'w', encoding='utf-8') as fw:
    # 返回值是三个列表,包含已就绪对象,返回的三个列表是前三个参数的子集。当超时时间已到且没有文件描述符就绪时,返回三个空列表。
    ready_objects: tuple[list, list, list] = select.select(
        [fr, fw],
        [fr, fw],
        [fr, fw]
    )
    ready_readable_objects, ready_writeable_objects, ready_exception_objects = ready_objects
    logger.debug(ready_readable_objects)
    logger.debug(ready_writeable_objects)
    logger.debug(ready_exception_objects)
运行后输出如下 ? :
2022-06-21 13:05:06.448 | DEBUG    | __main__:<module>:20 - [<_io.TextIOWrapper name='/Users/ponponon/Desktop/code/me/ideaboom/test_select/run.log' mode='r' encoding='utf-8'>, <_io.TextIOWrapper name='/Users/ponponon/Desktop/code/me/ideaboom/test_select/run.log' mode='w' encoding='utf-8'>]2022-06-21 13:05:06.448 | DEBUG    | __main__:<module>:21 - [<_io.TextIOWrapper name='/Users/ponponon/Desktop/code/me/ideaboom/test_select/run.log' mode='r' encoding='utf-8'>, <_io.TextIOWrapper name='/Users/ponponon/Desktop/code/me/ideaboom/test_select/run.log' mode='w' encoding='utf-8'>]
2022-06-21 13:05:06.448 | DEBUG    | __main__:<module>:22 - [<_io.TextIOWrapper name='/Users/ponponon/Desktop/code/me/ideaboom/test_select/run.log' mode='r' encoding='utf-8'>, <_io.TextIOWrapper name='/Users/ponponon/Desktop/code/me/ideaboom/test_select/run.log' mode='w' encoding='utf-8'>]
从结果来看,fr 和 fw 都被认为是可读可写对象了。
但是我预期的是 fr 是可读不可写,fw 是可写不可读
使用下面的代码用例也可以证明:fr 不可写
python">from pathlib import Pathimport select
from loguru import logger
BASE_DIR = Path(__file__).resolve().parent
with open(BASE_DIR/'run.log', 'r', encoding='utf-8') as fr, open(BASE_DIR/'run.log', 'w', encoding='utf-8') as fw:
    fr.write('hihi')
运行后输出如下 ? :
Traceback (most recent call last):  File "/Users/ponponon/Desktop/code/me/ideaboom/test_select/main.py", line 9, in <module>
    fr.write('hihi')
io.UnsupportedOperation: not writable
可以看到,当我们调用 fr 的 write 方法的时候报错了,fr 是个不可写对象
使用下面的代码用例也可以证明:fw 不可读
from pathlib import Pathimport select
from loguru import logger
BASE_DIR = Path(__file__).resolve().parent
with open(BASE_DIR/'run.log', 'r', encoding='utf-8') as fr, open(BASE_DIR/'run.log', 'w', encoding='utf-8') as fw:
    fw.read()
运行后输出如下 ? :
Traceback (most recent call last):  File "/Users/ponponon/Desktop/code/me/ideaboom/test_select/main.py", line 9, in <module>
    fw.read()
io.UnsupportedOperation: not readable
可以看到,当我们调用 fw 的 read 方法的时候报错了,fw 是个不可读对象
那为什么 select 会把 fr 认为是可写对象,fw 是可读对象呢?
select 这个模块是 c 写的,没有办法直接看 python 版本源码,下了 cpython 的源代码,又看不懂,谁能从 cpython 来分析一下是为什么呢?
大概的位置:Modules/selectmodule.c
static PyObject *select_select_impl(PyObject *module, PyObject *rlist, PyObject *wlist,
                   PyObject *xlist, PyObject *timeout_obj)
/*[clinic end generated code: output=2b3cfa824f7ae4cf input=e467f5d68033de00]*/
{
#ifdef SELECT_USES_HEAP
    pylist *rfd2obj, *wfd2obj, *efd2obj;
#else  /* !SELECT_USES_HEAP */
    /* XXX: All this should probably be implemented as follows:
     * - find the highest descriptor we're interested in
     * - add one
     * - that's the size
     * See: Stevens, APitUE, $12.5.1
     */
    pylist rfd2obj[FD_SETSIZE + 1];
    pylist wfd2obj[FD_SETSIZE + 1];
    pylist efd2obj[FD_SETSIZE + 1];
#endif /* SELECT_USES_HEAP */
    PyObject *ret = NULL;
    fd_set ifdset, ofdset, efdset;
    struct timeval tv, *tvp;
    int imax, omax, emax, max;
    int n;
    _PyTime_t timeout, deadline = 0;
    if (timeout_obj == Py_None)
        tvp = (struct timeval *)NULL;
    else {
        if (_PyTime_FromSecondsObject(&timeout, timeout_obj,
                                      _PyTime_ROUND_TIMEOUT) < 0) {
            if (PyErr_ExceptionMatches(PyExc_TypeError)) {
                PyErr_SetString(PyExc_TypeError,
                                "timeout must be a float or None");
            }
            return NULL;
        }
        if (_PyTime_AsTimeval(timeout, &tv, _PyTime_ROUND_TIMEOUT) == -1)
            return NULL;
        if (tv.tv_sec < 0) {
            PyErr_SetString(PyExc_ValueError, "timeout must be non-negative");
            return NULL;
        }
        tvp = &tv;
    }
#ifdef SELECT_USES_HEAP
    /* Allocate memory for the lists */
    rfd2obj = PyMem_NEW(pylist, FD_SETSIZE + 1);
    wfd2obj = PyMem_NEW(pylist, FD_SETSIZE + 1);
    efd2obj = PyMem_NEW(pylist, FD_SETSIZE + 1);
    if (rfd2obj == NULL || wfd2obj == NULL || efd2obj == NULL) {
        if (rfd2obj) PyMem_Free(rfd2obj);
        if (wfd2obj) PyMem_Free(wfd2obj);
        if (efd2obj) PyMem_Free(efd2obj);
        return PyErr_NoMemory();
    }
#endif /* SELECT_USES_HEAP */
    /* Convert iterables to fd_sets, and get maximum fd number
     * propagates the Python exception set in seq2set()
     */
    rfd2obj[0].sentinel = -1;
    wfd2obj[0].sentinel = -1;
    efd2obj[0].sentinel = -1;
    if ((imax = seq2set(rlist, &ifdset, rfd2obj)) < 0)
        goto finally;
    if ((omax = seq2set(wlist, &ofdset, wfd2obj)) < 0)
        goto finally;
    if ((emax = seq2set(xlist, &efdset, efd2obj)) < 0)
        goto finally;
    max = imax;
    if (omax > max) max = omax;
    if (emax > max) max = emax;
    if (tvp) {
        deadline = _PyDeadline_Init(timeout);
    }
    do {
        Py_BEGIN_ALLOW_THREADS
        errno = 0;
        n = select(
            max,
            imax ? &ifdset : NULL,
            omax ? &ofdset : NULL,
            emax ? &efdset : NULL,
            tvp);
        Py_END_ALLOW_THREADS
        if (errno != EINTR)
            break;
        /* select() was interrupted by a signal */
        if (PyErr_CheckSignals())
            goto finally;
        if (tvp) {
            timeout = _PyDeadline_Get(deadline);
            if (timeout < 0) {
                /* bpo-35310: lists were unmodified -- clear them explicitly */
                FD_ZERO(&ifdset);
                FD_ZERO(&ofdset);
                FD_ZERO(&efdset);
                n = 0;
                break;
            }
            _PyTime_AsTimeval_clamp(timeout, &tv, _PyTime_ROUND_CEILING);
            /* retry select() with the recomputed timeout */
        }
    } while (1);
#ifdef MS_WINDOWS
    if (n == SOCKET_ERROR) {
        PyErr_SetExcFromWindowsErr(PyExc_OSError, WSAGetLastError());
    }
#else
    if (n < 0) {
        PyErr_SetFromErrno(PyExc_OSError);
    }
#endif
    else {
        /* any of these three calls can raise an exception.  it's more
           convenient to test for this after all three calls... but
           is that acceptable?
        */
        rlist = set2list(&ifdset, rfd2obj);
        wlist = set2list(&ofdset, wfd2obj);
        xlist = set2list(&efdset, efd2obj);
        if (PyErr_Occurred())
            ret = NULL;
        else
            ret = PyTuple_Pack(3, rlist, wlist, xlist);
        Py_XDECREF(rlist);
        Py_XDECREF(wlist);
        Py_XDECREF(xlist);
    }
  finally:
    reap_obj(rfd2obj);
    reap_obj(wfd2obj);
    reap_obj(efd2obj);
#ifdef SELECT_USES_HEAP
    PyMem_Free(rfd2obj);
    PyMem_Free(wfd2obj);
    PyMem_Free(efd2obj);
#endif /* SELECT_USES_HEAP */
    return ret;
}
回答:
因为 Unix/Linux 底层设计就是如此。
首先你要理解 Unix/Linux 下“一切皆文件”,所有东西都在底层被抽象为“文件描述符”,无论 Socket 还是你所谓的“文件对象”皆是如此。
其次要注意区分 FileAccess 里的可读/可写(即文件权限中的可读、可写、可执行等等)、FileMode 里的可读/可写(即文件打开模式中的只读、追加、覆盖、创建或覆盖等等,也是你 Python 代码里通过 r/w 控制的)、和 select() 里判断可读/可写条件就绪,都可以叫“可读”/“可写”,但实质不是一回事儿。
最后 select 的实质是不断轮询队列是否有满足可读/可写条件就绪的文件描述符。但判断是否满足可读/可写,是由该文件自身的类型及其驱动决定的。对于 Socket 而言,可读就是接收缓冲区内有数据、可写就是发送缓冲区未满;但磁盘文件系统是没有缓冲区一说的、自然也就没有可读/可写条件就绪一说了。(其实还有个错误条件就绪,咱先忽略)
至于本地磁盘文件系统为什么被设计成永远是可读/可写、而不是永远不可读/不可写,这就是另一个话题了。
对细节感兴趣的话建议阅读《Advanced Programming in the UNIX Environment》这本书(中文翻译叫《UNIX 环境高级编程》)。
所以严格来说本地磁盘的文件描述符是不应该被用于 select 的。
回答:
在你自己提供的说明 select.select 
 中说的很详细(最顶上整体介绍的最后一句),就是 这个模块不能用于常规文件的!
已参与了 SegmentFault 思否社区 10 周年「问答」打卡 ,欢迎正在阅读的你也加入。
回答:
select
select() allows a program to monitor multiple file descriptors,
waiting until one or more of the file descriptors become "ready"
for some class of I/O operation (e.g., input possible). A file
descriptor is considered ready if it is possible to perform a
corresponding I/O operation (e.g., read(2), or a sufficiently
small write(2)) without blocking.
select 并不是用来判断通常文件意义上的“可读”,“可写”,而是判断对它调用 read, write 是否会阻塞。
以上是 为什么 python 的 select 分不清文件的可读可写? 的全部内容, 来源链接: utcz.com/p/938477.html


