Why can't Python's select tell whether a file is readable or writable?

Z时代
2024-02-12
Category: IT

I'm using Python's select.select.

To keep things simple, I started with a file instead of a socket,

since the official documentation says the arguments can be Python file objects as well as sockets.

But I ran into behavior that didn't match my expectations:

from pathlib import Path
import select
from loguru import logger
BASE_DIR = Path(__file__).resolve().parent
with open(BASE_DIR/'run.log', 'r', encoding='utf-8') as fr, open(BASE_DIR/'run.log', 'w', encoding='utf-8') as fw:
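    # Returns three lists of ready objects, each a subset of the corresponding argument. If the timeout expires and no descriptor is ready, three empty lists are returned.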
    ready_objects: tuple[list, list, list] = select.select(
        [fr, fw],
        [fr, fw],
        [fr, fw]
    )
    ready_readable_objects, ready_writeable_objects, ready_exception_objects = ready_objects
    logger.debug(ready_readable_objects)
    logger.debug(ready_writeable_objects)
    logger.debug(ready_exception_objects)

Running it prints:

2022-06-21 13:05:06.448 | DEBUG    | __main__:<module>:20 - [<_io.TextIOWrapper name='/Users/ponponon/Desktop/code/me/ideaboom/test_select/run.log' mode='r' encoding='utf-8'>, <_io.TextIOWrapper name='/Users/ponponon/Desktop/code/me/ideaboom/test_select/run.log' mode='w' encoding='utf-8'>]
2022-06-21 13:05:06.448 | DEBUG    | __main__:<module>:21 - [<_io.TextIOWrapper name='/Users/ponponon/Desktop/code/me/ideaboom/test_select/run.log' mode='r' encoding='utf-8'>, <_io.TextIOWrapper name='/Users/ponponon/Desktop/code/me/ideaboom/test_select/run.log' mode='w' encoding='utf-8'>]
2022-06-21 13:05:06.448 | DEBUG    | __main__:<module>:22 - [<_io.TextIOWrapper name='/Users/ponponon/Desktop/code/me/ideaboom/test_select/run.log' mode='r' encoding='utf-8'>, <_io.TextIOWrapper name='/Users/ponponon/Desktop/code/me/ideaboom/test_select/run.log' mode='w' encoding='utf-8'>]

According to the result, both fr and fw are treated as readable and writable (and they show up in the exception list too).

But I expected fr to be readable but not writable, and fw to be writable but not readable.

The following example confirms that fr is not writable:

from pathlib import Path
import select
from loguru import logger
BASE_DIR = Path(__file__).resolve().parent
with open(BASE_DIR/'run.log', 'r', encoding='utf-8') as fr, open(BASE_DIR/'run.log', 'w', encoding='utf-8') as fw:
    fr.write('hihi')

Running it prints:

Traceback (most recent call last):
  File "/Users/ponponon/Desktop/code/me/ideaboom/test_select/main.py", line 9, in <module>
    fr.write('hihi')
io.UnsupportedOperation: not writable

As you can see, calling fr's write method raises an error: fr is not writable.

The following example confirms that fw is not readable:

from pathlib import Path
import select
from loguru import logger
BASE_DIR = Path(__file__).resolve().parent
with open(BASE_DIR/'run.log', 'r', encoding='utf-8') as fr, open(BASE_DIR/'run.log', 'w', encoding='utf-8') as fw:
    fw.read()

Running it prints:

Traceback (most recent call last):
  File "/Users/ponponon/Desktop/code/me/ideaboom/test_select/main.py", line 9, in <module>
    fw.read()
io.UnsupportedOperation: not readable

As you can see, calling fw's read method raises an error: fw is not readable.
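If all you want to know is which operations a file object's open mode permits, a minimal sketch (assuming that is the goal) is to ask the file object itself through `io.IOBase.readable()` and `writable()` instead of going through `select`. The temporary directory and the `open_modes` helper are hypothetical, added only so the example runs on its own.

```python
import os
import tempfile


def open_modes(path):
    """Hypothetical helper: (readable, writable) for the same path opened with 'r' and with 'w'."""
    # Open the same path twice, like fr/fw in the question
    with open(path, "r", encoding="utf-8") as fr, open(path, "w", encoding="utf-8") as fw:
        # readable()/writable() report what the open mode allows, no I/O is attempted
        return (fr.readable(), fr.writable()), (fw.readable(), fw.writable())


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "run.log")
    open(path, "w").close()  # create the file first so that mode 'r' can open it
    fr_modes, fw_modes = open_modes(path)

print(fr_modes)  # (True, False)
print(fw_modes)  # (False, True)
```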

So why does select treat fr as writable and fw as readable?

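The select module is written in C, so there is no Python source to read directly. I downloaded the CPython source but couldn't make sense of it. Can anyone explain, based on the CPython source, why this happens?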

Roughly where it lives: Modules/selectmodule.c

static PyObject *
select_select_impl(PyObject *module, PyObject *rlist, PyObject *wlist,
                   PyObject *xlist, PyObject *timeout_obj)
/*[clinic end generated code: output=2b3cfa824f7ae4cf input=e467f5d68033de00]*/
{
#ifdef SELECT_USES_HEAP
    pylist *rfd2obj, *wfd2obj, *efd2obj;
#else  /* !SELECT_USES_HEAP */
    /* XXX: All this should probably be implemented as follows:
     * - find the highest descriptor we're interested in
     * - add one
     * - that's the size
     * See: Stevens, APitUE, $12.5.1
     */
    pylist rfd2obj[FD_SETSIZE + 1];
    pylist wfd2obj[FD_SETSIZE + 1];
    pylist efd2obj[FD_SETSIZE + 1];
#endif /* SELECT_USES_HEAP */
    PyObject *ret = NULL;
    fd_set ifdset, ofdset, efdset;
    struct timeval tv, *tvp;
    int imax, omax, emax, max;
    int n;
    _PyTime_t timeout, deadline = 0;
    if (timeout_obj == Py_None)
        tvp = (struct timeval *)NULL;
    else {
        if (_PyTime_FromSecondsObject(&timeout, timeout_obj,
                                      _PyTime_ROUND_TIMEOUT) < 0) {
            if (PyErr_ExceptionMatches(PyExc_TypeError)) {
                PyErr_SetString(PyExc_TypeError,
                                "timeout must be a float or None");
            }
            return NULL;
        }
        if (_PyTime_AsTimeval(timeout, &tv, _PyTime_ROUND_TIMEOUT) == -1)
            return NULL;
        if (tv.tv_sec < 0) {
            PyErr_SetString(PyExc_ValueError, "timeout must be non-negative");
            return NULL;
        }
        tvp = &tv;
    }
#ifdef SELECT_USES_HEAP
    /* Allocate memory for the lists */
    rfd2obj = PyMem_NEW(pylist, FD_SETSIZE + 1);
    wfd2obj = PyMem_NEW(pylist, FD_SETSIZE + 1);
    efd2obj = PyMem_NEW(pylist, FD_SETSIZE + 1);
    if (rfd2obj == NULL || wfd2obj == NULL || efd2obj == NULL) {
        if (rfd2obj) PyMem_Free(rfd2obj);
        if (wfd2obj) PyMem_Free(wfd2obj);
        if (efd2obj) PyMem_Free(efd2obj);
        return PyErr_NoMemory();
    }
#endif /* SELECT_USES_HEAP */
    /* Convert iterables to fd_sets, and get maximum fd number
     * propagates the Python exception set in seq2set()
     */
    rfd2obj[0].sentinel = -1;
    wfd2obj[0].sentinel = -1;
    efd2obj[0].sentinel = -1;
    if ((imax = seq2set(rlist, &ifdset, rfd2obj)) < 0)
        goto finally;
    if ((omax = seq2set(wlist, &ofdset, wfd2obj)) < 0)
        goto finally;
    if ((emax = seq2set(xlist, &efdset, efd2obj)) < 0)
        goto finally;
    max = imax;
    if (omax > max) max = omax;
    if (emax > max) max = emax;
    if (tvp) {
        deadline = _PyDeadline_Init(timeout);
    }
    do {
        Py_BEGIN_ALLOW_THREADS
        errno = 0;
        n = select(
            max,
            imax ? &ifdset : NULL,
            omax ? &ofdset : NULL,
            emax ? &efdset : NULL,
            tvp);
        Py_END_ALLOW_THREADS
        if (errno != EINTR)
            break;
        /* select() was interrupted by a signal */
        if (PyErr_CheckSignals())
            goto finally;
        if (tvp) {
            timeout = _PyDeadline_Get(deadline);
            if (timeout < 0) {
                /* bpo-35310: lists were unmodified -- clear them explicitly */
                FD_ZERO(&ifdset);
                FD_ZERO(&ofdset);
                FD_ZERO(&efdset);
                n = 0;
                break;
            }
            _PyTime_AsTimeval_clamp(timeout, &tv, _PyTime_ROUND_CEILING);
            /* retry select() with the recomputed timeout */
        }
    } while (1);
#ifdef MS_WINDOWS
    if (n == SOCKET_ERROR) {
        PyErr_SetExcFromWindowsErr(PyExc_OSError, WSAGetLastError());
    }
#else
    if (n < 0) {
        PyErr_SetFromErrno(PyExc_OSError);
    }
#endif
    else {
        /* any of these three calls can raise an exception.  it's more
           convenient to test for this after all three calls... but
           is that acceptable?
        */
        rlist = set2list(&ifdset, rfd2obj);
        wlist = set2list(&ofdset, wfd2obj);
        xlist = set2list(&efdset, efd2obj);
        if (PyErr_Occurred())
            ret = NULL;
        else
            ret = PyTuple_Pack(3, rlist, wlist, xlist);
        Py_XDECREF(rlist);
        Py_XDECREF(wlist);
        Py_XDECREF(xlist);
    }
  finally:
    reap_obj(rfd2obj);
    reap_obj(wfd2obj);
    reap_obj(efd2obj);
#ifdef SELECT_USES_HEAP
    PyMem_Free(rfd2obj);
    PyMem_Free(wfd2obj);
    PyMem_Free(efd2obj);
#endif /* SELECT_USES_HEAP */
    return ret;
}
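A minimal Python sketch of what the C code above amounts to from the kernel's point of view: `seq2set()` reduces every object to an integer descriptor (per the documentation, an object may be an integer or have a parameterless `fileno()` method), `select(2)` sees only those integers, and `set2list()` maps the ready descriptors back to the objects you passed in. The open mode ('r' or 'w') is never handed to the kernel. The `FdOnly` class is hypothetical, and because the sketch uses a pipe it assumes a Unix-like OS (on Windows, `select` only accepts sockets).

```python
import os
import select


class FdOnly:
    """Hypothetical wrapper exposing nothing but fileno(), which is all select() uses."""

    def __init__(self, fd):
        self._fd = fd

    def fileno(self):
        return self._fd


r, w = os.pipe()
try:
    reader, writer = FdOnly(r), FdOnly(w)
    os.write(w, b"hi")  # put data in the pipe so the read end becomes ready
    readable, writable, _ = select.select([reader], [writer], [], 0)
    # select() returns the very objects that were passed in, not bare integers
    print(readable[0] is reader, writable[0] is writer)
finally:
    os.close(r)
    os.close(w)
```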

Answer:

Because that is how Unix/Linux is designed at the lowest level.

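First, understand that on Unix/Linux "everything is a file": at the lowest level everything is abstracted as a file descriptor, whether it is a socket or what you call a "file object".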

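Second, distinguish three different things that are all called "readable"/"writable": readable/writable in FileAccess (file permissions such as read, write and execute); readable/writable in FileMode (the open mode, such as read-only, append, truncate, create-or-truncate, which is what the r/w in your Python code controls); and the read/write readiness conditions that select() checks. They share a name, but they are not the same thing.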

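Finally, select essentially asks the kernel which of the file descriptors you passed in are ready for reading or writing. Whether a descriptor counts as readable or writable is decided by the type of the file and its driver. For a socket, readable means there is data in the receive buffer and writable means the send buffer is not full. A regular disk file has no such buffer, so there is no real notion of read/write readiness for it. (There is also an exceptional-condition readiness; let's ignore it for now.)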
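A small sketch of the socket behavior just described, using `socket.socketpair()` as an illustrative assumption (any connected pair of sockets would do). Before anything is sent, the receiving end is writable (its send buffer has room) but not readable (its receive buffer is empty); after the peer sends a few bytes it becomes readable.

```python
import select
import socket

a, b = socket.socketpair()  # two connected sockets
try:
    # Nothing sent yet: b's receive buffer is empty, its send buffer has room
    first_r, first_w, _ = select.select([b], [b], [], 0)

    a.sendall(b"ping")  # data now lands in b's receive buffer
    # Wait (up to 1 s) only for readability of b
    second_r, _, _ = select.select([b], [], [], 1.0)

    print(first_r, first_w == [b], second_r == [b])
finally:
    a.close()
    b.close()
```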

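As for why local disk files were designed to be always readable/writable rather than never readable/writable, that is a separate topic. (POSIX does in fact specify that descriptors for regular files shall always select true for ready to read, ready to write, and error conditions.)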
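A sketch assuming a Unix-like OS: even an empty regular file is reported as both readable and writable, regardless of its open mode, and "readable" only means that `read()` returns immediately, here with an empty string at end of file. The exceptional-condition list is left out because platforms may differ there (the macOS output in the question lists the files, Linux may not).

```python
import os
import select
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "empty.log")
    # 'w' first so the file exists, then 'r', mirroring fw/fr from the question
    with open(path, "w", encoding="utf-8") as fw, open(path, "r", encoding="utf-8") as fr:
        # The file is empty, yet both objects land in both lists
        readable, writable, _ = select.select([fr, fw], [fr, fw], [], 0)
        data = fr.read()  # returns at once with '' (end of file) instead of blocking

print(len(readable), len(writable), repr(data))
```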

If you're interested in the details, I recommend reading Advanced Programming in the UNIX Environment (the Chinese edition is titled 《UNIX 环境高级编程》).

So, strictly speaking, file descriptors for local disk files should not be used with select.
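For contrast, Linux's `epoll` (which `selectors.DefaultSelector` usually picks on Linux) is stricter than `select`: `epoll_ctl(2)` rejects regular files with `EPERM`, which Python raises as `PermissionError`. A sketch that only runs the check where `select.epoll` exists; the `outcome` strings are illustrative labels, not library values.

```python
import select
import tempfile

# select.epoll exists only on Linux; elsewhere just record that
if hasattr(select, "epoll"):
    with tempfile.TemporaryFile() as f, select.epoll() as ep:
        try:
            ep.register(f.fileno(), select.EPOLLIN)
            outcome = "registered"
        except PermissionError:
            # epoll_ctl(2) fails with EPERM for regular files
            outcome = "EPERM"
else:
    outcome = "no epoll on this platform"

print(outcome)
```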

Answer:

The select.select documentation you linked explains this clearly (the last sentence of the overview at the top): this module cannot be used on regular files!


Answer:

From the select(2) man page:

select() allows a program to monitor multiple file descriptors,
waiting until one or more of the file descriptors become "ready"
for some class of I/O operation (e.g., input possible). A file
descriptor is considered ready if it is possible to perform a
corresponding I/O operation (e.g., read(2), or a sufficiently
small write(2)) without blocking.

select is not meant to decide whether a file is "readable" or "writable" in the usual sense; it tells you whether calling read or write on it would block.
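To illustrate the "sufficiently small write(2) without blocking" wording, a sketch assuming a Unix-like OS: the write end of a fresh pipe is reported writable, but once a non-blocking writer has filled the kernel pipe buffer, `select` stops reporting it, even though that descriptor was opened for writing the whole time. The 64 KiB chunk size is an arbitrary choice.

```python
import os
import select

r, w = os.pipe()
try:
    os.set_blocking(w, False)  # a full pipe now raises BlockingIOError instead of blocking
    _, before, _ = select.select([], [w], [], 0)

    written = 0
    try:
        while True:  # keep writing until the kernel pipe buffer is full
            written += os.write(w, b"x" * 65536)
    except BlockingIOError:
        pass

    _, after, _ = select.select([], [w], [], 0)
    print(before == [w], after == [], written > 0)
finally:
    os.close(r)
    os.close(w)
```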

Source: utcz.com/p/938477.html