The yield keyword: take my advice, the water runs deep here, so hold on tight!

才浅Coding攻略
Published 2022-12-12 17:16:41
Collected in the column: 才浅coding攻略

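阿巩

Long time no see, I've missed you all!

I went quiet for the last couple of days (ha, women). I'll keep the reason a secret for now and share it in a lighter post later. yield is a common interview topic, but do you really understand yield, or more precisely Python generators? What exactly does the yield keyword return? And how is it implemented under the hood? Today I'll start from the CPython source and walk through this mysterious keyword with you. One small step every day. Let's begin!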

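yield is a keyword similar to return. In Python, a function whose body contains yield is a generator function: calling it does not run the body but returns a generator object. Every generator is an iterator (but not every iterator is a generator).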

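While the body runs, each yield suspends execution and hands one value back to the caller. The function keeps its local variables and state (a Python generator only preserves its frame context). On the next iteration, execution resumes at the yield where it stopped and continues with the code that follows, with the previous state restored, until the next yield produces another value, and so on.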
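To make the suspend-and-resume behaviour concrete, here is a minimal sketch. The `countdown` function and the `log` list are names made up for this illustration; the log simply records which parts of the body have run so far.

```python
def countdown(n, log):
    log.append("started")              # runs only when the first next() arrives
    while n > 0:
        yield n                        # suspend here and hand n to the caller
        log.append(f"resumed after {n}")
        n -= 1
    log.append("finished")


log = []
gen = countdown(2, log)                # calling the function runs none of its body
log_after_creation = list(log)         # still empty
first = next(gen)                      # runs up to the first yield
second = next(gen)                     # resumes after that yield, loops, yields again
rest = list(gen)                       # runs the remainder of the body
```

Note that `n` survives between calls: the local variable lives in the generator's frame, not on the caller's stack.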

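So what is an iterator? The most common place you meet one is the for statement. Internally, Python calls the built-in function iter on the object after `in`, and iter returns an object that can be iterated. iter maps to the __iter__ magic method of the class, which must return an object that implements the __next__ magic method. If your class implements __next__ itself, __iter__ can simply return self.

An iterator, then, is an object that implements both __iter__ and __next__, and whose __next__ raises StopIteration once the items are exhausted.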
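The protocol described above can be sketched as follows. `Countdown` and `manual_for` are hypothetical names for this example; `manual_for` only approximates what a for loop does and is not how the interpreter literally implements it.

```python
class Countdown:
    """Iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                    # the object is its own iterator

    def __next__(self):
        if self.current <= 0:
            raise StopIteration        # signals that the items are exhausted
        value = self.current
        self.current -= 1
        return value


def manual_for(iterable):
    """Roughly what `for x in iterable` does: iter() once, then next() until StopIteration."""
    collected = []
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break
        collected.append(item)
    return collected


from_for_loop = [x for x in Countdown(3)]
from_manual_loop = manual_for(Countdown(3))
```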

A function with yield is not limited to for loops. Its generator can also be passed as an argument to any function that accepts an iterable.
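As a small illustration, built-ins such as `sum`, `max` and `str.join` accept any iterable, so a generator can be handed to them directly. The `squares` function is a made-up example.

```python
def squares(n):
    """Yield the squares 0, 1, 4, ... of the first n integers."""
    for i in range(n):
        yield i * i


total = sum(squares(4))                            # consumes 0, 1, 4, 9
largest = max(squares(4))
joined = ",".join(str(s) for s in squares(4))
```

Each call to `squares(4)` creates a fresh generator, because a generator can only be consumed once.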

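A classic example shows what a generator buys us:

The task is to produce an unbounded sequence of values that satisfy some condition. Built as one big list, the whole result would have to sit in memory, and an infinite sequence can never fit, so memory is clearly the limiting factor.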

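from itertools import count


def is_prime(n):  # simple trial division, good enough for the demo
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def get_first_prime(start):  # with return: stops at the first prime it finds
    for element in count(start):  # count() stands in for magic_infinite_range
        if is_prime(element):
            return element


def get_primes(number):  # with yield: produces primes one at a time
    while True:
        if is_prime(number):
            yield number
        number += 1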

With a generator there is no need to return the whole list. Only one value is produced at a time, which sidesteps the memory limit.
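Because the sequence is infinite, the caller decides how much of it to consume. A minimal sketch using `itertools.islice`, with the prime generator repeated so the snippet runs on its own (the trial-division `is_prime` is an assumption, not the author's helper):

```python
from itertools import islice


def is_prime(n):
    """Trial division; fine for small demo inputs."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def get_primes(number):
    """Yield every prime greater than or equal to number, forever."""
    while True:
        if is_prime(number):
            yield number
        number += 1


# Take only the first five primes starting from 10; nothing beyond that is computed.
first_five = list(islice(get_primes(10), 5))
```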

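Now let's look at the generator-related source code in Python 3.9.5, starting with how the Python virtual machine handles calls. The frame structure of the virtual machine is defined in Include/cpython/frameobject.h: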

/* Frame object interface */

#ifndef Py_CPYTHON_FRAMEOBJECT_H
#  error "this header file must not be included directly"
#endif

#ifdef __cplusplus
extern "C" {
#endif

typedef struct {
    int b_type;                 /* what kind of block this is */
    int b_handler;              /* where to jump to find handler */
    int b_level;                /* value stack level to pop to */
} PyTryBlock;

struct _frame {
    PyObject_VAR_HEAD
    struct _frame *f_back;      /* previous frame, or NULL */
    PyCodeObject *f_code;       /* code segment */
    PyObject *f_builtins;       /* builtin symbol table (PyDictObject) */
    PyObject *f_globals;        /* global symbol table (PyDictObject) */
    PyObject *f_locals;         /* local symbol table (any mapping) */
    PyObject **f_valuestack;    /* points after the last local */
    /* Next free slot in f_valuestack.  Frame creation sets to f_valuestack.
       Frame evaluation usually NULLs it, but a frame that yields sets it
       to the current stack top. */
    PyObject **f_stacktop;
    PyObject *f_trace;          /* Trace function */
    char f_trace_lines;         /* Emit per-line trace events? */
    char f_trace_opcodes;       /* Emit per-opcode trace events? */

    /* Borrowed reference to a generator, or NULL */
    PyObject *f_gen;

    int f_lasti;                /* Last instruction if called */
    /* Call PyFrame_GetLineNumber() instead of reading this field
       directly.  As of 2.3 f_lineno is only valid when tracing is
       active (i.e. when f_trace is set).  At other times we use
       PyCode_Addr2Line to calculate the line from the current
       bytecode index. */
    int f_lineno;               /* Current line number */
    int f_iblock;               /* index in f_blockstack */
    char f_executing;           /* whether the frame is still executing */
    PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
    PyObject *f_localsplus[1];  /* locals+stack, dynamically sized */
};


/* Standard object interface */

PyAPI_DATA(PyTypeObject) PyFrame_Type;

#define PyFrame_Check(op) Py_IS_TYPE(op, &PyFrame_Type)

PyAPI_FUNC(PyFrameObject *) PyFrame_New(PyThreadState *, PyCodeObject *,
                                        PyObject *, PyObject *);

/* only internal use */
PyFrameObject* _PyFrame_New_NoTrack(PyThreadState *, PyCodeObject *,
                                    PyObject *, PyObject *);


/* The rest of the interface is specific for frame objects */

/* Block management functions */

PyAPI_FUNC(void) PyFrame_BlockSetup(PyFrameObject *, int, int, int);
PyAPI_FUNC(PyTryBlock *) PyFrame_BlockPop(PyFrameObject *);

/* Conversions between "fast locals" and locals in dictionary */

PyAPI_FUNC(void) PyFrame_LocalsToFast(PyFrameObject *, int);

PyAPI_FUNC(int) PyFrame_FastToLocalsWithError(PyFrameObject *f);
PyAPI_FUNC(void) PyFrame_FastToLocals(PyFrameObject *);

PyAPI_FUNC(void) _PyFrame_DebugMallocStats(FILE *out);

PyAPI_FUNC(PyFrameObject *) PyFrame_GetBack(PyFrameObject *frame);

#ifdef __cplusplus
}
#endif

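A frame holds the information and context of a code block (a function), including the last executed instruction and the global and local namespaces. Every frame has its own value stack and block stack. Because these are private to each frame, the CPython interpreter can suspend a frame and resume it later, and generators rely on exactly this. (In 3.9 a generator's exception state is kept on the generator object as gi_exc_state, as the genobject.c code further down shows.)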
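The suspended frame can be observed from Python through the generator's `gi_frame` attribute. The sketch below uses `inspect.getgeneratorstate` rather than raw `f_lasti` values, since those differ between CPython versions; `running_total` is a made-up example.

```python
import inspect


def running_total():
    total = 0
    while total < 3:
        total += 1
        yield total


gen = running_total()
state_created = inspect.getgeneratorstate(gen)      # body has not started yet

next(gen)
state_suspended = inspect.getgeneratorstate(gen)    # paused at a yield
total_after_first = gen.gi_frame.f_locals["total"]  # locals live in the frame

next(gen)
total_after_second = gen.gi_frame.f_locals["total"]

list(gen)                                           # run the generator to completion
state_closed = inspect.getgeneratorstate(gen)
frame_after_exhaustion = gen.gi_frame               # the frame has been released
```

The last line matches the end of gen_send_ex shown later, where gi_frame is set to NULL once the generator cannot be resumed.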

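Python code is first compiled into a sequence of bytecode instructions, which the Python virtual machine then executes. You can inspect the bytecode with dis(func).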

from dis import dis

def foo():
    x = 1
    def bar(y):
        z = y + 2
        return z
    return bar(x)


print(foo())
dis(foo)  # dis() prints the disassembly itself and returns None

With that understanding of the call stack, let's look at how generators are actually implemented. The generator source code is in Objects/genobject.c.

PyObject *
PyGen_New(PyFrameObject *f)
{
    return gen_new_with_qualname(&PyGen_Type, f, NULL, NULL);
}
static PyObject *
gen_new_with_qualname(PyTypeObject *type, PyFrameObject *f,
                      PyObject *name, PyObject *qualname)
{
    PyGenObject *gen = PyObject_GC_New(PyGenObject, type);  /* create the generator object */
    if (gen == NULL) {
        Py_DECREF(f);
        return NULL;
    }
    gen->gi_frame = f;  /* attach the frame (and with it the code block) */
    f->f_gen = (PyObject *) gen;
    Py_INCREF(f->f_code);  /* reference count +1 */
    gen->gi_code = (PyObject *)(f->f_code);
    gen->gi_running = 0;  /* 0 means not running: the generator's initial state */
    gen->gi_weakreflist = NULL;
    gen->gi_exc_state.exc_type = NULL;
    gen->gi_exc_state.exc_value = NULL;
    gen->gi_exc_state.exc_traceback = NULL;
    gen->gi_exc_state.previous_item = NULL;
    if (name != NULL)
        gen->gi_name = name;
    else
        gen->gi_name = ((PyCodeObject *)gen->gi_code)->co_name;
    Py_INCREF(gen->gi_name);
    if (qualname != NULL)
        gen->gi_qualname = qualname;
    else
        gen->gi_qualname = gen->gi_name;
    Py_INCREF(gen->gi_qualname);
    _PyObject_GC_TRACK(gen);  /* start GC tracking */
    return (PyObject *)gen;
}
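Several of the fields initialised in gen_new_with_qualname are visible from Python. A small sketch, assuming a trivial generator function named `numbers` (a made-up name):

```python
import inspect
import types


def numbers():
    yield 1


gen = numbers()

is_generator_function = inspect.isgeneratorfunction(numbers)
is_generator_object = isinstance(gen, types.GeneratorType)
shares_code_object = gen.gi_code is numbers.__code__   # gi_code comes from f->f_code
generator_name = gen.__name__                          # gi_name defaults to co_name
running_at_creation = gen.gi_running                   # gi_running starts at 0
flagged_by_compiler = bool(numbers.__code__.co_flags & inspect.CO_GENERATOR)
```

The `CO_GENERATOR` flag is how the compiler marks a function whose body contains yield, which is why calling it creates a generator instead of running the body.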

Next, let's see how next and send are implemented in the source.

static PyObject *
gen_iternext(PyGenObject *gen)
{
    return gen_send_ex(gen, NULL, 0, 0);
}

PyObject *
_PyGen_Send(PyGenObject *gen, PyObject *arg)
{
    return gen_send_ex(gen, arg, 0, 0);
}

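As the code shows, send and next both call the same function, gen_send_ex. The only difference is the argument: next passes NULL, while send passes the value supplied by the caller.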
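At the Python level this shared implementation means that `next(gen)` and `gen.send(None)` behave the same way. A quick check, using a made-up `letters` generator:

```python
def letters():
    yield "a"
    yield "b"
    yield "c"


g1 = letters()
g2 = letters()

via_next = [next(g1), next(g1), next(g1)]
via_send_none = [g2.send(None), g2.send(None), g2.send(None)]
```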

static PyObject *
gen_send_ex(PyGenObject *gen, PyObject *arg, int exc, int closing)
{
    PyThreadState *tstate = _PyThreadState_GET();
    PyFrameObject *f = gen->gi_frame;
    PyObject *result;

    if (gen->gi_running) {  /* the generator is already executing (re-entrant call) */
        const char *msg = "generator already executing";
        if (PyCoro_CheckExact(gen)) {
            msg = "coroutine already executing";
        }
        else if (PyAsyncGen_CheckExact(gen)) {
            msg = "async generator already executing";
        }
        PyErr_SetString(PyExc_ValueError, msg);
        return NULL;
    }
    if (f == NULL || f->f_stacktop == NULL) {   /* no frame, or the frame has finished: the generator is exhausted */
        if (PyCoro_CheckExact(gen) && !closing) {
            /* `gen` is an exhausted coroutine: raise an error,
               except when called from gen_close(), which should
               always be a silent method. */
            PyErr_SetString(
                PyExc_RuntimeError,
                "cannot reuse already awaited coroutine");
        }
        else if (arg && !exc) {
            /* `gen` is an exhausted generator:
               only set exception if called from send(). */
            if (PyAsyncGen_CheckExact(gen)) {
                PyErr_SetNone(PyExc_StopAsyncIteration);
            }
            else {
                PyErr_SetNone(PyExc_StopIteration);
            }
        }
        return NULL;
    }

    if (f->f_lasti == -1) {  /* f_lasti == -1 means the frame has not started yet */
        if (arg && arg != Py_None) {  /* a non-None argument is not allowed on the first run */
            const char *msg = "can't send non-None value to a "
                              "just-started generator";
            if (PyCoro_CheckExact(gen)) {
                msg = NON_INIT_CORO_MSG;
            }
            else if (PyAsyncGen_CheckExact(gen)) {
                msg = "can't send non-None value to a "
                      "just-started async generator";
            }
            PyErr_SetString(PyExc_TypeError, msg);
            return NULL;
        }
    } else {
        /* Push arg onto the frame's value stack */
        result = arg ? arg : Py_None;
        Py_INCREF(result);  /* reference count of the argument +1 */
        *(f->f_stacktop++) = result;  /* push the argument onto the value stack */
    }

    /* Generators always return to their most recent caller, not
     * necessarily their creator. */
    Py_XINCREF(tstate->frame);
    assert(f->f_back == NULL);
    f->f_back = tstate->frame;

    gen->gi_running = 1;  /* mark the generator as running */
    gen->gi_exc_state.previous_item = tstate->exc_info;
    tstate->exc_info = &gen->gi_exc_state;

    if (exc) {
        assert(_PyErr_Occurred(tstate));
        _PyErr_ChainStackItem(NULL);
    }

    result = _PyEval_EvalFrame(tstate, f, exc);  /* execute the bytecode */
    tstate->exc_info = gen->gi_exc_state.previous_item;
    gen->gi_exc_state.previous_item = NULL;
    gen->gi_running = 0;  /* back to the not-running state */

    /* Don't keep the reference to f_back any longer than necessary.  It
     * may keep a chain of frames alive or it could create a reference
     * cycle. */
    assert(f->f_back == tstate->frame);
    Py_CLEAR(f->f_back);

    /* If the generator just returned (as opposed to yielding), signal
     * that the generator is exhausted. */
    if (result && f->f_stacktop == NULL) {
        if (result == Py_None) {
            /* Delay exception instantiation if we can */
            if (PyAsyncGen_CheckExact(gen)) {
                PyErr_SetNone(PyExc_StopAsyncIteration);
            }
            else {
                PyErr_SetNone(PyExc_StopIteration);
            }
        }
        else {
            /* Async generators cannot return anything but None */
            assert(!PyAsyncGen_CheckExact(gen));
            _PyGen_SetStopIterationValue(result);
        }
        Py_CLEAR(result);
    }
    else if (!result && PyErr_ExceptionMatches(PyExc_StopIteration)) {
        const char *msg = "generator raised StopIteration";
        if (PyCoro_CheckExact(gen)) {
            msg = "coroutine raised StopIteration";
        }
        else if (PyAsyncGen_CheckExact(gen)) {
            msg = "async generator raised StopIteration";
        }
        _PyErr_FormatFromCause(PyExc_RuntimeError, "%s", msg);

    }
    else if (!result && PyAsyncGen_CheckExact(gen) &&
             PyErr_ExceptionMatches(PyExc_StopAsyncIteration))
    {
        /* code in `gen` raised a StopAsyncIteration error:
           raise a RuntimeError.
        */
        const char *msg = "async generator raised StopAsyncIteration";
        _PyErr_FormatFromCause(PyExc_RuntimeError, "%s", msg);
    }

    if (!result || f->f_stacktop == NULL) {
        /* generator can't be rerun, so release the frame */
        /* first clean reference cycle through stored exception traceback */
        _PyErr_ClearExcState(&gen->gi_exc_state);
        gen->gi_frame->f_gen = NULL;
        gen->gi_frame = NULL;
        Py_DECREF(f);
    }

    return result;
}
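Three branches of gen_send_ex can be triggered from plain Python. The sketch below is only an illustration; the generator names are made up, and the exact messages are the ones that appear in the C source above.

```python
def returns_value():
    yield 1
    return "done"                  # non-None result: stored as StopIteration.value


gen = returns_value()
next(gen)
try:
    next(gen)
except StopIteration as stop:
    returned = stop.value


def leaks_stop_iteration():
    yield 1
    raise StopIteration            # converted into RuntimeError by gen_send_ex


gen = leaks_stop_iteration()
next(gen)
try:
    next(gen)
except RuntimeError as err:
    leak_message = str(err)


def reenters():
    yield next(me)                 # resumes itself while gi_running is set


me = reenters()
try:
    next(me)
except ValueError as err:
    reentry_message = str(err)
```

The first case corresponds to _PyGen_SetStopIterationValue, the second to the "generator raised StopIteration" branch, and the third to the gi_running check at the top of the function.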

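The difference between send() and next() is that send() passes a value into the generator. That value becomes the result of the yield expression at which the generator is paused, while the operand of yield is the value handed back to the caller. In other words, send() decides what the pending yield expression evaluates to inside the generator.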

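The first call on a fresh generator must be next() or send(None). Sending any other value raises TypeError, because no yield expression is waiting yet to receive it. In that sense next() is equivalent to send(None). Let's look at an example: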

def s():
    print('study yield')
    m = yield 2
    print(m)
    d = yield 21
    print('go on!')


c = s()
s_d1 = next(c)  # equivalent to c.send(None)
s_d2 = c.send('Fighting!')  # the (yield 2) expression evaluates to 'Fighting!'
print('My Birth Day:', s_d1, '.', s_d2)
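A common use of send() is a coroutine-style accumulator. The sketch below (the `averager` name and logic are assumptions for illustration) also shows the TypeError raised when a non-None value is sent to a generator that has not been started.

```python
def averager():
    """Receive numbers through send() and yield the running average."""
    total = 0
    count = 0
    average = None
    while True:
        value = yield average      # hands out the average, then waits for send()
        total += value
        count += 1
        average = total / count


fresh = averager()
try:
    fresh.send(10)                 # not started yet: no yield is waiting for a value
except TypeError as err:
    error_message = str(err)

avg = averager()
primed = next(avg)                 # run to the first yield; yields the initial None
after_first = avg.send(10)
after_second = avg.send(20)
```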

In the source above we saw that _PyEval_EvalFrame executes the bytecode and returns the result. Let's keep testing with an example along the same lines:

import sys
from dis import dis

def func():
    f = sys._getframe(0)
    print(f.f_lasti)  # f_lasti: offset of the last executed instruction
    print(f.f_back)  # f_back: the previous (calling) frame
    yield 1

    print(f.f_lasti)
    print(f.f_back)
    yield 2


a = func()
print(dis(func))
print(a.__next__())
print(a.__next__())
 # Output:
 26           0 LOAD_GLOBAL              0 (sys)
              2 LOAD_METHOD              1 (_getframe)
              4 LOAD_CONST               1 (0)
              6 CALL_METHOD              1
              8 STORE_FAST               0 (f)

 27          10 LOAD_GLOBAL              2 (print)
             12 LOAD_FAST                0 (f)
             14 LOAD_ATTR                3 (f_lasti)
             16 CALL_FUNCTION            1
             18 POP_TOP

 28          20 LOAD_GLOBAL              2 (print)
             22 LOAD_FAST                0 (f)
             24 LOAD_ATTR                4 (f_back)
             26 CALL_FUNCTION            1
             28 POP_TOP

 29          30 LOAD_CONST               2 (1)
             32 YIELD_VALUE             # on YIELD_VALUE the evaluation loop is left via a goto in ceval.c (not shown in this article); f_lasti records this instruction and f_back is the caller's frame
             34 POP_TOP

 31          36 LOAD_GLOBAL              2 (print)
             38 LOAD_FAST                0 (f)
             40 LOAD_ATTR                3 (f_lasti)
             42 CALL_FUNCTION            1
             44 POP_TOP

 32          46 LOAD_GLOBAL              2 (print)
             48 LOAD_FAST                0 (f)
             50 LOAD_ATTR                4 (f_back)
             52 CALL_FUNCTION            1
             54 POP_TOP

 33          56 LOAD_CONST               3 (2)
             58 YIELD_VALUE
             60 POP_TOP
             62 LOAD_CONST               0 (None)
             64 RETURN_VALUE
None
14
<frame at 0x000001DB7CC7D440, file 'D:/python_basic_practice/面试复习汇总/yield.py', line 38, code <module>>
# Same frame object as the one printed further down: both next() calls are made from module level, so f_back is the same module frame each time.
1
40
<frame at 0x000001DB7CC7D440, file 'D:/python_basic_practice/面试复习汇总/yield.py', line 39, code <module>>
2

Process finished with exit code 0
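The comment in gen_send_ex says that generators always return to their most recent caller, not necessarily their creator. This can be checked from Python with a small sketch; `who_called_me`, `caller_a` and `caller_b` are hypothetical names, and `sys._getframe` is a CPython-specific helper.

```python
import sys


def who_called_me():
    while True:
        # While running, f_back is the frame that resumed the generator.
        yield sys._getframe(0).f_back.f_code.co_name


def caller_a(gen):
    return next(gen)


def caller_b(gen):
    return next(gen)


gen = who_called_me()
seen = [caller_a(gen), caller_b(gen), caller_a(gen)]

# While suspended, the link is cleared (Py_CLEAR(f->f_back) in gen_send_ex).
back_while_suspended = gen.gi_frame.f_back
```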

References:

https://www.python.org/downloads/release/python-395/

https://www.cnblogs.com/abdm-989/p/14398404.html

http://www.cnblogs.com/coder2012

https://blog.csdn.net/qq_33254870/article/details/85054559

END

Originally published 2021-05-18 on the WeChat official account 才浅coding攻略.