阿巩
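It has been a while, and I have missed you all!
For the last couple of days 阿巩 actually skipped posting (hmph, typical). I will keep the reason to myself for now and share it in a lighter post later. yield is a frequent interview topic, but do you really understand yield, or Python generators in general? What does the yield keyword actually return, and how is it implemented under the hood? Today 阿巩 starts from the Python source code to demystify this keyword. One small step a day, so let's begin!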
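yield is a keyword similar to return. In Python, a function whose body contains yield is a generator function: calling it returns a generator instead of running the body. Every generator is an iterator (but not every iterator is a generator).
Each time execution reaches a yield, the function pauses and hands one value back to the caller, while its variables and state are preserved (a Python generator keeps only its stack frame context). On the next iteration the function resumes right after that yield with its state restored, runs until the next yield produces another value, and so on.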
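To make the pause-and-resume behaviour concrete, here is a minimal sketch. The function name `counter` and its values are made up for illustration; the point is only that the local variable `total` survives between calls to next().

```python
def counter():
    """Generator whose local state survives across yields."""
    total = 0
    for step in (1, 2, 3):
        total += step   # `total` lives in the suspended frame
        yield total     # pause here and hand `total` to the caller


gen = counter()
first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes right after that yield; `total` was kept
third = next(gen)
print(first, second, third)  # 1 3 6
```

Nothing in the body runs when `counter()` is called; the first next() is what starts execution.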
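So what is an iterator? The most familiar consumer is the for statement: internally Python calls the built-in iter() on the object being looped over, and iter() returns an object that can be iterated. iter() maps to the class's __iter__ magic method, which must return an object that implements the __next__ magic method; if your class implements __next__ itself, __iter__ can simply return self.
An iterator, then, is an object that implements both __iter__ and __next__, and whose __next__ raises StopIteration once the items are exhausted.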
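The protocol described above can be sketched with a hand-written iterator next to the equivalent generator. The names `Countdown` and `countdown` are hypothetical and exist only for this comparison.

```python
class Countdown:
    """Hand-written iterator: implements both __iter__ and __next__."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # The object is its own iterator, so return self.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that iteration is finished
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """The same behaviour written as a generator function."""
    while start > 0:
        yield start
        start -= 1


class_result = list(Countdown(3))      # [3, 2, 1]
gen_result = list(countdown(3))        # [3, 2, 1]
gen = countdown(3)
is_own_iterator = iter(gen) is gen     # a generator returns itself from __iter__
```

The generator version gets __iter__, __next__ and the final StopIteration for free.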
A generator function is not limited to for loops: its result can also be passed as an argument to any function that accepts an iterable.
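As a small illustration, the built-ins sum(), max() and str.join() all accept any iterable, so a generator can be handed to them directly. The `squares` function is a made-up example.

```python
def squares(limit):
    """Yield the squares of 0 .. limit-1 one at a time."""
    for n in range(limit):
        yield n * n


total = sum(squares(5))                         # 0 + 1 + 4 + 9 + 16
largest = max(squares(5))
joined = ",".join(str(s) for s in squares(4))   # "0,1,4,9"
print(total, largest, joined)
```

Each call to `squares()` creates a fresh generator, because a generator can only be consumed once.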
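The classic example below shows what using a generator buys us:
The task is to produce an unbounded sequence of values that satisfy some condition. Building them as one big list would mean holding the whole list in memory, so memory is clearly the limiting factor.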
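def get_primes(start):  # using return: stops at the first prime found
    # magic_infinite_range and is_prime are helpers assumed to exist elsewhere
    for element in magic_infinite_range(start):
        if is_prime(element):
            return element

def get_primes(number):  # using yield: produces primes one at a time
    while True:
        if is_prime(number):
            yield number
        number += 1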
With a generator there is no need to return the whole list: only one value is produced at a time, which sidesteps the memory limit.
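To try the yield version end to end, here is a runnable sketch. The article does not define is_prime, so the trial-division helper below is my own stand-in, and itertools.islice is used to take a finite slice of the endless stream.

```python
from itertools import islice


def is_prime(n):
    """Trial division; a simple stand-in for the helper the article assumes."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def get_primes(number):
    """Yield primes >= number forever, one at a time."""
    while True:
        if is_prime(number):
            yield number
        number += 1


# Only five values are ever computed, although the generator is infinite.
first_five = list(islice(get_primes(10), 5))
print(first_five)  # [11, 13, 17, 19, 23]
```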
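Let's look at the generator source code in Python 3.9.5, starting with how the Python virtual machine handles calls. The virtual machine's stack frame is defined in Include/cpython/frameobject.h: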
/* Frame object interface */
#ifndef Py_CPYTHON_FRAMEOBJECT_H
# error "this header file must not be included directly"
#endif
#ifdef __cplusplus
extern "C" {
#endif
typedef struct {
int b_type; /* what kind of block this is */
int b_handler; /* where to jump to find handler */
int b_level; /* value stack level to pop to */
} PyTryBlock;
struct _frame {
PyObject_VAR_HEAD
struct _frame *f_back; /* previous frame, or NULL */
PyCodeObject *f_code; /* code segment */
PyObject *f_builtins; /* builtin symbol table (PyDictObject) */
PyObject *f_globals; /* global symbol table (PyDictObject) */
PyObject *f_locals; /* local symbol table (any mapping) */
PyObject **f_valuestack; /* points after the last local */
/* Next free slot in f_valuestack. Frame creation sets to f_valuestack.
Frame evaluation usually NULLs it, but a frame that yields sets it
to the current stack top. */
PyObject **f_stacktop;
PyObject *f_trace; /* Trace function */
char f_trace_lines; /* Emit per-line trace events? */
char f_trace_opcodes; /* Emit per-opcode trace events? */
/* Borrowed reference to a generator, or NULL */
PyObject *f_gen;
int f_lasti; /* Last instruction if called */
/* Call PyFrame_GetLineNumber() instead of reading this field
directly. As of 2.3 f_lineno is only valid when tracing is
active (i.e. when f_trace is set). At other times we use
PyCode_Addr2Line to calculate the line from the current
bytecode index. */
int f_lineno; /* Current line number */
int f_iblock; /* index in f_blockstack */
char f_executing; /* whether the frame is still executing */
PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
PyObject *f_localsplus[1]; /* locals+stack, dynamically sized */
};
/* Standard object interface */
PyAPI_DATA(PyTypeObject) PyFrame_Type;
#define PyFrame_Check(op) Py_IS_TYPE(op, &PyFrame_Type)
PyAPI_FUNC(PyFrameObject *) PyFrame_New(PyThreadState *, PyCodeObject *,
PyObject *, PyObject *);
/* only internal use */
PyFrameObject* _PyFrame_New_NoTrack(PyThreadState *, PyCodeObject *,
PyObject *, PyObject *);
/* The rest of the interface is specific for frame objects */
/* Block management functions */
PyAPI_FUNC(void) PyFrame_BlockSetup(PyFrameObject *, int, int, int);
PyAPI_FUNC(PyTryBlock *) PyFrame_BlockPop(PyFrameObject *);
/* Conversions between "fast locals" and locals in dictionary */
PyAPI_FUNC(void) PyFrame_LocalsToFast(PyFrameObject *, int);
PyAPI_FUNC(int) PyFrame_FastToLocalsWithError(PyFrameObject *f);
PyAPI_FUNC(void) PyFrame_FastToLocals(PyFrameObject *);
PyAPI_FUNC(void) _PyFrame_DebugMallocStats(FILE *out);
PyAPI_FUNC(PyFrameObject *) PyFrame_GetBack(PyFrameObject *frame);
#ifdef __cplusplus
}
#endif
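A stack frame holds the information and context of a code block (function): the last instruction executed, the global and local namespaces, exception state, and so on. Each frame has its own value stack and block stack, and it is this independence that lets the CPython interpreter suspend and resume a frame. Generators rely on exactly that.
Python code is first compiled into a sequence of bytecode instructions, which the Python virtual machine then executes. dis(func) disassembles a function so the bytecode can be inspected.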
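The frame a generator keeps can be observed from Python itself. The sketch below uses inspect.getgeneratorstate and the generator's gi_frame attribute; the `ticker` function is made up for the demonstration, and the exact frame internals are a CPython implementation detail.

```python
import inspect


def ticker():
    count = 0
    while count < 2:
        count += 1
        yield count


gen = ticker()
state_created = inspect.getgeneratorstate(gen)     # body has not started yet

next(gen)
state_suspended = inspect.getgeneratorstate(gen)   # paused at a yield
saved_locals = dict(gen.gi_frame.f_locals)         # locals kept in the frame

list(gen)                                          # run the generator to the end
state_closed = inspect.getgeneratorstate(gen)
frame_after = gen.gi_frame                         # frame is released when done

print(state_created, state_suspended, saved_locals, state_closed, frame_after)
```

Once the generator finishes, gi_frame becomes None, which matches the "release the frame" branch at the end of gen_send_ex.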
from dis import dis

def foo():
    x = 1
    def bar(y):
        z = y + 2
        return z
    return bar(x)

print(foo())  # 3
print(dis(foo))  # dis() prints the listing itself and returns None, so a trailing None is printed too
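The same tooling shows what the compiler does differently for a generator function. This is a sketch only: opcode names are a CPython implementation detail and can change between versions, although YIELD_VALUE and the CO_GENERATOR code flag are what I would expect to see on 3.9. The functions `plain` and `gen_func` are made up.

```python
import dis
import inspect


def plain():
    return 1


def gen_func():
    yield 1


def opnames(func):
    """Return the set of opcode names used in func's bytecode."""
    return {ins.opname for ins in dis.get_instructions(func)}


gen_has_yield = "YIELD_VALUE" in opnames(gen_func)
plain_has_yield = "YIELD_VALUE" in opnames(plain)

# The compiler also marks the code object itself as a generator.
gen_flag = bool(gen_func.__code__.co_flags & inspect.CO_GENERATOR)
plain_flag = bool(plain.__code__.co_flags & inspect.CO_GENERATOR)

print(gen_has_yield, plain_has_yield, gen_flag, plain_flag)
```

The CO_GENERATOR flag is how the interpreter knows to wrap the frame in a generator object instead of executing it right away.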
With the call stack in mind, let's look at how generators are actually implemented. The generator source lives in Objects/genobject.c.
PyObject *
PyGen_New(PyFrameObject *f)
{
return gen_new_with_qualname(&PyGen_Type, f, NULL, NULL);
}
static PyObject *
gen_new_with_qualname(PyTypeObject *type, PyFrameObject *f,
PyObject *name, PyObject *qualname)
{
PyGenObject *gen = PyObject_GC_New(PyGenObject, type); /* allocate the generator object */
if (gen == NULL) {
Py_DECREF(f);
return NULL;
}
gen->gi_frame = f; /* attach the frame that holds the code to run */
f->f_gen = (PyObject *) gen;
Py_INCREF(f->f_code); /* take a reference to the code object */
gen->gi_code = (PyObject *)(f->f_code);
gen->gi_running = 0; /* 0 means not running: the generator's initial state */
gen->gi_weakreflist = NULL;
gen->gi_exc_state.exc_type = NULL;
gen->gi_exc_state.exc_value = NULL;
gen->gi_exc_state.exc_traceback = NULL;
gen->gi_exc_state.previous_item = NULL;
if (name != NULL)
gen->gi_name = name;
else
gen->gi_name = ((PyCodeObject *)gen->gi_code)->co_name;
Py_INCREF(gen->gi_name);
if (qualname != NULL)
gen->gi_qualname = qualname;
else
gen->gi_qualname = gen->gi_name;
Py_INCREF(gen->gi_qualname);
_PyObject_GC_TRACK(gen); /* register the object with the garbage collector */
return (PyObject *)gen;
}
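Several of the fields initialised in gen_new_with_qualname are visible from Python, which makes the C code easier to follow. A small sketch, with `outer` and `numbers` as hypothetical names:

```python
def outer():
    def numbers():
        yield 1
    return numbers


numbers = outer()
gen = numbers()

shares_code = gen.gi_code is numbers.__code__   # gi_code points at the function's code object
name = gen.__name__                             # backed by gi_name
qualname = gen.__qualname__                     # backed by gi_qualname
running = gen.gi_running                        # initial state: not running
has_frame = gen.gi_frame is not None            # the frame is attached up front

print(shares_code, name, qualname, running, has_frame)
```

Note that creating the generator attaches a frame but executes none of the body.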
Now let's see how next and send are implemented in the source.
static PyObject *
gen_iternext(PyGenObject *gen)
{
return gen_send_ex(gen, NULL, 0, 0);
}
PyObject *
_PyGen_Send(PyGenObject *gen, PyObject *arg)
{
return gen_send_ex(gen, arg, 0, 0);
}
As the code above shows, send and next both call the same function, gen_send_ex; the only difference is whether an argument is passed.
static PyObject *
gen_send_ex(PyGenObject *gen, PyObject *arg, int exc, int closing)
{
PyThreadState *tstate = _PyThreadState_GET();
PyFrameObject *f = gen->gi_frame;
PyObject *result;
if (gen->gi_running) { /* refuse to resume a generator that is already running */
const char *msg = "generator already executing";
if (PyCoro_CheckExact(gen)) {
msg = "coroutine already executing";
}
else if (PyAsyncGen_CheckExact(gen)) {
msg = "async generator already executing";
}
PyErr_SetString(PyExc_ValueError, msg);
return NULL;
}
if (f == NULL || f->f_stacktop == NULL) { /* no frame or no saved stack top: the generator is exhausted */
if (PyCoro_CheckExact(gen) && !closing) {
/* `gen` is an exhausted coroutine: raise an error,
except when called from gen_close(), which should
always be a silent method. */
PyErr_SetString(
PyExc_RuntimeError,
"cannot reuse already awaited coroutine");
}
else if (arg && !exc) {
/* `gen` is an exhausted generator:
only set exception if called from send(). */
if (PyAsyncGen_CheckExact(gen)) {
PyErr_SetNone(PyExc_StopAsyncIteration);
}
else {
PyErr_SetNone(PyExc_StopIteration);
}
}
return NULL;
}
if (f->f_lasti == -1) { /* f_lasti == -1 means the generator has not started yet */
if (arg && arg != Py_None) { /* the first send must not carry a non-None value */
const char *msg = "can't send non-None value to a "
"just-started generator";
if (PyCoro_CheckExact(gen)) {
msg = NON_INIT_CORO_MSG;
}
else if (PyAsyncGen_CheckExact(gen)) {
msg = "can't send non-None value to a "
"just-started async generator";
}
PyErr_SetString(PyExc_TypeError, msg);
return NULL;
}
} else {
/* Push arg onto the frame's value stack */
result = arg ? arg : Py_None;
Py_INCREF(result); /* take a reference to the value being sent */
*(f->f_stacktop++) = result; /* push the value onto the frame's value stack */
}
/* Generators always return to their most recent caller, not
* necessarily their creator. */
Py_XINCREF(tstate->frame);
assert(f->f_back == NULL);
f->f_back = tstate->frame;
gen->gi_running = 1; /* mark the generator as running */
gen->gi_exc_state.previous_item = tstate->exc_info;
tstate->exc_info = &gen->gi_exc_state;
if (exc) {
assert(_PyErr_Occurred(tstate));
_PyErr_ChainStackItem(NULL);
}
result = _PyEval_EvalFrame(tstate, f, exc); /* execute the frame's bytecode */
tstate->exc_info = gen->gi_exc_state.previous_item;
gen->gi_exc_state.previous_item = NULL;
gen->gi_running = 0; /* back to not running */
/* Don't keep the reference to f_back any longer than necessary. It
* may keep a chain of frames alive or it could create a reference
* cycle. */
assert(f->f_back == tstate->frame);
Py_CLEAR(f->f_back);
/* If the generator just returned (as opposed to yielding), signal
* that the generator is exhausted. */
if (result && f->f_stacktop == NULL) {
if (result == Py_None) {
/* Delay exception instantiation if we can */
if (PyAsyncGen_CheckExact(gen)) {
PyErr_SetNone(PyExc_StopAsyncIteration);
}
else {
PyErr_SetNone(PyExc_StopIteration);
}
}
else {
/* Async generators cannot return anything but None */
assert(!PyAsyncGen_CheckExact(gen));
_PyGen_SetStopIterationValue(result);
}
Py_CLEAR(result);
}
else if (!result && PyErr_ExceptionMatches(PyExc_StopIteration)) {
const char *msg = "generator raised StopIteration";
if (PyCoro_CheckExact(gen)) {
msg = "coroutine raised StopIteration";
}
else if (PyAsyncGen_CheckExact(gen)) {
msg = "async generator raised StopIteration";
}
_PyErr_FormatFromCause(PyExc_RuntimeError, "%s", msg);
}
else if (!result && PyAsyncGen_CheckExact(gen) &&
PyErr_ExceptionMatches(PyExc_StopAsyncIteration))
{
/* code in `gen` raised a StopAsyncIteration error:
raise a RuntimeError.
*/
const char *msg = "async generator raised StopAsyncIteration";
_PyErr_FormatFromCause(PyExc_RuntimeError, "%s", msg);
}
if (!result || f->f_stacktop == NULL) {
/* generator can't be rerun, so release the frame */
/* first clean reference cycle through stored exception traceback */
_PyErr_ClearExcState(&gen->gi_exc_state);
gen->gi_frame->f_gen = NULL;
gen->gi_frame = NULL;
Py_DECREF(f);
}
return result;
}
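Three branches of gen_send_ex can be triggered from plain Python, which is a handy way to check one's reading of the C code. The generator names below are made up, and the messages in the comments are the ones that appear as string literals in the source above.

```python
def reentrant():
    # Resuming a generator from inside its own body hits the gi_running check.
    yield next(me)


me = reentrant()
try:
    next(me)
except ValueError as exc:
    reentrant_error = str(exc)   # "generator already executing"


def leaky():
    raise StopIteration          # escaping StopIteration becomes RuntimeError
    yield                        # unreachable, but makes this a generator


try:
    next(leaky())
except RuntimeError as exc:
    leaky_error = str(exc)       # "generator raised StopIteration"


def with_result():
    yield 1
    return "done"                # carried by StopIteration.value


g = with_result()
next(g)
try:
    next(g)
except StopIteration as exc:
    returned = exc.value

print(reentrant_error, leaky_error, returned, sep=" | ")
```

The last case corresponds to the _PyGen_SetStopIterationValue call: a non-None return value travels to the caller inside the StopIteration exception.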
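The difference between send() and next() is that send() can pass a value into the generator: that value becomes the result of the yield expression at which the generator is paused, whereas the operand of yield is the value handed back to the caller. In other words, send() decides what the pending yield expression evaluates to.
The first call on a fresh generator must be next() or send(None), otherwise an error is raised. The argument has to be None at that point because there is no pending yield expression to receive it yet, so next() can be regarded as equivalent to send(None). Let's look at another example: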
def s():
    print('study yield')
    m = yield 2
    print(m)
    d = yield 21
    print('go on!')

c = s()
s_d1 = next(c)  # equivalent to c.send(None)
s_d2 = c.send('Fighting!')  # the (yield 2) expression evaluates to 'Fighting!'
print('My Birth Day:', s_d1, '.', s_d2)
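The rule about the first call can be checked directly. In this sketch, `echo` is a hypothetical generator; the error message is the one from the gen_send_ex source above.

```python
def echo():
    received = yield "ready"
    yield f"got {received}"


# Sending a real value before the generator has reached a yield fails.
fresh = echo()
try:
    fresh.send("too early")
except TypeError as exc:
    early_error = str(exc)

# send(None) starts the generator exactly like next() would.
g = echo()
first = g.send(None)       # runs up to the first yield
second = g.send("hello")   # "hello" becomes the value of the paused yield

print(early_error)
print(first, "|", second)
```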
The source above shows that _PyEval_EvalFrame executes the bytecode and returns the result. Let's test this with another example in the same style:
import sys
from dis import dis

def func():
    f = sys._getframe(0)
    print(f.f_lasti)  # f_lasti: offset of the last instruction executed
    print(f.f_back)   # f_back: the previous (calling) frame
    yield 1
    print(f.f_lasti)
    print(f.f_back)
    yield 2

a = func()
print(dis(func))  # dis() prints the listing and returns None, hence the None in the output
print(a.__next__())
print(a.__next__())
# Output:
26 0 LOAD_GLOBAL 0 (sys)
2 LOAD_METHOD 1 (_getframe)
4 LOAD_CONST 1 (0)
6 CALL_METHOD 1
8 STORE_FAST 0 (f)
27 10 LOAD_GLOBAL 2 (print)
12 LOAD_FAST 0 (f)
14 LOAD_ATTR 3 (f_lasti)
16 CALL_FUNCTION 1
18 POP_TOP
28 20 LOAD_GLOBAL 2 (print)
22 LOAD_FAST 0 (f)
24 LOAD_ATTR 4 (f_back)
26 CALL_FUNCTION 1
28 POP_TOP
29 30 LOAD_CONST 2 (1)
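32 YIELD_VALUE # the opcode here is YIELD_VALUE: the eval loop jumps straight to its exit path (a goto in ceval.c, not shown here); f_lasti records this instruction and f_back refers to the calling frame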
34 POP_TOP
31 36 LOAD_GLOBAL 2 (print)
38 LOAD_FAST 0 (f)
40 LOAD_ATTR 3 (f_lasti)
42 CALL_FUNCTION 1
44 POP_TOP
32 46 LOAD_GLOBAL 2 (print)
48 LOAD_FAST 0 (f)
50 LOAD_ATTR 4 (f_back)
52 CALL_FUNCTION 1
54 POP_TOP
33 56 LOAD_CONST 3 (2)
58 YIELD_VALUE
60 POP_TOP
62 LOAD_CONST 0 (None)
64 RETURN_VALUE
None
14
<frame at 0x000001DB7CC7D440, file 'D:/python_basic_practice/面试复习汇总/yield.py', line 38, code <module>>
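# Same frame object as the one printed further down (same address): f_back is the caller's frame, and both __next__() calls are made from the same module-level code.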
1
40
<frame at 0x000001DB7CC7D440, file 'D:/python_basic_practice/面试复习汇总/yield.py', line 39, code <module>>
2
Process finished with exit code 0
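The f_back handling in gen_send_ex (set before running, cleared afterwards) can also be observed without printing addresses. This is a sketch based on my reading of the source above and on CPython behaviour; frame internals are an implementation detail and sys._getframe is CPython-specific. The `probe` function is made up.

```python
import sys


def probe():
    frame = sys._getframe(0)
    # While the generator runs, f_back links to whoever resumed it.
    yield frame.f_back is not None
    yield frame


gen = probe()
linked_while_running = next(gen)
frame = next(gen)

same_frame = frame is gen.gi_frame               # one frame for the generator's whole life
detached_when_suspended = frame.f_back is None   # link dropped once the generator yields

print(linked_while_running, same_frame, detached_when_suspended)
```

This is the "generators always return to their most recent caller" comment in action: the link to the caller only exists while the generator is executing.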
References:
https://www.python.org/downloads/release/python-395/
https://www.cnblogs.com/abdm-989/p/14398404.html
http://www.cnblogs.com/coder2012
https://blog.csdn.net/qq_33254870/article/details/85054559
END