[Deep Learning] Getting Started with TensorFlow 2.x (Part 1): Three Ways to Build a Model

黄博的机器学习圈子

Published on 2020-12-11 01:47:40

Included in the column: 机器学习初学者精选文章

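Preface

I have been rather anxious about my experiments lately, so I decided to organize my TensorFlow 2.x knowledge around recommendation algorithms. There are many articles introducing TensorFlow 2.x, but this article (series) follows the order in which I actually build models, so it does not start from Eager Execution. I also try to go beyond beginner-level tutorials and add my own understanding. The article is about 2.7k characters long and takes roughly 10 minutes to read.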

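Three ways to build a model in TensorFlow 2.x

TensorFlow 2.x offers three main ways to create a model:

Sequential API, the sequential model;
Functional API, the functional model;
Subclassing API, the subclassed model;

The Sequential API only suits simple layer stacks and struggles with complex models. The Functional API and the Subclassing API each have strengths and weaknesses, and there is no need to draw a hard line between them, because they can be mixed.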

1. Sequential API

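The Sequential API builds a model layer by layer. It suits a plain stack of layers, but models with multiple inputs or multiple outputs are hard to express with it. I do not recommend building models this way, so here is just a brief example: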

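from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense

# A plain stack: 3 input features -> 2 -> 3 -> 4 units
model = Sequential(
    [
        Input(shape=(3,)),
        Dense(2, activation='relu', name='layer1'),
        Dense(3, activation='relu', name='layer2'),
        Dense(4, name='layer3'),
    ]
)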
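As a usage sketch (not part of the original example), the same stack can also be built incrementally with add() and then called on a batch. The batch of five all-ones samples and the model name "mlp" are made up for illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Same three-layer stack as above, built incrementally with add().
model = keras.Sequential(name="mlp")  # "mlp" is a hypothetical name
model.add(keras.Input(shape=(3,)))
model.add(layers.Dense(2, activation="relu", name="layer1"))
model.add(layers.Dense(3, activation="relu", name="layer2"))
model.add(layers.Dense(4, name="layer3"))

x = np.ones((5, 3), dtype="float32")  # hypothetical batch: 5 samples, 3 features
y = model(x)                          # forward pass through the whole stack

print(tuple(y.shape))                          # (5, 4)
print([layer.name for layer in model.layers])  # ['layer1', 'layer2', 'layer3']
print(model.count_params())                    # 33 = (3*2+2) + (2*3+3) + (3*4+4)
```

Every layer receives exactly one tensor and returns exactly one tensor, which is why branching or merging cannot be expressed in this style.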

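2. Functional API

The Functional API handles non-linear topologies, shared layers, and models with multiple inputs and outputs well. A model is usually a directed acyclic graph (DAG) of layers, so the Functional API is a way to build such a graph of layers.

Below is an encoder-decoder structure: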

from tensorflow.keras import Input, Model, layers


def get_models():
    # Encoder: 28x28x1 image -> 16-dimensional code
    encoder_input = Input(shape=(28, 28, 1), name="img")
    x = layers.Conv2D(16, 3, activation="relu")(encoder_input)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.MaxPooling2D(3)(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.Conv2D(16, 3, activation="relu")(x)
    encoder_output = layers.GlobalMaxPooling2D()(x)
    # The original snippet returned `encoder` without defining it
    encoder = Model(encoder_input, encoder_output, name="encoder")

    # Decoder: 16-dimensional code -> 28x28x1 image
    x = layers.Reshape((4, 4, 1))(encoder_output)
    x = layers.Conv2DTranspose(16, 3, activation="relu")(x)
    x = layers.Conv2DTranspose(32, 3, activation="relu")(x)
    x = layers.UpSampling2D(3)(x)
    x = layers.Conv2DTranspose(16, 3, activation="relu")(x)
    decoder_output = layers.Conv2DTranspose(1, 3, activation="relu")(x)

    autoencoder = Model(encoder_input, decoder_output, name="autoencoder")

    return encoder, autoencoder
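Beyond the encoder-decoder case, the shared-layer and multi-input, multi-output topologies mentioned earlier can be sketched as follows. This is a minimal illustration, not the author's model: the "user"/"item" inputs, the "click"/"rating" heads, and all layer sizes are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Two inputs (hypothetical user and item feature vectors).
user_in = keras.Input(shape=(8,), name="user")
item_in = keras.Input(shape=(8,), name="item")

# Shared layer: the SAME Dense instance (one set of weights) is applied twice.
shared = layers.Dense(4, activation="relu", name="shared_dense")
u = shared(user_in)
v = shared(item_in)

# Merge the two branches, then split into two output heads.
merged = layers.Concatenate()([u, v])
click = layers.Dense(1, activation="sigmoid", name="click")(merged)
rating = layers.Dense(1, name="rating")(merged)

model = keras.Model(inputs=[user_in, item_in], outputs=[click, rating])

# Calling the model with a list of two batches returns a list of two tensors.
outs = model([np.zeros((2, 8), "float32"), np.zeros((2, 8), "float32")])
print([tuple(o.shape) for o in outs])  # [(2, 1), (2, 1)]
print(model.count_params())            # 54 = 36 (shared, counted once) + 9 + 9
```

Because the shared layer is counted once, the parameter total makes the weight sharing visible.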

Sometimes the built-in tf.keras layers are not enough to build a complex model, so we need the custom layers of the Subclassing API.

3. Subclassing API

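The Subclassing API builds custom layers and custom models by inheriting from tf.keras.layers.Layer or tf.keras.Model. It does not conflict with the Functional API. Custom layers in particular, which extend the API with layers of your own, combine easily with the Functional API when building a model.

3.1 The Layer class

A central abstraction in Keras is the Layer class. A layer encapsulates state (weights) and a transformation from inputs to outputs (the layer's forward pass).

A simple linear layer is defined as follows: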

import tensorflow as tf
from tensorflow import keras


class Linear(keras.layers.Layer):
    def __init__(self, units=32, input_dim=32, **kwargs):
        super(Linear, self).__init__(**kwargs)
        # State: weight matrix w and bias b, both created eagerly in __init__
        self.w = self.add_weight(
            shape=(input_dim, units), initializer="random_normal", trainable=True
        )
        self.b = self.add_weight(shape=(units,), initializer="zeros", trainable=True)

    def call(self, inputs, **kwargs):
        # Forward pass: y = x @ w + b
        return tf.matmul(inputs, self.w) + self.b
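As a usage sketch, a custom layer like this can be called directly on a tensor or placed in a functional graph next to built-in layers. The block repeats the Linear layer so that it runs on its own; the sizes (2 input features, 4 units) are arbitrary choices for illustration.

```python
import tensorflow as tf
from tensorflow import keras


class Linear(keras.layers.Layer):
    """Same layer as above, repeated so that this snippet is self-contained."""

    def __init__(self, units=32, input_dim=32, **kwargs):
        super().__init__(**kwargs)
        self.w = self.add_weight(
            shape=(input_dim, units), initializer="random_normal", trainable=True
        )
        self.b = self.add_weight(shape=(units,), initializer="zeros", trainable=True)

    def call(self, inputs, **kwargs):
        return tf.matmul(inputs, self.w) + self.b


# 1) Call the layer directly on an eager tensor.
layer = Linear(units=4, input_dim=2)
y = layer(tf.ones((3, 2)))
print(tuple(y.shape))                # (3, 4)
print(len(layer.trainable_weights))  # 2 (w and b are tracked automatically)

# 2) Mix the custom layer with a built-in layer in a functional model.
inputs = keras.Input(shape=(2,))
hidden = Linear(units=4, input_dim=2)(inputs)
outputs = keras.layers.Dense(1)(hidden)
model = keras.Model(inputs, outputs)
print(model.count_params())          # 17 = (2*4+4) + (4*1+1)
```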

A few points to note:

Sub-layers (built-in tf.keras layers or custom ones) can be created in __init__() and then called in call();
When defining variables, you will sometimes see:

  w_init = tf.random_normal_initializer()
  self.w = tf.Variable(
      initial_value=w_init(shape=(input_dim, units), dtype="float32"),
      trainable=True,
  )

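This is equivalent to add_weight(), but it requires creating the initializer first and then constructing the variable, whereas add_weight() initializes the variable as it defines it. add_weight() is the recommended way;

Sometimes variables are defined in build(self, input_shape). One reason is coding habit; the more important reason is that 「sometimes the input size is not known in advance (that is, there is no input_dim), and we want to defer weight creation until some point after the layer is instantiated」: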

  def build(self, input_shape):
      self.w = self.add_weight(
          shape=(input_shape[-1], self.units),
          initializer="random_normal",
          trainable=True,
      )
      self.b = self.add_weight(
          shape=(self.units,), initializer="random_normal", trainable=True
      )

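where input_shape is the shape of the input;

call(self, inputs, **kwargs), where inputs is a tensor or a nested structure of tensors (multiple inputs, a list of tensors) and **kwargs holds non-tensor arguments. More generally, call() should be declared as:

  def call(self, inputs, training=None, mask=None, **kwargs):

training and mask are privileged arguments of call(). training exists because layers such as BatchNormalization and Dropout behave differently during training and inference. mask matters when an earlier layer has generated a mask, in which case Keras automatically passes the correct mask to __call__(). See below for details.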
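The notes above on deferred weight creation and the training argument can be sketched together. This is a minimal illustration under assumptions: the class name LazyLinear and the use of tf.nn.dropout inside call() are not from the article.

```python
import tensorflow as tf
from tensorflow import keras


class LazyLinear(keras.layers.Layer):
    """Linear layer whose weights are created in build(), not in __init__()."""

    def __init__(self, units=32, **kwargs):
        super().__init__(**kwargs)
        self.units = units  # no input_dim needed here

    def build(self, input_shape):
        # Runs once, on the first call, when the input shape is finally known.
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="zeros", trainable=True
        )

    def call(self, inputs, training=None, **kwargs):
        outputs = tf.matmul(inputs, self.w) + self.b
        if training:
            # Illustrative only: behave differently in training mode.
            outputs = tf.nn.dropout(outputs, rate=0.5)
        return outputs


layer = LazyLinear(4)
n_before = len(layer.weights)               # no weights exist yet
y = layer(tf.ones((2, 3)), training=False)  # first call triggers build()
n_after = len(layer.weights)

print(n_before, n_after)     # 0 2
print(tuple(layer.w.shape))  # (3, 4): the input size was inferred from the data
```

The same instance could not be reused on inputs with a different last dimension afterwards, since build() runs only once.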

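3.2 The Model class

The Layer class is usually used to define inner computation blocks, such as an FM or a self-attention block, while the Model class defines the whole outer model, such as DeepFM or SASRec.

Model has the same API as Layer, with the following differences:

Model exposes built-in training fit(), evaluation evaluate(), and prediction predict();
The model.layers property exposes the list of its inner layers;
It exposes saving and serialization APIs (save(), save_weights());

For example: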

class MyModel(keras.Model):
    def __init__(self, units=32, **kwargs):
        super(MyModel, self).__init__(**kwargs)
        self.units = units
        # Linear is the custom layer from section 3.1; input_dim is omitted,
        # which assumes the variant that creates its weights in build()
        self.linear = Linear(self.units)

    def call(self, inputs, **kwargs):
        outputs = self.linear(inputs)
        return outputs


model = MyModel(32)
# model.compile(...)
# model.fit(...)
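To make the commented-out compile() and fit() calls concrete, here is a small end-to-end sketch on synthetic data. The TinyRegressor class, the random data, and the hyperparameters are all made up for illustration; built-in Dense layers stand in for the custom Linear layer to keep the block short.

```python
import numpy as np
from tensorflow import keras


class TinyRegressor(keras.Model):
    """Hypothetical subclassed model: one hidden layer and one output unit."""

    def __init__(self, units=8, **kwargs):
        super().__init__(**kwargs)
        self.hidden = keras.layers.Dense(units, activation="relu")
        self.out = keras.layers.Dense(1)

    def call(self, inputs, training=None):
        return self.out(self.hidden(inputs))


# Synthetic regression task: the target is the sum of the four features.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4)).astype("float32")
y = x.sum(axis=1, keepdims=True)

model = TinyRegressor()
model.compile(optimizer="adam", loss="mse")
history = model.fit(x, y, epochs=2, batch_size=16, verbose=0)  # training loop
loss = model.evaluate(x, y, verbose=0)                         # evaluation
preds = model.predict(x[:5], verbose=0)                        # prediction

print(preds.shape)                   # (5, 1)
print(len(history.history["loss"]))  # 2, one entry per epoch
print(len(model.layers))             # 2, the tracked sub-layers
```

A plain Layer has none of fit(), evaluate(), or predict(), which is the practical reason to use Model for the outermost object.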

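3.3 The call() method

As mentioned above, call() has two privileged arguments, training and mask.

「training」:

BatchNormalization and Dropout layers behave differently during training and inference (briefly, "inference" means that a trained model can efficiently draw conclusions from new data, i.e. 「prediction」). Here is what the API documentation of Dropout and BatchNormalization says: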

❝Dropout： Note that the Dropout layer only applies when training is set to True such that no values are dropped during inference. When using model.fit, training will be appropriately set to True automatically, and in other contexts, you can set the kwarg explicitly to True when calling the layer. ❞

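In short, Dropout only takes effect when training=True; during inference (training=False) no values are dropped. With fit(), training is set to True automatically.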
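A quick check of this behaviour, as a minimal sketch with an all-ones input and rate=0.5 (both chosen only to make the effect easy to see):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

dropout = layers.Dropout(rate=0.5)
x = tf.ones((2, 6))

# Inference mode: the layer is an identity function.
y_infer = dropout(x, training=False).numpy()

# Training mode: each value is either dropped (0) or kept and scaled
# by 1 / (1 - rate) = 2 so that the expected sum stays unchanged.
y_train = dropout(x, training=True).numpy()

print(np.array_equal(y_infer, np.ones((2, 6))))  # True
print(sorted(set(y_train.ravel().tolist())))     # a subset of [0.0, 2.0]
```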

❝BatchNormalization: 「training」: Python boolean indicating whether the layer should behave in training mode or in inference mode.

training=True: The layer will normalize its inputs using the mean and variance of the current batch of inputs.
training=False: The layer will normalize its inputs using the mean and variance of its moving statistics, learned during training.

❞

In call(), training=True normalizes the inputs with the mean and variance of the current batch, while training=False normalizes with the moving mean and variance learned 「during training」.
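The two modes can be compared on a freshly created layer. This is a sketch with a made-up four-sample batch; it relies on the layer's default settings (moving mean initialized to 0, moving variance to 1, momentum 0.99).

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

bn = layers.BatchNormalization()
x = tf.constant([[1.0], [3.0], [5.0], [7.0]])  # batch mean 4, batch variance 5

# Inference mode on a fresh layer: the moving statistics are still at their
# initial values (mean 0, variance 1), so the output is almost the input.
y_infer = bn(x, training=False).numpy()

# Training mode: normalize with the statistics of this batch and, as a side
# effect, nudge the moving statistics toward them.
y_train = bn(x, training=True).numpy()

print(np.allclose(y_infer, x.numpy(), atol=1e-2))  # True
print(abs(float(y_train.mean())) < 1e-5)           # True: batch is centered
print(bn.moving_mean.numpy())                      # about [0.04] = 0.99*0 + 0.01*4
```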

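So training is a boolean argument; by exposing it, call() controls which mode the model runs in (training or inference).

[Note] For Dropout the default is fine, whereas BatchNormalization needs some thought. Also, training and trainable are different things: trainable=False freezes the layer; see the API for details. 「Of course you may leave training unspecified, because during fit() the model chooses the training value according to the current phase (training or inference).」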
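To illustrate the trainable side of that distinction, here is a minimal sketch with a Dense layer (the layer and its sizes are arbitrary). Freezing moves the weights from the trainable list to the non-trainable list; it does not change how the forward pass is computed for this layer.

```python
import tensorflow as tf
from tensorflow.keras import layers

dense = layers.Dense(3)
_ = dense(tf.zeros((1, 4)))  # the first call builds the kernel and the bias

n_before = len(dense.trainable_weights)

dense.trainable = False      # freeze: the optimizer will no longer update them
n_after = len(dense.trainable_weights)
n_frozen = len(dense.non_trainable_weights)

print(n_before, n_after, n_frozen)  # 2 0 2
```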

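「mask」:

The mask argument is used when building attention mechanisms or sequence models. If an earlier layer generated a mask (here this means tf.keras.layers.Embedding specifically, which has a mask_zero parameter), then with mask_zero=True Keras automatically passes the correct mask argument to __call__() [in the Functional API, masks propagate automatically].

If the mask argument is not used, mask-generating layers such as Embedding also expose a compute_mask(input, previous_mask) method to compute the mask: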

from tensorflow.keras import layers


class MyLayer(layers.Layer):
    def __init__(self, **kwargs):
        super(MyLayer, self).__init__(**kwargs)
        self.embedding = layers.Embedding(input_dim=5000, output_dim=16, mask_zero=True)
        self.lstm = layers.LSTM(32)

    def call(self, inputs):
        x = self.embedding(inputs)
        # Note that you could also prepare a `mask` tensor manually.
        # It only needs to be a boolean tensor
        # with the right shape, i.e. (batch_size, timesteps).
        mask = self.embedding.compute_mask(inputs)
        output = self.lstm(x, mask=mask)  # The layer will ignore the masked values
        return output
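To see what the mask looks like and what "ignoring the masked values" means, here is a small sketch. The token ids, vocabulary size, and layer sizes are made up, and 0 is assumed to be the padding id.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

embedding = layers.Embedding(input_dim=10, output_dim=4, mask_zero=True)
lstm = layers.LSTM(8)

# Two right-padded sequences; id 0 marks padding.
ids = tf.constant([[1, 2, 0, 0],
                   [3, 0, 0, 0]])

# Boolean mask of shape (batch_size, timesteps): True where the id is not 0.
mask = embedding.compute_mask(ids)
print(mask.numpy().tolist())
# [[True, True, False, False], [True, False, False, False]]

# Padded steps are skipped, so the LSTM state stops updating at the padding.
out = lstm(embedding(ids), mask=mask)
print(tuple(out.shape))  # (2, 8)

# Sanity check: the padded row [3, 0, 0, 0] should give the same result as
# the unpadded one-step sequence [3].
out_short = lstm(embedding(tf.constant([[3]])))
same = np.allclose(out[1].numpy(), out_short[0].numpy(), atol=1e-5)
```

Without the mask, the three padding steps would be fed through the LSTM as if they were real tokens and would change the final state.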