Module

A Module is the basic building block of a network in BigDL: every kind of layer is implemented as a Module.

First, let's look at the inheritance hierarchy of Module.

[Figure: class-structure, the Module class hierarchy]

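At the root is the abstract class AbstractModule, the base class of all Modules.

Below it is its subclass TensorModule, which is also abstract. Most layers extend this class, for example the fully connected (Linear) and convolution layers.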

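AbstractModule also has several other subclasses:

Container: a container of Modules, which can be thought of as packaging several Modules into one.
Cell: designed for recurrent neural network cells; for example, RNNCell and LSTMCell both extend this class.
CAddTable, CSubTable...: other special-purpose Modules that take multiple inputs or produce multiple outputs.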
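To make the hierarchy concrete, here is a heavily simplified, self-contained Scala sketch of it. It is not BigDL code. `ToyModule`, `ToyTensorModule`, `ToyContainer` and `ToyCAddTable` are hypothetical names. Plain `Array[Double]` stands in for `Tensor[T]`, and a `Seq` of arrays stands in for the `Table` input that BigDL's CAddTable actually takes. Cell is left out.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical analogue of AbstractModule[A, B, T]: input type A, output type B.
abstract class ToyModule[A, B] {
  def updateOutput(input: A): B
  final def forward(input: A): B = updateOutput(input)
}

// Hypothetical analogue of TensorModule: input and output share one "tensor" type.
abstract class ToyTensorModule extends ToyModule[Array[Double], Array[Double]]

// Hypothetical analogue of Container: several modules packaged as one.
abstract class ToyContainer extends ToyTensorModule {
  val modules = ArrayBuffer[ToyTensorModule]()
  def add(m: ToyTensorModule): this.type = { modules += m; this }
}

// Hypothetical analogue of CAddTable: several inputs, summed element-wise.
class ToyCAddTable extends ToyModule[Seq[Array[Double]], Array[Double]] {
  def updateOutput(input: Seq[Array[Double]]): Array[Double] =
    input.reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
}

val summed = new ToyCAddTable().forward(Seq(Array(1.0, 2.0), Array(3.0, 4.0)))
```

The point of the type parameters is that a multi-input module simply picks a different input type A. The forward machinery in the base class does not change.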

Container, Sequential, Graph

For details, see Containers.

Note that two of these containers are special: Sequential and Graph are the top-level APIs we use to build models.

Let's recall the two ways to define a model.

In the Sequential style:

import com.intel.analytics.bigdl.nn._
import com.intel.analytics.bigdl.numeric.NumericFloat

val model = Sequential()
model.add(Linear(...))
model.add(Sigmoid())
model.add(SoftMax())

In the functional style:

val linear = Linear(...).inputs()
val sigmoid = Sigmoid().inputs(linear)
val softmax = SoftMax().inputs(sigmoid)
// single input node, single output node
val model = Graph(linear, softmax)

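The former keeps adding new layers to a Sequential container. The latter connects the modules through each module's own inputs method, and finally builds a Graph from the input node and the output node.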
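The difference between the two styles can be sketched without BigDL. In this hypothetical toy, `ToyMod`, `ToySequential`, `ToyNode` and `ToyGraph` are illustrative names only, and a parameter-free `ToyScale` replaces `Linear` so that no weights are needed. The Sequential version folds the input through its list of modules. The graph version lets each node record its predecessors, which is the role of `inputs`, and evaluates recursively from the output node. This is a simplification, and only single-input chains are handled.

```scala
// Hypothetical toy modules on Array[Double]; none of these are BigDL classes.
trait ToyMod { def forward(x: Array[Double]): Array[Double] }

class ToyScale(k: Double) extends ToyMod {
  def forward(x: Array[Double]): Array[Double] = x.map(_ * k)
}
class ToySigmoid extends ToyMod {
  def forward(x: Array[Double]): Array[Double] = x.map(v => 1.0 / (1.0 + math.exp(-v)))
}

// Sequential style: modules are kept in a list and applied in insertion order.
class ToySequential extends ToyMod {
  private var mods = List.empty[ToyMod]
  def add(m: ToyMod): ToySequential = { mods = mods :+ m; this }
  def forward(x: Array[Double]): Array[Double] = mods.foldLeft(x)((acc, m) => m.forward(acc))
}

// Functional style: each node records its predecessors (the analogue of `inputs`).
final class ToyNode(val mod: ToyMod, val preds: List[ToyNode])
object ToyNode {
  def inputs(m: ToyMod, preds: ToyNode*): ToyNode = new ToyNode(m, preds.toList)
}

// Analogue of Graph(input, output): evaluate recursively from the output node.
class ToyGraph(input: ToyNode, output: ToyNode) extends ToyMod {
  private def eval(n: ToyNode, x: Array[Double]): Array[Double] = n.preds match {
    case Nil =>
      require(n eq input, "a node without predecessors must be the graph input")
      n.mod.forward(x)
    case p :: Nil => n.mod.forward(eval(p, x))
    case _ => throw new UnsupportedOperationException("multi-input nodes are not sketched")
  }
  def forward(x: Array[Double]): Array[Double] = eval(output, x)
}

// The same model, sigmoid(2 * x), built both ways.
val seqModel = new ToySequential().add(new ToyScale(2.0)).add(new ToySigmoid)
val scaleNode = ToyNode.inputs(new ToyScale(2.0))
val sigmoidNode = ToyNode.inputs(new ToySigmoid, scaleNode)
val graphModel = new ToyGraph(scaleNode, sigmoidNode)
```

Both models compute the same function. The graph style only pays off once nodes have several predecessors or successors, which a plain list cannot express.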

AbstractModule

The package com.intel.analytics.bigdl.nn.abstractnn defines AbstractModule, the root base class of all Modules:

package com.intel.analytics.bigdl.nn.abstractnn
/**
 * Module is the basic component of a neural network. It forward activities and backward gradients.
 * Modules can connect to others to construct a complex neural network.
 *
 * @tparam A Input data type
 * @tparam B Output data type
 * @tparam T The numeric type in this module parameters.
 */
abstract class AbstractModule[A <: Activity: ClassTag, B <: Activity: ClassTag, T: ClassTag](
  implicit ev: TensorNumeric[T]) extends Serializable with InferShape

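This is a generic class, AbstractModule[A, B, T], with three type parameters. A is the input type and B is the output type, and both must be subtypes of Activity. T is the numeric type of the module's parameters, which can be Double or Float.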
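The implicit `ev: TensorNumeric[T]` in the signature is what lets one class body work for both Float and Double. Here is a minimal sketch of that pattern. It swaps in the standard library's `scala.math.Numeric[T]` for BigDL's TensorNumeric, and the names `ToyAbstractModule` and `ToyAddScalar` are hypothetical.

```scala
// Hypothetical sketch: scala.math.Numeric[T] plays the role that
// TensorNumeric[T] plays in BigDL (arithmetic on the parameter type T).
abstract class ToyAbstractModule[A, B, T](implicit ev: Numeric[T]) {
  def updateOutput(input: A): B
}

// A = B = Seq[T]: adds the same bias to every element, for any numeric T.
class ToyAddScalar[T](bias: T)(implicit ev: Numeric[T])
    extends ToyAbstractModule[Seq[T], Seq[T], T] {
  def updateOutput(input: Seq[T]): Seq[T] = input.map(x => ev.plus(x, bias))
}

// The same class instantiated with Float and with Double parameters.
val asFloat = new ToyAddScalar[Float](0.5f).updateOutput(Seq(1.0f, 2.0f))
val asDouble = new ToyAddScalar[Double](0.5).updateOutput(Seq(1.0, 2.0))
```

The layer never uses `+` on T directly. All arithmetic goes through the evidence object, and this is the same reason BigDL code writes `ev.one` or `ev.fromType[Int](1)` instead of literals.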

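The core job of a Module is forward propagation (inference) and backward propagation (computing gradients), provided by the forward and backward methods respectively. Both methods already have concrete implementations: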

  /**
   * Takes an input object, and computes the corresponding output of the module. After a forward,
   * the output state variable should have been updated to the new value.
   *
   * @param input input data
   * @return output data
   */
  final def forward(input: A): B = {
    val before = System.nanoTime()
    try {
      updateOutput(input)
    } catch {
      case l: LayerException =>
        l.layerMsg = this.toString() + "/" + l.layerMsg
        throw l
      case e: Throwable =>
        throw new LayerException(this.toString(), e)
    }
    forwardTime += System.nanoTime() - before

    output
  }

  /**
   * Performs a back-propagation step through the module, with respect to the given input. In
   * general this method makes the assumption forward(input) has been called before, with the same
   * input. This is necessary for optimization reasons. If you do not respect this rule, backward()
   * will compute incorrect gradients.
   *
   * @param input input data
   * @param gradOutput gradient of next layer
   * @return gradient corresponding to input data
   */
  def backward(input: A, gradOutput: B): A = {
    val before = System.nanoTime()
    updateGradInput(input, gradOutput)
    accGradParameters(input, gradOutput)
    backwardTime += System.nanoTime() - before

    gradInput
  }

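For now, focus on forward. It simply calls updateOutput(input), adds the elapsed time to forwardTime, and wraps any exception in a LayerException that records the name of the failing layer.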
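The wrapping logic in forward can be reproduced outside BigDL. In this sketch, `ToyLayerException` and `ToyTimedModule` are hypothetical stand-ins for LayerException and AbstractModule. One simplification: the sketch returns updateOutput's result directly, instead of returning an `output` field as BigDL does.

```scala
// Hypothetical stand-in for BigDL's LayerException: layerMsg accumulates a layer path.
class ToyLayerException(var layerMsg: String, cause: Throwable)
    extends RuntimeException(cause)

abstract class ToyTimedModule[A, B] {
  var forwardTime = 0L
  def updateOutput(input: A): B

  // Same shape as forward above: call updateOutput, wrap failures, time the call.
  final def forward(input: A): B = {
    val before = System.nanoTime()
    val result =
      try updateOutput(input)
      catch {
        case l: ToyLayerException =>
          // already wrapped by an inner layer: prepend this layer's name
          l.layerMsg = this.toString + "/" + l.layerMsg
          throw l
        case e: Throwable =>
          throw new ToyLayerException(this.toString, e)
      }
    forwardTime += System.nanoTime() - before
    result
  }
}

class ToyDoubler extends ToyTimedModule[Int, Int] {
  def updateOutput(input: Int): Int = input * 2
}
class ToyFailing extends ToyTimedModule[Int, Int] {
  def updateOutput(input: Int): Int = throw new IllegalArgumentException("bad input")
  override def toString: String = "Failing"
}

val doubled = new ToyDoubler().forward(21)
val wrapped: Option[ToyLayerException] =
  try { new ToyFailing().forward(1); None }
  catch { case e: ToyLayerException => Some(e) }
```

Because forward is `final`, every layer gets the timing and error reporting for free. When a container's forward calls its children, the rethrow branch builds a path such as `outer/inner` that points to the layer that failed.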

  /**
   * Computes the output using the current parameter set of the class and input. This function
   * returns the result which is stored in the output field.
   *
   * @param input
   * @return
   */
  def updateOutput(input: A): B

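This is an abstract method left for subclasses to implement. To support forward propagation (inference), a layer only needs to extend this class and implement this method.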
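Here is a minimal sketch of that division of labour. The subclass supplies only the computation, and forward (and, for training, backward) come from the base class. `ToyGradModule` and `ToyMul` are hypothetical names, and `Array[Double]` replaces `Tensor[T]`. The sketch drops timing and exception handling, and it returns results directly instead of storing them in `output` and `gradInput` fields. backward follows the order of the code above: first updateGradInput, then accGradParameters.

```scala
// Hypothetical, simplified mirror of AbstractModule's contract on Array[Double].
abstract class ToyGradModule {
  def updateOutput(input: Array[Double]): Array[Double]
  def updateGradInput(input: Array[Double], gradOutput: Array[Double]): Array[Double]
  // Parameter-free layers need not override this.
  def accGradParameters(input: Array[Double], gradOutput: Array[Double]): Unit = ()

  final def forward(input: Array[Double]): Array[Double] = updateOutput(input)

  // Gradient w.r.t. the input first, then accumulate parameter gradients.
  def backward(input: Array[Double], gradOutput: Array[Double]): Array[Double] = {
    val gradInput = updateGradInput(input, gradOutput)
    accGradParameters(input, gradOutput)
    gradInput
  }
}

// y = w * x with one learnable scalar weight w.
class ToyMul(var weight: Double) extends ToyGradModule {
  var gradWeight = 0.0
  def updateOutput(input: Array[Double]): Array[Double] = input.map(_ * weight)
  // dL/dx = w * dL/dy
  def updateGradInput(input: Array[Double], gradOutput: Array[Double]): Array[Double] =
    gradOutput.map(_ * weight)
  // dL/dw = sum over elements of x * dL/dy, accumulated across calls
  override def accGradParameters(input: Array[Double], gradOutput: Array[Double]): Unit =
    gradWeight += input.zip(gradOutput).map { case (x, g) => x * g }.sum
}

val layer = new ToyMul(3.0)
val out = layer.forward(Array(1.0, 2.0))
val gx = layer.backward(Array(1.0, 2.0), Array(1.0, 1.0))
```

For inference alone, only updateOutput matters. The two gradient methods are what a layer adds so that it can also be trained.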

TensorModule

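AbstractModule defines Module in general, but what most BigDL layers actually use is its subclass TensorModule[T]. TensorModule fixes AbstractModule's three type parameters to Tensor[T], Tensor[T] and T, so both the input and the output are Tensors. This means the entire computation can be expressed with Tensor operations.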

/**
 * [[TensorModule]] is an abstract sub-class of [[AbstractModule]], whose
 * input and output type both are [[Tensor]].
 *
 * @tparam T The numeric type in this module parameters
 */
abstract class TensorModule[T: ClassTag]
  (implicit ev: TensorNumeric[T]) extends AbstractModule[Tensor[T], Tensor[T], T]

Example

Let's look at the simplest example, the Add layer. It adds a bias term to the input, with one bias value per input position:

/**
 * adds a bias term to input data ;
 *
 * @param inputSize size of input data
 */
@SerialVersionUID(4268487849759172896L)
class Add[T: ClassTag](val inputSize: Int
  )(implicit ev: TensorNumeric[T]) extends TensorModule[T] with Initializable {

  val bias = Tensor[T](inputSize)

  override def updateOutput(input: Tensor[T]): Tensor[T] = {
    output.resizeAs(input).copy(input)
    if (input.isSameSizeAs(bias)) {
      output.add(bias)
    } else {
      val batchSize = input.size(1)
      ones.resize(batchSize).fill(ev.one)
      val biasLocal = bias.view(bias.size.product)
      val outputLocal = output.view(batchSize, output.size.product/batchSize)
      outputLocal.addr(ev.fromType[Int](1), ones, biasLocal)
    }
    output
  }

  // ... remaining members (including the `ones` buffer used above) omitted
}

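output and bias are both Tensor objects. If the input (and therefore output, which was resized to match it) has the same size as bias, i.e. a single sample, calling Tensor's add method directly

output.add(bias)

gives the output. Otherwise the input is treated as a batch whose first dimension is batchSize. output is viewed as a batchSize × n matrix, ones is filled with batchSize ones, and outputLocal.addr(1, ones, biasLocal) adds the outer product ones ⊗ bias to that matrix. In other words, it adds bias to every row, which means to every sample in the batch.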
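The two branches of Add.updateOutput can be checked on plain arrays. This sketch is hypothetical: `ToyAdd` is not a BigDL class, tensors are flattened row-major arrays, and "same size" is approximated by equal length. The batch size is taken as `input.length / bias.length`, whereas BigDL reads it from `input.size(1)`. The batch branch spells out what `addr(1, ones, bias)` computes element by element: output(i, j) += 1 · ones(i) · bias(j).

```scala
// Hypothetical re-implementation of Add.updateOutput's logic on plain arrays.
// `input` is either one sample of size n, or a batch stored row-major as batchSize x n.
object ToyAdd {
  def updateOutput(input: Array[Double], bias: Array[Double]): Array[Double] = {
    val output = input.clone()                         // resizeAs(input).copy(input)
    if (input.length == bias.length) {
      for (j <- bias.indices) output(j) += bias(j)     // output.add(bias)
    } else {
      val n = bias.length
      val batchSize = input.length / n
      val ones = Array.fill(batchSize)(1.0)
      // addr(1, ones, bias): output(i, j) += 1 * ones(i) * bias(j)
      for (i <- 0 until batchSize; j <- 0 until n)
        output(i * n + j) += 1.0 * ones(i) * bias(j)
    }
    output
  }
}

val single = ToyAdd.updateOutput(Array(1.0, 2.0, 3.0), Array(10.0, 20.0, 30.0))
val batch = ToyAdd.updateOutput(Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0), Array(10.0, 20.0, 30.0))
```

Expressing the broadcast as a rank-1 update lets the layer reuse one optimized BLAS-style call (addr) for the whole batch, rather than looping over samples.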