TMDSEVM572X: Training a network with caffe-jacinto fails, network structure is rejected

user6446474

Part Number: TMDSEVM572X

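I am training a network with caffe-jacinto. The last layer of the network is defined as a convolution with a 1x1 kernel and an output of 1. Training fails with an error (see the attached image), yet the networks in the official examples contain a similar structure, differing only in the number of outputs, and I modeled my network on them. Where is the problem?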

Training error:

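A similar official structure (the MobileNet structure from the object detection example): in the figure, conv3_1/sep and conv3_2/sep both have a 1x1 kernel and group 1, yet their output channel count does not match the group.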

The final part of my network:

layer {
  name: "fu1_1/dw"
  type: "Convolution"
  bottom: "conv7_3"
  top: "fu1_1/dw"
  convolution_param {
    num_output: 64
    bias_term: false
    pad: 1
    kernel_size: 3
    group: 64
    stride: 1
    weight_filler {
      type: "msra"
    }
    dilation: 1
  }
}
layer {
  name: "fu1_1/dw/bn"
  type: "BatchNorm"
  bottom: "fu1_1/dw"
  top: "fu1_1/dw"
  batch_norm_param {
    scale_bias: true
  }
}
layer {
  name: "relu1_1/dw"
  type: "ReLU"
  bottom: "fu1_1/dw"
  top: "fu1_1/dw"
}
layer {
  name: "fu1_1/sep"
  type: "Convolution"
  bottom: "fu1_1/dw"
  top: "fu1_1/sep"
  convolution_param {
    num_output: 64
    bias_term: false
    pad: 0
    kernel_size: 1
    group: 1
    stride: 1
    weight_filler {
      type: "msra"
    }
    dilation: 1
  }
}
layer {
  name: "fu1_1/sep/bn"
  type: "BatchNorm"
  bottom: "fu1_1/sep"
  top: "fu1_1/sep"
  batch_norm_param {
    scale_bias: true
  }
}
layer {
  name: "relu1_1/sep"
  type: "ReLU"
  bottom: "fu1_1/sep"
  top: "fu1_1/sep"
}
layer {
  name: "fu1_2/dw"
  type: "Convolution"
  bottom: "fu1_1/sep"
  top: "fu1_2/dw"
  convolution_param {
    num_output: 64
    bias_term: false
    pad: 1
    kernel_size: 3
    group: 64
    stride: 1
    weight_filler {
      type: "msra"
    }
    dilation: 1
  }
}
layer {
  name: "fu1_2/dw/bn"
  type: "BatchNorm"
  bottom: "fu1_2/dw"
  top: "fu1_2/dw"
  batch_norm_param {
    scale_bias: true
  }
}
layer {
  name: "relu1_2/dw"
  type: "ReLU"
  bottom: "fu1_2/dw"
  top: "fu1_2/dw"
}
layer {
  name: "fu1_2/sep"
  type: "Convolution"
  bottom: "fu1_2/dw"
  top: "estdmap"
  convolution_param {
    num_output: 1
    bias_term: false
    pad: 0
    kernel_size: 1
    group: 1
    stride: 1
    weight_filler {
      type: "msra"
    }
    dilation: 1
  }
}

Structure of the final part of my network:

Over 4 years ago

0 Shine, over 4 years ago

TI__Guru**** 357097 points

Which version of the software package are you using?

0 user6446474, over 4 years ago, in reply to Shine

Intellectual 480 points

Processor SDK Linux 06_03_00_106, using the caffe-jacinto referenced in the TIDL part of the Machine Learning section.

0 Shine, over 4 years ago, in reply to user6446474

TI__Guru**** 357097 points

I have forwarded your question to E2E. Please follow the thread below.
https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1035218/am5728-caffe-jacinto-failed-to-train

0 user6446474, over 4 years ago, in reply to Shine

Intellectual 480 points

Thank you for the help.

+1 Shine, over 4 years ago, in reply to user6446474

TI__Guru**** 357097 points

Below is the engineer's reply. Please take a look.

Please comment out three lines in layer_factory.cpp as shown below, recompile caffe-jacinto and it should work:

https://git.ti.com/cgit/jacinto-ai/caffe-jacinto/tree/src/caffe/layer_factory.cpp#n62

//if(conv_param.num_output() == conv_param.group()) {
//  return CreateLayerBase<ConvolutionDepthwiseLayer>(param, ftype, btype);
//}

Details: ConvolutionDepthwiseLayer is just a faster implementation specifically for depthwise layers. It is not mandatory.
The check shown above should also have ensured that the input channels, output channels, and groups are all the same, as is done in conv_dw_layer.cpp (https://git.ti.com/cgit/jacinto-ai/caffe-jacinto/tree/src/caffe/layers/conv_dw_layer.cpp#n17).

However, the number of input channels is not available inside layer_factory.cpp, so the condition used to instantiate ConvolutionDepthwiseLayer is not fully correct.
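To make the reasoning above concrete, here is a minimal standalone sketch, not code from caffe-jacinto itself. `ConvShape`, `factory_picks_depthwise`, `is_true_depthwise`, `misclassified_layers`, and `layers_from_post` are hypothetical names introduced for illustration. The sketch applies the quoted `num_output == group` condition to the last three convolution layers from the question (their input channel counts follow from the `num_output` of the preceding layer) and reports which ones would be sent to the depthwise implementation without actually being depthwise.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the convolution settings that matter here.
struct ConvShape {
  std::string name;
  int in_channels;  // channels of the bottom blob; not visible in layer_factory.cpp
  int num_output;
  int group;
};

// The condition quoted in the reply: only num_output and group are compared.
bool factory_picks_depthwise(const ConvShape& c) {
  return c.num_output == c.group;
}

// A real depthwise convolution also requires in_channels == group.
bool is_true_depthwise(const ConvShape& c) {
  return c.in_channels == c.group && c.num_output == c.group;
}

// Names of layers the quoted condition would route to the depthwise
// implementation even though they are not depthwise convolutions.
std::vector<std::string> misclassified_layers(const std::vector<ConvShape>& layers) {
  std::vector<std::string> bad;
  for (const ConvShape& c : layers) {
    if (factory_picks_depthwise(c) && !is_true_depthwise(c)) {
      bad.push_back(c.name);
    }
  }
  return bad;
}

// The last three convolutions from the posted prototxt.
std::vector<ConvShape> layers_from_post() {
  return {
      {"fu1_1/sep", 64, 64, 1},  // 64 != 1, so the regular convolution is used
      {"fu1_2/dw", 64, 64, 64},  // genuinely depthwise
      {"fu1_2/sep", 64, 1, 1},   // 1 == 1, but the input has 64 channels
  };
}
```

Under these assumptions only `fu1_2/sep` is flagged: with `num_output: 1` and `group: 1` it satisfies the quoted condition by coincidence. That is consistent with the official `conv3_1/sep` and `conv3_2/sep` layers training without trouble, since the question describes their output channel count as not matching the group.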

Hope this helps.

0 user6446474, over 4 years ago, in reply to Shine

Intellectual 480 points

Notification received, thank you! I will try the suggested change and then report back.

0 user6446474, over 4 years ago, in reply to Shine

Intellectual 480 points

The problem is solved.
