Q: TensorFlow Object Detection API and bboxes based on the image frame
Stack Overflow user

Asked on 2018-02-20 07:02:19

Answers: 1 | Views: 844 | Followers: 0 | Votes: 0

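While reading other people's questions and answers under the python tag, I came across an impressive piece of work by Banach in "TensorFlow Object Detection API Weird Behaviour". I wanted to reproduce what he did in order to understand the TensorFlow Object Detection API more deeply. I followed his steps one by one and also used the Grocery Dataset. The model is faster_rcnn_resnet101 with default parameters and batch_size = 1.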

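The real difference is that I did not use Shelf_Images, which come with annotations and bboxes for each class, but Product_Images instead. Product_Images has 10 folders (one per class), and each folder contains full-size images of cigarette packs without any background. The average size of a Product_Images image is 600*1200, while Shelf_Images are 3900*2100. So I thought: why can't I take these whole images, train on them, and get good results? By the way, I did not need to crop images manually as Banach did, because 600*1200 fits the faster_rcnn_resnet101 model and its default input image parameters well.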

One example image, from the Pall Mall class

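This looked simple, because I could create the bboxes from the outline of each image. So I only needed to create an annotation for each image and build tf_records from the annotations for training. I used the following formula to create the bboxes from the image outline.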

x_min = str(1)
y_min = str(1)
x_max = str(img.width - 10)
y_max = str(img.height - 10)
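The formula above can be wrapped in a small helper. This is only a sketch: the function name `full_image_bbox` and the `margin` parameter are hypothetical, and the values 1 and 10 are taken directly from the snippet above. Note that the resulting box is inset by 1 pixel at the top-left and by 10 pixels at the bottom-right.

```python
def full_image_bbox(width, height, margin=10):
    """Return (xmin, ymin, xmax, ymax) covering almost the whole image.

    Hypothetical helper mirroring the formula in the question:
    the top-left corner is fixed at (1, 1) and the bottom-right
    corner is pulled in by `margin` pixels.
    """
    if width <= margin + 1 or height <= margin + 1:
        raise ValueError("image is too small for the requested margin")
    return 1, 1, width - margin, height - margin


# Example with the image size used in the XML annotation (811 x 1274).
print(full_image_bbox(811, 1274))
```

With the 811 x 1274 image from the annotation example, this yields the same box as the `<bndbox>` element in that annotation.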

Example of an XML annotation

<annotation>
    <folder>VOC2007</folder>
    <filename>B1_N1.jpg</filename>
    <path>/.../grocery-detection/data/images/1/B1_N1.jpg</path>
    <source>
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>flickr</image>
        <flickrid>192073981</flickrid>
    </source>
    <owner>
        <flickrid>tobeng</flickrid>
        <name>?</name>
    </owner>
    <size>
        <width>811</width>
        <height>1274</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>1</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>1</xmin>
            <ymin>1</ymin>
            <xmax>801</xmax>
            <ymax>1264</ymax>
        </bndbox>
    </object>
</annotation>
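An annotation like the one above could be generated per image roughly as follows. This is a minimal sketch using only the standard library; the function name `build_voc_annotation` is hypothetical, and it writes only the fields that carry the size, class, and box (the `source` and `owner` blocks from the example are omitted).

```python
import xml.etree.ElementTree as ET


def build_voc_annotation(filename, width, height, label, bbox):
    """Build a reduced PASCAL VOC style annotation as an XML string.

    Hypothetical helper: `bbox` is (xmin, ymin, xmax, ymax) in pixels.
    """
    xmin, ymin, xmax, ymax = bbox
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename

    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(value)

    obj = ET.SubElement(root, "object")
    ET.SubElement(obj, "name").text = str(label)
    ET.SubElement(obj, "pose").text = "Unspecified"
    ET.SubElement(obj, "truncated").text = "0"
    ET.SubElement(obj, "difficult").text = "0"

    box = ET.SubElement(obj, "bndbox")
    for tag, value in (("xmin", xmin), ("ymin", ymin),
                       ("xmax", xmax), ("ymax", ymax)):
        ET.SubElement(box, tag).text = str(value)

    return ET.tostring(root, encoding="unicode")


xml_text = build_voc_annotation("B1_N1.jpg", 811, 1274, 1, (1, 1, 801, 1264))
print(xml_text)
```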

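After the script iterated over the images in all folders, I had an annotation for each image similar to the VOC2007 XML shown above. Then I created the tf_records by iterating over every annotation, as the TensorFlow example does, and everything looked ready for training on a K80 GPU.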

Example of the feature_dict used to create the tf_records

feature_dict = {
      'image/height': dataset_util.int64_feature(height),
      'image/width': dataset_util.int64_feature(width),
      'image/filename': dataset_util.bytes_feature(
          data['filename'].encode('utf8')),
      'image/source_id': dataset_util.bytes_feature(
          data['filename'].encode('utf8')),
      'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
      'image/encoded': dataset_util.bytes_feature(encoded_jpg),
      'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
      'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
      'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
      'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
      'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
      'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
      'image/object/class/label': dataset_util.int64_list_feature(classes),
      'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
      'image/object/truncated': dataset_util.int64_list_feature(truncated),
      'image/object/view': dataset_util.bytes_list_feature(poses),
}
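The `xmins`, `xmaxs`, `ymins`, and `ymaxs` lists in this feature_dict are float lists. In the API's own `create_pascal_tf_record.py` script they hold pixel coordinates divided by the image width or height, so every value lies in [0, 1]. The sketch below assumes the same convention applies here; `normalize_bbox` is a hypothetical helper name.

```python
def normalize_bbox(bbox, width, height):
    """Convert a pixel bbox (xmin, ymin, xmax, ymax) to [0, 1] coordinates.

    Hypothetical helper; assumes the tf_record stores normalized boxes,
    as create_pascal_tf_record.py does.
    """
    xmin, ymin, xmax, ymax = bbox
    return xmin / width, ymin / height, xmax / width, ymax / height


# One object per image, so each list has a single entry.
nx_min, ny_min, nx_max, ny_max = normalize_bbox((1, 1, 801, 1264), 811, 1274)
xmins, ymins, xmaxs, ymaxs = [nx_min], [ny_min], [nx_max], [ny_max]
print(xmins, ymins, xmaxs, ymaxs)
```

If pixel values were written into these fields unnormalized, the boxes would fall outside the range the training pipeline expects, so this is worth checking when the results look wrong.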

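After 12458 steps at 1 image per step, the model converged to a local minimum. I saved all the checkpoints and graphs. Next, I created an inference graph from them and ran object_detection_tutorial.py to see how the model works on my test images. But I am not satisfied with the results at all. The last image is 1024 × 760, and the third image from the top is 3264 × 2448. So I tried cigarette images of different sizes, to avoid losing image detail when the model rescales the input.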
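To see how much the model rescales each input, the resizing can be approximated as below. This is a sketch under an assumption: that the pipeline config uses `keep_aspect_ratio_resizer` with `min_dimension: 600` and `max_dimension: 1024`, which I believe is what the sample faster_rcnn_resnet101 configs contain. Check your own config, since the question only says "default parameters". The function `resized_shape` is hypothetical and may differ from the API's exact rounding.

```python
def resized_shape(width, height, min_dimension=600, max_dimension=1024):
    """Approximate an aspect-ratio-preserving resize.

    Scale the shorter side to `min_dimension`, unless that would push
    the longer side past `max_dimension`; in that case scale the longer
    side to `max_dimension` instead. The default values are an assumption
    about the config, not something stated in the question.
    """
    scale = min_dimension / min(width, height)
    if max(width, height) * scale > max_dimension:
        scale = max_dimension / max(width, height)
    return round(width * scale), round(height * scale)


# Sizes mentioned in the question: a training image and two test images.
for w, h in [(600, 1200), (1024, 760), (3264, 2448)]:
    print((w, h), "->", resized_shape(w, h))
```

Under these assumed settings, a 3264 × 2448 test image is reduced to roughly a quarter of its linear size, while a 600*1200 training image is only slightly reduced.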

Output: classified images with predicted bboxes