【STDC】《Rethinking BiSeNet For Real-time Semantic Segmentation》

Date: 2024-02-01 02:39:57

CVPR-

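It has been a while since I last blogged, so I am taking a moment to tidy up my reading notes. A thinning hairline makes it easy to forget things 🕔, haha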

Contents

1 Background and Motivation
2 Advantages / Contributions
3 Method
3.1 Encoding Network
3.2 Design of Decoder
4 Experiments
5.1 Datasets
5.2 Ablation Study
5.4 Compare with State-of-the-arts
6 Conclusion (own)

1 Background and Motivation

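Look at the title: "Rethinking". Ah, got it, this is an improvement built on BiSeNet.

BiSeNet uses a two-branch structure, a context path and a spatial path, combined with attention in the style of 【SENet】《Squeeze-and-Excitation Networks》, to strengthen the feature extraction ability of the semantic segmentation network!

Building on BiSeNet, this paper points out that

the backbone in BiSeNet's context path is borrowed from existing classification networks and is not designed specifically for segmentation, which hurts segmentation accuracy; the spatial path improves the segmentation of fine details, but it also introduces extra computation, which adds to the inference cost and slows segmentation down

The paper designs a lightweight context path tailored to segmentation, the Short-Term Dense Concatenate network (STDC), and proposes a spatial path that does not affect inference speed (the Detail Guidance module), to achieve state-of-the-art speed-accuracy trade-off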

2 Advantages / Contributions

The paper designs the STDC network and designs the Detail Guidance module as the segmentation decoder, to achieve state-of-the-art speed-accuracy trade-off

3 Method

3.1 Encoding Network

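This is the context path part of BiSeNet. Figure 3 a) is the STDC network, which is built from the STDC modules shown in b) and c)

Modules b and c each contain n blocks. The only difference is that when crossing a stage, block2 has stride 2. The module looks a lot like DenseNet: the channel count drops exponentially as the blocks go deeper, and the last block has the same number of channels as the second-to-last block. The parameter count of an STDC module is computed as follows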

M and N are the input and output channels, and n is the number of blocks
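With the channel layout described above (block i outputs N/2^i channels, and the last block matches the second-to-last), the count can be written out explicitly. The sketch below assumes that the first block is a 1x1 convolution and the remaining blocks are 3x3, and it ignores biases and batch-norm parameters; these are assumptions of this derivation and are not stated in these notes.

```latex
\begin{aligned}
S_{param} &= M \cdot 1 \cdot 1 \cdot \frac{N}{2}
  + \sum_{i=2}^{n-1} \frac{N}{2^{i-1}} \cdot 3 \cdot 3 \cdot \frac{N}{2^{i}}
  + \frac{N}{2^{n-1}} \cdot 3 \cdot 3 \cdot \frac{N}{2^{n-1}} \\
 &= \frac{NM}{2} + \frac{9N^{2}}{2^{3}} \sum_{i=0}^{n-3} \frac{1}{2^{2i}}
  + \frac{9N^{2}}{2^{2n-2}} \\
 &= \frac{NM}{2} + \frac{3N^{2}}{2} \left( 1 + \frac{1}{2^{2n-3}} \right)
\end{aligned}
```

The only term that depends on n is 1/2^(2n-3), which shrinks quickly, so M and N dominate the count.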

This DenseNet-like design can extract "scalable receptive field and multi-scale information", and the parameter count actually decreases as n grows (n >= 2)
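To check the claim that the parameter count shrinks as n grows, here is a small sketch that counts convolution weights block by block. It assumes a 1x1 convolution in the first block and 3x3 convolutions in the rest, and it ignores biases and batch norm. The function names are made up for illustration and do not come from the paper's code.

```python
def stdc_block_channels(N, n):
    """Output channels per block: N/2, N/4, ..., with the last block
    repeating the second-to-last so that the concat sums to N."""
    chans = [N // 2 ** i for i in range(1, n)]  # blocks 1 .. n-1
    chans.append(chans[-1])                     # block n matches block n-1
    return chans


def stdc_params(M, N, n):
    """Count conv weights of one STDC module (assumed: 1x1 first, 3x3 after)."""
    chans = stdc_block_channels(N, n)
    total = M * 1 * 1 * chans[0]                # block 1: 1x1 conv on M inputs
    for c_in, c_out in zip(chans, chans[1:]):   # blocks 2 .. n: 3x3 convs
        total += c_in * 3 * 3 * c_out
    return total


def stdc_params_closed_form(M, N, n):
    """Closed form N*M/2 + 3*N^2/2 * (1 + 1/2^(2n-3)) for comparison."""
    return N * M / 2 + 3 * N ** 2 / 2 * (1 + 1 / 2 ** (2 * n - 3))


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, stdc_block_channels(256, n), stdc_params(256, 256, n))
```

For M = N = 256 the count goes down at every step from n = 2 to n = 4, and the per-block sum agrees with the closed form.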

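The authors set n = 4 in their experiments

The detailed network configuration is as follows

For each stage, the R values in its two rows are the number of stacked stride-2 STDC modules (c) and stride-1 STDC modules (b)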

we only use one convolutional block in each of Stage 1&2, which is proved to be sufficient according to our experiences.

Hahaha, in stage1 and stage2 the channel count has not grown yet, so there is not enough to spare for the exponential channel reduction

3.2 Design of Decoder

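This part involves the authors' Detail Aggregation Module (structures b and c in the figure below), as well as the ARM (Attention Refinement Module) and FFM (Feature Fusion Module) from BiSeNet

Note the stage5 output: after global average pooling -> up-sampling, it is combined with the refined stage4 and stage5 features (refined by the ARM module, i.e. SE attention) and concatenated as one of the inputs to the FFM module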
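The ARM refinement mentioned above is SE-style channel attention, which can be sketched as follows: pool each channel to a scalar, map the pooled vector through a learned linear layer, squash with a sigmoid, and rescale the channels. This is a simplified illustration on plain nested lists. BiSeNet's ARM also applies batch norm, which is omitted here, and the function names and the `weight` / `bias` parameters are hypothetical.

```python
import math


def global_average_pool(feat):
    """feat: list of C channels, each an H x W list of floats -> C means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def attention_refine(feat, weight, bias):
    """SE-style gate: pooled vector -> linear map -> sigmoid -> rescale.

    weight is a C x C matrix standing in for the 1x1 conv, bias has length C.
    """
    pooled = global_average_pool(feat)
    gates = []
    for w_row, b in zip(weight, bias):
        z = sum(w * p for w, p in zip(w_row, pooled)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid gate in (0, 1)
    # Multiply every pixel of channel c by its gate.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feat, gates)]


if __name__ == "__main__":
    feat = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 4.0], [4.0, 4.0]]]
    zero_w = [[0.0, 0.0], [0.0, 0.0]]
    print(attention_refine(feat, zero_w, [0.0, 0.0]))
```

With all-zero weights and biases every gate is sigmoid(0) = 0.5, so each channel is simply halved. Trained weights would instead emphasise some channels and suppress others.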

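The other FFM input comes from the stage3 features supervised by the Detail Aggregation Module. The details are as follows

The GT goes through a Laplacian pyramid; the results are up-sampled, concatenated, and fused with a learnable 1x1 conv to generate a binary mask, which supervises the Detail head produced from stage3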
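The detail-GT pipeline just described can be sketched on a toy mask: apply a Laplacian filter at several strides, up-sample each response to full size, fuse them with per-scale weights, and threshold. Several specifics are assumptions made for illustration: the 8-neighbour Laplacian kernel, the strides (1, 2, 4), nearest-neighbour up-sampling, and the 0.1 threshold. Fixed equal weights stand in for the learnable 1x1 conv, and all function names are hypothetical.

```python
LAPLACIAN = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]  # assumed kernel


def conv3x3(img, kernel, stride=1):
    """3x3 convolution with zero padding 1 and the given stride."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h, stride):
        row = []
        for x in range(0, w, stride):
            acc = 0.0
            for dy in range(3):
                for dx in range(3):
                    yy, xx = y + dy - 1, x + dx - 1
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy][dx] * img[yy][xx]
            row.append(acc)
        out.append(row)
    return out


def upsample_nearest(img, h, w):
    """Nearest-neighbour resize of a 2-D list to h x w."""
    sh, sw = len(img), len(img[0])
    return [[img[y * sh // h][x * sw // w] for x in range(w)] for y in range(h)]


def detail_ground_truth(label, weights=(1 / 3, 1 / 3, 1 / 3), threshold=0.1):
    """Binary detail map from a label mask (weights stand in for the 1x1 conv)."""
    h, w = len(label), len(label[0])
    maps = []
    for stride in (1, 2, 4):
        edges = conv3x3(label, LAPLACIAN, stride)
        edges = [[min(max(v, 0.0), 1.0) for v in row] for row in edges]  # clamp to [0, 1]
        maps.append(upsample_nearest(edges, h, w))
    fused = [[sum(wt * m[y][x] for wt, m in zip(weights, maps)) for x in range(w)]
             for y in range(h)]
    return [[1 if v > threshold else 0 for v in row] for row in fused]


if __name__ == "__main__":
    # 8 x 8 mask with a filled 4 x 4 square in the middle.
    mask = [[1 if 2 <= y <= 5 and 2 <= x <= 5 else 0 for x in range(8)] for y in range(8)]
    for row in detail_ground_truth(mask):
        print(row)
```

On the toy square the response is concentrated on the object boundary and the background stays empty. This is why the detail GT has few foreground pixels.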

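The purpose of the Detail Aggregation Module introduced by the authors (b and c in the figure above) is:

leading to more precise preservation of spatial details in low-level layers without extra computation cost in the inference time.

Because the generated detail GT has few foreground pixels and many background pixels, supervising it directly with binary cross-entropy easily leads to an imbalance between positive and negative samples, so the authors add Dice Loss on top of binary cross-entropy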

For an introduction to Dice Loss, see

Medical image segmentation: Dice Loss
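To see why Dice Loss helps with this imbalance, here is a minimal sketch of binary cross-entropy plus Dice over flattened predictions. The squared-sum denominator and the smoothing term of 1 follow a common Dice formulation and are assumptions here, as is the unweighted sum of the two losses. The function names are illustrative.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flattened probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)


def dice_loss(pred, target, smooth=1.0):
    """1 - Dice coefficient, with Laplace smoothing (assumed value 1)."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - (2.0 * inter + smooth) / (denom + smooth)


def detail_loss(pred, target):
    """Combined detail loss: BCE + Dice (equal weighting assumed)."""
    return bce_loss(pred, target) + dice_loss(pred, target)


if __name__ == "__main__":
    # One foreground pixel out of 100, and a prediction that is nearly all background.
    target = [1] + [0] * 99
    pred = [0.01] * 100
    print("bce :", bce_loss(pred, target))
    print("dice:", dice_loss(pred, target))
```

In this imbalanced example a nearly all-background prediction gets a small BCE because the many easy negatives dominate the mean. The Dice term stays large because it measures overlap with the few foreground pixels.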

Note that this branch(Detail Aggregation Module) is discarded in the inference phase.

4 Experiments

5.1 Datasets

ImageNet, Cityscapes, CamVid

5.2 Ablation Study

1) Effectiveness of STDC Module

In this ablation, the more blocks, the faster the network and the higher the accuracy

2) Effectiveness of Our backbone

3) Effectiveness of Detail Guidance

Compare the stage3 features with and without Detail Guidance (b, c)

The features of Stage 3 with detail guidance encode more spatial information comparing to that of Stage 3 without detail guidance.

5.4 Compare with State-of-the-arts

1) Results on ImageNet

2) Results on Cityscapes

3) Results on CamVid

6 Conclusion (own)

global pooling -> context information

/MichaelFan01/STDC-Seg
