💡💡💡 This post introduces a Transformer-Based Blind-Spot Network (TBSN) architecture, built by analyzing and redesigning transformer operators to satisfy the blind-spot requirement. TBSN follows the architectural principles of dilated BSNs and incorporates spatial and channel self-attention layers to enhance network capability.
💡💡💡 How to use: 1) combine with C3k2 for secondary innovation;
💡💡💡 Highlights: 1. A novel Transformer-Based Blind-Spot Network (TBSN) architecture; 2. A knowledge distillation strategy that improves computational efficiency; 3. Extensive experiments on multiple real-world image denoising datasets, showing favorable performance against state-of-the-art SSID methods.
Improvement 1)
💡💡💡 Subscribers to this column receive the improved code and updated network structure diagrams for each innovation, making paper writing easier!
💡💡💡 Applicable scenarios: infrared imagery, small-object detection, industrial defect detection, medical imaging, remote-sensing object detection, and low-contrast scenes.
💡💡💡 Applicable tasks: all improvements apply to detection, segmentation, pose estimation, and classification.
💡💡💡 Exclusive first release: multiple self-developed modules, with innovation-point combinations well suited for papers!
Ultralytics YOLO11 is a cutting-edge, state-of-the-art model that builds on the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. Designed to be fast, accurate, and easy to use, YOLO11 is an excellent choice for a wide range of object detection and tracking, instance segmentation, image classification, and pose estimation tasks.
The structure diagram is shown below:
C3k2, structure diagram below.
C3k2 inherits from C2f; the c3k flag (True or False) determines whether each inner block uses C3k or a plain Bottleneck.
Implementation: ultralytics/nn/modules/block.py
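The c3k flag pattern above can be sketched as follows. This is a deliberately simplified illustration of the switching mechanism, not the actual ultralytics implementation: the `Bottleneck`, `C3k`, and `C3k2Sketch` classes here are hypothetical stand-ins with the real classes' names but minimal bodies.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Simplified residual bottleneck (illustrative, not the ultralytics version)."""
    def __init__(self, c):
        super().__init__()
        self.cv1 = nn.Conv2d(c, c, 3, padding=1)
        self.cv2 = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, x):
        return x + self.cv2(self.cv1(x))

class C3k(nn.Module):
    """Simplified block with a configurable kernel size (illustrative)."""
    def __init__(self, c, k=3):
        super().__init__()
        self.cv = nn.Conv2d(c, c, k, padding=k // 2)

    def forward(self, x):
        return self.cv(x)

class C3k2Sketch(nn.Module):
    """The c3k flag selects which inner block type gets stacked."""
    def __init__(self, c, n=2, c3k=False):
        super().__init__()
        block = C3k if c3k else Bottleneck
        self.m = nn.Sequential(*(block(c) for _ in range(n)))

    def forward(self, x):
        return self.m(x)

x = torch.randn(1, 16, 32, 32)
print(C3k2Sketch(16, c3k=False)(x).shape)  # torch.Size([1, 16, 32, 32])
print(C3k2Sketch(16, c3k=True)(x).shape)   # torch.Size([1, 16, 32, 32])
```

Either choice preserves the tensor shape; the flag only changes which block type fills the stack.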
Borrowing the PSA structure from YOLOv10, both C2PSA and C2fPSA were implemented; the C2-based C2PSA was ultimately chosen (possibly for better accuracy gains).
Implementation: ultralytics/nn/modules/block.py
The classification detection head introduces DWConv (more lightweight, and a good starting point for further secondary innovation); the structure diagram below shows the difference from YOLOv8:
Implementation: ultralytics/nn/modules/head.py
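To see why DWConv makes the head lighter, a minimal sketch: a depthwise-separable convolution replaces one standard 3x3 conv with a per-channel 3x3 conv plus a 1x1 pointwise conv, cutting the parameter count sharply. The channel sizes here are arbitrary examples, not the head's actual widths.

```python
import torch
import torch.nn as nn

def count_params(m):
    return sum(p.numel() for p in m.parameters())

c_in, c_out = 64, 64

# Standard 3x3 convolution: c_in * c_out * 3 * 3 weights
std = nn.Conv2d(c_in, c_out, 3, padding=1, bias=False)

# Depthwise-separable: per-channel 3x3 (groups=c_in) followed by a 1x1 pointwise conv
dw = nn.Sequential(
    nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in, bias=False),  # depthwise: 64*3*3
    nn.Conv2d(c_in, c_out, 1, bias=False),                         # pointwise: 64*64
)

print(count_params(std))  # 36864
print(count_params(dw))   # 4672
```

Same input/output shape, roughly 8x fewer parameters at this width, which is the appeal for a lightweight head.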
Paper: https://arxiv.org/pdf/2404.07846
Abstract: This paper focuses on the network architecture of blind-spot networks (BSNs) for self-supervised image denoising (SSID). Existing BSNs are mostly built on convolutional layers, whereas transformers have shown potential to overcome the limitations of convolution and have succeeded in various image restoration tasks. However, the attention mechanisms of transformers may violate the blind-spot requirement, which limits their applicability to SSID. The paper therefore proposes a Transformer-Based Blind-Spot Network (TBSN) by analyzing and redesigning transformer operators to satisfy the blind-spot requirement. Specifically, TBSN follows the architectural principles of dilated BSNs and incorporates spatial and channel self-attention layers to enhance network capability. For spatial self-attention, an elaborate mask is applied to restrict its receptive field, thus mimicking dilated convolution. For channel self-attention, the authors observe that it may leak blind-spot information in the deep layers of multi-scale architectures, where the channel count exceeds the spatial size. To eliminate this effect, the channels are divided into several groups and channel attention is performed within each group separately. Furthermore, a knowledge distillation strategy is introduced to distill TBSN into smaller denoisers, improving computational efficiency while maintaining performance. Extensive experiments on real-world image denoising datasets show that TBSN greatly enlarges the receptive field and exhibits favorable performance against state-of-the-art SSID methods. Code and pre-trained models are publicly released at https://github.com/nagejacob/TBSN.
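The masking idea from the abstract can be illustrated in a few lines: attention logits at forbidden positions are set to negative infinity before the softmax, so the corresponding weights become exactly zero. This minimal sketch masks only the self position as a stand-in for a blind spot; TBSN's actual mask is more elaborate and restricts the receptive field to mimic dilated convolution.

```python
import torch

# 16 tokens (e.g. a 4x4 window), 8-dim features
n, d = 16, 8
q = torch.randn(n, d)
k = torch.randn(n, d)
v = torch.randn(n, d)

logits = (q @ k.T) / d ** 0.5
# Forbid each query from attending to its own position (a toy blind spot).
mask = torch.eye(n, dtype=torch.bool)
logits = logits.masked_fill(mask, float('-inf'))
attn = logits.softmax(dim=-1)

# Masked positions receive exactly zero attention weight,
# while each row still sums to 1 over the allowed positions.
print(attn.diagonal().abs().max().item())  # 0.0
out = attn @ v
```

Whatever shape the mask takes, the information flow from masked positions is cut off completely, which is what the blind-spot requirement demands.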
Core source code:
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange

class DilatedOCA(nn.Module):
    def __init__(self, dim, window_size, overlap_ratio, num_heads, dim_head, bias):
        super(DilatedOCA, self).__init__()
        self.num_spatial_heads = num_heads
        self.dim = dim
        self.window_size = window_size
        self.overlap_win_size = int(window_size * overlap_ratio) + window_size
        self.dim_head = dim_head
        self.inner_dim = self.dim_head * self.num_spatial_heads
        self.scale = self.dim_head ** -0.5
        self.unfold = nn.Unfold(kernel_size=(self.overlap_win_size, self.overlap_win_size), stride=window_size, padding=(self.overlap_win_size - window_size) // 2)
        self.qkv = nn.Conv2d(self.dim, self.inner_dim * 3, kernel_size=1, bias=bias)
        self.project_out = nn.Conv2d(self.inner_dim, dim, kernel_size=1, bias=bias)
        # RelPosEmb and FixedPosEmb are defined elsewhere in the TBSN repository
        self.rel_pos_emb = RelPosEmb(
            block_size=window_size,
            rel_size=window_size + (self.overlap_win_size - window_size),
            dim_head=self.dim_head
        )
        self.fixed_pos_emb = FixedPosEmb(window_size, self.overlap_win_size)

    def forward(self, x):
        b, c, h, w = x.shape
        qkv = self.qkv(x)
        qs, ks, vs = qkv.chunk(3, dim=1)

        # spatial attention: queries come from non-overlapping windows ...
        qs = rearrange(qs, 'b c (h p1) (w p2) -> (b h w) (p1 p2) c', p1=self.window_size, p2=self.window_size)
        # ... while keys/values come from larger, overlapping windows
        ks, vs = map(lambda t: self.unfold(t), (ks, vs))
        ks, vs = map(lambda t: rearrange(t, 'b (c j) i -> (b i) j c', c=self.inner_dim), (ks, vs))

        # split heads
        qs, ks, vs = map(lambda t: rearrange(t, 'b n (head c) -> (b head) n c', head=self.num_spatial_heads), (qs, ks, vs))

        # attention with relative and fixed positional embeddings
        qs = qs * self.scale
        spatial_attn = (qs @ ks.transpose(-2, -1))
        spatial_attn += self.rel_pos_emb(qs)
        spatial_attn += self.fixed_pos_emb()
        spatial_attn = spatial_attn.softmax(dim=-1)
        out = (spatial_attn @ vs)
        out = rearrange(out, '(b h w head) (p1 p2) c -> b (head c) (h p1) (w p2)', head=self.num_spatial_heads, h=h // self.window_size, w=w // self.window_size, p1=self.window_size, p2=self.window_size)

        # project back to the input dimension
        out = self.project_out(out)
        return out
class FeedForward(nn.Module):
    def __init__(self, dim, ffn_expansion_factor, bias):
        super(FeedForward, self).__init__()
        hidden_features = int(dim * ffn_expansion_factor)
        # dilated 3x3 convolutions (dilation=2, padding=2) keep the spatial size
        self.project_in = nn.Conv2d(dim, hidden_features, kernel_size=3, stride=1, dilation=2, padding=2, bias=bias)
        self.project_out = nn.Conv2d(hidden_features, dim, kernel_size=3, stride=1, dilation=2, padding=2, bias=bias)

    def forward(self, x):
        x = self.project_in(x)
        x = F.gelu(x)
        x = self.project_out(x)
        return x
# DilatedMDTA and LayerNorm are defined elsewhere in the TBSN repository
class DTAB(nn.Module):
    def __init__(self, dim, window_size=4, overlap_ratio=0.5, num_channel_heads=4, num_spatial_heads=2, spatial_dim_head=16, ffn_expansion_factor=1, bias=False, LayerNorm_type='BiasFree'):
        super(DTAB, self).__init__()
        self.spatial_attn = DilatedOCA(dim, window_size, overlap_ratio, num_spatial_heads, spatial_dim_head, bias)
        self.channel_attn = DilatedMDTA(dim, num_channel_heads, bias)
        self.norm1 = LayerNorm(dim, LayerNorm_type)
        self.norm2 = LayerNorm(dim, LayerNorm_type)
        self.norm3 = LayerNorm(dim, LayerNorm_type)
        self.norm4 = LayerNorm(dim, LayerNorm_type)
        self.channel_ffn = FeedForward(dim, ffn_expansion_factor, bias)
        self.spatial_ffn = FeedForward(dim, ffn_expansion_factor, bias)

    def forward(self, x):
        # channel attention + FFN, then spatial attention + FFN, each with a residual
        x = x + self.channel_attn(self.norm1(x))
        x = x + self.channel_ffn(self.norm2(x))
        x = x + self.spatial_attn(self.norm3(x))
        x = x + self.spatial_ffn(self.norm4(x))
        return x
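The DilatedMDTA channel-attention module used by DTAB is not reproduced in this excerpt. In the spirit of the paper's description, channels are split into head groups and a C x C attention map is computed within each group, never across groups, so that deep layers (where channels outnumber pixels) cannot leak blind-spot information. The class below is a hedged sketch of that grouping idea, not the repository's actual DilatedMDTA:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedChannelAttention(nn.Module):
    """Sketch of grouped channel self-attention: each head group computes its
    own (C/heads) x (C/heads) attention map, so information never mixes
    across groups. Illustrative only; not the TBSN DilatedMDTA."""
    def __init__(self, dim, num_heads, bias=False):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1, bias=bias)
        self.project_out = nn.Conv2d(dim, dim, kernel_size=1, bias=bias)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        # reshape to (batch, heads, channels_per_head, pixels)
        q = q.view(b, self.num_heads, c // self.num_heads, h * w)
        k = k.view(b, self.num_heads, c // self.num_heads, h * w)
        v = v.view(b, self.num_heads, c // self.num_heads, h * w)
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        # channel-to-channel attention within each group
        attn = (q @ k.transpose(-2, -1)).softmax(dim=-1)
        out = (attn @ v).view(b, c, h, w)
        return self.project_out(out)

x = torch.randn(1, 32, 8, 8)
print(GroupedChannelAttention(32, num_heads=4)(x).shape)  # torch.Size([1, 32, 8, 8])
```

With num_heads=4 and dim=32, each group attends over only 8 channels, which is the mechanism the paper uses to prevent blind-spot leakage in deep layers.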
Originality statement: This article was published on the Tencent Cloud Developer Community with the author's authorization; reproduction without permission is prohibited. For infringement concerns, contact cloudcommunity@tencent.com for removal.