💡💡💡 What this article covers: an analysis of YOLOv12's A2C2f and Area Attention innovations, and how to train the model on your own private dataset.
Paper: [2502.12524] YOLOv12: Attention-Centric Real-Time Object Detectors
Abstract:
Enhancing the network architecture of the YOLO framework has long been essential, but most improvements have focused on CNN-based designs, even though attention mechanisms have proven superior at modeling. This is because attention-based models have never matched the speed of CNN-based ones. This work proposes an attention-centric YOLO framework, YOLOv12, that matches the speed of previous CNN-based models while harnessing the performance benefits of attention.
YOLOv12 surpasses all popular real-time object detectors in accuracy at competitive inference speeds. Specifically, YOLOv12-N achieves 40.6% mAP with an inference latency of 1.64 ms on a T4 GPU, outperforming the advanced YOLOv10-N / YOLOv11-N by 2.1% / 1.2% mAP at comparable speed. This advantage holds across the other model scales as well. YOLOv12 also beats end-to-end real-time detectors derived from DETR: for example, YOLOv12-S outruns RT-DETR-R18 / RT-DETRv2-R18 while running 42% faster, using only 36% of the computation and 45% of the parameters. See Figure 1 for more comparisons.
Architecture diagram:
This work aims to address these challenges and further builds an attention-centric YOLO framework, YOLOv12, with three key improvements.
First, a simple and efficient Area Attention module (A²) that reduces the computational complexity of attention in a very straightforward way while preserving a large receptive field, thereby improving speed.
Second, a Residual Efficient Layer Aggregation Network (R-ELAN) that tackles the optimization difficulties introduced by attention, especially in large-scale models. R-ELAN improves on the original ELAN in two ways: (i) a block-level residual design with a scaling technique, and (ii) a redesigned feature aggregation method.
Third, a set of architectural refinements to conventional attention to fit the YOLO system: adopting FlashAttention to address attention's memory-access bottleneck; removing designs such as positional encoding to make the model faster and cleaner; lowering the MLP ratio from 4 to 1.2 to balance compute between attention and the feed-forward network for better performance; reducing the depth of stacked blocks to ease optimization; and using convolution operations wherever possible for their computational efficiency.
In summary, YOLOv12's contributions are twofold: 1) it establishes an attention-centric, simple yet efficient YOLO framework that, through methodological innovation and architectural refinement, breaks the dominance of CNN models in the YOLO series; 2) without relying on extra techniques such as pretraining, YOLOv12 achieves state-of-the-art results with fast inference speed and higher detection accuracy, demonstrating its potential.
YOLOv12's Area Attention module (A2) divides the feature map into simple vertical or horizontal areas, reducing the computational complexity of the attention mechanism while maintaining a large receptive field.
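To make the partition trick concrete, here is a minimal, self-contained PyTorch sketch (illustrative only, not the ultralytics code): folding the area factor into the batch dimension means each attention call sees only N/area tokens, so the quadratic cost term shrinks from N^2 to area * (N/area)^2 = N^2/area.

import torch

B, N, C, area = 2, 1024, 64, 4            # N = H*W flattened tokens
tokens = torch.randn(B, N, C)

# Fold the area factor into the batch dimension: each of the B*area
# "sub-images" now attends over only N/area tokens.
local = tokens.reshape(B * area, N // area, C)

# Quadratic attention cost scales with (sequence length)^2 per batch item:
full_cost = B * N**2                      # global attention
area_cost = (B * area) * (N // area)**2   # area attention
print(full_cost // area_cost)             # 4 -> cost reduced by a factor of `area`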
The core source code is as follows.
Code location: ultralytics/nn/modules/block.py
class AAttn(nn.Module):
    """
    Area-attention module with the requirement of flash attention.

    Attributes:
        dim (int): Number of hidden channels;
        num_heads (int): Number of heads into which the attention mechanism is divided;
        area (int, optional): Number of areas the feature map is divided. Defaults to 1.

    Methods:
        forward: Performs a forward process of input tensor and outputs a tensor after the execution of the area attention mechanism.

    Examples:
        >>> import torch
        >>> from ultralytics.nn.modules import AAttn
        >>> model = AAttn(dim=64, num_heads=2, area=4)
        >>> x = torch.randn(2, 64, 128, 128)
        >>> output = model(x)
        >>> print(output.shape)

    Notes:
        recommend that dim//num_heads be a multiple of 32 or 64.
    """

    def __init__(self, dim, num_heads, area=1):
        """Initializes the area-attention module, a simple yet efficient attention module for YOLO."""
        super().__init__()
        self.area = area

        self.num_heads = num_heads
        self.head_dim = head_dim = dim // num_heads
        all_head_dim = head_dim * self.num_heads

        self.qkv = Conv(dim, all_head_dim * 3, 1, act=False)
        self.proj = Conv(all_head_dim, dim, 1, act=False)
        self.pe = Conv(all_head_dim, dim, 7, 1, 3, g=dim, act=False)

    def forward(self, x):
        """Processes the input tensor 'x' through the area-attention."""
        B, C, H, W = x.shape
        N = H * W

        qkv = self.qkv(x).flatten(2).transpose(1, 2)  # (B, N, 3C)

        if self.area > 1:
            # Fold the area factor into the batch dim: attention is computed per area.
            qkv = qkv.reshape(B * self.area, N // self.area, C * 3)
            B, N, _ = qkv.shape
        q, k, v = qkv.view(B, N, self.num_heads, self.head_dim * 3).split(
            [self.head_dim, self.head_dim, self.head_dim], dim=3
        )

        if x.is_cuda and USE_FLASH_ATTN:
            # FlashAttention path (runs in half precision).
            x = flash_attn_func(
                q.contiguous().half(),
                k.contiguous().half(),
                v.contiguous().half(),
            ).to(q.dtype)
        elif x.is_cuda and not USE_FLASH_ATTN:
            # PyTorch scaled_dot_product_attention fallback on GPU.
            x = sdpa(q.permute(0, 2, 1, 3), k.permute(0, 2, 1, 3), v.permute(0, 2, 1, 3),
                     attn_mask=None, dropout_p=0.0, is_causal=False)
            x = x.permute(0, 2, 1, 3)
        else:
            # CPU fallback: manual, numerically stable softmax attention.
            q = q.permute(0, 2, 3, 1)
            k = k.permute(0, 2, 3, 1)
            v = v.permute(0, 2, 3, 1)
            attn = (q.transpose(-2, -1) @ k) * (self.head_dim ** -0.5)
            max_attn = attn.max(dim=-1, keepdim=True).values
            exp_attn = torch.exp(attn - max_attn)
            attn = exp_attn / exp_attn.sum(dim=-1, keepdim=True)
            x = v @ attn.transpose(-2, -1)
            x = x.permute(0, 3, 1, 2)
            v = v.permute(0, 3, 1, 2)

        if self.area > 1:
            # Unfold the area factor back out of the batch dim.
            x = x.reshape(B // self.area, N * self.area, C)
            v = v.reshape(B // self.area, N * self.area, C)
            B, N, _ = x.shape

        x = x.reshape(B, H, W, C).permute(0, 3, 1, 2)
        v = v.reshape(B, H, W, C).permute(0, 3, 1, 2)

        x = x + self.pe(v)  # add convolutional position encoding computed from v
        x = self.proj(x)
        return x
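A quick shape check of AAttn (a sketch assuming an ultralytics release that ships YOLOv12, where AAttn lives in ultralytics.nn.modules.block; on a CPU-only machine this exercises the manual-softmax fallback branch):

import torch
from ultralytics.nn.modules.block import AAttn  # assumes a YOLO12-era ultralytics release

m = AAttn(dim=64, num_heads=2, area=4)
x = torch.randn(2, 64, 32, 32)  # H*W = 1024 must be divisible by `area`
print(m(x).shape)               # torch.Size([2, 64, 32, 32]) -- spatial shape is preserved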
The A2C2f module ("Area-Attention Enhanced Cross-Feature module") is an improved feature-extraction module introduced in YOLOv12. It combines area attention with residual connections and is mainly intended to improve the efficiency and accuracy of feature extraction.
The A2C2f module is built from the following key components.
Code location: ultralytics/nn/modules/block.py
class ABlock(nn.Module):
    """
    ABlock class implementing an Area-Attention block with effective feature extraction.

    This class encapsulates the functionality for applying multi-head attention, with the feature map divided into
    areas, followed by feed-forward neural network layers.

    Attributes:
        dim (int): Number of hidden channels;
        num_heads (int): Number of heads into which the attention mechanism is divided;
        mlp_ratio (float, optional): MLP expansion ratio (or MLP hidden dimension ratio). Defaults to 1.2;
        area (int, optional): Number of areas the feature map is divided. Defaults to 1.

    Methods:
        forward: Performs a forward pass through the ABlock, applying area-attention and feed-forward layers.

    Examples:
        Create an ABlock and perform a forward pass
        >>> model = ABlock(dim=64, num_heads=2, mlp_ratio=1.2, area=4)
        >>> x = torch.randn(2, 64, 128, 128)
        >>> output = model(x)
        >>> print(output.shape)

    Notes:
        recommend that dim//num_heads be a multiple of 32 or 64.
    """

    def __init__(self, dim, num_heads, mlp_ratio=1.2, area=1):
        """Initializes the ABlock with area-attention and feed-forward layers for faster feature extraction."""
        super().__init__()

        self.attn = AAttn(dim, num_heads=num_heads, area=area)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = nn.Sequential(Conv(dim, mlp_hidden_dim, 1), Conv(mlp_hidden_dim, dim, 1, act=False))

        self.apply(self._init_weights)

    def _init_weights(self, m):
        """Initialize weights using a truncated normal distribution."""
        if isinstance(m, nn.Conv2d):
            nn.init.trunc_normal_(m.weight, std=0.02)
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        """Executes a forward pass through ABlock, applying area-attention and feed-forward layers to the input tensor."""
        x = x + self.attn(x)  # residual area-attention (token mixing)
        x = x + self.mlp(x)   # residual 1x1-conv MLP (channel mixing)
        return x
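Like AAttn, ABlock preserves the input shape, so blocks can be stacked freely. Note the transformer-style structure: a residual attention step followed by a residual 1x1-conv MLP with the paper's reduced ratio of 1.2. A quick check under the same import assumption as above:

import torch
from ultralytics.nn.modules.block import ABlock  # same version assumption as AAttn above

blk = ABlock(dim=64, num_heads=2, mlp_ratio=1.2, area=4)
x = torch.randn(2, 64, 32, 32)
print(blk(x).shape)  # torch.Size([2, 64, 32, 32])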
class A2C2f(nn.Module):
    """
    A2C2f module with residual enhanced feature extraction using ABlock blocks with area-attention. Also known as R-ELAN.

    This class extends the C2f module by incorporating ABlock blocks for fast attention mechanisms and feature extraction.

    Attributes:
        c1 (int): Number of input channels;
        c2 (int): Number of output channels;
        n (int, optional): Number of 2xABlock modules to stack. Defaults to 1;
        a2 (bool, optional): Whether use area-attention. Defaults to True;
        area (int, optional): Number of areas the feature map is divided. Defaults to 1;
        residual (bool, optional): Whether use the residual (with layer scale). Defaults to False;
        mlp_ratio (float, optional): MLP expansion ratio (or MLP hidden dimension ratio). Defaults to 2.0;
        e (float, optional): Expansion ratio for R-ELAN modules. Defaults to 0.5;
        g (int, optional): Number of groups for grouped convolution. Defaults to 1;
        shortcut (bool, optional): Whether to use shortcut connection. Defaults to True;

    Methods:
        forward: Performs a forward pass through the A2C2f module.

    Examples:
        >>> import torch
        >>> from ultralytics.nn.modules import A2C2f
        >>> model = A2C2f(c1=64, c2=64, n=2, a2=True, area=4, residual=True, e=0.5)
        >>> x = torch.randn(2, 64, 128, 128)
        >>> output = model(x)
        >>> print(output.shape)
    """

    def __init__(self, c1, c2, n=1, a2=True, area=1, residual=False, mlp_ratio=2.0, e=0.5, g=1, shortcut=True):
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        assert c_ % 32 == 0, "Dimension of ABlock must be a multiple of 32."

        # num_heads = c_ // 64 if c_ // 64 >= 2 else c_ // 32
        num_heads = c_ // 32

        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv((1 + n) * c_, c2, 1)  # optional act=FReLU(c2)

        init_values = 0.01  # or smaller
        self.gamma = nn.Parameter(init_values * torch.ones((c2)), requires_grad=True) if a2 and residual else None

        self.m = nn.ModuleList(
            nn.Sequential(*(ABlock(c_, num_heads, mlp_ratio, area) for _ in range(2)))
            if a2 else C3k(c_, c_, 2, shortcut, g)
            for _ in range(n)
        )

    def forward(self, x):
        """Forward pass through R-ELAN layer."""
        y = [self.cv1(x)]
        y.extend(m(y[-1]) for m in self.m)
        if self.gamma is not None:
            # Scaled residual connection (layer scale) around the whole module.
            return x + self.gamma.view(1, -1, 1, 1) * self.cv2(torch.cat(y, 1))
        return self.cv2(torch.cat(y, 1))
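When a2=True and residual=True, A2C2f wraps its entire output in a learnable per-channel scale gamma (initialized to 0.01), i.e. out = x + gamma * f(x); this is the block-level "residual design with scaling" of R-ELAN. A standalone sketch of the layer-scale idea (illustrative, not library code):

import torch
import torch.nn as nn

class LayerScaleResidual(nn.Module):
    """out = x + gamma * f(x), with gamma a small learnable per-channel scale."""

    def __init__(self, fn, channels, init_value=0.01):
        super().__init__()
        self.fn = fn
        self.gamma = nn.Parameter(init_value * torch.ones(channels))

    def forward(self, x):  # x: (B, C, H, W)
        return x + self.gamma.view(1, -1, 1, 1) * self.fn(x)

block = LayerScaleResidual(nn.Conv2d(64, 64, 3, padding=1), channels=64)
print(block(torch.randn(2, 64, 16, 16)).shape)  # torch.Size([2, 64, 16, 16])

Starting gamma near zero lets the block behave almost like an identity mapping early in training, which is what eases optimization for deeper attention stacks.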
Next, training on a private dataset. NEU-DET is a steel surface defect dataset with six defect classes and 1,800 images in total.
The classes are: 'crazing', 'inclusion', 'patches', 'pitted_surface', 'rolled-in_scale', 'scratches'.
Dataset download:
https://download.csdn.net/download/m0_63774211/89846379?spm=1001.2014.3001.5503
Label visualization:
The dataset configuration file (data/NEU-DET.yaml):

path: D:/ultralytics-main/data/NEU-DET  # dataset root dir
train: train.txt  # train images (relative to 'path')
val: val.txt  # val images (relative to 'path')

# number of classes
nc: 6

# class names
names:
  0: crazing
  1: inclusion
  2: patches
  3: pitted_surface
  4: rolled-in_scale
  5: scratches
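The train.txt / val.txt entries are image paths relative to path. If your copy of NEU-DET only has images split into train/val folders, a script along these lines can generate the lists (the images/train and images/val layout here is an assumption; adjust it to your actual folder structure):

from pathlib import Path

root = Path("D:/ultralytics-main/data/NEU-DET")  # dataset root from the yaml above
for split in ("train", "val"):
    imgs = sorted((root / "images" / split).glob("*.jpg"))  # hypothetical layout
    (root / f"{split}.txt").write_text("\n".join(str(p) for p in imgs))

With the yaml and file lists in place, the training script below uses the standard ultralytics API: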
import warnings
warnings.filterwarnings('ignore')

from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('ultralytics/cfg/models/v12/yolov12n.yaml')
    # model.load('yolo12n.pt')  # load pretrained weights
    model.train(data='data/NEU-DET.yaml',
                cache=False,
                imgsz=640,
                epochs=200,
                batch=16,
                close_mosaic=10,
                device='0',
                optimizer='SGD',  # using SGD
                project='runs/train',
                name='exp',
                )
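After training finishes, the metrics below can be reproduced by validating the best checkpoint (a sketch; the exact weights path depends on the project/name arguments above):

from ultralytics import YOLO

model = YOLO('runs/train/exp/weights/best.pt')  # produced by the training run above
metrics = model.val(data='data/NEU-DET.yaml', imgsz=640, batch=16)
print(metrics.box.map50)  # overall mAP50, ~0.763 for the baseline run below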
The baseline YOLOv12 achieves an mAP50 of 0.763:
YOLOv12n summary (fused): 352 layers, 2,557,898 parameters, 0 gradients, 6.3 GFLOPs
                Class    Images  Instances     Box(P         R     mAP50  mAP50-95): 100%|██████████| 11/11 [00:11<00:00,  1.04s/it]
                  all       324        747     0.718     0.714     0.763     0.435
              crazing        47        104     0.497     0.433     0.431     0.178
            inclusion        71        190     0.741     0.721     0.802     0.434
              patches        59        149     0.826     0.926     0.942     0.641
       pitted_surface        61         93     0.789     0.645     0.76      0.467
      rolled-in_scale        56        117     0.656     0.624     0.709     0.334
            scratches        54         94     0.8       0.936     0.935     0.556
Prediction results:
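To reproduce such predictions, run the trained model on new images with the same API (a sketch; the source path is illustrative):

from ultralytics import YOLO

model = YOLO('runs/train/exp/weights/best.pt')
results = model.predict(source='data/NEU-DET/images/val', imgsz=640, conf=0.25, save=True)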
Original article:
https://blog.csdn.net/m0_63774211/article/details/145771893
Original-content notice: this article was published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
If you believe it infringes your rights, please contact cloudcommunity@tencent.com for removal.