2024

EfficientSAM

Yunyang Xiong et al., Meta, CVPR 2024

  • Comparison with SAM
  • Parameter counts

Model                 Parameters
SAM-B                 136M
FastSAM-s             11M
FastSAM-x (default)   68M
MobileSAM             9.66M
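
Parameter counts like those above can be checked directly on any loaded model object; a minimal sketch (the `count_params` helper and the demo network are illustrative stand-ins, not part of any SAM codebase):

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total number of parameters in a model."""
    return sum(p.numel() for p in model.parameters())

# Demo on a small stand-in network; for a real comparison, pass the
# loaded SAM / FastSAM / MobileSAM model object instead.
demo = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
print(f"{count_params(demo) / 1e6:.2f}M parameters")
```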

SAM2

Nikhila Ravi et al., Meta, 2024

  • Follow-up work to SAM (Segment Anything)

Code

  • Modes
    • pixel prompt: generate a mask from given positive/negative points
    • box: generate a mask from a given bounding box
    • automask: automatically generate masks for the whole image (it appears to sample pixel locations over the entire image at a fixed density, so it is relatively slow)
  • Using the returned mask format:
    • Reference code
      import cv2
      import numpy as np
       
      def show_masks(img, masks, borders=True):
          """Overlay masks of shape (N, H, W) on img with random colors; returns the blended image."""
          if len(masks) == 0:
              return img
       
          mask_vis = np.zeros((masks.shape[1], masks.shape[2], 3), dtype=np.uint8)
          for i in range(masks.shape[0]):
              mask = masks[i].astype(bool)
              color_mask = np.random.randint(0, 256, size=3, dtype=np.uint8)
              mask_vis[mask] = color_mask
              if borders:
                  contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
                  # Lightly smooth the contours before drawing
                  contours = [cv2.approxPolyDP(contour, epsilon=0.01, closed=True) for contour in contours]
                  cv2.drawContours(mask_vis, contours, -1, (255, 255, 255), thickness=1)  # white borders
       
          alpha = 0.5
          overlay = cv2.addWeighted(img, 1 - alpha, mask_vis, alpha, 0)
          return overlay
  • Pixel Prompt
    • Reference code
      import time
       
      import cv2
      import numpy as np
      import torch
       
      from sam2.build_sam import build_sam2
      from sam2.sam2_image_predictor import SAM2ImagePredictor
       
      torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()
       
      if torch.cuda.get_device_properties(0).major >= 8:
          torch.backends.cuda.matmul.allow_tf32 = True
          torch.backends.cudnn.allow_tf32 = True
       
      DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
      # sam2_checkpoint = "../checkpoints/sam2.1_hiera_tiny.pt"
      # model_cfg = "configs/sam2.1/sam2.1_hiera_t.yaml"
      sam2_checkpoint = "../checkpoints/sam2.1_hiera_large.pt"
      model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
       
      sam2 = build_sam2(model_cfg, sam2_checkpoint, device=DEVICE, apply_postprocessing=False)
      predictor = SAM2ImagePredictor(sam2)
       
      img_bgr = cv2.imread('test.png')
      img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)  ## input must be RGB by default (BGR seems selectable via a parameter)
      pts = np.array([[756, 331]])  ## a single point, in (x, y)
      labels = np.array([1])  ## label of that point: 1 = foreground, 0 = background
      multimask_output = False  ## if True, returns multiple candidate masks per prompt set
      start = time.perf_counter()
      predictor.set_image(img_rgb)
      masks, scores, _ = predictor.predict(point_coords=pts, point_labels=labels, multimask_output=multimask_output)
      stop = time.perf_counter()
      print("Runtime: {:.1f}ms".format((stop - start) * 1000))
  • Automask (automatic mask generation for the whole image):
    • Reference code
    ## Use OpenCV to show the result, does not require matplotlib.
    ## Generate all possible mask automatically (slow).
     
    import cv2
    import torch
     
    import numpy as np
    # import supervision as sv ## mask visualization lib
     
    from sam2.build_sam import build_sam2
    from sam2.sam2_image_predictor import SAM2ImagePredictor
    from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator
     
    import time
     
    torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()
     
    if torch.cuda.get_device_properties(0).major >= 8:
        torch.backends.cuda.matmul.allow_tf32 = True
        torch.backends.cudnn.allow_tf32 = True
     
    DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # sam2_checkpoint = "../checkpoints/sam2.1_hiera_tiny.pt"
    # model_cfg = "configs/sam2.1/sam2.1_hiera_t.yaml"
    sam2_checkpoint = "../checkpoints/sam2.1_hiera_large.pt"
    model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
     
    sam2 = build_sam2(model_cfg, sam2_checkpoint, device=DEVICE, apply_postprocessing=False)
    mask_generator = SAM2AutomaticMaskGenerator(sam2)
     
    img_bgr = cv2.imread('test.png')
    img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)  ## input must be RGB by default (BGR seems selectable via a parameter)
    start = time.perf_counter()
    masks = mask_generator.generate(img_rgb) # "Images are expected to be an np.ndarray in RGB format, and of shape  HWC"
    stop  = time.perf_counter()
    print("Runtime: {}ms".format((stop-start)*1000))

2023

Segment anything

Alexander Kirillov et al., Meta, 2023

Code

  • Modes
    • Generate masks from prompts
    • Generate masks for the whole image
  • Example application:
  • segment-anything/notebooks/predictor_example.ipynb at main · facebookresearch/segment-anything
  • Initialization:
    from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
     
    sam_checkpoint = "segment-anything/model/sam_vit_h_4b8939.pth"
    model_type = "vit_h"
    device = "cuda"
    sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
    sam.to(device=device)
  • Invocation (the tuning arguments appear to be optional; defaults are used if omitted)
    mask_generator_ = SamAutomaticMaskGenerator(
        model=sam,
        points_per_side=32,
        pred_iou_thresh=0.9,
        stability_score_thresh=0.96,
        crop_n_layers=1,
        crop_n_points_downscale_factor=2,
        min_mask_region_area=100,  # Requires open-cv to run post-processing
    )
     
  • Returned masks: a list in which each item is a dict describing one segmented mask.
    • segmentation - [np.ndarray] - 2D boolean array at the original image size (h, w); True marks the pixels belonging to this mask
    • area - [int] - the area of the mask in pixels
    • bbox - [List[int]] - bounding box of the segmented object, in [x, y, w, h] format
    • predicted_iou - [float] - the model's own prediction for the quality of the mask
    • point_coords - [List[List[float]]] - the sampled input point that generated this mask
    • stability_score - [float] - an additional measure of mask quality
    • crop_box - [List[int]] - the crop of the image used to generate this mask, in [x, y, w, h] format
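
Typical post-processing filters and sorts on the fields above; a small sketch (the threshold values and the sample dicts are made up for illustration):

```python
def select_masks(masks, min_area=100, min_iou=0.88):
    """Keep confident, reasonably large masks, biggest first."""
    keep = [m for m in masks if m["area"] >= min_area and m["predicted_iou"] >= min_iou]
    return sorted(keep, key=lambda m: m["area"], reverse=True)

masks = [
    {"area": 5000, "predicted_iou": 0.95, "bbox": [10, 10, 80, 60]},
    {"area": 40,   "predicted_iou": 0.99, "bbox": [0, 0, 8, 5]},     # too small
    {"area": 3000, "predicted_iou": 0.70, "bbox": [50, 50, 60, 50]}, # low IoU
]
print([m["bbox"] for m in select_masks(masks)])  # → [[10, 10, 80, 60]]
```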

MobileSAM

Chaoning Zhang et al., Kyung Hee University, 2023

  • Replaces SAM's image encoder with TinyViT (see the paper "TinyViT: Fast Pretraining Distillation for Small Vision Transformers")

MobileSAMv2

Chaoning Zhang et al., Kyung Hee University, 2023

FastSAM

Xu Zhao et al., Institute of Automation, Chinese Academy of Sciences (CASIA), 2023 https://arxiv.org/pdf/2306.12156.pdf https://github.com/CASIA-IVA-Lab/FastSAM

  • Based on YOLOv8-seg, which uses the YOLACT method for instance segmentation
  • YOLOv8 backbone network ->

3D Skeletonization of Complex Grapevines for Robotic Pruning

Eric Schneider et al., CMU Kantor Lab, IROS 2023

  • A purely image-processing-based study
  • Camera system
    • A series of stereo images, registered into a point cloud; image segmentation masks are produced from them
    • A. Silwal, T. Parhar, F. Yandun, H. Baweja, and G. Kantor, "A robust illumination-invariant camera system for agricultural applications," in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, pp. 3292–3298.
  • Image-processing pipeline
    • Uses the MMLab segmentation toolkit; after trying several models, UNet + geometric augmentation was chosen
    • Extract the 2D skeleton of the segmented masks, then apply dilation to the 2D skeleton
    • Build the skeletal model
      • Assuming points within a short distance should be connected, first scan for neighbors within a fixed radius around each point (a sphere), then use a k-d tree
      • Minimum Spanning Tree (MST) via Kruskal's algorithm, with Euclidean distance as the edge cost
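
The two steps above (radius-limited neighbor search via a k-d tree, then an MST over Euclidean edge costs) can be sketched with SciPy; the point cloud here is random stand-in data, and SciPy's MST routine replaces a hand-written Kruskal:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(0)
pts = rng.random((50, 3))  # stand-in 3D skeleton points

# 1) Candidate edges: only point pairs within a radius (the "sphere scan").
tree = cKDTree(pts)
pairs = tree.query_pairs(r=0.4, output_type="ndarray")  # (i, j) index pairs
dists = np.linalg.norm(pts[pairs[:, 0]] - pts[pairs[:, 1]], axis=1)

# 2) MST over the candidate graph, Euclidean distance as the edge cost.
n = len(pts)
graph = csr_matrix((dists, (pairs[:, 0], pairs[:, 1])), shape=(n, n))
mst = minimum_spanning_tree(graph)
print("MST edges:", mst.nnz, "total length:", mst.sum())
```

A connected graph of n points yields an MST with n - 1 edges; fewer edges indicate the radius left the cloud disconnected.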

2022

Review the state-of-the-art technologies of semantic segmentation based on deep learning

Yujian Mo et al., Neurocomputing 2022

Approaches

Method                Publish  Year  Notes
FCN                   CVPR     2015  Fully Convolutional Network: replaces the final fully connected layers of a traditional CNN with convolutional layers
U-Net                 MICCAI   2015  U-shaped architecture
DeconvNet             ICCV     2015
SegNet                TPAMI    2015
DeepLab v1            ICLR     2015
Dilated Convolutions  ICLR     2016
DeepLab v3            arXiv    2017
PSPNet                CVPR     2017
RefineNet             CVPR     2017
Large Kernel Matters  CVPR     2017
DeepLab v2            TPAMI    2018
DeepLab v3+           ECCV     2018
DUC                   WACV     2018
ICNet                 ECCV     2018
BiSeNet               ECCV     2018
AdaptSegNet           CVPR     2018
ERFNet                TITS     2018
EncNet                CVPR     2018
CCNet                 ICCV     2019

Datasets

  • NYU Depth V2
  • PASCAL-VOC 2012
  • ADE20K
  • ScanNet
  • WoodScape
  • KITTI-2012

2018

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Liang-Chieh Chen et al., ECCV 2018

  • Semantic segmentation
  • Atrous convolution (also called dilated convolution)
  • DeepLabV3+
  • PyTorch version available
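
Atrous (dilated) convolution inserts gaps between kernel taps, enlarging the receptive field without adding parameters; a minimal PyTorch sketch (shapes are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 65, 65)

# dilation=2 spreads a 3x3 kernel over a 5x5 window: effective kernel
# size = k + (k - 1) * (d - 1) = 3 + 2 * 1 = 5, still 3*3 weights per filter.
conv_d1 = nn.Conv2d(3, 8, kernel_size=3, padding=1, dilation=1)
conv_d2 = nn.Conv2d(3, 8, kernel_size=3, padding=2, dilation=2)

print(conv_d1(x).shape, conv_d2(x).shape)  # same spatial size with matched padding
```

Matching the padding to the effective kernel size keeps the output resolution unchanged, which is why DeepLab can widen the receptive field without downsampling.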