2024

EfficientSAM

Yunyang Xiong et al., Meta, CVPR 2024

  • Comparison with SAM
  • Parameter counts

Model                 Parameters
SAM-B                 136M
FastSAM-s             11M
FastSAM-x (default)   68M
MobileSAM             9.66M
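
Parameter counts like those above can be checked directly on any loaded model object; a minimal sketch (the `count_params` helper and the demo network are illustrative stand-ins, not part of any SAM codebase):

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total number of parameters in a model."""
    return sum(p.numel() for p in model.parameters())

# Demo on a small stand-in network; for a real comparison, pass the
# loaded SAM / FastSAM / MobileSAM model object instead.
demo = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
print(f"{count_params(demo) / 1e6:.2f}M parameters")
```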

SAM2

Nikhila Ravi et al., Meta, 2024

  • Follow-up work to SAM (Segment Anything)

Code

  • Modes
    • pixel prompt: generate a mask from given positive/negative points
    • box: generate a mask from a given bounding box
    • automask: automatically generate masks for the whole image (it appears to sample pixel locations over the entire image at a fixed density, so it is relatively slow)
  • Using the returned mask format:
    • Reference code
      import cv2
      import numpy as np
       
      def show_masks(img, masks, borders=True):
          """Overlay masks of shape (N, H, W) on img with random colors; returns the blended image."""
          if len(masks) == 0:
              return img
       
          mask_vis = np.zeros((masks.shape[1], masks.shape[2], 3), dtype=np.uint8)
          for i in range(masks.shape[0]):
              mask = masks[i].astype(bool)
              color_mask = np.random.randint(0, 256, size=3, dtype=np.uint8)
              mask_vis[mask] = color_mask
              if borders:
                  contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
                  # Lightly smooth the contours before drawing
                  contours = [cv2.approxPolyDP(contour, epsilon=0.01, closed=True) for contour in contours]
                  cv2.drawContours(mask_vis, contours, -1, (255, 255, 255), thickness=1)  # white borders
       
          alpha = 0.5
          overlay = cv2.addWeighted(img, 1 - alpha, mask_vis, alpha, 0)
          return overlay
  • Pixel Prompt
    • Reference code
      import time
       
      import cv2
      import numpy as np
      import torch
       
      from sam2.build_sam import build_sam2
      from sam2.sam2_image_predictor import SAM2ImagePredictor
       
      torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()
       
      if torch.cuda.get_device_properties(0).major >= 8:
          torch.backends.cuda.matmul.allow_tf32 = True
          torch.backends.cudnn.allow_tf32 = True
       
      DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
      # sam2_checkpoint = "../checkpoints/sam2.1_hiera_tiny.pt"
      # model_cfg = "configs/sam2.1/sam2.1_hiera_t.yaml"
      sam2_checkpoint = "../checkpoints/sam2.1_hiera_large.pt"
      model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
       
      sam2 = build_sam2(model_cfg, sam2_checkpoint, device=DEVICE, apply_postprocessing=False)
      predictor = SAM2ImagePredictor(sam2)
       
      img_bgr = cv2.imread('test.png')
      img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)  ## input must be RGB by default (BGR seems selectable via a parameter)
      pts = np.array([[756, 331]])  ## a single point, in (x, y)
      labels = np.array([1])  ## label of that point: 1 = foreground, 0 = background
      multimask_output = False  ## if True, returns multiple candidate masks per prompt set
      start = time.perf_counter()
      predictor.set_image(img_rgb)
      masks, scores, _ = predictor.predict(point_coords=pts, point_labels=labels, multimask_output=multimask_output)
      stop = time.perf_counter()
      print("Runtime: {:.1f}ms".format((stop - start) * 1000))
  • Automask (automatic mask generation for the whole image):
    • Reference code
    ## Use OpenCV to show the result, does not require matplotlib.
    ## Generate all possible mask automatically (slow).
     
    import cv2
    import torch
     
    import numpy as np
    # import supervision as sv ## mask visualization lib
     
    from sam2.build_sam import build_sam2
    from sam2.sam2_image_predictor import SAM2ImagePredictor
    from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator
     
    import time
     
    torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()
     
    if torch.cuda.get_device_properties(0).major >= 8:
        torch.backends.cuda.matmul.allow_tf32 = True
        torch.backends.cudnn.allow_tf32 = True
     
    DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # sam2_checkpoint = "../checkpoints/sam2.1_hiera_tiny.pt"
    # model_cfg = "configs/sam2.1/sam2.1_hiera_t.yaml"
    sam2_checkpoint = "../checkpoints/sam2.1_hiera_large.pt"
    model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
     
    sam2 = build_sam2(model_cfg, sam2_checkpoint, device=DEVICE, apply_postprocessing=False)
    mask_generator = SAM2AutomaticMaskGenerator(sam2)
     
    img_bgr = cv2.imread('test.png')
    img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)  ## input must be RGB by default (BGR seems selectable via a parameter)
    start = time.perf_counter()
    masks = mask_generator.generate(img_rgb) # "Images are expected to be an np.ndarray in RGB format, and of shape  HWC"
    stop  = time.perf_counter()
    print("Runtime: {}ms".format((stop-start)*1000))

2023

Segment anything

Alexander Kirillov et al., Meta, 2023

Code

  • Modes
    • Generate masks from prompts
    • Generate masks for the whole image
  • Example application:
  • segment-anything/notebooks/predictor_example.ipynb at main · facebookresearch/segment-anything
  • Initialization:
    from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
     
    sam_checkpoint = "segment-anything/model/sam_vit_h_4b8939.pth"
    model_type = "vit_h"
    device = "cuda"
    sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
    sam.to(device=device)
  • Invocation (the tuning arguments appear to be optional; defaults are used if omitted)
    mask_generator_ = SamAutomaticMaskGenerator(
        model=sam,
        points_per_side=32,
        pred_iou_thresh=0.9,
        stability_score_thresh=0.96,
        crop_n_layers=1,
        crop_n_points_downscale_factor=2,
        min_mask_region_area=100,  # Requires open-cv to run post-processing
    )
     
  • Returned masks: a list in which each item is a dict describing one segmented mask.
    • segmentation - [np.ndarray] - 2D boolean array at the original image size (h, w); True marks the pixels belonging to this mask
    • area - [int] - the area of the mask in pixels
    • bbox - [List[int]] - bounding box of the segmented object, in [x, y, w, h] format
    • predicted_iou - [float] - the model's own prediction for the quality of the mask
    • point_coords - [List[List[float]]] - the sampled input point that generated this mask
    • stability_score - [float] - an additional measure of mask quality
    • crop_box - [List[int]] - the crop of the image used to generate this mask, in [x, y, w, h] format
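
Typical post-processing filters and sorts on the fields above; a small sketch (the threshold values and the sample dicts are made up for illustration):

```python
def select_masks(masks, min_area=100, min_iou=0.88):
    """Keep confident, reasonably large masks, biggest first."""
    keep = [m for m in masks if m["area"] >= min_area and m["predicted_iou"] >= min_iou]
    return sorted(keep, key=lambda m: m["area"], reverse=True)

masks = [
    {"area": 5000, "predicted_iou": 0.95, "bbox": [10, 10, 80, 60]},
    {"area": 40,   "predicted_iou": 0.99, "bbox": [0, 0, 8, 5]},     # too small
    {"area": 3000, "predicted_iou": 0.70, "bbox": [50, 50, 60, 50]}, # low IoU
]
print([m["bbox"] for m in select_masks(masks)])  # → [[10, 10, 80, 60]]
```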

MobileSAM

Chaoning Zhang et al., Kyung Hee University, 2023

  • Replaces SAM's image encoder with TinyViT (see the paper "TinyViT: Fast Pretraining Distillation for Small Vision Transformers")

MobileSAMv2

Chaoning Zhang et al., Kyung Hee University, 2023

FastSAM

Xu Zhao et al., Institute of Automation, Chinese Academy of Sciences (CASIA), 2023 https://arxiv.org/pdf/2306.12156.pdf https://github.com/CASIA-IVA-Lab/FastSAM

  • Based on YOLOv8-seg, which uses the YOLACT method for instance segmentation
  • YOLOv8 backbone network ->

3D Skeletonization of Complex Grapevines for Robotic Pruning

Eric Schneider et al., CMU Kantor Lab, IROS 2023

  • A purely image-processing-based study
  • Camera system
    • A series of stereo images, registered into a point cloud; image segmentation masks are produced from them
    • A. Silwal, T. Parhar, F. Yandun, H. Baweja, and G. Kantor, "A robust illumination-invariant camera system for agricultural applications," in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, pp. 3292–3298.
  • Image-processing pipeline
    • Uses the MMLab segmentation toolkit; after trying several models, UNet + geometric augmentation was chosen
    • Extract the 2D skeleton of the segmented masks, then apply dilation to the 2D skeleton
    • Build the skeletal model
      • Assuming points within a short distance should be connected, first scan for neighbors within a fixed radius around each point (a sphere), then use a k-d tree
      • Minimum Spanning Tree (MST) via Kruskal's algorithm, with Euclidean distance as the edge cost
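
The two steps above (radius-limited neighbor search via a k-d tree, then an MST over Euclidean edge costs) can be sketched with SciPy; the point cloud here is random stand-in data, and SciPy's MST routine replaces a hand-written Kruskal:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(0)
pts = rng.random((50, 3))  # stand-in 3D skeleton points

# 1) Candidate edges: only point pairs within a radius (the "sphere scan").
tree = cKDTree(pts)
pairs = tree.query_pairs(r=0.4, output_type="ndarray")  # (i, j) index pairs
dists = np.linalg.norm(pts[pairs[:, 0]] - pts[pairs[:, 1]], axis=1)

# 2) MST over the candidate graph, Euclidean distance as the edge cost.
n = len(pts)
graph = csr_matrix((dists, (pairs[:, 0], pairs[:, 1])), shape=(n, n))
mst = minimum_spanning_tree(graph)
print("MST edges:", mst.nnz, "total length:", mst.sum())
```

A connected graph of n points yields an MST with n - 1 edges; fewer edges indicate the radius left the cloud disconnected.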

2022

Review the state-of-the-art technologies of semantic segmentation based on deep learning

Yujian Mo et al., Neurocomputing 2022

Approaches

Method                Publish  Year  Notes
FCN                   CVPR     2015  Fully Convolutional Network: replaces the final fully connected layers of a traditional CNN with convolutional layers
U-Net                 MICCAI   2015  U-shaped architecture
DeconvNet             ICCV     2015
SegNet                TPAMI    2015
DeepLab v1            ICLR     2015
Dilated Convolutions  ICLR     2016
DeepLab v3            arXiv    2017
PSPNet                CVPR     2017
RefineNet             CVPR     2017
Large Kernel Matters  CVPR     2017
DeepLab v2            TPAMI    2018
DeepLab v3+           ECCV     2018
DUC                   WACV     2018
ICNet                 ECCV     2018
BiSeNet               ECCV     2018
AdaptSegNet           CVPR     2018
ERFNet                TITS     2018
EncNet                CVPR     2018
CCNet                 ICCV     2019

Datasets

  • NYU Depth V2
  • PASCAL-VOC 2012
  • ADE20K
  • ScanNet
  • WoodScape
  • KITTI-2012

2018

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Liang-Chieh Chen et al., ECCV 2018

  • Semantic segmentation
  • Atrous convolution (also called dilated convolution)
  • DeepLabV3+
  • PyTorch version available
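
Atrous (dilated) convolution inserts gaps between kernel taps, enlarging the receptive field without adding parameters; a minimal PyTorch sketch (shapes are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 65, 65)

# dilation=2 spreads a 3x3 kernel over a 5x5 window: effective kernel
# size = k + (k - 1) * (d - 1) = 3 + 2 * 1 = 5, still 3*3 weights per filter.
conv_d1 = nn.Conv2d(3, 8, kernel_size=3, padding=1, dilation=1)
conv_d2 = nn.Conv2d(3, 8, kernel_size=3, padding=2, dilation=2)

print(conv_d1(x).shape, conv_d2(x).shape)  # same spatial size with matched padding
```

Matching the padding to the effective kernel size keeps the output resolution unchanged, which is why DeepLab can widen the receptive field without downsampling.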