2024
EfficientSAM
Yunyang Xiong et al., Meta, CVPR 2024
- Comparison with SAM
- Parameter counts
| Model | Parameters |
|---|---|
| SAM-B | 136M |
| FastSAM-s | 11M |
| FastSAM-x (default) | 68M |
| MobileSAM | 9.66M |
SAM2
Nikhila Ravi et al., Meta, 2024
- Follow-up work to SAM (Segment Anything)
Code
- Modes
- pixel prompt: generates a mask from given positive/negative points
- box: generates a mask from a given selection box
- automask: automatically generates masks for the entire image (apparently by sampling pixel points over the whole image at a fixed density, which is relatively slow)
- Using the output mask format:
- Reference code
```python
import cv2
import numpy as np

def show_masks(img, masks, borders=True):
    """Overlay binary masks of shape (N, H, W) on img with random colors."""
    if len(masks) == 0:
        return img
    mask_vis = np.ones((masks.shape[1], masks.shape[2], 3), dtype=np.uint8)
    for i in range(masks.shape[0]):
        mask = masks[i].astype(bool)
        color_mask = np.random.randint(0, 256, size=3, dtype=np.uint8)
        mask_vis[mask] = color_mask
        if borders:
            contours, _ = cv2.findContours(mask.astype(np.uint8),
                                           cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
            # Try to smooth contours
            contours = [cv2.approxPolyDP(c, epsilon=0.01, closed=True)
                        for c in contours]
            # Note: the color must be a 3-channel uint8 value, not a float RGBA tuple
            cv2.drawContours(mask_vis, contours, -1, (255, 255, 255), thickness=1)
    alpha = 0.5
    overlay = cv2.addWeighted(img, 1 - alpha, mask_vis.astype('uint8'), alpha, 0)
    return overlay
```
- Pixel Prompt
- Reference code
```python
import time
import cv2
import numpy as np
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()
if torch.cuda.get_device_properties(0).major >= 8:
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# sam2_checkpoint = "../checkpoints/sam2.1_hiera_tiny.pt"
# model_cfg = "configs/sam2.1/sam2.1_hiera_t.yaml"
sam2_checkpoint = "../checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"

sam2 = build_sam2(model_cfg, sam2_checkpoint, device=DEVICE, apply_postprocessing=False)
predictor = SAM2ImagePredictor(sam2)

img_bgr = cv2.imread('test.png')
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)  ## input must be RGB (by default; BGR seems to be supported via some param)

pts = [[756, 331]]        ## only one point, in (x, y)
labels = [1]              ## label of the lonely point, 1 = fg, 0 = bg
multimask_output = False  ## if True, returns multiple masks per prompt set

start = time.perf_counter()
predictor.set_image(img_rgb)
masks, scores, _ = predictor.predict(point_coords=pts, point_labels=labels,
                                     multimask_output=multimask_output)
stop = time.perf_counter()
```
- Automask (automatic mask generation):
- Reference code
```python
## Use OpenCV to show the result; does not require matplotlib.
## Generate all possible masks automatically (slow).
import time
import base64
import cv2
import numpy as np
import torch
# import supervision as sv  ## mask visualization lib
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator

torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()
if torch.cuda.get_device_properties(0).major >= 8:
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# sam2_checkpoint = "../checkpoints/sam2.1_hiera_tiny.pt"
# model_cfg = "configs/sam2.1/sam2.1_hiera_t.yaml"
sam2_checkpoint = "../checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"

sam2 = build_sam2(model_cfg, sam2_checkpoint, device=DEVICE, apply_postprocessing=False)
mask_generator = SAM2AutomaticMaskGenerator(sam2)

img_bgr = cv2.imread('test.png')
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)  ## input must be RGB (by default; BGR seems to be supported via some param)

start = time.perf_counter()
masks = mask_generator.generate(img_rgb)  # "Images are expected to be an np.ndarray in RGB format, and of shape HWC"
stop = time.perf_counter()
print("Runtime: {}ms".format((stop - start) * 1000))
```
2023
Segment anything
Alexander Kirillov et al., Meta, 2023
Code
- Modes
- Generate masks from prompts
- Generate masks for the whole image
- Example applications:
- segment-anything/notebooks/predictor_example.ipynb at main · facebookresearch/segment-anything
- Initialization:
```python
sam_checkpoint = "segment-anything/model/sam_vit_h_4b8939.pth"
model_type = "vit_h"
device = "cuda"

sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)
```
- Invocation (the parameters seem to be optional?):
```python
mask_generator_ = SamAutomaticMaskGenerator(
    model=sam,
    points_per_side=32,
    pred_iou_thresh=0.9,
    stability_score_thresh=0.96,
    crop_n_layers=1,
    crop_n_points_downscale_factor=2,
    min_mask_region_area=100,  # Requires opencv to run post-processing
)
```
- Returned masks: a list in which each item is a dict corresponding to one segmented mask.
- segmentation - [np.ndarray] - a 2-D boolean array of the original image size (h, w); True marks the region this mask segments out
- area - [int] - the area of the mask in pixels
- bbox - [List[int]] - the boundary box of the segmented object, in [x, y, w, h] format
- predicted_iou - [float] - the model's own prediction for the quality of the mask
- point_coords - [List[List[float]]] - the sampled input point that generated this mask
- stability_score - [float] - an additional measure of mask quality
- crop_box - [List[int]] - the crop of the image used to generate this mask, in xywh format
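A minimal sketch of consuming this list, using a fabricated two-mask example in the documented format (all values are made up for illustration):

```python
import numpy as np

# Hypothetical output in the documented format (values invented).
masks = [
    {"segmentation": np.zeros((4, 4), dtype=bool), "area": 2,
     "bbox": [0, 0, 2, 1], "predicted_iou": 0.91, "stability_score": 0.97},
    {"segmentation": np.zeros((4, 4), dtype=bool), "area": 6,
     "bbox": [1, 1, 3, 2], "predicted_iou": 0.88, "stability_score": 0.95},
]
masks[0]["segmentation"][0, :2] = True
masks[1]["segmentation"][1:3, 1:4] = True

# Common pattern: draw the largest masks first so small ones stay visible.
masks_sorted = sorted(masks, key=lambda m: m["area"], reverse=True)
for m in masks_sorted:
    x, y, w, h = m["bbox"]  # xywh, pixel coordinates
    # The boolean segmentation and the area field should agree.
    assert m["segmentation"].sum() == m["area"]
```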
MobileSAM
Chaoning Zhang et al., Kyung Hee University, 2023
- Replaces the image encoder with TinyViT (see the paper "Fast Pretraining Distillation for Small Vision Transformers")
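The swap works through encoder-only distillation: the small student encoder is trained so its image embeddings match the original SAM encoder's, leaving the mask decoder reusable as-is. The shapes and loss below are illustrative assumptions, not MobileSAM's actual training code:

```python
import numpy as np

# Sketch of encoder-only distillation: the student (TinyViT) is trained to
# reproduce the teacher's (SAM ViT-H) image embedding map, so the frozen
# SAM mask decoder can be reused unchanged. Shapes are illustrative.
rng = np.random.default_rng(0)
teacher_emb = rng.normal(size=(1, 256, 64, 64))  # teacher embedding (assumed shape)
student_emb = rng.normal(size=(1, 256, 64, 64))  # student embedding

def distill_loss(student, teacher):
    """Mean-squared error between the two embedding maps."""
    return float(np.mean((student - teacher) ** 2))

loss = distill_loss(student_emb, teacher_emb)
```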
MobileSAMv2
Chaoning Zhang et al., Kyung Hee University, 2023
FastSAM
Xu Zhao et al., Institute of Automation, Chinese Academy of Sciences (CASIA), 2023 https://arxiv.org/pdf/2306.12156.pdf https://github.com/CASIA-IVA-Lab/FastSAM
- Based on YOLOv8-seg, which uses YOLACT (instance segmentation)
- YOLO v8 backbone network ->
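YOLACT's mask branch can be sketched as follows: the network predicts k prototype masks for the whole image plus a k-dimensional coefficient vector per detected instance, and each instance mask is the sigmoid of their linear combination. All shapes below are toy values for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# YOLACT-style mask assembly: k prototype masks shared across the image,
# plus one k-dim coefficient vector per detected instance.
k, h, w = 32, 16, 16
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(k, h, w))
coeffs = rng.normal(size=(k,))  # one instance's mask coefficients

# Linear combination over the prototype axis, then sigmoid -> soft mask.
instance_mask = sigmoid(np.tensordot(coeffs, prototypes, axes=1))
binary_mask = instance_mask > 0.5
```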
3D Skeletonization of Complex Grapevines for Robotic Pruning
Eric Schneider et al., CMU Kantor Lab, IROS 2023
- A purely image-processing line of work
- Camera system
- A series of stereo images, registered into a point cloud; segmentation masks are generated from the images
- A. Silwal, T. Parhar, F. Yandun, H. Baweja, and G. Kantor, “A robust illumination-invariant camera system for agricultural applications,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, pp. 3292–3298.
- Image processing method
- Used the MMLab segmentation toolkit, tried different models, and finally chose UNet + geometric augmentation
- Extract the 2D skeleton of the segmented masks, then dilate the 2D skeleton
- Make skeletal model
- Assume points within a short distance should be connected: first scan each point's neighbors within a given radius (using a sphere), then use a k-d tree
- Minimum Spanning Tree (MST) using Kruskal algorithm, where Euclidean distance is the edge cost
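The two steps above can be sketched with a brute-force edge scan plus a small union-find Kruskal; the points and radius here are made up, and a real implementation would use a k-d tree for the neighbor search:

```python
import math

def kruskal_mst(points, radius):
    """Kruskal's MST over edges shorter than `radius`, Euclidean edge cost.
    Brute-force O(n^2) neighbor scan; a k-d tree would replace this in practice."""
    n = len(points)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if d <= radius:
                edges.append((d, i, j))
    edges.sort()  # Kruskal: consider cheapest edges first
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    mst = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # edge joins two components -> keep it
            parent[ri] = rj
            mst.append((i, j, d))
    return mst

pts = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (5, 5, 5)]
tree = kruskal_mst(pts, radius=2.0)  # the far point (5,5,5) stays isolated
```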
2022
Review the state-of-the-art technologies of semantic segmentation based on deep learning
Yujian Mo et al., Neurocomputing 2022
Methods
| Method | Venue | Year | Notes |
|---|---|---|---|
| FCN | CVPR | 2015 | Fully Convolutional Network: replaces the final fully connected layers of a traditional CNN with convolutional layers |
| U-net | MICCAI | 2015 | U-shaped architecture |
| DeconvNet | ICCV | 2015 | |
| SegNet | TPAMI | 2015 | |
| Deeplab v1 | ICLR | 2015 | |
| Dilated Convolutions | ICLR | 2016 | |
| Deeplab v3 | arXiv | 2017 | |
| PSPNet | CVPR | 2017 | |
| RefineNet | CVPR | 2017 | |
| Large Kernel Matters | CVPR | 2017 | |
| Deeplab v2 | TPAMI | 2018 | |
| Deeplab v3+ | ECCV | 2018 | |
| DUC | WACV | 2018 | |
| ICNet | ECCV | 2018 | |
| BiSeNet | ECCV | 2018 | |
| AdaptSegNet | CVPR | 2018 | |
| ERFNet | TITS | 2018 | |
| EncNet | CVPR | 2018 | |
| CCNet | ICCV | 2019 |
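The FCN entry above rests on the equivalence between a fully connected layer and a convolution whose kernel covers the entire feature map, which is what lets the network accept inputs of any size and emit spatial score maps. A minimal numpy check of that equivalence (all shapes invented for illustration):

```python
import numpy as np

# FCN's core trick: an FC layer over a CxHxW feature map equals a
# convolution with an HxW kernel applied at the single valid position.
rng = np.random.default_rng(0)
c, h, w, num_classes = 3, 4, 4, 5
feat = rng.normal(size=(c, h, w))
fc_weight = rng.normal(size=(num_classes, c * h * w))

# Fully connected view: flatten, then matrix-vector product.
fc_out = fc_weight @ feat.reshape(-1)

# "Convolutionalized" view: the same weights reshaped into
# num_classes kernels of shape CxHxW, each correlated with the map.
conv_kernels = fc_weight.reshape(num_classes, c, h, w)
conv_out = np.array([(k * feat).sum() for k in conv_kernels])
```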
Datasets
- NYU Depth V2
- PASCAL-VOC 2012
- ADE20K
- ScanNet
- WoodScape
- KITTI-2012
2018
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
Liang-Chieh Chen et al., ECCV2018
- Semantic segmentation
- Dilated convolution / atrous convolution
- DeepLabV3+
- PyTorch version
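The atrous/dilated convolution named above can be illustrated in 1-D: the kernel taps are spaced `dilation` samples apart, enlarging the receptive field without adding parameters. A toy numpy sketch (not DeepLab's implementation):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """1-D atrous/dilated convolution with valid padding: kernel taps are
    `dilation` samples apart, so a length-k kernel covers
    (k-1)*dilation + 1 input samples."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # effective receptive field
    out = []
    for start in range(len(x) - span + 1):
        taps = x[start : start + span : dilation]  # strided taps
        out.append(float(np.dot(taps, kernel)))
    return np.array(out)

x = np.arange(8, dtype=float)  # [0, 1, ..., 7]
res = dilated_conv1d(x, np.array([1.0, 1.0, 1.0]), dilation=2)
# each output sums x[i] + x[i+2] + x[i+4]
```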