Manipulation

主要问题

物体的表征（representation）？根据具体任务有所不同
- 刚体：
  - 重排列任务：空间关系？
- 3D deformable object
  - 塑形任务
  - 打包装箱任务
- 2D deformable object
  - 叠衣服：2D的拓扑表征是什么？
- 1D deformable object
  - 绳结：knot theory的拓扑
  - 几何模型

2024

HACMan++: Spatially-Grounded Motion Primitives for Manipulation

Bowen Jiang et al., CMU RSS2024

什么是Spatially-Grounded motion primitives？
- what：primitive类型，如grasp或push
- where：在哪里发生（spatially-grounded），如gripper在哪里产生接触
- how：如何执行？如应该向那个方向推或grasp orientation（将primitive参数化）

Interactive Robot-Environment Self-Calibration via Compliant Exploratory Actions

Podshara Chanrungmaneekul et al., Rice, IROS 2024

主要贡献：不需要额外传感器，只通过机械臂末端主动与环境进行touching和sliding的交互，获取contact信息，校正自身的坐标系信息identifying the robot-environment spatial parameters self-calibration问题estimating the pose of a robot manipulator in an uncalibrated environment without the requirements of any external sensors such as cameras（an accurate pose of the robot frame relative to a frame of the workspace）
通过touching和sliding动作主动的与环境交互，使用Particle Filter对robot-environment （环境为静态的）的空间关系进行建模
存在的问题：
- 扩展到矫正更多的传感器，如相机

Discovering Predictive Relational Object Symbols With Symbolic Attentive Layers

Alper Ahmetoglu et al., Bogazici University, IROS 2024

主要贡献
- relational object symbols是什么？compute discrete selfattention weights from object features and treat them as relational symbols between objects
存在的问题
Relational DeepSym，输出多个物体间的关系
用self-attention mechanism架构，提取relational symbols between objects，将关系离散化
输入为一系列物体特征（状态向量），包括物体类型（如short block，long block)及其位姿（位置和欧拉角），共8个维度，方法可扩展到更多不同类型的输入
high-level action为pick-place()

Δyi and Δyj are the y-axis pick and release positions relative to the center of the object (Fig. 2(a)). Δyi and Δyj can take discrete values of {−1, 0, 1} which correspond to 7.5 cm left, center, and 7.5 cm right of the object center, respectively。总计种
状态描述符，目的是将状态向量转化成描述物体的特征向量和关系符号，第k种关系

The goal is to transform the state vector O = {o1, … , on} where each oi ∈ Rdo is a feature vector describing object i into a set of object symbols Z = {z1, … , zn} and relational symbols Rk = {r(k) 11 , … , r(nkn)} where zi ∈ {0, 1}dz is a dz-dimensional binary vector (i.e., an object symbol) for the ith object and r(k) ij ∈ {0, 1} is a binary value for the kth relation between the ith and jth objects.

2023

CALAMARI: Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation

Youngsun Wi et atl., UMich, CoRL 2023

主要贡献：
- language-conditioned and contact-aware spatial action map representation language guide manipulation（任务是诸如wipe_desk，sweep_to_dustpan，push_button）
- MPPI控制器算法预测接触目标和接触constraint
- 无需fine-tuning即可泛化至未见过的物体
实现方法：
- generate word-wise heatmaps that high light the spatial locations in the RGB images corresponding to specific words in the language instruction

SpaTiaL: monitoring and planning of robotic tasks using spatio-temporal logic specifications

Christian Pek et al., TUdelft, Autonomous robots 2023

2022

A Bimanual Manipulation Taxonomy

Franziska Krebs et al., H2T KIT (Tamim Asfour), RAL 2022

A General Method for Autonomous Assembly of Arbitrary Parts in the Presence of Uncertainty

Shichen Cao et al., IROS 2022

Human-Like Multimodal Perception and Purposeful Manipulation for Deformable Objects

Upinder Kaur et al., Purdue, CASE 2022

2D object (ziplock bag)

StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects

Weiyu Liu et al., UW, ICRA 2022

主要贡献
- 实现对物体的重新排列任务。输入为点云，通过transformer神经网络，目标为给定的排列方式指令（结构化语言）we propose a novel transformer-based neural network, StructFormer, which takes as input a partial-view point cloud of the current object arrangement and a structured language command encoding the desired object configuration.
- enables a physical robot to rearrange novel objects into semantically meaningful structures with multi-object relational constraints inferred from the language command.
实现方法：
- 架构：transformer
- 训练数据量：100,000个序列，其中80%用于训练 We introduce a dataset for four representative structures containing more than 100,000 rearrangement sequences in total. We split the dataset into 80% training, 10% validation, and 10% testing.
存在的问题：
- 直接通过自然语言，而不是结构化语言指定目标结构
- 未解决placement in clutter或finding the optimal order of rearranging actions，排列顺序为预先给定build in a predefined order
- 不同场景，不限于桌面
具体细节
- Transformer的作用：encoder接受语言输入，decoder预测应该执行的运动 The encoders can directly predict what objects to move and also provide a context for an autoregressive transformer decoder to predict where the objects should go and how they should be oriented.
- structured language command是什么样的？
  - 范例”“Rearrange objects that have the same class as the yellow object into a circle””
  - The language instructions involve many different semantic and geometric properties for both grounding objects and specifying the spatial structures.
  - 其中为word tokens
  - We map each unique word token from the language instructions to an embedding with a learned mapping $h_w(w_i) -> c_i$

2021

SORNet: Spatial Object-Centric Representations for Sequential Manipulation

Wentao Yuan et al., UW, CoRL 2021

主要贡献：
- extracting object-centric embeddings from RGB images that generalizes zero-shot to different number and type of objects
- 获取物体的空间关系，仅依赖logical supervison a framework for learning object embeddings that capture continuous spatial relations with only logical supervi sion
- 数据库
- 此外在测试任务包括空间关系的问答
实现方法：
- 使用Visual Transformer
- 输出top、on_surface、stacked、aligned等
- 在真实环境中测试了open loop planning
存在的问题

2013

Robotic surface assembly via contact state transitions

Amar Saric et al., UNCC, CASE 2013

俺寻思

Explorer

manipulation

Manipulation

主要问题

2024

HACMan++: Spatially-Grounded Motion Primitives for Manipulation

Interactive Robot-Environment Self-Calibration via Compliant Exploratory Actions

Discovering Predictive Relational Object Symbols With Symbolic Attentive Layers

2023

CALAMARI: Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation

SpaTiaL: monitoring and planning of robotic tasks using spatio-temporal logic specifications

2022

A Bimanual Manipulation Taxonomy

A General Method for Autonomous Assembly of Arbitrary Parts in the Presence of Uncertainty

Human-Like Multimodal Perception and Purposeful Manipulation for Deformable Objects

StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects

2021

SORNet: Spatial Object-Centric Representations for Sequential Manipulation

2013

Robotic surface assembly via contact state transitions

Table of Contents

Backlinks

俺寻思

Explorer

manipulation

Manipulation §

主要问题 §

2024 §

HACMan++: Spatially-Grounded Motion Primitives for Manipulation §

Interactive Robot-Environment Self-Calibration via Compliant Exploratory Actions §

Discovering Predictive Relational Object Symbols With Symbolic Attentive Layers §

2023 §

CALAMARI: Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation §

SpaTiaL: monitoring and planning of robotic tasks using spatio-temporal logic specifications §

2022 §

A Bimanual Manipulation Taxonomy §

A General Method for Autonomous Assembly of Arbitrary Parts in the Presence of Uncertainty §

Human-Like Multimodal Perception and Purposeful Manipulation for Deformable Objects §

StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects §

2021 §

SORNet: Spatial Object-Centric Representations for Sequential Manipulation §

2013 §

Robotic surface assembly via contact state transitions §

Table of Contents

Backlinks

Manipulation

主要问题

2024

HACMan++: Spatially-Grounded Motion Primitives for Manipulation

Interactive Robot-Environment Self-Calibration via Compliant Exploratory Actions

Discovering Predictive Relational Object Symbols With Symbolic Attentive Layers

2023

CALAMARI: Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation

SpaTiaL: monitoring and planning of robotic tasks using spatio-temporal logic specifications

2022

A Bimanual Manipulation Taxonomy

A General Method for Autonomous Assembly of Arbitrary Parts in the Presence of Uncertainty

Human-Like Multimodal Perception and Purposeful Manipulation for Deformable Objects

StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects

2021

SORNet: Spatial Object-Centric Representations for Sequential Manipulation

2013

Robotic surface assembly via contact state transitions