Manipulation

主要问题

  • 物体的表征(representation)?根据具体任务有所不同
    • 刚体:
      • 重排列任务:空间关系?
    • 3D deformable object
      • 塑形任务
      • 打包装箱任务
    • 2D deformable object
      • 叠衣服:2D的拓扑表征是什么?
    • 1D deformable object
      • 绳结:knot theory的拓扑
      • 几何模型

2024

HACMan++: Spatially-Grounded Motion Primitives for Manipulation

Bowen Jiang et al., CMU RSS2024

  • 什么是Spatially-Grounded motion primitives?
    • what:primitive类型,如grasp或push
    • where:在哪里发生(spatially-grounded),如gripper在哪里产生接触
    • how:如何执行?如应该向那个方向推或grasp orientation(将primitive参数化)

Interactive Robot-Environment Self-Calibration via Compliant Exploratory Actions

Podshara Chanrungmaneekul et al., Rice, IROS 2024

  • 主要贡献:不需要额外传感器,只通过机械臂末端主动与环境进行touching和sliding的交互,获取contact信息,校正自身的坐标系信息identifying the robot-environment spatial parameters self-calibration问题estimating the pose of a robot manipulator in an uncalibrated environment without the requirements of any external sensors such as camerasan accurate pose of the robot frame relative to a frame of the workspace
  • 通过touching和sliding动作主动的与环境交互,使用Particle Filter对robot-environment (环境为静态的)的空间关系进行建模
  • 存在的问题:
    • 扩展到矫正更多的传感器,如相机

Discovering Predictive Relational Object Symbols With Symbolic Attentive Layers

Alper Ahmetoglu et al., Bogazici University, IROS 2024

  • 主要贡献
    • relational object symbols是什么?compute discrete selfattention weights from object features and treat them as relational symbols between objects
  • 存在的问题
  • Relational DeepSym,输出多个物体间的关系
  • 用self-attention mechanism架构,提取relational symbols between objects,将关系离散化
  • 输入为一系列物体特征(状态向量),包括物体类型(如short block,long block)及其位姿(位置和欧拉角),共8个维度,方法可扩展到更多不同类型的输入
  • high-level action为pick-place()

    Δyi and Δyj are the y-axis pick and release positions relative to the center of the object (Fig. 2(a)). Δyi and Δyj can take discrete values of {−1, 0, 1} which correspond to 7.5 cm left, center, and 7.5 cm right of the object center, respectively。总计

  • 状态描述符,目的是将状态向量转化成描述物体的特征向量和关系符号,第k种关系

    The goal is to transform the state vector O = {o1, … , on} where each oi ∈ Rdo is a feature vector describing object i into a set of object symbols Z = {z1, … , zn} and relational symbols Rk = {r(k) 11 , … , r(nkn)} where zi ∈ {0, 1}dz is a dz-dimensional binary vector (i.e., an object symbol) for the ith object and r(k) ij ∈ {0, 1} is a binary value for the kth relation between the ith and jth objects.


2023

CALAMARI: Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation

Youngsun Wi et atl., UMich, CoRL 2023

  • 主要贡献:
    • language-conditioned and contact-aware spatial action map representation language guide manipulation(任务是诸如wipe_desksweep_to_dustpanpush_button
    • MPPI控制器算法预测接触目标和接触constraint
    • 无需fine-tuning即可泛化至未见过的物体
  • 实现方法:
    • generate word-wise heatmaps that high light the spatial locations in the RGB images corresponding to specific words in the language instruction

SpaTiaL: monitoring and planning of robotic tasks using spatio-temporal logic specifications

Christian Pek et al., TUdelft, Autonomous robots 2023


2022

A Bimanual Manipulation Taxonomy

Franziska Krebs et al., H2T KIT (Tamim Asfour), RAL 2022

A General Method for Autonomous Assembly of Arbitrary Parts in the Presence of Uncertainty

Shichen Cao et al., IROS 2022

Human-Like Multimodal Perception and Purposeful Manipulation for Deformable Objects

Upinder Kaur et al., Purdue, CASE 2022

  • 2D object (ziplock bag)

StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects

Weiyu Liu et al., UW, ICRA 2022

  • 主要贡献
    • 实现对物体的重新排列任务。输入为点云,通过transformer神经网络,目标为给定的排列方式指令(结构化语言)we propose a novel transformer-based neural network, StructFormer, which takes as input a partial-view point cloud of the current object arrangement and a structured language command encoding the desired object configuration.
    • enables a physical robot to rearrange novel objects into semantically meaningful structures with multi-object relational constraints inferred from the language command.
  • 实现方法:
    • 架构:transformer
    • 训练数据量:100,000个序列,其中80%用于训练 We introduce a dataset for four representative structures containing more than 100,000 rearrangement sequences in total. We split the dataset into 80% training, 10% validation, and 10% testing.
  • 存在的问题:
    • 直接通过自然语言,而不是结构化语言指定目标结构
    • 未解决placement in clutterfinding the optimal order of rearranging actions,排列顺序为预先给定build in a predefined order
    • 不同场景,不限于桌面
  • 具体细节
    • Transformer的作用:encoder接受语言输入,decoder预测应该执行的运动 The encoders can directly predict what objects to move and also provide a context for an autoregressive transformer decoder to predict where the objects should go and how they should be oriented.
    • structured language command是什么样的?
      • 范例”“Rearrange objects that have the same class as the yellow object into a circle””
      • The language instructions involve many different semantic and geometric properties for both grounding objects and specifying the spatial structures.
      • 其中为word tokens
      • We map each unique word token from the language instructions to an embedding with a learned mapping $h_w(w_i) -> c_i$

2021

SORNet: Spatial Object-Centric Representations for Sequential Manipulation

Wentao Yuan et al., UW, CoRL 2021

  • 主要贡献:
    • extracting object-centric embeddings from RGB images that generalizes zero-shot to different number and type of objects
    • 获取物体的空间关系,仅依赖logical supervison a framework for learning object embeddings that capture continuous spatial relations with only logical supervi sion
    • 数据库
    • 此外在测试任务包括空间关系的问答
  • 实现方法:
    • 使用Visual Transformer
    • 输出top、on_surface、stacked、aligned等
    • 在真实环境中测试了open loop planning
  • 存在的问题

2013

Robotic surface assembly via contact state transitions

Amar Saric et al., UNCC, CASE 2013