
[Robotics] Reproducing DOV-SG Robot Navigation | Dynamic Open-Vocabulary | 3D Scene Graphs

DOV-SG builds a dynamic 3D scene graph and uses a large language model (LLM) for task decomposition, which enables local updates to the 3D scene graph during interactive exploration.

Published in RA-L 2025, it targets long-term language-guided mobile manipulation with dynamic open-vocabulary 3D scene graphs.

Paper: Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation

Code: https://github.com/BJHYZJ/DovSG

This post walks through reproducing DOV-SG and running model inference.

Below is a navigation example:

Navigation process (the green point is the current position, the red point is the target position, and the magenta curve is the navigation trajectory):

1. Create a Conda Environment

First create a Conda environment named dovsg with Python 3.9, then activate it.

The corresponding two commands are:

conda create -n dovsg python=3.9 -y
conda activate dovsg

Then clone the code and enter the project directory: https://github.com/BJHYZJ/DovSG.git

git clone https://github.com/BJHYZJ/DovSG.git
cd DovSG

A successful clone looks like the figure below:

2. Install PyTorch

Install torch==2.3.1 built against CUDA 12.1 by running:

pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121  

Wait for the installation to finish; the output ends with:

Successfully installed MarkupSafe-2.1.5 filelock-3.13.1 fsspec-2024.6.1 jinja2-3.1.4 mpmath-1.3.0 networkx-3.2.1 numpy-1.26.3 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.1.105 nvidia-nvtx-cu12-12.1.105 pillow-11.0.0 sympy-1.13.3 torch-2.3.1+cu121 torchaudio-2.3.1+cu121 torchvision-0.18.1+cu121 triton-2.3.1 typing-extensions-4.12.2
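As a quick sanity check (my own addition, not part of the original steps), confirm that the CUDA build of PyTorch is actually usable:

import torch

print(torch.__version__)          # expect 2.3.1+cu121
print(torch.cuda.is_available())  # expect True with a working NVIDIA driver
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))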

3. Install Segment-Anything-2

Here the segment-anything-2 code must be checked out at commit '7e1596c' so that it stays compatible with the other dependencies installed later.

Run the following commands:

cd third_party
git clone https://github.com/facebookresearch/sam2.git segment-anything-2
cd segment-anything-2
git checkout 7e1596c

The clone and checkout print their progress output.

Then modify setup.py; two places need to be changed:

# line 27: "numpy>=1.24.4" ==> "numpy>=1.23.0",


# line 144: python_requires=">=3.10.0" ==> python_requires=">=3.9.0"

Then install segment-anything-2:

pip install -e ".[demo]" 

Wait for the installation to finish.

  Attempting uninstall: SAM-2
    Found existing installation: SAM-2 1.0
    Uninstalling SAM-2-1.0:
      Successfully uninstalled SAM-2-1.0
Successfully installed SAM-2-1.0 anyio-4.9.0 argon2-cffi-25.1.0 argon2-cffi-bindings-21.2.0 
arrow-1.3.0 asttokens-3.0.0 async-lru-2.0.5 attrs-25.3.0 babel-2.17.0 beautifulsoup4-4.13.4 
bleach-6.2.0 certifi-2025.6.15 cffi-1.17.1 charset_normalizer-3.4.2 comm-0.2.2 contourpy-1.3.0 cycler-0.12.1 
debugpy-1.8.14 decorator-5.2.1 defusedxml-0.7.1 exceptiongroup-1.3.0 executing-2.2.0 fastjsonschema-2.21.1 
fonttools-4.58.4 fqdn-1.5.1 h11-0.16.0 httpcore-1.0.9 httpx-0.28.1 idna-3.10 importlib-metadata-8.7.0 
importlib-resources-6.5.2 ipykernel-6.29.5 ipython-8.18.1 ipywidgets-8.1.7 isoduration-20.11.0 
jedi-0.19.2 json5-0.12.0 jsonpointer-3.0.0 jsonschema-4.24.0 jsonschema-specifications-2025.4.1 
jupyter-1.1.1 jupyter-client-8.6.3 jupyter-console-6.6.3 jupyter-core-5.8.1 jupyter-events-0.12.0 
jupyter-lsp-2.2.5 jupyter-server-2.16.0 jupyter-server-terminals-0.5.3 jupyterlab-4.4.4 jupyterlab-pygments-0.3.0 jupyterlab-server-2.27.3 jupyterlab_widgets-3.0.15 kiwisolver-1.4.7 
matplotlib-3.9.4 matplotlib-inline-0.1.7 mistune-3.1.3 nbclient-0.10.2 nbconvert-7.16.6 nbformat-5.10.4 nest-asyncio-1.6.0 notebook-7.4.4 notebook-shim-0.2.4 opencv-python-4.11.0.86 overrides-7.7.0 pandocfilters-1.5.1 parso-0.8.4 pexpect-4.9.0 platformdirs-4.3.8 
prometheus-client-0.22.1 prompt-toolkit-3.0.51 psutil-7.0.0 ptyprocess-0.7.0 pure-eval-0.2.3 
pycparser-2.22 pygments-2.19.2 pyparsing-3.2.3 python-dateutil-2.9.0.post0 python-json-logger-3.3.0 
pyzmq-27.0.0 referencing-0.36.2 requests-2.32.4 rfc3339-validator-0.1.4 rfc3986-validator-0.1.1 
rpds-py-0.25.1 send2trash-1.8.3 six-1.17.0 sniffio-1.3.1 soupsieve-2.7 stack-data-0.6.3 terminado-0.18.1 
tinycss2-1.4.0 tomli-2.2.1 tornado-6.5.1 traitlets-5.14.3 types-python-dateutil-2.9.0.20250516 
uri-template-1.3.0 urllib3-2.5.0 wcwidth-0.2.13 webcolors-24.11.1 webencodings-0.5.1 websocket-client-1.8.0 widgetsnbextension-4.0.14 zipp-3.23.0

4. Install GroundingDINO

Here GroundingDINO must be checked out at commit '856dde2' so that it stays compatible with the other dependencies installed later.

Run the following commands:

cd ..
git clone https://github.com/IDEA-Research/GroundingDINO.git GroundingDINO
cd GroundingDINO/
git checkout 856dde2

The clone and checkout print their progress output.

Then install GroundingDINO:

pip install -e . 

Wait for the installation to finish.

5. Install RAM & Tag2Text

Here recognize-anything must be checked out at commit '88c2b0c' so that it stays compatible with the other dependencies installed later.

Run the following commands:

cd ..
git clone https://github.com/xinyu1205/recognize-anything.git
cd recognize-anything/
git checkout 88c2b0c

Then run the following two commands to install it:

pip install -r requirements.txt
pip install -e .

The installation prints its progress output; wait for it to finish.

6. Install ACE

Run the following commands:

cd ../../ace/dsacstar/
conda install opencv
python setup.py install

Wait for the installation to finish.

(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/ace/dsacstar$ python setup.py install            
Detected active conda environment: /home/lgp/anaconda3/envs/dovsg                             
Assuming OpenCV dependencies in:                                                              
........

........

creating dist
creating 'dist/dsacstar-0.0.0-py3.9-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing dsacstar-0.0.0-py3.9-linux-x86_64.egg
creating /home/lgp/anaconda3/envs/dovsg/lib/python3.9/site-packages/dsacstar-0.0.0-py3.9-linux-x86_64.egg
Extracting dsacstar-0.0.0-py3.9-linux-x86_64.egg to /home/lgp/anaconda3/envs/dovsg/lib/python3.9/site-packages
Adding dsacstar 0.0.0 to easy-install.pth file

Installed /home/lgp/anaconda3/envs/dovsg/lib/python3.9/site-packages/dsacstar-0.0.0-py3.9-linux-x86_64.egg
Processing dependencies for dsacstar==0.0.0
Finished processing dependencies for dsacstar==0.0.0
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/ace/dsacstar$ 
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/ace/dsacstar$ 

7. Install LightGlue

Here LightGlue must be checked out at commit 'edb2b83' so that it stays compatible with the other dependencies installed later.

Run the following commands:

cd ../../third_party/
git clone https://github.com/cvg/LightGlue.git
cd LightGlue/
git checkout edb2b83
python -m pip install -e .

Wait for the installation to finish.

(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party$ git clone https://github.com/cvg/LightGlue.git
Cloning into 'LightGlue'...
remote: Enumerating objects: 386, done.
remote: Counting objects: 100% (205/205), done.
remote: Compressing objects: 100% (119/119), done.
remote: Total 386 (delta 147), reused 86 (delta 86), pack-reused 181 (from 2)
Receiving objects: 100% (386/386), 17.43 MiB | 13.39 MiB/s, done.
Resolving deltas: 100% (236/236), done.
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party$ ls
DROID-SLAM  GroundingDINO  LightGlue  pytorch3d  recognize-anything  segment-anything-2
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party$ 
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party$ cd LightGlue/
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/LightGlue$ 
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/LightGlue$ git checkout edb2b83
Note: switching to 'edb2b83'.

..............................

HEAD is now at edb2b83 fix compilation for torch v2.2.1 (#124)
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/LightGlue$ 

(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/LightGlue$ python -m pip install -e .
Obtaining file:///home/lgp/2025_project/DovSG/third_party/LightGlue
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done

................

Successfully built lightglue
Installing collected packages: kornia_rs, kornia, lightglue
Successfully installed kornia-0.8.1 kornia_rs-0.1.9 lightglue-0.0

Then install the Faiss library:

conda install -c pytorch faiss-cpu=1.7.4 mkl=2021 blas=1.0=mkl
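Faiss provides fast nearest-neighbor search over feature vectors (presumably what it is used for here, e.g. matching embeddings during retrieval and relocalization). A minimal illustration of the API on arbitrary toy data, mainly to confirm the install works:

import faiss
import numpy as np

d = 128                                               # feature dimension (arbitrary)
database = np.random.rand(1000, d).astype("float32")  # stored feature vectors
queries = np.random.rand(5, d).astype("float32")      # query feature vectors

index = faiss.IndexFlatL2(d)      # exact L2 index
index.add(database)
distances, indices = index.search(queries, 4)  # 4 nearest neighbors per query
print(indices.shape)                           # (5, 4)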

8. Install PyTorch3D

Here PyTorch3D must be checked out at commit '05cbea1' so that it stays compatible with the other dependencies installed later.

Run the following commands:

cd ..
git clone https://github.com/facebookresearch/pytorch3d.git                                        
cd pytorch3d/
git checkout 05cbea1
python setup.py install

Wait for the installation to finish.

(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/pytorch3d$ 
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/pytorch3d$ python setup.py install
......................

Using /home/lgp/anaconda3/envs/dovsg/lib/python3.9/site-packages
Finished processing dependencies for pytorch3d==0.7.7
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/pytorch3d$ 

9. Install the Remaining Dependencies and the dovsg Package

First install a batch of dependencies:

cd ../../
pip install ipython cmake pybind11 ninja scipy==1.10.1 scikit-learn==1.4.0 pandas==2.0.3 hydra-core opencv-python openai-clip timm matplotlib==3.7.2 imageio timm open3d numpy-quaternion more-itertools pyliblzfse einops transformers pytorch-lightning wget gdown tqdm zmq torch_geometric numpy==1.23.0  # -i https://pypi.tuna.tsinghua.edu.cn/simple

Then install protobuf, MinkowskiEngine, and the graspnet API:

pip install protobuf==3.19.0
pip install git+https://github.com/pccws/MinkowskiEngine
pip install graspnetAPI

torch-cluster is also required (first download the .whl file with wget, then install it with pip):

wget https://data.pyg.org/whl/torch-2.3.0%2Bcu121/torch_cluster-1.6.3%2Bpt23cu121-cp39-cp39-linux_x86_64.whl
pip install torch_cluster-1.6.3+pt23cu121-cp39-cp39-linux_x86_64.whl

Install a few more dependencies:

pip install numpy==1.23.0 supervision==0.14.0 shapely alphashape 
pip install pyrealsense2 open_clip_torch graphviz pyrender
pip install openai==1.56.1
pip install transforms3d==0.3.1 scikit-image==0.19.3

Finally, install dovsg itself:

pip install -e .

Wait for the installation to finish.

(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG$ pip install -e .
Obtaining file:///home/lgp/2025_project/DovSG
  Preparing metadata (setup.py) ... done
Installing collected packages: dovsg
  Running setup.py develop for dovsg

Successfully installed dovsg
(dovsg) lgp@lgp-MS-7E07:~/2025_project/DovSG$ 

Patch (2025/7/4): visualization requires graphviz:

sudo apt-get install graphviz
conda install -c conda-forge graphviz python-graphviz
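At this point it is worth a quick sanity check (my own addition, not part of the original steps) that the main dependencies import cleanly inside the dovsg environment. The module names below follow the packages installed above; treat them as assumptions if your versions differ:

# Hedged sanity check for the dovsg environment.
import importlib

modules = [
    "torch", "torchvision", "open3d", "faiss",
    "sam2",              # segment-anything-2
    "groundingdino",     # GroundingDINO
    "ram",               # recognize-anything (RAM & Tag2Text)
    "lightglue",         # LightGlue
    "pytorch3d",
    "dsacstar",          # ACE backend built in step 6
    "torch_cluster",
    "MinkowskiEngine",
    "dovsg",
]

for name in modules:
    try:
        importlib.import_module(name)
        print(f"[ok]   {name}")
    except Exception as exc:
        print(f"[fail] {name}: {exc}")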

10. Install DROID-SLAM

DROID-SLAM must be kept separate from DOV-SG: build it in a new Conda environment.

Here DROID-SLAM must be checked out at commit 8016d2b for compatibility with the other dependencies. Run:

cd ./third_party/
git clone https://github.com/princeton-vl/DROID-SLAM.git
cd DROID-SLAM/
git checkout 8016d2b

Wait for the download to finish.

DROID-SLAM/thirdparty/ must contain the dependencies eigen, lietorch, tartanair_tools, etc., so run:

git submodule update --init thirdparty/lietorch

Run this from the DROID-SLAM root directory; it fetches and initializes the thirdparty/lietorch submodule (use git submodule update --init --recursive if you want all submodules pulled down).

1. Create a Conda Environment

First create a Conda environment named droidenv with Python 3.9, then activate it.

The corresponding two commands are:

conda create -n droidenv python=3.9 -y
conda activate droidenv

2. Install PyTorch

conda install pytorch=1.10 torchvision torchaudio cudatoolkit=11.3 -c pytorch -y

3. Install Dependencies

conda install suitesparse -c conda-forge -y
pip install open3d==0.15.2 scipy opencv-python==4.7.0.72 matplotlib pyyaml==6.0.2 tensorboard # -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install evo --upgrade --no-binary evo
pip install gdown
pip install numpy==1.23.0 numpy-quaternion==2023.0.4

Wait for the installation to finish.

4. Install torch-scatter

wget https://data.pyg.org/whl/torch-1.10.0%2Bcu113/torch_scatter-2.0.9-cp39-cp39-linux_x86_64.whl
pip install torch_scatter-2.0.9-cp39-cp39-linux_x86_64.whl

5. Install DROID-SLAM

Configure the build to use gcc-10/g++-10:

sudo apt install gcc-10 g++-10
export CC=/usr/bin/gcc-10
export CXX=/usr/bin/g++-10

The system default is CUDA 12.1; temporarily switch to CUDA 11.3.

Note: the temporary switch only applies to the current shell session; once the terminal is closed, the system reverts to CUDA 12.1.

export CUDA_HOME=/usr/local/cuda-11.3
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

(droidenv) lgp@lgp-MS-7E07:~/2025_project/DovSG/third_party/DROID-SLAM$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0

Now install DROID-SLAM:

python setup.py install

Wait for the compilation to finish.

11. Download the Model Weights

The project uses seven models in total (quite a few). The version and download link/method for each is listed below:

  1. anygrasp: the checkpoint is provided to you when you obtain an AnyGrasp license.
  2. bert-base-uncased: https://huggingface.co/google-bert/bert-base-uncased
  3. CLIP-ViT-H-14-laion2B-s32B-b79K: https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K
  4. droid-slam: https://drive.google.com/file/u/0/d/1PpqVt1H4maBa_GbPJp4NwxRsd9jk-elh/view?usp=sharing&pli=1
  5. GroundingDINO: https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth and https://github.com/IDEA-Research/GroundingDINO/blob/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
  6. recognize_anything: https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/blob/main/ram_swin_large_14m.pth
  7. segment-anything-2: https://github.com/facebookresearch/sam2?tab=readme-ov-file#download-checkpoints

Directory layout for the model weights:

DovSG/
    ├── checkpoints
    │   ├── anygrasp
    │   │   ├── checkpoint_detection.tar
    │   │   └── checkpoint_tracking.tar
    │   ├── bert-base-uncased
    │   │   ├── config.json
    │   │   ├── model.safetensors
    │   │   ├── tokenizer_config.json
    │   │   ├── tokenizer.json
    │   │   └── vocab.txt
    │   ├── CLIP-ViT-H-14-laion2B-s32B-b79K
    │   │   └── open_clip_pytorch_model.bin
    │   ├── droid-slam
    │   │   └── droid.pth
    │   ├── GroundingDINO
    │   │   ├── groundingdino_swint_ogc.pth
    │   │   └── GroundingDINO_SwinT_OGC.py
    │   ├── recognize_anything
    │   │   └── ram_swin_large_14m.pth
    │   └── segment-anything-2
    │       └── sam2_hiera_large.pt
    └── license
        ├── licenseCfg.json
        ├── ZhijieYan.lic
        ├── ZhijieYan.public_key
        └── ZhijieYan.signature
    ...  

1. Install Git LFS

Downloading the large model weights requires the Git LFS tool (which handles large files) to be installed locally:

sudo apt-get install git-lfs

After installing, enable LFS support by running:

git lfs install

2. bert-base-uncased weights

Run the following commands to download it:

mkdir checkpoints
cd checkpoints/
git clone https://huggingface.co/google-bert/bert-base-uncased

Wait for the download to finish.

(base) lgp@lgp-MS-7E07:~/2025_project/DovSG$ mkdir checkpoints
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG$ cd checkpoints/
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ git clone https://huggingface.co/google-bert/bert-base-uncased
Cloning into 'bert-base-uncased'...
remote: Enumerating objects: 85, done.
remote: Total 85 (delta 0), reused 0 (delta 0), pack-reused 85 (from 1)
Unpacking objects: 100% (85/85), 330.58 KiB | 912.00 KiB/s, done.
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ ls
bert-base-uncased

Pull the actual weight files:

cd bert-base-uncased
git lfs pull

3. CLIP-ViT-H-14-laion2B-s32B-b79K weights

Run the following commands to download it:

cd ../
git clone https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K

Wait for the download to finish.

(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ 
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ git clone https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K
Cloning into 'CLIP-ViT-H-14-laion2B-s32B-b79K'...
remote: Enumerating objects: 47, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 47 (delta 2), reused 0 (delta 0), pack-reused 39 (from 1)
Unpacking objects: 100% (47/47), 1.08 MiB | 1.64 MiB/s, done.
(base) lgp@lgp-MS-7E07:~/2025_project/DovSG/checkpoints$ ls
bert-base-uncased  CLIP-ViT-H-14-laion2B-s32B-b79K

Pull the actual weight files:

cd CLIP-ViT-H-14-laion2B-s32B-b79K
git lfs pull

4. droid-slam, GroundingDINO, recognize_anything, and segment-anything-2 weights

Run the following commands to create a folder for each set of weights:

cd ../
mkdir droid-slam
mkdir GroundingDINO
mkdir recognize_anything
mkdir segment-anything-2

These weights have to be downloaded from the pages below and then copied into the corresponding folders (a hedged download sketch follows the list):

  1.  droid-slam: https://drive.google.com/file/u/0/d/1PpqVt1H4maBa_GbPJp4NwxRsd9jk-elh/view?usp=sharing&pli=1
  2. GroundingDINO: https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth and https://github.com/IDEA-Research/GroundingDINO/blob/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
  3. recognize_anything: https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/blob/main/ram_swin_large_14m.pth
  4. segment-anything-2: https://github.com/facebookresearch/sam2?tab=readme-ov-file#download-checkpoints
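The two GroundingDINO files above have direct URLs, and gdown (installed earlier) can fetch the DROID-SLAM checkpoint from its Google Drive link. Below is a minimal sketch of scripting those downloads; it assumes the GroundingDINO config can be fetched via its raw.githubusercontent.com form and that the RAM blob link works as a direct download when blob/ is replaced with resolve/ (the SAM 2 checkpoint is fetched with the script linked in its README):

import os
import urllib.request
import gdown

downloads = {
    "checkpoints/GroundingDINO/groundingdino_swint_ogc.pth":
        "https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth",
    # Assumption: raw form of the config blob link above.
    "checkpoints/GroundingDINO/GroundingDINO_SwinT_OGC.py":
        "https://raw.githubusercontent.com/IDEA-Research/GroundingDINO/main/groundingdino/config/GroundingDINO_SwinT_OGC.py",
    # Assumption: "blob/" replaced with "resolve/" for a direct Hugging Face download.
    "checkpoints/recognize_anything/ram_swin_large_14m.pth":
        "https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/resolve/main/ram_swin_large_14m.pth",
}

for path, url in downloads.items():
    os.makedirs(os.path.dirname(path), exist_ok=True)
    print("downloading", path)
    urllib.request.urlretrieve(url, path)

# DROID-SLAM checkpoint; the file id comes from the Google Drive link above.
os.makedirs("checkpoints/droid-slam", exist_ok=True)
gdown.download("https://drive.google.com/uc?id=1PpqVt1H4maBa_GbPJp4NwxRsd9jk-elh",
               "checkpoints/droid-slam/droid.pth", quiet=False)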

12. Download the Dataset

Dataset download link: https://drive.google.com/drive/folders/13v5QOrqjxye__kJwDIuD7kTdeSSNfR5x

After downloading, extract it into the DovSG root directory; this produces a data_example directory.

Note: poses_droidslam is generated by a later step, so ignore it here.

13. Pose Estimation with DROID-SLAM

Activate the droidenv Conda environment:

conda deactivate 
conda activate droidenv

Next, modify the code in third_party/DROID-SLAM/droid_slam/trajectory_filler.py.

The for loop at line 90 needs to be changed:

        # for (tstamp, image, intrinsic) in image_stream:
        for (tstamp, image, pose, intrinsic) in image_stream:
            tstamps.append(tstamp)
            images.append(image)
            intrinsics.append(intrinsic)
            if len(tstamps) == 16:
                pose_list += self.__fill(tstamps, images, intrinsics)
                tstamps, images, intrinsics = [], [], []

The reason is that image_stream yields four values, so the loop header must be: for (tstamp, image, pose, intrinsic) in image_stream.

Run pose estimation with the following command:

python dovsg/scripts/pose_estimation.py \
    --datadir "data_example/room1" \
    --calib "data_example/room1/calib.txt" \
    --t0 0 \
    --stride 1 \
    --weights "checkpoints/droid-slam/droid.pth" \
    --buffer 2048

After the program finishes, a new folder named poses_droidslam appears under data_example/room1; it contains the poses for all viewpoints.

Run output:

Pose Estimation:: 100%|██████████████████████████████████████████████████████████| 739/739 [00:25<00:00, 29.32it/s]
################################
Global BA Iteration #1
Global BA Iteration #2
Global BA Iteration #3
Global BA Iteration #4
Global BA Iteration #5
Global BA Iteration #6
Global BA Iteration #7
################################
Global BA Iteration #1
Global BA Iteration #2
Global BA Iteration #3
Global BA Iteration #4
Global BA Iteration #5
Global BA Iteration #6
Global BA Iteration #7
Global BA Iteration #8
Global BA Iteration #9
Global BA Iteration #10
Global BA Iteration #11
Global BA Iteration #12
Result Pose Number is 739
 

14. Visualize the Reconstructed Scene

Visualize the reconstructed scene using the poses estimated by DROID-SLAM.

Switch back to the dovsg Conda environment:

conda deactivate 
conda activate dovsg

Reconstruct and display the 3D scene:

python dovsg/scripts/show_pointcloud.py \
    --tags "room1" \
    --pose_tags "poses_droidslam"

Visualization result:

15. Run DOV-SG Inference

Run the following command:

python demo.py \
    --tags "room1" \
    --preprocess \
    --debug \
    --task_scene_change_level "Minor Adjustment" \
    --task_description "Please move the red pepper to the plate, then move the green pepper to plate."

The overall flow of the code:

  1. Scan the room with a camera and collect RGB-D data.
  2. Estimate camera poses from the collected RGB-D data.
  3. Transform the coordinate system based on the detected floor.
  4. Train the relocalization model (ACE) to support later operations.
  5. Generate the view dataset.
  6. Use vision-language models (VLMs) to represent real-world objects as nodes of a 3D scene graph, and extract the relationships between objects with a rule-based method.
  7. Extract LightGlue features to assist the later relocalization tasks.
  8. Apply the result to LLM-based task planning.
  9. Continuously update the 3D scene graph while executing the relocalization sub-tasks.

Coordinate transformation based on the detected floor:

get floor pcd and transform scene.: 100%|████████████████████████████████████████| 247/247 [00:41<00:00,  5.93it/s]

Training the relocalization model (ACE) to support later operations:
Train ACE
create save folder: data_example/room1/ace
filling training buffers with 1000000/8000000 samples
filling training buffers with 2000000/8000000 samples
filling training buffers with 3000000/8000000 samples
filling training buffers with 4000000/8000000 samples
filling training buffers with 5000000/8000000 samples
filling training buffers with 6000000/8000000 samples
filling training buffers with 7000000/8000000 samples
filling training buffers with 8000000/8000000 samples
Train ACE Over!

Run output:

final text_encoder_type: bert-base-uncased
==> Initializing CLIP model...
==> Done initializing CLIP model.
BertLMHeadModel has generative capabilities, as `prepare_inputs_for_generation` is explicitly defined. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
/encoder/layer/0/crossattention/self/query is tied
/encoder/layer/0/crossattention/self/key is tied
/encoder/layer/0/crossattention/self/value is tied
/encoder/layer/0/crossattention/output/dense is tied
/encoder/layer/0/crossattention/output/LayerNorm is tied
/encoder/layer/0/intermediate/dense is tied
/encoder/layer/0/output/dense is tied
/encoder/layer/0/output/LayerNorm is tied
/encoder/layer/1/crossattention/self/query is tied
/encoder/layer/1/crossattention/self/key is tied
/encoder/layer/1/crossattention/self/value is tied
/encoder/layer/1/crossattention/output/dense is tied
/encoder/layer/1/crossattention/output/LayerNorm is tied
/encoder/layer/1/intermediate/dense is tied
/encoder/layer/1/output/dense is tied
/encoder/layer/1/output/LayerNorm is tied
--------------
checkpoints/recognize_anything/ram_swin_large_14m.pth
--------------
load checkpoint from checkpoints/recognize_anything/ram_swin_large_14m.pth
vit: swin_l
semantic meomry: 100%|███████████████████████████████████████████████████████████| 247/247 [04:15<00:00,  1.03s/it]
.........

Detected objects:

LLM task-planning output:

[{'action': 'Go to', 'object1': 'red pepper', 'object2': None}, {'action': 'Pick up', 'object1': 'red pepper'}, {'action': 'Go to', 'object1': 'plate', 'object2': None}, {'action': 'Place', 'object1': 'red pepper', 'object2': 'plate'}, {'action': 'Go to', 'object1': 'green pepper', 'object2': None}, {'action': 'Pick up', 'object1': 'green pepper'}, {'action': 'Go to', 'object1': 'plate', 'object2': None}, {'action': 'Place', 'object1': 'green pepper', 'object2': 'plate'}]
Initializing Instance Localizer.


 Data process over! 


===> get observations from robot.
observation save path: data_example/room1/memory/3_0.1_0.01_True_0.2_0.5/Minor Adjustment long_term_task: Please move the red pepper to the plate, then move the green pepper to plate./step_0/observations/0_start.npy
Sampling 64 hypotheses.
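The plan printed above is just a list of dictionaries, each with an action and up to two object arguments. As a minimal sketch of dispatching such a plan (my own illustration; execute_action is a hypothetical stand-in, not DovSG's API):

from typing import Optional

# Same format as the plan printed by the LLM planner above (truncated).
plan = [
    {"action": "Go to",   "object1": "red pepper", "object2": None},
    {"action": "Pick up", "object1": "red pepper"},
    {"action": "Go to",   "object1": "plate",      "object2": None},
    {"action": "Place",   "object1": "red pepper", "object2": "plate"},
]

def execute_action(action: str, object1: str, object2: Optional[str] = None) -> None:
    # Hypothetical placeholder: in DovSG each action triggers navigation,
    # grasping, or placing on the real robot.
    print(f"executing {action}({object1}, {object2})")

for step in plan:
    execute_action(step["action"], step["object1"], step.get("object2"))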

Relocalization sub-tasks are executed via ICP matching, and the 3D scene graph is continuously updated:

IPC Number: 5182, 7589, 6760
IPC Number: 20009, 35374, 27853
IPC Number: 80797, 179609, 129217
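ICP aligns the newly observed point cloud with the stored scene to relocalize the robot. A minimal, self-contained point-to-point ICP illustration with Open3D (installed above); this is only a toy example on synthetic data, not DovSG's relocalization code:

import copy
import numpy as np
import open3d as o3d

# Toy data: the "observation" is a slightly translated copy of the stored scene.
target = o3d.geometry.PointCloud()
target.points = o3d.utility.Vector3dVector(np.random.rand(2000, 3))
source = copy.deepcopy(target)
source.translate((0.05, 0.02, 0.0))

# Point-to-point ICP: estimate the rigid transform aligning source onto target.
result = o3d.pipelines.registration.registration_icp(
    source, target,
    max_correspondence_distance=0.1,
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
print("fitness:", result.fitness)
print(result.transformation)  # should roughly undo the (0.05, 0.02, 0.0) shift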

Navigation process (the green point is the current position, the red point is the target position, and the magenta curve is the navigation trajectory):

Now are in step 0


Runing Go to(red pepper, None) Task.
A is  red pepper
B is  None
====> A* planning.
[[2.33353067 0.83389901 3.92763996]
 [2.05       0.55       4.19324287]
 [1.85       0.2        5.09701148]]
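The waypoints above come from A* planning towards the target object. A minimal grid-based A* illustration (my own sketch, not DovSG's planner; the occupancy grid and coordinates are made up):

import heapq

def astar(grid, start, goal):
    # A* on a 2D occupancy grid (0 = free, 1 = occupied), 4-connected moves.
    rows, cols = len(grid), len(grid[0])

    def heuristic(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan distance

    open_set = [(heuristic(start), 0, start, None)]  # (f, g, node, parent)
    came_from, g_cost = {}, {start: 0}
    while open_set:
        _, g, node, parent = heapq.heappop(open_set)
        if node in came_from:
            continue
        came_from[node] = parent
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dr, node[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                new_g = g + 1
                if new_g < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = new_g
                    heapq.heappush(open_set, (new_g + heuristic(nxt), new_g, nxt, node))
    return None  # no path found

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))  # e.g. [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]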

The robot then locates the objects and performs the manipulation ("Please move the red pepper to the plate, then move the green pepper to the plate"):

data_example/room1/memory/3_0.1_0.01_True_0.2_0.5/Minor Adjustment long_term_task: Please move the red pepper to the plate, then move the green pepper to plate./step_0/navigation_vis.jpg
please move the agent to target point (Press Enter).
===> get observations from robot.
observation save path: data_example/room1/memory/3_0.1_0.01_True_0.2_0.5/Minor Adjustment long_term_task: Please move the red pepper to the plate, then move the green pepper to plate./step_0/observations/1_after_Go to(red pepper, None).npy

That completes the walkthrough.
