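Load custom quadruped robot environments

In this tutorial we will see how to use the Gymnasium/MuJoCo/Ant-v5 framework to create a quadruped walking environment from a model file (ending in .xml) without having to create a new class.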

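Steps:

0. Get the MJCF (or URDF) model file of your robot.
   - Create your own model (see the guide), or
   - Find a ready-made model (in this tutorial, we will use a model from the MuJoCo Menagerie collection).
1. Load the model with the xml_file argument.
2. Tweak the environment parameters to get the desired behavior.
   1. Tweak the environment simulation parameters.
   2. Tweak the environment termination parameters.
   3. Tweak the environment reward parameters.
   4. Tweak the environment observation parameters.
3. Train an agent to move your robot.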

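The reader is expected to be familiar with the Gymnasium API and library, the basics of robotics, and the included Gymnasium/MuJoCo environments along with the robot models they use. Familiarity with the MJCF model file format and the MuJoCo simulator is not required but is recommended.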

Setup

We will need gymnasium>=1.0.0.

pip install "gymnasium>=1.0.0"

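Step 0.1 - Download a robot model

In this tutorial we will load the Unitree Go1 robot from the excellent MuJoCo Menagerie robot model collection.

[Figure: Unitree Go1 robot in a flat terrain scene]

Go1 is a quadruped robot. Controlling it to move is a significant learning problem, considerably harder than the Gymnasium/MuJoCo/Ant environment.

We can download the whole MuJoCo Menagerie collection (which includes Go1):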

git clone https://github.com/google-deepmind/mujoco_menagerie.git

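You can use any other quadruped robot with this tutorial; just adjust the environment parameter values for your robot.

Step 1 - Load the model

To load the model, all we have to do is use the xml_file argument with the Ant-v5 framework.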

import gymnasium
import numpy as np
env = gymnasium.make('Ant-v5', xml_file='./mujoco_menagerie/unitree_go1/scene.xml')

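Although this is enough to load the model, we need to tweak some environment parameters to get the desired behavior for our environment. For now we also explicitly set the simulation, termination, reward and observation arguments, which we will tweak in the next step.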

env = gymnasium.make(
    'Ant-v5',
    xml_file='./mujoco_menagerie/unitree_go1/scene.xml',
    forward_reward_weight=0,
    ctrl_cost_weight=0,
    contact_cost_weight=0,
    healthy_reward=0,
    main_body=1,
    healthy_z_range=(0, np.inf),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0,
    frame_skip=1,
    max_episode_steps=1000,
)

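Step 2 - Tweak the environment parameters

Tweaking the environment parameters is essential to get the desired behavior for learning. In the following subsections, the reader is encouraged to consult the documentation of the arguments for more detailed information.

Step 2.1 - Tweak the environment simulation parameters

The arguments of interest are frame_skip, reset_noise_scale and max_episode_steps.

We want to tweak the frame_skip parameter so that dt takes an acceptable value (typical values are \(dt \in [0.01, 0.1]\) seconds).

Reminder: \(dt = frame\_skip \times model.opt.timestep\), where model.opt.timestep is the integrator time step selected in the MJCF model file.

The Go1 model we are using has an integrator time step of 0.002 s, so by selecting frame_skip=25 we set dt to 25 × 0.002 s = 0.05 s.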
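The arithmetic above can be sketched as a small helper. This is only an illustration: the function name `frame_skip_for_dt` is hypothetical and not part of Gymnasium, and it simply picks the integer frame_skip whose resulting dt is closest to a target value.

```python
def frame_skip_for_dt(timestep, target_dt):
    """Return (frame_skip, dt) where dt = frame_skip * timestep is closest to target_dt.

    Hypothetical helper for illustration; `timestep` plays the role of
    model.opt.timestep from the MJCF file.
    """
    frame_skip = max(1, round(target_dt / timestep))
    return frame_skip, frame_skip * timestep


# Go1 values from this tutorial: 0.002 s integrator step, 0.05 s target dt.
frame_skip, dt = frame_skip_for_dt(timestep=0.002, target_dt=0.05)
print(f"frame_skip={frame_skip}, dt={dt:.3f}s")
```

Because frame_skip must be an integer, the achievable dt is always a multiple of the integrator time step; if the target is not a multiple, the helper returns the nearest achievable value.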

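To avoid overfitting the policy, reset_noise_scale should be set to a value appropriate to the size of the robot. We want the value to be as large as possible without making the initial distribution of states invalid (that is, terminal regardless of the control actions). For Go1 we choose a value of 0.1.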
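To give a feel for what reset_noise_scale controls, the sketch below mimics how the Ant-family environments are understood to perturb the initial state: uniform noise in [-scale, scale] on the positions and Gaussian noise scaled by the same factor on the velocities. The function name and the array sizes are illustrative assumptions, not values read from the Go1 model.

```python
import numpy as np


def sample_initial_state(init_qpos, init_qvel, reset_noise_scale, rng):
    """Perturb a nominal initial state in the style of the Ant environments (sketch)."""
    # Positions: bounded uniform noise, so the offset never exceeds the scale.
    qpos = init_qpos + rng.uniform(
        low=-reset_noise_scale, high=reset_noise_scale, size=init_qpos.shape
    )
    # Velocities: unbounded Gaussian noise with standard deviation equal to the scale.
    qvel = init_qvel + reset_noise_scale * rng.standard_normal(init_qvel.shape)
    return qpos, qvel


rng = np.random.default_rng(0)
init_qpos = np.zeros(5)  # placeholder size, not the real Go1 dimension
init_qvel = np.zeros(4)  # placeholder size, not the real Go1 dimension
qpos, qvel = sample_initial_state(init_qpos, init_qvel, reset_noise_scale=0.1, rng=rng)
print(qpos, qvel)
```

A larger scale spreads the starting poses further from the nominal one, which is why too large a value can start the robot in a state that is already unhealthy.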

max_episode_steps determines the number of steps per episode before truncation. Here we set it to 1000 to be consistent with the base Gymnasium/MuJoCo environments, but you can set it higher if you need to.

env = gymnasium.make(
    'Ant-v5',
    xml_file='./mujoco_menagerie/unitree_go1/scene.xml',
    forward_reward_weight=0,
    ctrl_cost_weight=0,
    contact_cost_weight=0,
    healthy_reward=0,
    main_body=1,
    healthy_z_range=(0, np.inf),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,  # set to avoid policy overfitting
    frame_skip=25,  # set dt=0.05
    max_episode_steps=1000,  # kept at 1000
)

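Step 2.2 - Tweak the environment termination parameters

Termination is important for robot environments to avoid sampling "useless" time steps.

The arguments of interest are terminate_when_unhealthy and healthy_z_range.

We want to set healthy_z_range so that the environment terminates when the robot falls over or jumps very high. The value has to make sense for the height of the robot; for Go1 we choose (0.195, 0.75). Note: healthy_z_range checks the robot's absolute height in the world frame, not its height above the ground beneath it, so if your scene contains different levels of elevation it should be set to (-np.inf, np.inf).

We could also set terminate_when_unhealthy=False to disable termination altogether, which is not desirable in the case of Go1.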
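As a rough sketch of the check that healthy_z_range feeds into, assume the environment considers the robot healthy when every state value is finite and the z coordinate of the root body (taken here to be the third state entry) lies inside the range. The function below is a standalone illustration under that assumption, not the environment's actual code.

```python
import math


def is_healthy(state, healthy_z_range):
    """Sketch of an Ant-style health check; state[2] is assumed to be the root z."""
    min_z, max_z = healthy_z_range
    all_finite = all(math.isfinite(x) for x in state)
    return all_finite and min_z <= state[2] <= max_z


go1_range = (0.195, 0.75)
print(is_healthy([0.0, 0.0, 0.30], go1_range))  # standing height, inside the range
print(is_healthy([0.0, 0.0, 0.10], go1_range))  # trunk near the floor: fallen
print(is_healthy([0.0, 0.0, -5.0], (-math.inf, math.inf)))  # range check disabled
```

With terminate_when_unhealthy left enabled, an unhealthy state ends the episode, so the range directly decides which states the agent ever gets to sample.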

env = gymnasium.make(
    'Ant-v5',
    xml_file='./mujoco_menagerie/unitree_go1/scene.xml',
    forward_reward_weight=0,
    ctrl_cost_weight=0,
    contact_cost_weight=0,
    healthy_reward=0,
    main_body=1,
    healthy_z_range=(0.195, 0.75),  # set to avoid sampling steps where the robot has fallen or jumped too high
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)

Note: If you need a different termination condition, you can write your own TerminationWrapper (see the documentation).

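Step 2.3 - Tweak the environment reward parameters

The arguments of interest are forward_reward_weight, ctrl_cost_weight, contact_cost_weight, healthy_reward and main_body.

For forward_reward_weight, ctrl_cost_weight, contact_cost_weight and healthy_reward we have to pick values that make sense for our robot. You can use the default MuJoCo/Ant parameters as a reference and adjust them if your environment requires it. In the case of Go1 we only change ctrl_cost_weight, since the robot has a higher actuator force range.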
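To see how these four weights interact, here is a simplified sketch of an Ant-style reward: a forward term proportional to the x velocity, a bonus for being healthy, and quadratic penalties on the action and on the contact forces. The function is illustrative only; it assumes the contact forces passed in have already been clipped, and it uses the Go1 values from this tutorial as defaults.

```python
def ant_style_reward(
    x_velocity,
    action,
    contact_forces,
    healthy,
    *,
    forward_reward_weight=1.0,
    ctrl_cost_weight=0.05,
    contact_cost_weight=5e-4,
    healthy_reward=1.0,
):
    """Simplified Ant-style reward (sketch, not the environment's actual code)."""
    forward = forward_reward_weight * x_velocity
    survive = healthy_reward if healthy else 0.0
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    contact_cost = contact_cost_weight * sum(f * f for f in contact_forces)
    return forward + survive - ctrl_cost - contact_cost


# Moving forward at 0.5 m/s with two unit-magnitude actions and no contact forces.
reward = ant_style_reward(0.5, [1.0, -1.0], [0.0], healthy=True)
print(reward)
```

Because the control cost grows with the square of the action, a robot whose actuators accept larger commands pays a larger penalty for the same weight, which is the motivation for revisiting ctrl_cost_weight when the actuator range differs from Ant's.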

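For the main_body argument, we have to choose which body part is the main body (usually called something like "torso" or "trunk" in the model file) used to compute the forward_reward. In the case of Go1 it is the "trunk". Note: in most cases, including this one, it can be left at the default value.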

env = gymnasium.make(
    'Ant-v5',
    xml_file='./mujoco_menagerie/unitree_go1/scene.xml',
    forward_reward_weight=1,  # kept the same as the 'Ant' environment
    ctrl_cost_weight=0.05,  # changed because of the stronger motors of `Go1`
    contact_cost_weight=5e-4,  # kept the same as the 'Ant' environment
    healthy_reward=1,  # kept the same as the 'Ant' environment
    main_body=1,  # represents the "trunk" of the `Go1` robot
    healthy_z_range=(0.195, 0.75),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)

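Note: If you need a different reward function, you can write your own RewardWrapper (see the documentation).

Step 2.4 - Tweak the environment observation parameters

The arguments of interest are include_cfrc_ext_in_observation and exclude_current_positions_from_observation.

Here, for Go1, we have no particular reason to change them.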

env = gymnasium.make(
    'Ant-v5',
    xml_file='./mujoco_menagerie/unitree_go1/scene.xml',
    forward_reward_weight=1,
    ctrl_cost_weight=0.05,
    contact_cost_weight=5e-4,
    healthy_reward=1,
    main_body=1,
    healthy_z_range=(0.195, 0.75),
    include_cfrc_ext_in_observation=True,  # kept the same as the 'Ant' environment
    exclude_current_positions_from_observation=False,  # kept the same as the 'Ant' environment
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)

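Note: If you need additional observation elements (such as additional sensors), you can write your own ObservationWrapper (see the documentation).

Step 3 - Train your agent

Finally, we are done: we can use an RL algorithm to train an agent to walk or run the Go1 robot. Note: if you have followed this guide with your own robot model, you may discover during training that some environment parameters are not behaving as desired. Feel free to go back to step 2 and change anything as needed.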

import gymnasium

env = gymnasium.make(
    'Ant-v5',
    xml_file='./mujoco_menagerie/unitree_go1/scene.xml',
    forward_reward_weight=1,
    ctrl_cost_weight=0.05,
    contact_cost_weight=5e-4,
    healthy_reward=1,
    main_body=1,
    healthy_z_range=(0.195, 0.75),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)
... # run your RL algorithm

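Epilogue

You can follow this guide to create most quadruped environments. To create humanoid/bipedal robots, you can follow the same guide using the Gymnasium/MuJoCo/Humanoid-v5 framework.

Author: @kallinteris-andreas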