Recording Agents
During training or when evaluating an agent, it may be interesting to record agent behaviour over an episode and log the total reward accumulated. This can be achieved through two wrappers: RecordEpisodeStatistics and RecordVideo. The first tracks episode data such as the total reward, episode length and time taken, while the second generates mp4 videos of the agent using the environment's renderings.
We show how to apply these wrappers to two types of problem: the first records data for every episode (typically during evaluation), and the second records data periodically (during normal training).
Recording Every Episode
Given a trained agent, you may wish to record several episodes during evaluation to see how the agent acts. Below is an example script that does this with RecordEpisodeStatistics and RecordVideo.
```python
import gymnasium as gym
from gymnasium.wrappers import RecordEpisodeStatistics, RecordVideo

num_eval_episodes = 4

env = gym.make("CartPole-v1", render_mode="rgb_array")  # replace with your environment
env = RecordVideo(env, video_folder="cartpole-agent", name_prefix="eval",
                  episode_trigger=lambda x: True)
env = RecordEpisodeStatistics(env, buffer_length=num_eval_episodes)

for episode_num in range(num_eval_episodes):
    obs, info = env.reset()

    episode_over = False
    while not episode_over:
        action = env.action_space.sample()  # replace with actual agent
        obs, reward, terminated, truncated, info = env.step(action)

        episode_over = terminated or truncated
env.close()

print(f'Episode time taken: {env.time_queue}')
print(f'Episode total rewards: {env.return_queue}')
print(f'Episode lengths: {env.length_queue}')
```
In the script above, we pass three arguments to the RecordVideo wrapper: video_folder specifies the folder where the videos are saved (change this for your problem), name_prefix sets the prefix of the video filenames, and episode_trigger determines which episodes are recorded; here, every episode is recorded. This means that for every episode of the environment, a video is recorded and saved in the style "cartpole-agent/eval-episode-x.mp4".
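The episode_trigger accepts any callable that takes the episode index (an int) and returns a bool, so policies other than "record everything" are easy to express. As a small illustration, the helper functions below are our own, not part of Gymnasium:

```python
# Trigger factories for RecordVideo's episode_trigger parameter.
# A trigger is any callable: (episode index: int) -> bool.

def record_first_n(n):
    """Record only the first n episodes of a run."""
    return lambda episode_id: episode_id < n

def record_every(period):
    """Record one episode every `period` episodes."""
    return lambda episode_id: episode_id % period == 0

# Which of the first ten episodes each trigger would record:
print([ep for ep in range(10) if record_first_n(3)(ep)])  # -> [0, 1, 2]
print([ep for ep in range(10) if record_every(4)(ep)])    # -> [0, 4, 8]
```

Either factory's result can be passed directly as `episode_trigger=` in place of the `lambda x: True` used above.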
For RecordEpisodeStatistics, we only need to specify the buffer length, which is the maximum length of the internal time_queue, return_queue and length_queue. Rather than collecting the data for each episode individually, we can use these queues to print the information at the end of the evaluation.
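Since the queues are deque-like sequences of numbers, standard aggregation works on them directly. A minimal sketch with stand-in values (in practice, read `env.return_queue` and `env.length_queue` from the wrapper after evaluation):

```python
from statistics import mean, stdev

# Stand-ins for env.return_queue and env.length_queue after four
# evaluation episodes; the real values come from the wrapper's queues.
return_queue = [200.0, 187.0, 195.0, 178.0]
length_queue = [200, 187, 195, 178]

print(f"mean return: {mean(return_queue):.1f} +/- {stdev(return_queue):.1f}")
print(f"mean length: {mean(length_queue):.1f}")  # -> mean length: 190.0
```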
To speed up evaluation, it is possible to implement this with vector environments, evaluating N episodes in parallel rather than sequentially.
Recording the Agent during Training
During training, an agent will act in hundreds or thousands of episodes, so recording a video of every one is impractical. However, developers may still want to know how the agent behaves at different points in training, so episodes are recorded periodically, while the episode statistics are still collected for every episode. The following script shows how to periodically record episodes of an agent while logging every episode's statistics (we use Python's logging module, but tensorboard, wandb and other modules can be used instead).
```python
import logging

import gymnasium as gym
from gymnasium.wrappers import RecordEpisodeStatistics, RecordVideo

logging.basicConfig(level=logging.INFO)  # ensure INFO messages are shown

training_period = 250  # record the agent's episode every 250
num_training_episodes = 10_000  # total number of training episodes

env = gym.make("CartPole-v1", render_mode="rgb_array")  # replace with your environment
env = RecordVideo(env, video_folder="cartpole-agent", name_prefix="training",
                  episode_trigger=lambda x: x % training_period == 0)
env = RecordEpisodeStatistics(env)

for episode_num in range(num_training_episodes):
    obs, info = env.reset()

    episode_over = False
    while not episode_over:
        action = env.action_space.sample()  # replace with actual agent
        obs, reward, terminated, truncated, info = env.step(action)

        episode_over = terminated or truncated

    # RecordEpisodeStatistics stores this episode's stats in info["episode"]
    logging.info(f"episode-{episode_num}: {info['episode']}")
env.close()
```