Submission record: https://github.com/NM512/dreamerv3-torch/issues/18
[392996] model_loss 3.3 / model_grad_norm 9.6 / image_loss 1.4 / reward_loss 0.1 / cont_loss 0.0 / kl_free 1.0 / dyn_scale 0.5 / rep_scale 0.1 / dyn_loss 3.0 / rep_loss 3.0 / kl 3.0 / prior_ent 46.1 / post_ent 43.0 / normed_target_mean 0.4 / normed_target_std 0.4 / normed_target_min -0.0 / normed_target_max 1.4 / EMA_005 7.0 / EMA_095 702.7 / value_mean 311.6 / value_std 264.2 / value_min 0.5 / value_max 773.7 / target_mean 313.0 / target_std 266.2 / target_min 0.6 / target_max 955.0 / imag_reward_mean 1.6 / imag_reward_std 14.8 / imag_reward_min -0.0 / imag_reward_max 414.3 / imag_action_mean 13.2 / imag_action_std 4.3 / imag_action_min 0.0 / imag_action_max 17.0 / actor_entropy 0.2 / actor_loss -0.1 / actor_grad_norm 3.4 / value_loss 1.4 / value_grad_norm 1.6 / update_count 92501.0 / fps 7.5
[391512] model_loss 8.1 / model_grad_norm 17.7 / image_loss 2.8 / reward_loss 0.1 / cont_loss 0.0 / kl_free 1.0 / dyn_scale 0.5 / rep_scale 0.1 / dyn_loss 8.6 / rep_loss 8.6 / kl 8.6 / prior_ent 57.9 / post_ent 49.1 / normed_target_mean 0.4 / normed_target_std 0.3 / normed_target_min -0.2 / normed_target_max 1.7 / EMA_005 9.1 / EMA_095 46.4 / value_mean 24.2 / value_std 10.4 / value_min 0.4 / value_max 62.2 / target_mean 24.5 / target_std 10.8 / target_min 0.1 / target_max 73.8 / imag_reward_mean 0.2 / imag_reward_std 0.9 / imag_reward_min -0.0 / imag_reward_max 16.0 / imag_action_mean 10.6 / imag_action_std 5.8 / imag_action_min 0.0 / imag_action_max 17.0 / actor_entropy 0.3 / actor_loss 0.0 / actor_grad_norm 0.6 / value_loss 1.9 / value_grad_norm 0.9 / update_count 92501.0 / fps 6.8
Debug the dream (imagination) rollout and trace how deter flows; see the rollout sketch below.
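As a map for that debugging pass, here is a minimal sketch (not the repo's code) of how deter is carried through an imagined rollout. The names rssm, actor, img_step and the dict keys follow Dreamer conventions but are assumptions here, and "stoch"/"deter" are assumed to be flat tensors.

import torch

def imagine_rollout(rssm, actor, start_state, horizon):
    # start_state is assumed to hold flat tensors under "stoch" and "deter".
    state = dict(start_state)
    trajectory = []
    for _ in range(horizon):
        # Policy input is the concatenated latent; deter is the deterministic half.
        feat = torch.cat([state["stoch"], state["deter"]], dim=-1)
        action = actor(feat).sample()
        # Prior step: a GRU maps (deter, stoch, action) to the next deter,
        # then a new stoch is sampled from the prior conditioned on that deter.
        state = rssm.img_step(state, action)
        trajectory.append(state)
    return trajectory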
Checked the dreamer torch version, the dreamerv3 version, the dreamerv2 version, and STPN.
latent, stpnstatus = templatent[0], templatent[1]  # unpack (RSSM latent, STPN plastic state)
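A hedged reading of this line: templatent is presumably a pair that bundles the RSSM latent with the STPN plastic state, so code paths that expect a single state object keep working. The helper names below are illustrative, not from the repo.

from typing import Any, Dict, Tuple

DreamerState = Tuple[Dict[str, Any], Any]  # (RSSM latent dict, STPN plastic state)

def pack_state(latent: Dict[str, Any], stpnstatus: Any) -> DreamerState:
    return (latent, stpnstatus)

def unpack_state(templatent: DreamerState) -> Tuple[Dict[str, Any], Any]:
    latent, stpnstatus = templatent[0], templatent[1]
    return latent, stpnstatus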
STPN code notes
Dreamer paper
h_tp1, states, energy = self.rnn(x=x_t, states=hebb)  # STPN cell: next hidden state, updated plastic state, energy term
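For reference, a minimal sketch of a plastic (Hebbian) recurrent cell with the call signature used above. This is not the STPN authors' implementation; the class name, eta/lam parameters, and the L2 "energy" term are assumptions.

import torch
import torch.nn as nn

class PlasticCellSketch(nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.w_in = nn.Linear(in_dim, hidden_dim, bias=False)       # fixed input weights
        self.w_rec = nn.Linear(hidden_dim, hidden_dim, bias=False)  # fixed recurrent weights
        # Learned per-synapse plasticity gain and decay for the fast weights.
        self.eta = nn.Parameter(torch.zeros(hidden_dim, hidden_dim))
        self.lam = nn.Parameter(torch.full((hidden_dim, hidden_dim), 0.9))

    def forward(self, x, states):
        # `states` bundles the previous hidden state and the fast (Hebbian) weights.
        h_prev, hebb = states
        plastic_drive = torch.einsum("bij,bj->bi", hebb, h_prev)
        h = torch.tanh(self.w_in(x) + self.w_rec(h_prev) + plastic_drive)
        # Fast-weight update with decay: hebb <- lam * hebb + eta * outer(h, h_prev).
        hebb = self.lam * hebb + self.eta * torch.einsum("bi,bj->bij", h, h_prev)
        energy = hebb.pow(2).mean()  # simple L2 "energy" placeholder
        return h, (h, hebb), energy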
batch size 512
maze paper, plus the torch v3 code, the v2 code, and the papers
Check the logs first, then submit the pull request.
v3torch
dreamerv2
Atari environment
maze environment
v2:
deter: from the structure described in the paper to the structure in code:
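A minimal sketch of that mapping, assuming the standard RSSM layout: the paper's recurrent model h_t = f(h_{t-1}, z_{t-1}, a_{t-1}) becomes an embedding of [z_{t-1}, a_{t-1}] followed by a GRU cell whose hidden state is deter. Layer sizes and names below are illustrative assumptions, not taken from the repo.

import torch
import torch.nn as nn

class DeterUpdateSketch(nn.Module):
    def __init__(self, stoch_dim=1024, action_dim=18, hidden=512, deter_dim=512):
        super().__init__()
        # Embed the previous stochastic latent and action together.
        self.img_in = nn.Sequential(nn.Linear(stoch_dim + action_dim, hidden), nn.SiLU())
        # GRU cell carrying the deterministic state h_t, i.e. `deter`.
        self.cell = nn.GRUCell(hidden, deter_dim)

    def forward(self, prev_stoch, prev_action, prev_deter):
        x = torch.cat([prev_stoch, prev_action], dim=-1)
        x = self.img_in(x)
        deter = self.cell(x, prev_deter)  # h_t = f(h_{t-1}, z_{t-1}, a_{t-1})
        return deter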