diff --git "a/sf_log.txt" "b/sf_log.txt" --- "a/sf_log.txt" +++ "b/sf_log.txt" @@ -1,50 +1,27 @@ -[2025-01-15 19:55:56,247][00513] Saving configuration to /content/train_dir/default_experiment/config.json... -[2025-01-15 19:55:56,250][00513] Rollout worker 0 uses device cpu -[2025-01-15 19:55:56,252][00513] Rollout worker 1 uses device cpu -[2025-01-15 19:55:56,254][00513] Rollout worker 2 uses device cpu -[2025-01-15 19:55:56,256][00513] Rollout worker 3 uses device cpu -[2025-01-15 19:55:56,257][00513] Rollout worker 4 uses device cpu -[2025-01-15 19:55:56,258][00513] Rollout worker 5 uses device cpu -[2025-01-15 19:55:56,260][00513] Rollout worker 6 uses device cpu -[2025-01-15 19:55:56,261][00513] Rollout worker 7 uses device cpu -[2025-01-15 19:55:56,414][00513] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-01-15 19:55:56,415][00513] InferenceWorker_p0-w0: min num requests: 2 -[2025-01-15 19:55:56,456][00513] Starting all processes... -[2025-01-15 19:55:56,458][00513] Starting process learner_proc0 -[2025-01-15 19:55:56,509][00513] Starting all processes... -[2025-01-15 19:55:56,516][00513] Starting process inference_proc0-0 -[2025-01-15 19:55:56,518][00513] Starting process rollout_proc1 -[2025-01-15 19:55:56,519][00513] Starting process rollout_proc2 -[2025-01-15 19:55:56,519][00513] Starting process rollout_proc3 -[2025-01-15 19:55:56,519][00513] Starting process rollout_proc4 -[2025-01-15 19:55:56,519][00513] Starting process rollout_proc5 -[2025-01-15 19:55:56,519][00513] Starting process rollout_proc6 -[2025-01-15 19:55:56,519][00513] Starting process rollout_proc7 -[2025-01-15 19:55:56,516][00513] Starting process rollout_proc0 -[2025-01-15 19:56:13,220][04050] Worker 4 uses CPU cores [0] -[2025-01-15 19:56:13,300][04052] Worker 7 uses CPU cores [1] -[2025-01-15 19:56:13,350][04034] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-01-15 19:56:13,353][04034] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 -[2025-01-15 19:56:13,354][04053] Worker 3 uses CPU cores [1] -[2025-01-15 19:56:13,371][04051] Worker 5 uses CPU cores [1] -[2025-01-15 19:56:13,393][04034] Num visible devices: 1 -[2025-01-15 19:56:13,416][04034] Starting seed is not provided -[2025-01-15 19:56:13,417][04034] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-01-15 19:56:13,418][04034] Initializing actor-critic model on device cuda:0 -[2025-01-15 19:56:13,419][04034] RunningMeanStd input shape: (3, 72, 128) -[2025-01-15 19:56:13,422][04034] RunningMeanStd input shape: (1,) -[2025-01-15 19:56:13,467][04034] ConvEncoder: input_channels=3 -[2025-01-15 19:56:13,505][04047] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-01-15 19:56:13,507][04047] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 -[2025-01-15 19:56:13,517][04048] Worker 1 uses CPU cores [1] -[2025-01-15 19:56:13,550][04055] Worker 0 uses CPU cores [0] -[2025-01-15 19:56:13,553][04054] Worker 6 uses CPU cores [0] -[2025-01-15 19:56:13,555][04049] Worker 2 uses CPU cores [0] -[2025-01-15 19:56:13,562][04047] Num visible devices: 1 -[2025-01-15 19:56:13,764][04034] Conv encoder output size: 512 -[2025-01-15 19:56:13,765][04034] Policy head output size: 512 -[2025-01-15 19:56:13,819][04034] Created Actor Critic model with architecture: -[2025-01-15 19:56:13,819][04034] ActorCriticSharedWeights( +[2025-01-15 20:40:31,019][18890] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-01-15 
20:40:31,022][18890] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-01-15 20:40:31,093][18890] Num visible devices: 1 +[2025-01-15 20:40:31,136][18890] Starting seed is not provided +[2025-01-15 20:40:31,138][18890] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-01-15 20:40:31,138][18890] Initializing actor-critic model on device cuda:0 +[2025-01-15 20:40:31,139][18890] RunningMeanStd input shape: (3, 72, 128) +[2025-01-15 20:40:31,141][18890] RunningMeanStd input shape: (1,) +[2025-01-15 20:40:31,228][18890] ConvEncoder: input_channels=3 +[2025-01-15 20:40:31,267][18908] Worker 4 uses CPU cores [0] +[2025-01-15 20:40:31,383][18910] Worker 7 uses CPU cores [1] +[2025-01-15 20:40:31,596][18903] Worker 1 uses CPU cores [1] +[2025-01-15 20:40:31,617][18904] Worker 0 uses CPU cores [0] +[2025-01-15 20:40:31,658][18906] Worker 2 uses CPU cores [0] +[2025-01-15 20:40:31,725][18905] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-01-15 20:40:31,725][18905] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-01-15 20:40:31,751][18909] Worker 5 uses CPU cores [1] +[2025-01-15 20:40:31,763][18907] Worker 3 uses CPU cores [1] +[2025-01-15 20:40:31,770][18890] Conv encoder output size: 512 +[2025-01-15 20:40:31,771][18890] Policy head output size: 512 +[2025-01-15 20:40:31,775][18911] Worker 6 uses CPU cores [0] +[2025-01-15 20:40:31,779][18905] Num visible devices: 1 +[2025-01-15 20:40:31,799][18890] Created Actor Critic model with architecture: +[2025-01-15 20:40:31,800][18890] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( @@ -85,1460 +62,359 @@ (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) -[2025-01-15 19:56:14,105][04034] Using optimizer -[2025-01-15 19:56:16,404][00513] Heartbeat connected on Batcher_0 -[2025-01-15 19:56:16,414][00513] Heartbeat connected on InferenceWorker_p0-w0 -[2025-01-15 19:56:16,429][00513] Heartbeat connected on RolloutWorker_w0 -[2025-01-15 19:56:16,431][00513] Heartbeat connected on RolloutWorker_w1 -[2025-01-15 19:56:16,436][00513] Heartbeat connected on RolloutWorker_w2 -[2025-01-15 19:56:16,439][00513] Heartbeat connected on RolloutWorker_w3 -[2025-01-15 19:56:16,443][00513] Heartbeat connected on RolloutWorker_w4 -[2025-01-15 19:56:16,447][00513] Heartbeat connected on RolloutWorker_w5 -[2025-01-15 19:56:16,452][00513] Heartbeat connected on RolloutWorker_w6 -[2025-01-15 19:56:16,456][00513] Heartbeat connected on RolloutWorker_w7 -[2025-01-15 19:56:17,398][04034] No checkpoints found -[2025-01-15 19:56:17,398][04034] Did not load from checkpoint, starting from scratch! -[2025-01-15 19:56:17,398][04034] Initialized policy 0 weights for model version 0 -[2025-01-15 19:56:17,402][04034] LearnerWorker_p0 finished initialization! -[2025-01-15 19:56:17,406][04034] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-01-15 19:56:17,403][00513] Heartbeat connected on LearnerWorker_p0 -[2025-01-15 19:56:17,514][04047] RunningMeanStd input shape: (3, 72, 128) -[2025-01-15 19:56:17,515][04047] RunningMeanStd input shape: (1,) -[2025-01-15 19:56:17,528][04047] ConvEncoder: input_channels=3 -[2025-01-15 19:56:17,634][04047] Conv encoder output size: 512 -[2025-01-15 19:56:17,634][04047] Policy head output size: 512 -[2025-01-15 19:56:17,685][00513] Inference worker 0-0 is ready! 
-[2025-01-15 19:56:17,687][00513] All inference workers are ready! Signal rollout workers to start! -[2025-01-15 19:56:17,889][04049] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 19:56:17,892][04055] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 19:56:17,889][04052] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 19:56:17,895][04054] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 19:56:17,895][04050] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 19:56:17,894][04048] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 19:56:17,898][04051] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 19:56:17,903][04053] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 19:56:18,547][04055] Decorrelating experience for 0 frames... -[2025-01-15 19:56:18,917][04055] Decorrelating experience for 32 frames... -[2025-01-15 19:56:19,481][04052] Decorrelating experience for 0 frames... -[2025-01-15 19:56:19,483][04051] Decorrelating experience for 0 frames... -[2025-01-15 19:56:19,485][04048] Decorrelating experience for 0 frames... -[2025-01-15 19:56:19,487][04053] Decorrelating experience for 0 frames... -[2025-01-15 19:56:20,597][00513] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-01-15 19:56:20,663][04049] Decorrelating experience for 0 frames... -[2025-01-15 19:56:20,668][04050] Decorrelating experience for 0 frames... -[2025-01-15 19:56:20,979][04052] Decorrelating experience for 32 frames... -[2025-01-15 19:56:20,984][04048] Decorrelating experience for 32 frames... -[2025-01-15 19:56:20,986][04051] Decorrelating experience for 32 frames... -[2025-01-15 19:56:22,171][04049] Decorrelating experience for 32 frames... -[2025-01-15 19:56:22,416][04053] Decorrelating experience for 32 frames... -[2025-01-15 19:56:22,625][04050] Decorrelating experience for 32 frames... -[2025-01-15 19:56:23,140][04048] Decorrelating experience for 64 frames... -[2025-01-15 19:56:23,145][04052] Decorrelating experience for 64 frames... -[2025-01-15 19:56:23,155][04051] Decorrelating experience for 64 frames... -[2025-01-15 19:56:24,023][04053] Decorrelating experience for 64 frames... -[2025-01-15 19:56:24,137][04055] Decorrelating experience for 64 frames... -[2025-01-15 19:56:24,647][04049] Decorrelating experience for 64 frames... -[2025-01-15 19:56:24,858][04050] Decorrelating experience for 64 frames... -[2025-01-15 19:56:25,491][04054] Decorrelating experience for 0 frames... -[2025-01-15 19:56:25,494][04053] Decorrelating experience for 96 frames... -[2025-01-15 19:56:25,539][04055] Decorrelating experience for 96 frames... -[2025-01-15 19:56:25,597][00513] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-01-15 19:56:25,981][04048] Decorrelating experience for 96 frames... -[2025-01-15 19:56:26,219][04051] Decorrelating experience for 96 frames... -[2025-01-15 19:56:27,064][04054] Decorrelating experience for 32 frames... -[2025-01-15 19:56:27,768][04049] Decorrelating experience for 96 frames... -[2025-01-15 19:56:27,976][04050] Decorrelating experience for 96 frames... -[2025-01-15 19:56:28,042][04052] Decorrelating experience for 96 frames... -[2025-01-15 19:56:30,001][04054] Decorrelating experience for 64 frames... 
-[2025-01-15 19:56:30,597][00513] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 166.0. Samples: 1660. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-01-15 19:56:30,599][00513] Avg episode reward: [(0, '1.856')] -[2025-01-15 19:56:30,857][04034] Signal inference workers to stop experience collection... -[2025-01-15 19:56:30,877][04047] InferenceWorker_p0-w0: stopping experience collection -[2025-01-15 19:56:31,117][04054] Decorrelating experience for 96 frames... -[2025-01-15 19:56:34,115][04034] Signal inference workers to resume experience collection... -[2025-01-15 19:56:34,117][04047] InferenceWorker_p0-w0: resuming experience collection -[2025-01-15 19:56:35,597][00513] Fps is (10 sec: 819.2, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 8192. Throughput: 0: 163.3. Samples: 2450. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) -[2025-01-15 19:56:35,600][00513] Avg episode reward: [(0, '3.140')] -[2025-01-15 19:56:40,597][00513] Fps is (10 sec: 2457.6, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 24576. Throughput: 0: 287.0. Samples: 5740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 19:56:40,602][00513] Avg episode reward: [(0, '3.762')] -[2025-01-15 19:56:44,291][04047] Updated weights for policy 0, policy_version 10 (0.0153) -[2025-01-15 19:56:45,597][00513] Fps is (10 sec: 3686.4, 60 sec: 1802.3, 300 sec: 1802.3). Total num frames: 45056. Throughput: 0: 448.0. Samples: 11200. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 19:56:45,603][00513] Avg episode reward: [(0, '4.391')] -[2025-01-15 19:56:50,597][00513] Fps is (10 sec: 4505.6, 60 sec: 2321.1, 300 sec: 2321.1). Total num frames: 69632. Throughput: 0: 490.7. Samples: 14722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 19:56:50,604][00513] Avg episode reward: [(0, '4.194')] -[2025-01-15 19:56:53,173][04047] Updated weights for policy 0, policy_version 20 (0.0021) -[2025-01-15 19:56:55,597][00513] Fps is (10 sec: 4096.0, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 86016. Throughput: 0: 611.3. Samples: 21394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 19:56:55,602][00513] Avg episode reward: [(0, '4.173')] -[2025-01-15 19:57:00,597][00513] Fps is (10 sec: 3276.8, 60 sec: 2560.0, 300 sec: 2560.0). Total num frames: 102400. Throughput: 0: 641.3. Samples: 25650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 19:57:00,599][00513] Avg episode reward: [(0, '4.315')] -[2025-01-15 19:57:00,605][04034] Saving new best policy, reward=4.315! -[2025-01-15 19:57:04,414][04047] Updated weights for policy 0, policy_version 30 (0.0023) -[2025-01-15 19:57:05,597][00513] Fps is (10 sec: 4096.0, 60 sec: 2821.7, 300 sec: 2821.7). Total num frames: 126976. Throughput: 0: 646.8. Samples: 29108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 19:57:05,600][00513] Avg episode reward: [(0, '4.390')] -[2025-01-15 19:57:05,605][04034] Saving new best policy, reward=4.390! -[2025-01-15 19:57:10,601][00513] Fps is (10 sec: 4503.9, 60 sec: 2948.9, 300 sec: 2948.9). Total num frames: 147456. Throughput: 0: 805.1. Samples: 36234. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2025-01-15 19:57:10,603][00513] Avg episode reward: [(0, '4.293')] -[2025-01-15 19:57:15,180][04047] Updated weights for policy 0, policy_version 40 (0.0025) -[2025-01-15 19:57:15,600][00513] Fps is (10 sec: 3685.5, 60 sec: 2978.8, 300 sec: 2978.8). Total num frames: 163840. Throughput: 0: 877.7. Samples: 41160. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2025-01-15 19:57:15,602][00513] Avg episode reward: [(0, '4.302')] -[2025-01-15 19:57:20,597][00513] Fps is (10 sec: 3687.7, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 184320. Throughput: 0: 914.3. Samples: 43592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 19:57:20,603][00513] Avg episode reward: [(0, '4.302')] -[2025-01-15 19:57:24,516][04047] Updated weights for policy 0, policy_version 50 (0.0018) -[2025-01-15 19:57:25,597][00513] Fps is (10 sec: 4506.7, 60 sec: 3481.6, 300 sec: 3213.8). Total num frames: 208896. Throughput: 0: 1000.3. Samples: 50752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 19:57:25,601][00513] Avg episode reward: [(0, '4.389')] -[2025-01-15 19:57:30,601][00513] Fps is (10 sec: 4094.6, 60 sec: 3754.4, 300 sec: 3218.1). Total num frames: 225280. Throughput: 0: 1013.0. Samples: 56790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 19:57:30,606][00513] Avg episode reward: [(0, '4.354')] -[2025-01-15 19:57:35,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3222.2). Total num frames: 241664. Throughput: 0: 983.3. Samples: 58972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 19:57:35,600][00513] Avg episode reward: [(0, '4.491')] -[2025-01-15 19:57:35,606][04034] Saving new best policy, reward=4.491! -[2025-01-15 19:57:35,971][04047] Updated weights for policy 0, policy_version 60 (0.0033) -[2025-01-15 19:57:40,597][00513] Fps is (10 sec: 4097.5, 60 sec: 4027.7, 300 sec: 3328.0). Total num frames: 266240. Throughput: 0: 976.1. Samples: 65318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 19:57:40,600][00513] Avg episode reward: [(0, '4.595')] -[2025-01-15 19:57:40,610][04034] Saving new best policy, reward=4.595! -[2025-01-15 19:57:44,657][04047] Updated weights for policy 0, policy_version 70 (0.0013) -[2025-01-15 19:57:45,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3373.2). Total num frames: 286720. Throughput: 0: 1035.1. Samples: 72228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 19:57:45,603][00513] Avg episode reward: [(0, '4.492')] -[2025-01-15 19:57:50,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3367.8). Total num frames: 303104. Throughput: 0: 1006.8. Samples: 74412. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 19:57:50,601][00513] Avg episode reward: [(0, '4.507')] -[2025-01-15 19:57:50,608][04034] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth... -[2025-01-15 19:57:55,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3406.2). Total num frames: 323584. Throughput: 0: 967.6. Samples: 79774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 19:57:55,605][00513] Avg episode reward: [(0, '4.598')] -[2025-01-15 19:57:55,610][04034] Saving new best policy, reward=4.598! -[2025-01-15 19:57:56,107][04047] Updated weights for policy 0, policy_version 80 (0.0022) -[2025-01-15 19:58:00,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3481.6). Total num frames: 348160. Throughput: 0: 1014.6. Samples: 86816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 19:58:00,599][00513] Avg episode reward: [(0, '4.426')] -[2025-01-15 19:58:05,597][00513] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3471.9). Total num frames: 364544. Throughput: 0: 1027.6. Samples: 89834. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 19:58:05,599][00513] Avg episode reward: [(0, '4.383')] -[2025-01-15 19:58:06,709][04047] Updated weights for policy 0, policy_version 90 (0.0025) -[2025-01-15 19:58:10,597][00513] Fps is (10 sec: 3686.3, 60 sec: 3959.7, 300 sec: 3500.2). Total num frames: 385024. Throughput: 0: 971.5. Samples: 94470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 19:58:10,601][00513] Avg episode reward: [(0, '4.415')] -[2025-01-15 19:58:15,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 3526.1). Total num frames: 405504. Throughput: 0: 999.3. Samples: 101756. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 19:58:15,604][00513] Avg episode reward: [(0, '4.459')] -[2025-01-15 19:58:15,685][04047] Updated weights for policy 0, policy_version 100 (0.0019) -[2025-01-15 19:58:20,597][00513] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3549.9). Total num frames: 425984. Throughput: 0: 1031.9. Samples: 105406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 19:58:20,602][00513] Avg episode reward: [(0, '4.592')] -[2025-01-15 19:58:25,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3538.9). Total num frames: 442368. Throughput: 0: 995.8. Samples: 110128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 19:58:25,599][00513] Avg episode reward: [(0, '4.629')] -[2025-01-15 19:58:25,602][04034] Saving new best policy, reward=4.629! -[2025-01-15 19:58:26,932][04047] Updated weights for policy 0, policy_version 110 (0.0018) -[2025-01-15 19:58:30,597][00513] Fps is (10 sec: 4095.9, 60 sec: 4028.0, 300 sec: 3591.9). Total num frames: 466944. Throughput: 0: 979.4. Samples: 116300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 19:58:30,605][00513] Avg episode reward: [(0, '4.364')] -[2025-01-15 19:58:35,598][00513] Fps is (10 sec: 4505.4, 60 sec: 4096.0, 300 sec: 3610.5). Total num frames: 487424. Throughput: 0: 1013.3. Samples: 120012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 19:58:35,603][00513] Avg episode reward: [(0, '4.426')] -[2025-01-15 19:58:35,706][04047] Updated weights for policy 0, policy_version 120 (0.0023) -[2025-01-15 19:58:40,597][00513] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3598.6). Total num frames: 503808. Throughput: 0: 1019.3. Samples: 125642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 19:58:40,602][00513] Avg episode reward: [(0, '4.539')] -[2025-01-15 19:58:45,597][00513] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3615.8). Total num frames: 524288. Throughput: 0: 987.3. Samples: 131244. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 19:58:45,600][00513] Avg episode reward: [(0, '4.736')] -[2025-01-15 19:58:45,602][04034] Saving new best policy, reward=4.736! -[2025-01-15 19:58:46,757][04047] Updated weights for policy 0, policy_version 130 (0.0025) -[2025-01-15 19:58:50,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3659.1). Total num frames: 548864. Throughput: 0: 998.5. Samples: 134768. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) -[2025-01-15 19:58:50,599][00513] Avg episode reward: [(0, '4.687')] -[2025-01-15 19:58:55,598][00513] Fps is (10 sec: 4505.1, 60 sec: 4095.9, 300 sec: 3673.2). Total num frames: 569344. Throughput: 0: 1043.4. Samples: 141424. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 19:58:55,605][00513] Avg episode reward: [(0, '4.647')] -[2025-01-15 19:58:56,815][04047] Updated weights for policy 0, policy_version 140 (0.0021) -[2025-01-15 19:59:00,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3635.2). Total num frames: 581632. Throughput: 0: 979.7. Samples: 145844. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 19:59:00,600][00513] Avg episode reward: [(0, '4.460')] -[2025-01-15 19:59:05,597][00513] Fps is (10 sec: 3686.9, 60 sec: 4027.7, 300 sec: 3674.0). Total num frames: 606208. Throughput: 0: 978.1. Samples: 149420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 19:59:05,600][00513] Avg episode reward: [(0, '4.438')] -[2025-01-15 19:59:06,664][04047] Updated weights for policy 0, policy_version 150 (0.0013) -[2025-01-15 19:59:10,601][00513] Fps is (10 sec: 4913.5, 60 sec: 4095.8, 300 sec: 3710.4). Total num frames: 630784. Throughput: 0: 1037.4. Samples: 156814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 19:59:10,603][00513] Avg episode reward: [(0, '4.576')] -[2025-01-15 19:59:15,600][00513] Fps is (10 sec: 3685.3, 60 sec: 3959.3, 300 sec: 3674.6). Total num frames: 643072. Throughput: 0: 1007.3. Samples: 161630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 19:59:15,606][00513] Avg episode reward: [(0, '4.581')] -[2025-01-15 19:59:17,782][04047] Updated weights for policy 0, policy_version 160 (0.0023) -[2025-01-15 19:59:20,597][00513] Fps is (10 sec: 3687.6, 60 sec: 4027.7, 300 sec: 3709.2). Total num frames: 667648. Throughput: 0: 987.3. Samples: 164438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2025-01-15 19:59:20,601][00513] Avg episode reward: [(0, '4.522')] -[2025-01-15 19:59:25,597][00513] Fps is (10 sec: 4916.7, 60 sec: 4164.3, 300 sec: 3741.8). Total num frames: 692224. Throughput: 0: 1023.8. Samples: 171712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 19:59:25,600][00513] Avg episode reward: [(0, '4.694')] -[2025-01-15 19:59:26,072][04047] Updated weights for policy 0, policy_version 170 (0.0013) -[2025-01-15 19:59:30,597][00513] Fps is (10 sec: 4096.1, 60 sec: 4027.8, 300 sec: 3729.5). Total num frames: 708608. Throughput: 0: 1030.0. Samples: 177592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 19:59:30,599][00513] Avg episode reward: [(0, '4.498')] -[2025-01-15 19:59:35,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3717.9). Total num frames: 724992. Throughput: 0: 999.2. Samples: 179732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 19:59:35,600][00513] Avg episode reward: [(0, '4.298')] -[2025-01-15 19:59:37,397][04047] Updated weights for policy 0, policy_version 180 (0.0017) -[2025-01-15 19:59:40,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3747.8). Total num frames: 749568. Throughput: 0: 1001.0. Samples: 186466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 19:59:40,600][00513] Avg episode reward: [(0, '4.388')] -[2025-01-15 19:59:45,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3756.3). Total num frames: 770048. Throughput: 0: 1056.8. Samples: 193400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 19:59:45,600][00513] Avg episode reward: [(0, '4.527')] -[2025-01-15 19:59:47,261][04047] Updated weights for policy 0, policy_version 190 (0.0033) -[2025-01-15 19:59:50,598][00513] Fps is (10 sec: 3686.2, 60 sec: 3959.4, 300 sec: 3744.9). Total num frames: 786432. Throughput: 0: 1024.9. Samples: 195542. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2025-01-15 19:59:50,602][00513] Avg episode reward: [(0, '4.640')] -[2025-01-15 19:59:50,614][04034] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth... -[2025-01-15 19:59:55,597][00513] Fps is (10 sec: 4095.8, 60 sec: 4027.8, 300 sec: 3772.1). Total num frames: 811008. Throughput: 0: 989.3. Samples: 201330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 19:59:55,602][00513] Avg episode reward: [(0, '4.596')] -[2025-01-15 19:59:57,124][04047] Updated weights for policy 0, policy_version 200 (0.0014) -[2025-01-15 20:00:00,597][00513] Fps is (10 sec: 4915.5, 60 sec: 4232.5, 300 sec: 3798.1). Total num frames: 835584. Throughput: 0: 1045.2. Samples: 208660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:00:00,599][00513] Avg episode reward: [(0, '4.258')] -[2025-01-15 20:00:05,602][00513] Fps is (10 sec: 3684.9, 60 sec: 4027.4, 300 sec: 3768.2). Total num frames: 847872. Throughput: 0: 1041.7. Samples: 211320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:00:05,607][00513] Avg episode reward: [(0, '4.238')] -[2025-01-15 20:00:08,210][04047] Updated weights for policy 0, policy_version 210 (0.0015) -[2025-01-15 20:00:10,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3959.7, 300 sec: 3775.4). Total num frames: 868352. Throughput: 0: 986.9. Samples: 216124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:00:10,600][00513] Avg episode reward: [(0, '4.185')] -[2025-01-15 20:00:15,597][00513] Fps is (10 sec: 4507.7, 60 sec: 4164.5, 300 sec: 3799.7). Total num frames: 892928. Throughput: 0: 1017.7. Samples: 223390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:00:15,599][00513] Avg episode reward: [(0, '4.325')] -[2025-01-15 20:00:16,687][04047] Updated weights for policy 0, policy_version 220 (0.0013) -[2025-01-15 20:00:20,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3805.9). Total num frames: 913408. Throughput: 0: 1050.9. Samples: 227022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:00:20,601][00513] Avg episode reward: [(0, '4.374')] -[2025-01-15 20:00:25,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3795.1). Total num frames: 929792. Throughput: 0: 1001.4. Samples: 231530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:00:25,603][00513] Avg episode reward: [(0, '4.335')] -[2025-01-15 20:00:27,922][04047] Updated weights for policy 0, policy_version 230 (0.0044) -[2025-01-15 20:00:30,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3817.5). Total num frames: 954368. Throughput: 0: 997.8. Samples: 238300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2025-01-15 20:00:30,603][00513] Avg episode reward: [(0, '4.469')] -[2025-01-15 20:00:35,598][00513] Fps is (10 sec: 4505.1, 60 sec: 4164.2, 300 sec: 3822.9). Total num frames: 974848. Throughput: 0: 1028.7. Samples: 241832. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:00:35,600][00513] Avg episode reward: [(0, '4.351')] -[2025-01-15 20:00:37,382][04047] Updated weights for policy 0, policy_version 240 (0.0015) -[2025-01-15 20:00:40,597][00513] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3812.4). Total num frames: 991232. Throughput: 0: 1021.3. Samples: 247288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:00:40,604][00513] Avg episode reward: [(0, '4.320')] -[2025-01-15 20:00:45,597][00513] Fps is (10 sec: 3686.8, 60 sec: 4027.7, 300 sec: 3817.8). Total num frames: 1011712. 
Throughput: 0: 984.8. Samples: 252978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:00:45,600][00513] Avg episode reward: [(0, '4.465')] -[2025-01-15 20:00:47,679][04047] Updated weights for policy 0, policy_version 250 (0.0019) -[2025-01-15 20:00:50,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3838.1). Total num frames: 1036288. Throughput: 0: 1006.5. Samples: 256608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:00:50,603][00513] Avg episode reward: [(0, '4.541')] -[2025-01-15 20:00:55,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3827.9). Total num frames: 1052672. Throughput: 0: 1046.1. Samples: 263200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:00:55,602][00513] Avg episode reward: [(0, '4.586')] -[2025-01-15 20:00:58,542][04047] Updated weights for policy 0, policy_version 260 (0.0025) -[2025-01-15 20:01:00,598][00513] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3818.1). Total num frames: 1069056. Throughput: 0: 988.0. Samples: 267850. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:01:00,608][00513] Avg episode reward: [(0, '4.457')] -[2025-01-15 20:01:05,597][00513] Fps is (10 sec: 4096.1, 60 sec: 4096.3, 300 sec: 3837.3). Total num frames: 1093632. Throughput: 0: 985.0. Samples: 271346. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2025-01-15 20:01:05,600][00513] Avg episode reward: [(0, '4.576')] -[2025-01-15 20:01:07,376][04047] Updated weights for policy 0, policy_version 270 (0.0013) -[2025-01-15 20:01:10,597][00513] Fps is (10 sec: 4915.4, 60 sec: 4164.3, 300 sec: 3855.9). Total num frames: 1118208. Throughput: 0: 1051.0. Samples: 278824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:01:10,601][00513] Avg episode reward: [(0, '4.681')] -[2025-01-15 20:01:15,597][00513] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 3832.2). Total num frames: 1130496. Throughput: 0: 999.1. Samples: 283258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:01:15,607][00513] Avg episode reward: [(0, '4.588')] -[2025-01-15 20:01:18,678][04047] Updated weights for policy 0, policy_version 280 (0.0040) -[2025-01-15 20:01:20,597][00513] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 1155072. Throughput: 0: 987.1. Samples: 286252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:01:20,600][00513] Avg episode reward: [(0, '4.487')] -[2025-01-15 20:01:25,597][00513] Fps is (10 sec: 4915.4, 60 sec: 4164.3, 300 sec: 3998.8). Total num frames: 1179648. Throughput: 0: 1027.9. Samples: 293542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:01:25,599][00513] Avg episode reward: [(0, '4.566')] -[2025-01-15 20:01:27,414][04047] Updated weights for policy 0, policy_version 290 (0.0024) -[2025-01-15 20:01:30,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1196032. Throughput: 0: 1026.8. Samples: 299186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:01:30,604][00513] Avg episode reward: [(0, '4.830')] -[2025-01-15 20:01:30,619][04034] Saving new best policy, reward=4.830! -[2025-01-15 20:01:35,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1212416. Throughput: 0: 994.2. Samples: 301348. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:01:35,600][00513] Avg episode reward: [(0, '4.838')] -[2025-01-15 20:01:35,602][04034] Saving new best policy, reward=4.838! 
-[2025-01-15 20:01:38,366][04047] Updated weights for policy 0, policy_version 300 (0.0027) -[2025-01-15 20:01:40,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 1236992. Throughput: 0: 1000.7. Samples: 308232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2025-01-15 20:01:40,600][00513] Avg episode reward: [(0, '4.625')] -[2025-01-15 20:01:45,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1257472. Throughput: 0: 1046.6. Samples: 314946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2025-01-15 20:01:45,602][00513] Avg episode reward: [(0, '4.805')] -[2025-01-15 20:01:48,942][04047] Updated weights for policy 0, policy_version 310 (0.0033) -[2025-01-15 20:01:50,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1273856. Throughput: 0: 1017.0. Samples: 317112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:01:50,603][00513] Avg episode reward: [(0, '4.671')] -[2025-01-15 20:01:50,613][04034] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000311_1273856.pth... -[2025-01-15 20:01:50,736][04034] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth -[2025-01-15 20:01:55,597][00513] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1298432. Throughput: 0: 986.8. Samples: 323232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2025-01-15 20:01:55,600][00513] Avg episode reward: [(0, '4.689')] -[2025-01-15 20:01:57,869][04047] Updated weights for policy 0, policy_version 320 (0.0014) -[2025-01-15 20:02:00,597][00513] Fps is (10 sec: 4915.2, 60 sec: 4232.6, 300 sec: 4054.3). Total num frames: 1323008. Throughput: 0: 1050.9. Samples: 330548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:02:00,601][00513] Avg episode reward: [(0, '4.757')] -[2025-01-15 20:02:05,597][00513] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1335296. Throughput: 0: 1038.8. Samples: 332996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:02:05,599][00513] Avg episode reward: [(0, '4.776')] -[2025-01-15 20:02:09,032][04047] Updated weights for policy 0, policy_version 330 (0.0033) -[2025-01-15 20:02:10,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 1355776. Throughput: 0: 991.2. Samples: 338144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:02:10,604][00513] Avg episode reward: [(0, '4.913')] -[2025-01-15 20:02:10,614][04034] Saving new best policy, reward=4.913! -[2025-01-15 20:02:15,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 1380352. Throughput: 0: 1024.7. Samples: 345298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2025-01-15 20:02:15,604][00513] Avg episode reward: [(0, '4.802')] -[2025-01-15 20:02:17,635][04047] Updated weights for policy 0, policy_version 340 (0.0023) -[2025-01-15 20:02:20,599][00513] Fps is (10 sec: 4504.9, 60 sec: 4095.9, 300 sec: 4040.4). Total num frames: 1400832. Throughput: 0: 1053.5. Samples: 348758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:02:20,604][00513] Avg episode reward: [(0, '4.561')] -[2025-01-15 20:02:25,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 1417216. Throughput: 0: 999.3. Samples: 353200. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:02:25,599][00513] Avg episode reward: [(0, '4.492')] -[2025-01-15 20:02:28,759][04047] Updated weights for policy 0, policy_version 350 (0.0029) -[2025-01-15 20:02:30,597][00513] Fps is (10 sec: 4096.6, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 1441792. Throughput: 0: 1003.4. Samples: 360100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:02:30,604][00513] Avg episode reward: [(0, '4.555')] -[2025-01-15 20:02:35,601][00513] Fps is (10 sec: 4504.0, 60 sec: 4164.0, 300 sec: 4054.3). Total num frames: 1462272. Throughput: 0: 1032.9. Samples: 363594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:02:35,604][00513] Avg episode reward: [(0, '4.572')] -[2025-01-15 20:02:38,894][04047] Updated weights for policy 0, policy_version 360 (0.0025) -[2025-01-15 20:02:40,597][00513] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1478656. Throughput: 0: 1013.6. Samples: 368846. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) -[2025-01-15 20:02:40,600][00513] Avg episode reward: [(0, '4.625')] -[2025-01-15 20:02:45,597][00513] Fps is (10 sec: 3687.7, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1499136. Throughput: 0: 984.0. Samples: 374830. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:02:45,600][00513] Avg episode reward: [(0, '4.555')] -[2025-01-15 20:02:48,488][04047] Updated weights for policy 0, policy_version 370 (0.0019) -[2025-01-15 20:02:50,597][00513] Fps is (10 sec: 4505.5, 60 sec: 4164.2, 300 sec: 4068.2). Total num frames: 1523712. Throughput: 0: 1009.9. Samples: 378444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:02:50,604][00513] Avg episode reward: [(0, '4.594')] -[2025-01-15 20:02:55,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 4040.5). Total num frames: 1540096. Throughput: 0: 1033.4. Samples: 384646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:02:55,603][00513] Avg episode reward: [(0, '4.810')] -[2025-01-15 20:02:59,657][04047] Updated weights for policy 0, policy_version 380 (0.0029) -[2025-01-15 20:03:00,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.4, 300 sec: 4054.3). Total num frames: 1560576. Throughput: 0: 987.0. Samples: 389712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:03:00,605][00513] Avg episode reward: [(0, '4.670')] -[2025-01-15 20:03:05,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1581056. Throughput: 0: 988.0. Samples: 393216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:03:05,603][00513] Avg episode reward: [(0, '4.542')] -[2025-01-15 20:03:08,312][04047] Updated weights for policy 0, policy_version 390 (0.0029) -[2025-01-15 20:03:10,599][00513] Fps is (10 sec: 4095.4, 60 sec: 4095.9, 300 sec: 4054.3). Total num frames: 1601536. Throughput: 0: 1045.8. Samples: 400262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:03:10,601][00513] Avg episode reward: [(0, '4.809')] -[2025-01-15 20:03:15,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 1617920. Throughput: 0: 991.6. Samples: 404724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:03:15,602][00513] Avg episode reward: [(0, '5.025')] -[2025-01-15 20:03:15,604][04034] Saving new best policy, reward=5.025! 
-[2025-01-15 20:03:19,604][04047] Updated weights for policy 0, policy_version 400 (0.0028) -[2025-01-15 20:03:20,603][00513] Fps is (10 sec: 4094.2, 60 sec: 4027.4, 300 sec: 4068.1). Total num frames: 1642496. Throughput: 0: 986.0. Samples: 407968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:03:20,607][00513] Avg episode reward: [(0, '4.828')] -[2025-01-15 20:03:25,597][00513] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 1667072. Throughput: 0: 1026.8. Samples: 415054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:03:25,599][00513] Avg episode reward: [(0, '4.736')] -[2025-01-15 20:03:30,145][04047] Updated weights for policy 0, policy_version 410 (0.0017) -[2025-01-15 20:03:30,597][00513] Fps is (10 sec: 3688.7, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 1679360. Throughput: 0: 1008.8. Samples: 420228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2025-01-15 20:03:30,601][00513] Avg episode reward: [(0, '4.679')] -[2025-01-15 20:03:35,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3959.7, 300 sec: 4054.3). Total num frames: 1699840. Throughput: 0: 977.2. Samples: 422418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2025-01-15 20:03:35,604][00513] Avg episode reward: [(0, '4.504')] -[2025-01-15 20:03:39,405][04047] Updated weights for policy 0, policy_version 420 (0.0024) -[2025-01-15 20:03:40,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 1724416. Throughput: 0: 1001.7. Samples: 429724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:03:40,601][00513] Avg episode reward: [(0, '4.845')] -[2025-01-15 20:03:45,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1740800. Throughput: 0: 1025.5. Samples: 435858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2025-01-15 20:03:45,599][00513] Avg episode reward: [(0, '4.757')] -[2025-01-15 20:03:50,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 1757184. Throughput: 0: 996.5. Samples: 438060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:03:50,599][00513] Avg episode reward: [(0, '4.599')] -[2025-01-15 20:03:50,609][04034] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000429_1757184.pth... -[2025-01-15 20:03:50,731][04034] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth -[2025-01-15 20:03:50,859][04047] Updated weights for policy 0, policy_version 430 (0.0031) -[2025-01-15 20:03:55,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1781760. Throughput: 0: 977.8. Samples: 444260. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2025-01-15 20:03:55,600][00513] Avg episode reward: [(0, '5.134')] -[2025-01-15 20:03:55,605][04034] Saving new best policy, reward=5.134! -[2025-01-15 20:03:59,570][04047] Updated weights for policy 0, policy_version 440 (0.0020) -[2025-01-15 20:04:00,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 4054.3). Total num frames: 1802240. Throughput: 0: 1034.3. Samples: 451268. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2025-01-15 20:04:00,599][00513] Avg episode reward: [(0, '5.048')] -[2025-01-15 20:04:05,601][00513] Fps is (10 sec: 3685.1, 60 sec: 3959.2, 300 sec: 4026.6). Total num frames: 1818624. Throughput: 0: 1010.8. Samples: 453450. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:04:05,603][00513] Avg episode reward: [(0, '4.766')] -[2025-01-15 20:04:10,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 4054.4). Total num frames: 1839104. Throughput: 0: 970.2. Samples: 458712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:04:10,604][00513] Avg episode reward: [(0, '4.578')] -[2025-01-15 20:04:10,815][04047] Updated weights for policy 0, policy_version 450 (0.0026) -[2025-01-15 20:04:15,597][00513] Fps is (10 sec: 4507.2, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1863680. Throughput: 0: 1016.9. Samples: 465990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:04:15,601][00513] Avg episode reward: [(0, '4.503')] -[2025-01-15 20:04:20,597][00513] Fps is (10 sec: 4096.0, 60 sec: 3959.9, 300 sec: 4026.6). Total num frames: 1880064. Throughput: 0: 1038.7. Samples: 469160. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:04:20,603][00513] Avg episode reward: [(0, '4.483')] -[2025-01-15 20:04:21,402][04047] Updated weights for policy 0, policy_version 460 (0.0038) -[2025-01-15 20:04:25,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 4026.6). Total num frames: 1896448. Throughput: 0: 972.4. Samples: 473482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2025-01-15 20:04:25,602][00513] Avg episode reward: [(0, '4.671')] -[2025-01-15 20:04:30,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1921024. Throughput: 0: 992.8. Samples: 480534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:04:30,600][00513] Avg episode reward: [(0, '4.872')] -[2025-01-15 20:04:30,941][04047] Updated weights for policy 0, policy_version 470 (0.0019) -[2025-01-15 20:04:35,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1941504. Throughput: 0: 1021.7. Samples: 484038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:04:35,605][00513] Avg episode reward: [(0, '4.586')] -[2025-01-15 20:04:40,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 1957888. Throughput: 0: 993.6. Samples: 488970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:04:40,603][00513] Avg episode reward: [(0, '4.418')] -[2025-01-15 20:04:42,369][04047] Updated weights for policy 0, policy_version 480 (0.0031) -[2025-01-15 20:04:45,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 1978368. Throughput: 0: 973.0. Samples: 495052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:04:45,604][00513] Avg episode reward: [(0, '4.699')] -[2025-01-15 20:04:50,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2002944. Throughput: 0: 1005.1. Samples: 498674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:04:50,603][00513] Avg episode reward: [(0, '4.900')] -[2025-01-15 20:04:50,749][04047] Updated weights for policy 0, policy_version 490 (0.0016) -[2025-01-15 20:04:55,602][00513] Fps is (10 sec: 4094.1, 60 sec: 3959.2, 300 sec: 4012.6). Total num frames: 2019328. Throughput: 0: 1017.7. Samples: 504514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:04:55,611][00513] Avg episode reward: [(0, '4.756')] -[2025-01-15 20:05:00,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2039808. Throughput: 0: 972.9. Samples: 509770. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:05:00,603][00513] Avg episode reward: [(0, '4.606')] -[2025-01-15 20:05:01,880][04047] Updated weights for policy 0, policy_version 500 (0.0040) -[2025-01-15 20:05:05,597][00513] Fps is (10 sec: 4507.7, 60 sec: 4096.2, 300 sec: 4054.3). Total num frames: 2064384. Throughput: 0: 981.8. Samples: 513342. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) -[2025-01-15 20:05:05,602][00513] Avg episode reward: [(0, '4.586')] -[2025-01-15 20:05:10,597][00513] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2080768. Throughput: 0: 1034.8. Samples: 520046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:05:10,604][00513] Avg episode reward: [(0, '4.535')] -[2025-01-15 20:05:12,530][04047] Updated weights for policy 0, policy_version 510 (0.0022) -[2025-01-15 20:05:15,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 2097152. Throughput: 0: 976.9. Samples: 524496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:05:15,601][00513] Avg episode reward: [(0, '4.416')] -[2025-01-15 20:05:20,597][00513] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2121728. Throughput: 0: 974.6. Samples: 527894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:05:20,603][00513] Avg episode reward: [(0, '4.490')] -[2025-01-15 20:05:22,104][04047] Updated weights for policy 0, policy_version 520 (0.0038) -[2025-01-15 20:05:25,597][00513] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 2146304. Throughput: 0: 1024.1. Samples: 535056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:05:25,599][00513] Avg episode reward: [(0, '4.660')] -[2025-01-15 20:05:30,597][00513] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2158592. Throughput: 0: 1001.5. Samples: 540122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:05:30,602][00513] Avg episode reward: [(0, '4.712')] -[2025-01-15 20:05:33,396][04047] Updated weights for policy 0, policy_version 530 (0.0026) -[2025-01-15 20:05:35,598][00513] Fps is (10 sec: 3276.6, 60 sec: 3959.4, 300 sec: 4026.6). Total num frames: 2179072. Throughput: 0: 973.9. Samples: 542500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:05:35,604][00513] Avg episode reward: [(0, '4.742')] -[2025-01-15 20:05:40,597][00513] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2203648. Throughput: 0: 1002.3. Samples: 549612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2025-01-15 20:05:40,600][00513] Avg episode reward: [(0, '4.545')] -[2025-01-15 20:05:42,000][04047] Updated weights for policy 0, policy_version 540 (0.0019) -[2025-01-15 20:05:45,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2220032. Throughput: 0: 1019.1. Samples: 555628. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:05:45,603][00513] Avg episode reward: [(0, '4.482')] -[2025-01-15 20:05:50,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 2236416. Throughput: 0: 987.9. Samples: 557798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:05:50,600][00513] Avg episode reward: [(0, '4.602')] -[2025-01-15 20:05:50,657][04034] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000547_2240512.pth... 
-[2025-01-15 20:05:50,792][04034] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000311_1273856.pth -[2025-01-15 20:05:53,304][04047] Updated weights for policy 0, policy_version 550 (0.0024) -[2025-01-15 20:05:55,597][00513] Fps is (10 sec: 4096.1, 60 sec: 4028.0, 300 sec: 4040.5). Total num frames: 2260992. Throughput: 0: 981.7. Samples: 564224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:05:55,599][00513] Avg episode reward: [(0, '4.606')] -[2025-01-15 20:06:00,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2281472. Throughput: 0: 1037.4. Samples: 571180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:06:00,601][00513] Avg episode reward: [(0, '4.634')] -[2025-01-15 20:06:03,680][04047] Updated weights for policy 0, policy_version 560 (0.0037) -[2025-01-15 20:06:05,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2297856. Throughput: 0: 1008.5. Samples: 573278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:06:05,600][00513] Avg episode reward: [(0, '4.650')] -[2025-01-15 20:06:10,597][00513] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 2318336. Throughput: 0: 969.9. Samples: 578700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:06:10,600][00513] Avg episode reward: [(0, '4.751')] -[2025-01-15 20:06:13,307][04047] Updated weights for policy 0, policy_version 570 (0.0018) -[2025-01-15 20:06:15,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2342912. Throughput: 0: 1017.5. Samples: 585910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:06:15,604][00513] Avg episode reward: [(0, '4.776')] -[2025-01-15 20:06:20,597][00513] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2359296. Throughput: 0: 1032.5. Samples: 588960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:06:20,606][00513] Avg episode reward: [(0, '4.737')] -[2025-01-15 20:06:24,469][04047] Updated weights for policy 0, policy_version 580 (0.0030) -[2025-01-15 20:06:25,597][00513] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 2379776. Throughput: 0: 974.9. Samples: 593482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:06:25,604][00513] Avg episode reward: [(0, '4.740')] -[2025-01-15 20:06:30,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2404352. Throughput: 0: 1002.5. Samples: 600738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:06:30,604][00513] Avg episode reward: [(0, '4.782')] -[2025-01-15 20:06:33,038][04047] Updated weights for policy 0, policy_version 590 (0.0021) -[2025-01-15 20:06:35,605][00513] Fps is (10 sec: 4092.9, 60 sec: 4027.3, 300 sec: 4012.6). Total num frames: 2420736. Throughput: 0: 1034.3. Samples: 604350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:06:35,610][00513] Avg episode reward: [(0, '4.717')] -[2025-01-15 20:06:40,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2437120. Throughput: 0: 993.9. Samples: 608950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:06:40,601][00513] Avg episode reward: [(0, '4.792')] -[2025-01-15 20:06:44,545][04047] Updated weights for policy 0, policy_version 600 (0.0029) -[2025-01-15 20:06:45,597][00513] Fps is (10 sec: 4099.1, 60 sec: 4027.8, 300 sec: 4026.6). Total num frames: 2461696. 
Throughput: 0: 977.9. Samples: 615184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:06:45,599][00513] Avg episode reward: [(0, '4.757')] -[2025-01-15 20:06:50,597][00513] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4026.6). Total num frames: 2486272. Throughput: 0: 1012.5. Samples: 618842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:06:50,599][00513] Avg episode reward: [(0, '4.826')] -[2025-01-15 20:06:54,025][04047] Updated weights for policy 0, policy_version 610 (0.0023) -[2025-01-15 20:06:55,608][00513] Fps is (10 sec: 4091.5, 60 sec: 4027.0, 300 sec: 3998.7). Total num frames: 2502656. Throughput: 0: 1026.0. Samples: 624880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:06:55,612][00513] Avg episode reward: [(0, '4.873')] -[2025-01-15 20:07:00,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2519040. Throughput: 0: 979.4. Samples: 629982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:07:00,605][00513] Avg episode reward: [(0, '4.645')] -[2025-01-15 20:07:04,248][04047] Updated weights for policy 0, policy_version 620 (0.0026) -[2025-01-15 20:07:05,597][00513] Fps is (10 sec: 4100.5, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2543616. Throughput: 0: 990.7. Samples: 633542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:07:05,602][00513] Avg episode reward: [(0, '4.595')] -[2025-01-15 20:07:10,600][00513] Fps is (10 sec: 4504.5, 60 sec: 4095.8, 300 sec: 4012.7). Total num frames: 2564096. Throughput: 0: 1044.8. Samples: 640502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:07:10,603][00513] Avg episode reward: [(0, '4.762')] -[2025-01-15 20:07:15,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 2576384. Throughput: 0: 981.4. Samples: 644900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:07:15,602][00513] Avg episode reward: [(0, '4.843')] -[2025-01-15 20:07:15,630][04047] Updated weights for policy 0, policy_version 630 (0.0025) -[2025-01-15 20:07:20,597][00513] Fps is (10 sec: 3687.3, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2600960. Throughput: 0: 976.0. Samples: 648262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:07:20,600][00513] Avg episode reward: [(0, '4.695')] -[2025-01-15 20:07:24,086][04047] Updated weights for policy 0, policy_version 640 (0.0014) -[2025-01-15 20:07:25,597][00513] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 2625536. Throughput: 0: 1034.2. Samples: 655488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:07:25,599][00513] Avg episode reward: [(0, '4.712')] -[2025-01-15 20:07:30,601][00513] Fps is (10 sec: 4094.3, 60 sec: 3959.2, 300 sec: 3998.8). Total num frames: 2641920. Throughput: 0: 1010.3. Samples: 660650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:07:30,612][00513] Avg episode reward: [(0, '4.815')] -[2025-01-15 20:07:35,290][04047] Updated weights for policy 0, policy_version 650 (0.0021) -[2025-01-15 20:07:35,597][00513] Fps is (10 sec: 3686.4, 60 sec: 4028.2, 300 sec: 4012.7). Total num frames: 2662400. Throughput: 0: 984.8. Samples: 663160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:07:35,599][00513] Avg episode reward: [(0, '4.792')] -[2025-01-15 20:07:40,597][00513] Fps is (10 sec: 4507.4, 60 sec: 4164.3, 300 sec: 4026.6). Total num frames: 2686976. Throughput: 0: 1007.5. Samples: 670208. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:07:40,599][00513] Avg episode reward: [(0, '4.665')] -[2025-01-15 20:07:44,882][04047] Updated weights for policy 0, policy_version 660 (0.0019) -[2025-01-15 20:07:45,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2703360. Throughput: 0: 1027.7. Samples: 676230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:07:45,601][00513] Avg episode reward: [(0, '4.830')] -[2025-01-15 20:07:50,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2719744. Throughput: 0: 996.6. Samples: 678388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:07:50,599][00513] Avg episode reward: [(0, '4.914')] -[2025-01-15 20:07:50,615][04034] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000664_2719744.pth... -[2025-01-15 20:07:50,764][04034] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000429_1757184.pth -[2025-01-15 20:07:55,041][04047] Updated weights for policy 0, policy_version 670 (0.0014) -[2025-01-15 20:07:55,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4028.5, 300 sec: 4012.7). Total num frames: 2744320. Throughput: 0: 987.8. Samples: 684952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:07:55,604][00513] Avg episode reward: [(0, '4.820')] -[2025-01-15 20:08:00,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 2764800. Throughput: 0: 1045.3. Samples: 691940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:08:00,604][00513] Avg episode reward: [(0, '4.660')] -[2025-01-15 20:08:05,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2781184. Throughput: 0: 1020.7. Samples: 694194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:08:05,599][00513] Avg episode reward: [(0, '4.638')] -[2025-01-15 20:08:06,406][04047] Updated weights for policy 0, policy_version 680 (0.0033) -[2025-01-15 20:08:10,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 4012.7). Total num frames: 2801664. Throughput: 0: 984.0. Samples: 699766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:08:10,599][00513] Avg episode reward: [(0, '4.439')] -[2025-01-15 20:08:14,928][04047] Updated weights for policy 0, policy_version 690 (0.0028) -[2025-01-15 20:08:15,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4012.8). Total num frames: 2826240. Throughput: 0: 1027.6. Samples: 706890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:08:15,604][00513] Avg episode reward: [(0, '4.666')] -[2025-01-15 20:08:20,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 2842624. Throughput: 0: 1034.1. Samples: 709696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:08:20,603][00513] Avg episode reward: [(0, '4.849')] -[2025-01-15 20:08:25,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2863104. Throughput: 0: 978.4. Samples: 714236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:08:25,603][00513] Avg episode reward: [(0, '4.824')] -[2025-01-15 20:08:26,373][04047] Updated weights for policy 0, policy_version 700 (0.0015) -[2025-01-15 20:08:30,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4028.0, 300 sec: 4012.7). Total num frames: 2883584. Throughput: 0: 1002.8. Samples: 721356. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:08:30,602][00513] Avg episode reward: [(0, '4.787')] -[2025-01-15 20:08:35,597][00513] Fps is (10 sec: 4095.8, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2904064. Throughput: 0: 1035.2. Samples: 724972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:08:35,600][00513] Avg episode reward: [(0, '4.780')] -[2025-01-15 20:08:35,846][04047] Updated weights for policy 0, policy_version 710 (0.0037) -[2025-01-15 20:08:40,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2920448. Throughput: 0: 991.0. Samples: 729548. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:08:40,600][00513] Avg episode reward: [(0, '4.519')] -[2025-01-15 20:08:45,597][00513] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2940928. Throughput: 0: 975.3. Samples: 735830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2025-01-15 20:08:45,604][00513] Avg episode reward: [(0, '4.629')] -[2025-01-15 20:08:46,433][04047] Updated weights for policy 0, policy_version 720 (0.0035) -[2025-01-15 20:08:50,602][00513] Fps is (10 sec: 4503.3, 60 sec: 4095.6, 300 sec: 4012.6). Total num frames: 2965504. Throughput: 0: 1002.4. Samples: 739308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:08:50,606][00513] Avg episode reward: [(0, '4.646')] -[2025-01-15 20:08:55,599][00513] Fps is (10 sec: 4095.1, 60 sec: 3959.3, 300 sec: 3998.8). Total num frames: 2981888. Throughput: 0: 1004.9. Samples: 744990. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:08:55,607][00513] Avg episode reward: [(0, '4.528')] -[2025-01-15 20:08:57,831][04047] Updated weights for policy 0, policy_version 730 (0.0029) -[2025-01-15 20:09:00,597][00513] Fps is (10 sec: 3688.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 3002368. Throughput: 0: 963.5. Samples: 750246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:09:00,603][00513] Avg episode reward: [(0, '4.592')] -[2025-01-15 20:09:05,597][00513] Fps is (10 sec: 4096.9, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3022848. Throughput: 0: 977.6. Samples: 753690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:09:05,600][00513] Avg episode reward: [(0, '4.556')] -[2025-01-15 20:09:06,815][04047] Updated weights for policy 0, policy_version 740 (0.0025) -[2025-01-15 20:09:10,597][00513] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3039232. Throughput: 0: 1017.3. Samples: 760014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:09:10,600][00513] Avg episode reward: [(0, '4.575')] -[2025-01-15 20:09:15,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3984.9). Total num frames: 3055616. Throughput: 0: 950.6. Samples: 764134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:09:15,600][00513] Avg episode reward: [(0, '4.840')] -[2025-01-15 20:09:18,560][04047] Updated weights for policy 0, policy_version 750 (0.0030) -[2025-01-15 20:09:20,597][00513] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 3080192. Throughput: 0: 946.4. Samples: 767558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:09:20,604][00513] Avg episode reward: [(0, '4.655')] -[2025-01-15 20:09:25,604][00513] Fps is (10 sec: 4502.6, 60 sec: 3959.0, 300 sec: 3998.7). Total num frames: 3100672. Throughput: 0: 997.3. Samples: 774434. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:09:25,609][00513] Avg episode reward: [(0, '4.597')] -[2025-01-15 20:09:29,089][04047] Updated weights for policy 0, policy_version 760 (0.0016) -[2025-01-15 20:09:30,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 3117056. Throughput: 0: 963.3. Samples: 779180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:09:30,599][00513] Avg episode reward: [(0, '4.775')] -[2025-01-15 20:09:35,597][00513] Fps is (10 sec: 3688.8, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 3137536. Throughput: 0: 944.2. Samples: 781794. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:09:35,601][00513] Avg episode reward: [(0, '4.822')] -[2025-01-15 20:09:38,891][04047] Updated weights for policy 0, policy_version 770 (0.0030) -[2025-01-15 20:09:40,597][00513] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3162112. Throughput: 0: 975.2. Samples: 788872. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2025-01-15 20:09:40,600][00513] Avg episode reward: [(0, '4.838')] -[2025-01-15 20:09:45,601][00513] Fps is (10 sec: 4094.6, 60 sec: 3959.2, 300 sec: 3984.9). Total num frames: 3178496. Throughput: 0: 989.7. Samples: 794786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:09:45,603][00513] Avg episode reward: [(0, '4.730')] -[2025-01-15 20:09:50,017][04047] Updated weights for policy 0, policy_version 780 (0.0026) -[2025-01-15 20:09:50,597][00513] Fps is (10 sec: 3276.9, 60 sec: 3823.3, 300 sec: 3985.0). Total num frames: 3194880. Throughput: 0: 961.7. Samples: 796966. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:09:50,605][00513] Avg episode reward: [(0, '4.766')] -[2025-01-15 20:09:50,620][04034] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000780_3194880.pth... -[2025-01-15 20:09:50,750][04034] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000547_2240512.pth -[2025-01-15 20:09:55,597][00513] Fps is (10 sec: 4097.5, 60 sec: 3959.6, 300 sec: 3998.8). Total num frames: 3219456. Throughput: 0: 965.2. Samples: 803446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:09:55,602][00513] Avg episode reward: [(0, '4.702')] -[2025-01-15 20:09:59,044][04047] Updated weights for policy 0, policy_version 790 (0.0021) -[2025-01-15 20:10:00,597][00513] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3239936. Throughput: 0: 1020.9. Samples: 810076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:10:00,599][00513] Avg episode reward: [(0, '4.842')] -[2025-01-15 20:10:05,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3971.0). Total num frames: 3252224. Throughput: 0: 991.8. Samples: 812188. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:10:05,600][00513] Avg episode reward: [(0, '4.700')] -[2025-01-15 20:10:10,294][04047] Updated weights for policy 0, policy_version 800 (0.0041) -[2025-01-15 20:10:10,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3276800. Throughput: 0: 963.3. Samples: 817778. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:10:10,605][00513] Avg episode reward: [(0, '4.569')] -[2025-01-15 20:10:15,597][00513] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 3301376. Throughput: 0: 1015.1. Samples: 824858. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:10:15,599][00513] Avg episode reward: [(0, '4.664')] -[2025-01-15 20:10:20,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3313664. Throughput: 0: 1018.7. Samples: 827634. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:10:20,601][00513] Avg episode reward: [(0, '4.872')] -[2025-01-15 20:10:20,735][04047] Updated weights for policy 0, policy_version 810 (0.0027) -[2025-01-15 20:10:25,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3891.6, 300 sec: 3984.9). Total num frames: 3334144. Throughput: 0: 961.2. Samples: 832124. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:10:25,603][00513] Avg episode reward: [(0, '4.819')] -[2025-01-15 20:10:30,546][04047] Updated weights for policy 0, policy_version 820 (0.0013) -[2025-01-15 20:10:30,597][00513] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3358720. Throughput: 0: 985.1. Samples: 839112. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:10:30,605][00513] Avg episode reward: [(0, '4.499')] -[2025-01-15 20:10:35,598][00513] Fps is (10 sec: 4095.5, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 3375104. Throughput: 0: 1014.3. Samples: 842610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:10:35,603][00513] Avg episode reward: [(0, '4.527')] -[2025-01-15 20:10:40,598][00513] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3971.0). Total num frames: 3391488. Throughput: 0: 969.9. Samples: 847094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:10:40,600][00513] Avg episode reward: [(0, '4.728')] -[2025-01-15 20:10:42,244][04047] Updated weights for policy 0, policy_version 830 (0.0028) -[2025-01-15 20:10:45,597][00513] Fps is (10 sec: 3686.9, 60 sec: 3891.4, 300 sec: 3984.9). Total num frames: 3411968. Throughput: 0: 961.2. Samples: 853328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:10:45,604][00513] Avg episode reward: [(0, '4.881')] -[2025-01-15 20:10:50,597][00513] Fps is (10 sec: 4505.8, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3436544. Throughput: 0: 991.4. Samples: 856800. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:10:50,602][00513] Avg episode reward: [(0, '4.882')] -[2025-01-15 20:10:50,939][04047] Updated weights for policy 0, policy_version 840 (0.0034) -[2025-01-15 20:10:55,597][00513] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3452928. Throughput: 0: 995.8. Samples: 862590. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:10:55,602][00513] Avg episode reward: [(0, '4.697')] -[2025-01-15 20:11:00,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3971.0). Total num frames: 3469312. Throughput: 0: 949.6. Samples: 867588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:11:00,600][00513] Avg episode reward: [(0, '4.746')] -[2025-01-15 20:11:02,644][04047] Updated weights for policy 0, policy_version 850 (0.0029) -[2025-01-15 20:11:05,597][00513] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3493888. Throughput: 0: 965.6. Samples: 871088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:11:05,604][00513] Avg episode reward: [(0, '4.409')] -[2025-01-15 20:11:10,597][00513] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3510272. Throughput: 0: 1010.5. Samples: 877598. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:11:10,599][00513] Avg episode reward: [(0, '4.308')] -[2025-01-15 20:11:13,939][04047] Updated weights for policy 0, policy_version 860 (0.0023) -[2025-01-15 20:11:15,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3957.2). Total num frames: 3526656. Throughput: 0: 949.3. Samples: 881828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:11:15,600][00513] Avg episode reward: [(0, '4.448')] -[2025-01-15 20:11:20,597][00513] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3551232. Throughput: 0: 948.1. Samples: 885274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:11:20,603][00513] Avg episode reward: [(0, '4.608')] -[2025-01-15 20:11:23,035][04047] Updated weights for policy 0, policy_version 870 (0.0028) -[2025-01-15 20:11:25,597][00513] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3571712. Throughput: 0: 1002.4. Samples: 892202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:11:25,602][00513] Avg episode reward: [(0, '4.514')] -[2025-01-15 20:11:30,600][00513] Fps is (10 sec: 3685.5, 60 sec: 3822.8, 300 sec: 3957.2). Total num frames: 3588096. Throughput: 0: 971.9. Samples: 897066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:11:30,602][00513] Avg episode reward: [(0, '4.469')] -[2025-01-15 20:11:34,462][04047] Updated weights for policy 0, policy_version 880 (0.0031) -[2025-01-15 20:11:35,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3971.0). Total num frames: 3608576. Throughput: 0: 948.9. Samples: 899502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:11:35,600][00513] Avg episode reward: [(0, '4.516')] -[2025-01-15 20:11:40,597][00513] Fps is (10 sec: 4097.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3629056. Throughput: 0: 972.4. Samples: 906346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2025-01-15 20:11:40,603][00513] Avg episode reward: [(0, '4.613')] -[2025-01-15 20:11:44,228][04047] Updated weights for policy 0, policy_version 890 (0.0029) -[2025-01-15 20:11:45,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3645440. Throughput: 0: 991.8. Samples: 912218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:11:45,605][00513] Avg episode reward: [(0, '4.515')] -[2025-01-15 20:11:50,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3943.4). Total num frames: 3665920. Throughput: 0: 961.5. Samples: 914356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:11:50,599][00513] Avg episode reward: [(0, '4.450')] -[2025-01-15 20:11:50,611][04034] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000895_3665920.pth... -[2025-01-15 20:11:50,757][04034] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000664_2719744.pth -[2025-01-15 20:11:55,220][04047] Updated weights for policy 0, policy_version 900 (0.0026) -[2025-01-15 20:11:55,597][00513] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3686400. Throughput: 0: 954.1. Samples: 920534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2025-01-15 20:11:55,604][00513] Avg episode reward: [(0, '4.536')] -[2025-01-15 20:12:00,600][00513] Fps is (10 sec: 4094.9, 60 sec: 3959.3, 300 sec: 3943.2). Total num frames: 3706880. Throughput: 0: 1007.4. Samples: 927162. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:12:00,602][00513] Avg episode reward: [(0, '4.473')] -[2025-01-15 20:12:05,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 3719168. Throughput: 0: 976.5. Samples: 929216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:12:05,600][00513] Avg episode reward: [(0, '4.543')] -[2025-01-15 20:12:06,900][04047] Updated weights for policy 0, policy_version 910 (0.0018) -[2025-01-15 20:12:10,597][00513] Fps is (10 sec: 3277.6, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 3739648. Throughput: 0: 933.5. Samples: 934208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:12:10,599][00513] Avg episode reward: [(0, '4.383')] -[2025-01-15 20:12:15,597][00513] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3764224. Throughput: 0: 977.9. Samples: 941070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:12:15,599][00513] Avg episode reward: [(0, '4.318')] -[2025-01-15 20:12:16,040][04047] Updated weights for policy 0, policy_version 920 (0.0015) -[2025-01-15 20:12:20,598][00513] Fps is (10 sec: 4095.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3780608. Throughput: 0: 987.8. Samples: 943954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:12:20,604][00513] Avg episode reward: [(0, '4.304')] -[2025-01-15 20:12:25,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3915.6). Total num frames: 3796992. Throughput: 0: 930.0. Samples: 948196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:12:25,605][00513] Avg episode reward: [(0, '4.508')] -[2025-01-15 20:12:27,877][04047] Updated weights for policy 0, policy_version 930 (0.0014) -[2025-01-15 20:12:30,597][00513] Fps is (10 sec: 4096.2, 60 sec: 3891.4, 300 sec: 3929.4). Total num frames: 3821568. Throughput: 0: 951.8. Samples: 955050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2025-01-15 20:12:30,605][00513] Avg episode reward: [(0, '4.566')] -[2025-01-15 20:12:35,597][00513] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3837952. Throughput: 0: 981.6. Samples: 958530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:12:35,599][00513] Avg episode reward: [(0, '4.622')] -[2025-01-15 20:12:38,739][04047] Updated weights for policy 0, policy_version 940 (0.0036) -[2025-01-15 20:12:40,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3901.6). Total num frames: 3854336. Throughput: 0: 948.8. Samples: 963230. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2025-01-15 20:12:40,600][00513] Avg episode reward: [(0, '4.725')] -[2025-01-15 20:12:45,597][00513] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3878912. Throughput: 0: 934.1. Samples: 969192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:12:45,599][00513] Avg episode reward: [(0, '4.620')] -[2025-01-15 20:12:48,161][04047] Updated weights for policy 0, policy_version 950 (0.0016) -[2025-01-15 20:12:50,597][00513] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3899392. Throughput: 0: 968.0. Samples: 972778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:12:50,600][00513] Avg episode reward: [(0, '4.676')] -[2025-01-15 20:12:55,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3915776. Throughput: 0: 989.5. Samples: 978736. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2025-01-15 20:12:55,602][00513] Avg episode reward: [(0, '4.784')] -[2025-01-15 20:12:59,666][04047] Updated weights for policy 0, policy_version 960 (0.0020) -[2025-01-15 20:13:00,597][00513] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3915.5). Total num frames: 3936256. Throughput: 0: 947.7. Samples: 983716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2025-01-15 20:13:00,599][00513] Avg episode reward: [(0, '4.941')] -[2025-01-15 20:13:05,597][00513] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3956736. Throughput: 0: 962.8. Samples: 987278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:13:05,600][00513] Avg episode reward: [(0, '4.631')] -[2025-01-15 20:13:08,504][04047] Updated weights for policy 0, policy_version 970 (0.0025) -[2025-01-15 20:13:10,597][00513] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 3977216. Throughput: 0: 1015.6. Samples: 993896. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:13:10,599][00513] Avg episode reward: [(0, '4.612')] -[2025-01-15 20:13:15,597][00513] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 3989504. Throughput: 0: 958.9. Samples: 998200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:13:15,603][00513] Avg episode reward: [(0, '4.627')] -[2025-01-15 20:13:18,263][00513] Component Batcher_0 stopped! -[2025-01-15 20:13:18,263][04034] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2025-01-15 20:13:18,263][04034] Stopping Batcher_0... -[2025-01-15 20:13:18,274][04034] Loop batcher_evt_loop terminating... -[2025-01-15 20:13:18,315][04047] Weights refcount: 2 0 -[2025-01-15 20:13:18,320][04047] Stopping InferenceWorker_p0-w0... -[2025-01-15 20:13:18,321][04047] Loop inference_proc0-0_evt_loop terminating... -[2025-01-15 20:13:18,320][00513] Component InferenceWorker_p0-w0 stopped! -[2025-01-15 20:13:18,397][04034] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000780_3194880.pth -[2025-01-15 20:13:18,424][04034] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2025-01-15 20:13:18,617][04034] Stopping LearnerWorker_p0... -[2025-01-15 20:13:18,619][04034] Loop learner_proc0_evt_loop terminating... -[2025-01-15 20:13:18,615][00513] Component RolloutWorker_w4 stopped! -[2025-01-15 20:13:18,626][00513] Component LearnerWorker_p0 stopped! -[2025-01-15 20:13:18,631][04050] Stopping RolloutWorker_w4... -[2025-01-15 20:13:18,632][04050] Loop rollout_proc4_evt_loop terminating... -[2025-01-15 20:13:18,644][04053] Stopping RolloutWorker_w3... -[2025-01-15 20:13:18,644][04053] Loop rollout_proc3_evt_loop terminating... -[2025-01-15 20:13:18,644][00513] Component RolloutWorker_w3 stopped! -[2025-01-15 20:13:18,658][04051] Stopping RolloutWorker_w5... -[2025-01-15 20:13:18,658][00513] Component RolloutWorker_w2 stopped! -[2025-01-15 20:13:18,663][00513] Component RolloutWorker_w5 stopped! -[2025-01-15 20:13:18,667][04049] Stopping RolloutWorker_w2... -[2025-01-15 20:13:18,659][04051] Loop rollout_proc5_evt_loop terminating... -[2025-01-15 20:13:18,679][00513] Component RolloutWorker_w7 stopped! -[2025-01-15 20:13:18,667][04049] Loop rollout_proc2_evt_loop terminating... -[2025-01-15 20:13:18,679][04052] Stopping RolloutWorker_w7... -[2025-01-15 20:13:18,684][04052] Loop rollout_proc7_evt_loop terminating... 
-[2025-01-15 20:13:18,701][04048] Stopping RolloutWorker_w1... -[2025-01-15 20:13:18,695][00513] Component RolloutWorker_w0 stopped! -[2025-01-15 20:13:18,702][00513] Component RolloutWorker_w1 stopped! -[2025-01-15 20:13:18,705][04055] Stopping RolloutWorker_w0... -[2025-01-15 20:13:18,706][04055] Loop rollout_proc0_evt_loop terminating... -[2025-01-15 20:13:18,702][04048] Loop rollout_proc1_evt_loop terminating... -[2025-01-15 20:13:18,732][00513] Component RolloutWorker_w6 stopped! -[2025-01-15 20:13:18,736][00513] Waiting for process learner_proc0 to stop... -[2025-01-15 20:13:18,741][04054] Stopping RolloutWorker_w6... -[2025-01-15 20:13:18,743][04054] Loop rollout_proc6_evt_loop terminating... -[2025-01-15 20:13:20,283][00513] Waiting for process inference_proc0-0 to join... -[2025-01-15 20:13:20,290][00513] Waiting for process rollout_proc0 to join... -[2025-01-15 20:13:22,129][00513] Waiting for process rollout_proc1 to join... -[2025-01-15 20:13:22,132][00513] Waiting for process rollout_proc2 to join... -[2025-01-15 20:13:22,138][00513] Waiting for process rollout_proc3 to join... -[2025-01-15 20:13:22,141][00513] Waiting for process rollout_proc4 to join... -[2025-01-15 20:13:22,144][00513] Waiting for process rollout_proc5 to join... -[2025-01-15 20:13:22,148][00513] Waiting for process rollout_proc6 to join... -[2025-01-15 20:13:22,152][00513] Waiting for process rollout_proc7 to join... -[2025-01-15 20:13:22,155][00513] Batcher 0 profile tree view: -batching: 25.6401, releasing_batches: 0.0269 -[2025-01-15 20:13:22,158][00513] InferenceWorker_p0-w0 profile tree view: -wait_policy: 0.0001 - wait_policy_total: 403.9082 -update_model: 8.4210 - weight_update: 0.0018 -one_step: 0.0027 - handle_policy_step: 563.9452 - deserialize: 14.0492, stack: 2.9447, obs_to_device_normalize: 121.0485, forward: 282.4296, send_messages: 27.1289 - prepare_outputs: 87.4528 - to_cpu: 53.6402 -[2025-01-15 20:13:22,159][00513] Learner 0 profile tree view: -misc: 0.0054, prepare_batch: 13.4539 -train: 74.6533 - epoch_init: 0.0055, minibatch_init: 0.0089, losses_postprocess: 0.6264, kl_divergence: 0.6695, after_optimizer: 33.7855 - calculate_losses: 27.0727 - losses_init: 0.0136, forward_head: 1.3775, bptt_initial: 18.1180, tail: 1.1195, advantages_returns: 0.3496, losses: 3.9889 - bptt: 1.8540 - bptt_forward_core: 1.7564 - update: 11.8380 - clip: 0.8982 -[2025-01-15 20:13:22,160][00513] RolloutWorker_w0 profile tree view: -wait_for_trajectories: 0.3145, enqueue_policy_requests: 90.7809, env_step: 803.9700, overhead: 12.0578, complete_rollouts: 6.3028 -save_policy_outputs: 20.2678 - split_output_tensors: 8.3586 -[2025-01-15 20:13:22,164][00513] RolloutWorker_w7 profile tree view: -wait_for_trajectories: 0.3205, enqueue_policy_requests: 93.6584, env_step: 799.2973, overhead: 11.9665, complete_rollouts: 6.9354 -save_policy_outputs: 20.6605 - split_output_tensors: 8.3095 -[2025-01-15 20:13:22,166][00513] Loop Runner_EvtLoop terminating... -[2025-01-15 20:13:22,167][00513] Runner profile tree view: -main_loop: 1045.7117 -[2025-01-15 20:13:22,168][00513] Collected {0: 4005888}, FPS: 3830.8 -[2025-01-15 20:14:50,945][00513] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json -[2025-01-15 20:14:50,946][00513] Overriding arg 'num_workers' with value 1 passed from command line -[2025-01-15 20:14:50,949][00513] Adding new argument 'no_render'=True that is not in the saved config file! 
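The runner summary just above reports "Collected {0: 4005888}, FPS: 3830.8" against "main_loop: 1045.7117"; the headline FPS is simply total environment frames divided by the runner's wall-clock time. A minimal sketch of that check, plus a regex for pulling the periodic "Fps is (...)" and "Avg episode reward" entries out of a saved copy of this log (the file name is an assumption; the nan entries printed before any data arrives simply do not match and are skipped):

import re

# Values reported verbatim in the runner summary above.
total_env_frames = 4005888
main_loop_seconds = 1045.7117
print(f"overall FPS ~ {total_env_frames / main_loop_seconds:.1f}")  # ~3830.8

# Hypothetical helper: extract the periodic throughput/reward entries from a
# saved copy of this log. The filename below is an assumption.
fps_re = re.compile(
    r"Fps is \(10 sec: ([\d.]+), 60 sec: ([\d.]+), 300 sec: ([\d.]+)\)\. "
    r"Total num frames: (\d+)"
)
reward_re = re.compile(r"Avg episode reward: \[\(0, '([\d.]+)'\)\]")

with open("sf_log.txt") as f:
    text = f.read()

fps_points = [(int(m.group(4)), float(m.group(1))) for m in fps_re.finditer(text)]
rewards = [float(m.group(1)) for m in reward_re.finditer(text)]
print(len(fps_points), "throughput samples,", len(rewards), "reward samples")
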
-[2025-01-15 20:14:50,953][00513] Adding new argument 'save_video'=True that is not in the saved config file! -[2025-01-15 20:14:50,954][00513] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2025-01-15 20:14:50,955][00513] Adding new argument 'video_name'=None that is not in the saved config file! -[2025-01-15 20:14:50,957][00513] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! -[2025-01-15 20:14:50,958][00513] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2025-01-15 20:14:50,959][00513] Adding new argument 'push_to_hub'=False that is not in the saved config file! -[2025-01-15 20:14:50,961][00513] Adding new argument 'hf_repository'=None that is not in the saved config file! -[2025-01-15 20:14:50,962][00513] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2025-01-15 20:14:50,963][00513] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2025-01-15 20:14:50,964][00513] Adding new argument 'train_script'=None that is not in the saved config file! -[2025-01-15 20:14:50,966][00513] Adding new argument 'enjoy_script'=None that is not in the saved config file! -[2025-01-15 20:14:50,967][00513] Using frameskip 1 and render_action_repeat=4 for evaluation -[2025-01-15 20:14:51,002][00513] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 20:14:51,006][00513] RunningMeanStd input shape: (3, 72, 128) -[2025-01-15 20:14:51,008][00513] RunningMeanStd input shape: (1,) -[2025-01-15 20:14:51,023][00513] ConvEncoder: input_channels=3 -[2025-01-15 20:14:51,130][00513] Conv encoder output size: 512 -[2025-01-15 20:14:51,132][00513] Policy head output size: 512 -[2025-01-15 20:14:51,320][00513] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2025-01-15 20:14:52,134][00513] Num frames 100... -[2025-01-15 20:14:52,258][00513] Num frames 200... -[2025-01-15 20:14:52,393][00513] Num frames 300... -[2025-01-15 20:14:52,523][00513] Num frames 400... -[2025-01-15 20:14:52,640][00513] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 -[2025-01-15 20:14:52,641][00513] Avg episode reward: 5.480, avg true_objective: 4.480 -[2025-01-15 20:14:52,707][00513] Num frames 500... -[2025-01-15 20:14:52,829][00513] Num frames 600... -[2025-01-15 20:14:52,956][00513] Num frames 700... -[2025-01-15 20:14:53,079][00513] Num frames 800... -[2025-01-15 20:14:53,173][00513] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 -[2025-01-15 20:14:53,174][00513] Avg episode reward: 4.660, avg true_objective: 4.160 -[2025-01-15 20:14:53,262][00513] Num frames 900... -[2025-01-15 20:14:53,390][00513] Num frames 1000... -[2025-01-15 20:14:53,521][00513] Num frames 1100... -[2025-01-15 20:14:53,642][00513] Num frames 1200... -[2025-01-15 20:14:53,717][00513] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 -[2025-01-15 20:14:53,718][00513] Avg episode reward: 4.387, avg true_objective: 4.053 -[2025-01-15 20:14:53,823][00513] Num frames 1300... -[2025-01-15 20:14:53,942][00513] Num frames 1400... -[2025-01-15 20:14:54,064][00513] Num frames 1500... -[2025-01-15 20:14:54,193][00513] Num frames 1600... -[2025-01-15 20:14:54,246][00513] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000 -[2025-01-15 20:14:54,247][00513] Avg episode reward: 4.250, avg true_objective: 4.000 -[2025-01-15 20:14:54,373][00513] Num frames 1700... 
-[2025-01-15 20:14:54,508][00513] Num frames 1800... -[2025-01-15 20:14:54,634][00513] Num frames 1900... -[2025-01-15 20:14:54,791][00513] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968 -[2025-01-15 20:14:54,792][00513] Avg episode reward: 4.168, avg true_objective: 3.968 -[2025-01-15 20:14:54,815][00513] Num frames 2000... -[2025-01-15 20:14:54,936][00513] Num frames 2100... -[2025-01-15 20:14:55,058][00513] Num frames 2200... -[2025-01-15 20:14:55,182][00513] Num frames 2300... -[2025-01-15 20:14:55,318][00513] Avg episode rewards: #0: 4.113, true rewards: #0: 3.947 -[2025-01-15 20:14:55,319][00513] Avg episode reward: 4.113, avg true_objective: 3.947 -[2025-01-15 20:14:55,364][00513] Num frames 2400... -[2025-01-15 20:14:55,500][00513] Num frames 2500... -[2025-01-15 20:14:55,626][00513] Num frames 2600... -[2025-01-15 20:14:55,748][00513] Num frames 2700... -[2025-01-15 20:14:55,867][00513] Num frames 2800... -[2025-01-15 20:14:55,942][00513] Avg episode rewards: #0: 4.309, true rewards: #0: 4.023 -[2025-01-15 20:14:55,943][00513] Avg episode reward: 4.309, avg true_objective: 4.023 -[2025-01-15 20:14:56,046][00513] Num frames 2900... -[2025-01-15 20:14:56,167][00513] Num frames 3000... -[2025-01-15 20:14:56,290][00513] Num frames 3100... -[2025-01-15 20:14:56,417][00513] Num frames 3200... -[2025-01-15 20:14:56,469][00513] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000 -[2025-01-15 20:14:56,470][00513] Avg episode reward: 4.250, avg true_objective: 4.000 -[2025-01-15 20:14:56,595][00513] Num frames 3300... -[2025-01-15 20:14:56,715][00513] Num frames 3400... -[2025-01-15 20:14:56,836][00513] Num frames 3500... -[2025-01-15 20:14:56,991][00513] Avg episode rewards: #0: 4.204, true rewards: #0: 3.982 -[2025-01-15 20:14:56,992][00513] Avg episode reward: 4.204, avg true_objective: 3.982 -[2025-01-15 20:14:57,014][00513] Num frames 3600... -[2025-01-15 20:14:57,143][00513] Num frames 3700... -[2025-01-15 20:14:57,282][00513] Num frames 3800... -[2025-01-15 20:14:57,458][00513] Num frames 3900... -[2025-01-15 20:14:57,640][00513] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968 -[2025-01-15 20:14:57,643][00513] Avg episode reward: 4.168, avg true_objective: 3.968 -[2025-01-15 20:15:16,094][00513] Replay video saved to /content/train_dir/default_experiment/replay.mp4! -[2025-01-15 20:16:16,957][00513] Environment doom_basic already registered, overwriting... -[2025-01-15 20:16:16,960][00513] Environment doom_two_colors_easy already registered, overwriting... -[2025-01-15 20:16:16,962][00513] Environment doom_two_colors_hard already registered, overwriting... -[2025-01-15 20:16:16,963][00513] Environment doom_dm already registered, overwriting... -[2025-01-15 20:16:16,965][00513] Environment doom_dwango5 already registered, overwriting... -[2025-01-15 20:16:16,966][00513] Environment doom_my_way_home_flat_actions already registered, overwriting... -[2025-01-15 20:16:16,968][00513] Environment doom_defend_the_center_flat_actions already registered, overwriting... -[2025-01-15 20:16:16,969][00513] Environment doom_my_way_home already registered, overwriting... -[2025-01-15 20:16:16,970][00513] Environment doom_deadly_corridor already registered, overwriting... -[2025-01-15 20:16:16,971][00513] Environment doom_defend_the_center already registered, overwriting... -[2025-01-15 20:16:16,972][00513] Environment doom_defend_the_line already registered, overwriting... -[2025-01-15 20:16:16,974][00513] Environment doom_health_gathering already registered, overwriting... 
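The evaluation pass above plays max_num_episodes=10 episodes and, each time one finishes, prints the running mean of the shaped episode reward and of the "true" objective. Differencing those running means recovers the per-episode returns (e.g. 2 * 4.160 - 4.480 = 3.84); a small sketch using the true returns inferred that way reproduces the averages printed in the log:

# True per-episode returns recovered by differencing the running means above.
true_returns = [4.48, 3.84, 3.84, 3.84, 3.84, 3.84, 4.48, 3.84, 3.84, 3.84]

running, total = [], 0.0
for i, r in enumerate(true_returns, start=1):
    total += r
    running.append(round(total / i, 3))
print(running)
# [4.48, 4.16, 4.053, 4.0, 3.968, 3.947, 4.023, 4.0, 3.982, 3.968]
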
-[2025-01-15 20:16:16,975][00513] Environment doom_health_gathering_supreme already registered, overwriting... -[2025-01-15 20:16:16,976][00513] Environment doom_battle already registered, overwriting... -[2025-01-15 20:16:16,977][00513] Environment doom_battle2 already registered, overwriting... -[2025-01-15 20:16:16,978][00513] Environment doom_duel_bots already registered, overwriting... -[2025-01-15 20:16:16,979][00513] Environment doom_deathmatch_bots already registered, overwriting... -[2025-01-15 20:16:16,980][00513] Environment doom_duel already registered, overwriting... -[2025-01-15 20:16:16,981][00513] Environment doom_deathmatch_full already registered, overwriting... -[2025-01-15 20:16:16,982][00513] Environment doom_benchmark already registered, overwriting... -[2025-01-15 20:16:16,984][00513] register_encoder_factory: -[2025-01-15 20:16:17,009][00513] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json -[2025-01-15 20:16:17,010][00513] Overriding arg 'train_for_env_steps' with value 5000000 passed from command line -[2025-01-15 20:16:17,016][00513] Experiment dir /content/train_dir/default_experiment already exists! -[2025-01-15 20:16:17,017][00513] Resuming existing experiment from /content/train_dir/default_experiment... -[2025-01-15 20:16:17,018][00513] Weights and Biases integration disabled -[2025-01-15 20:16:17,023][00513] Environment var CUDA_VISIBLE_DEVICES is 0 - -[2025-01-15 20:16:19,245][00513] Starting experiment with the following configuration: -help=False -algo=APPO -env=doom_health_gathering_supreme -experiment=default_experiment -train_dir=/content/train_dir -restart_behavior=resume -device=gpu -seed=None -num_policies=1 -async_rl=True -serial_mode=False -batched_sampling=False -num_batches_to_accumulate=2 -worker_num_splits=2 -policy_workers_per_policy=1 -max_policy_lag=1000 -num_workers=8 -num_envs_per_worker=4 -batch_size=1024 -num_batches_per_epoch=1 -num_epochs=1 -rollout=32 -recurrence=32 -shuffle_minibatches=False -gamma=0.99 -reward_scale=1.0 -reward_clip=1000.0 -value_bootstrap=False -normalize_returns=True -exploration_loss_coeff=0.001 -value_loss_coeff=0.5 -kl_loss_coeff=0.0 -exploration_loss=symmetric_kl -gae_lambda=0.95 -ppo_clip_ratio=0.1 -ppo_clip_value=0.2 -with_vtrace=False -vtrace_rho=1.0 -vtrace_c=1.0 -optimizer=adam -adam_eps=1e-06 -adam_beta1=0.9 -adam_beta2=0.999 -max_grad_norm=4.0 -learning_rate=0.0001 -lr_schedule=constant -lr_schedule_kl_threshold=0.008 -lr_adaptive_min=1e-06 -lr_adaptive_max=0.01 -obs_subtract_mean=0.0 -obs_scale=255.0 -normalize_input=True -normalize_input_keys=None -decorrelate_experience_max_seconds=0 -decorrelate_envs_on_one_worker=True -actor_worker_gpus=[] -set_workers_cpu_affinity=True -force_envs_single_thread=False -default_niceness=0 -log_to_file=True -experiment_summaries_interval=10 -flush_summaries_interval=30 -stats_avg=100 -summaries_use_frameskip=True -heartbeat_interval=20 -heartbeat_reporting_interval=600 -train_for_env_steps=5000000 -train_for_seconds=10000000000 -save_every_sec=120 -keep_checkpoints=2 -load_checkpoint_kind=latest -save_milestones_sec=-1 -save_best_every_sec=5 -save_best_metric=reward -save_best_after=100000 -benchmark=False -encoder_mlp_layers=[512, 512] -encoder_conv_architecture=convnet_simple -encoder_conv_mlp_layers=[512] -use_rnn=True -rnn_size=512 -rnn_type=gru -rnn_num_layers=1 -decoder_mlp_layers=[] -nonlinearity=elu -policy_initialization=orthogonal -policy_init_gain=1.0 -actor_critic_share_weights=True -adaptive_stddev=True 
-continuous_tanh_scale=0.0 -initial_stddev=1.0 -use_env_info_cache=False -env_gpu_actions=False -env_gpu_observations=True -env_frameskip=4 -env_framestack=1 -pixel_format=CHW -use_record_episode_statistics=False -with_wandb=False -wandb_user=None -wandb_project=sample_factory -wandb_group=None -wandb_job_type=SF -wandb_tags=[] -with_pbt=False -pbt_mix_policies_in_one_env=True -pbt_period_env_steps=5000000 -pbt_start_mutation=20000000 -pbt_replace_fraction=0.3 -pbt_mutation_rate=0.15 -pbt_replace_reward_gap=0.1 -pbt_replace_reward_gap_absolute=1e-06 -pbt_optimize_gamma=False -pbt_target_objective=true_objective -pbt_perturb_min=1.1 -pbt_perturb_max=1.5 -num_agents=-1 -num_humans=0 -num_bots=-1 -start_bot_difficulty=None -timelimit=None -res_w=128 -res_h=72 -wide_aspect_ratio=False -eval_env_frameskip=1 -fps=35 -command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 -cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} -git_hash=unknown -git_repo_name=not a git repository -[2025-01-15 20:16:19,247][00513] Saving configuration to /content/train_dir/default_experiment/config.json... -[2025-01-15 20:16:19,251][00513] Rollout worker 0 uses device cpu -[2025-01-15 20:16:19,253][00513] Rollout worker 1 uses device cpu -[2025-01-15 20:16:19,254][00513] Rollout worker 2 uses device cpu -[2025-01-15 20:16:19,256][00513] Rollout worker 3 uses device cpu -[2025-01-15 20:16:19,258][00513] Rollout worker 4 uses device cpu -[2025-01-15 20:16:19,259][00513] Rollout worker 5 uses device cpu -[2025-01-15 20:16:19,261][00513] Rollout worker 6 uses device cpu -[2025-01-15 20:16:19,262][00513] Rollout worker 7 uses device cpu -[2025-01-15 20:16:19,337][00513] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-01-15 20:16:19,338][00513] InferenceWorker_p0-w0: min num requests: 2 -[2025-01-15 20:16:19,379][00513] Starting all processes... -[2025-01-15 20:16:19,380][00513] Starting process learner_proc0 -[2025-01-15 20:16:19,429][00513] Starting all processes... 
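A consistency check on the sampling setup in the configuration dump above: 8 rollout workers with 4 envs per worker give 32 parallel environments, and a 32-step rollout from each therefore yields exactly one batch_size of 1024 transitions per training iteration. This is a derived relationship, not something the log states explicitly:

# All four values are taken from the configuration dump above.
num_workers = 8
num_envs_per_worker = 4
rollout = 32
batch_size = 1024

total_envs = num_workers * num_envs_per_worker   # 32 parallel environments
transitions_per_round = total_envs * rollout     # 32 * 32 = 1024
assert transitions_per_round == batch_size
print(total_envs, transitions_per_round)
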
-[2025-01-15 20:16:19,435][00513] Starting process inference_proc0-0 -[2025-01-15 20:16:19,436][00513] Starting process rollout_proc0 -[2025-01-15 20:16:19,436][00513] Starting process rollout_proc1 -[2025-01-15 20:16:19,436][00513] Starting process rollout_proc2 -[2025-01-15 20:16:19,436][00513] Starting process rollout_proc3 -[2025-01-15 20:16:19,436][00513] Starting process rollout_proc4 -[2025-01-15 20:16:19,437][00513] Starting process rollout_proc5 -[2025-01-15 20:16:19,438][00513] Starting process rollout_proc6 -[2025-01-15 20:16:19,438][00513] Starting process rollout_proc7 -[2025-01-15 20:16:35,577][12487] Worker 3 uses CPU cores [1] -[2025-01-15 20:16:35,666][12470] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-01-15 20:16:35,667][12470] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 -[2025-01-15 20:16:35,787][12470] Num visible devices: 1 -[2025-01-15 20:16:35,816][12470] Starting seed is not provided -[2025-01-15 20:16:35,817][12470] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-01-15 20:16:35,817][12470] Initializing actor-critic model on device cuda:0 -[2025-01-15 20:16:35,818][12470] RunningMeanStd input shape: (3, 72, 128) -[2025-01-15 20:16:35,820][12470] RunningMeanStd input shape: (1,) -[2025-01-15 20:16:35,983][12470] ConvEncoder: input_channels=3 -[2025-01-15 20:16:36,180][12489] Worker 4 uses CPU cores [0] -[2025-01-15 20:16:36,365][12485] Worker 1 uses CPU cores [1] -[2025-01-15 20:16:36,527][12483] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-01-15 20:16:36,530][12483] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 -[2025-01-15 20:16:36,614][12486] Worker 2 uses CPU cores [0] -[2025-01-15 20:16:36,621][12491] Worker 7 uses CPU cores [1] -[2025-01-15 20:16:36,641][12483] Num visible devices: 1 -[2025-01-15 20:16:36,647][12490] Worker 6 uses CPU cores [0] -[2025-01-15 20:16:36,666][12488] Worker 5 uses CPU cores [1] -[2025-01-15 20:16:36,676][12484] Worker 0 uses CPU cores [0] -[2025-01-15 20:16:36,753][12470] Conv encoder output size: 512 -[2025-01-15 20:16:36,753][12470] Policy head output size: 512 -[2025-01-15 20:16:36,779][12470] Created Actor Critic model with architecture: -[2025-01-15 20:16:36,779][12470] ActorCriticSharedWeights( - (obs_normalizer): ObservationNormalizer( - (running_mean_std): RunningMeanStdDictInPlace( - (running_mean_std): ModuleDict( - (obs): RunningMeanStdInPlace() - ) - ) - ) - (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) - (encoder): VizdoomEncoder( - (basic_encoder): ConvEncoder( - (enc): RecursiveScriptModule( - original_name=ConvEncoderImpl - (conv_head): RecursiveScriptModule( - original_name=Sequential - (0): RecursiveScriptModule(original_name=Conv2d) - (1): RecursiveScriptModule(original_name=ELU) - (2): RecursiveScriptModule(original_name=Conv2d) - (3): RecursiveScriptModule(original_name=ELU) - (4): RecursiveScriptModule(original_name=Conv2d) - (5): RecursiveScriptModule(original_name=ELU) - ) - (mlp_layers): RecursiveScriptModule( - original_name=Sequential - (0): RecursiveScriptModule(original_name=Linear) - (1): RecursiveScriptModule(original_name=ELU) - ) - ) - ) - ) - (core): ModelCoreRNN( - (core): GRU(512, 512) - ) - (decoder): MlpDecoder( - (mlp): Identity() - ) - (critic_linear): Linear(in_features=512, out_features=1, bias=True) - (action_parameterization): ActionParameterizationDefault( - (distribution_linear): Linear(in_features=512, 
out_features=5, bias=True) - ) -) -[2025-01-15 20:16:36,939][12470] Using optimizer -[2025-01-15 20:16:38,120][12470] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2025-01-15 20:16:38,182][12470] Loading model from checkpoint -[2025-01-15 20:16:38,184][12470] Loaded experiment state at self.train_step=978, self.env_steps=4005888 -[2025-01-15 20:16:38,184][12470] Initialized policy 0 weights for model version 978 -[2025-01-15 20:16:38,189][12470] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-01-15 20:16:38,198][12470] LearnerWorker_p0 finished initialization! -[2025-01-15 20:16:38,372][12483] RunningMeanStd input shape: (3, 72, 128) -[2025-01-15 20:16:38,375][12483] RunningMeanStd input shape: (1,) -[2025-01-15 20:16:38,398][12483] ConvEncoder: input_channels=3 -[2025-01-15 20:16:38,572][12483] Conv encoder output size: 512 -[2025-01-15 20:16:38,573][12483] Policy head output size: 512 -[2025-01-15 20:16:38,649][00513] Inference worker 0-0 is ready! -[2025-01-15 20:16:38,652][00513] All inference workers are ready! Signal rollout workers to start! -[2025-01-15 20:16:38,840][12486] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 20:16:38,845][12484] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 20:16:38,843][12489] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 20:16:38,845][12490] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 20:16:38,897][12487] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 20:16:38,904][12488] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 20:16:38,901][12485] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 20:16:38,909][12491] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-01-15 20:16:39,330][00513] Heartbeat connected on Batcher_0 -[2025-01-15 20:16:39,336][00513] Heartbeat connected on LearnerWorker_p0 -[2025-01-15 20:16:39,369][00513] Heartbeat connected on InferenceWorker_p0-w0 -[2025-01-15 20:16:39,814][12491] Decorrelating experience for 0 frames... -[2025-01-15 20:16:40,172][12491] Decorrelating experience for 32 frames... -[2025-01-15 20:16:40,307][12486] Decorrelating experience for 0 frames... -[2025-01-15 20:16:40,310][12484] Decorrelating experience for 0 frames... -[2025-01-15 20:16:40,317][12489] Decorrelating experience for 0 frames... -[2025-01-15 20:16:40,695][12485] Decorrelating experience for 0 frames... -[2025-01-15 20:16:41,271][12484] Decorrelating experience for 32 frames... -[2025-01-15 20:16:41,267][12489] Decorrelating experience for 32 frames... -[2025-01-15 20:16:41,273][12486] Decorrelating experience for 32 frames... -[2025-01-15 20:16:41,795][12490] Decorrelating experience for 0 frames... -[2025-01-15 20:16:42,024][00513] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-01-15 20:16:42,227][12491] Decorrelating experience for 64 frames... -[2025-01-15 20:16:42,231][12487] Decorrelating experience for 0 frames... -[2025-01-15 20:16:42,601][12489] Decorrelating experience for 64 frames... -[2025-01-15 20:16:42,608][12486] Decorrelating experience for 64 frames... -[2025-01-15 20:16:43,032][12488] Decorrelating experience for 0 frames... -[2025-01-15 20:16:43,405][12485] Decorrelating experience for 32 frames... 
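The learner resumes from checkpoint_000000978_4005888.pth and reports train_step=978 at env_steps=4005888, i.e. exactly 4096 environment frames per policy version. That factor is consistent with batch_size=1024 transitions per training iteration times env_frameskip=4 frames per transition (an inference from the configuration above, not something the log prints); a quick check against the checkpoint names that appear elsewhere in this log:

# batch_size and env_frameskip come from the configuration dump above.
frames_per_policy_version = 1024 * 4  # 4096

for train_step in (664, 780, 895, 978, 1060, 1176, 1222):
    print(f"checkpoint_{train_step:09d}_{train_step * frames_per_policy_version}.pth")
# e.g. checkpoint_000000978_4005888.pth, and checkpoint_000001222_5005312.pth
# at the end of the resumed run.
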
-[2025-01-15 20:16:43,425][12487] Decorrelating experience for 32 frames... -[2025-01-15 20:16:43,434][12484] Decorrelating experience for 64 frames... -[2025-01-15 20:16:43,915][12490] Decorrelating experience for 32 frames... -[2025-01-15 20:16:44,046][12486] Decorrelating experience for 96 frames... -[2025-01-15 20:16:44,268][00513] Heartbeat connected on RolloutWorker_w2 -[2025-01-15 20:16:44,398][12488] Decorrelating experience for 32 frames... -[2025-01-15 20:16:45,017][12491] Decorrelating experience for 96 frames... -[2025-01-15 20:16:45,284][00513] Heartbeat connected on RolloutWorker_w7 -[2025-01-15 20:16:45,481][12487] Decorrelating experience for 64 frames... -[2025-01-15 20:16:45,497][12489] Decorrelating experience for 96 frames... -[2025-01-15 20:16:45,651][12484] Decorrelating experience for 96 frames... -[2025-01-15 20:16:45,788][12485] Decorrelating experience for 64 frames... -[2025-01-15 20:16:45,845][00513] Heartbeat connected on RolloutWorker_w4 -[2025-01-15 20:16:46,059][00513] Heartbeat connected on RolloutWorker_w0 -[2025-01-15 20:16:46,631][12488] Decorrelating experience for 64 frames... -[2025-01-15 20:16:47,024][00513] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 2.4. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-01-15 20:16:47,026][00513] Avg episode reward: [(0, '2.377')] -[2025-01-15 20:16:47,375][12487] Decorrelating experience for 96 frames... -[2025-01-15 20:16:47,745][00513] Heartbeat connected on RolloutWorker_w3 -[2025-01-15 20:16:47,873][12485] Decorrelating experience for 96 frames... -[2025-01-15 20:16:48,497][00513] Heartbeat connected on RolloutWorker_w1 -[2025-01-15 20:16:50,967][12490] Decorrelating experience for 64 frames... -[2025-01-15 20:16:51,329][12488] Decorrelating experience for 96 frames... -[2025-01-15 20:16:51,657][12470] Signal inference workers to stop experience collection... -[2025-01-15 20:16:51,681][12483] InferenceWorker_p0-w0: stopping experience collection -[2025-01-15 20:16:52,031][00513] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 227.8. Samples: 2280. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-01-15 20:16:52,033][00513] Avg episode reward: [(0, '3.172')] -[2025-01-15 20:16:52,040][00513] Heartbeat connected on RolloutWorker_w5 -[2025-01-15 20:16:52,803][12490] Decorrelating experience for 96 frames... -[2025-01-15 20:16:53,070][00513] Heartbeat connected on RolloutWorker_w6 -[2025-01-15 20:16:53,761][12470] Signal inference workers to resume experience collection... -[2025-01-15 20:16:53,762][12483] InferenceWorker_p0-w0: resuming experience collection -[2025-01-15 20:16:57,024][00513] Fps is (10 sec: 1638.4, 60 sec: 1092.3, 300 sec: 1092.3). Total num frames: 4022272. Throughput: 0: 322.0. Samples: 4830. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) -[2025-01-15 20:16:57,030][00513] Avg episode reward: [(0, '3.550')] -[2025-01-15 20:17:01,844][12483] Updated weights for policy 0, policy_version 988 (0.0027) -[2025-01-15 20:17:02,024][00513] Fps is (10 sec: 4098.8, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 4046848. Throughput: 0: 409.6. Samples: 8192. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:17:02,026][00513] Avg episode reward: [(0, '4.152')] -[2025-01-15 20:17:07,024][00513] Fps is (10 sec: 4505.7, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 4067328. Throughput: 0: 604.1. Samples: 15102. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:17:07,029][00513] Avg episode reward: [(0, '4.718')] -[2025-01-15 20:17:12,025][00513] Fps is (10 sec: 3276.4, 60 sec: 2457.5, 300 sec: 2457.5). Total num frames: 4079616. Throughput: 0: 636.6. Samples: 19100. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:17:12,027][00513] Avg episode reward: [(0, '4.786')] -[2025-01-15 20:17:14,031][12483] Updated weights for policy 0, policy_version 998 (0.0038) -[2025-01-15 20:17:17,024][00513] Fps is (10 sec: 3276.8, 60 sec: 2691.6, 300 sec: 2691.6). Total num frames: 4100096. Throughput: 0: 620.3. Samples: 21712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:17:17,030][00513] Avg episode reward: [(0, '4.892')] -[2025-01-15 20:17:22,024][00513] Fps is (10 sec: 4506.2, 60 sec: 2969.6, 300 sec: 2969.6). Total num frames: 4124672. Throughput: 0: 719.6. Samples: 28784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:17:22,029][00513] Avg episode reward: [(0, '4.690')] -[2025-01-15 20:17:22,793][12483] Updated weights for policy 0, policy_version 1008 (0.0014) -[2025-01-15 20:17:27,024][00513] Fps is (10 sec: 4096.0, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 4141056. Throughput: 0: 765.2. Samples: 34436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:17:27,027][00513] Avg episode reward: [(0, '4.498')] -[2025-01-15 20:17:32,024][00513] Fps is (10 sec: 3276.8, 60 sec: 3031.0, 300 sec: 3031.0). Total num frames: 4157440. Throughput: 0: 814.0. Samples: 36642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2025-01-15 20:17:32,027][00513] Avg episode reward: [(0, '4.439')] -[2025-01-15 20:17:34,100][12483] Updated weights for policy 0, policy_version 1018 (0.0014) -[2025-01-15 20:17:37,024][00513] Fps is (10 sec: 4096.0, 60 sec: 3202.3, 300 sec: 3202.3). Total num frames: 4182016. Throughput: 0: 915.6. Samples: 43474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:17:37,028][00513] Avg episode reward: [(0, '4.644')] -[2025-01-15 20:17:42,028][00513] Fps is (10 sec: 4503.8, 60 sec: 3276.6, 300 sec: 3276.6). Total num frames: 4202496. Throughput: 0: 1004.5. Samples: 50038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:17:42,030][00513] Avg episode reward: [(0, '4.694')] -[2025-01-15 20:17:44,274][12483] Updated weights for policy 0, policy_version 1028 (0.0017) -[2025-01-15 20:17:47,024][00513] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 4218880. Throughput: 0: 976.8. Samples: 52146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:17:47,028][00513] Avg episode reward: [(0, '4.614')] -[2025-01-15 20:17:52,024][00513] Fps is (10 sec: 3687.9, 60 sec: 3891.6, 300 sec: 3335.3). Total num frames: 4239360. Throughput: 0: 950.6. Samples: 57880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:17:52,031][00513] Avg episode reward: [(0, '4.800')] -[2025-01-15 20:17:53,979][12483] Updated weights for policy 0, policy_version 1038 (0.0019) -[2025-01-15 20:17:57,024][00513] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3440.6). Total num frames: 4263936. Throughput: 0: 1021.5. Samples: 65066. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:17:57,030][00513] Avg episode reward: [(0, '4.858')] -[2025-01-15 20:18:02,024][00513] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3430.4). Total num frames: 4280320. Throughput: 0: 1025.5. Samples: 67860. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:18:02,026][00513] Avg episode reward: [(0, '4.639')] -[2025-01-15 20:18:05,504][12483] Updated weights for policy 0, policy_version 1048 (0.0017) -[2025-01-15 20:18:07,024][00513] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3421.4). Total num frames: 4296704. Throughput: 0: 972.2. Samples: 72532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2025-01-15 20:18:07,032][00513] Avg episode reward: [(0, '4.629')] -[2025-01-15 20:18:12,028][00513] Fps is (10 sec: 4094.5, 60 sec: 4027.6, 300 sec: 3504.2). Total num frames: 4321280. Throughput: 0: 1004.5. Samples: 79642. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:18:12,035][00513] Avg episode reward: [(0, '4.662')] -[2025-01-15 20:18:14,140][12483] Updated weights for policy 0, policy_version 1058 (0.0024) -[2025-01-15 20:18:17,026][00513] Fps is (10 sec: 4504.8, 60 sec: 4027.6, 300 sec: 3535.4). Total num frames: 4341760. Throughput: 0: 1034.0. Samples: 83174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:18:17,030][00513] Avg episode reward: [(0, '4.844')] -[2025-01-15 20:18:17,044][12470] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001060_4341760.pth... -[2025-01-15 20:18:17,235][12470] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000895_3665920.pth -[2025-01-15 20:18:22,024][00513] Fps is (10 sec: 3278.0, 60 sec: 3822.9, 300 sec: 3481.6). Total num frames: 4354048. Throughput: 0: 976.5. Samples: 87416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:18:22,035][00513] Avg episode reward: [(0, '4.774')] -[2025-01-15 20:18:25,669][12483] Updated weights for policy 0, policy_version 1068 (0.0019) -[2025-01-15 20:18:27,024][00513] Fps is (10 sec: 3687.0, 60 sec: 3959.4, 300 sec: 3549.9). Total num frames: 4378624. Throughput: 0: 975.4. Samples: 93926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:18:27,031][00513] Avg episode reward: [(0, '4.711')] -[2025-01-15 20:18:32,028][00513] Fps is (10 sec: 4912.9, 60 sec: 4095.7, 300 sec: 3611.8). Total num frames: 4403200. Throughput: 0: 1007.4. Samples: 97482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:18:32,031][00513] Avg episode reward: [(0, '4.695')] -[2025-01-15 20:18:36,060][12483] Updated weights for policy 0, policy_version 1078 (0.0018) -[2025-01-15 20:18:37,024][00513] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3561.7). Total num frames: 4415488. Throughput: 0: 995.3. Samples: 102670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2025-01-15 20:18:37,028][00513] Avg episode reward: [(0, '4.473')] -[2025-01-15 20:18:42,024][00513] Fps is (10 sec: 3278.2, 60 sec: 3891.4, 300 sec: 3584.0). Total num frames: 4435968. Throughput: 0: 959.6. Samples: 108246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:18:42,026][00513] Avg episode reward: [(0, '4.496')] -[2025-01-15 20:18:45,981][12483] Updated weights for policy 0, policy_version 1088 (0.0016) -[2025-01-15 20:18:47,025][00513] Fps is (10 sec: 4505.3, 60 sec: 4027.7, 300 sec: 3637.2). Total num frames: 4460544. Throughput: 0: 972.7. Samples: 111634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:18:47,029][00513] Avg episode reward: [(0, '4.619')] -[2025-01-15 20:18:52,024][00513] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3623.4). Total num frames: 4476928. Throughput: 0: 1017.3. Samples: 118310. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:18:52,030][00513] Avg episode reward: [(0, '4.665')] -[2025-01-15 20:18:57,027][00513] Fps is (10 sec: 3276.2, 60 sec: 3822.8, 300 sec: 3610.5). Total num frames: 4493312. Throughput: 0: 961.0. Samples: 122888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:18:57,032][00513] Avg episode reward: [(0, '4.764')] -[2025-01-15 20:18:57,137][12483] Updated weights for policy 0, policy_version 1098 (0.0021) -[2025-01-15 20:19:02,024][00513] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3657.1). Total num frames: 4517888. Throughput: 0: 962.6. Samples: 126490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:19:02,030][00513] Avg episode reward: [(0, '4.918')] -[2025-01-15 20:19:05,643][12483] Updated weights for policy 0, policy_version 1108 (0.0024) -[2025-01-15 20:19:07,024][00513] Fps is (10 sec: 4916.4, 60 sec: 4096.0, 300 sec: 3700.5). Total num frames: 4542464. Throughput: 0: 1029.2. Samples: 133730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:19:07,027][00513] Avg episode reward: [(0, '4.918')] -[2025-01-15 20:19:12,032][00513] Fps is (10 sec: 3683.5, 60 sec: 3890.9, 300 sec: 3658.9). Total num frames: 4554752. Throughput: 0: 987.4. Samples: 138366. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:19:12,039][00513] Avg episode reward: [(0, '4.947')] -[2025-01-15 20:19:16,946][12483] Updated weights for policy 0, policy_version 1118 (0.0023) -[2025-01-15 20:19:17,024][00513] Fps is (10 sec: 3686.5, 60 sec: 3959.6, 300 sec: 3699.6). Total num frames: 4579328. Throughput: 0: 967.3. Samples: 141004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:19:17,026][00513] Avg episode reward: [(0, '4.761')] -[2025-01-15 20:19:22,024][00513] Fps is (10 sec: 4509.1, 60 sec: 4096.0, 300 sec: 3712.0). Total num frames: 4599808. Throughput: 0: 1006.4. Samples: 147956. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:19:22,026][00513] Avg episode reward: [(0, '4.587')] -[2025-01-15 20:19:27,024][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3698.8). Total num frames: 4616192. Throughput: 0: 1004.1. Samples: 153432. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:19:27,026][00513] Avg episode reward: [(0, '4.309')] -[2025-01-15 20:19:27,499][12483] Updated weights for policy 0, policy_version 1128 (0.0023) -[2025-01-15 20:19:32,024][00513] Fps is (10 sec: 3276.8, 60 sec: 3823.2, 300 sec: 3686.4). Total num frames: 4632576. Throughput: 0: 977.0. Samples: 155598. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:19:32,026][00513] Avg episode reward: [(0, '4.495')] -[2025-01-15 20:19:37,026][00513] Fps is (10 sec: 4095.2, 60 sec: 4027.6, 300 sec: 3721.5). Total num frames: 4657152. Throughput: 0: 976.2. Samples: 162242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:19:37,030][00513] Avg episode reward: [(0, '4.706')] -[2025-01-15 20:19:37,501][12483] Updated weights for policy 0, policy_version 1138 (0.0014) -[2025-01-15 20:19:42,024][00513] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3731.9). Total num frames: 4677632. Throughput: 0: 1017.3. Samples: 168664. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:19:42,027][00513] Avg episode reward: [(0, '4.736')] -[2025-01-15 20:19:47,024][00513] Fps is (10 sec: 3277.4, 60 sec: 3823.0, 300 sec: 3697.5). Total num frames: 4689920. Throughput: 0: 981.5. Samples: 170656. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:19:47,028][00513] Avg episode reward: [(0, '4.702')] -[2025-01-15 20:19:49,402][12483] Updated weights for policy 0, policy_version 1148 (0.0016) -[2025-01-15 20:19:52,024][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3729.5). Total num frames: 4714496. Throughput: 0: 944.8. Samples: 176246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2025-01-15 20:19:52,027][00513] Avg episode reward: [(0, '4.517')] -[2025-01-15 20:19:57,024][00513] Fps is (10 sec: 4505.5, 60 sec: 4027.9, 300 sec: 3738.9). Total num frames: 4734976. Throughput: 0: 999.8. Samples: 183350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:19:57,026][00513] Avg episode reward: [(0, '4.657')] -[2025-01-15 20:19:57,985][12483] Updated weights for policy 0, policy_version 1158 (0.0023) -[2025-01-15 20:20:02,024][00513] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3727.4). Total num frames: 4751360. Throughput: 0: 1004.6. Samples: 186210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:20:02,031][00513] Avg episode reward: [(0, '4.700')] -[2025-01-15 20:20:07,024][00513] Fps is (10 sec: 3686.5, 60 sec: 3823.0, 300 sec: 3736.4). Total num frames: 4771840. Throughput: 0: 953.2. Samples: 190848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:20:07,029][00513] Avg episode reward: [(0, '4.793')] -[2025-01-15 20:20:09,172][12483] Updated weights for policy 0, policy_version 1168 (0.0025) -[2025-01-15 20:20:12,024][00513] Fps is (10 sec: 4505.4, 60 sec: 4028.2, 300 sec: 3764.4). Total num frames: 4796416. Throughput: 0: 993.1. Samples: 198122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:20:12,030][00513] Avg episode reward: [(0, '4.853')] -[2025-01-15 20:20:17,024][00513] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3772.1). Total num frames: 4816896. Throughput: 0: 1022.6. Samples: 201614. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:20:17,029][00513] Avg episode reward: [(0, '4.693')] -[2025-01-15 20:20:17,041][12470] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001176_4816896.pth... -[2025-01-15 20:20:17,243][12470] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth -[2025-01-15 20:20:19,780][12483] Updated weights for policy 0, policy_version 1178 (0.0042) -[2025-01-15 20:20:22,025][00513] Fps is (10 sec: 3276.6, 60 sec: 3822.9, 300 sec: 3742.2). Total num frames: 4829184. Throughput: 0: 975.2. Samples: 206124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2025-01-15 20:20:22,028][00513] Avg episode reward: [(0, '4.699')] -[2025-01-15 20:20:27,024][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3768.3). Total num frames: 4853760. Throughput: 0: 977.8. Samples: 212666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:20:27,027][00513] Avg episode reward: [(0, '4.807')] -[2025-01-15 20:20:29,065][12483] Updated weights for policy 0, policy_version 1188 (0.0017) -[2025-01-15 20:20:32,024][00513] Fps is (10 sec: 4915.7, 60 sec: 4096.0, 300 sec: 3793.3). Total num frames: 4878336. Throughput: 0: 1012.3. Samples: 216208. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2025-01-15 20:20:32,026][00513] Avg episode reward: [(0, '4.717')] -[2025-01-15 20:20:37,024][00513] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3782.3). Total num frames: 4894720. Throughput: 0: 1013.4. Samples: 221850. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2025-01-15 20:20:37,026][00513] Avg episode reward: [(0, '4.635')] -[2025-01-15 20:20:40,431][12483] Updated weights for policy 0, policy_version 1198 (0.0015) -[2025-01-15 20:20:42,024][00513] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3771.7). Total num frames: 4911104. Throughput: 0: 976.5. Samples: 227292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2025-01-15 20:20:42,028][00513] Avg episode reward: [(0, '4.488')] -[2025-01-15 20:20:47,024][00513] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3795.1). Total num frames: 4935680. Throughput: 0: 991.3. Samples: 230818. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:20:47,026][00513] Avg episode reward: [(0, '4.608')] -[2025-01-15 20:20:49,093][12483] Updated weights for policy 0, policy_version 1208 (0.0014) -[2025-01-15 20:20:52,024][00513] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3801.1). Total num frames: 4956160. Throughput: 0: 1033.0. Samples: 237334. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) -[2025-01-15 20:20:52,027][00513] Avg episode reward: [(0, '4.771')] -[2025-01-15 20:20:57,024][00513] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3790.8). Total num frames: 4972544. Throughput: 0: 971.5. Samples: 241840. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) -[2025-01-15 20:20:57,026][00513] Avg episode reward: [(0, '4.877')] -[2025-01-15 20:21:00,282][12483] Updated weights for policy 0, policy_version 1218 (0.0027) -[2025-01-15 20:21:02,024][00513] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 3796.7). Total num frames: 4993024. Throughput: 0: 973.9. Samples: 245440. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2025-01-15 20:21:02,026][00513] Avg episode reward: [(0, '4.657')] -[2025-01-15 20:21:03,883][12470] Stopping Batcher_0... -[2025-01-15 20:21:03,884][12470] Loop batcher_evt_loop terminating... -[2025-01-15 20:21:03,884][00513] Component Batcher_0 stopped! -[2025-01-15 20:21:03,888][12470] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... -[2025-01-15 20:21:03,938][12483] Weights refcount: 2 0 -[2025-01-15 20:21:03,942][00513] Component InferenceWorker_p0-w0 stopped! -[2025-01-15 20:21:03,942][12483] Stopping InferenceWorker_p0-w0... -[2025-01-15 20:21:03,946][12483] Loop inference_proc0-0_evt_loop terminating... -[2025-01-15 20:21:04,012][12470] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001060_4341760.pth -[2025-01-15 20:21:04,021][12470] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... -[2025-01-15 20:21:04,187][00513] Component LearnerWorker_p0 stopped! -[2025-01-15 20:21:04,191][12470] Stopping LearnerWorker_p0... -[2025-01-15 20:21:04,191][12470] Loop learner_proc0_evt_loop terminating... -[2025-01-15 20:21:04,243][12491] Stopping RolloutWorker_w7... -[2025-01-15 20:21:04,243][00513] Component RolloutWorker_w7 stopped! -[2025-01-15 20:21:04,249][12488] Stopping RolloutWorker_w5... -[2025-01-15 20:21:04,249][00513] Component RolloutWorker_w5 stopped! -[2025-01-15 20:21:04,245][12491] Loop rollout_proc7_evt_loop terminating... -[2025-01-15 20:21:04,250][12488] Loop rollout_proc5_evt_loop terminating... -[2025-01-15 20:21:04,258][12485] Stopping RolloutWorker_w1... -[2025-01-15 20:21:04,258][00513] Component RolloutWorker_w1 stopped! -[2025-01-15 20:21:04,269][00513] Component RolloutWorker_w3 stopped! -[2025-01-15 20:21:04,269][12487] Stopping RolloutWorker_w3... 
-[2025-01-15 20:21:04,259][12485] Loop rollout_proc1_evt_loop terminating... -[2025-01-15 20:21:04,273][12487] Loop rollout_proc3_evt_loop terminating... -[2025-01-15 20:21:04,400][00513] Component RolloutWorker_w4 stopped! -[2025-01-15 20:21:04,402][12489] Stopping RolloutWorker_w4... -[2025-01-15 20:21:04,403][12489] Loop rollout_proc4_evt_loop terminating... -[2025-01-15 20:21:04,408][00513] Component RolloutWorker_w6 stopped! -[2025-01-15 20:21:04,411][12490] Stopping RolloutWorker_w6... -[2025-01-15 20:21:04,413][12490] Loop rollout_proc6_evt_loop terminating... -[2025-01-15 20:21:04,440][00513] Component RolloutWorker_w0 stopped! -[2025-01-15 20:21:04,448][12484] Stopping RolloutWorker_w0... -[2025-01-15 20:21:04,460][12484] Loop rollout_proc0_evt_loop terminating... -[2025-01-15 20:21:04,507][00513] Component RolloutWorker_w2 stopped! -[2025-01-15 20:21:04,514][00513] Waiting for process learner_proc0 to stop... -[2025-01-15 20:21:04,519][12486] Stopping RolloutWorker_w2... -[2025-01-15 20:21:04,521][12486] Loop rollout_proc2_evt_loop terminating... -[2025-01-15 20:21:05,798][00513] Waiting for process inference_proc0-0 to join... -[2025-01-15 20:21:05,804][00513] Waiting for process rollout_proc0 to join... -[2025-01-15 20:21:08,394][00513] Waiting for process rollout_proc1 to join... -[2025-01-15 20:21:08,400][00513] Waiting for process rollout_proc2 to join... -[2025-01-15 20:21:08,407][00513] Waiting for process rollout_proc3 to join... -[2025-01-15 20:21:08,415][00513] Waiting for process rollout_proc4 to join... -[2025-01-15 20:21:08,420][00513] Waiting for process rollout_proc5 to join... -[2025-01-15 20:21:08,429][00513] Waiting for process rollout_proc6 to join... -[2025-01-15 20:21:08,432][00513] Waiting for process rollout_proc7 to join... -[2025-01-15 20:21:08,435][00513] Batcher 0 profile tree view: -batching: 6.9221, releasing_batches: 0.0071 -[2025-01-15 20:21:08,440][00513] InferenceWorker_p0-w0 profile tree view: -wait_policy: 0.0000 - wait_policy_total: 108.8997 -update_model: 2.0590 - weight_update: 0.0035 -one_step: 0.0121 - handle_policy_step: 141.7515 - deserialize: 3.4622, stack: 0.7518, obs_to_device_normalize: 30.4202, forward: 70.4643, send_messages: 7.0628 - prepare_outputs: 22.3062 - to_cpu: 13.7233 -[2025-01-15 20:21:08,442][00513] Learner 0 profile tree view: -misc: 0.0012, prepare_batch: 5.7237 -train: 22.0629 - epoch_init: 0.0102, minibatch_init: 0.0054, losses_postprocess: 0.1409, kl_divergence: 0.1607, after_optimizer: 0.8153 - calculate_losses: 8.6145 - losses_init: 0.0009, forward_head: 0.8418, bptt_initial: 5.7808, tail: 0.3687, advantages_returns: 0.0542, losses: 0.9566 - bptt: 0.5177 - bptt_forward_core: 0.4968 - update: 12.1568 - clip: 0.2376 -[2025-01-15 20:21:08,445][00513] RolloutWorker_w0 profile tree view: -wait_for_trajectories: 0.1150, enqueue_policy_requests: 24.4356, env_step: 202.8134, overhead: 3.1177, complete_rollouts: 1.6034 -save_policy_outputs: 4.9026 - split_output_tensors: 1.9623 -[2025-01-15 20:21:08,447][00513] RolloutWorker_w7 profile tree view: -wait_for_trajectories: 0.1117, enqueue_policy_requests: 25.4056, env_step: 202.5008, overhead: 3.0848, complete_rollouts: 1.8759 -save_policy_outputs: 5.0752 - split_output_tensors: 2.0261 -[2025-01-15 20:21:08,449][00513] Loop Runner_EvtLoop terminating... 
-[2025-01-15 20:21:08,454][00513] Runner profile tree view: -main_loop: 289.0756 -[2025-01-15 20:21:08,455][00513] Collected {0: 5005312}, FPS: 3457.3 -[2025-01-15 20:21:27,047][00513] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json -[2025-01-15 20:21:27,049][00513] Overriding arg 'num_workers' with value 1 passed from command line -[2025-01-15 20:21:27,051][00513] Adding new argument 'no_render'=True that is not in the saved config file! -[2025-01-15 20:21:27,054][00513] Adding new argument 'save_video'=True that is not in the saved config file! -[2025-01-15 20:21:27,056][00513] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2025-01-15 20:21:27,057][00513] Adding new argument 'video_name'=None that is not in the saved config file! -[2025-01-15 20:21:27,059][00513] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! -[2025-01-15 20:21:27,061][00513] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2025-01-15 20:21:27,062][00513] Adding new argument 'push_to_hub'=False that is not in the saved config file! -[2025-01-15 20:21:27,062][00513] Adding new argument 'hf_repository'=None that is not in the saved config file! -[2025-01-15 20:21:27,063][00513] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2025-01-15 20:21:27,064][00513] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2025-01-15 20:21:27,065][00513] Adding new argument 'train_script'=None that is not in the saved config file! -[2025-01-15 20:21:27,067][00513] Adding new argument 'enjoy_script'=None that is not in the saved config file! -[2025-01-15 20:21:27,068][00513] Using frameskip 1 and render_action_repeat=4 for evaluation -[2025-01-15 20:21:27,106][00513] RunningMeanStd input shape: (3, 72, 128) -[2025-01-15 20:21:27,109][00513] RunningMeanStd input shape: (1,) -[2025-01-15 20:21:27,122][00513] ConvEncoder: input_channels=3 -[2025-01-15 20:21:27,161][00513] Conv encoder output size: 512 -[2025-01-15 20:21:27,162][00513] Policy head output size: 512 -[2025-01-15 20:21:27,183][00513] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... -[2025-01-15 20:21:27,628][00513] Num frames 100... -[2025-01-15 20:21:27,750][00513] Num frames 200... -[2025-01-15 20:21:27,872][00513] Num frames 300... -[2025-01-15 20:21:27,995][00513] Num frames 400... -[2025-01-15 20:21:28,070][00513] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160 -[2025-01-15 20:21:28,072][00513] Avg episode reward: 5.160, avg true_objective: 4.160 -[2025-01-15 20:21:28,181][00513] Num frames 500... -[2025-01-15 20:21:28,314][00513] Num frames 600... -[2025-01-15 20:21:28,448][00513] Num frames 700... -[2025-01-15 20:21:28,575][00513] Num frames 800... -[2025-01-15 20:21:28,707][00513] Avg episode rewards: #0: 5.320, true rewards: #0: 4.320 -[2025-01-15 20:21:28,710][00513] Avg episode reward: 5.320, avg true_objective: 4.320 -[2025-01-15 20:21:28,758][00513] Num frames 900... -[2025-01-15 20:21:28,883][00513] Num frames 1000... -[2025-01-15 20:21:29,004][00513] Num frames 1100... -[2025-01-15 20:21:29,128][00513] Num frames 1200... -[2025-01-15 20:21:29,241][00513] Avg episode rewards: #0: 4.827, true rewards: #0: 4.160 -[2025-01-15 20:21:29,243][00513] Avg episode reward: 4.827, avg true_objective: 4.160 -[2025-01-15 20:21:29,316][00513] Num frames 1300... 
-[2025-01-15 20:21:29,450][00513] Num frames 1400... -[2025-01-15 20:21:29,580][00513] Num frames 1500... -[2025-01-15 20:21:29,704][00513] Num frames 1600... -[2025-01-15 20:21:29,799][00513] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080 -[2025-01-15 20:21:29,800][00513] Avg episode reward: 4.580, avg true_objective: 4.080 -[2025-01-15 20:21:29,887][00513] Num frames 1700... -[2025-01-15 20:21:30,010][00513] Num frames 1800... -[2025-01-15 20:21:30,135][00513] Num frames 1900... -[2025-01-15 20:21:30,261][00513] Num frames 2000... -[2025-01-15 20:21:30,337][00513] Avg episode rewards: #0: 4.432, true rewards: #0: 4.032 -[2025-01-15 20:21:30,339][00513] Avg episode reward: 4.432, avg true_objective: 4.032 -[2025-01-15 20:21:30,464][00513] Num frames 2100... -[2025-01-15 20:21:30,596][00513] Num frames 2200... -[2025-01-15 20:21:30,762][00513] Num frames 2300... -[2025-01-15 20:21:30,936][00513] Num frames 2400... -[2025-01-15 20:21:30,989][00513] Avg episode rewards: #0: 4.333, true rewards: #0: 4.000 -[2025-01-15 20:21:30,991][00513] Avg episode reward: 4.333, avg true_objective: 4.000 -[2025-01-15 20:21:31,157][00513] Num frames 2500... -[2025-01-15 20:21:31,329][00513] Num frames 2600... -[2025-01-15 20:21:31,515][00513] Num frames 2700... -[2025-01-15 20:21:31,711][00513] Avg episode rewards: #0: 4.263, true rewards: #0: 3.977 -[2025-01-15 20:21:31,713][00513] Avg episode reward: 4.263, avg true_objective: 3.977 -[2025-01-15 20:21:31,741][00513] Num frames 2800... -[2025-01-15 20:21:31,907][00513] Num frames 2900... -[2025-01-15 20:21:32,077][00513] Num frames 3000... -[2025-01-15 20:21:32,251][00513] Num frames 3100... -[2025-01-15 20:21:32,427][00513] Avg episode rewards: #0: 4.210, true rewards: #0: 3.960 -[2025-01-15 20:21:32,429][00513] Avg episode reward: 4.210, avg true_objective: 3.960 -[2025-01-15 20:21:32,490][00513] Num frames 3200... -[2025-01-15 20:21:32,661][00513] Num frames 3300... -[2025-01-15 20:21:32,835][00513] Num frames 3400... -[2025-01-15 20:21:33,008][00513] Num frames 3500... -[2025-01-15 20:21:33,154][00513] Avg episode rewards: #0: 4.169, true rewards: #0: 3.947 -[2025-01-15 20:21:33,156][00513] Avg episode reward: 4.169, avg true_objective: 3.947 -[2025-01-15 20:21:33,218][00513] Num frames 3600... -[2025-01-15 20:21:33,342][00513] Num frames 3700... -[2025-01-15 20:21:33,483][00513] Num frames 3800... -[2025-01-15 20:21:33,611][00513] Num frames 3900... -[2025-01-15 20:21:33,710][00513] Avg episode rewards: #0: 4.136, true rewards: #0: 3.936 -[2025-01-15 20:21:33,712][00513] Avg episode reward: 4.136, avg true_objective: 3.936 -[2025-01-15 20:21:52,703][00513] Replay video saved to /content/train_dir/default_experiment/replay.mp4! -[2025-01-15 20:23:23,610][00513] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json -[2025-01-15 20:23:23,612][00513] Overriding arg 'num_workers' with value 1 passed from command line -[2025-01-15 20:23:23,614][00513] Adding new argument 'no_render'=True that is not in the saved config file! -[2025-01-15 20:23:23,615][00513] Adding new argument 'save_video'=True that is not in the saved config file! -[2025-01-15 20:23:23,617][00513] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2025-01-15 20:23:23,619][00513] Adding new argument 'video_name'=None that is not in the saved config file! -[2025-01-15 20:23:23,620][00513] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! 
-[2025-01-15 20:23:23,622][00513] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2025-01-15 20:23:23,623][00513] Adding new argument 'push_to_hub'=True that is not in the saved config file! -[2025-01-15 20:23:23,624][00513] Adding new argument 'hf_repository'='hartman23/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! -[2025-01-15 20:23:23,625][00513] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2025-01-15 20:23:23,626][00513] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2025-01-15 20:23:23,627][00513] Adding new argument 'train_script'=None that is not in the saved config file! -[2025-01-15 20:23:23,628][00513] Adding new argument 'enjoy_script'=None that is not in the saved config file! -[2025-01-15 20:23:23,629][00513] Using frameskip 1 and render_action_repeat=4 for evaluation -[2025-01-15 20:23:23,657][00513] RunningMeanStd input shape: (3, 72, 128) -[2025-01-15 20:23:23,659][00513] RunningMeanStd input shape: (1,) -[2025-01-15 20:23:23,676][00513] ConvEncoder: input_channels=3 -[2025-01-15 20:23:23,717][00513] Conv encoder output size: 512 -[2025-01-15 20:23:23,719][00513] Policy head output size: 512 -[2025-01-15 20:23:23,737][00513] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... -[2025-01-15 20:23:24,134][00513] Num frames 100... -[2025-01-15 20:23:24,256][00513] Num frames 200... -[2025-01-15 20:23:24,377][00513] Num frames 300... -[2025-01-15 20:23:24,519][00513] Num frames 400... -[2025-01-15 20:23:24,598][00513] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160 -[2025-01-15 20:23:24,600][00513] Avg episode reward: 5.160, avg true_objective: 4.160 -[2025-01-15 20:23:24,709][00513] Num frames 500... -[2025-01-15 20:23:24,835][00513] Num frames 600... -[2025-01-15 20:23:24,958][00513] Num frames 700... -[2025-01-15 20:23:25,089][00513] Num frames 800... -[2025-01-15 20:23:25,142][00513] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000 -[2025-01-15 20:23:25,143][00513] Avg episode reward: 4.500, avg true_objective: 4.000 -[2025-01-15 20:23:25,264][00513] Num frames 900... -[2025-01-15 20:23:25,385][00513] Num frames 1000... -[2025-01-15 20:23:25,518][00513] Num frames 1100... -[2025-01-15 20:23:25,647][00513] Num frames 1200... -[2025-01-15 20:23:25,805][00513] Avg episode rewards: #0: 5.267, true rewards: #0: 4.267 -[2025-01-15 20:23:25,807][00513] Avg episode reward: 5.267, avg true_objective: 4.267 -[2025-01-15 20:23:25,837][00513] Num frames 1300... -[2025-01-15 20:23:25,959][00513] Num frames 1400... -[2025-01-15 20:23:26,082][00513] Num frames 1500... -[2025-01-15 20:23:26,207][00513] Num frames 1600... -[2025-01-15 20:23:26,330][00513] Num frames 1700... -[2025-01-15 20:23:26,419][00513] Avg episode rewards: #0: 5.320, true rewards: #0: 4.320 -[2025-01-15 20:23:26,422][00513] Avg episode reward: 5.320, avg true_objective: 4.320 -[2025-01-15 20:23:26,515][00513] Num frames 1800... -[2025-01-15 20:23:26,635][00513] Num frames 1900... -[2025-01-15 20:23:26,762][00513] Num frames 2000... -[2025-01-15 20:23:26,882][00513] Num frames 2100... -[2025-01-15 20:23:26,952][00513] Avg episode rewards: #0: 5.024, true rewards: #0: 4.224 -[2025-01-15 20:23:26,954][00513] Avg episode reward: 5.024, avg true_objective: 4.224 -[2025-01-15 20:23:27,066][00513] Num frames 2200... -[2025-01-15 20:23:27,189][00513] Num frames 2300... 
-[2025-01-15 20:23:27,313][00513] Num frames 2400... -[2025-01-15 20:23:27,493][00513] Avg episode rewards: #0: 4.827, true rewards: #0: 4.160 -[2025-01-15 20:23:27,494][00513] Avg episode reward: 4.827, avg true_objective: 4.160 -[2025-01-15 20:23:27,502][00513] Num frames 2500... -[2025-01-15 20:23:27,625][00513] Num frames 2600... -[2025-01-15 20:23:27,745][00513] Num frames 2700... -[2025-01-15 20:23:27,873][00513] Num frames 2800... -[2025-01-15 20:23:27,996][00513] Num frames 2900... -[2025-01-15 20:23:28,115][00513] Num frames 3000... -[2025-01-15 20:23:28,285][00513] Num frames 3100... -[2025-01-15 20:23:28,458][00513] Num frames 3200... -[2025-01-15 20:23:28,624][00513] Avg episode rewards: #0: 6.091, true rewards: #0: 4.663 -[2025-01-15 20:23:28,627][00513] Avg episode reward: 6.091, avg true_objective: 4.663 -[2025-01-15 20:23:28,689][00513] Num frames 3300... -[2025-01-15 20:23:28,861][00513] Num frames 3400... -[2025-01-15 20:23:29,027][00513] Num frames 3500... -[2025-01-15 20:23:29,188][00513] Num frames 3600... -[2025-01-15 20:23:29,351][00513] Num frames 3700... -[2025-01-15 20:23:29,532][00513] Num frames 3800... -[2025-01-15 20:23:29,605][00513] Avg episode rewards: #0: 6.260, true rewards: #0: 4.760 -[2025-01-15 20:23:29,607][00513] Avg episode reward: 6.260, avg true_objective: 4.760 -[2025-01-15 20:23:29,766][00513] Num frames 3900... -[2025-01-15 20:23:29,941][00513] Num frames 4000... -[2025-01-15 20:23:30,123][00513] Num frames 4100... -[2025-01-15 20:23:30,301][00513] Num frames 4200... -[2025-01-15 20:23:30,455][00513] Avg episode rewards: #0: 6.173, true rewards: #0: 4.729 -[2025-01-15 20:23:30,457][00513] Avg episode reward: 6.173, avg true_objective: 4.729 -[2025-01-15 20:23:30,542][00513] Num frames 4300... -[2025-01-15 20:23:30,700][00513] Num frames 4400... -[2025-01-15 20:23:30,824][00513] Num frames 4500... -[2025-01-15 20:23:30,894][00513] Avg episode rewards: #0: 5.812, true rewards: #0: 4.512 -[2025-01-15 20:23:30,896][00513] Avg episode reward: 5.812, avg true_objective: 4.512 -[2025-01-15 20:23:52,101][00513] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2025-01-15 20:40:31,914][18890] Using optimizer +[2025-01-15 20:40:32,713][18890] No checkpoints found +[2025-01-15 20:40:32,713][18890] Did not load from checkpoint, starting from scratch! +[2025-01-15 20:40:32,713][18890] Initialized policy 0 weights for model version 0 +[2025-01-15 20:40:32,717][18890] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-01-15 20:40:32,724][18890] LearnerWorker_p0 finished initialization! 
+[2025-01-15 20:40:32,811][18905] RunningMeanStd input shape: (3, 72, 128) +[2025-01-15 20:40:32,812][18905] RunningMeanStd input shape: (1,) +[2025-01-15 20:40:32,824][18905] ConvEncoder: input_channels=3 +[2025-01-15 20:40:32,926][18905] Conv encoder output size: 512 +[2025-01-15 20:40:32,927][18905] Policy head output size: 512 +[2025-01-15 20:40:33,174][18904] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-01-15 20:40:33,177][18906] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-01-15 20:40:33,176][18907] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-01-15 20:40:33,180][18903] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-01-15 20:40:33,178][18909] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-01-15 20:40:33,176][18911] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-01-15 20:40:33,182][18910] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-01-15 20:40:33,181][18908] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-01-15 20:40:33,813][18904] Decorrelating experience for 0 frames... +[2025-01-15 20:40:34,224][18909] Decorrelating experience for 0 frames... +[2025-01-15 20:40:34,243][18903] Decorrelating experience for 0 frames... +[2025-01-15 20:40:34,248][18907] Decorrelating experience for 0 frames... +[2025-01-15 20:40:34,970][18909] Decorrelating experience for 32 frames... +[2025-01-15 20:40:35,000][18903] Decorrelating experience for 32 frames... +[2025-01-15 20:40:35,043][18906] Decorrelating experience for 0 frames... +[2025-01-15 20:40:35,042][18908] Decorrelating experience for 0 frames... +[2025-01-15 20:40:36,561][18904] Decorrelating experience for 32 frames... +[2025-01-15 20:40:36,569][18906] Decorrelating experience for 32 frames... +[2025-01-15 20:40:36,583][18908] Decorrelating experience for 32 frames... +[2025-01-15 20:40:37,320][18907] Decorrelating experience for 32 frames... +[2025-01-15 20:40:37,386][18910] Decorrelating experience for 0 frames... +[2025-01-15 20:40:37,751][18909] Decorrelating experience for 64 frames... +[2025-01-15 20:40:37,784][18903] Decorrelating experience for 64 frames... +[2025-01-15 20:40:38,306][18911] Decorrelating experience for 0 frames... +[2025-01-15 20:40:38,649][18904] Decorrelating experience for 64 frames... +[2025-01-15 20:40:39,093][18906] Decorrelating experience for 64 frames... +[2025-01-15 20:40:39,749][18908] Decorrelating experience for 64 frames... +[2025-01-15 20:40:39,782][18910] Decorrelating experience for 32 frames... +[2025-01-15 20:40:39,825][18907] Decorrelating experience for 64 frames... +[2025-01-15 20:40:39,911][18903] Decorrelating experience for 96 frames... +[2025-01-15 20:40:40,736][18909] Decorrelating experience for 96 frames... +[2025-01-15 20:40:40,862][18904] Decorrelating experience for 96 frames... +[2025-01-15 20:40:41,518][18911] Decorrelating experience for 32 frames... +[2025-01-15 20:40:42,272][18910] Decorrelating experience for 64 frames... +[2025-01-15 20:40:42,711][18907] Decorrelating experience for 96 frames... +[2025-01-15 20:40:43,827][18906] Decorrelating experience for 96 frames... +[2025-01-15 20:40:45,204][18911] Decorrelating experience for 64 frames... +[2025-01-15 20:40:45,678][18890] Signal inference workers to stop experience collection... +[2025-01-15 20:40:45,693][18905] InferenceWorker_p0-w0: stopping experience collection +[2025-01-15 20:40:45,748][18910] Decorrelating experience for 96 frames... 
+[2025-01-15 20:40:46,017][18908] Decorrelating experience for 96 frames... +[2025-01-15 20:40:46,444][18911] Decorrelating experience for 96 frames... +[2025-01-15 20:40:48,164][18890] Signal inference workers to resume experience collection... +[2025-01-15 20:40:48,165][18905] InferenceWorker_p0-w0: resuming experience collection +[2025-01-15 20:40:58,532][18905] Updated weights for policy 0, policy_version 10 (0.0151) +[2025-01-15 20:41:07,752][18905] Updated weights for policy 0, policy_version 20 (0.0021) +[2025-01-15 20:41:16,654][18890] Saving new best policy, reward=4.286! +[2025-01-15 20:41:19,054][18905] Updated weights for policy 0, policy_version 30 (0.0027) +[2025-01-15 20:41:21,730][18890] Saving new best policy, reward=4.421! +[2025-01-15 20:41:26,657][18890] Saving new best policy, reward=4.544! +[2025-01-15 20:41:28,296][18905] Updated weights for policy 0, policy_version 40 (0.0025) +[2025-01-15 20:41:39,180][18905] Updated weights for policy 0, policy_version 50 (0.0013) +[2025-01-15 20:41:49,606][18905] Updated weights for policy 0, policy_version 60 (0.0024) +[2025-01-15 20:41:56,658][18890] Saving new best policy, reward=4.672! +[2025-01-15 20:41:59,097][18905] Updated weights for policy 0, policy_version 70 (0.0014) +[2025-01-15 20:42:10,377][18905] Updated weights for policy 0, policy_version 80 (0.0034) +[2025-01-15 20:42:11,660][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000081_331776.pth... +[2025-01-15 20:42:18,927][18905] Updated weights for policy 0, policy_version 90 (0.0022) +[2025-01-15 20:42:30,370][18905] Updated weights for policy 0, policy_version 100 (0.0023) +[2025-01-15 20:42:39,917][18905] Updated weights for policy 0, policy_version 110 (0.0017) +[2025-01-15 20:42:46,656][18890] Saving new best policy, reward=4.700! +[2025-01-15 20:42:50,317][18905] Updated weights for policy 0, policy_version 120 (0.0017) +[2025-01-15 20:43:01,728][18905] Updated weights for policy 0, policy_version 130 (0.0031) +[2025-01-15 20:43:09,954][18905] Updated weights for policy 0, policy_version 140 (0.0020) +[2025-01-15 20:43:21,208][18905] Updated weights for policy 0, policy_version 150 (0.0033) +[2025-01-15 20:43:21,663][18890] Saving new best policy, reward=4.805! +[2025-01-15 20:43:26,658][18890] Saving new best policy, reward=4.912! +[2025-01-15 20:43:30,742][18905] Updated weights for policy 0, policy_version 160 (0.0033) +[2025-01-15 20:43:31,677][18890] Saving new best policy, reward=5.011! +[2025-01-15 20:43:36,656][18890] Saving new best policy, reward=5.529! +[2025-01-15 20:43:41,256][18905] Updated weights for policy 0, policy_version 170 (0.0020) +[2025-01-15 20:43:51,796][18905] Updated weights for policy 0, policy_version 180 (0.0025) +[2025-01-15 20:44:00,760][18905] Updated weights for policy 0, policy_version 190 (0.0016) +[2025-01-15 20:44:01,661][18890] Saving new best policy, reward=5.920! +[2025-01-15 20:44:06,656][18890] Saving new best policy, reward=6.300! +[2025-01-15 20:44:11,662][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000199_815104.pth... +[2025-01-15 20:44:12,157][18905] Updated weights for policy 0, policy_version 200 (0.0020) +[2025-01-15 20:44:20,549][18905] Updated weights for policy 0, policy_version 210 (0.0016) +[2025-01-15 20:44:21,663][18890] Saving new best policy, reward=6.683! +[2025-01-15 20:44:31,929][18905] Updated weights for policy 0, policy_version 220 (0.0022) +[2025-01-15 20:44:36,656][18890] Saving new best policy, reward=6.875! 
+[2025-01-15 20:44:41,670][18890] Saving new best policy, reward=7.165! +[2025-01-15 20:44:41,990][18905] Updated weights for policy 0, policy_version 230 (0.0020) +[2025-01-15 20:44:46,657][18890] Saving new best policy, reward=7.461! +[2025-01-15 20:44:51,665][18890] Saving new best policy, reward=7.797! +[2025-01-15 20:44:51,887][18905] Updated weights for policy 0, policy_version 240 (0.0022) +[2025-01-15 20:45:03,081][18905] Updated weights for policy 0, policy_version 250 (0.0027) +[2025-01-15 20:45:11,328][18905] Updated weights for policy 0, policy_version 260 (0.0021) +[2025-01-15 20:45:11,663][18890] Saving new best policy, reward=8.564! +[2025-01-15 20:45:16,659][18890] Saving new best policy, reward=8.706! +[2025-01-15 20:45:21,661][18890] Saving new best policy, reward=9.101! +[2025-01-15 20:45:22,805][18905] Updated weights for policy 0, policy_version 270 (0.0018) +[2025-01-15 20:45:26,658][18890] Saving new best policy, reward=9.152! +[2025-01-15 20:45:32,452][18905] Updated weights for policy 0, policy_version 280 (0.0025) +[2025-01-15 20:45:36,661][18890] Saving new best policy, reward=9.238! +[2025-01-15 20:45:41,663][18890] Saving new best policy, reward=9.757! +[2025-01-15 20:45:42,863][18905] Updated weights for policy 0, policy_version 290 (0.0032) +[2025-01-15 20:45:46,655][18890] Saving new best policy, reward=10.477! +[2025-01-15 20:45:53,611][18905] Updated weights for policy 0, policy_version 300 (0.0020) +[2025-01-15 20:45:56,711][18890] Saving new best policy, reward=10.766! +[2025-01-15 20:46:01,659][18890] Saving new best policy, reward=11.492! +[2025-01-15 20:46:02,942][18905] Updated weights for policy 0, policy_version 310 (0.0014) +[2025-01-15 20:46:06,656][18890] Saving new best policy, reward=12.060! +[2025-01-15 20:46:11,663][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000316_1294336.pth... +[2025-01-15 20:46:11,820][18890] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000081_331776.pth +[2025-01-15 20:46:11,836][18890] Saving new best policy, reward=12.845! +[2025-01-15 20:46:14,719][18905] Updated weights for policy 0, policy_version 320 (0.0027) +[2025-01-15 20:46:16,657][18890] Saving new best policy, reward=12.907! +[2025-01-15 20:46:24,108][18905] Updated weights for policy 0, policy_version 330 (0.0028) +[2025-01-15 20:46:31,663][18890] Saving new best policy, reward=13.885! +[2025-01-15 20:46:35,142][18905] Updated weights for policy 0, policy_version 340 (0.0031) +[2025-01-15 20:46:46,257][18905] Updated weights for policy 0, policy_version 350 (0.0030) +[2025-01-15 20:46:46,664][18890] Saving new best policy, reward=14.050! +[2025-01-15 20:46:51,667][18890] Saving new best policy, reward=14.325! +[2025-01-15 20:46:55,987][18905] Updated weights for policy 0, policy_version 360 (0.0023) +[2025-01-15 20:46:56,663][18890] Saving new best policy, reward=15.430! +[2025-01-15 20:47:07,932][18905] Updated weights for policy 0, policy_version 370 (0.0028) +[2025-01-15 20:47:17,492][18905] Updated weights for policy 0, policy_version 380 (0.0028) +[2025-01-15 20:47:26,656][18890] Saving new best policy, reward=15.671! +[2025-01-15 20:47:28,279][18905] Updated weights for policy 0, policy_version 390 (0.0017) +[2025-01-15 20:47:31,661][18890] Saving new best policy, reward=16.523! +[2025-01-15 20:47:39,939][18905] Updated weights for policy 0, policy_version 400 (0.0017) +[2025-01-15 20:47:46,656][18890] Saving new best policy, reward=18.349! 
+[2025-01-15 20:47:48,655][18905] Updated weights for policy 0, policy_version 410 (0.0024) +[2025-01-15 20:47:51,668][18890] Saving new best policy, reward=19.337! +[2025-01-15 20:48:00,447][18905] Updated weights for policy 0, policy_version 420 (0.0054) +[2025-01-15 20:48:01,670][18890] Saving new best policy, reward=21.156! +[2025-01-15 20:48:10,522][18905] Updated weights for policy 0, policy_version 430 (0.0024) +[2025-01-15 20:48:11,672][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000430_1761280.pth... +[2025-01-15 20:48:11,873][18890] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000199_815104.pth +[2025-01-15 20:48:16,653][18890] Saving new best policy, reward=21.622! +[2025-01-15 20:48:20,712][18905] Updated weights for policy 0, policy_version 440 (0.0023) +[2025-01-15 20:48:21,663][18890] Saving new best policy, reward=23.133! +[2025-01-15 20:48:32,132][18905] Updated weights for policy 0, policy_version 450 (0.0033) +[2025-01-15 20:48:40,538][18905] Updated weights for policy 0, policy_version 460 (0.0030) +[2025-01-15 20:48:51,879][18905] Updated weights for policy 0, policy_version 470 (0.0042) +[2025-01-15 20:49:01,033][18905] Updated weights for policy 0, policy_version 480 (0.0022) +[2025-01-15 20:49:11,656][18905] Updated weights for policy 0, policy_version 490 (0.0024) +[2025-01-15 20:49:22,367][18905] Updated weights for policy 0, policy_version 500 (0.0022) +[2025-01-15 20:49:31,613][18905] Updated weights for policy 0, policy_version 510 (0.0031) +[2025-01-15 20:49:42,685][18905] Updated weights for policy 0, policy_version 520 (0.0052) +[2025-01-15 20:49:51,425][18905] Updated weights for policy 0, policy_version 530 (0.0042) +[2025-01-15 20:50:02,716][18905] Updated weights for policy 0, policy_version 540 (0.0031) +[2025-01-15 20:50:11,674][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000549_2248704.pth... +[2025-01-15 20:50:11,854][18890] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000316_1294336.pth +[2025-01-15 20:50:12,874][18905] Updated weights for policy 0, policy_version 550 (0.0026) +[2025-01-15 20:50:22,584][18905] Updated weights for policy 0, policy_version 560 (0.0047) +[2025-01-15 20:50:33,989][18905] Updated weights for policy 0, policy_version 570 (0.0019) +[2025-01-15 20:50:42,382][18905] Updated weights for policy 0, policy_version 580 (0.0013) +[2025-01-15 20:50:53,839][18905] Updated weights for policy 0, policy_version 590 (0.0021) +[2025-01-15 20:50:56,654][18890] Saving new best policy, reward=23.940! +[2025-01-15 20:51:03,550][18905] Updated weights for policy 0, policy_version 600 (0.0031) +[2025-01-15 20:51:13,866][18905] Updated weights for policy 0, policy_version 610 (0.0021) +[2025-01-15 20:51:24,952][18905] Updated weights for policy 0, policy_version 620 (0.0030) +[2025-01-15 20:51:33,904][18905] Updated weights for policy 0, policy_version 630 (0.0021) +[2025-01-15 20:51:45,397][18905] Updated weights for policy 0, policy_version 640 (0.0030) +[2025-01-15 20:51:54,896][18905] Updated weights for policy 0, policy_version 650 (0.0018) +[2025-01-15 20:52:05,840][18905] Updated weights for policy 0, policy_version 660 (0.0030) +[2025-01-15 20:52:11,666][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000666_2727936.pth... 
+[2025-01-15 20:52:11,834][18890] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000430_1761280.pth +[2025-01-15 20:52:16,965][18905] Updated weights for policy 0, policy_version 670 (0.0029) +[2025-01-15 20:52:26,215][18905] Updated weights for policy 0, policy_version 680 (0.0020) +[2025-01-15 20:52:31,658][18890] Saving new best policy, reward=24.357! +[2025-01-15 20:52:38,122][18905] Updated weights for policy 0, policy_version 690 (0.0028) +[2025-01-15 20:52:41,670][18890] Saving new best policy, reward=26.488! +[2025-01-15 20:52:46,669][18890] Saving new best policy, reward=26.503! +[2025-01-15 20:52:47,883][18905] Updated weights for policy 0, policy_version 700 (0.0020) +[2025-01-15 20:52:51,667][18890] Saving new best policy, reward=27.526! +[2025-01-15 20:52:56,658][18890] Saving new best policy, reward=29.132! +[2025-01-15 20:52:58,673][18905] Updated weights for policy 0, policy_version 710 (0.0026) +[2025-01-15 20:53:09,768][18905] Updated weights for policy 0, policy_version 720 (0.0025) +[2025-01-15 20:53:18,657][18905] Updated weights for policy 0, policy_version 730 (0.0014) +[2025-01-15 20:53:30,135][18905] Updated weights for policy 0, policy_version 740 (0.0015) +[2025-01-15 20:53:38,869][18905] Updated weights for policy 0, policy_version 750 (0.0014) +[2025-01-15 20:53:49,919][18905] Updated weights for policy 0, policy_version 760 (0.0019) +[2025-01-15 20:54:00,558][18905] Updated weights for policy 0, policy_version 770 (0.0022) +[2025-01-15 20:54:09,799][18905] Updated weights for policy 0, policy_version 780 (0.0034) +[2025-01-15 20:54:11,659][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000782_3203072.pth... +[2025-01-15 20:54:11,783][18890] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000549_2248704.pth +[2025-01-15 20:54:21,517][18905] Updated weights for policy 0, policy_version 790 (0.0016) +[2025-01-15 20:54:29,995][18905] Updated weights for policy 0, policy_version 800 (0.0030) +[2025-01-15 20:54:41,458][18905] Updated weights for policy 0, policy_version 810 (0.0022) +[2025-01-15 20:54:51,392][18905] Updated weights for policy 0, policy_version 820 (0.0022) +[2025-01-15 20:55:01,548][18905] Updated weights for policy 0, policy_version 830 (0.0013) +[2025-01-15 20:55:13,403][18905] Updated weights for policy 0, policy_version 840 (0.0018) +[2025-01-15 20:55:22,095][18905] Updated weights for policy 0, policy_version 850 (0.0019) +[2025-01-15 20:55:33,757][18905] Updated weights for policy 0, policy_version 860 (0.0018) +[2025-01-15 20:55:44,201][18905] Updated weights for policy 0, policy_version 870 (0.0021) +[2025-01-15 20:55:53,652][18905] Updated weights for policy 0, policy_version 880 (0.0013) +[2025-01-15 20:56:05,098][18905] Updated weights for policy 0, policy_version 890 (0.0018) +[2025-01-15 20:56:11,664][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000897_3674112.pth... 
+[2025-01-15 20:56:11,799][18890] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000666_2727936.pth +[2025-01-15 20:56:13,697][18905] Updated weights for policy 0, policy_version 900 (0.0024) +[2025-01-15 20:56:24,930][18905] Updated weights for policy 0, policy_version 910 (0.0020) +[2025-01-15 20:56:34,779][18905] Updated weights for policy 0, policy_version 920 (0.0013) +[2025-01-15 20:56:45,592][18905] Updated weights for policy 0, policy_version 930 (0.0014) +[2025-01-15 20:56:57,177][18905] Updated weights for policy 0, policy_version 940 (0.0024) +[2025-01-15 20:57:05,655][18905] Updated weights for policy 0, policy_version 950 (0.0024) +[2025-01-15 20:57:17,074][18905] Updated weights for policy 0, policy_version 960 (0.0024) +[2025-01-15 20:57:26,621][18905] Updated weights for policy 0, policy_version 970 (0.0037) +[2025-01-15 20:57:37,201][18905] Updated weights for policy 0, policy_version 980 (0.0021) +[2025-01-15 20:57:48,395][18905] Updated weights for policy 0, policy_version 990 (0.0025) +[2025-01-15 20:57:57,347][18905] Updated weights for policy 0, policy_version 1000 (0.0035) +[2025-01-15 20:58:08,940][18905] Updated weights for policy 0, policy_version 1010 (0.0019) +[2025-01-15 20:58:11,658][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001013_4149248.pth... +[2025-01-15 20:58:11,781][18890] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000782_3203072.pth +[2025-01-15 20:58:17,501][18905] Updated weights for policy 0, policy_version 1020 (0.0021) +[2025-01-15 20:58:28,742][18905] Updated weights for policy 0, policy_version 1030 (0.0028) +[2025-01-15 20:58:39,007][18905] Updated weights for policy 0, policy_version 1040 (0.0021) +[2025-01-15 20:58:48,820][18905] Updated weights for policy 0, policy_version 1050 (0.0022) +[2025-01-15 20:59:00,172][18905] Updated weights for policy 0, policy_version 1060 (0.0032) +[2025-01-15 20:59:06,655][18890] Saving new best policy, reward=29.510! +[2025-01-15 20:59:09,059][18905] Updated weights for policy 0, policy_version 1070 (0.0036) +[2025-01-15 20:59:20,728][18905] Updated weights for policy 0, policy_version 1080 (0.0023) +[2025-01-15 20:59:21,674][18890] Saving new best policy, reward=30.241! +[2025-01-15 20:59:31,347][18905] Updated weights for policy 0, policy_version 1090 (0.0032) +[2025-01-15 20:59:41,468][18905] Updated weights for policy 0, policy_version 1100 (0.0015) +[2025-01-15 20:59:53,103][18905] Updated weights for policy 0, policy_version 1110 (0.0017) +[2025-01-15 21:00:02,007][18905] Updated weights for policy 0, policy_version 1120 (0.0023) +[2025-01-15 21:00:11,660][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001127_4616192.pth... 
+[2025-01-15 21:00:11,784][18890] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000897_3674112.pth +[2025-01-15 21:00:13,992][18905] Updated weights for policy 0, policy_version 1130 (0.0017) +[2025-01-15 21:00:24,610][18905] Updated weights for policy 0, policy_version 1140 (0.0023) +[2025-01-15 21:00:34,395][18905] Updated weights for policy 0, policy_version 1150 (0.0020) +[2025-01-15 21:00:45,859][18905] Updated weights for policy 0, policy_version 1160 (0.0032) +[2025-01-15 21:00:54,606][18905] Updated weights for policy 0, policy_version 1170 (0.0028) +[2025-01-15 21:01:06,189][18905] Updated weights for policy 0, policy_version 1180 (0.0018) +[2025-01-15 21:01:17,374][18905] Updated weights for policy 0, policy_version 1190 (0.0019) +[2025-01-15 21:01:27,577][18905] Updated weights for policy 0, policy_version 1200 (0.0026) +[2025-01-15 21:01:39,529][18905] Updated weights for policy 0, policy_version 1210 (0.0036) +[2025-01-15 21:01:49,185][18905] Updated weights for policy 0, policy_version 1220 (0.0018) +[2025-01-15 21:02:00,663][18905] Updated weights for policy 0, policy_version 1230 (0.0036) +[2025-01-15 21:02:11,666][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001239_5074944.pth... +[2025-01-15 21:02:11,855][18890] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001013_4149248.pth +[2025-01-15 21:02:12,642][18905] Updated weights for policy 0, policy_version 1240 (0.0033) +[2025-01-15 21:02:21,514][18905] Updated weights for policy 0, policy_version 1250 (0.0025) +[2025-01-15 21:02:33,048][18905] Updated weights for policy 0, policy_version 1260 (0.0019) +[2025-01-15 21:02:43,319][18905] Updated weights for policy 0, policy_version 1270 (0.0018) +[2025-01-15 21:02:53,434][18905] Updated weights for policy 0, policy_version 1280 (0.0051) +[2025-01-15 21:03:04,850][18905] Updated weights for policy 0, policy_version 1290 (0.0014) +[2025-01-15 21:03:14,014][18905] Updated weights for policy 0, policy_version 1300 (0.0017) +[2025-01-15 21:03:25,945][18905] Updated weights for policy 0, policy_version 1310 (0.0021) +[2025-01-15 21:03:36,884][18905] Updated weights for policy 0, policy_version 1320 (0.0019) +[2025-01-15 21:03:46,246][18905] Updated weights for policy 0, policy_version 1330 (0.0013) +[2025-01-15 21:03:57,758][18905] Updated weights for policy 0, policy_version 1340 (0.0018) +[2025-01-15 21:04:06,345][18905] Updated weights for policy 0, policy_version 1350 (0.0017) +[2025-01-15 21:04:11,657][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001353_5541888.pth... 
+[2025-01-15 21:04:11,847][18890] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001127_4616192.pth +[2025-01-15 21:04:17,554][18905] Updated weights for policy 0, policy_version 1360 (0.0019) +[2025-01-15 21:04:28,374][18905] Updated weights for policy 0, policy_version 1370 (0.0024) +[2025-01-15 21:04:37,717][18905] Updated weights for policy 0, policy_version 1380 (0.0027) +[2025-01-15 21:04:49,511][18905] Updated weights for policy 0, policy_version 1390 (0.0025) +[2025-01-15 21:04:58,799][18905] Updated weights for policy 0, policy_version 1400 (0.0024) +[2025-01-15 21:05:10,020][18905] Updated weights for policy 0, policy_version 1410 (0.0023) +[2025-01-15 21:05:21,268][18905] Updated weights for policy 0, policy_version 1420 (0.0024) +[2025-01-15 21:05:30,178][18905] Updated weights for policy 0, policy_version 1430 (0.0028) +[2025-01-15 21:05:42,217][18905] Updated weights for policy 0, policy_version 1440 (0.0023) +[2025-01-15 21:05:51,587][18905] Updated weights for policy 0, policy_version 1450 (0.0026) +[2025-01-15 21:06:02,491][18905] Updated weights for policy 0, policy_version 1460 (0.0020) +[2025-01-15 21:06:11,665][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001468_6012928.pth... +[2025-01-15 21:06:11,843][18890] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001239_5074944.pth +[2025-01-15 21:06:14,352][18905] Updated weights for policy 0, policy_version 1470 (0.0025) +[2025-01-15 21:06:22,985][18905] Updated weights for policy 0, policy_version 1480 (0.0023) +[2025-01-15 21:06:34,583][18905] Updated weights for policy 0, policy_version 1490 (0.0031) +[2025-01-15 21:06:45,001][18905] Updated weights for policy 0, policy_version 1500 (0.0016) +[2025-01-15 21:06:55,011][18905] Updated weights for policy 0, policy_version 1510 (0.0024) +[2025-01-15 21:07:06,610][18905] Updated weights for policy 0, policy_version 1520 (0.0038) +[2025-01-15 21:07:15,316][18905] Updated weights for policy 0, policy_version 1530 (0.0018) +[2025-01-15 21:07:26,970][18905] Updated weights for policy 0, policy_version 1540 (0.0021) +[2025-01-15 21:07:36,805][18905] Updated weights for policy 0, policy_version 1550 (0.0028) +[2025-01-15 21:07:47,252][18905] Updated weights for policy 0, policy_version 1560 (0.0023) +[2025-01-15 21:07:59,083][18905] Updated weights for policy 0, policy_version 1570 (0.0022) +[2025-01-15 21:08:08,032][18905] Updated weights for policy 0, policy_version 1580 (0.0025) +[2025-01-15 21:08:11,659][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001582_6479872.pth... 
+[2025-01-15 21:08:11,810][18890] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001353_5541888.pth +[2025-01-15 21:08:19,825][18905] Updated weights for policy 0, policy_version 1590 (0.0016) +[2025-01-15 21:08:30,054][18905] Updated weights for policy 0, policy_version 1600 (0.0037) +[2025-01-15 21:08:40,211][18905] Updated weights for policy 0, policy_version 1610 (0.0042) +[2025-01-15 21:08:51,935][18905] Updated weights for policy 0, policy_version 1620 (0.0024) +[2025-01-15 21:09:00,171][18905] Updated weights for policy 0, policy_version 1630 (0.0024) +[2025-01-15 21:09:11,789][18905] Updated weights for policy 0, policy_version 1640 (0.0032) +[2025-01-15 21:09:21,556][18905] Updated weights for policy 0, policy_version 1650 (0.0025) +[2025-01-15 21:09:31,744][18905] Updated weights for policy 0, policy_version 1660 (0.0022) +[2025-01-15 21:09:43,156][18905] Updated weights for policy 0, policy_version 1670 (0.0026) +[2025-01-15 21:09:51,515][18905] Updated weights for policy 0, policy_version 1680 (0.0022) +[2025-01-15 21:10:02,980][18905] Updated weights for policy 0, policy_version 1690 (0.0033) +[2025-01-15 21:10:11,669][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001699_6959104.pth... +[2025-01-15 21:10:11,859][18890] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001468_6012928.pth +[2025-01-15 21:10:12,308][18905] Updated weights for policy 0, policy_version 1700 (0.0013) +[2025-01-15 21:10:23,074][18905] Updated weights for policy 0, policy_version 1710 (0.0021) +[2025-01-15 21:10:33,761][18905] Updated weights for policy 0, policy_version 1720 (0.0030) +[2025-01-15 21:10:42,945][18905] Updated weights for policy 0, policy_version 1730 (0.0017) +[2025-01-15 21:10:54,194][18905] Updated weights for policy 0, policy_version 1740 (0.0028) +[2025-01-15 21:11:03,021][18905] Updated weights for policy 0, policy_version 1750 (0.0017) +[2025-01-15 21:11:14,051][18905] Updated weights for policy 0, policy_version 1760 (0.0027) +[2025-01-15 21:11:24,540][18905] Updated weights for policy 0, policy_version 1770 (0.0015) +[2025-01-15 21:11:33,893][18905] Updated weights for policy 0, policy_version 1780 (0.0024) +[2025-01-15 21:11:45,299][18905] Updated weights for policy 0, policy_version 1790 (0.0018) +[2025-01-15 21:11:53,747][18905] Updated weights for policy 0, policy_version 1800 (0.0024) +[2025-01-15 21:12:04,987][18905] Updated weights for policy 0, policy_version 1810 (0.0044) +[2025-01-15 21:12:11,667][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001817_7442432.pth... +[2025-01-15 21:12:11,834][18890] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001582_6479872.pth +[2025-01-15 21:12:14,806][18905] Updated weights for policy 0, policy_version 1820 (0.0015) +[2025-01-15 21:12:24,843][18905] Updated weights for policy 0, policy_version 1830 (0.0019) +[2025-01-15 21:12:35,856][18905] Updated weights for policy 0, policy_version 1840 (0.0013) +[2025-01-15 21:12:44,627][18905] Updated weights for policy 0, policy_version 1850 (0.0018) +[2025-01-15 21:12:55,985][18905] Updated weights for policy 0, policy_version 1860 (0.0021) +[2025-01-15 21:13:04,731][18905] Updated weights for policy 0, policy_version 1870 (0.0021) +[2025-01-15 21:13:11,668][18890] Saving new best policy, reward=30.845! 
+[2025-01-15 21:13:15,672][18905] Updated weights for policy 0, policy_version 1880 (0.0025) +[2025-01-15 21:13:26,680][18905] Updated weights for policy 0, policy_version 1890 (0.0023) +[2025-01-15 21:13:35,543][18905] Updated weights for policy 0, policy_version 1900 (0.0025) +[2025-01-15 21:13:46,924][18905] Updated weights for policy 0, policy_version 1910 (0.0029) +[2025-01-15 21:13:55,349][18905] Updated weights for policy 0, policy_version 1920 (0.0022) +[2025-01-15 21:14:06,595][18905] Updated weights for policy 0, policy_version 1930 (0.0018) +[2025-01-15 21:14:11,668][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001936_7929856.pth... +[2025-01-15 21:14:11,794][18890] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001699_6959104.pth +[2025-01-15 21:14:16,872][18905] Updated weights for policy 0, policy_version 1940 (0.0020) +[2025-01-15 21:14:26,483][18905] Updated weights for policy 0, policy_version 1950 (0.0031) +[2025-01-15 21:14:31,130][18890] Stopping Batcher_0... +[2025-01-15 21:14:31,131][18890] Loop batcher_evt_loop terminating... +[2025-01-15 21:14:31,139][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... +[2025-01-15 21:14:31,254][18905] Weights refcount: 2 0 +[2025-01-15 21:14:31,267][18905] Stopping InferenceWorker_p0-w0... +[2025-01-15 21:14:31,267][18905] Loop inference_proc0-0_evt_loop terminating... +[2025-01-15 21:14:31,281][18890] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001817_7442432.pth +[2025-01-15 21:14:31,299][18890] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... +[2025-01-15 21:14:31,530][18890] Stopping LearnerWorker_p0... +[2025-01-15 21:14:31,531][18890] Loop learner_proc0_evt_loop terminating... +[2025-01-15 21:14:31,930][18908] Stopping RolloutWorker_w4... +[2025-01-15 21:14:31,932][18908] Loop rollout_proc4_evt_loop terminating... +[2025-01-15 21:14:31,947][18906] Stopping RolloutWorker_w2... +[2025-01-15 21:14:31,954][18906] Loop rollout_proc2_evt_loop terminating... +[2025-01-15 21:14:31,985][18904] Stopping RolloutWorker_w0... +[2025-01-15 21:14:31,987][18904] Loop rollout_proc0_evt_loop terminating... +[2025-01-15 21:14:31,995][18910] Stopping RolloutWorker_w7... +[2025-01-15 21:14:31,996][18910] Loop rollout_proc7_evt_loop terminating... +[2025-01-15 21:14:32,004][18911] Stopping RolloutWorker_w6... +[2025-01-15 21:14:32,008][18911] Loop rollout_proc6_evt_loop terminating... +[2025-01-15 21:14:32,038][18907] Stopping RolloutWorker_w3... +[2025-01-15 21:14:32,042][18907] Loop rollout_proc3_evt_loop terminating... +[2025-01-15 21:14:32,051][18909] Stopping RolloutWorker_w5... +[2025-01-15 21:14:32,058][18909] Loop rollout_proc5_evt_loop terminating... +[2025-01-15 21:14:32,061][18903] Stopping RolloutWorker_w1... +[2025-01-15 21:14:32,076][18903] Loop rollout_proc1_evt_loop terminating...