unreal.LearningAgentsTrainerTrainingSettings

class unreal.LearningAgentsTrainerTrainingSettings

Bases: StructBase

The configurable settings for the training process.

C++ Source:

  • Plugin: LearningAgents

  • Module: LearningAgentsTraining

  • File: LearningAgentsTrainer.h

Editor Properties: (see get_editor_property/set_editor_property)

  • action_entropy_weight (float): [Read-Write] Weighting used for the entropy bonus. Larger values encourage more action noise and therefore greater exploration, but can make actions very noisy.

  • action_regularization_weight (float): [Read-Write] Weight used to regularize actions. Larger values will encourage exploration and smaller actions, but values that are too large will cause noisy actions centered around zero.

  • advantage_normalization (bool): [Read-Write] When true, advantages are normalized. This tends to make training more robust to adjustments of the scale of rewards.

  • critic_batch_size (int32): [Read-Write] Batch size to use for training the critic. Large batch sizes are much more computationally efficient when training on the GPU.

  • critic_warmup_iterations (int32): [Read-Write] Number of iterations of training to perform to warm up the Critic. This helps speed up and stabilize training at the beginning, when the Critic may be producing predictions at the wrong order of magnitude.

  • device (LearningAgentsTrainerDevice): [Read-Write] The device to train on.

  • discount_factor (float): [Read-Write] The discount factor to use during training. This affects how much the agent cares about future rewards vs near-term rewards. Should typically be a value less than but near 1.0.

  • epsilon_clip (float): [Read-Write] Clipping ratio to apply to policy updates. Keeps the training “on-policy”. Larger values may speed up training at the cost of stability. Conversely, values that are too small can prevent the policy from learning an optimal policy.

  • gae_lambda (float): [Read-Write] This is used in the Generalized Advantage Estimation, where larger values will tend to assign more credit to recent actions. Typical values should be between 0.9 and 1.0.

  • grad_norm_max (float): [Read-Write] The maximum gradient norm to clip updates to. Only used when bUseGradNormMaxClipping is set to true.

    This needs to be carefully chosen based on the size of your gradients during training. Setting it too low can make it difficult to learn an optimal policy, while setting it too high will have no effect.

  • iterations_per_gather (int32): [Read-Write] Number of training iterations to perform per buffer of experience gathered. This should be large enough for the critic and policy to be effectively updated, but making it too large will simply slow down training.

  • learning_rate_critic (float): [Read-Write] Learning rate of the critic network. To avoid instability, the critic should generally have a larger learning rate than the policy. Typically this can be set to 10x the rate of the policy.

  • learning_rate_decay (float): [Read-Write] Amount by which to multiply the learning rate every 1000 iterations.

  • learning_rate_policy (float): [Read-Write] Learning rate of the policy network. Typical values are between 0.001 and 0.0001.

  • maximum_advantage (float): [Read-Write] The maximum advantage to allow. Making this smaller may increase training stability at the cost of some training speed.

  • minimum_advantage (float): [Read-Write] The minimum advantage to allow. Setting this below zero will encourage the policy to move away from bad actions, but can introduce instability.

  • number_of_iterations (int32): [Read-Write] The number of iterations to run before ending training.

  • number_of_steps_to_trim_at_end_of_episode (int32): [Read-Write] The number of steps to trim from the end of the episode. Can be useful if the end of the episode contains irrelevant data.

  • number_of_steps_to_trim_at_start_of_episode (int32): [Read-Write] The number of steps to trim from the start of the episode. This can be useful if, for example, some things are still being set up at the start of the episode and you don’t want them used for training.

  • policy_batch_size (int32): [Read-Write] Batch size to use for training the policy. Large batch sizes are much more computationally efficient when training on the GPU.

  • policy_window_size (int32): [Read-Write] The number of consecutive steps of observations and actions over which to train the policy. Increasing this value will encourage the policy to use its memory effectively. If it is too large, training can become slow and unstable.

  • random_seed (int32): [Read-Write] The seed used for any random sampling the trainer will perform, e.g. for weight initialization.

  • return_regularization_weight (float): [Read-Write] Weight used to regularize returns. Encourages the critic not to overestimate or underestimate returns.

  • save_snapshots (bool): [Read-Write] If true, snapshots of the trained networks will be emitted to the intermediate directory.

  • use_grad_norm_max_clipping (bool): [Read-Write] When true, gradient norm max clipping will be used on the policy, critic, encoder, and decoder. Set this to true if training is unstable (and adjust GradNormMax accordingly), or leave it false if unused.

  • use_tensorboard (bool): [Read-Write] If true, TensorBoard logs will be emitted to the intermediate directory.

  • weight_decay (float): [Read-Write] Amount of weight decay to apply to the network. Larger values encourage smaller network weights, but too large a value can cause the weights to collapse to all zeros.
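
Because these are standard editor properties on a StructBase, they can be read and written from Python with get_editor_property and set_editor_property. Below is a minimal configuration sketch covering a few of the settings documented above. The specific values are illustrative rather than recommendations, and the GPU entry on the LearningAgentsTrainerDevice enum is an assumption not documented on this page.

  import unreal

  # Construct the settings struct and adjust a few of its editor properties.
  settings = unreal.LearningAgentsTrainerTrainingSettings()

  # Train on the GPU (assumed enum value) and keep the discount factor near, but below, 1.0.
  settings.set_editor_property("device", unreal.LearningAgentsTrainerDevice.GPU)
  settings.set_editor_property("discount_factor", 0.99)

  # Give the critic roughly 10x the policy learning rate, as suggested above.
  settings.set_editor_property("learning_rate_policy", 0.0001)
  settings.set_editor_property("learning_rate_critic", 0.001)

  # If training proves unstable, enable gradient norm clipping and tune the maximum norm.
  settings.set_editor_property("use_grad_norm_max_clipping", True)
  settings.set_editor_property("grad_norm_max", 1.0)

  # Read a value back to confirm it was applied.
  print(settings.get_editor_property("discount_factor"))

The populated struct is then passed to the trainer when training is started; the exact call depends on the Learning Agents API version in use.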