unreal.LearningAgentsTrainerTrainingSettings

class unreal.LearningAgentsTrainerTrainingSettings

Bases: StructBase

The configurable settings for the training process.

C++ Source:

  • Plugin: LearningAgents

  • Module: LearningAgentsTraining

  • File: LearningAgentsTrainer.h

Editor Properties: (see get_editor_property/set_editor_property; a short usage sketch follows the property list below)

  • action_regularization_weight (float): [Read-Write] Weight used to regularize actions. Larger values will encourage smaller actions but too large will cause actions to become always zero.

  • advantage_normalization (bool): [Read-Write] When true, advantages are normalized. This tends to make training more robust to changes in the scale of rewards.

  • batch_size (int32): [Read-Write] Batch size to use for training. Smaller values tend to produce better results at the cost of slowing down training. Large batch sizes are much more computationally efficient when training on the GPU.

  • clip_advantages (bool): [Read-Write] When true, very large or small advantages will be clipped. This has few downsides and helps with numerical stability.

  • device (LearningAgentsTrainerDevice): [Read-Write] The device to train on.

  • discount_factor (float): [Read-Write] The discount factor to use during training. This affects how much the agent cares about future rewards vs near-term rewards. Should typically be a value less than but near 1.0.

  • entropy_weight (float): [Read-Write] Weighting used for the entropy bonus. Larger values encourage larger action noise and therefore greater exploration but can make actions very noisy.

  • epsilon_clip (float): [Read-Write] Clipping ratio to apply to policy updates. Keeps the training "on-policy". Larger values may speed up training at the cost of stability. Conversely, values that are too small can keep the policy from converging to an optimal one.

  • gae_lambda (float): [Read-Write] Used in Generalized Advantage Estimation as what is essentially an exponential smoothing/decay factor. Typical values are between 0.9 and 1.0.

  • initial_action_scale (float): [Read-Write] The initial scaling for the weights of the output layer of the neural network. Typically, you would use this to scale down the initial actions as it can stabilize the training and speed up convergence.

  • learning_rate_critic (float): [Read-Write] Learning rate of the critic network. To avoid instability, the critic should generally have a larger learning rate than the policy. Typically this can be set to 10x the learning rate of the policy.

  • learning_rate_decay (float): [Read-Write] Ratio by which to decay the learning rate every 1000 iterations.

  • learning_rate_policy (float): [Read-Write] Learning rate of the policy network. Typical values are between 0.001 and 0.0001.

  • number_of_iterations (int32): [Read-Write] The number of iterations to run before ending training.

  • number_of_steps_to_trim_at_end_of_episode (int32): [Read-Write] The number of steps to trim from the end of the episode. Can be useful if the end of the episode contains irrelevant data.

  • number_of_steps_to_trim_at_start_of_episode (int32): [Read-Write] The number of steps to trim from the start of the episode. This can be useful if, for example, some things are still being set up at the start of the episode and you don't want them used for training.

  • random_seed (int32): [Read-Write] The seed used for any random sampling the trainer will perform, e.g. for weight initialization.

  • use_tensorboard (bool): [Read-Write] If true, TensorBoard logs will be emitted to the intermediate directory.

  • weight_decay (float): [Read-Write] Amount of weight decay to apply to the network. Larger values encourage network weights to be smaller but too large a value can cause the network weights to collapse to all zeros.
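
A minimal usage sketch (Python, Unreal editor scripting), assuming the struct is constructed with default values and adjusted through the set_editor_property/get_editor_property accessors mentioned above. The property names are taken from the list above; the numeric values are placeholders for illustration, not tuned recommendations:

    import unreal

    # Construct the settings struct with its default values.
    training_settings = unreal.LearningAgentsTrainerTrainingSettings()

    # Adjust a few commonly tuned hyperparameters via set_editor_property.
    # Values below are example placeholders only.
    training_settings.set_editor_property("learning_rate_policy", 0.0001)
    training_settings.set_editor_property("learning_rate_critic", 0.001)  # ~10x the policy rate
    training_settings.set_editor_property("discount_factor", 0.99)
    training_settings.set_editor_property("gae_lambda", 0.95)
    training_settings.set_editor_property("batch_size", 128)
    training_settings.set_editor_property("number_of_iterations", 10000)
    training_settings.set_editor_property("use_tensorboard", True)

    # Read a value back to confirm the change.
    print(training_settings.get_editor_property("learning_rate_policy"))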
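
The learning_rate_decay description above implies a multiplicative schedule applied once every 1000 iterations. The exact schedule used internally is not documented here, so the following interpretation is an assumption, shown only to illustrate how the ratio scales the effective rate over time:

    # Assumed interpretation: the learning rate is multiplied by the decay
    # ratio once every 1000 iterations. Values are example placeholders.
    learning_rate_policy = 0.0001  # initial policy learning rate
    learning_rate_decay = 0.99     # decay ratio per 1000 iterations

    def effective_learning_rate(iteration: int) -> float:
        """Hypothetical effective rate after `iteration` training iterations."""
        return learning_rate_policy * learning_rate_decay ** (iteration // 1000)

    print(effective_learning_rate(5000))  # 0.0001 * 0.99 ** 5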