unreal.LearningAgentsTrainer
- class unreal.LearningAgentsTrainer(outer: Object | None = None, name: Name | str = 'None')¶
Bases:
LearningAgentsManagerComponent
- The ULearningAgentsTrainer is the core class for reinforcement learning training. It has a few responsibilities:
It keeps track of which agents are gathering training data.
It defines how those agents’ rewards, completions, and resets are implemented.
It provides methods for orchestrating the training process.
To use this class, you need to implement the SetupRewards and SetupCompletions functions (as well as their corresponding SetRewards and SetCompletions functions), which will define the rewards and penalties the agent receives and what conditions cause an episode to end. Before you can begin training, you need to call SetupTrainer, which will initialize the underlying data structures, and you need to call AddAgent for each agent you want to gather training data from.
See: ULearningAgentsInteractor to understand how observations and actions work.
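The overall flow described above can be sketched as follows. This is a minimal outline for an Unreal Editor Python session, not a complete setup: it assumes a `trainer` component already exists on a manager, and that `interactor`, `policy`, and `critic` objects were created elsewhere (e.g. in your Blueprint or actor setup) — those names are placeholders, only the method calls come from this page.

```python
import unreal

# One-time setup: initializes internal data structures and runs the
# SetupRewards / SetupCompletions events.
trainer.setup_trainer(interactor, policy, critic)

# Per-tick driver: run_training starts training on the first call, then on
# each subsequent call evaluates rewards/completions, processes experience,
# and runs inference for all added agents.
def on_tick(delta_seconds):
    trainer.run_training()
```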
C++ Source:
Plugin: LearningAgents
Module: LearningAgentsTraining
File: LearningAgentsTrainer.h
Editor Properties: (see get_editor_property/set_editor_property)
asset_user_data
(Array[AssetUserData]): [Read-Write] Array of user data stored with the component
auto_activate
(bool): [Read-Write] Whether the component is activated at creation or must be explicitly activated.
can_ever_affect_navigation
(bool): [Read-Write] Whether this component can potentially influence navigation
completion_objects
(Array[LearningAgentsCompletion]): [Read-Only] The list of current completion objects.
component_tags
(Array[Name]): [Read-Write] Array of tags that can be used for grouping and categorizing. Can also be accessed from scripting.
critic
(LearningAgentsCritic): [Read-Only] The current critic.
editable_when_inherited
(bool): [Read-Write] True if this component can be modified when it was inherited from a parent actor class
has_training_failed
(bool): [Read-Only] True if the trainer encountered an unrecoverable error during training (e.g. the trainer process timed out); otherwise, false. This exists mainly to keep the editor from locking up if something goes wrong during training.
helper_objects
(Array[LearningAgentsHelper]): [Read-Only] The list of current helper objects.
interactor
(LearningAgentsInteractor): [Read-Only] The agent interactor associated with this component.
is_editor_only
(bool): [Read-Write] If true, the component will be excluded from non-editor builds
is_setup
(bool): [Read-Only] True if this component has been setup; otherwise, false.
is_training
(bool): [Read-Only] True if training is currently in progress; otherwise, false.
manager
(LearningAgentsManager): [Read-Only] The associated manager this component is attached to.
on_component_activated
(ActorComponentActivatedSignature): [Read-Write] Called when the component has been activated, with a parameter indicating if it was from a reset
on_component_deactivated
(ActorComponentDeactivateSignature): [Read-Write] Called when the component has been deactivated
policy
(LearningAgentsPolicy): [Read-Only] The current policy for experience gathering.
primary_component_tick
(ActorComponentTickFunction): [Read-Write] Main tick function for the Component
replicate_using_registered_sub_object_list
(bool): [Read-Write] When true, the replication system will only replicate the registered subobjects list; when false, the replication system will instead call the virtual ReplicateSubObjects() function, where the subobjects need to be manually replicated.
replicates
(bool): [Read-Write] Is this component currently replicating? Should the network code consider it for replication? The owning Actor must be replicating first!
reward_objects
(Array[LearningAgentsReward]): [Read-Only] The list of current reward objects.
- begin_training(trainer_training_settings=[], trainer_game_settings=[], trainer_path_settings=[], critic_settings=[], reinitialize_policy_network=True, reinitialize_critic_network=True, reset_agents_on_begin=True) None ¶
Begins the training process with the provided settings.
- Parameters:
trainer_training_settings (LearningAgentsTrainerTrainingSettings) – The settings for this training run.
trainer_game_settings (LearningAgentsTrainerGameSettings) – The settings that will affect the game’s simulation.
trainer_path_settings (LearningAgentsTrainerPathSettings) – The path settings used by the trainer.
critic_settings (LearningAgentsCriticSettings) – The settings for the critic (if we are using one).
reinitialize_policy_network (bool) – If true, reinitialize the policy. Set this to false if your policy is pre-trained, e.g. with imitation learning.
reinitialize_critic_network (bool) – If true, reinitialize the critic. Set this to false if your critic is pre-trained.
reset_agents_on_begin (bool) – If true, reset all agents at the beginning of training.
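A hedged example of calling begin_training explicitly, e.g. to keep a pre-trained policy while reinitializing the critic. The settings struct names come from the parameter list above; `trainer` is a placeholder for an already set-up LearningAgentsTrainer, and the structs are left at their defaults since their individual fields are not documented here.

```python
import unreal

# Default-constructed settings structs; tweak fields as needed for your run.
training_settings = unreal.LearningAgentsTrainerTrainingSettings()
game_settings = unreal.LearningAgentsTrainerGameSettings()
path_settings = unreal.LearningAgentsTrainerPathSettings()
critic_settings = unreal.LearningAgentsCriticSettings()

# Keep the pre-trained policy (e.g. from imitation learning) but start the
# critic from scratch.
trainer.begin_training(
    training_settings, game_settings, path_settings, critic_settings,
    reinitialize_policy_network=False,
    reinitialize_critic_network=True,
    reset_agents_on_begin=True,
)
```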
- evaluate_completions() None ¶
Call this function when it is time to evaluate the completions for your agents. This should be done at the beginning of each iteration of your training loop after the initial step, i.e. after taking an action, let the simulation advance into the next state before evaluating the completions.
- evaluate_rewards() None ¶
Call this function when it is time to evaluate the rewards for your agents. This should be done at the beginning of each iteration of your training loop after the initial step, i.e. after taking an action, let the simulation advance into the next state before evaluating the rewards.
- get_reward(agent_id=-1) float ¶
Gets the current reward for an agent according to the given set of rewards. Should be called only after EvaluateRewards.
- Parameters:
agent_id (int32) – The AgentId to look-up the reward for
- Returns:
The reward
- Return type:
float
- has_training_failed() bool ¶
Returns true if the trainer has failed to communicate with the external training process. This can be used in combination with RunTraining to avoid filling the logs with errors.
- Returns:
True if the training has failed. Otherwise, false.
- Return type:
bool
- is_completed(agent_id=-1) LearningAgentsCompletionEnum or None ¶
Gets whether or not the agent will complete the episode according to the given set of completions. Should be called only after EvaluateCompletions.
- Parameters:
agent_id (int32) – The AgentId to look-up the completion for
- Returns:
If the agent will complete the episode
out_completion (LearningAgentsCompletionEnum): The completion type if the agent will complete the episode
- Return type:
LearningAgentsCompletionEnum or None
- is_training() bool ¶
Returns true if the trainer is currently training; otherwise, false.
- Return type:
bool
- process_experience() None ¶
Call this function at the end of each step of your training loop. This takes the current observations/actions/rewards and moves them into the episode experience buffer. All agents with full episode buffers, or those which have been signaled complete, will be reset. If enough experience has been gathered, it will be sent to the training process, an iteration of training will be run, and the updated policy will be synced back.
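Putting the evaluate/process methods together, one manual training iteration (what run_training does internally, per its description below) might look like this sketch. `trainer` and `policy` are placeholders for already set-up objects; the RunInference call is assumed to live on the policy object, matching the C++ naming, and is not documented on this page.

```python
# One iteration of a manual training loop, run each tick after the first:
trainer.evaluate_rewards()       # score the state reached by the last action
trainer.evaluate_completions()   # decide which agents end their episode
trainer.process_experience()     # buffer experience; train/sync if enough
policy.run_inference()           # pick the next action (assumed API)
```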
- reset_episodes(agent_ids) None ¶
During this event, all episodes should be reset for each agent.
- Parameters:
agent_ids (Array[int32]) – The ids of the agents that need resetting. See: GetAgent to get the agent corresponding to each id.
- run_training(trainer_training_settings=[], trainer_game_settings=[], trainer_path_settings=[], critic_settings=[], reinitialize_policy_network=True, reinitialize_critic_network=True, reset_agents_on_begin=True) None ¶
Convenience function that runs a basic training loop. If training has not been started, it will start it, and then call RunInference. On each following call to this function, it will call EvaluateRewards, EvaluateCompletions, and ProcessExperience, followed by RunInference.
- Parameters:
trainer_training_settings (LearningAgentsTrainerTrainingSettings) – The settings for this training run.
trainer_game_settings (LearningAgentsTrainerGameSettings) – The settings that will affect the game’s simulation.
trainer_path_settings (LearningAgentsTrainerPathSettings) – The path settings used by the trainer.
critic_settings (LearningAgentsCriticSettings) – The settings for the critic (if we are using one).
reinitialize_policy_network (bool) – If true, reinitialize the policy. Set this to false if your policy is pre-trained, e.g. with imitation learning.
reinitialize_critic_network (bool) – If true, reinitialize the critic. Set this to false if your critic is pre-trained.
reset_agents_on_begin (bool) – If true, reset all agents at the beginning of training.
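As suggested by the has_training_failed description above, a per-tick driver can guard run_training so a dead external training process does not flood the logs. This is a sketch; `trainer` is a placeholder for an already set-up LearningAgentsTrainer and the tick hookup depends on your actor setup.

```python
def on_tick(delta_seconds):
    # Stop driving the loop if the external trainer process has failed
    # (e.g. timed out), instead of logging an error every frame.
    if trainer.has_training_failed():
        return
    trainer.run_training()
```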
- set_completions(agent_ids) None ¶
During this event, all completions should be set for each agent.
- Parameters:
agent_ids (Array[int32]) – The list of agent ids to set completions for. See: LearningAgentsCompletions.h for the list of available completions. See: GetAgent to get the agent corresponding to each id.
- set_rewards(agent_ids) None ¶
During this event, all rewards/penalties should be set for each agent.
- Parameters:
agent_ids (Array[int32]) – The list of agent ids to set rewards/penalties for. See: LearningAgentsRewards.h for the list of available rewards/penalties. See: GetAgent to get the agent corresponding to each id.
- setup_completions() None ¶
During this event, all completions should be added to this trainer. See: LearningAgentsCompletions.h for the list of available completions.
- setup_rewards() None ¶
During this event, all rewards/penalties should be added to this trainer. See: LearningAgentsRewards.h for the list of available rewards/penalties.
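The setup/set event pairs are implemented by overriding them in a subclass. A skeleton using the standard `unreal.uclass`/`unreal.ufunction` decorators is sketched below; the reward-helper calls themselves are omitted because the available helpers (see LearningAgentsRewards.h) vary by engine version, and the per-agent look-up shown is hypothetical — see GetAgent in the parameter notes above for the real mechanism.

```python
import unreal

@unreal.uclass()
class MyTrainer(unreal.LearningAgentsTrainer):

    @unreal.ufunction(override=True)
    def setup_rewards(self):
        # Create and register reward/penalty objects here, using the
        # helpers listed in LearningAgentsRewards.h (names omitted).
        pass

    @unreal.ufunction(override=True)
    def set_rewards(self, agent_ids):
        for agent_id in agent_ids:
            # Look up the agent for this id (hypothetical call), then
            # compute and assign its reward for the current step.
            pass
```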
- setup_trainer(interactor, policy, critic=None, trainer_settings=[]) None ¶
Initializes this object and runs the setup functions for rewards and completions.
- Parameters:
interactor (LearningAgentsInteractor) – The agent interactor we are training with.
policy (LearningAgentsPolicy) – The policy to be trained.
critic (LearningAgentsCritic) – Optional - only needs to be provided if we want the critic to be accessible at runtime.
trainer_settings (LearningAgentsTrainerSettings) – The trainer settings to use.