
Develop action masking (#1080)

* [Initial Commit]
Modified the model.py file and the ppo/trainer.py file to use masked actions

* Preliminary modifications to the python side of the code to enable action masking

* Preliminary modifications to the C# side of the code to enable action masking

* Preliminary modifications to the communication side of the code to enable action masking

* Implemented action masking for BC
Note : The actions of the teacher are not masked

* More error messages for the action masking

* fix pytests

* Added Documentation

* Address comment

* Addressed Comments on docs

* Addressed second comment on docs

* Addressed comments for the python side of the code

* Created the action masker and associated unit tests

* Addressed comments on the C# side

* Addressed the comment regarding action_masking_name

* Addressed the comments
/develop-generalizationTraining-TrainerController
GitHub committed 6 years ago
Current commit: ded0d8c7
22 files changed, 540 insertions(+), 22 deletions(-)
  1. docs/Learning-Environment-Design-Agents.md (24)
  2. python/communicator_objects/agent_info_proto_pb2.py (11)
  3. python/tests/test_bc.py (6)
  4. python/tests/test_ppo.py (15)
  5. python/unityagents/brain.py (3)
  6. python/unityagents/environment.py (9)
  7. python/unitytrainers/bc/models.py (4)
  8. python/unitytrainers/bc/trainer.py (6)
  9. python/unitytrainers/models.py (21)
  10. python/unitytrainers/ppo/trainer.py (10)
  11. unity-environment/Assets/ML-Agents/Scripts/Agent.cs (64)
  12. unity-environment/Assets/ML-Agents/Scripts/Batcher.cs (1)
  13. unity-environment/Assets/ML-Agents/Scripts/CommunicatorObjects/AgentInfoProto.cs (29)
  14. unity-environment/Assets/ML-Agents/Scripts/CoreBrainInternal.cs (44)
  15. unity-environment/Assets/ML-Agents/Editor/Tests.meta (8)
  16. unity-environment/Assets/ML-Agents/Scripts/ActionMasker.cs (154)
  17. unity-environment/Assets/ML-Agents/Scripts/ActionMasker.cs.meta (3)
  18. unity-environment/Assets/ML-Agents/Editor/Tests/EditModeTestActionMasker.cs (139)
  19. unity-environment/Assets/ML-Agents/Editor/Tests/EditModeTestActionMasker.cs.meta (11)
  20. /unity-environment/Assets/ML-Agents/Editor/Tests/MLAgentsEditModeTest.cs (0, renamed)
  21. /unity-environment/Assets/ML-Agents/Editor/Tests/MLAgentsEditModeTest.cs.meta (0, renamed)

docs/Learning-Environment-Design-Agents.md (24)


Note that the above code example is a simplified extract from the AreaAgent class, which provides alternate implementations for both the discrete and the continuous action spaces.
#### Masking Discrete Actions
When using Discrete Actions, it is possible to specify that some actions are impossible for the next decision. When the agent is controlled by an External or Internal Brain, the agent will be unable to perform the specified actions. Note that when the agent is controlled by a Player or Heuristic Brain, the agent will still be able to decide to perform the masked actions. In order to mask an action, call the method `SetActionMask` within the `CollectObservations` method:
```csharp
SetActionMask(branch, actionIndices)
```
Where :
* `branch` is the index (starting at 0) of the branch on which you want to mask the action
* `actionIndices` is a list of `int` or a single `int` corresponding to the index of the action that the agent cannot perform.
For example, if you have an agent with 2 branches and the first branch (branch 0) has four possible actions: _"do nothing"_, _"jump"_, _"shoot"_ and _"change weapon"_, then with the code below, the agent will either _"do nothing"_ or _"change weapon"_ for its next decision (since action indices 1 and 2 are masked).
```csharp
SetActionMask(0, new int[2]{1,2})
```
Notes:
* You can call `SetActionMask` multiple times if you want to put masks on multiple branches.
* You cannot mask all the actions of a branch.
* You cannot mask actions in continuous control.
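As an illustrative sketch (a hypothetical agent, not part of this change), an agent that has run out of ammunition could mask the _"shoot"_ action from inside `CollectObservations`, using the branch layout of the example above:
```csharp
public override void CollectObservations()
{
    AddVectorObs(ammoCount);  // hypothetical observation

    // Branch 0 actions: 0 = "do nothing", 1 = "jump", 2 = "shoot", 3 = "change weapon".
    // With no ammunition left, prevent "shoot" from being chosen at the next decision.
    if (ammoCount <= 0)
    {
        SetActionMask(0, 2);
    }
}
```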
## Rewards
In reinforcement learning, the reward is a signal that the agent has done something right. The PPO reinforcement learning algorithm works by optimizing the choices an agent makes such that the agent earns the highest cumulative reward over time. The better your reward mechanism, the better your agent will learn.

python/communicator_objects/agent_info_proto_pb2.py (11)


name='communicator_objects/agent_info_proto.proto',
package='communicator_objects',
syntax='proto3',
serialized_pb=_b('\n+communicator_objects/agent_info_proto.proto\x12\x14\x63ommunicator_objects\"\xfd\x01\n\x0e\x41gentInfoProto\x12\"\n\x1astacked_vector_observation\x18\x01 \x03(\x02\x12\x1b\n\x13visual_observations\x18\x02 \x03(\x0c\x12\x18\n\x10text_observation\x18\x03 \x01(\t\x12\x1d\n\x15stored_vector_actions\x18\x04 \x03(\x02\x12\x1b\n\x13stored_text_actions\x18\x05 \x01(\t\x12\x10\n\x08memories\x18\x06 \x03(\x02\x12\x0e\n\x06reward\x18\x07 \x01(\x02\x12\x0c\n\x04\x64one\x18\x08 \x01(\x08\x12\x18\n\x10max_step_reached\x18\t \x01(\x08\x12\n\n\x02id\x18\n \x01(\x05\x42\x1f\xaa\x02\x1cMLAgents.CommunicatorObjectsb\x06proto3')
serialized_pb=_b('\n+communicator_objects/agent_info_proto.proto\x12\x14\x63ommunicator_objects\"\x92\x02\n\x0e\x41gentInfoProto\x12\"\n\x1astacked_vector_observation\x18\x01 \x03(\x02\x12\x1b\n\x13visual_observations\x18\x02 \x03(\x0c\x12\x18\n\x10text_observation\x18\x03 \x01(\t\x12\x1d\n\x15stored_vector_actions\x18\x04 \x03(\x02\x12\x1b\n\x13stored_text_actions\x18\x05 \x01(\t\x12\x10\n\x08memories\x18\x06 \x03(\x02\x12\x0e\n\x06reward\x18\x07 \x01(\x02\x12\x0c\n\x04\x64one\x18\x08 \x01(\x08\x12\x18\n\x10max_step_reached\x18\t \x01(\x08\x12\n\n\x02id\x18\n \x01(\x05\x12\x13\n\x0b\x61\x63tion_mask\x18\x0b \x03(\x08\x42\x1f\xaa\x02\x1cMLAgents.CommunicatorObjectsb\x06proto3')
)

message_type=None, enum_type=None, containing_type=None,
is_extension=False, extension_scope=None,
options=None, file=DESCRIPTOR),
_descriptor.FieldDescriptor(
name='action_mask', full_name='communicator_objects.AgentInfoProto.action_mask', index=10,
number=11, type=8, cpp_type=7, label=3,
has_default_value=False, default_value=[],
message_type=None, enum_type=None, containing_type=None,
is_extension=False, extension_scope=None,
options=None, file=DESCRIPTOR),
],
extensions=[
],

oneofs=[
],
serialized_start=70,
serialized_end=323,
serialized_end=344,
)
DESCRIPTOR.message_types_by_name['AgentInfoProto'] = _AGENTINFOPROTO

python/tests/test_bc.py (6)


model.dropout_rate: 1.0,
model.sequence_length: 1,
model.vector_in: np.array([[1, 2, 3, 1, 2, 3],
[3, 4, 5, 3, 4, 5]])}
[3, 4, 5, 3, 4, 5]]),
model.action_masks: np.ones([2,2])}
sess.run(run_list, feed_dict=feed_dict)
env.close()

model.vector_in: np.array([[1, 2, 3, 1, 2, 3],
[3, 4, 5, 3, 4, 5]]),
model.visual_in[0]: np.ones([2, 40, 30, 3]),
model.visual_in[1]: np.ones([2, 40, 30, 3])}
model.visual_in[1]: np.ones([2, 40, 30, 3]),
model.action_masks: np.ones([2,2])}
sess.run(run_list, feed_dict=feed_dict)
env.close()

python/tests/test_ppo.py (15)


model.vector_in: np.array([[1, 2, 3, 1, 2, 3],
[3, 4, 5, 3, 4, 5]]),
model.visual_in[0]: np.ones([2, 40, 30, 3]),
model.visual_in[1]: np.ones([2, 40, 30, 3])
model.visual_in[1]: np.ones([2, 40, 30, 3]),
model.action_masks: np.ones([2,2])
}
sess.run(run_list, feed_dict=feed_dict)
env.close()

feed_dict = {model.batch_size: 2,
model.sequence_length: 1,
model.vector_in: np.array([[1, 2, 3, 1, 2, 3],
[3, 4, 5, 3, 4, 5]])}
[3, 4, 5, 3, 4, 5]]),
model.action_masks: np.ones([2,2])}
sess.run(run_list, feed_dict=feed_dict)
env.close()

model.prev_action: [[0], [0]],
model.memory_in: np.zeros((1, memory_size)),
model.vector_in: np.array([[1, 2, 3, 1, 2, 3],
[3, 4, 5, 3, 4, 5]])}
[3, 4, 5, 3, 4, 5]]),
model.action_masks: np.ones([1,2])}
sess.run(run_list, feed_dict=feed_dict)
env.close()

[3, 4, 5, 3, 4, 5]]),
model.next_vector_in: np.array([[1, 2, 3, 1, 2, 3],
[3, 4, 5, 3, 4, 5]]),
model.action_holder: [[0], [0]]}
model.action_holder: [[0], [0]],
model.action_masks: np.ones([2,2])}
sess.run(run_list, feed_dict=feed_dict)
env.close()

model.visual_in[0]: np.ones([2, 40, 30, 3]),
model.visual_in[1]: np.ones([2, 40, 30, 3]),
model.next_visual_in[0]: np.ones([2, 40, 30, 3]),
model.next_visual_in[1]: np.ones([2, 40, 30, 3])
model.next_visual_in[1]: np.ones([2, 40, 30, 3]),
model.action_masks: np.ones([2,2])
}
sess.run(run_list, feed_dict=feed_dict)
env.close()

python/unityagents/brain.py (3)


class BrainInfo:
def __init__(self, visual_observation, vector_observation, text_observations, memory=None,
reward=None, agents=None, local_done=None,
vector_action=None, text_action=None, max_reached=None):
vector_action=None, text_action=None, max_reached=None, action_mask=None):
"""
Describes experience at current step of all agents linked to a brain.
"""

self.agents = agents
self.previous_vector_actions = vector_action
self.previous_text_actions = text_action
self.action_masks = action_mask
AllBrainInfo = Dict[str, BrainInfo]

python/unityagents/environment.py (9)


else:
[x.memories.extend([0] * (memory_size - len(x.memories))) for x in agent_info_list]
memory = np.array([x.memories for x in agent_info_list])
total_num_actions = sum(self.brains[b].vector_action_space_size)
mask_actions = np.ones((len(agent_info_list), total_num_actions))
for agent_index, agent_info in enumerate(agent_info_list):
if agent_info.action_mask is not None:
mask_actions[agent_index, :] = [
0 if agent_info.action_mask[k] else 1 for k in range(total_num_actions)]
if any([np.isnan(x.reward) for x in agent_info_list]):
logger.warning("An agent had a NaN reward for brain "+b)
if any([np.isnan(x.stacked_vector_observation).any() for x in agent_info_list]):

local_done=[x.done for x in agent_info_list],
vector_action=np.array([x.stored_vector_actions for x in agent_info_list]),
text_action=[x.stored_text_actions for x in agent_info_list],
max_reached=[x.max_step_reached for x in agent_info_list]
max_reached=[x.max_step_reached for x in agent_info_list],
action_mask=mask_actions
)
return _data, global_done

python/unitytrainers/bc/models.py (4)


kernel_initializer=c_layers.variance_scaling_initializer(factor=0.01)))
self.action_probs = tf.concat(
[tf.nn.softmax(branch) for branch in policy_branches], axis=1, name="action_probs")
self.sample_action_float = tf.concat([tf.multinomial(branch, 1) for branch in policy_branches], axis=1)
self.action_masks = tf.placeholder(shape=[None, sum(self.a_size)], dtype=tf.float32, name="action_masks")
self.sample_action_float = self.create_discrete_action_masking_layer(
policy_branches, self.action_masks, self.a_size)
self.sample_action_float = tf.identity(self.sample_action_float, name="action")
self.sample_action = tf.cast(self.sample_action_float, tf.int32)
self.true_action = tf.placeholder(shape=[None, len(policy_branches)], dtype=tf.int32, name="teacher_action")

python/unitytrainers/bc/trainer.py (6)


feed_dict[self.model.visual_in[i]] = agent_brain.visual_observations[i]
if self.use_vector_observations:
feed_dict[self.model.vector_in] = agent_brain.vector_observations
if not self.is_continuous_action:
feed_dict[self.model.action_masks] = agent_brain.action_masks
if self.use_recurrent:
if agent_brain.memories.shape[1] == 0:
agent_brain.memories = np.zeros((len(agent_brain.agents), self.m_size))

:param next_info: Next AllBrainInfo (Dictionary of all current brains and corresponding BrainInfo).
:param take_action_outputs: The outputs of the take action method.
"""
# Used to collect teacher experience into training buffer
info_teacher = curr_info[self.brain_to_imitate]
next_info_teacher = next_info[self.brain_to_imitate]

for i, _ in enumerate(self.model.visual_in):
_obs = np.array(_buffer['visual_observations%d' % i][start:end])
feed_dict[self.model.visual_in[i]] = _obs
if not self.is_continuous_action:
feed_dict[self.model.action_masks] = np.ones(
(self.n_sequences, sum(self.brain.vector_action_space_size)))
if self.use_recurrent:
feed_dict[self.model.memory_in] = np.zeros([self.n_sequences, self.m_size])
loss, _ = self.sess.run([self.model.loss, self.model.update], feed_dict=feed_dict)

python/unitytrainers/models.py (21)


num_layers, scope, reuse)
return hidden_flat
@staticmethod
def create_discrete_action_masking_layer(branches_logits, action_masks, action_size):
"""
Creates a masking layer for the discrete actions
:param branches_logits: A list of the unnormalized action probabilities for each branch
:param action_masks: The mask for the logits. Must be of dimension [None x total_number_of_action]
:param action_size: A list containing the number of possible actions for each branch
:return: The action output dimension [batch_size, num_branches]
"""
action_idx = [0] + list(np.cumsum(action_size))
branch_masks = [action_masks[:, action_idx[i]:action_idx[i + 1]] for i in range(len(action_size))]
raw_probs = [tf.multiply(tf.nn.softmax(branches_logits[k]), branch_masks[k])
for k in range(len(action_size))]
normalized_probs = [tf.divide(raw_probs[k], tf.reduce_sum(raw_probs[k], axis=1, keepdims=True))
for k in range(len(action_size))]
output = tf.concat([tf.multinomial(tf.log(normalized_probs[k]), 1) for k in range(len(action_size))], axis=1)
return output
def create_observation_streams(self, num_streams, h_size, num_layers):
"""
Creates encoding stream for observations.

self.all_log_probs = tf.concat([branch for branch in policy_branches], axis=1, name="action_probs")
output = tf.concat([tf.multinomial(branch, 1) for branch in policy_branches], axis=1)
self.action_masks = tf.placeholder(shape=[None, sum(self.a_size)], dtype=tf.float32, name="action_masks")
output = self.create_discrete_action_masking_layer(policy_branches, self.action_masks, self.a_size)
self.output = tf.identity(output, name="action")
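For reference, the per-branch renormalization performed by `create_discrete_action_masking_layer` above can be written informally as follows, where $z$ are the logits of one branch and $m \in \{0, 1\}^n$ is the corresponding slice of `action_masks` (1 means the action is allowed):

$$p_i = \frac{m_i \,\mathrm{softmax}(z)_i}{\sum_j m_j \,\mathrm{softmax}(z)_j}$$

The action for that branch is then sampled via `tf.multinomial(tf.log(p), 1)`, so any action with $m_i = 0$ has zero probability of being chosen, and the samples from all branches are concatenated into a [batch_size, num_branches] output.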

python/unitytrainers/ppo/trainer.py (10)


class PPOTrainer(Trainer):
"""The PPOTrainer is an implementation of the PPO algorithm."""
action_masking_name = 'action_masks'
def __init__(self, sess, env, brain_name, trainer_parameters, training, seed, run_id):
"""
Responsible for collecting experiences and training PPO model.

feed_dict[self.model.visual_in[i]] = curr_brain_info.visual_observations[i]
if self.use_vector_obs:
feed_dict[self.model.vector_in] = curr_brain_info.vector_observations
if not self.is_continuous_action:
feed_dict[self.model.action_masks] = curr_brain_info.action_masks
values = self.sess.run(self.inference_run_list, feed_dict=feed_dict)
run_out = dict(zip(self.inference_run_list, values))

if self.is_continuous_action:
actions_pre = stored_take_action_outputs[self.model.output_pre]
self.training_buffer[agent_id]['actions_pre'].append(actions_pre[idx])
else:
self.training_buffer[agent_id][self.action_masking_name].append(stored_info.action_masks[idx])
a_dist = stored_take_action_outputs[self.model.all_log_probs]
value = stored_take_action_outputs[self.model.value]
self.training_buffer[agent_id]['actions'].append(actions[idx])

if self.use_recurrent:
feed_dict[self.model.prev_action] = np.array(buffer['prev_action'][start:end]).reshape(
[-1, len(self.brain.vector_action_space_size)])
feed_dict[self.model.action_masks] = np.array(buffer[self.action_masking_name][start:end]).reshape(
[-1, sum(self.brain.vector_action_space_size)]
)
if self.use_vector_obs:
total_observation_length = self.brain.vector_observation_space_size * \
self.brain.num_stacked_vector_observations

unity-environment/Assets/ML-Agents/Scripts/Agent.cs (64)


using System.Collections.Generic;
using System.Linq;
using UnityEngine;

/// Keeps track of the last text action taken by the Brain.
/// </summary>
public string storedTextActions;
/// <summary>
/// For discrete control, specifies the actions that the agent cannot take. Is true if
/// the action is masked.
/// </summary>
public bool[] actionMasks;
/// <summary>
/// Used by the Trainer to store information about the agent. This data

/// to separate between different agents in the environment.
int id;
/// Keeps track of the actions that are masked at each step.
private ActionMasker actionMasker;
/// Array of Texture2D used to render to from render buffer before
/// transforming into float tensor.
Texture2D[] textureArray;

}
BrainParameters param = brain.brainParameters;
actionMasker = new ActionMasker(param);
if (param.vectorActionSpaceType == SpaceType.continuous)
{
action.vectorActions = new float[param.vectorActionSize[0]];

info.storedVectorActions = action.vectorActions;
info.storedTextActions = action.textActions;
info.vectorObservation.Clear();
actionMasker.ResetMask();
info.actionMasks = actionMasker.GetMask();
BrainParameters param = brain.brainParameters;
if (info.vectorObservation.Count != param.vectorObservationSize)

{
}
/// <summary>
/// Sets an action mask for discrete control agents. When used, the agent will not be
/// able to perform the action passed as argument at the next decision. If no branch is
/// specified, the default branch will be 0. The actionIndex or actionIndices correspond
/// to the action the agent will be unable to perform.
/// </summary>
/// <param name="actionIndices">The indices of the masked actions on branch 0</param>
protected void SetActionMask(IEnumerable<int> actionIndices)
{
actionMasker.SetActionMask(0, actionIndices);
}
/// <summary>
/// Sets an action mask for discrete control agents. When used, the agent will not be
/// able to perform the action passed as argument at the next decision. If no branch is
/// specified, the default branch will be 0. The actionIndex or actionIndices correspond
/// to the action the agent will be unable to perform.
/// </summary>
/// <param name="actionIndex">The index of the masked action on branch 0</param>
protected void SetActionMask(int actionIndex)
{
actionMasker.SetActionMask(0, new int[1]{actionIndex});
}
/// <summary>
/// Sets an action mask for discrete control agents. When used, the agent will not be
/// able to perform the action passed as argument at the next decision. If no branch is
/// specified, the default branch will be 0. The actionIndex or actionIndices correspond
/// to the action the agent will be unable to perform.
/// </summary>
/// <param name="branch">The branch for which the actions will be masked</param>
/// <param name="actionIndex">The index of the masked action</param>
protected void SetActionMask(int branch, int actionIndex)
{
actionMasker.SetActionMask(branch, new int[1]{actionIndex});
}
/// <summary>
/// Modifies an action mask for discrete control agents. When used, the agent will not be
/// able to perform the action passed as argument at the next decision. If no branch is
/// specified, the default branch will be 0. The actionIndex or actionIndices correspond
/// to the action the agent will be unable to perform.
/// </summary>
/// <param name="branch">The branch for which the actions will be masked</param>
/// <param name="actionIndices">The indices of the masked actions</param>
protected void SetActionMask(int branch, IEnumerable<int> actionIndices)
{
actionMasker.SetActionMask(branch, actionIndices);
}
/// <summary>
/// Adds a float observation to the vector observations of the agent.

unity-environment/Assets/ML-Agents/Scripts/Batcher.cs (1)


Done = info.done,
Id = info.id,
};
if (info.memories != null)
{
agentInfoProto.Memories.Add(info.memories);

unity-environment/Assets/ML-Agents/Scripts/CommunicatorObjects/AgentInfoProto.cs (29)


byte[] descriptorData = global::System.Convert.FromBase64String(
string.Concat(
"Citjb21tdW5pY2F0b3Jfb2JqZWN0cy9hZ2VudF9pbmZvX3Byb3RvLnByb3Rv",
"EhRjb21tdW5pY2F0b3Jfb2JqZWN0cyL9AQoOQWdlbnRJbmZvUHJvdG8SIgoa",
"EhRjb21tdW5pY2F0b3Jfb2JqZWN0cyKSAgoOQWdlbnRJbmZvUHJvdG8SIgoa",
"aWQYCiABKAVCH6oCHE1MQWdlbnRzLkNvbW11bmljYXRvck9iamVjdHNiBnBy",
"b3RvMw=="));
"aWQYCiABKAUSEwoLYWN0aW9uX21hc2sYCyADKAhCH6oCHE1MQWdlbnRzLkNv",
"bW11bmljYXRvck9iamVjdHNiBnByb3RvMw=="));
new pbr::GeneratedClrTypeInfo(typeof(global::MLAgents.CommunicatorObjects.AgentInfoProto), global::MLAgents.CommunicatorObjects.AgentInfoProto.Parser, new[]{ "StackedVectorObservation", "VisualObservations", "TextObservation", "StoredVectorActions", "StoredTextActions", "Memories", "Reward", "Done", "MaxStepReached", "Id" }, null, null, null)
new pbr::GeneratedClrTypeInfo(typeof(global::MLAgents.CommunicatorObjects.AgentInfoProto), global::MLAgents.CommunicatorObjects.AgentInfoProto.Parser, new[]{ "StackedVectorObservation", "VisualObservations", "TextObservation", "StoredVectorActions", "StoredTextActions", "Memories", "Reward", "Done", "MaxStepReached", "Id", "ActionMask" }, null, null, null)
}));
}
#endregion

done_ = other.done_;
maxStepReached_ = other.maxStepReached_;
id_ = other.id_;
actionMask_ = other.actionMask_.Clone();
_unknownFields = pb::UnknownFieldSet.Clone(other._unknownFields);
}

}
}
/// <summary>Field number for the "action_mask" field.</summary>
public const int ActionMaskFieldNumber = 11;
private static readonly pb::FieldCodec<bool> _repeated_actionMask_codec
= pb::FieldCodec.ForBool(90);
private readonly pbc::RepeatedField<bool> actionMask_ = new pbc::RepeatedField<bool>();
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public pbc::RepeatedField<bool> ActionMask {
get { return actionMask_; }
}
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public override bool Equals(object other) {
return Equals(other as AgentInfoProto);

if (Done != other.Done) return false;
if (MaxStepReached != other.MaxStepReached) return false;
if (Id != other.Id) return false;
if(!actionMask_.Equals(other.actionMask_)) return false;
return Equals(_unknownFields, other._unknownFields);
}

if (Done != false) hash ^= Done.GetHashCode();
if (MaxStepReached != false) hash ^= MaxStepReached.GetHashCode();
if (Id != 0) hash ^= Id.GetHashCode();
hash ^= actionMask_.GetHashCode();
if (_unknownFields != null) {
hash ^= _unknownFields.GetHashCode();
}

output.WriteRawTag(80);
output.WriteInt32(Id);
}
actionMask_.WriteTo(output, _repeated_actionMask_codec);
if (_unknownFields != null) {
_unknownFields.WriteTo(output);
}

if (Id != 0) {
size += 1 + pb::CodedOutputStream.ComputeInt32Size(Id);
}
size += actionMask_.CalculateSize(_repeated_actionMask_codec);
if (_unknownFields != null) {
size += _unknownFields.CalculateSize();
}

if (other.Id != 0) {
Id = other.Id;
}
actionMask_.Add(other.actionMask_);
_unknownFields = pb::UnknownFieldSet.MergeFrom(_unknownFields, other._unknownFields);
}

}
case 80: {
Id = input.ReadInt32();
break;
}
case 90:
case 88: {
actionMask_.AddEntriesFrom(input, _repeated_actionMask_codec);
break;
}
}

unity-environment/Assets/ML-Agents/Scripts/CoreBrainInternal.cs (44)


/// Modify only in inspector : Name of the previous action node
public string PreviousActionPlaceholderName = "prev_action";
/// Name of the action mask node
private string ActionMaskPlaceholderName = "action_masks";
#if ENABLE_TENSORFLOW
TFGraph graph;
TFSession session;

bool hasPrevAction;
bool hasMaskedActions;
float[,] maskedActions;
List<Texture2D> texturesHolder;
int memorySize;
#endif

{
hasValueEstimate = true;
}
if (graph[graphScope + ActionMaskPlaceholderName] != null)
{
hasMaskedActions = true;
}
observationMatrixList = new List<float[,,,]>();
texturesHolder = new List<Texture2D>();

i++;
}
}
if (hasMaskedActions)
{
maskedActions = new float[
currentBatchSize,
brain.brainParameters.vectorActionSize.Sum()
];
var i = 0;
foreach (Agent agent in agentList)
{
for (int j = 0; j < brain.brainParameters.vectorActionSize.Sum(); j++)
{
if (agentInfo[agent].actionMasks != null)
{
maskedActions[i, j] = agentInfo[agent].actionMasks[j] ? 0.0f : 1.0f;
}
else
{
maskedActions[i, j] = 1.0f;
}
}
i++;
}
}
observationMatrixList.Clear();
for (int observationIndex =
0;

runner.AddInput(graph[graphScope + PreviousActionPlaceholderName][0], inputPrevAction);
}
// Create the mask action tensor
if (hasMaskedActions)
{
runner.AddInput(graph[graphScope + ActionMaskPlaceholderName][0], maskedActions);
}
// Create the observation tensors
for (int obsNumber =
0;

unity-environment/Assets/ML-Agents/Editor/Tests.meta (8)


fileFormatVersion: 2
guid: 172fcc71d343247a9a91d5b54dd21cd6
folderAsset: yes
DefaultImporter:
externalObjects: {}
userData:
assetBundleName:
assetBundleVariant:

unity-environment/Assets/ML-Agents/Scripts/ActionMasker.cs (154)


using System;
using System.Collections.Generic;
using System.Linq;
namespace MLAgents
{
public class ActionMasker
{
/// When using discrete control, contains the starting indices of each branch's
/// actions when all the branches are concatenated with each other.
private int[] _startingActionIndices;
private bool[] _currentMask;
private readonly BrainParameters _brainParameters;
public ActionMasker(BrainParameters brainParameters)
{
this._brainParameters = brainParameters;
}
/// <summary>
/// Modifies an action mask for discrete control agents. When used, the agent will not be
/// able to perform the action passed as argument at the next decision. If no branch is
/// specified, the default branch will be 0. The actionIndex or actionIndices correspond
/// to the action the agent will be unable to perform.
/// </summary>
/// <param name="branch">The branch for which the actions will be masked</param>
/// <param name="actionIndices">The indices of the masked actions</param>
public void SetActionMask(int branch, IEnumerable<int> actionIndices)
{
// If the branch does not exist, raise an error
if (branch >= _brainParameters.vectorActionSize.Length )
throw new UnityAgentsException(
"Invalid Action Masking : Branch "+branch+" does not exist.");
int totalNumberActions = _brainParameters.vectorActionSize.Sum();
// By default, the masks are null. If we want to specify a new mask, we initialize
// the actionMasks with trues.
if (_currentMask == null)
{
_currentMask = new bool[totalNumberActions];
}
// If this is the first time the masked actions are used, we generate the starting
// indices for each branch.
if (_startingActionIndices == null)
{
_startingActionIndices = CreateActionStartinIndices();
}
// Perform the masking
foreach (var actionIndex in actionIndices)
{
if (actionIndex >= _brainParameters.vectorActionSize[branch])
{
throw new UnityAgentsException(
"Invalid Action Masking: Action Mask is too large for specified branch.");
}
_currentMask[actionIndex + _startingActionIndices[branch]] = true;
}
}
/// <summary>
/// Get the current mask for an agent
/// </summary>
/// <returns>A mask for the agent. A boolean array of length equal to the total number of
/// actions.</returns>
public bool[] GetMask()
{
AssertMask();
return _currentMask;
}
/// <summary>
/// Makes sure that the current mask is usable.
/// </summary>
private void AssertMask()
{
// Action Masks can only be used in Discrete Control.
if (_brainParameters.vectorActionSpaceType != SpaceType.discrete)
{
throw new UnityAgentsException(
"Invalid Action Masking : Can only set action mask for Discrete Control.");
}
var numBranches = _brainParameters.vectorActionSize.Length;
for (var branchIndex = 0 ; branchIndex < numBranches; branchIndex++ )
{
if (AreAllActionsMasked(branchIndex))
{
throw new UnityAgentsException(
"Invalid Action Masking : All the actions of branch " + branchIndex +
" are masked.");
}
}
}
/// <summary>
/// Resets the current mask for an agent
/// </summary>
public void ResetMask()
{
if (_currentMask != null)
{
Array.Clear(_currentMask, 0, _currentMask.Length);
}
}
/// <summary>
/// Generates an array containing the starting indices of each branch in the vector action
/// Makes a cumulative sum.
/// </summary>
/// <returns></returns>
private int[] CreateActionStartinIndices()
{
var vectorActionSize = _brainParameters.vectorActionSize;
var runningSum = 0;
var result = new int[vectorActionSize.Length + 1];
for (var actionIndex = 0;
actionIndex < vectorActionSize.Length; actionIndex++)
{
runningSum += vectorActionSize[actionIndex];
result[actionIndex + 1] = runningSum;
}
return result;
}
/// <summary>
/// Checks if all the actions in the input branch are masked
/// </summary>
/// <param name="branch"> The index of the branch to check</param>
/// <returns> True if all the actions of the branch are masked</returns>
private bool AreAllActionsMasked(int branch)
{
if (_currentMask == null)
{
return false;
}
var start = _startingActionIndices[branch];
var end = _startingActionIndices[branch + 1];
for (var i = start; i < end; i++)
{
if (!_currentMask[i])
{
return false;
}
}
return true;
}
}
}
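To make the index bookkeeping concrete, here is a small sketch (illustrative only) of how a branch-local action index maps into the flat mask returned by `GetMask`, using the same branch sizes as the unit tests in `EditModeTestActionMasker.cs` below:
```csharp
// Branch sizes {4, 5, 6} give starting indices {0, 4, 9} (a cumulative sum),
// so masking action 2 of branch 1 sets flat index 4 + 2 = 6.
var bp = new BrainParameters();
bp.vectorActionSpaceType = SpaceType.discrete;
bp.vectorActionSize = new int[3] {4, 5, 6};

var masker = new ActionMasker(bp);
masker.SetActionMask(1, new int[1] {2});
bool[] mask = masker.GetMask();  // length 15; only mask[6] is true
```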

unity-environment/Assets/ML-Agents/Scripts/ActionMasker.cs.meta (3)


fileFormatVersion: 2
guid: 8a0ec4ccf4ee450da7766f65228d5460
timeCreated: 1534530911

unity-environment/Assets/ML-Agents/Editor/Tests/EditModeTestActionMasker.cs (139)


using NUnit.Framework;
namespace MLAgents.Tests
{
public class EditModeTestActionMasker
{
[Test]
public void Construction()
{
var bp = new BrainParameters();
var masker = new ActionMasker(bp);
Assert.IsNotNull(masker);
}
[Test]
public void FailsWithContinuous()
{
var bp = new BrainParameters();
bp.vectorActionSpaceType = SpaceType.continuous;
bp.vectorActionSize = new int[1] {4};
var masker = new ActionMasker(bp);
masker.SetActionMask(0, new int[1] {0});
Assert.Catch<UnityAgentsException>(() => masker.GetMask());
}
[Test]
public void NullMask()
{
var bp = new BrainParameters();
bp.vectorActionSpaceType = SpaceType.discrete;
var masker = new ActionMasker(bp);
var mask = masker.GetMask();
Assert.IsNull(mask);
}
[Test]
public void FirstBranchMask()
{
var bp = new BrainParameters();
bp.vectorActionSpaceType = SpaceType.discrete;
bp.vectorActionSize = new int[3] {4, 5, 6};
var masker = new ActionMasker(bp);
var mask = masker.GetMask();
Assert.IsNull(mask);
masker.SetActionMask(0, new int[]{1,2,3});
mask = masker.GetMask();
Assert.IsFalse(mask[0]);
Assert.IsTrue(mask[1]);
Assert.IsTrue(mask[2]);
Assert.IsTrue(mask[3]);
Assert.IsFalse(mask[4]);
Assert.AreEqual(mask.Length, 15);
}
[Test]
public void SecondBranchMask()
{
var bp = new BrainParameters();
bp.vectorActionSpaceType = SpaceType.discrete;
bp.vectorActionSize = new int[3] {4, 5, 6};
var masker = new ActionMasker(bp);
bool[] mask = masker.GetMask();
masker.SetActionMask(1, new int[]{1,2,3});
mask = masker.GetMask();
Assert.IsFalse(mask[0]);
Assert.IsFalse(mask[4]);
Assert.IsTrue(mask[5]);
Assert.IsTrue(mask[6]);
Assert.IsTrue(mask[7]);
Assert.IsFalse(mask[8]);
Assert.IsFalse(mask[9]);
}
[Test]
public void MaskReset()
{
var bp = new BrainParameters();
bp.vectorActionSpaceType = SpaceType.discrete;
bp.vectorActionSize = new int[3] {4, 5, 6};
var masker = new ActionMasker(bp);
var mask = masker.GetMask();
masker.SetActionMask(1, new int[3]{1,2,3});
mask = masker.GetMask();
masker.ResetMask();
mask = masker.GetMask();
for (var i = 0; i < 15; i++)
{
Assert.IsFalse(mask[i]);
}
}
[Test]
public void ThrowsError()
{
var bp = new BrainParameters();
bp.vectorActionSpaceType = SpaceType.discrete;
bp.vectorActionSize = new int[3] {4, 5, 6};
var masker = new ActionMasker(bp);
Assert.Catch<UnityAgentsException>(
() => masker.SetActionMask(0, new int[1]{5}));
Assert.Catch<UnityAgentsException>(
() => masker.SetActionMask(1, new int[1]{5}));
masker.SetActionMask(2, new int[1] {5});
Assert.Catch<UnityAgentsException>(
() => masker.SetActionMask(3, new int[1]{1}));
masker.GetMask();
masker.ResetMask();
masker.SetActionMask(0, new int[4] {0, 1, 2, 3});
Assert.Catch<UnityAgentsException>(
() => masker.GetMask());
}
[Test]
public void MultipleMaskEdit()
{
var bp = new BrainParameters();
bp.vectorActionSpaceType = SpaceType.discrete;
bp.vectorActionSize = new int[3] {4, 5, 6};
var masker = new ActionMasker(bp);
masker.SetActionMask(0, new int[2] {0, 1});
masker.SetActionMask(0, new int[1] {3});
masker.SetActionMask(2, new int[1] {1});
var mask = masker.GetMask();
for (var i = 0; i < 15; i++)
{
if ((i == 0) || (i == 1) || (i == 3)|| (i == 10))
{
Assert.IsTrue(mask[i]);
}
else
{
Assert.IsFalse(mask[i]);
}
}
}
}
}

unity-environment/Assets/ML-Agents/Editor/Tests/EditModeTestActionMasker.cs.meta (11)


fileFormatVersion: 2
guid: 2e2810ee6c8c64fb39abdf04b5d17f50
MonoImporter:
externalObjects: {}
serializedVersion: 2
defaultReferences: []
executionOrder: 0
icon: {instanceID: 0}
userData:
assetBundleName:
assetBundleVariant:

/unity-environment/Assets/ML-Agents/Editor/MLAgentsEditModeTest.cs → /unity-environment/Assets/ML-Agents/Editor/Tests/MLAgentsEditModeTest.cs

/unity-environment/Assets/ML-Agents/Editor/MLAgentsEditModeTest.cs.meta → /unity-environment/Assets/ML-Agents/Editor/Tests/MLAgentsEditModeTest.cs.meta
