Abstract:
"
Exploration is a challenging task in reinforcement learning environments with sparse reward
signals. The problem becomes exponentially harder in long-horizon tasks with complex state and
action spaces, which makes reinforcement learning impractical for many real-world problems.
This work introduces a novel learning-from-demonstration approach that addresses this problem
by incorporating human expertise, speeding up training by reducing the uninformed
trial-and-error learning of the reinforcement learning agent. Our method is built upon the Deep
Deterministic Policy Gradient (DDPG) algorithm and Hindsight Experience Replay (HER). This
novel approach is shown to outperform the baseline algorithm in learning efficiency and to
account for both the suboptimality of expert demonstrations and the overfitting bias of
current approaches.
"