[Reproduce] Language as an Abstraction for Hierarchical Deep Reinforcement Learning
Submitted to NeurIPS 2019 Reproducibility Challenge
In this replication, we tackle long-horizon planning and temporally extended tasks
using language as the abstraction for hierarchical reinforcement learning. The original
paper chooses language as the abstraction because its compositional structure makes it
natural to decompose tasks into smaller sub-tasks. The authors train a low-level policy
and a high-level policy in an interactive environment built with the MuJoCo physics
engine and the CLEVR engine. They show that using language as the interface between the
low-level and high-level policies allows the agent to learn complex tasks that require
long-term planning, including object sorting and multi-object rearrangement. We focused
on implementing and training the low-level policy from scratch, as that is where
hindsight instruction relabeling (HIR) is first introduced. For the low-level policy,
we show that encoding the instruction with a GRU and using HIR performs better than a
one-hot encoded representation of the instruction. However, as the total number of
instructions grew, our results for the one-hot encoded representation contradicted the
conclusions of the original paper.
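To make the two instruction representations concrete, here is a minimal sketch in PyTorch of a GRU instruction encoder alongside the one-hot baseline. Module names and dimensions are hypothetical illustrations, not the original codebase; the low-level policy would consume the resulting embedding together with its observation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GRUInstructionEncoder(nn.Module):
    """Encodes a tokenized instruction into a fixed-size vector.

    Hypothetical module; vocab_size and dimensions are placeholders.
    """
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer-encoded instruction tokens
        embedded = self.embed(token_ids)
        _, hidden = self.gru(embedded)   # hidden: (1, batch, hidden_dim)
        return hidden.squeeze(0)         # (batch, hidden_dim) embedding

def one_hot_instruction(instruction_id, num_instructions):
    """Baseline: each full instruction is treated as an atomic ID.

    This discards the compositional structure of language, which is why
    it is expected to scale poorly as the instruction set grows.
    """
    return F.one_hot(torch.tensor(instruction_id),
                     num_classes=num_instructions).float()
```

HIR itself can be understood as relabeling: a transition collected while pursuing one instruction is stored again under an instruction that the achieved state actually satisfies, turning otherwise unrewarded experience into positive examples. A rough, hypothetical version follows; in the original approach, the set of satisfied instructions comes from querying the CLEVR engine on the achieved state.

```python
import random

def relabel_with_hindsight(transition, satisfied_instructions):
    """Hindsight instruction relabeling, sketched under assumed types.

    `satisfied_instructions` is assumed to be produced by a language
    reward function (e.g. the CLEVR engine) applied to `next_state`.
    """
    state, action, next_state, instruction = transition
    if satisfied_instructions:
        # Relabel with an instruction the agent actually accomplished,
        # so the stored transition carries a positive reward signal.
        instruction = random.choice(satisfied_instructions)
        return (state, action, next_state, instruction, 1.0)
    return (state, action, next_state, instruction, 0.0)
```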