Reinforcement learning (RL) has emerged as a key technique for designing dialogue policies. However, the inflation of the action space in dialogue tasks places a heavy decision burden on dialogue policies and leads to incoherent behavior. In this paper, we propose a novel decomposed deep Q-network (D2Q) that exploits the natural structure of dialogue actions to decompose the Q-function, enabling efficient and coherent dialogue policy learning. Instead of evaluating the Q-function directly, D2Q uses two separate estimators, one for the abstract action-value functions and the other for the s...
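
To make the two-estimator idea concrete, the following is a minimal sketch and not the D2Q implementation. It assumes a hypothetical two-level action structure (`ACTION_TREE`), toy linear estimators in place of neural networks, and an additive combination of the abstract estimate with a per-specific-action term. The text above does not specify the second estimator or the combination rule, so both are assumptions made purely for illustration, as are all names and weights.

```python
from typing import Dict, List, Tuple

State = List[float]

# Hypothetical two-level action structure (an assumption, not from the paper):
# each abstract action groups several specific actions.
ACTION_TREE: Dict[str, List[str]] = {
    "request": ["request_area", "request_price"],
    "inform": ["inform_name", "inform_phone"],
}


def linear(weights: List[float], state: State) -> float:
    """Toy linear estimator standing in for a neural network head."""
    return sum(w * x for w, x in zip(weights, state))


class DecomposedQ:
    """Two separate estimators whose outputs are summed.

    The additive combination is an assumption made for this sketch.
    """

    def __init__(
        self,
        abstract_w: Dict[str, List[float]],
        specific_w: Dict[str, List[float]],
    ) -> None:
        self.abstract_w = abstract_w  # one weight vector per abstract action
        self.specific_w = specific_w  # one weight vector per specific action

    def q_abstract(self, state: State, abstract: str) -> float:
        # Estimate shared by every specific action under the same abstract action.
        return linear(self.abstract_w[abstract], state)

    def q_specific(self, state: State, specific: str) -> float:
        # Per-action term that separates specific actions within a group.
        return linear(self.specific_w[specific], state)

    def q_value(self, state: State, abstract: str, specific: str) -> float:
        return self.q_abstract(state, abstract) + self.q_specific(state, specific)

    def greedy_action(self, state: State) -> Tuple[str, str]:
        # Enumerate only (abstract, specific) pairs that exist in the tree.
        candidates = [(a, s) for a, subs in ACTION_TREE.items() for s in subs]
        return max(candidates, key=lambda pair: self.q_value(state, *pair))


# Illustrative weights only; they are not learned and not taken from the paper.
model = DecomposedQ(
    abstract_w={"request": [1.0, 0.0], "inform": [0.5, 0.0]},
    specific_w={
        "request_area": [0.1, 0.0],
        "request_price": [0.3, 0.0],
        "inform_name": [0.9, 0.0],
        "inform_phone": [0.2, 0.0],
    },
)
print(model.greedy_action([1.0, 0.0]))
```

Under these assumptions, the abstract estimate is shared by all specific actions in a group, so the second estimator only has to capture differences within that group. This is one plausible reading of how a decomposition could ease the decision burden of a large action space.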