The paper "Human-in-the-Loop Reinforcement Learning with Risk-Aware Intervention and Imitation" by 周雅情, 徐晨伟, 欧阳晨, and 李鹏飞 has been accepted for publication in Expert Systems With Applications. The abstract follows:

Human-in-the-loop reinforcement learning (HiL-RL) improves policy safety and learning efficiency by incorporating real-time human interventions and demonstration data. However, existing HiL-RL methods often suffer from inaccurate intervention timing and inefficient use of demonstration data. To address these issues, we propose a novel framework called HiRIL (Human-in-the-loop Risk-aware Imitation-enhanced Learning), which establishes a closed-loop learning mechanism that integrates risk-aware intervention triggering and imitation-based policy optimization under a unified risk metric. At the core of HiRIL is the Bayesian Implicit Quantile Network (BIQN), which captures both epistemic and aleatoric uncertainty through Bayesian weight sampling and quantile-based return modeling. These uncertainties are combined into risk scores for state–action pairs, which determine when to trigger human intervention. To better utilize demonstration data, HiRIL introduces a prioritized experience replay mechanism based on risk difference, which emphasizes human interventions that significantly reduce risk. During policy optimization, a local imitation loss is applied to clone human actions at intervention points, enabling risk-guided joint optimization. We validate the effectiveness of HiRIL through extensive experiments on end-to-end autonomous driving tasks in the CARLA simulator. The results show that HiRIL performs strongly in training speed, task success rate, driving distance, and collision reduction, and remains robust under control disturbances and non-expert interventions.
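The abstract does not give the risk-scoring formula, so the following is a minimal sketch under a common assumption: epistemic uncertainty measured as disagreement across Bayesian weight samples, aleatoric uncertainty as the spread of the predicted return quantiles, combined linearly into a risk score. The biqn interface, alpha, and beta are hypothetical names, not the paper's API.

import torch

def risk_score(biqn, state, action, n_weight_samples=8, n_quantiles=32,
               alpha=1.0, beta=1.0):
    """Combine epistemic and aleatoric uncertainty into a single risk score.

    `biqn(state, action, taus)` is assumed to return a tensor of predicted
    return quantiles for the quantile fractions `taus`, drawing a fresh
    Bayesian weight sample on every forward pass.
    """
    taus = torch.rand(n_quantiles)                        # quantile fractions in (0, 1)
    samples = torch.stack(
        [biqn(state, action, taus) for _ in range(n_weight_samples)]
    )                                                     # shape (W, N)
    epistemic = samples.mean(dim=1).std()                 # disagreement across weight samples
    aleatoric = samples.mean(dim=0).std()                 # spread of the return distribution
    return (alpha * epistemic + beta * aleatoric).item()  # weighted linear combination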
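Similarly hedged, a proportional prioritized-replay sketch in which a transition's priority grows with how much the human's override lowered the risk score. The function names, the priority exponent, and eps are illustrative assumptions:

import numpy as np

def intervention_priority(risk_agent, risk_human, eps=1e-3):
    """Priority of a transition where a human overrode the agent: the larger
    the drop in risk achieved by the intervention, the more often it is
    replayed. `eps` keeps every transition sampleable."""
    return max(risk_agent - risk_human, 0.0) + eps

def sample_batch(priorities, batch_size, alpha=0.6, rng=None):
    """Standard proportional prioritized sampling over stored priorities."""
    rng = rng or np.random.default_rng()
    p = np.asarray(priorities, dtype=np.float64) ** alpha
    p /= p.sum()
    return rng.choice(len(priorities), size=batch_size, p=p)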
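Finally, a sketch of the joint objective: the usual RL loss plus a behavior-cloning term evaluated only at intervention steps. The abstract does not specify how the risk metric weights this joint objective, so that guidance is omitted here; the masked mean, the MSE form, and lambda_bc are assumptions for illustration.

import torch
import torch.nn.functional as F

def joint_loss(policy, states, human_actions, intervened, rl_loss, lambda_bc=1.0):
    """RL objective plus a local imitation term restricted to intervention steps.

    `intervened` is a boolean mask over the batch; `human_actions` holds the
    override actions at those steps (arbitrary values elsewhere).
    """
    mask = intervened.float()                                 # 1 where a human intervened
    per_step = F.mse_loss(policy(states), human_actions,
                          reduction="none").mean(dim=-1)      # per-sample cloning error
    bc = (per_step * mask).sum() / mask.sum().clamp(min=1.0)  # mean over intervention steps
    return rl_loss + lambda_bc * bc                           # joint objective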