Personal Information

Lab Research Projects
Research on Two-Layer Intelligence Testing and Evaluation Technology for Human-Machine Hybrid Intelligent Systems
Research on Intelligent Foundation Models for Incomplete-Information Game Decision-Making in Complex Environments
Research Topic
Research on Human-Machine Control Methods for Reinforcement-Learning-Driven Machines
Academic Achievements
Authored or co-authored 0 patents; 3 papers accepted or published; 4 papers submitted and awaiting acceptance.
Journal Articles
-
Human-in-the-Loop Reinforcement Learning with Risk-Aware Intervention and Imitation
Yaqing Zhou, Yun-Bo Zhao, Chenwei Xu, Chen Ouyang, and Pengfei Li
Expert Systems with Applications, 2026
Human-in-the-loop reinforcement learning (HiL-RL) improves policy safety and learning efficiency by incorporating real-time human interventions and demonstration data. However, existing HiL-RL methods often suffer from inaccurate intervention timing and inefficient use of demonstration data. To address these issues, we propose a novel framework called HiRIL (Human-in-the-loop Risk-aware Imitation-enhanced Learning), which establishes a closed-loop learning mechanism that integrates risk-aware intervention triggering and imitation-based policy optimization under a dual-mode uncertainty metric. At the core of HiRIL is the Bayesian Implicit Quantile Network (BIQN), which captures both epistemic and aleatoric uncertainty through Bayesian weight sampling and quantile-based return modeling. These uncertainties are combined to generate risk scores for state-action pairs, guiding when to trigger human intervention. To better utilize intervention data, HiRIL introduces a prioritized experience replay mechanism based on risk difference, which emphasizes human interventions that significantly reduce risk. During policy optimization, a local imitation loss is applied to clone human actions at intervention points, enabling risk-guided joint optimization. We conduct extensive experiments on the CARLA end-to-end autonomous driving benchmark. Results show that HiRIL consistently outperforms baselines across multiple metrics and maintains strong robustness under perturbations and non-stationary human intervention.
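The abstract's dual-mode risk score can be illustrated with a minimal sketch. This is not the paper's implementation: the array shapes, the standard-deviation estimators, the mixing weights, and the threshold are all illustrative assumptions; it only shows the general idea of combining epistemic disagreement across Bayesian weight samples with aleatoric spread across quantiles to gate human intervention.

```python
import numpy as np

def risk_score(quantile_samples, w_epistemic=0.5, w_aleatoric=0.5):
    """Toy combination of the two uncertainty terms from the abstract.

    quantile_samples: array of shape (n_weight_samples, n_quantiles),
    quantile return estimates under several Bayesian weight draws
    (a stand-in for BIQN outputs; shapes and weights are illustrative).
    """
    # Epistemic: disagreement of the mean return across weight samples.
    epistemic = np.std(quantile_samples.mean(axis=1))
    # Aleatoric: spread of the return distribution within each sample.
    aleatoric = np.mean(np.std(quantile_samples, axis=1))
    return w_epistemic * epistemic + w_aleatoric * aleatoric

def should_intervene(quantile_samples, threshold=1.0):
    # Trigger human intervention only when the risk score is high;
    # the threshold value here is an arbitrary placeholder.
    return risk_score(quantile_samples) > threshold
```

A confident, low-spread return estimate yields a near-zero score, while disagreeing or wide estimates push the score over the threshold and hand control to the human.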
-
A Dual Confidence Evaluation-Based Shared Control Approach for Human-Machine Collaboration
Yaqing Zhou, Yun-Bo Zhao, Pengfei Li, Xia Tian, Shuyue Jiang, and Yu Kang
Neurocomputing, 2026
Shared control has become a key strategy for enhancing the safety and adaptability of human-machine collaboration systems, particularly in complex and uncertain environments. However, existing rule-based and confidence-based authority allocation approaches often suffer from limited generalizability or excessive reliance on physiological signals, which hinders their practical deployment. This paper proposes a Dual Confidence-Based Shared Control (DC-SC) approach that enables dynamic and interpretable authority allocation by quantifying the decision confidence of both humans and machines. The human confidence model is constructed through a knowledge-task matching function that measures the cognitive alignment between the operator’s expertise and task difficulty, while the machine confidence model assesses decision reliability via an uncertainty-tolerance matching mechanism. These two types of confidence indicators are jointly used to construct a shared control policy, in which the fusion weights are dynamically adjusted using environmental feedback within a policy gradient optimization framework, thereby maximizing human-machine collaborative performance. Theoretical analysis validates the soundness of the confidence models, and experiments conducted in benchmark environments such as LunarLander and UAV path planning demonstrate that DC-SC significantly outperforms both reinforcement learning baselines and traditional shared control approaches in terms of policy performance and system safety.
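The dual-confidence idea above can be sketched in a few lines. The functional forms below (a sigmoid for knowledge-task matching, a linear uncertainty-tolerance ratio, confidence-proportional fusion) are assumptions chosen for clarity, not the DC-SC models; in the paper the fusion weights are further adapted by policy gradient using environmental feedback.

```python
import numpy as np

def human_confidence(expertise, task_difficulty):
    # Knowledge-task matching: confidence grows as the operator's
    # expertise exceeds task difficulty (sigmoid form is illustrative).
    return 1.0 / (1.0 + np.exp(-(expertise - task_difficulty)))

def machine_confidence(uncertainty, tolerance):
    # Uncertainty-tolerance matching: confidence decays as the
    # machine's uncertainty approaches its tolerance budget.
    return float(np.clip(1.0 - uncertainty / tolerance, 0.0, 1.0))

def shared_action(a_human, a_machine, c_h, c_m):
    # Fuse the two candidate actions with confidence-proportional
    # weights; DC-SC would tune this weighting online instead.
    w = c_h / (c_h + c_m + 1e-8)
    return w * a_human + (1.0 - w) * a_machine
```

When expertise matches task difficulty exactly, the human side contributes half confidence; authority then shifts continuously toward whichever party is currently more reliable.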
Book Chapters
-
Uncertainty-Based Dynamic Weighted Experience Replay for Human-in-the-Loop Deep Reinforcement Learning
Xia Tian, Yu Kang, Yunbo Zhao, Yaqing Zhou, and Pengfei Li
In Frontiers in Artificial Intelligence and Applications, 2025
Human-in-the-loop reinforcement learning (HIRL) enhances sampling efficiency in deep reinforcement learning by incorporating human expertise and experience into the training process. However, HIRL methods still heavily depend on expert guidance, which is a key factor limiting their further development and large-scale application. In this paper, an uncertainty-based dynamic weighted experience replay approach (UDWER) is proposed to solve the above problem. Our approach enables the algorithm to detect decision uncertainty, triggering human intervention only when uncertainty exceeds a threshold. This reduces the need for continuous human supervision. Additionally, we design a dynamic experience replay mechanism that prioritizes machine self-exploration and human-guided samples with different weights based on decision uncertainty. We also provide a theoretical derivation and related discussion. Experiments in the Lunar Lander environment demonstrate improved sampling efficiency and reduced reliance on human guidance.
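The weighted replay mechanism described above can be sketched as follows. The buffer layout, the uncertainty-times-source weighting, and the `human_weight` factor are all illustrative assumptions, not the UDWER formulation; the sketch only shows sampling human-guided and self-exploration transitions with different priorities.

```python
import random

def sample_batch(buffer, batch_size, human_weight=2.0):
    """Toy weighted replay sampler.

    Each buffer entry is (transition, uncertainty, is_human).
    Human-guided samples are up-weighted relative to machine
    self-exploration (weighting scheme is illustrative).
    """
    weights = [
        uncertainty * (human_weight if is_human else 1.0)
        for (_, uncertainty, is_human) in buffer
    ]
    # Draw with replacement, proportional to the computed weights.
    return random.choices(buffer, weights=weights, k=batch_size)
```

With equal uncertainties, human-guided transitions are drawn roughly twice as often as self-exploration ones, concentrating gradient updates on the corrective demonstrations.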
Blog Posts