NIC Lab | Qianqian Zhang (张倩倩)

Personal Information

Participates in the lab's research projects

Research Topic

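Targeting new types of human-machine hybrid intelligent systems, this work considers how to balance the fusion of machine intelligence and human intelligence at the decision-making level. Building on an investigation of the autonomy boundary, it aims to optimize the design of existing human-machine "intervention" control systems (Human-On-The-Loop) and human-machine "shared" control systems (Human-In-The-Loop).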

Academic Output

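Authored or co-authored 1 patent; 5 papers accepted or published; 2 papers submitted and awaiting acceptance. Jointly trained students may have other papers or patents not shown here.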

Patents

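Method and Storage Medium for Optimizing Key Process Parameters of an SMT Production Line Based on Human-Machine Fusion (基于人机融合的SMT产线关键工艺参数优化方法及存储介质). Qianqian Zhang (张倩倩), Yun-Bo Zhao (赵云波), Yu Kang (康宇), Zhenyi Xu (许镇义), Zhenhuan Ding (丁振桓), and Zerui Li (李泽瑞).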
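The invention provides a method and storage medium for optimizing key process parameters of an SMT production line based on human-machine fusion, comprising the following steps: applying solder paste to circuit boards on the SMT line with a solder paste printing system; inspecting the printing result of boards that have passed through the printing system with a solder paste inspection (SPI) system; building a print quality prediction model and training it to fit the mapping from key process parameters to SPI inspection data; building a printing process parameter policy model and training it to fit the policy from SPI inspection data to improvements in the key process parameters; and using the experiential knowledge of human experts to assist the training of the machine intelligence and to correct errors in critical situations. The invention introduces reinforcement learning to generate the key solder paste printing process parameters and, combined with an MLP-based print quality prediction model, forms a system for optimizing those parameters that is robust and can handle complex cases such as multi-step prediction.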

Journal Articles

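Arbitration Optimization for Shared Control in Human-Machine Sequential Decision-Making (面向人机序贯决策实现共享控制下的仲裁优化). Qianqian Zhang (张倩倩), Yun-Bo Zhao (赵云波), Wenjun Lv (吕文君), and Mou Chen (陈谋). 中国科学：信息科学 (Scientia Sinica Informationis), 2023.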
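Shared control arises in many sequential decision-making scenarios in which human intelligence and machine intelligence both participate. Because the decision scope of the human and that of the intelligent machine have not been clearly delineated, real-time arbitration is needed so that human and machine can coexist and share decision authority. To this end, this paper proposes an arbitration optimization method whose distinctive feature is the introduction of the autonomy boundary concept to optimize the arbitration mechanism for human and machine decision actions in shared control. The paper offers an approach to computing, updating, and maintaining the autonomy boundary, and uses intent inference based on Bayes' rule to analyze the goals the human-machine shared system may be pursuing, thereby determining the arbitration parameters. In addition, the paper analyzes the uncertainty of the autonomy boundary so that boundary information can better improve decision quality in shared control. Experimental results show that the proposed method performs well in cumulative reward, success rate, and collision rate, demonstrating the effectiveness and value of the proposed arbitration optimization method for human-machine sequential decision-making problems.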
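As a rough illustration of how intent inference based on Bayes' rule could feed an arbitration parameter, the following Python sketch may help. It is not the paper's implementation: the function names, the cosine-alignment likelihood, the rationality coefficient `beta`, and the mapping from posterior confidence to an arbitration weight are all assumptions made for this example.

```python
import math


def update_goal_posterior(prior, position, action, goals, beta=2.0):
    """One Bayes step over candidate goals: P(g | a) is proportional to P(a | g) * P(g).

    Assumption: the likelihood of a human action grows exponentially with its
    cosine alignment to the direction of each goal (a Boltzmann-style model).
    """
    action_norm = math.hypot(action[0], action[1]) or 1.0
    unnormalized = []
    for p, goal in zip(prior, goals):
        dx, dy = goal[0] - position[0], goal[1] - position[1]
        goal_norm = math.hypot(dx, dy) or 1.0
        alignment = (action[0] * dx + action[1] * dy) / (action_norm * goal_norm)
        unnormalized.append(p * math.exp(beta * alignment))
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def arbitration_weight(posterior):
    """Map posterior confidence to a weight in [0, 1] (requires >= 2 goals).

    Hypothetical rule: 0 when the posterior is uniform (intent unknown),
    1 when all probability mass sits on a single goal.
    """
    n = len(posterior)
    return (max(posterior) - 1.0 / n) / (1.0 - 1.0 / n)
```

In this sketch, a human action that points clearly toward one goal concentrates the posterior on it, and the resulting weight could then be used to grant the machine more authority in assisting toward that goal.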
Traded Control of Human–Machine Systems for Sequential Decision-Making Based on Reinforcement Learning. Qianqian Zhang, Yu Kang, Yun-Bo Zhao, Pengfei Li, and Shiyi You. IEEE Trans. Artif. Intell., 2022.
Sequential decision-making (SDM) is a common type of decision-making problem with sequential and multistage characteristics. Learning and updating the policy are the main challenges in solving SDM problems. Unlike previous machine autonomy driven by artificial intelligence alone, we improve the control performance of SDM tasks by combining human intelligence and machine intelligence. Specifically, this article presents a paradigm of human–machine traded control systems based on reinforcement learning methods to optimize the solution process of sequential decision problems. By introducing the ideas of autonomous boundary and credibility assessment, we enable humans and machines to collaborate more effectively at the decision-making level of the system. The arbitration in the human–machine traded control system introduces a Bayesian neural network and the dropout mechanism to account for uncertainty and security constraints. Finally, experiments involving machine traded control and human traded control were carried out. The preliminary experimental results show that our traded control method improves decision-making performance and verify its effectiveness for SDM problems.
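The role of dropout-based uncertainty in traded control can be sketched in a few lines of Python. This is only a toy stand-in under stated assumptions: a linear value estimate replaces the Bayesian neural network, and the names `mc_dropout_values`, `choose_controller`, and the `std_threshold` switching rule are hypothetical rather than taken from the article.

```python
import random
import statistics


def mc_dropout_values(weights, features, drop_prob=0.2, samples=200, rng=None):
    """Sample a linear value estimate repeatedly with random dropout masks."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    keep = 1.0 - drop_prob
    outputs = []
    for _ in range(samples):
        total = 0.0
        for w, x in zip(weights, features):
            if rng.random() < keep:
                total += w * x / keep  # inverted dropout keeps the mean unchanged
        outputs.append(total)
    return outputs


def choose_controller(weights, features, std_threshold=0.5, **kwargs):
    """Trade control to the human when the machine's estimate is too uncertain."""
    values = mc_dropout_values(weights, features, **kwargs)
    spread = statistics.pstdev(values)
    controller = "machine" if spread <= std_threshold else "human"
    return controller, spread
```

The spread of the sampled estimates acts as a crude uncertainty signal: a small spread leaves control with the machine, while a large spread hands it to the human.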

Conference Articles

Adaptive Arbitration for Minimal Intervention Shared Control via Deep Reinforcement Learning. Shiyi You, Yu Kang, Yun-Bo Zhao, and Qianqian Zhang. In 2021 China Automation Congress (CAC), 2021.
In shared control, humans and intelligent robots jointly complete real-time control tasks, using their complementary capabilities to achieve performance unavailable to either side on its own; this has attracted increasing attention in recent years. Arbitration, as an indispensable part of shared control, determines how control authority is allocated between the human and the robot, and the definition of that policy has always been one of the fundamental problems. In this paper, we propose an adaptive arbitration method for shared control systems based on deep reinforcement learning, which minimizes the deviation from the human inputs while ensuring system performance. We provide humans with maximum assistance at minimal intervention, in order to balance the human's need for control authority against the need for performance. We apply our method to real-time control tasks, and the results show that it achieves a high task success rate and shorter task completion time with less human workload, while maintaining higher human satisfaction.
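The idea of minimal intervention can be illustrated with a simple linear blend of human and robot actions. The sketch below is an assumption-laden toy, not the paper's method: the paper learns its arbitration with deep reinforcement learning, whereas this example uses a grid search over a hypothetical blending weight `alpha` and a caller-supplied `is_acceptable` check.

```python
def blend(human, robot, alpha):
    """Linear arbitration: alpha = 0 is pure human input, alpha = 1 is pure robot."""
    return [(1.0 - alpha) * h + alpha * r for h, r in zip(human, robot)]


def minimal_intervention(human, robot, is_acceptable, steps=10):
    """Return the smallest alpha on a uniform grid whose blend passes the check.

    If no blend passes, fall back to the pure robot action (alpha = 1).
    """
    for i in range(steps + 1):
        alpha = i / steps
        action = blend(human, robot, alpha)
        if is_acceptable(action):
            return alpha, action
    return 1.0, list(robot)
```

Searching from `alpha = 0` upward keeps the executed action as close to the human input as the acceptance check allows, which is the trade-off between control authority and performance that the abstract describes.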
Autonomous Boundary of Human-Machine Collaboration System Based on Reinforcement Learning. Qianqian Zhang, Yun-Bo Zhao, and Yu Kang. In 2020 Australian and New Zealand Control Conference (ANZCC), 2020.
This paper provides a human-machine collaborative control framework, comprising artificial intelligence decision systems, human-level control, arbiter judgment, and learning of the autonomous boundary, so that human suggestions are incorporated into the training of decisions, helping agents quickly learn control decision tasks. Based on the model-free deep reinforcement learning algorithm HITL-AC, human feedback (reward or punishment) is combined with the agent's reward, so that the agent continually searches for a better boundary during the system's operation, avoiding the defects of a pre-fixed boundary. This formulation improves the data efficiency of reinforcement learning and guides the agent to seek human intervention when it is in an uncertain environmental state during the test phase. The fourth section of the paper gives a training demonstration on the bipedal walker. The experimental results show that human intervention can accelerate the agent's reinforcement learning during the training phase, and that the agent is guided to seek human help in dangerous states during the test phase. This is beneficial for solving real-world problems and further supports the feasibility and effectiveness of the proposed framework and method.

Theses

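Research on Hybrid Intelligence Methods for Human-Machine Sequential Decision-Making (面向人机序贯决策的混合智能方法研究). Qianqian Zhang (张倩倩). University of Science and Technology of China, Hefei, 2021.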
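With the development of artificial intelligence technology, machine intelligence has steadily improved and has consequently been applied across many industries. In this process, one inevitably encounters situations in which machine autonomy is insufficient for problems that should be solved by humans or in which humans must take part in the decision. Studying decision problems in which human intelligence and machine intelligence act together in such scenarios is therefore especially important and meaningful. More specifically, sequential decision-making problems, a class of dynamic decision problems with temporal and multistage characteristics, are closely tied to engineering applications, production, and daily life in the current age of artificial intelligence. The role of humans in sequential decision-making problems appears in two ways. First, the human is part of the problem model itself, that is, the problem cannot be separated from the human, as in minimally invasive surgery. Second, information about the human does not appear in the problem model; rather, the human's distinctive cognitive abilities allow the human to take part in the solution method and improve it, as when a human guides a machine search-and-rescue system. We refer to both scenarios collectively as "human-machine sequential decision-making problems".

For such problems, the essential differences between human intelligence and machine intelligence, and the large gap in their mathematical representations, mean that when humans and machines work on a solution together, coordination issues inevitably lead to poor decision quality or even decision errors. Directly applying the control algorithms of traditional human-machine systems cannot handle these issues effectively, which leads to machine agent failure, wasted human effort, and even degradation or collapse of the decision system's performance. Effective human-machine hybrid intelligence algorithms are therefore urgently needed.

This thesis takes human-machine sequential decision-making problems as its subject and studies three questions in human-machine hybrid intelligent control: the division of decision authority, the timing of trigger switching in intervention control, and the degree to which human and machine decision actions are blended in shared control. The aim is to propose effective human-machine hybrid intelligence algorithms that improve the solution of these problems. The main contributions are as follows:

1. A human-machine hybrid intelligent control framework based on reinforcement learning is proposed. The machine agent's decision and the human's decision are arbitrated using credibility and safety as evaluation criteria to determine the better action to execute. Both a model-based reinforcement learning subsystem and a model-free reinforcement learning subsystem are considered, opening more possibilities for a broad range of sequential decision-making applications.
2. For the intervention control problem in human-machine sequential decision-making, the concepts of autonomy and the autonomy boundary are proposed. Solving for the autonomy boundary is formalized as a conventional optimization problem tied to the task objective. This is used to optimize intervention control schemes and algorithms, improving decision performance both when the human intervenes in the machine and when the machine intervenes in the human.
3. For the shared control problem in human-machine sequential decision-making, a blending-parameter optimization scheme based on the autonomy boundary is proposed, in which adaptively adjusting the blending parameter directly shapes the final action to execute. By accounting for the degree of fusion between human and machine actions, the optimal solution can lie in the extended space jointly spanned by the human's action space and the machine's action space, which provides additional room for improving decision quality.
4. To address the possibility that single-value estimates of the autonomy boundary in intervention control and shared control are inaccurate, an uncertainty estimation method based on Bayesian neural networks is proposed. It obtains probability distribution information about the autonomy boundary and uses it in generating decision actions. Exploiting this uncertainty in the design of human-machine hybrid intelligence algorithms gives action optimization more options and better matches people's fuzzy reasoning about decision boundaries.

In summary, this thesis systematically studies the problems that hybrid intelligence algorithms face in human-machine sequential decision-making, proposes corresponding novel solutions, and advances both the solution of human-machine sequential decision-making problems and the development of hybrid intelligence algorithms.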

Degree Thesis

Post-Graduation Placement

Lecturer, School of Artificial Intelligence, Anhui University