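"Automated failure attribution" benchmark: Who&When
In a MAS (multi-agent system), agents collaborate autonomously and information passes through long chains, so failures are common and hard to diagnose.
Who&When is a benchmark for evaluating failure diagnosis in MAS. Its task prompts come from GAIA and AssistantBench. It contains failure logs collected from 127 LLM multi-agent systems, processed both programmatically and by human annotators. Each Who&When sample carries the following fine-grained annotations:
Who: which agent is the culprit.
When: the step of the interaction at which the decisive error occurred.
Why: a natural-language explanation of the cause of the failure.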
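To make the Who and When labels concrete, here is a minimal Python sketch of how attribution predictions could be scored against them. It assumes plain exact-match scoring: agent-level accuracy compares agent names and step-level accuracy compares step indices, each independently. `Attribution` and `attribution_accuracy` are hypothetical names, and the toy labels are made up for illustration. The free-text Why explanation is not scored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    """One failure-attribution label or prediction."""
    agent: str  # "Who": name of the responsible agent
    step: int   # "When": index of the decisive error in the interaction


def attribution_accuracy(preds: list[Attribution],
                         labels: list[Attribution]) -> tuple[float, float]:
    """Return (agent-level accuracy, step-level accuracy), each by exact match."""
    if not labels or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and of equal length")
    agent_hits = sum(p.agent == g.agent for p, g in zip(preds, labels))
    step_hits = sum(p.step == g.step for p, g in zip(preds, labels))
    n = len(labels)
    return agent_hits / n, step_hits / n


# Toy example (made-up labels): right agent both times, right step once.
gold = [Attribution("Orchestrator", 1), Attribution("WebSurfer", 4)]
pred = [Attribution("Orchestrator", 2), Attribution("WebSurfer", 4)]
print(attribution_accuracy(pred, gold))  # (1.0, 0.5)
```

Scoring the two fields separately reflects that a method can identify the right agent while still missing the exact step.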
Who&When sample details:
{
  "question": "In the fictional language of Tizin, basic sentences are arranged with the Verb first, followed by the direct object, followed by the subject of the sentence. I want to express my love for apples to my Tizin friend.",
  "ground_truth": "Maktay mato apple",
  "mistake_agent": "Orchestrator",
  "mistake_step": "1",
  "mistake_reason": "The Orchestrator made an error when performing the translation."
}
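The fields above can be read with Python's standard `json` module. This sketch assumes a record containing only the fields shown (with the question abbreviated), and converts `mistake_step`, stored as the string "1" in this sample, to an integer. `parse_label` is a hypothetical helper name.

```python
import json

# The sample record shown above (question abbreviated for brevity).
RAW = """{
  "question": "In the fictional language of Tizin, ... to my Tizin friend.",
  "ground_truth": "Maktay mato apple",
  "mistake_agent": "Orchestrator",
  "mistake_step": "1",
  "mistake_reason": "The Orchestrator made an error when performing the translation."
}"""


def parse_label(record: dict) -> tuple[str, int, str]:
    """Extract (who, when, why); mistake_step is a string in this sample."""
    return record["mistake_agent"], int(record["mistake_step"]), record["mistake_reason"]


sample = json.loads(RAW)
who, when, why = parse_label(sample)
print(who, when)  # Orchestrator 1
```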
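Failure analysis:
For the full agent conversation, see "Appendix: agent log contents" below.
Step 0 (the human prompt) is fine; the error is at step 1, the Orchestrator's response.
Tizin expresses "like" with a passive-style construction: the verb is better translated as "is pleasing to". So "I like apples" is first recast as "Apples are pleasing to me", in which "apples" is the subject and "me" is the object.
Rendered in Tizin word order (verb, direct object, subject), with "me" in the accusative ("Mato") and "apples" in the nominative ("Apple"), the answer should be "Maktay mato apple".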
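The translation rule can be sketched in a few lines of Python using only the word forms given in the prompt. The `pleasing_to` flag (a hypothetical parameter) switches between the intended "is pleasing to" reading and a naive English-style mapping, to show how the role assignment changes the output.

```python
# Word forms given in the prompt (see the appendix log).
SELF = {"nominative": "Pa", "accusative": "Mato", "genitive": "Sing"}
APPLE = {"nominative": "Apple", "accusative": "Zapple", "genitive": "Izapple"}
VERB_PRESENT = "Maktay"  # root form, used for the present tense


def translate_i_like_apples(pleasing_to: bool = True) -> str:
    """Build 'I like apples' in Tizin order: verb, direct object, subject."""
    if pleasing_to:
        # "Apples are pleasing to me": apples = subject, me = object.
        subject, obj = APPLE["nominative"], SELF["accusative"]
    else:
        # Naive English mapping: I = subject, apples = object.
        subject, obj = SELF["nominative"], APPLE["accusative"]
    return " ".join([VERB_PRESENT, obj, subject])


print(translate_i_like_apples())       # Maktay Mato Apple
print(translate_i_like_apples(False))  # Maktay Zapple Pa
```

The first result matches the ground truth "Maktay mato apple" up to capitalization. The Orchestrator's answer "Maktay Zapple Mato" matches neither reading: it puts both nouns in the accusative form.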
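Although the Orchestrator's step-1 output records that Tizin uses this passive-style construction, it does not actually restructure the sentence accordingly. It even states that the subject "I" in the nominative is "Pa", yet ends the sentence with "Mato", the accusative form. The faulty part of its plan reads:
Plan:
1. Construct the sentence by placing the verb "Maktay" first.
2. Follow the verb with the direct object "Zapple."
3. End the sentence with the subject "Mato."
Therefore, the resulting Tizin sentence should be:
"Maktay Zapple Mato"
We achieve this entirely through the given facts and logical deduction. There is no need for any additional team members for this simple translation task.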
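Whether a run counts as a failure comes down to comparing its final answer with the ground truth. The sketch below is a simplified normalized exact match that ignores case, extra whitespace, surrounding quotes and trailing periods. It is an assumption for illustration, not the official GAIA scorer, which applies its own normalization rules.

```python
import re


def normalize(answer: str) -> str:
    """Collapse whitespace, strip surrounding quotes/periods, lowercase."""
    text = re.sub(r"\s+", " ", answer).strip()
    text = text.strip("\"'.").strip()
    return text.lower()


def is_correct(prediction: str, ground_truth: str) -> bool:
    """Simplified exact match after normalization."""
    return normalize(prediction) == normalize(ground_truth)


print(is_correct("Maktay Zapple Mato", "Maktay mato apple"))  # False
print(is_correct("Maktay Mato Apple", "Maktay mato apple"))   # True
```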
Appendix: agent log contents
"history": [
{
"content": "In the fictional language of Tizin, basic sentences are arranged with the Verb first, followed by the direct object, followed by the subject of the sentence. I want to express my love for apples to my Tizin friend.\n\nThe word that indicates oneself is \"Pa\" is the nominative form,
\"Mato\" is the accusative form, and \"Sing\" is the genitive form.\n\nThe root verb that indicates an intense like for something is \"Maktay\". When it is used in the present, it is used in it's root form, when it is used in the preterit past, it is \"Tay\", and when it is used in the imperfect past, it is \"Aktay\". It is used differently than in English, and is better translated as \"is pleasing to\", meaning that the thing doing the liking is actually the object of the sentence rather than the subject.\n\nThe word for apples is borrowed from English in Tizin, and so it is \"Apple\" is the nominative form, \"Zapple\" is the accusative form, and \"Izapple\" is the genitive form.
\n\nPlease translate \"I like apples\" to Tizin.\n",
"role": "human"
},
{
"content": "Initial plan:\n\nWe are working to address the following user request:
\n\nIn the fictional language of Tizin, basic sentences are arranged with the Verb first, followed by the direct object, followed by the subject of the sentence. I want to express my love for apples to my Tizin friend.
\n\nThe word that indicates oneself is \"Pa\" is the nominative form, \"Mato\" is the accusative form, and \"Sing\" is the genitive form.
\n\nThe root verb that indicates an intense like for something is \"Maktay\". When it is used in the present, it is used in it's root form, when it is used in the preterit past, it is \"Tay\", and when it is used in the imperfect past, it is \"Aktay\". It is used differently than in English, and is better translated as \"is pleasing to\", meaning that the thing doing the liking is actually the object of the sentence rather than the subject.
\n\nThe word for apples is borrowed from English in Tizin, and so it is \"Apple\" is the nominative form, \"Zapple\" is the accusative form, and \"Izapple\" is the genitive form.
\n\nPlease translate \"I like apples\" to Tizin.
\n\n\nTo answer this request we have assembled the following team:
\n\nAssistant: A helpful and general-purpose AI assistant that has strong language skills, Python skills, and Linux command line skills.\nComputerTerminal: A computer terminal that performs no other action than running Python scripts (provided to it quoted in ```python code blocks), or sh shell scripts (provided to it quoted in ```sh code blocks)
\nFileSurfer: An agent that can handle local files.
\nWebSurfer: A helpful assistant with access to a web browser. Ask them to perform web searches, open pages, and interact with content (e.g., clicking links, scrolling the viewport, etc., filling in form fields, etc.) It can also summarize the entire page, or answer questions based on the content of the page. It can also be asked to sleep and wait for pages to load, in cases where the pages seem to be taking a while to load.
\n\n\n\nHere is an initial fact sheet to consider:\n\n1. GIVEN OR VERIFIED FACTS\n- In Tizin, basic sentences are arranged with the Verb first, followed by the direct object, followed by the subject of the sentence.
\n- The word that indicates oneself in the nominative form is \"Pa,\" in the accusative form is \"Mato,\" and in the genitive form is \"Sing.\"
\n- The root verb that indicates an intense like for something is \"Maktay.\" In the present tense, it is \"Maktay,\" in the preterit past, it is \"Tay,\" and in the imperfect past, it is \"Aktay.\"\n- The verb \"Maktay\" is used differently than in English and is better translated as \"is pleasing to,\" meaning that the thing doing the liking is actually the object of the sentence.
\n- The word for apples in the nominative form is \"Apple,\" in the accusative form is \"Zapple,\" and in the genitive form is \"Izapple.\"\n\n2. FACTS TO LOOK UP
\n- None specified. All necessary information appears to be provided in the request.\n\n3. FACTS TO DERIVE\n- The correct arrangement of the sentence \"I like apples\" in Tizin given the verb-object-subject structure and the specific verb usage.
\n\n4. EDUCATED GUESSES\n- None needed, as all information needed for the translation is provided.
\n\n\nHere is the plan to follow as best as possible:\n\n- We have all the required information to translate \"I like apples\" to Tizin.
\n- The sentence structure in Tizin is Verb-Object-Subject.
\n- The verb in the present tense is \"Maktay.\"
\n- \"I\" as the subject in the nominative form is \"Pa.\"
\n- \"Apples\" as the direct object in the accusative form is \"Zapple.\"
\n\nPlan:
\n1. Construct the sentence by placing the verb \"Maktay\" first.\n2. Follow the verb with the direct object \"Zapple.\"\n3. End the sentence with the subject \"Mato.\"\n\nTherefore, the resulting Tizin sentence should be:
\n\"Maktay Zapple Mato\"\n\nWe achieve this entirely through the given facts and logical deduction. There is no need for any additional team members for this simple translation task.
\n\n/usr/local/lib/python3.11/site-packages/autogen_magentic_one/agents/orchestrator.py:216: UserWarning: Resolved model mismatch: gpt-4o-2024-08-06 != gpt-4o-2024-05-13. Model mapping may be incorrect.
\n ledger_response = await self._model_client.create(",
"role": "Orchestrator (thought)"
},
{
"content": "Updated Ledger:
\n{
\n \"is_request_satisfied\": {
\n \"reason\": \"The translation 'Maktay Zapple Mato' correctly forms the sentence 'I like apples' in Tizin following the syntax and word forms provided.\",
\n \"answer\": true
\n },
\n \"is_in_loop\": {
\n \"reason\": \"The request was satisfied in the first response, and there has been no repetition of the same requests or responses.\",
\n \"answer\": false\n },
\n \"is_progress_being_made\": {
\n \"reason\": \"The translation task was completed accurately in the initial response.\",
\n \"answer\": true\n },
\n \"next_speaker\": {
\n \"reason\": \"There is no additional task or follow-up needed as the request has been fully satisfied.\",
\n \"answer\": \"Assistant\"\n },
\n \"instruction_or_question\": {
\n \"reason\": \"The translation of the sentence 'I like apples' to Tizin has been completed.\",
\n \"answer\": \"The translation of 'I like apples' to Tizin is 'Maktay Zapple Mato.'\"
\n }
\n}",
"role": "Orchestrator (thought)"
},
{
"content": "Request satisfied.",
"role": "Orchestrator (thought)"
},
{
"content": "No agent selected.\n response = await client.create(messages)\nFINAL ANSWER: Maktay Zapple Mato\nSCENARIO.PY COMPLETE !#!#\nRUN.SH COMPLETE !#!#",
"role": "Orchestrator (termination condition)"
}
]
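To line the labels up with the log, one can read `mistake_step` as a 0-based index into `history`: in this sample, index 0 is the human prompt and index 1 is the Orchestrator's initial plan, which is consistent with `"mistake_step": "1"`. The sketch below assumes that indexing and maps a role such as "Orchestrator (thought)" to an agent name by dropping the parenthesized suffix; both are assumptions, and `agent_of` and `step_agent` are hypothetical helpers.

```python
import re


def agent_of(role: str) -> str:
    """Drop a trailing parenthesized qualifier: 'Orchestrator (thought)' -> 'Orchestrator'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", role)


def step_agent(history: list[dict], step: int) -> str:
    """Return the agent name at a 0-based position in the history."""
    return agent_of(history[step]["role"])


# Roles of the five entries in the appendix log, in order (contents omitted).
history = [
    {"role": "human"},
    {"role": "Orchestrator (thought)"},
    {"role": "Orchestrator (thought)"},
    {"role": "Orchestrator (thought)"},
    {"role": "Orchestrator (termination condition)"},
]

label = {"mistake_agent": "Orchestrator", "mistake_step": "1"}
print(step_agent(history, int(label["mistake_step"])))  # Orchestrator
```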
---
Who and When.
https://huggingface.co/datasets/Kevin355/Who_and_When
Agents_Failure_Attribution.
https://github.com/mingyin1/Agents_Failure_Attribution
Intelligent data analysis and attribution insights with financial LLMs.
https://blog.51cto.com/dipeak/12468170
Demystifying Prompts, Part 23: A mind-map overview of LLM hallucination taxonomy, attribution, detection, and mitigation.
https://cloud.tencent.com/developer/article/2378383