当前位置: 首页 > news >正文

GraphRAG的部署和生成检索过程体验

文章目录

  • 零 参考资料
  • 一 前置环境
  • 二 说明
  • 三 缺陷
  • 四 生成索引实践过程
  • 五 局部查询过程
    • 5.1 本地搜索
    • 5.2 全局搜索

零 参考资料

  • graphrag get_started
  • GraphRAG快速部署与调用方法详解
  • openai-hk openai 中转网站
  • GraphRAG从原理到实战技术精讲

一 前置环境

  • Anaconda环境
  • Window环境
  • 稳定的网络环境

二 说明

  • 本次实践采用在线大模型gpt-4o-mini和在线向量化模型text-embedding-3-small,对本地GPU性能不做要求。

三 缺陷

  • 构建索引过程受限于网络传输,耗时,token消耗较多。
  • 最终效果严重依赖大模型的处理能力。

四 生成索引实践过程

  1. 创建并激活虚拟环境
    conda create --name graphrag python=3.12
    conda activate graphrag
    
  2. 在虚拟环境中安装grapgrag
    pip install graphrag
    
  3. 在当前用户目录下创建graphrag/input目录(这里使用window环境)
# win command
mkdir graphrag\input 
mkdir graphrag\inputs
# linux /mac command
mkdir -p ./ragtest/input
mkdir -p ./ragtest/inouts
  1. 将需要构建RAG的文本放入graphrag/input目录,截至2025.7.22,graphrag官方默认支持的文件格式主要为:txt,csv,json。具体详情可查看graphrag输入知道文档

  2. 初始化工作区,执行以下命令将在ragtest中创建两个文件:.envsettings.yaml

    # win command
    graphrag init --root ragtest
    # linux command
    graphrag init --root ./ragtest
    
  3. 自动优化提示词

    graphrag prompt-tune --config ./settings.yaml --root ./ --no-discover-entity-types --language Chinese --output ./prompts
    
    (graphrag) C:\Users\kongyue\Desktop\ragtest>graphrag prompt-tune --config ./settings.yaml --root ./ --no-discover-entity-types --language Chinese --output ./prompts
    2025-07-23 09:15:20.0856 - INFO - graphrag.cli.prompt_tune - Logging enabled at C:\Users\kongyue\Desktop\ragtest\logs\logs.txt
    2025-07-23 09:15:20.0866 - INFO - graphrag.api.prompt_tune - Chunking documents...
    2025-07-23 09:15:20.0866 - INFO - graphrag.storage.file_pipeline_storage - Creating file storage at C:\Users\kongyue\Desktop\ragtest\input
    2025-07-23 09:15:20.0867 - INFO - graphrag.index.input.factory - loading input from root_dir=C:\Users\kongyue\Desktop\ragtest\input
    2025-07-23 09:15:20.0867 - INFO - graphrag.index.input.factory - Loading Input InputFileType.text
    2025-07-23 09:15:20.0867 - INFO - graphrag.storage.file_pipeline_storage - search C:\Users\kongyue\Desktop\ragtest\input for files matching .*\.txt$
    2025-07-23 09:15:20.0888 - INFO - graphrag.index.input.util - Found 7 InputFileType.text files, loading 7
    2025-07-23 09:15:20.0889 - INFO - graphrag.index.input.util - Total number of unfiltered InputFileType.text rows: 7
    2025-07-23 09:15:20.0894 - INFO - graphrag.index.workflows.create_base_text_units - Starting chunking process for 7 documents
    2025-07-23 09:15:21.0103 - INFO - graphrag.index.workflows.create_base_text_units - chunker progress:  1/7
    2025-07-23 09:15:21.0107 - INFO - graphrag.index.workflows.create_base_text_units - chunker progress:  2/7
    2025-07-23 09:15:21.0111 - INFO - graphrag.index.workflows.create_base_text_units - chunker progress:  3/7
    2025-07-23 09:15:21.0115 - INFO - graphrag.index.workflows.create_base_text_units - chunker progress:  4/7
    2025-07-23 09:15:21.0120 - INFO - graphrag.index.workflows.create_base_text_units - chunker progress:  5/7
    2025-07-23 09:15:21.0123 - INFO - graphrag.index.workflows.create_base_text_units - chunker progress:  6/7
    2025-07-23 09:15:21.0126 - INFO - graphrag.index.workflows.create_base_text_units - chunker progress:  7/7
    2025-07-23 09:15:21.0136 - INFO - graphrag.api.prompt_tune - Retrieving language model configuration...
    2025-07-23 09:15:21.0136 - INFO - graphrag.api.prompt_tune - Creating language model...
    2025-07-23 09:15:22.0234 - INFO - graphrag.api.prompt_tune - Generating domain...
    2025-07-23 09:15:25.0649 - INFO - graphrag.api.prompt_tune - Generating persona...
    2025-07-23 09:17:22.0279 - INFO - graphrag.api.prompt_tune - Generating community report ranking description...
    2025-07-23 09:18:24.0945 - INFO - graphrag.api.prompt_tune - Generating entity relationship examples...
    2025-07-23 09:23:27.0014 - INFO - graphrag.api.prompt_tune - Generating entity extraction prompt...
    2025-07-23 09:23:27.0020 - INFO - graphrag.api.prompt_tune - Generating entity summarization prompt...
    2025-07-23 09:23:27.0020 - INFO - graphrag.api.prompt_tune - Generating community reporter role...
    2025-07-23 09:24:26.0767 - INFO - graphrag.api.prompt_tune - Generating community summarization prompt...
    2025-07-23 09:24:26.0768 - INFO - graphrag.cli.prompt_tune - Writing prompts to C:\Users\kongyue\Desktop\ragtest\prompts
    2025-07-23 09:24:26.0773 - INFO - graphrag.cli.prompt_tune - Prompts written to C:\Users\kongyue\Desktop\ragtest\prompts
    
  4. 打开openai-hk openai 中转网站,充值5元,获取api key,将其填写到.env文件中,并同时修改setting.yaml

    GRAPHRAG_API_KEY=hk-xxx
    
    https://api.openai-hk.com/v1/
    

    在这里插入图片描述

  5. 执行构建索引的命令

(graphrag) C:\Users\kongyue>graphrag index --root ./ragtest
(graphrag) C:\Users\kongyue>graphrag index --root ./ragtest
2025-07-22 19:00:30.0503 - INFO - graphrag.cli.index - Logging enabled at C:\Users\kongyue\ragtest\logs\logs.txt
2025-07-22 19:01:31.0835 - INFO - graphrag.index.validate_config - LLM Config Params Validated
2025-07-22 19:01:34.0057 - INFO - graphrag.index.validate_config - Embedding LLM Config Params Validated
2025-07-22 19:01:34.0058 - INFO - graphrag.cli.index - Starting pipeline run. False
2025-07-22 19:01:34.0061 - INFO - graphrag.cli.index - Using default configuration: {"root_dir": "C:\\Users\\kongyue\\ragtest","models": {"default_chat_model": {"api_key": "==== REDACTED ====","auth_type": "api_key","type": "openai_chat","model": "gpt-4o-mini","encoding_model": "o200k_base","api_base": "https://api.openai-hk.com/v1/","api_version": null,"deployment_name": null,"proxy": null,"audience": null,"model_supports_json": true,"request_timeout": 180.0,"tokens_per_minute": "auto","requests_per_minute": "auto","retry_strategy": "native","max_retries": 10,"max_retry_wait": 10.0,"concurrent_requests": 25,"async_mode": "threaded","responses": null,"max_tokens": null,"temperature": 0,"max_completion_tokens": null,"reasoning_effort": null,"top_p": 1,"n": 1,"frequency_penalty": 0.0,"presence_penalty": 0.0},"default_embedding_model": {"api_key": "==== REDACTED ====","auth_type": "api_key","type": "openai_embedding","model": "text-embedding-3-small","encoding_model": "cl100k_base","api_base": "https://api.openai-hk.com/v1/","api_version": null,"deployment_name": null,"proxy": null,"audience": null,"model_supports_json": true,"request_timeout": 180.0,"tokens_per_minute": null,"requests_per_minute": null,"retry_strategy": "native","max_retries": 10,"max_retry_wait": 10.0,"concurrent_requests": 25,"async_mode": "threaded","responses": null,"max_tokens": null,"temperature": 0,"max_completion_tokens": null,"reasoning_effort": null,"top_p": 1,"n": 1,"frequency_penalty": 0.0,"presence_penalty": 0.0}},"input": {"storage": {"type": "file","base_dir": "C:\\Users\\kongyue\\ragtest\\input","storage_account_blob_url": null,"cosmosdb_account_url": null},"file_type": "text","encoding": "utf-8","file_pattern": ".*\\.txt$","file_filter": null,"text_column": "text","title_column": null,"metadata": null},"chunks": {"size": 1200,"overlap": 100,"group_by_columns": ["id"],"strategy": "tokens","encoding_model": "cl100k_base","prepend_metadata": false,"chunk_size_includes_metadata": false},"output": {"type": "file","base_dir": "C:\\Users\\kongyue\\ragtest\\output","storage_account_blob_url": null,"cosmosdb_account_url": null},"outputs": null,"update_index_output": {"type": "file","base_dir": "C:\\Users\\kongyue\\ragtest\\update_output","storage_account_blob_url": null,"cosmosdb_account_url": null},"cache": {"type": "file","base_dir": "cache","storage_account_blob_url": null,"cosmosdb_account_url": null},"reporting": {"type": "file","base_dir": "C:\\Users\\kongyue\\ragtest\\logs","storage_account_blob_url": null},"vector_store": {"default_vector_store": {"type": "lancedb","db_uri": "C:\\Users\\kongyue\\ragtest\\output\\lancedb","url": null,"audience": null,"container_name": "==== REDACTED ====","database_name": null,"overwrite": true}},"workflows": null,"embed_text": {"model_id": "default_embedding_model","vector_store_id": "default_vector_store","batch_size": 16,"batch_max_tokens": 8191,"names": ["entity.description","community.full_content","text_unit.text"],"strategy": null},"extract_graph": {"model_id": "default_chat_model","prompt": "prompts/extract_graph.txt","entity_types": ["organization","person","geo","event"],"max_gleanings": 1,"strategy": null},"summarize_descriptions": {"model_id": "default_chat_model","prompt": "prompts/summarize_descriptions.txt","max_length": 500,"max_input_tokens": 4000,"strategy": null},"extract_graph_nlp": {"normalize_edge_weights": true,"text_analyzer": {"extractor_type": "regex_english","model_name": "en_core_web_md","max_word_length": 15,"word_delimiter": " ","include_named_entities": true,"exclude_nouns": ["stuff","thing","things","bunch","bit","bits","people","person","okay","hey","hi","hello","laughter","oh"],"exclude_entity_tags": ["DATE"],"exclude_pos_tags": ["DET","PRON","INTJ","X"],"noun_phrase_tags": ["PROPN","NOUNS"],"noun_phrase_grammars": {"PROPN,PROPN": "PROPN","NOUN,NOUN": "NOUNS","NOUNS,NOUN": "NOUNS","ADJ,ADJ": "ADJ","ADJ,NOUN": "NOUNS"}},"concurrent_requests": 25},"prune_graph": {"min_node_freq": 2,"max_node_freq_std": null,"min_node_degree": 1,"max_node_degree_std": null,"min_edge_weight_pct": 40.0,"remove_ego_nodes": true,"lcc_only": false},"cluster_graph": {"max_cluster_size": 10,"use_lcc": true,"seed": 3735928559},"extract_claims": {"enabled": false,"model_id": "default_chat_model","prompt": "prompts/extract_claims.txt","description": "Any claims or facts that could be relevant to information discovery.","max_gleanings": 1,"strategy": null},"community_reports": {"model_id": "default_chat_model","graph_prompt": "prompts/community_report_graph.txt","text_prompt": "prompts/community_report_text.txt","max_length": 2000,"max_input_length": 8000,"strategy": null},"embed_graph": {"enabled": false,"dimensions": 1536,"num_walks": 10,"walk_length": 40,"window_size": 2,"iterations": 3,"random_seed": 597832,"use_lcc": true},"umap": {"enabled": false},"snapshots": {"embeddings": false,"graphml": false,"raw_graph": false},"local_search": {"prompt": "prompts/local_search_system_prompt.txt","chat_model_id": "default_chat_model","embedding_model_id": "default_embedding_model","text_unit_prop": 0.5,"community_prop": 0.15,"conversation_history_max_turns": 5,"top_k_entities": 10,"top_k_relationships": 10,"max_context_tokens": 12000},"global_search": {"map_prompt": "prompts/global_search_map_system_prompt.txt","reduce_prompt": "prompts/global_search_reduce_system_prompt.txt","chat_model_id": "default_chat_model","knowledge_prompt": "prompts/global_search_knowledge_system_prompt.txt","max_context_tokens": 12000,"data_max_tokens": 12000,"map_max_length": 1000,"reduce_max_length": 2000,"dynamic_search_threshold": 1,"dynamic_search_keep_parent": false,"dynamic_search_num_repeats": 1,"dynamic_search_use_summary": false,"dynamic_search_max_level": 2},"drift_search": {"prompt": "prompts/drift_search_system_prompt.txt","reduce_prompt": "prompts/drift_search_reduce_prompt.txt","chat_model_id": "default_chat_model","embedding_model_id": "default_embedding_model","data_max_tokens": 12000,"reduce_max_tokens": null,"reduce_temperature": 0,"reduce_max_completion_tokens": null,"concurrency": 32,"drift_k_followups": 20,"primer_folds": 5,"primer_llm_max_tokens": 12000,"n_depth": 3,"local_search_text_unit_prop": 0.9,"local_search_community_prop": 0.1,"local_search_top_k_mapped_entities": 10,"local_search_top_k_relationships": 10,"local_search_max_data_tokens": 12000,"local_search_temperature": 0,"local_search_top_p": 1,"local_search_n": 1,"local_search_llm_max_gen_tokens": null,"local_search_llm_max_gen_completion_tokens": null},"basic_search": {"prompt": "prompts/basic_search_system_prompt.txt","chat_model_id": "default_chat_model","embedding_model_id": "default_embedding_model","k": 10,"max_context_tokens": 12000}
}
2025-07-22 19:01:34.0067 - INFO - graphrag.api.index - Initializing indexing pipeline...
2025-07-22 19:01:34.0067 - INFO - graphrag.index.workflows.factory - Creating pipeline with workflows: ['load_input_documents', 'create_base_text_units', 'create_final_documents', 'extract_graph', 'finalize_graph', 'extract_covariates', 'create_communities', 'create_final_text_units', 'create_community_reports', 'generate_text_embeddings']
2025-07-22 19:01:34.0068 - INFO - graphrag.storage.file_pipeline_storage - Creating file storage at C:\Users\kongyue\ragtest\input
2025-07-22 19:01:34.0068 - INFO - graphrag.storage.file_pipeline_storage - Creating file storage at C:\Users\kongyue\ragtest\output
2025-07-22 19:01:34.0077 - INFO - graphrag.index.run.run_pipeline - Running standard indexing.
2025-07-22 19:01:34.0083 - INFO - graphrag.index.run.run_pipeline - Executing pipeline...
2025-07-22 19:01:34.0084 - INFO - graphrag.index.input.factory - loading input from root_dir=C:\Users\kongyue\ragtest\input
2025-07-22 19:01:34.0085 - INFO - graphrag.index.input.factory - Loading Input InputFileType.text
2025-07-22 19:01:34.0086 - INFO - graphrag.storage.file_pipeline_storage - search C:\Users\kongyue\ragtest\input for files matching .*\.txt$
2025-07-22 19:01:34.0098 - INFO - graphrag.index.input.util - Found 1 InputFileType.text files, loading 1
2025-07-22 19:01:34.0099 - INFO - graphrag.index.input.util - Total number of unfiltered InputFileType.text rows: 1
2025-07-22 19:01:34.0100 - INFO - graphrag.index.workflows.load_input_documents - Final # of rows loaded: 1
2025-07-22 19:01:34.0146 - INFO - graphrag.api.index - Workflow load_input_documents completed successfully
2025-07-22 19:01:34.0162 - INFO - graphrag.index.workflows.create_base_text_units - Workflow started: create_base_text_units
2025-07-22 19:01:34.0163 - INFO - graphrag.utils.storage - reading table from storage: documents.parquet
2025-07-22 19:01:34.0215 - INFO - graphrag.index.workflows.create_base_text_units - Starting chunking process for 1 documents
2025-07-22 19:01:34.0524 - INFO - graphrag.index.workflows.create_base_text_units - chunker progress:  1/1
2025-07-22 19:01:34.0542 - INFO - graphrag.index.workflows.create_base_text_units - Workflow completed: create_base_text_units
2025-07-22 19:01:34.0542 - INFO - graphrag.api.index - Workflow create_base_text_units completed successfully
2025-07-22 19:01:34.0553 - INFO - graphrag.index.workflows.create_final_documents - Workflow started: create_final_documents
2025-07-22 19:01:34.0554 - INFO - graphrag.utils.storage - reading table from storage: documents.parquet
2025-07-22 19:01:34.0564 - INFO - graphrag.utils.storage - reading table from storage: text_units.parquet
2025-07-22 19:01:34.0598 - INFO - graphrag.index.workflows.create_final_documents - Workflow completed: create_final_documents
2025-07-22 19:01:34.0599 - INFO - graphrag.api.index - Workflow create_final_documents completed successfully
2025-07-22 19:01:34.0611 - INFO - graphrag.index.workflows.extract_graph - Workflow started: extract_graph
2025-07-22 19:01:34.0612 - INFO - graphrag.utils.storage - reading table from storage: text_units.parquet
2025-07-22 19:12:35.0481 - INFO - graphrag.logger.progress - extract graph progress: 1/11
2025-07-22 19:13:35.0500 - INFO - graphrag.logger.progress - extract graph progress: 2/11
2025-07-22 19:14:35.0510 - INFO - graphrag.logger.progress - extract graph progress: 3/11
2025-07-22 19:15:35.0521 - INFO - graphrag.logger.progress - extract graph progress: 4/11
2025-07-22 19:16:35.0522 - INFO - graphrag.logger.progress - extract graph progress: 5/11
2025-07-22 19:17:35.0527 - INFO - graphrag.logger.progress - extract graph progress: 6/11
2025-07-22 19:18:35.0536 - INFO - graphrag.logger.progress - extract graph progress: 7/11
2025-07-22 19:19:35.0528 - INFO - graphrag.logger.progress - extract graph progress: 8/11
2025-07-22 19:20:35.0540 - INFO - graphrag.logger.progress - extract graph progress: 9/11
2025-07-22 19:21:35.0559 - INFO - graphrag.logger.progress - extract graph progress: 10/11
2025-07-22 19:22:35.0571 - INFO - graphrag.logger.progress - extract graph progress: 11/11
2025-07-22 19:22:36.0175 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 1/70
2025-07-22 19:22:36.0176 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 2/70
2025-07-22 19:22:36.0176 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 3/70
2025-07-22 19:22:36.0176 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 4/70
2025-07-22 19:22:36.0177 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 5/70
2025-07-22 19:22:36.0177 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 6/70
2025-07-22 19:22:36.0178 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 7/70
2025-07-22 19:22:36.0178 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 8/70
2025-07-22 19:22:36.0180 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 9/70
2025-07-22 19:22:36.0181 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 10/70
2025-07-22 19:22:36.0181 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 11/70
2025-07-22 19:22:36.0181 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 12/70
2025-07-22 19:22:36.0182 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 13/70
2025-07-22 19:22:36.0187 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 14/70
2025-07-22 19:22:36.0188 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 15/70
2025-07-22 19:22:36.0188 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 16/70
2025-07-22 19:22:36.0188 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 17/70
2025-07-22 19:22:36.0189 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 18/70
2025-07-22 19:22:36.0189 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 19/70
2025-07-22 19:22:36.0189 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 20/70
2025-07-22 19:22:36.0190 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 21/70
2025-07-22 19:22:36.0190 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 22/70
2025-07-22 19:22:36.0204 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 23/70
2025-07-22 19:22:36.0205 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 24/70
2025-07-22 19:22:36.0206 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 25/70
2025-07-22 19:22:36.0206 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 26/70
2025-07-22 19:22:36.0207 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 27/70
2025-07-22 19:22:36.0207 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 28/70
2025-07-22 19:22:36.0208 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 29/70
2025-07-22 19:22:36.0208 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 30/70
2025-07-22 19:22:36.0209 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 31/70
2025-07-22 19:22:40.0791 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 32/70
2025-07-22 19:23:42.0152 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 33/70
2025-07-22 19:23:42.0154 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 34/70
2025-07-22 19:23:42.0154 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 35/70
2025-07-22 19:23:42.0154 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 36/70
2025-07-22 19:23:42.0155 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 37/70
2025-07-22 19:23:42.0155 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 38/70
2025-07-22 19:23:42.0155 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 39/70
2025-07-22 19:23:42.0156 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 40/70
2025-07-22 19:23:42.0156 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 41/70
2025-07-22 19:23:42.0157 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 42/70
2025-07-22 19:23:42.0157 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 43/70
2025-07-22 19:23:42.0157 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 44/70
2025-07-22 19:23:42.0158 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 45/70
2025-07-22 19:23:42.0158 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 46/70
2025-07-22 19:23:42.0158 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 47/70
2025-07-22 19:23:42.0159 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 48/70
2025-07-22 19:23:42.0160 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 49/70
2025-07-22 19:23:42.0160 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 50/70
2025-07-22 19:23:42.0160 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 51/70
2025-07-22 19:23:42.0162 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 52/70
2025-07-22 19:23:42.0162 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 53/70
2025-07-22 19:23:42.0164 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 54/70
2025-07-22 19:23:42.0165 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 55/70
2025-07-22 19:23:42.0165 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 56/70
2025-07-22 19:23:42.0166 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 57/70
2025-07-22 19:23:42.0166 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 58/70
2025-07-22 19:23:42.0167 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 59/70
2025-07-22 19:23:42.0167 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 60/70
2025-07-22 19:23:42.0167 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 61/70
2025-07-22 19:23:42.0168 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 62/70
2025-07-22 19:23:42.0168 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 63/70
2025-07-22 19:23:42.0168 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 64/70
2025-07-22 19:23:42.0174 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 65/70
2025-07-22 19:23:42.0175 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 66/70
2025-07-22 19:25:42.0447 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 67/70
2025-07-22 19:26:39.0431 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 68/70
2025-07-22 19:28:36.0263 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 69/70
2025-07-22 19:29:36.0272 - INFO - graphrag.logger.progress - Summarize entity/relationship description progress: 70/70
2025-07-22 19:29:36.0290 - INFO - graphrag.index.workflows.extract_graph - Workflow completed: extract_graph
2025-07-22 19:29:36.0291 - INFO - graphrag.api.index - Workflow extract_graph completed successfully
2025-07-22 19:29:36.0323 - INFO - graphrag.index.workflows.finalize_graph - Workflow started: finalize_graph
2025-07-22 19:29:36.0324 - INFO - graphrag.utils.storage - reading table from storage: entities.parquet
2025-07-22 19:29:36.0331 - INFO - graphrag.utils.storage - reading table from storage: relationships.parquet
2025-07-22 19:29:36.0369 - INFO - graphrag.index.workflows.finalize_graph - Workflow completed: finalize_graph
2025-07-22 19:29:36.0369 - INFO - graphrag.api.index - Workflow finalize_graph completed successfully
2025-07-22 19:29:36.0397 - INFO - graphrag.index.workflows.extract_covariates - Workflow started: extract_covariates
2025-07-22 19:29:36.0397 - INFO - graphrag.index.workflows.extract_covariates - Workflow completed: extract_covariates
2025-07-22 19:29:36.0397 - INFO - graphrag.api.index - Workflow extract_covariates completed successfully
2025-07-22 19:29:36.0397 - INFO - graphrag.index.workflows.create_communities - Workflow started: create_communities
2025-07-22 19:29:36.0398 - INFO - graphrag.utils.storage - reading table from storage: entities.parquet
2025-07-22 19:29:36.0406 - INFO - graphrag.utils.storage - reading table from storage: relationships.parquet
2025-07-22 19:29:36.0458 - INFO - graphrag.index.workflows.create_communities - Workflow completed: create_communities
2025-07-22 19:29:36.0458 - INFO - graphrag.api.index - Workflow create_communities completed successfully
2025-07-22 19:29:36.0472 - INFO - graphrag.index.workflows.create_final_text_units - Workflow started: create_final_text_units
2025-07-22 19:29:36.0473 - INFO - graphrag.utils.storage - reading table from storage: text_units.parquet
2025-07-22 19:29:36.0479 - INFO - graphrag.utils.storage - reading table from storage: entities.parquet
2025-07-22 19:29:36.0485 - INFO - graphrag.utils.storage - reading table from storage: relationships.parquet
2025-07-22 19:29:36.0521 - INFO - graphrag.index.workflows.create_final_text_units - Workflow completed: create_final_text_units
2025-07-22 19:29:36.0521 - INFO - graphrag.api.index - Workflow create_final_text_units completed successfully
2025-07-22 19:29:36.0537 - INFO - graphrag.index.workflows.create_community_reports - Workflow started: create_community_reports
2025-07-22 19:29:36.0538 - INFO - graphrag.utils.storage - reading table from storage: relationships.parquet
2025-07-22 19:29:36.0545 - INFO - graphrag.utils.storage - reading table from storage: entities.parquet
2025-07-22 19:29:36.0550 - INFO - graphrag.utils.storage - reading table from storage: communities.parquet
2025-07-22 19:29:36.0570 - INFO - graphrag.index.operations.summarize_communities.graph_context.context_builder - Number of nodes at level=0 => 20
2025-07-22 19:32:37.0302 - INFO - graphrag.logger.progress - level 0 summarize communities progress: 1/3
2025-07-22 19:33:37.0309 - INFO - graphrag.logger.progress - level 0 summarize communities progress: 2/3
2025-07-22 19:34:37.0322 - INFO - graphrag.logger.progress - level 0 summarize communities progress: 3/3
2025-07-22 19:34:37.0336 - INFO - graphrag.index.workflows.create_community_reports - Workflow completed: create_community_reports
2025-07-22 19:34:37.0337 - INFO - graphrag.api.index - Workflow create_community_reports completed successfully
2025-07-22 19:34:37.0358 - INFO - graphrag.index.workflows.generate_text_embeddings - Workflow started: generate_text_embeddings
2025-07-22 19:34:37.0359 - INFO - graphrag.utils.storage - reading table from storage: documents.parquet
2025-07-22 19:34:37.0366 - INFO - graphrag.utils.storage - reading table from storage: relationships.parquet
2025-07-22 19:34:37.0372 - INFO - graphrag.utils.storage - reading table from storage: text_units.parquet
2025-07-22 19:34:37.0379 - INFO - graphrag.utils.storage - reading table from storage: entities.parquet
2025-07-22 19:34:37.0386 - INFO - graphrag.utils.storage - reading table from storage: community_reports.parquet
2025-07-22 19:34:37.0399 - INFO - graphrag.index.workflows.generate_text_embeddings - Creating embeddings
2025-07-22 19:34:37.0399 - INFO - graphrag.index.operations.embed_text.embed_text - using vector store lancedb with container_name default for embedding entity.description: default-entity-description
2025-07-22 19:34:37.0432 - INFO - graphrag.index.operations.embed_text.embed_text - uploading text embeddings batch 1/1 of size 500 to vector store
2025-07-22 19:34:38.0123 - INFO - graphrag.index.operations.embed_text.strategies.openai - embedding 31 inputs via 31 snippets using 2 batches. max_batch_size=16, batch_max_tokens=8191
2025-07-22 19:34:40.0141 - INFO - graphrag.logger.progress - generate embeddings progress: 1/2
2025-07-22 19:34:40.0145 - INFO - graphrag.logger.progress - generate embeddings progress: 2/2
[2025-07-22T11:34:40Z WARN  lance::dataset::write::insert] No existing dataset at C:\Users\kongyue\ragtest\output\lancedb\default-entity-description.lance, it will be created
2025-07-22 19:34:40.0514 - INFO - graphrag.index.operations.embed_text.embed_text - using vector store lancedb with container_name default for embedding community.full_content: default-community-full_content
2025-07-22 19:34:40.0520 - INFO - graphrag.index.operations.embed_text.embed_text - uploading text embeddings batch 1/1 of size 500 to vector store
2025-07-22 19:34:40.0524 - INFO - graphrag.index.operations.embed_text.strategies.openai - embedding 3 inputs via 3 snippets using 1 batches. max_batch_size=16, batch_max_tokens=8191
2025-07-22 19:34:41.0600 - INFO - graphrag.logger.progress - generate embeddings progress: 1/1
[2025-07-22T11:34:41Z WARN  lance::dataset::write::insert] No existing dataset at C:\Users\kongyue\ragtest\output\lancedb\default-community-full_content.lance, it will be created
2025-07-22 19:34:41.0626 - INFO - graphrag.index.operations.embed_text.embed_text - using vector store lancedb with container_name default for embedding text_unit.text: default-text_unit-text
2025-07-22 19:34:41.0630 - INFO - graphrag.index.operations.embed_text.embed_text - uploading text embeddings batch 1/1 of size 500 to vector store
2025-07-22 19:34:41.0650 - INFO - graphrag.index.operations.embed_text.strategies.openai - embedding 11 inputs via 11 snippets using 2 batches. max_batch_size=16, batch_max_tokens=8191
2025-07-22 19:34:43.0131 - INFO - graphrag.logger.progress - generate embeddings progress: 1/2
2025-07-22 19:34:43.0213 - INFO - graphrag.logger.progress - generate embeddings progress: 2/2
[2025-07-22T11:34:43Z WARN  lance::dataset::write::insert] No existing dataset at C:\Users\kongyue\ragtest\output\lancedb\default-text_unit-text.lance, it will be created
2025-07-22 19:34:43.0241 - INFO - graphrag.index.workflows.generate_text_embeddings - Workflow completed: generate_text_embeddings
2025-07-22 19:34:43.0242 - INFO - graphrag.api.index - Workflow generate_text_embeddings completed successfully
2025-07-22 19:34:43.0303 - INFO - graphrag.index.run.run_pipeline - Indexing pipeline complete.
2025-07-22 19:34:43.0311 - INFO - graphrag.cli.index - All workflows completed successfully.
import osimport pandas as pd
import tiktokenfrom graphrag.query.context_builder.entity_extraction import EntityVectorStoreKey
from graphrag.query.indexer_adapters import (read_indexer_covariates,read_indexer_entities,read_indexer_relationships,read_indexer_reports,read_indexer_text_units,
)
from graphrag.query.question_gen.local_gen import LocalQuestionGen
from graphrag.query.structured_search.local_search.mixed_context import (LocalSearchMixedContext,
)
from graphrag.query.structured_search.local_search.search import LocalSearch
from graphrag.vector_stores.lancedb import LanceDBVectorStore

在这里插入图片描述

五 局部查询过程

5.1 本地搜索

graphrag query --root ./ --method local --query "什么是graphrag?"
(graphrag) C:\Users\kongyue\ragtest>graphrag query --root ./ --method local --query "什么是graphrag?"
2025-07-23 10:13:10.0195 - INFO - graphrag.storage.file_pipeline_storage - Creating file storage at C:\Users\kongyue\ragtest\output
2025-07-23 10:13:10.0199 - INFO - graphrag.utils.storage - reading table from storage: communities.parquet
2025-07-23 10:13:10.0221 - INFO - graphrag.utils.storage - reading table from storage: community_reports.parquet
2025-07-23 10:13:10.0232 - INFO - graphrag.utils.storage - reading table from storage: text_units.parquet
2025-07-23 10:13:10.0241 - INFO - graphrag.utils.storage - reading table from storage: relationships.parquet
2025-07-23 10:13:10.0250 - INFO - graphrag.utils.storage - reading table from storage: entities.parquet
2025-07-23 10:13:18.0148 - INFO - graphrag.cli.query - Local Search Response:
### GraphRAG OverviewGraphRAG is an innovative framework designed to integrate traditional graph database queries with advanced generative models to effectively extract knowledge graphs from extensive text data. This system is specifically tailored to process large documents or corpora, emphasizing the importance of entity recognition and relationship extraction to construct structured knowledge graphs [Data: Entities (6); Reports (1)].### Key Features and CapabilitiesGraphRAG operates by querying and retrieving information based on the knowledge graphs it builds, employing sophisticated algorithms and language models to enhance its capabilities. The framework not only focuses on the extraction of entities and their interrelations but also incorporates graph-driven semantic search, which allows for more nuanced and context-aware information retrieval. This dual approach significantly improves the generation of responses, making it a powerful tool for users seeking to navigate and utilize large volumes of information efficiently [Data: Entities (6); Reports (1)].### Integration with Advanced TechniquesThe framework utilizes various advanced techniques such as embeddings, semantic matching, and large language models (LLMs) to enhance its functionality. Embeddings provide vector representations of text units, entities, and relationships, enabling similarity calculations essential for effective information retrieval. Semantic matching is employed to rank and filter information based on vector similarity and graph structure, ensuring that the most relevant information is presented to users [Data: Reports (1); Entities (24, 30, 25); Relationships (30, 36, 31)].### Community and Knowledge ExtractionGraphRAG is central to a community focused on knowledge extraction, where techniques like NLP and community clustering play crucial roles. NLP enhances the system's reasoning capabilities, allowing for complex queries and better comprehension of user inputs. Community clustering groups entities and relationships based on similarities, optimizing information processing and retrieval [Data: Reports (1); Entities (8, 16); Relationships (5, 15)].In summary, GraphRAG stands out as a comprehensive system that merges the strengths of graph databases with generative modeling, facilitating the creation and utilization of knowledge graphs for enhanced information processing and retrieval [Data: Entities (6); Reports (1)].

5.2 全局搜索

graphrag query --root ./ --method global--query "什么是graphrag?"
(graphrag) C:\Users\kongyue\ragtest>graphrag query --root ./ --method global --query "什么是graphrag?"
2025-07-23 10:14:02.0385 - INFO - graphrag.storage.file_pipeline_storage - Creating file storage at C:\Users\kongyue\ragtest\output
2025-07-23 10:14:02.0389 - INFO - graphrag.utils.storage - reading table from storage: entities.parquet
2025-07-23 10:14:02.0409 - INFO - graphrag.utils.storage - reading table from storage: communities.parquet
2025-07-23 10:14:02.0419 - INFO - graphrag.utils.storage - reading table from storage: community_reports.parquet
2025-07-23 10:16:08.0209 - INFO - graphrag.cli.query - Global Search Response:
### GraphRAG Framework OverviewGraphRAG is an innovative framework designed to integrate natural language processing (NLP) and advanced generative models for the purpose of extracting knowledge graphs from extensive text data. It serves as a core entity within its community, combining traditional graph database queries with generative models to create structured knowledge graphs from large text corpora. This framework places a strong emphasis on entity recognition and relationship extraction, which are essential for constructing these graphs [Data: Reports (1)].### Enhancing Information RetrievalGraphRAG significantly enhances information retrieval and processing by utilizing NLP techniques. These techniques enable the system to effectively understand and process human language, facilitating the extraction of meaningful insights from text data. This interaction between NLP and GraphRAG is crucial for identifying entities and their interrelations, thereby improving the quality and relevance of the information retrieved [Data: Reports (1)].### Key Processes: Indexing and ClusteringThe framework focuses on processes such as Indexing and Clustering, which are vital for transforming raw text into structured knowledge graphs. Indexing organizes information, thereby enhancing retrieval and reasoning capabilities. Clustering, on the other hand, groups entities based on their relationships and attributes within the knowledge graph, allowing for more efficient data management and retrieval [Data: Reports (0)].### Integration with Large Language ModelsGraphRAG utilizes large language models (LLM) to generate natural language responses based on the context provided by user queries. This integration allows the system to produce coherent and contextually relevant outputs, significantly enhancing the user interaction experience. By leveraging LLMs, GraphRAG can provide more accurate and context-aware responses to user inquiries [Data: Reports (1)].### Optimization TechniquesTo optimize information retrieval and processing, GraphRAG employs techniques such as Community Clustering, Embeddings, and Semantic Matching. These techniques help in organizing the knowledge graph, calculating similarity and relevance, and ranking information based on vector similarity and graph structure. This optimization ensures that the most relevant information is retrieved and presented to the user, improving the overall efficiency and effectiveness of the framework [Data: Reports (1)].In summary, GraphRAG is a comprehensive framework that integrates NLP and generative models to create and manage knowledge graphs, enhancing information retrieval and processing through advanced techniques and processes.
http://www.lryc.cn/news/596411.html

相关文章:

  • 小白成长之路-部署Zabbix7
  • 使用react编写一个简单的井字棋游戏
  • 17.VRRP技术
  • 接口自动化测试种涉及到接口依赖怎么办?
  • 微调大语言模型(LLM)有多难?
  • Google Gemini 体验
  • 深入解析Hadoop中的推测执行:原理、算法与策略
  • kafka查看消息的具体内容 kafka-dump-log.sh
  • SDC命令详解:使用set_min_library命令进行约束
  • Unity笔记——事件中心
  • HTB赛季8靶场 - Mirage
  • 风险识别清单:构建动态化的风险管理体系
  • Java函数式编程深度解析:从基础到高阶应用
  • 技能系统详解(4)——运动表现
  • 哔哩哔哩视觉算法面试30问全景精解
  • 钢铁逆行者:Deepoc具身智能如何重塑消防机器人的“火场直觉”
  • 【中文翻译】SmolVLA:面向低成本高效机器人的视觉-语言-动作模型
  • Vue 3 响应式系统中的 effectScope、watchEffect、effect 和 watch 详解
  • 如何将iPad中的视频传输到电脑(6种简单方法)
  • 单片机学习笔记.单总线one-wire协议(这里以普中开发板DS18B20为例)
  • rabbitmq 03
  • uniapp 报错 Not found ... at view.umd.min.js:1的问题
  • LWIP学习记录2——MAC内核
  • Linux系统安装Bash自动补全(bash-completion)
  • 基于SpringBoot+Uniapp的非遗文化宣传小程序(AI问答、协同过滤算法、Echarts图形化分析)
  • uniapp请求封装上传
  • 最新植物大战僵尸杂交版最新版本2.5.1版,内置触屏+加速+全屏,附PC+安卓+iOS最全安装教程!
  • C#文件操作(创建、读取、修改)
  • Java学习-------事务失效
  • 从“点状用例”到“质量生态”:现代软件测试的演进、困局与破局