[Error fix] Running MATCHA requires downloading the Arial.TTF font online, but huggingface cannot be reached
1. Error details
requests.exceptions.ConnectTimeout: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /ybelkada/fonts/resolve/main/Arial.TTF (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f5295722ce0>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: a5b5b41d-c258-46b6-8e40-0200bc4cb62b)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/MATCHA/workdir/matcha_test.py", line 11, in <module>
    inputs = processor(images=image, text="Is the sum of all 4 places greater than Laos?", return_tensors="pt")
  File "/miniconda3/lib/python3.10/site-packages/transformers/models/pix2struct/processing_pix2struct.py", line 109, in __call__
    encoding_image_processor = self.image_processor(
  File "/miniconda3/lib/python3.10/site-packages/transformers/image_processing_utils.py", line 552, in __call__
    return self.preprocess(images, **kwargs)
  File "/miniconda3/lib/python3.10/site-packages/transformers/models/pix2struct/image_processing_pix2struct.py", line 437, in preprocess
    images = [
  File "/miniconda3/lib/python3.10/site-packages/transformers/models/pix2struct/image_processing_pix2struct.py", line 438, in <listcomp>
    render_header(image, header_text[i], font_bytes=font_bytes, font_path=font_path)
  File "/miniconda3/lib/python3.10/site-packages/transformers/models/pix2struct/image_processing_pix2struct.py", line 169, in render_header
    header_image = render_text(header, **kwargs)
  File "/miniconda3/lib/python3.10/site-packages/transformers/models/pix2struct/image_processing_pix2struct.py", line 128, in render_text
    font = hf_hub_download(DEFAULT_FONT_PATH, "Arial.TTF")
  File "/miniconda3/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
  File "/miniconda3/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/miniconda3/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1240, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/miniconda3/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1347, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/miniconda3/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1857, in _raise_on_head_call_error
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.
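2. Error analysis
While running, the code needs to download "/ybelkada/fonts/resolve/main/Arial.TTF" from huggingface. I run the project on a server that cannot reach huggingface, so the connection times out and the error is raised.
The line that actually triggers the error is: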
File "/miniconda3/lib/python3.10/site-packages/transformers/models/pix2struct/image_processing_pix2struct.py", line 128, in render_text
    font = hf_hub_download(DEFAULT_FONT_PATH, "Arial.TTF")
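To confirm that the server really cannot open a connection to huggingface.co (rather than the problem being elsewhere), a quick reachability check can help. This is only a minimal sketch using the standard library; the function name `can_reach` is made up for illustration, and the 10-second default mirrors the `connect timeout=10` in the error message.

```python
import socket

def can_reach(host, port=443, timeout=10.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS lookup failures.
        return False
```

On a server like the one described here, `can_reach("huggingface.co")` would be expected to return False after the timeout.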
3. Solution
Opening the file at that location (image_processing_pix2struct.py) shows the following logic:
if font_bytes is not None and font_path is None:
    font = io.BytesIO(font_bytes)
elif font_path is not None:
    font = font_path
else:
    font = hf_hub_download(DEFAULT_FONT_PATH, "Arial.TTF")
font = ImageFont.truetype(font, encoding="UTF-8", size=text_size)
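So the root cause is that font_path is None (and no font_bytes is supplied either), which sends execution into the else branch and triggers the download.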
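The branch order is easier to see in isolation. The sketch below mirrors the three branches without touching PIL or the network; `choose_font_source` is a hypothetical helper written for this post, not part of transformers, and the "hub" result simply stands in for the `hf_hub_download` call.

```python
import io

def choose_font_source(font_bytes=None, font_path=None):
    """Mirror the branch order of the snippet above and report which source is used."""
    if font_bytes is not None and font_path is None:
        return "bytes", io.BytesIO(font_bytes)
    elif font_path is not None:
        return "path", font_path
    else:
        # In the library code this branch calls hf_hub_download, which needs network access.
        return "hub", None
```

With neither argument supplied the helper lands in the "hub" branch, which is the situation in the traceback. Supplying a font_path always avoids that branch, even when font_bytes is given as well.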
Searching upward through the call stack, I found where font_path is assigned:
File "/miniconda3/lib/python3.10/site-packages/transformers/models/pix2struct/image_processing_pix2struct.py", line 438, in <listcomp>
    render_header(image, header_text[i], font_bytes=font_bytes, font_path=font_path)
font_path = kwargs.pop("font_path", None)
if isinstance(header_text, str):
    header_text = [header_text] * len(images)
images = [
    render_header(image, header_text[i], font_bytes=font_bytes, font_path=font_path)
    for i, image in enumerate(images)
]
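However, printing kwargs shows an empty dict, so font_path cannot be passed in by editing config.json. In the end I modified the library code in place. Arial.ttf has to be downloaded from huggingface separately and then uploaded to the server.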
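For the font file itself, one option is to fetch it on a machine that can reach huggingface.co and then copy it to the server. The following is a rough sketch using only the standard library. The URL is assembled from the host and path shown in the error message; the function name `download_font` and the destination directory are placeholders of mine.

```python
from pathlib import Path
from urllib.request import urlopen

# Host and path taken from the failing request in the error message.
FONT_URL = "https://huggingface.co/ybelkada/fonts/resolve/main/Arial.TTF"

def download_font(dest_dir, url=FONT_URL, timeout=30):
    """Download the file at `url` into `dest_dir`, keeping its file name, and return the path."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url, timeout=timeout) as resp:
        dest.write_bytes(resp.read())
    return dest
```

Calling `download_font("fonts")` on the connected machine should leave the file at `fonts/Arial.TTF`, ready to be uploaded to the server.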
The modified code in preprocess:
font_path = kwargs.pop("font_path", None)
# Fall back to a local copy of the font so that no download is attempted.
if font_path is None:
    font_path = "YOUR_Arial.ttf_PATH"
if isinstance(header_text, str):
    header_text = [header_text] * len(images)
images = [
    render_header(image, header_text[i], font_bytes=font_bytes, font_path=font_path)
    for i, image in enumerate(images)
]
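A hard-coded path works, but it fails with a less obvious error if the file is missing or the environment is moved. As a possible refinement, the fallback could read the path from an environment variable and check that the file exists. This is only a sketch: `resolve_font_path` and the variable name `PIX2STRUCT_FONT_PATH` are invented for this post and are not recognized by transformers.

```python
import os

def resolve_font_path(kwargs, env_var="PIX2STRUCT_FONT_PATH"):
    """Pop font_path from kwargs, fall back to an environment variable, and validate it."""
    font_path = kwargs.pop("font_path", None)
    if font_path is None:
        # Hypothetical variable name; set it to the uploaded Arial.ttf location.
        font_path = os.environ.get(env_var)
    if font_path is not None and not os.path.isfile(font_path):
        raise FileNotFoundError(f"Font file not found: {font_path}")
    return font_path
```

In the patched file, `font_path = resolve_font_path(kwargs)` would replace the first three lines of the modification. Since preprocess reads font_path from kwargs, passing `font_path=...` directly in the processor call may also work without editing site-packages, provided the installed version forwards extra keyword arguments to the image processor; I have not verified that here.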