20240729 Large Model Evaluation
References:
MMBench: A ChatGPT-Based Comprehensive Multimodal Capability Evaluation System (bilibili)
https://en.wikipedia.org/wiki/Levenshtein_distance
CIDEr: https://zhuanlan.zhihu.com/p/698643372
GitHub - open-compass/opencompass: OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.