Document upload to the Dify knowledge base does not support Chinese word segmentation #15056
viphonestsong
started this conversation in
Show and tell
Version: Dify-1.0.0
If the knowledge base uses full-text or hybrid indexing for retrieval, Chinese content is not retrieved, so the recall rate is abnormal. Full-text retrieval returns no content at all, and hybrid retrieval results are degraded as well.
Workaround: for now I pre-process the knowledge base articles (segmenting them into words with a word segmenter first) and then upload them to the knowledge base. The test results are much better.
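The pre-processing step can be sketched roughly as follows. This is only an illustration, not the exact script used here: it swaps in a simple forward-maximum-matching segmenter with a tiny hypothetical vocabulary (`TOY_VOCAB`) so that it runs without extra dependencies, and it assumes the full-text index splits text on whitespace. The function names `segment_run` and `presegment` are made up for this example. In practice a real segmenter with a full dictionary would take the place of `segment_run`.

```python
import re

# Hypothetical toy vocabulary; a real segmenter ships a much larger dictionary.
TOY_VOCAB = {"知识库", "全文", "检索", "中文", "分词", "文档", "上传", "支持"}

# Matches runs of common CJK ideographs; punctuation and Latin text are left alone.
_CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def segment_run(run, vocab=TOY_VOCAB, max_len=4):
    """Forward maximum matching: take the longest vocabulary word at each position."""
    tokens = []
    i = 0
    while i < len(run):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(run) - i), 0, -1):
            piece = run[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def presegment(text, vocab=TOY_VOCAB):
    """Put spaces between words in Chinese runs so a whitespace tokenizer can index them."""
    spaced = _CJK_RUN.sub(
        lambda m: " " + " ".join(segment_run(m.group(), vocab)) + " ", text
    )
    # Collapse the extra spaces introduced around each run; newlines are kept.
    return re.sub(r"[ \t]+", " ", spaced).strip()


if __name__ == "__main__":
    print(presegment("Dify知识库上传文档"))
```

The segmented text is what gets uploaded instead of the raw article. Characters that are not in the vocabulary fall back to single-character tokens, so the quality of the result depends heavily on the dictionary behind the segmenter.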
Current plan: I'm going to study the source code today. I'd like Dify to call a configured word segmenter during upload. If anyone else is interested in this, please leave a comment and share your experience.