Model multilingual-e5-small fails to start #583
Comments
@srikanthmanvi @pquentin @technige hey y'all, wanted to flag that this issue in eland is preventing users from using the E5 small model (the easiest one to use) in Elasticsearch. It would be amazing if the next Eland release fixed this issue so we can provide support for the E5 small model.
@serenachou thanks for the ping. We will prioritize this.
Please note that this applies only to the multilingual E5 model. The normal E5 models are unaffected.
@srikanthmanvi if this isn't already on your radar for 8.12, we would love this to be included in 8.12, or in any earlier version that you and @pquentin are cooking up. After 8.12 we would likely be looking to prepared models, so this work would be less effective as a way to encourage customers to use this model for multilingual use cases.
Discussed f2f, @davidkyle will have a look and I will support. |
Any reason to support
Taking a look at the code updates, it seems this fix should affect the
@ialdencoots thanks for reporting the problem, I have reproduced it myself. You can track the issue at elastic/elasticsearch#102541. The bug fix linked above only applies to the
At present, the model is processed and uploaded fine, but when starting the model, it fails:
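For reference, a minimal sketch of the import-and-start flow that hits this, adapted from the import example in the eland README. The Elasticsearch URL is a placeholder, and the exact `TransformerModel` constructor arguments vary a little between eland versions:

```python
from pathlib import Path

from elasticsearch import Elasticsearch
from eland.ml.pytorch import PyTorchModel
from eland.ml.pytorch.transformers import TransformerModel

# Trace the Hugging Face model into the TorchScript representation Elasticsearch uses.
tm = TransformerModel(model_id="intfloat/multilingual-e5-small", task_type="text_embedding")

tmp_path = "models"
Path(tmp_path).mkdir(parents=True, exist_ok=True)
model_path, config, vocab_path = tm.save(tmp_path)

# Upload the traced model and its config -- this part succeeds.
es = Elasticsearch("http://localhost:9200")  # placeholder connection details
ptm = PyTorchModel(es, tm.elasticsearch_model_id())
ptm.import_model(model_path=model_path, config_path=None, vocab_path=vocab_path, config=config)

# Starting the deployment is where the failure shows up.
es.ml.start_trained_model_deployment(model_id=tm.elasticsearch_model_id())
```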
The model is trained from Multilingual-MiniLM which is a BERT model, but uses the XLM-RoBERTa tokenizer. Since we wrap models based on their architecture and not on the tokenizer type, the BERT model expects input that isn't coming from the XLM-RoBERTa tokenizer. We should consider changing how we decide which wrapper to use (three inputs or two) based on the tokenizer instead.
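The mismatch is easy to see with the transformers library alone (a quick illustration; the model is downloaded from the Hugging Face Hub):

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "intfloat/multilingual-e5-small"

config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The architecture is BERT ...
print(config.model_type)            # "bert"

# ... but the tokenizer is XLM-RoBERTa and only produces two inputs,
# so the token_type_ids a BERT wrapper expects are never generated.
print(type(tokenizer).__name__)     # e.g. "XLMRobertaTokenizerFast"
print(tokenizer.model_input_names)  # ["input_ids", "attention_mask"]
print(tokenizer("query: hello world").keys())  # no "token_type_ids"
```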
Note that the base and large model variants work fine because they are XLM-RoBERTa models, and use the corresponding tokenizer.
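For comparison, a small check across the three variants (a sketch only; keying the wrapper choice on the tokenizer rather than the architecture would treat all three consistently):

```python
from transformers import AutoConfig, AutoTokenizer

for variant in ("small", "base", "large"):
    model_id = f"intfloat/multilingual-e5-{variant}"
    config = AutoConfig.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    print(model_id, config.model_type, tokenizer.model_input_names)

# Expected (roughly):
#   multilingual-e5-small  bert         ['input_ids', 'attention_mask']  <- mismatch
#   multilingual-e5-base   xlm-roberta  ['input_ids', 'attention_mask']  <- consistent
#   multilingual-e5-large  xlm-roberta  ['input_ids', 'attention_mask']  <- consistent
```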