Bart model converted ONNX inference #14222
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hey @ZiyueWangUoB, by default the export only gives you the base model's hidden states (the default feature shown below). The seq2seq-lm topology is not currently supported for BART, but will be once #14358 is merged. This will allow you to run:

python -m transformers.onnx --model=facebook/bart-large-cnn --features=seq2seq-lm onnx/

which will produce an ONNX model whose outputs include the logits needed for generation. FYI, you can find the model's output names from the ONNX config, e.g.

from transformers import AutoConfig
from transformers.models.bart import BartOnnxConfig

model_ckpt = "facebook/bart-large-cnn"
config = AutoConfig.from_pretrained(model_ckpt)
onnx_config = BartOnnxConfig(config, task="default")
onnx_config.outputs
# OrderedDict([('last_hidden_state', {0: 'batch', 1: 'sequence'}),
#              ('encoder_last_hidden_state', {0: 'batch', 1: 'sequence'})])
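
Once the seq2seq-lm export is available, a minimal sketch of a single decoder step with onnxruntime could look like the following. The input names ("input_ids", "decoder_input_ids", "decoder_attention_mask") and the use of tokenizer.eos_token_id as the decoder start token are assumptions; check them against onnx_config.inputs / onnx_config.outputs or sess.get_inputs(). A full summary still needs a greedy or beam-search loop around this single step.

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
sess = ort.InferenceSession("onnx/model.onnx")

enc = tokenizer("Text to summarize goes here.", return_tensors="np")
# BART starts decoding from </s>; feed a single start token and read the next-token logits.
decoder_input_ids = np.array([[tokenizer.eos_token_id]], dtype=np.int64)

outputs = sess.run(
    None,
    {
        "input_ids": enc["input_ids"].astype(np.int64),
        "attention_mask": enc["attention_mask"].astype(np.int64),
        "decoder_input_ids": decoder_input_ids,
        "decoder_attention_mask": np.ones_like(decoder_input_ids),
    },
)
logits = outputs[0]  # shape: (batch, decoder_sequence, vocab_size)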
If I wish to use a distilbart model, could I use the linked example directly for beam search? Also, the linked issue #14358 has been merged, and I tried using the …
For anyone in my position: I still have not tried this, but will give an update here when I have!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hey @lewtun
Hello @lewtun, I am trying the same scenario. The example guide URL for beam_search is returning 404 (https://github.com/huggingface/transformers/tree/master/examples/onnx/pytorch/summarization). Can you please post the latest URL?
Hey @sorenmc, if you have tried this approach, could you please attach a code snippet here? It would be a great help.
Hi @mohanvamsibu-kore, the summarization example was moved here: https://github.com/huggingface/transformers/tree/master/examples/research_projects/onnx/summarization
Hi @TonyMas, thank you. I have implemented summarization with the model "lidiya/bart-large-xsum-samsum". The ONNX model itself is extremely fast, but beam_search is very slow and takes the major chunk of the time (~9 seconds) on CPU. I tried greedy search as well, which takes ~3-4 seconds, so …
@TonyMas, can you please help me with the above concerns? I have also tried the example provided under https://github.com/huggingface/transformers/tree/master/examples/research_projects/onnx/summarization. It took ~10 seconds on GPU for an input of ~1000 characters. Please let me know if I can reduce the time.
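
Not an ONNX-side fix, but since almost all of the time goes into the autoregressive decoding loop, the generation settings matter a lot. A small sketch with plain transformers (model name taken from the message above; the exact values are only illustrative) showing the knobs that usually dominate CPU latency:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "lidiya/bart-large-xsum-samsum"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

dialogue = "A long dialogue to summarize ..."
inputs = tokenizer(dialogue, return_tensors="pt", truncation=True)

# Fewer beams and a tighter length budget mean fewer decoder forward passes.
summary_ids = model.generate(
    **inputs,
    num_beams=2,        # try 1 (greedy) up to 4 and measure quality vs. latency
    max_length=64,      # cap the summary length
    early_stopping=True,
)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))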
Hey @mohanvamsibu-kore, I am also interested in exporting …
Sorry, I have been on vacation and sadly have not had the time.
I have been testing the Bart + Beam Search to ONNX example, but it seems that the attention_mask layer is fixed to the sample input used when exporting the model. Setting it up like the input_ids in the dynamic_axes fixes the issue.
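
For anyone hitting the same fixed-shape problem, here is a rough, generic sketch of an export with attention_mask declared dynamic alongside input_ids. This is not the exact code of the linked example; the wrapper class, file name, and axis names are illustrative.

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

class BartLogits(torch.nn.Module):
    """Thin wrapper so the export returns a plain logits tensor."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids, attention_mask):
        return self.model(input_ids=input_ids, attention_mask=attention_mask).logits

name = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(name)
wrapper = BartLogits(AutoModelForSeq2SeqLM.from_pretrained(name).eval())
sample = tokenizer("a sample sentence", return_tensors="pt")

torch.onnx.export(
    wrapper,
    (sample["input_ids"], sample["attention_mask"]),
    "bart.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        # without this entry the mask shape stays frozen to the sample input
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch", 1: "sequence"},
    },
    opset_version=13,
)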
Hey @jspablo, we're currently discussing internally the best approach for supporting text generation and other inference tasks within …
Any update?
Found a fix for this yet?
This basically allows you to do the inference with ONNX Runtime, while still using the generate() method from transformers:

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-xsum")

# instead of: `model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-xsum")`
# the argument `from_transformers=True` handles the ONNX export on the fly.
model = ORTModelForSeq2SeqLM.from_pretrained("facebook/bart-large-xsum", from_transformers=True, use_cache=True)

to_summarize = "The Bart model was proposed in BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer on 29 Oct, 2019."

inputs = tokenizer(to_summarize, return_tensors="pt")
gen_tokens = model.generate(**inputs)
outputs = tokenizer.batch_decode(gen_tokens)
print(outputs)
# prints: ['</s>A new model for training artificial intelligence systems has been proposed by a group of researchers at the University of Oxford.</s>']

Alternatively, you can export the model offline and load it later:
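
The offline variant mentioned above could look roughly like this (the bart_onnx/ path is just a placeholder): export and save the ONNX model once, then load it from disk on later runs without re-exporting.

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# one-time export and save (recent optimum versions also offer the CLI:
# optimum-cli export onnx --model facebook/bart-large-xsum bart_onnx/)
model = ORTModelForSeq2SeqLM.from_pretrained("facebook/bart-large-xsum", from_transformers=True, use_cache=True)
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-xsum")
model.save_pretrained("bart_onnx/")
tokenizer.save_pretrained("bart_onnx/")

# later runs: load the already-exported model directly from disk
model = ORTModelForSeq2SeqLM.from_pretrained("bart_onnx/")
tokenizer = AutoTokenizer.from_pretrained("bart_onnx/")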
I thought the main selling point of using ONNX is speed, but the inference using ORTModelForSeq2SeqLM: …
Could you give me your transformers and optimum versions? There is a critical bug if you use transformers==4.26 and optimum==1.6.3; it has been fixed in the 1.6.4 release. If you would like to open an issue in the Optimum repo with a reproducible script, I can have a look from there!
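
To check which versions are installed, something like this works:

import importlib.metadata as metadata

# the combination transformers==4.26 with optimum==1.6.3 hits the bug mentioned above
print(metadata.version("transformers"))
print(metadata.version("optimum"))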
Hi, I followed the instructions here (https://github.com/huggingface/transformers/blob/master/docs/source/serialization.rst) to convert the BART-LARGE-CNN model to ONNX using the transformers.onnx script. The model exported fine and I can run inference.
However, the results of the inference, from the 'last_hidden_state', appear to be logits (I think)? How can I parse this output for summarization purposes?
Here are screenshots of what I've done, and the resulting output from those two states:
[screenshots not preserved]