
Error: AutoTokenizer.from_pretrained, UnboundLocalError: local variable 'sentencepiece_model_pb2' referenced before assignment #25848

Closed
duweidongzju opened this issue Aug 30, 2023 · 6 comments

Comments

@duweidongzju

System Info

  • transformers version: 4.32.1
  • Platform: Linux-3.10.0-1160.80.1.el7.x86_64-x86_64-with-glibc2.17
  • Python version: 3.8.17
  • Huggingface_hub version: 0.16.4
  • Safetensors version: 0.3.2
  • Accelerate version: 0.21.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

My code:

import torch
from transformers import AutoTokenizer

model_name_or_path = 'llama-2-7b-hf'
use_fast_tokenizer = False
padding_side = "left"
config_kwargs = {'trust_remote_code': True, 'cache_dir': None, 'revision': 'main', 'use_auth_token': None}
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=use_fast_tokenizer, padding_side=padding_side, **config_kwargs)

The error is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 727, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1854, in from_pretrained
    return cls._from_pretrained(
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2017, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/models/llama/tokenization_llama.py", line 156, in __init__
    self.sp_model = self.get_spm_processor()
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/models/llama/tokenization_llama.py", line 164, in get_spm_processor
    model_pb2 = import_protobuf()
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 40, in import_protobuf
    return sentencepiece_model_pb2
UnboundLocalError: local variable 'sentencepiece_model_pb2' referenced before assignment

Expected behavior

What do I need to do to solve this problem?

@duweidongzju
Author

I have solved this problem via pip install google and pip install protobuf
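
For reference, a minimal sketch of how one might verify that workaround. The model path "llama-2-7b-hf" is just the local checkpoint name used in this issue, and the assumption (based on the traceback above) is that protobuf is the package the slow tokenizer path actually needs:

try:
    import google.protobuf  # provided by `pip install protobuf`
except ImportError as exc:
    raise SystemExit("protobuf is missing; run `pip install protobuf` first") from exc

from transformers import AutoTokenizer

# With protobuf installed, the slow tokenizer should load instead of raising
# the UnboundLocalError from import_protobuf().
tokenizer = AutoTokenizer.from_pretrained("llama-2-7b-hf", use_fast=False)
print(type(tokenizer).__name__)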

@ydshieh
Collaborator

ydshieh commented Aug 30, 2023

@ArthurZucker It looks like get_spm_processor is recent, and especially the import of protobuf inside it. If this import and usage of protobuf is necessary, could you add require_backend so that users get a more precise error message when it is not installed? Thanks!
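
For illustration, a sketch of the kind of early check being requested here. This is not the actual transformers implementation; is_protobuf_available does live in transformers.utils, but the sentencepiece_model_pb2 import below is only a placeholder for whatever the library really loads:

from transformers.utils import is_protobuf_available


def import_protobuf_checked():
    # Fail fast with an actionable message instead of hitting an
    # UnboundLocalError deep inside convert_slow_tokenizer.py.
    if not is_protobuf_available():
        raise ImportError(
            "Loading this slow tokenizer requires the protobuf library; "
            "install it with `pip install protobuf`."
        )
    # Placeholder for the import transformers performs internally.
    from transformers.utils import sentencepiece_model_pb2
    return sentencepiece_model_pb2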

@polm-stability

I also ran into this - probably related to #25224?

@ydshieh
Collaborator

ydshieh commented Aug 30, 2023

@polm-stability While waiting for @ArthurZucker to answer, you can follow this comment

@ArthurZucker
Collaborator

Pretty sure #25684 was merged in the latest version, but if you are using a legacy = False sentencepiece tokenizer you need protobuf
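
Reading that comment the other way around (an untested assumption, not something confirmed in this thread): if installing protobuf is not an option, loading the slow tokenizer with legacy=True might avoid the protobuf-based code path entirely:

from transformers import AutoTokenizer

# Untested assumption based on the comment above: legacy=True keeps the older
# sentencepiece loading behaviour, which should not call import_protobuf().
tokenizer = AutoTokenizer.from_pretrained("llama-2-7b-hf", use_fast=False, legacy=True)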

@ArthurZucker
Collaborator

Closing as #25684 fixes the issue 😉
