
Error: AutoTokenizer.from_pretrained, UnboundLocalError: local variable 'sentencepiece_model_pb2' referenced before assignment #25848

Closed
duweidongzju opened this issue Aug 30, 2023 · 6 comments

Comments

@duweidongzju

System Info

  • transformers version: 4.32.1
  • Platform: Linux-3.10.0-1160.80.1.el7.x86_64-x86_64-with-glibc2.17
  • Python version: 3.8.17
  • Huggingface_hub version: 0.16.4
  • Safetensors version: 0.3.2
  • Accelerate version: 0.21.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

My code:

import torch
from transformers import AutoTokenizer

model_name_or_path = 'llama-2-7b-hf'
use_fast_tokenizer = False
padding_side = "left"
config_kwargs = {'trust_remote_code': True, 'cache_dir': None, 'revision': 'main', 'use_auth_token': None}
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=use_fast_tokenizer, padding_side=padding_side, **config_kwargs)

The error is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 727, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1854, in from_pretrained
    return cls._from_pretrained(
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2017, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/models/llama/tokenization_llama.py", line 156, in __init__
    self.sp_model = self.get_spm_processor()
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/models/llama/tokenization_llama.py", line 164, in get_spm_processor
    model_pb2 = import_protobuf()
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 40, in import_protobuf
    return sentencepiece_model_pb2
UnboundLocalError: local variable 'sentencepiece_model_pb2' referenced before assignment

Expected behavior

What do I need to do to solve this problem?

@duweidongzju
Author

I have solved this problem via pip install google and pip install protobuf
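
For reference, a minimal sketch of how one might verify that workaround. The model path "llama-2-7b-hf" is just the local checkpoint name used in this issue, and the assumption (based on the traceback above) is that protobuf is the package the slow tokenizer path actually needs:

try:
    import google.protobuf  # provided by `pip install protobuf`
except ImportError as exc:
    raise SystemExit("protobuf is missing; run `pip install protobuf` first") from exc

from transformers import AutoTokenizer

# With protobuf installed, the slow tokenizer should load instead of raising
# the UnboundLocalError from import_protobuf().
tokenizer = AutoTokenizer.from_pretrained("llama-2-7b-hf", use_fast=False)
print(type(tokenizer).__name__)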

@ydshieh
Collaborator

ydshieh commented Aug 30, 2023

@ArthurZucker It looks like get_spm_processor is recent, and especially the import of protobuf inside it. If this import and usage of protobuf is necessary, could you add require_backend so that users get a more precise error message when it is not installed? Thanks!
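
For illustration, a sketch of the kind of early check being requested here. This is not the actual transformers implementation; is_protobuf_available does live in transformers.utils, but the sentencepiece_model_pb2 import below is only a placeholder for whatever the library really loads:

from transformers.utils import is_protobuf_available


def import_protobuf_checked():
    # Fail fast with an actionable message instead of hitting an
    # UnboundLocalError deep inside convert_slow_tokenizer.py.
    if not is_protobuf_available():
        raise ImportError(
            "Loading this slow tokenizer requires the protobuf library; "
            "install it with `pip install protobuf`."
        )
    # Placeholder for the import transformers performs internally.
    from transformers.utils import sentencepiece_model_pb2
    return sentencepiece_model_pb2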

@polm-stability

I also ran into this - probably related to #25224?

@ydshieh
Collaborator

ydshieh commented Aug 30, 2023

@polm-stability While waiting for @ArthurZucker to answer, you can follow this comment

@ArthurZucker
Collaborator

Pretty sure #25684 was merged in the latest version, but if you are using a legacy = False sentencepiece tokenizer you need protobuf
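
Reading that comment the other way around (an untested assumption, not something confirmed in this thread): if installing protobuf is not an option, loading the slow tokenizer with legacy=True might avoid the protobuf-based code path entirely:

from transformers import AutoTokenizer

# Untested assumption based on the comment above: legacy=True keeps the older
# sentencepiece loading behaviour, which should not call import_protobuf().
tokenizer = AutoTokenizer.from_pretrained("llama-2-7b-hf", use_fast=False, legacy=True)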

@ArthurZucker
Collaborator

Closing as #25684 fixes the issue 😉
