### System Info

`transformers` version: 4.32.0.dev0

### Who can help?

@ArthurZucker @sanchit-gandhi

### Reproduction

Ensure `protobuf` is uninstalled:

```
pip uninstall protobuf
```

Then load the `T5Tokenizer`:

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
```

Traceback:
```
UnboundLocalError                         Traceback (most recent call last)
Cell In[2], line 1
----> 1 tokenizer = T5Tokenizer.from_pretrained("t5-base")

File ~/transformers/src/transformers/tokenization_utils_base.py:1854, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, *init_inputs, **kwargs)
   1851 else:
   1852     logger.info(f"loading file {file_path} from cache at {resolved_vocab_files[file_id]}")
-> 1854 return cls._from_pretrained(
   1855     resolved_vocab_files,
   1856     pretrained_model_name_or_path,
   1857     init_configuration,
   1858     *init_inputs,
   1859     token=token,
   1860     cache_dir=cache_dir,
   1861     local_files_only=local_files_only,
   1862     _commit_hash=commit_hash,
   1863     _is_local=is_local,
   1864     **kwargs,
   1865 )

File ~/transformers/src/transformers/tokenization_utils_base.py:2017, in PreTrainedTokenizerBase._from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, token, cache_dir, local_files_only, _commit_hash, _is_local, *init_inputs, **kwargs)
   2015 # Instantiate tokenizer.
   2016 try:
-> 2017     tokenizer = cls(*init_inputs, **init_kwargs)
   2018 except OSError:
   2019     raise OSError(
   2020         "Unable to load vocabulary from file. "
   2021         "Please check that the provided vocabulary is accessible and not corrupted."
   2022     )

File ~/transformers/src/transformers/models/t5/tokenization_t5.py:194, in T5Tokenizer.__init__(self, vocab_file, eos_token, unk_token, pad_token, extra_ids, additional_special_tokens, sp_model_kwargs, legacy, **kwargs)
    191 self.vocab_file = vocab_file
    192 self._extra_ids = extra_ids
--> 194 self.sp_model = self.get_spm_processor()

File ~/transformers/src/transformers/models/t5/tokenization_t5.py:200, in T5Tokenizer.get_spm_processor(self)
    198 with open(self.vocab_file, "rb") as f:
    199     sp_model = f.read()
--> 200 model_pb2 = import_protobuf()
    201 model = model_pb2.ModelProto.FromString(sp_model)
    202 if not self.legacy:

File ~/transformers/src/transformers/convert_slow_tokenizer.py:40, in import_protobuf()
     38 else:
     39     from transformers.utils import sentencepiece_model_pb2_new as sentencepiece_model_pb2
---> 40 return sentencepiece_model_pb2

UnboundLocalError: local variable 'sentencepiece_model_pb2' referenced before assignment
```
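To confirm an environment matches this failure mode, you can check protobuf availability the same way `transformers` does internally (a minimal sketch; `is_protobuf_available` is exported from `transformers.utils` in recent versions):

```python
from transformers.utils import is_protobuf_available

# The slow T5 tokenizer path assumes this returns True; after
# `pip uninstall protobuf` it returns False, and import_protobuf
# falls through to the unbound name shown in the traceback.
print(is_protobuf_available())  # False in the failing environment
```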
This is occurring because we call `import_protobuf` in the `__init__`:

`src/transformers/models/t5/tokenization_t5.py`, line 200 in 85cf90a
But `import_protobuf` is ill-defined in the case that `protobuf` is not available:

`src/transformers/convert_slow_tokenizer.py`, lines 32 to 40 in cb8e3ee
=> if `protobuf` is not installed, then `sentencepiece_model_pb2` will be undefined.
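For context, the function at those lines has roughly this shape (paraphrased from the traceback and surrounding references, so details may differ from the actual source at that commit):

```python
# Paraphrased sketch; not the verbatim source of convert_slow_tokenizer.py.
from packaging import version

from transformers.utils import is_protobuf_available


def import_protobuf():
    if is_protobuf_available():
        import google.protobuf

        if version.parse(google.protobuf.__version__) < version.parse("4.0.0"):
            from transformers.utils import sentencepiece_model_pb2
        else:
            from transformers.utils import sentencepiece_model_pb2_new as sentencepiece_model_pb2
    # If protobuf is unavailable, neither import branch runs, so the name
    # below was never bound -> UnboundLocalError at the return.
    return sentencepiece_model_pb2
```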
Has `protobuf` been made a soft dependency for `T5Tokenizer` inadvertently in #24622? Or can `sentencepiece_model_pb2` be defined without `protobuf`?
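One possible shape for a fix (a hedged sketch, not the actual patch from the linked PR) is to fail fast with an informative error when protobuf is missing, since the generated `sentencepiece_model_pb2` module itself requires the protobuf runtime to import:

```python
# Sketch only: the helper names mirror transformers' utilities, but this
# is an illustration, not the code from the linked PR.
from packaging import version

from transformers.utils import is_protobuf_available


def import_protobuf():
    if not is_protobuf_available():
        # Raise a clear, actionable error instead of an UnboundLocalError.
        raise ImportError(
            "protobuf is required to load this tokenizer. "
            "Please install it with `pip install protobuf`."
        )
    import google.protobuf

    if version.parse(google.protobuf.__version__) < version.parse("4.0.0"):
        from transformers.utils import sentencepiece_model_pb2
    else:
        from transformers.utils import sentencepiece_model_pb2_new as sentencepiece_model_pb2
    return sentencepiece_model_pb2
```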
### Expected behavior

Use `T5Tokenizer` without `protobuf`.
See the linked PR for a fix.