Confusing Error Message (I think). #290

Closed

ashercn97 opened this issue Jul 18, 2023 · 6 comments

Comments

@ashercn97

I am getting this error:

```
[2023-07-18 01:04:39,361] [WARNING] [axolotl.validate_config:16] [PID:640] batch_size is not recommended. Please use gradient_accumulation_steps instead.
To calculate the equivalent gradient_accumulation_steps, divide batch_size / micro_batch_size / number of gpus.
[2023-07-18 01:04:39,362] [INFO] [axolotl.scripts.train:219] [PID:640] loading tokenizer... openlm-research/open_llama_3b
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at huggingface/transformers#24565
[2023-07-18 01:04:39,438] [DEBUG] [axolotl.load_tokenizer:55] [PID:640] EOS: 2 / </s>
[2023-07-18 01:04:39,438] [DEBUG] [axolotl.load_tokenizer:56] [PID:640] BOS: 1 / <s>
[2023-07-18 01:04:39,438] [DEBUG] [axolotl.load_tokenizer:57] [PID:640] PAD: None / None
[2023-07-18 01:04:39,438] [DEBUG] [axolotl.load_tokenizer:58] [PID:640] UNK: 0 / <unk>
[2023-07-18 01:04:39,438] [INFO] [axolotl.load_tokenized_prepared_datasets:82] [PID:640] Unable to find prepared dataset in last_run_prepared/ba96bd8ae0099721227d1c8a23d6d7a4
[2023-07-18 01:04:39,438] [INFO] [axolotl.load_tokenized_prepared_datasets:83] [PID:640] Loading raw datasets...
[2023-07-18 01:04:39,438] [INFO] [axolotl.load_tokenized_prepared_datasets:88] [PID:640] No seed provided, using default seed of 42
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 320.57it/s]
[2023-07-18 01:04:39,971] [INFO] [axolotl.load_tokenized_prepared_datasets:264] [PID:640] tokenizing, merging, and shuffling master dataset
Traceback (most recent call last):
  File "/home/studio-lab-user/axolotl/scripts/finetune.py", line 356, in <module>
    fire.Fire(train)
  File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/studio-lab-user/axolotl/scripts/finetune.py", line 226, in train
    train_dataset, eval_dataset = load_prepare_datasets(
  File "/home/studio-lab-user/axolotl/src/axolotl/utils/data.py", line 393, in load_prepare_datasets
    dataset = load_tokenized_prepared_datasets(
  File "/home/studio-lab-user/axolotl/src/axolotl/utils/data.py", line 268, in load_tokenized_prepared_datasets
    samples = samples + list(d)
  File "/home/studio-lab-user/axolotl/src/axolotl/datasets.py", line 42, in __iter__
    yield self.prompt_tokenizer.tokenize_prompt(example)
  File "/home/studio-lab-user/axolotl/src/axolotl/prompt_tokenizers.py", line 116, in tokenize_prompt
    tokenized_res_prompt = self._tokenize(
  File "/home/studio-lab-user/axolotl/src/axolotl/prompt_tokenizers.py", line 64, in _tokenize
    result = self.tokenizer(
  File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2571, in __call__
    raise ValueError("You need to specify either text or text_target.")
ValueError: You need to specify either text or text_target.
Traceback (most recent call last):
  File "/home/studio-lab-user/.conda/envs/python39/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/accelerate/commands/launch.py", line 979, in launch_command
    simple_launcher(args)
  File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/studio-lab-user/.conda/envs/python39/bin/python3.9', 'scripts/finetune.py', 'config/OpenOrcaTest.yml']' returned non-zero exit status 1.
```

My config settings are:
```yaml
base_model: openlm-research/open_llama_3b
base_model_config: openlm-research/open_llama_3b
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: true
load_in_4bit: false
strict: false
push_dataset_to_hub:
datasets:
  - path: ashercn97/Testing
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.02
adapter: lora
lora_model_dir:
sequence_len: 256
max_packed_sequence_len:
lora_r: 8
lora_alpha: 16
lora_dropout: 0.0
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_fan_in_fan_out:
wandb_project:
wandb_watch:
wandb_run_id:
wandb_log_model:
output_dir: ./openorca-out
batch_size: 16
micro_batch_size: 4
num_epochs: 3
optimizer: adamw_bnb_8bit
torchdistx_path:
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: false
fp16: true
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention: true
flash_attention:
gptq_groupsize:
gptq_model_v1:
warmup_steps: 10
eval_steps: 50
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
```
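(Side note: the `batch_size` warning at the top of the log is separate from the crash. By the formula it prints, `batch_size: 16` with `micro_batch_size: 4` works out to `gradient_accumulation_steps: 4` assuming a single GPU, i.e. 16 / 4 / 1.)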
@ashercn97
Author

Idk what happened to the formatting...

@NanoCode012
Collaborator

Can you check that all your prompts have non-empty output?

I recall this `You need to specify either text or text_target` error could mean the tokenizer has nothing to tokenize.
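If it helps, here is a minimal sketch for spotting such rows, assuming the dataset is the `ashercn97/Testing` repo from your config and uses an alpaca-style `output` column (the field name is an assumption):

```python
from datasets import load_dataset

# Load the raw dataset and list rows whose "output" is empty or whitespace-only.
ds = load_dataset("ashercn97/Testing", split="train")
empty = [i for i, row in enumerate(ds) if not (row.get("output") or "").strip()]
print(f"{len(empty)} rows with empty output, e.g. indices {empty[:10]}")
```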

@ashercn97
Author

@NanoCode012 I think one of them is empty. Is there a way to just drop rows with empty values?

@NanoCode012
Collaborator

@ashercn97, unfortunately not with axolotl. I think you can simply load the data into pandas and drop those rows.
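For example, a rough sketch assuming your data is a JSON-lines file with an alpaca-style `output` column (the filenames are placeholders):

```python
import pandas as pd

# Load the dataset, drop rows whose "output" is missing or blank, write it back.
df = pd.read_json("testing.jsonl", lines=True)
before = len(df)
df = df[df["output"].fillna("").str.strip() != ""]
df.to_json("testing_clean.jsonl", orient="records", lines=True)
print(f"dropped {before - len(df)} empty rows")
```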

@ashercn97
Author

@NanoCode012 Okay. I will try to do that now. Thanks so much!

@ashercn97
Author

My thing is fixed!
