Confusing Error Message (I think). #290
Comments
Idk what happened to the formatting...
Can you check all your prompts have non-empty output? I recall this
@NanoCode012 One of them I think is empty. Is there a way to just drop columns with empty things?
@ashercn97 , unfortunately not with axolotl. I think you can simply load it into pandas and drop it.
@NanoCode012 Okay. I will try to do that now. Thanks so much!
My thing is fixed!
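For anyone who lands here later: a minimal sketch of the pandas clean-up @NanoCode012 suggested, assuming a JSON-lines alpaca-style dataset with an `output` column (the filenames here are placeholders, not from the thread):

```python
import pandas as pd

# Load the alpaca-style dataset (path and format are assumptions; adjust to your file).
df = pd.read_json("dataset.jsonl", lines=True)

# Keep only rows whose "output" field is present and non-empty; an empty
# output is what ends up reaching the tokenizer with nothing to tokenize.
df = df[df["output"].notna() & (df["output"].str.strip() != "")]

# Write the cleaned dataset back out for axolotl to load.
df.to_json("dataset_clean.jsonl", orient="records", lines=True)
```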
I am getting this error:

```
[2023-07-18 01:04:39,361] [WARNING] [axolotl.validate_config:16] [PID:640] batch_size is not recommended. Please use gradient_accumulation_steps instead.
To calculate the equivalent gradient_accumulation_steps, divide batch_size / micro_batch_size / number of gpus.
[2023-07-18 01:04:39,362] [INFO] [axolotl.scripts.train:219] [PID:640] loading tokenizer... openlm-research/open_llama_3b
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at huggingface/transformers#24565
[2023-07-18 01:04:39,438] [DEBUG] [axolotl.load_tokenizer:55] [PID:640] EOS: 2 / </s>
[2023-07-18 01:04:39,438] [DEBUG] [axolotl.load_tokenizer:56] [PID:640] BOS: 1 / <s>
[2023-07-18 01:04:39,438] [DEBUG] [axolotl.load_tokenizer:57] [PID:640] PAD: None / None
[2023-07-18 01:04:39,438] [DEBUG] [axolotl.load_tokenizer:58] [PID:640] UNK: 0 / <unk>
[2023-07-18 01:04:39,438] [INFO] [axolotl.load_tokenized_prepared_datasets:82] [PID:640] Unable to find prepared dataset in last_run_prepared/ba96bd8ae0099721227d1c8a23d6d7a4
[2023-07-18 01:04:39,438] [INFO] [axolotl.load_tokenized_prepared_datasets:83] [PID:640] Loading raw datasets...
[2023-07-18 01:04:39,438] [INFO] [axolotl.load_tokenized_prepared_datasets:88] [PID:640] No seed provided, using default seed of 42
100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 320.57it/s]
[2023-07-18 01:04:39,971] [INFO] [axolotl.load_tokenized_prepared_datasets:264] [PID:640] tokenizing, merging, and shuffling master dataset
Traceback (most recent call last):
File "/home/studio-lab-user/axolotl/scripts/finetune.py", line 356, in
fire.Fire(train)
File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/studio-lab-user/axolotl/scripts/finetune.py", line 226, in train
train_dataset, eval_dataset = load_prepare_datasets(
File "/home/studio-lab-user/axolotl/src/axolotl/utils/data.py", line 393, in load_prepare_datasets
dataset = load_tokenized_prepared_datasets(
File "/home/studio-lab-user/axolotl/src/axolotl/utils/data.py", line 268, in load_tokenized_prepared_datasets
samples = samples + list(d)
File "/home/studio-lab-user/axolotl/src/axolotl/datasets.py", line 42, in iter
yield self.prompt_tokenizer.tokenize_prompt(example)
File "/home/studio-lab-user/axolotl/src/axolotl/prompt_tokenizers.py", line 116, in tokenize_prompt
tokenized_res_prompt = self._tokenize(
File "/home/studio-lab-user/axolotl/src/axolotl/prompt_tokenizers.py", line 64, in _tokenize
result = self.tokenizer(
File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2571, in call
raise ValueError("You need to specify either `text` or `text_target`.")
ValueError: You need to specify either `text` or `text_target`.

Traceback (most recent call last):
File "/home/studio-lab-user/.conda/envs/python39/bin/accelerate", line 8, in
sys.exit(main())
File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/accelerate/commands/launch.py", line 979, in launch_command
simple_launcher(args)
File "/home/studio-lab-user/.conda/envs/python39/lib/python3.9/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/studio-lab-user/.conda/envs/python39/bin/python3.9', 'scripts/finetune.py', 'config/OpenOrcaTest.yml']' returned non-zero exit status 1.
```
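For context, that ValueError is simply what transformers raises when its tokenizer is called with nothing to tokenize, which is plausibly what a missing output field in the dataset turned into here. A minimal sketch reproducing it (tokenizer name taken from the log):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("openlm-research/open_llama_3b")

tok("hello")  # works: returns input_ids for the string
tok(None)     # raises ValueError: You need to specify either `text` or `text_target`.
```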
My config settings are:
```yaml
base_model: openlm-research/open_llama_3b
base_model_config: openlm-research/open_llama_3b
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: true
load_in_4bit: false
strict: false
push_dataset_to_hub:
datasets:
  - type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.02
adapter: lora
lora_model_dir:
sequence_len: 256
max_packed_sequence_len:
lora_r: 8
lora_alpha: 16
lora_dropout: 0.0
lora_target_modules:
lora_fan_in_fan_out:
wandb_project:
wandb_watch:
wandb_run_id:
wandb_log_model:
output_dir: ./openorca-out
batch_size: 16
micro_batch_size: 4
num_epochs: 3
optimizer: adamw_bnb_8bit
torchdistx_path:
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: false
fp16: true
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention: true
flash_attention:
gptq_groupsize:
gptq_model_v1:
warmup_steps: 10
eval_steps: 50
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
```
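Unrelated to the crash, the warning at the top of the log also has a direct fix: with batch_size: 16, micro_batch_size: 4, and a single GPU, the formula in the warning gives gradient_accumulation_steps = 16 / 4 / 1 = 4, so the config could instead say (assuming one GPU):

```yaml
# Instead of batch_size: 16 (assumes a single GPU)
micro_batch_size: 4
gradient_accumulation_steps: 4  # 16 / 4 / 1
```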