Unable to apply BPE or convert BPE-encoded JSON to MIDI #137
Comments
Hi, this is due to an I/O format mismatch. Here is how you can fix it:
from miditok import TokSequence

out_path = "./test.json"
tokens = tokenizer(midi)  # list of TokSequence
tokenizer.save_tokens(tokens, out_path)
# load the JSON file back
tokens = tokenizer.load_tokens(out_path)  # dictionary
# tokens["ids"] is a list of lists of integers (first dim is track, second is tokens)
# Create the TokSequence from the ids of the first sequence in the dictionary
tokens = TokSequence(ids=tokens["ids"][0])
But now that I think of it, it would indeed be better if all of this could be handled directly within
|
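To make the format mismatch concrete, here is a minimal sketch that uses only the standard library. It assumes the saved file is a JSON object whose "ids" key holds one list of integers per track, as the comment above describes. The helper name `first_track_ids` and the sample ids are hypothetical, and a real tokens file may carry additional keys.

```python
import json
import tempfile
from pathlib import Path


def first_track_ids(json_path, track=0):
    """Return the flat list of token ids for one track of a saved tokens file.

    Assumes the file is a JSON object with an "ids" key shaped as
    [track][token]; this layout is an assumption taken from the thread.
    """
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    # data["ids"] is a list of lists, so one track must be selected explicitly.
    return list(data["ids"][track])


# Hypothetical sample data: two tracks with made-up token ids.
sample = {"ids": [[4, 17, 233, 9], [4, 18, 240]]}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    track0 = first_track_ids(path)

print(track0)  # [4, 17, 233, 9]
```

The flat list returned here is the shape that `TokSequence(ids=...)` is given in the fix above, whereas the raw dictionary is not.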
Thanks, but one more problem remains. Now I want to convert BPE-encoded JSON to MIDI, but I can't get it to work no matter what I try.
Error 1
Error 2
|
Could you provide the tokenizer config and tokens files? |
The BPE JSON is created by:
tokenizer.tokenize_midi_dataset(  # 2 velocity and 1 duration values
    midi_aug_paths,
    tokens_dir,
    # midi_valid,
)
MIDI-Unprocessed_SMF_02_R1_2004_01-05_ORIG_MID--AUDIO_02_R1_2004_05_Track05_wav.json
The tokenizer is created by:
augment_midi_dataset(
    dataset_dir,
    pitch_offsets=[-12, 12],
    velocity_offsets=[-4, 5],
    duration_offsets=[-0.5, 1],
    out_path=midi_aug_path,
)
tokenizer.learn_bpe(vocab_size=30000, files_paths=midi_aug_paths)
|
I/O format mismatch again.
bpe_tokens_seq = miditok.TokSequence(ids=bpe_tokens["ids"][0], ids_bpe_encoded=True)
bpe_tokens_midi = tokenizer.tokens_to_midi([bpe_tokens_seq])
Here the tokenizer has a
|
|
This should do it:
bpe_tokens_seq = miditok.TokSequence(ids=bpe_tokens["ids"][0], ids_bpe_encoded=True)
tokenizer.decode_bpe(bpe_tokens_seq)
tokenizer.complete_sequence(bpe_tokens_seq)  # Error 1 (show below)
bpe_tokens_midi = tokenizer.tokens_to_midi([bpe_tokens_seq])
Note for myself: automatic BPE decoding when preprocessing TokSequences in
|
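Why the decoding step matters can be illustrated with a toy example. This is not MidiTok's implementation, only a sketch of the general idea behind BPE: each learned id stands for a sequence of base ids and has to be expanded before the ids can be mapped back to musical events. The function name, merge table, vocabulary size, and ids below are all invented for illustration.

```python
def decode_bpe_ids(ids, merges, base_vocab_size):
    """Expand BPE ids back into base-vocabulary ids.

    `merges` maps each learned id (>= base_vocab_size) to the pair of ids it
    replaced. A pair may itself contain learned ids, so expansion repeats
    until only base ids remain.
    """
    out = []
    stack = list(reversed(ids))
    while stack:
        tok = stack.pop()
        if tok < base_vocab_size:
            out.append(tok)
        else:
            left, right = merges[tok]
            # Push right first so that left is popped and expanded first.
            stack.append(right)
            stack.append(left)
    return out


# Hypothetical toy setup: ids 0..9 are base tokens, 10 and 11 are learned merges.
BASE_VOCAB_SIZE = 10
MERGES = {
    10: (3, 4),   # id 10 replaces the pair (3, 4)
    11: (10, 5),  # id 11 replaces the pair (10, 5), i.e. (3, 4, 5)
}

encoded = [1, 11, 2, 10]
decoded = decode_bpe_ids(encoded, MERGES, BASE_VOCAB_SIZE)
print(decoded)  # [1, 3, 4, 5, 2, 3, 4]
```

In this toy setup, a converter that only knows the ten base ids cannot interpret ids 10 and 11, which is presumably the kind of failure that occurs when a BPE-encoded sequence is converted to MIDI without being decoded first.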
What should have worked out of the box:
bpe_tokens = tokenizer.load_tokens(
    "MIDI-Unprocessed_SMF_02_R1_2004_01-05_ORIG_MID--AUDIO_02_R1_2004_05_Track05_wav.json"
)
bpe_tokens_midi = tokenizer.tokens_to_midi(bpe_tokens["ids"])
This should be fixed in the PR referencing this issue.
|
Thanks, it's ok now |
The two I/O automations mentioned here have been merged, closing this!
|
When I try to convert a BPE-encoded JSON file to MIDI, I keep encountering errors.
I suspect there might be a bug in the BPE encoding/decoding code, so I ran the code provided in the documentation (https://miditok.readthedocs.io/en/latest/bpe.html), but exceptions are still thrown.
errors here:
Would you like me to provide a runnable code snippet to convert BPE-encoded JSON to MIDI?