The Training Data Pipeline #10

lixu6-alt · 2024-10-30T06:06:08Z

Dear authors:

MoVA is a really impressive work! I am working on a similar idea of using the text instruction to guide the fusion of image tokens in MLLMs. However, I encountered an issue thesedays: the LLaVA-665K finutuning dataset contains a lot of multi-turn conversations which means one sample can involve multiple instructions . In this case, do we need to split each multi-turn conversation sample into multiple single-turn conversation samples (since we can only encode one text instruction for one sample in a forward computation)?

Thanks!

TempleX98 · 2024-11-17T07:21:08Z

During training, we keep the original data format and directly concatenate these multi-round questions into a single question for instruction-aware extraction.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The Training Data Pipeline #10

The Training Data Pipeline #10

lixu6-alt commented Oct 30, 2024

TempleX98 commented Nov 17, 2024

The Training Data Pipeline #10

The Training Data Pipeline #10

Comments

lixu6-alt commented Oct 30, 2024

TempleX98 commented Nov 17, 2024