Clarification on aguvis-stage2 dataset versions #16

Open
korbinian-hoermann opened this issue Jan 18, 2025 · 0 comments

Hey! Congrats on the great work, and thank you for releasing the data!

Could you clarify the meaning of the different data files "xyz-l1", "xyz-l2", and "xyz-l3"?
It seems that all of them contain conversations for all images, but with different "reasoning" variations. For example, in guiact-web-multi, the image uid_record_07674_step_00.png appears as:

  • guiact-web-multi-l1.json
    Action: Click on the link labeled 'Judith Lauand: Brazilian 1922-2022' to explore more about her career and exhibitions.
    pyautogui.click(x=0.41, y=0.178)

  • guiact-web-multi-l2.json
    Thought: The goal is to gather information about Judith Lauand’s career, works, and exhibitions. The list provides various leads, each likely directing to detailed pages about specific aspects of her career. Selecting an option from the dropdown is essential to access more detailed information.
    Action: Click on the link labeled 'Judith Lauand: Brazilian 1922-2022' to explore more about her career and exhibitions.
    pyautogui.click(x=0.41, y=0.178)

  • guiact-web-multi-l3.json
    Observation: The screenshot shows a dropdown menu on MutualArt with search results for 'Judith Lauand'. There are multiple entries detailing her exhibitions and mentions, such as in São Paulo, London, and various art reviews.
    Thought: The goal is to gather information about Judith Lauand’s career, works, and exhibitions. The list provides various leads, each likely directing to detailed pages about specific aspects of her career. Selecting an option from the dropdown is essential to access more detailed information.
    Action: Click on the link labeled 'Judith Lauand: Brazilian 1922-2022' to explore more about her career and exhibitions.
    pyautogui.click(x=0.41, y=0.178)
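
To make the comparison above concrete, here is a minimal sketch that splits one response into its labeled parts and guesses which variant it resembles. It works only on the response text as quoted in this issue; the actual JSON layout of the files is not shown here, and `split_fields` and `infer_level` are hypothetical helper names, not part of the released code.

```python
import re

# Labels as they appear in the quoted examples (assumption: no other labels occur).
FIELD_RE = re.compile(r"^(Observation|Thought|Action):\s*(.*)$")


def split_fields(text):
    """Split a response into labeled fields plus the pyautogui code lines."""
    fields = {}
    code_lines = []
    current = None
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        match = FIELD_RE.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2)
        elif line.startswith("pyautogui."):
            code_lines.append(line)
            current = None
        elif current is not None:
            # Continuation of a wrapped field, e.g. a long Observation.
            fields[current] += " " + line
    fields["code"] = "\n".join(code_lines)
    return fields


def infer_level(fields):
    """Guess the variant from which labels are present (hypothetical rule)."""
    if "Observation" in fields:
        return "l3"
    if "Thought" in fields:
        return "l2"
    return "l1"


example = """Thought: The goal is to gather information about Judith Lauand's career.
Action: Click on the link labeled 'Judith Lauand: Brazilian 1922-2022'.
pyautogui.click(x=0.41, y=0.178)"""

parsed = split_fields(example)
print(infer_level(parsed), parsed["code"])
```

Under this rule, a response with only `Action:` is treated as l1, one that adds `Thought:` as l2, and one that also has `Observation:` as l3, which matches the three examples quoted above.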

Section C.1 (TRAINING EXAMPLE SCHEMA) of the paper describes the training data schema for stage 2 as:

<|im_start|>assistant<|recipient|>all
Observation: {Observation}
Thought: {Planning}
Low-level Instruction: {Low-level Instruction}
<|im_end|>
<|im_start|>assistant<|recipient|>os
Action: {pyautogui function}
<|diff_marker|>

This schema is closest to the *-l3 version, except that the *-l3 data does not contain the low-level instruction. What is the reason for this difference?
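
For illustration, the schema quoted from the paper can be filled in with a small helper, with the low-level instruction made optional so that both forms can be compared side by side. This is only a sketch: `render_stage2` is a hypothetical name, and the closing token of the second turn is passed in as a parameter (defaulting to `<|im_end|>` as an assumption) rather than taken from the paper.

```python
def render_stage2(observation, thought, action_code,
                  low_level_instruction=None, end_token="<|im_end|>"):
    """Fill the quoted stage-2 schema; omit the instruction line when it is None."""
    lines = [
        "<|im_start|>assistant<|recipient|>all",
        f"Observation: {observation}",
        f"Thought: {thought}",
    ]
    if low_level_instruction is not None:
        lines.append(f"Low-level Instruction: {low_level_instruction}")
    lines += [
        "<|im_end|>",
        "<|im_start|>assistant<|recipient|>os",
        f"Action: {action_code}",
        end_token,  # assumption: the closing token is not confirmed here
    ]
    return "\n".join(lines)


with_instruction = render_stage2(
    "A dropdown menu with search results.",
    "Selecting an entry gives more detail.",
    "pyautogui.click(x=0.41, y=0.178)",
    low_level_instruction="Click on the link labeled 'Judith Lauand: Brazilian 1922-2022'.",
)
without_instruction = render_stage2(
    "A dropdown menu with search results.",
    "Selecting an entry gives more detail.",
    "pyautogui.click(x=0.41, y=0.178)",
)
print(with_instruction)
```

The only difference between the two rendered strings is the `Low-level Instruction:` line, which is the field this question is about.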

Thank you!
