Make user-submitted data available for new tasks #44
Hi, I would like to contribute to this task, but before that I have a couple of questions to ask:
Also, are there any other docs describing the algorithm, data schema, and interaction protocol you would like to implement, besides the READMEs and the high-level protocol architecture article on Notion? That would help a lot in understanding :)
hey thanks so much for being here :)
The idea (which might not yet be fully reflected in the code) is that the interactions of the system with the users result in tree structures. At the root is a prompt which a user has entered. We then send out this prompt as a task to multiple users, each of whom is asked to "play" the assistant and give a response.

Given those responses, we can do two things. First, we can have other people rank the responses, which will be useful for training the reward model. Second, we can take each response and form a thread from (initial prompt, assistant response), and give each of these threads as a task to multiple users. Each thread thus results in multiple replies from users (who are now "playing" the user). Again, we can rank those, and also combine each (initial prompt, assistant reply, user reply) into one thread to continue the tasks.

So, in my view, a thread is a chain of messages between user and assistant. We build threads by going down one path of the tree we collect as people fulfil their tasks. I hope that makes it a bit clearer and answers the second question: the result is always a linear conversation, but during data collection we collect a tree.
Not yet, no, but you're right that it's important we improve this!
Hi there, thank you for the explanation, it helped a lot :) I guess PostgreSQL is capable of recursive queries to walk over the posts of one thread using parent_id, but for now I'd suggest a much simpler way to sample a conversation:
Later it might also be useful to have a more customizable way to pick the last message of a conversation (the one we choose at step 3) instead of a random one. I also have some suggestions for the data model:
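The parent_id walk mentioned above can be sketched as follows. This is an assumed, simplified in-memory version of the posts table, not the project's schema; in PostgreSQL the same walk could be done with a `WITH RECURSIVE` query:

```python
# Illustrative sketch: reconstruct the linear conversation ending at a chosen
# post by following parent_id pointers up to the root, then reversing.
posts = {
    1: {"parent_id": None, "text": "prompt"},
    2: {"parent_id": 1, "text": "assistant reply A"},
    3: {"parent_id": 1, "text": "assistant reply B"},
    4: {"parent_id": 2, "text": "user follow-up"},
}

def conversation_ending_at(post_id: int) -> list[str]:
    chain = []
    while post_id is not None:
        post = posts[post_id]
        chain.append(post["text"])
        post_id = post["parent_id"]
    return list(reversed(chain))  # root first

print(conversation_ending_at(4))
# ['prompt', 'assistant reply A', 'user follow-up']
```

Walking upward from a sampled leaf is cheap because each post stores only one parent link; the branching structure never has to be materialized to extract a single conversation.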
As a bonus, I've also updated the sequence diagram of the interaction between the user and the backend. Now it should better reflect what happens at the time of a request:

sequenceDiagram
par Task Creation
User DMs / Channel ->> Frontend: Asks for a new task
Frontend ->> Backend: Requests new task
Backend ->> DB: Inserts new task and links it<br/>to the thread and the parent post
DB ->> Backend: Generates task_id
Backend ->> Frontend: Returns the task with type and payload
alt Frontend displays the task
Frontend ->> User DMs / Channel: Presents the task in a message and asks the user to interact
Frontend ->> Backend: Sends ACK with presented "task description message" ID
Backend ->> DB: Updates task with "task description message" ID<br/>and sets "ack"=True
else Frontend failed to present the task
Frontend ->> Backend: Sends NACK for a given task
Backend ->> DB: Updates the task with "ack"=False
end
end
par Task Fulfillment
User DMs / Channel ->> Frontend: Interacts with the post (reply, rating, etc.)
Frontend ->> Backend: Posts interaction along with "task description message" ID
Backend ->> DB: Updates the task with "done"=True
Backend ->> DB: Inserts new post with the interaction data<br/>and links it to the parent post using the data from the task
Backend ->> Frontend: Sends "Task Done"
Frontend ->> User DMs / Channel: Replies "thank you"
end
There may be an insignificant theoretical race condition when the user responds before the ACK has been delivered, but I think the backend should back off this request and the frontend should handle it.
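The backoff handling for the race described above could look roughly like this. It is a minimal sketch under assumed names (`post_interaction`, `send_with_backoff`, and the in-memory `acked_tasks` set are illustrative, not the project's API):

```python
# Sketch: if an interaction arrives before the task's ACK was recorded,
# the backend rejects it and the frontend retries with exponential backoff.
import time

acked_tasks: set[str] = set()  # stand-in for the DB's "ack"=True flag

def post_interaction(task_id: str, payload: dict) -> bool:
    """Backend side: accept only interactions for acknowledged tasks."""
    return task_id in acked_tasks

def send_with_backoff(task_id: str, payload: dict, retries: int = 3) -> bool:
    """Frontend side: retry with exponential backoff until accepted."""
    delay = 0.01
    for _ in range(retries):
        if post_interaction(task_id, payload):
            return True
        time.sleep(delay)
        delay *= 2
    return False
```

In practice the ACK almost always lands first, so the retry loop only matters in the narrow window between the frontend presenting the task and the backend recording the ACK.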
Yes, you're right. The initial goal was to make the frontends as stateless as possible, but this seems less and less viable, so we'll have to live with a degree of state: the frontend at least needs to be able to link a reply back to the initial message of the task (which could be done by traversing reply structures). The issue is that there might not always be an exact 1 task description <-> 1 message mapping, but I guess when that breaks we can think of a solution then.
Agree
I suggest we attach a "depth" level to posts, i.e. how far down the conversation they are (always their parent's depth plus 1). Then we can either have a hard cap on depth, or sample in inverse proportion to depth.
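The depth-weighted sampling suggested here could be sketched as below. The weighting scheme (weight = 1 / depth) and the post structure are assumptions for illustration, not the project's implementation:

```python
# Sketch: draw candidate posts with probability inversely proportional to
# their depth, so deep branches are extended less often than shallow ones.
import random

def sample_post(candidates: list[dict]) -> dict:
    # depth starts at 1 for the root prompt; weight = 1 / depth
    weights = [1.0 / p["depth"] for p in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

candidates = [
    {"id": 1, "depth": 1},  # root prompt
    {"id": 2, "depth": 2},  # assistant reply
    {"id": 3, "depth": 3},  # user follow-up
]
# id 1 is drawn roughly three times as often as id 3
```

A hard cap would simply filter the candidate list (`[p for p in candidates if p["depth"] < MAX_DEPTH]`) before sampling; the two approaches can also be combined.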
The current idea is that threads are turn-taking, but we do not distinguish whether a given message was created by the system or by a human.

Thanks a lot for updating the diagram, very good thinking!
Currently I think the main problem is the task generation. When the backend receives the user's reply, it adds it nicely as a child post in
The posts table has a
@mjagkow I can quickly make some of your suggested changes. What is the state of your work on this? Are you on the OA Discord?
We want to be able to run fully on user-submitted data. Thus, when a user is asked to provide a prompt, and does so, we would like to be able to then re-use that prompt for other tasks, for example the "rank prompts" tasks, but also the "act as assistant" task, where a user is asked to answer as if they were the assistant to a given prompt. That prompt should come from the database of user-submitted prompts.
Extensions to this could be that the user-submitted prompts are sampled according to how well they rank against other submitted prompts in the ranking task, but that's not necessary to start with.
The same requirement exists when users submit user-answers or assistant-answers: we would also like to be able to re-use those as data for further tasks (so we can build up conversations over time, one message per task).
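The re-use idea above can be sketched as follows. The task names (`rank_prompts`, `assistant_reply`), the in-memory store, and the function names are hypothetical, chosen only to illustrate seeding new tasks from previously submitted data:

```python
# Sketch: user-submitted prompts are stored and later sampled to seed
# follow-up tasks ("rank prompts", "act as the assistant").
import random

prompt_store: list[str] = []  # stand-in for the DB of submitted prompts

def submit_prompt(text: str) -> None:
    prompt_store.append(text)

def new_task() -> dict:
    """Create a follow-up task seeded by previously submitted prompts."""
    task_type = random.choice(["rank_prompts", "assistant_reply"])
    if task_type == "rank_prompts":
        k = min(3, len(prompt_store))
        return {"type": task_type, "prompts": random.sample(prompt_store, k)}
    return {"type": task_type, "prompt": random.choice(prompt_store)}

submit_prompt("Explain photosynthesis simply.")
submit_prompt("Write a haiku about autumn.")
task = new_task()
```

The same pattern extends to user-answers and assistant-answers: each fulfilled task feeds its output back into the store, so conversations grow one message per task.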