Database schema

We will use MongoDB for our database needs. As a NoSQL database, MongoDB natively stores JSON-like documents, which makes it straightforward to import and export data locally as needed.

In MongoDB, documents are grouped and stored in "collections", which are analogous to tables in a traditional RDBMS. For more details, see: https://www.mongodb.com/docs/manual/core/databases-and-collections/
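As a rough illustration of the local import/export mentioned above, the sketch below round-trips one collection through a JSON file using only the Python standard library. The helper names `export_collection` and `import_collection` are hypothetical and not part of this repository. IDs are kept as plain strings, as in the examples in this README; real MongoDB `ObjectId` values would need a BSON-aware serializer instead of the plain `json` module.

```python
import json
import tempfile
from pathlib import Path


def export_collection(docs, path):
    """Write a list of plain-dict documents to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(docs, f, ensure_ascii=False, indent=2)


def import_collection(path):
    """Read a list of documents back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical sample data, shaped like the Courses example in this README.
courses = [{"_id": "11111111", "name": "DS100", "uri": "https://ds100.org/fa21/"}]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "courses.json"
    export_collection(courses, path)
    restored = import_collection(path)
```

Note that the schema listings in this README use `//` comments for annotation. These are not valid JSON and would have to be removed before a listing could be loaded this way.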

Courses

This collection stores data about the courses that are present in the dataset. This can be used, for example, to filter documents by relevant course during training.

[
    {
        "_id": "11111111",  // [auto-generated]
        "name": "DS100",    // [unique]
        "uri": "https://ds100.org/fa21/",
        // [optional] other metadata, licenses, etc.
    },
    // ...
]
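The `[unique]` marker on `name` would typically be enforced by a unique index in MongoDB. As a driver-free sketch, the same constraint can be checked on a batch of course records before import. The function name `duplicate_course_names` and the sample records are hypothetical.

```python
from collections import Counter


def duplicate_course_names(courses):
    """Return the course names that occur more than once, sorted."""
    counts = Counter(course["name"] for course in courses)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical records; only the field names come from the schema above.
courses = [
    {"name": "DS100", "uri": "https://ds100.org/fa21/"},
    {"name": "CS101", "uri": "https://example.org/cs101/"},
    {"name": "DS100", "uri": "https://example.org/ds100-copy/"},
]
duplicates = duplicate_course_names(courses)
```

An empty result means the batch satisfies the uniqueness constraint.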

Materials

This collection stores all information pertaining to raw documents: HTML files, PDFs, PPTs, etc. Each material is linked to its course, and holds the raw original content for reference.

[
    {
        "_id": "22222222",                                       // [auto-generated]
        "course": "ObjectId(11111111)",                          // [foreign key]
        "name": "",                                              // [optional]
        "uri": "https://ds100.org/fa21/grad_proj/gradproject/",  // [unique]
        "type": "html",                                          // [html|pdf|ppt]
        "raw": "<!DOCTYPE html>\n<html lang=\"en-US\">\n<head>\n  <meta charset=\"UTF-8\">\n <title>Graduate Project - Data 100</title>\n\n ..."
    },
    // ...
]
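Because each material carries a reference to its course, the materials of one course can be found by resolving the course name to its `_id` first. The sketch below does this on in-memory lists. The function name `materials_for_course` is hypothetical, and references are stored as bare ID strings, an assumption that keeps the example independent of a MongoDB driver.

```python
def materials_for_course(course_name, courses, materials):
    """Return the materials whose "course" field points at the named course."""
    course_ids = {c["_id"] for c in courses if c["name"] == course_name}
    return [m for m in materials if m["course"] in course_ids]


# Hypothetical in-memory data; the second course and material are made up.
courses = [
    {"_id": "11111111", "name": "DS100"},
    {"_id": "11111112", "name": "CS101"},
]
materials = [
    {"_id": "22222222", "course": "11111111", "type": "html",
     "uri": "https://ds100.org/fa21/grad_proj/gradproject/"},
    {"_id": "22222223", "course": "11111112", "type": "pdf",
     "uri": "https://example.org/cs101/syllabus.pdf"},
]
ds100_materials = materials_for_course("DS100", courses, materials)
```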

Documents

This collection stores documents in a generic format that is most relevant for model training. Each document is titled and sectioned with tags, to allow flexible customization during pre-processing.

[
    {
        "_id": "33333333",                     // [auto-generated]
        "material": "ObjectId(22222222)",      // [foreign key]
        "title": "Graduate Project\nRubrics",  // [unique], \n separated if hierarchical
        "contents": [
            {
                "tags": "plain", // [code|table|list|plain]
                "text": "Each group will peer grade the projects from another group. The review will be graded out of a total of 15 points."
            },
            {
                "tags": "list",
                "text": "• A summary of the report (5 points). The summary should address at least the following: •• What research question does the group propose? Why is it important?"
            }
            // ...
        ]
    }
    // ...
]
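One possible pre-processing step, sketched under assumptions: split the newline-separated title into its hierarchy levels, and keep only the content sections whose tag is wanted for training. The function names `title_path` and `select_text`, the default tag selection, and the `code` section in the sample are all hypothetical.

```python
def title_path(document):
    """Split a newline-separated title into its hierarchy levels."""
    return [part for part in document["title"].split("\n") if part]


def select_text(document, keep_tags=("plain", "list")):
    """Join the text of every content section whose tag is in keep_tags."""
    sections = [c["text"] for c in document["contents"] if c["tags"] in keep_tags]
    return "\n".join(sections)


# Hypothetical document, abbreviated from the example above.
document = {
    "title": "Graduate Project\nRubrics",
    "contents": [
        {"tags": "plain", "text": "Each group will peer grade the projects from another group."},
        {"tags": "code", "text": "print('hello')"},
        {"tags": "list", "text": "• A summary of the report (5 points)."},
    ],
}
path = title_path(document)
plain_only = select_text(document, keep_tags=("plain",))
```

Keeping the tag filter as a parameter lets different training runs include or exclude code, tables, and lists without changing the stored documents.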

Forums

This collection stores the raw data for question-answers, collected from forums such as Piazza.

[
    {
        "_id": "44444444",                                // [auto-generated]
        "course": "ObjectId(11111111)",                   // [foreign key]
        "uri": "https://piazza.com/class/ksqyjn4qfo7c5",  // [unique]
        "type": "piazza",                                 // [piazza|<future_extensions>]
        "raw": "[{\"folders\": [\"logistics\", \"other\"], \"nr\": 1421, \"created\": \"2019-12-30T23:20:16Z\", \"bucket_order\": 0, \"no_answer_followup\": 0}]"
    }
    // ...
]
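In the example above, `raw` holds the forum export as a JSON-encoded string, so it has to be decoded before individual posts can be read. A minimal sketch, assuming the Piazza payload is a JSON array of post objects as shown; the function name `decode_piazza_raw` is hypothetical.

```python
import json


def decode_piazza_raw(forum):
    """Parse the JSON string in "raw" into a list of post dicts."""
    if forum["type"] != "piazza":
        raise ValueError(f"unsupported forum type: {forum['type']}")
    return json.loads(forum["raw"])


# Forum record abbreviated from the example above.
forum = {
    "type": "piazza",
    "raw": "[{\"folders\": [\"logistics\", \"other\"], \"nr\": 1421, "
           "\"created\": \"2019-12-30T23:20:16Z\"}]",
}
posts = decode_piazza_raw(forum)
first_folders = posts[0]["folders"]
```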

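Question-Answers

This collection stores individual question-answer pairs parsed from the raw forum data, one entry per Piazza post.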

[
    {
        "course": "ObjectId(11111111)",                          // [Course ref]
        "_id": "55555555",                                       // [Piazza ID code, unique]
        "subject": "Regrade submissions did not go through",     // [Subject line of Piazza Post]
        "content": "Hey! I've noticed that my regrade..",        // [Parsed text of the question]
        "student_answer": "",                                    // [Either contains parsed text answer or empty]
        "instructor_answer": "Will look into it and...",         // [Either contains parsed text answer or empty]
        "folder": ["other"]                                      // [The folder that the student put the question in]
    },
    // ...
]
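Since either answer field may be empty, downstream code has to decide which answer to use. The sketch below prefers the instructor answer and falls back to the student answer. That preference is an assumption for illustration, not a rule stated in this README, and the function names and extra sample records are hypothetical.

```python
def preferred_answer(qa):
    """Return the instructor answer if non-empty, else the student answer, else None."""
    return qa.get("instructor_answer") or qa.get("student_answer") or None


def answered_pairs(question_answers):
    """Return (question, answer) tuples, skipping posts with no answer at all."""
    pairs = []
    for qa in question_answers:
        answer = preferred_answer(qa)
        if answer is not None:
            pairs.append((qa["content"], answer))
    return pairs


# Hypothetical records; field names follow the schema above.
question_answers = [
    {"_id": "55555555", "content": "Hey! I've noticed that my regrade..",
     "student_answer": "", "instructor_answer": "Will look into it and..."},
    {"_id": "55555556", "content": "When is the project due?",
     "student_answer": "Check the syllabus.", "instructor_answer": ""},
    {"_id": "55555557", "content": "Is lecture recorded?",
     "student_answer": "", "instructor_answer": ""},
]
pairs = answered_pairs(question_answers)
```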
