This repository was archived by the owner on Aug 25, 2024. It is now read-only.
source: Labeled and Versioned datasets #9
Labels: enhancement (New feature or request), gsoc (Google Summer of Code related), project (Issues which will take a while to complete)
Assignee: @sudharsana-kjl
DFFML is hoping to participate in Google Summer of Code (GSoC) under the Python Software Foundation umbrella. You can read all about what this means at http://python-gsoc.org/. This issue, and any others tagged gsoc and project, are not generally available bugs, but are related to project ideas for GSoC.
Project Idea: Labeled and Versioned Datasets.
Project description:
DFFML's initial release includes sources which abstract the format in which the data is stored from the dataset generation and usage in models.
This project would add label and version information so that users can keep several datasets, and several versions of each, in the same source.
Skills: Python, git
Difficulty level: Intermediate
Related Readings/Links:
dffml/dffml/source/source.py
Lines 16 to 45 in dd8007d
dffml/dffml/repo.py
Lines 90 to 116 in dd8007d
Potential mentors: @pdxjohnny
Getting Started:
Source.__init__ probably needs another two arguments, label and version, which should probably have defaults (say, "default" and "v0"). Since the same backend (i.e., a CSV file or JSON file) would be used to store all the data, you'll have to change the existing sources so that they understand how to deal with this. For CSVSource that might mean adding another column to each repo; for JSONSource that might mean that, instead of one big array, the repos are stored in a structure that separates them by label and version.
What we want to see in your application: Describe how you intend to solve the problem, and give us some "stretch goals"; maybe you'll implement a source using sqlite too, or something. Don't forget to include some time for building appropriate tests.
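To make the Getting Started suggestions more concrete, here is a minimal sketch of how one backend could hold several labeled and versioned datasets. It does not use DFFML's actual Source classes. The helper names, the nested JSON layout, the extra CSV column names (label, version), and the src_url field are all assumptions made for illustration. Only the default values "default" and "v0" come from the text above.

```python
import csv
import io
import json

# Defaults suggested in the issue text.
DEFAULT_LABEL = "default"
DEFAULT_VERSION = "v0"


def select_json_repos(document, label=DEFAULT_LABEL, version=DEFAULT_VERSION):
    """Return the repos stored under one label and version.

    Assumed (hypothetical) layout: {label: {version: [repo, ...]}}
    rather than one big top-level array.
    """
    return document.get(label, {}).get(version, [])


def select_csv_rows(text, label=DEFAULT_LABEL, version=DEFAULT_VERSION):
    """Return the CSV rows whose label and version columns match.

    Files written before the extra columns existed are treated as
    belonging to the default label and version.
    """
    rows = csv.DictReader(io.StringIO(text))
    return [
        row
        for row in rows
        if (row.get("label") or DEFAULT_LABEL) == label
        and (row.get("version") or DEFAULT_VERSION) == version
    ]


# Example data; "src_url" is only a placeholder field name.
JSON_TEXT = json.dumps(
    {
        "default": {
            "v0": [{"src_url": "a"}],
            "v1": [{"src_url": "a"}, {"src_url": "b"}],
        },
        "holdout": {"v0": [{"src_url": "c"}]},
    }
)

CSV_TEXT = (
    "src_url,label,version\n"
    "a,default,v0\n"
    "b,default,v1\n"
    "c,holdout,v0\n"
)

if __name__ == "__main__":
    print(select_json_repos(json.loads(JSON_TEXT), version="v1"))
    print(select_csv_rows(CSV_TEXT, label="holdout"))
```

Treating a missing label or version as the default keeps existing files readable, which matters because the issue asks for the current sources to be adapted rather than replaced.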