Schema lifecycle #4

stuartpb · 2017-11-12T13:41:18Z

opws/opws-dataset#100 (comment) noted that there was a lot of clumsiness around the way domainprofiles issues talked about the schema, and suggested that this repository could have some body of text laying out a lifecycle that might make more sense.

I think that a prescriptive set of guidelines around a more flexible issue paradigm would be sensible, especially as opws/opws-dataset#147 entails migrating a bunch of the old repo's issues to the schema repo.

I'm thinking:

Issues should be raised just describing the general "thing" that is not adequately documented, e.g. "there's nothing to list if a site has a checkbox to sign me up for spam".
Ways to address said thing may be proposed in that issue, but they don't constitute an actionable proposal (one that can be "accepted" or "declined") until they're posed as a pull request against the current draft schema.

stuartpb · 2017-11-12T13:47:28Z

I guess that covers expansionary issues, but aesthetic ones, like "these two very similar things have vastly different structures for no reason"... I don't know, maybe those need to be posed as pull requests, too. (If you can't propose a clear refactor to solve the problem, it's not clear enough what the problem is.)

stuartpb · 2017-11-13T01:19:57Z

So yeah, my thinking for the schema lifecycle is:

- There will be one standing "draft/v0.2" branch, and so on for each "next release"
- Inside that branch, the next version is also kept under a "draft" directory
  - This is mostly to enforce the idea that there should never be a substantive change proposed to a schema that lives outside of "draft"
- Pull requests for proposals will be filed against the next draft
- Proposals that need to be resolved (or kicked) before the draft can be inaugurated will be put on the milestone for that version

I'm not exactly sure how to pair this with the dataset and CI validation - changes to the draft need to kick off a new validation of the corresponding dataset branch. There's also the matter of making sure any dataset migration is up to date with the latest changes at the tip of master.

stuartpb · 2017-11-13T01:21:35Z

Another thing: versions stay at "vx.x.0" while they're in draft. After release, any change to the content of any schema (whether a typo fix or a change in strictness) increments the patch version of all schemas in that major-minor.
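As a rough sketch of that patch-bump rule (the function names and the schema names in the usage note are made up for illustration; nothing in the repos defines them):

```python
def bump_patch(version: str) -> str:
    """Increment the patch component of a 'vMAJOR.MINOR.PATCH' string."""
    prefix = "v" if version.startswith("v") else ""
    major, minor, patch = (int(part) for part in version[len(prefix):].split("."))
    return f"{prefix}{major}.{minor}.{patch + 1}"


def bump_release_line(versions: dict, major_minor: str) -> dict:
    """Bump every schema in the given major-minor (e.g. 'v0.2'), leave others alone.

    `versions` maps a (hypothetical) schema name to its current version.
    """
    return {
        name: bump_patch(version) if version.startswith(major_minor + ".") else version
        for name, version in versions.items()
    }
```

So a typo fix to any one v0.2 schema would move every v0.2 schema from, say, v0.2.0 to v0.2.1, while schemas in other major-minors keep their versions.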

stuartpb · 2017-11-13T01:39:38Z

Okay, so as I was just describing in #7, I think what might make sense would be if the draft "next version" branch starts with an "Update SCHEMA_VERSION to draft/vX.X" commit, and then refactor commits get filed as git commit --squash=draft/v0.2 (or whatever)? Then, when the draft is finally merged, the commit that updates the SCHEMA_VERSION will be rewritten as "Update SCHEMA_VERSION to vX.X" without the draft/ prefix, and all the intermediate refactors are squashed into it.

Maybe it should be a different tag than squash!, though. Also, what about merge bubbles? Do we do them?

And what about writing scripts to auto-migrate? Should those live somewhere (or just in commit messages)?

stuartpb · 2017-11-13T14:27:27Z

I think the ultimate endpoint for pull requests, as I realized last time I got hyped up over this, is to just leave the validation as-is, as all it's good for is validating whether the tip commit is valid - there's no way to say "test that this is okay against a proposal, but not okay to merge into master". By forcing the validator to only recognize schema versions that have been pushed to master in the schemata, we force it to only pass tests that can and should be merged to master in the dataset.

stuartpb · 2017-11-13T14:31:32Z

You know, that being said, in CircleCI 2.0 there could be two sets of tests, one being a validation that can run against foreign schemata / branches, and one that tests that the repo does not have any such artifacts, and master could check for both, and other branches wouldn't be subject to the checks (either they'd be disabled in the config.yml, or they'd be present, but just not a blocker for protected branches).

stuartpb · 2017-11-13T14:36:09Z

Let's see, skipping branches works based on the name of the branch the commit is on, so if I had a rule like "skip the published-version test for branches called proposal/*", and then I had a rule for PRs on master that they had to receive a grade on the published-version test, that means a (misguided) PR from a proposal/ branch to master would be... permanently hung, if I understand correctly?

stuartpb · 2017-11-14T06:20:29Z

I just realized it'd be a lot easier to rebase against a rolling draft if it were just kept in a flat draft directory. (CI could have a test, like the one I just described for the dataset, that no proposal with a draft directory may get merged to master.)

stuartpb · 2017-11-14T21:42:17Z

Okay, here's what I'm thinking:

dataset CI will also find SCHEMA_PROPOSAL files with a repo and/or branch name in them, and shallow clone from there to test proposals.
dataset CI will note draft/ prefixes on SCHEMA_VERSION and, if that's listed, will check out the corresponding draft/ branch and read from the draft folder (for validating draft data).
dataset PRs will still need to have a non-draft-prefix in SCHEMA_VERSION to be valid to merge to master
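The SCHEMA_VERSION handling in that list could look something like this. It's only a sketch: it assumes the file holds a single line such as "draft/v0.2" or "v0.2", and the names SchemaTarget, parse_schema_version and may_merge_to_master are hypothetical.

```python
from dataclasses import dataclass

DRAFT_PREFIX = "draft/"


@dataclass(frozen=True)
class SchemaTarget:
    version: str    # e.g. "v0.2", with any draft/ prefix removed
    is_draft: bool  # True when SCHEMA_VERSION carried the draft/ prefix


def parse_schema_version(text: str) -> SchemaTarget:
    """Interpret the contents of a SCHEMA_VERSION file (assumed to be one line)."""
    value = text.strip()
    if value.startswith(DRAFT_PREFIX):
        return SchemaTarget(version=value[len(DRAFT_PREFIX):], is_draft=True)
    return SchemaTarget(version=value, is_draft=False)


def may_merge_to_master(text: str) -> bool:
    """A dataset PR is only valid for master if it is not pinned to a draft."""
    return not parse_schema_version(text).is_draft
```

The SCHEMA_PROPOSAL shallow clone is left out here, since the format of that file isn't pinned down yet.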

stuartpb · 2017-11-14T21:53:30Z

Maybe a better approach for draft/ handling: the validator checks whether a final release of that version exists in the schemata, and if not, it falls back to the draft? That makes sense to me, as an elegant process of failovers:

The draft dataset is prepared in a branch against the draft schema.
The draft schema is merged and moved from draft to its version number. Tests of previously-constructed branches now test against the final version of the draft (until being updated to point to the next draft, their unmerged or reverted proposals will fail). A branch for the next merge window draft is opened.
The draft dataset is updated to point to the final version, signifying that the data is ready to match the published version.
A finalization branch is established to change the SCHEMA_VERSION to the released version, tidy up any remaining validation errors, and catch up with any unmerged (unrebased) profile changes.
The finalization branch is merged into master, and the lifecycle begins anew.
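The failover at the heart of that process, sketched under the assumption that the validator has a checked-out schemata tree in which a released version lives in a directory named after it (e.g. "v0.2") and the rolling draft lives in "draft". The function name is hypothetical, and checking out the right draft branch is out of scope here.

```python
from pathlib import Path


def resolve_schema_dir(schemata_root: Path, version: str) -> Path:
    """Prefer the released directory for `version`; fall back to the draft."""
    final_dir = schemata_root / version
    if final_dir.is_dir():
        return final_dir
    draft_dir = schemata_root / "draft"
    if draft_dir.is_dir():
        return draft_dir
    raise FileNotFoundError(
        f"neither {final_dir} nor {draft_dir} exists in the schemata checkout"
    )
```

Before graduation this resolves to the draft directory; once the draft has been moved to its version number, the same branch starts validating against the released schema without any change on the dataset side.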

stuartpb · 2017-11-15T07:34:22Z

I'm still not sure if I should be merging drafts in as merges (what my gut tells me) or as squashes. (See the cacophony that is opws/opws-schemata#17 and opws/opws-schemata#18.)

stuartpb · 2017-11-15T07:34:57Z

You know what, I think I've got it, hang on.

stuartpb · 2017-11-15T07:39:32Z

I think the correct solution is to author the "Graduate v(whatever)" commit as git merge --no-commit draft/v0.2 && git mv draft v0.2 from master, then push that as a pull request that gets merged as a rebase?

IDK, maybe graduations can't fit in the pull request system, and maybe that's fine - cutting a release like this should use long-standing issues tracking the release's progress (the way Bootstrap does it) anyway, which then get closed via "Closes #X".

stuartpb · 2017-11-15T07:44:05Z

I just realized all of this should really live in opws-schemata's CONTRIBUTING.md, not opws-guidelines.

stuartpb · 2017-11-15T08:11:36Z

D'oh, I forgot to update the $id properties

Now that I realize that graduating a draft requires a change to the files in the tree, I'm back on board with the "do a pull request with a merge bubble" idea.

stuartpb · 2017-11-15T09:10:06Z

OK, one change I'm making in the 0.3 draft is that I'm putting "DRAFT" in ALL CAPS, so as to signify two things:

- NOTICE THIS, IF YOU'RE SEEING THIS IN PRODUCTION SOMETHING WENT WRONG
- the DRAFT in the $id should be replaced with the final version path

Also, keeping one shared $id for all drafts (on top of being one less thing to migrate when rebasing a proposal to a new draft) makes it extra-clear that DRAFTs aren't really versioned: the version for a schema with https://schemata.opws.org/DRAFT/ at the beginning of its $id is advisory.
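A sketch of what graduating those $id values might involve. The DRAFT base URL is the one above; the assumption that the final path is just the version (e.g. "v0.3") in place of DRAFT, and both function names, are mine for illustration.

```python
DRAFT_BASE = "https://schemata.opws.org/DRAFT/"


def graduate_id(schema_id: str, version: str) -> str:
    """Swap the DRAFT segment of an $id for the released version path."""
    if not schema_id.startswith(DRAFT_BASE):
        return schema_id  # already released, or not one of ours
    return f"https://schemata.opws.org/{version}/" + schema_id[len(DRAFT_BASE):]


def find_draft_ids(node) -> list:
    """Recursively collect every $id in a parsed schema that still says DRAFT."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$id" and isinstance(value, str) and value.startswith(DRAFT_BASE):
                found.append(value)
            else:
                found.extend(find_draft_ids(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_draft_ids(item))
    return found
```

A CI check on master could fail whenever find_draft_ids returns anything for a released schema, which is the "something went wrong" case the ALL CAPS is meant to flag.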

stuartpb mentioned this issue Nov 13, 2017

Dataset commit message style rules #7

Open

stuartpb mentioned this issue Nov 13, 2017

Versioning and migration #2

Open

stuartpb mentioned this issue Nov 14, 2017

[proposal] Replace "notes" field with "errata", "directions", and "documentation" opws/opws-schemata#5

Merged

stuartpb changed the title ~~Schema issue guidelines~~ Schema lifecycle Nov 14, 2017
