Schema lifecycle #4

Open
stuartpb opened this issue Nov 12, 2017 · 16 comments

Comments

@stuartpb
Member

stuartpb commented Nov 12, 2017

opws/opws-dataset#100 (comment) noted that there was a lot of clumsiness around the way domainprofiles issues talked about the schema, and suggested that this repository could have some body of text laying out a lifecycle that might make more sense.

I think that a prescriptive set of guidelines around a more flexible issue paradigm would be sensible, especially as opws/opws-dataset#147 entails migrating a bunch of the old repo's issues to the schema repo.

I'm thinking:

  • Issues should be raised just talking about the general "thing" that is not adequately documented, e.g. "there's nothing to list if a site has a checkbox to sign me up for spam".
  • Ways to address said thing may be proposed in that issue, but they don't constitute an actionable proposal (one that can be "accepted" or "declined") until they're posed as a pull request against the current draft schema.
@stuartpb
Member Author

I guess that covers expansionary issues, but aesthetic ones, like "these two very similar things have vastly different structures for no reason"... I don't know, maybe those need to be posed as pull requests, too. (If you can't propose a clear refactor to solve the problem, it's not clear enough what the problem is.)

@stuartpb
Member Author

So yeah, my thinking for the schema lifecycle is:

  • There will be one standing "draft/v0.2" branch, and so on for each "next release"
  • Inside that branch, the next version is also kept under a "draft" directory
    • This is mostly to enforce the idea that there should never be a substantive change proposed to a schema that lives outside of "draft"
  • Pull requests for proposals will be filed against the next draft
  • Proposals that need to be resolved (or kicked) before the draft can be inaugurated will be put on the milestone for that version

I'm not exactly sure how to pair this with the dataset and CI validation - changes to the draft need to kick off a new validation of the corresponding dataset branch. There's also the matter of making sure any dataset migration is up to date with the latest changes at the tip of master.

@stuartpb
Member Author

Another thing: versions stay at "vX.X.0" while they're in draft. After release, any change to the content of any schema (whether a typo fix or a change in strictness) increments the patch version of every schema in that major-minor line.
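A minimal sketch of that patch-bump rule, assuming versions are written as "vMAJOR.MINOR.PATCH" strings. The function names and the schema names in the usage note are hypothetical, chosen only for illustration:

```python
import re


def bump_patch(version: str) -> str:
    """Return `version` with its patch component incremented by one."""
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return f"v{major}.{minor}.{patch + 1}"


def bump_release_line(versions: dict) -> dict:
    """Bump every schema in one major-minor line together.

    `versions` maps a (hypothetical) schema name to its current version.
    A post-release change to any one schema bumps all of them.
    """
    return {name: bump_patch(version) for name, version in versions.items()}
```

For example, `bump_release_line({"profile": "v0.2.0", "legend": "v0.2.0"})` would move both schemas to "v0.2.1", even if only one of them changed.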

@stuartpb
Member Author

Okay, so as I was just describing in #7, I think what might make sense would be if the draft "next version" branch starts with an "Update SCHEMA_VERSION to draft/vX.X" commit, and then refactor commits get filed as git commit --squash=draft/v0.2 (or whatever)? Then, when the draft is finally merged, the commit that updates the SCHEMA_VERSION will be rewritten as "Update SCHEMA_VERSION to vX.X" without the draft/ prefix, and all the intermediate refactors are squashed into it.

Maybe it should be a different tag than squash!, though. Also, what about merge bubbles? Do we do them?

And what about writing scripts to auto-migrate? Should those live somewhere (or just in commit messages)?

@stuartpb
Member Author

I think the ultimate endpoint for pull requests, as I realized the last time I got hyped up over this, is to just leave the validation as-is, since all it's good for is checking whether the tip commit is valid. There's no way to say "test that this is okay against a proposal, but not okay to merge into master". By forcing the validator to recognize only schema versions that have been pushed to master in the schemata, we force it to pass only tests that can and should be merged to master in the dataset.

@stuartpb
Member Author

You know, that being said, in CircleCI 2.0 there could be two sets of tests: one a validation that can run against foreign schemata / branches, and one that tests that the repo does not have any such artifacts. master could check for both, and other branches wouldn't be subject to the checks (either they'd be disabled in the config.yml, or they'd be present, but just not a blocker for protected branches).

@stuartpb
Member Author

Let's see, skipping branches works based on the name of the branch the commit is on, so if I had a rule like "skip the published-version test for branches called proposal/*", and then I had a rule for PRs on master that they had to receive a grade on the published-version test, that means a (misguided) PR from a proposal/ branch to master would be... permanently hung, if I understand correctly?
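A toy model of that scenario, to make the question concrete. It only encodes the rules as stated in this comment (a skipped branch never reports a status, and master requires one); it is not a description of how CircleCI or GitHub actually behave, and every name in it is hypothetical:

```python
from fnmatch import fnmatch
from typing import Optional

# Hypothetical skip rule for the published-version test.
SKIP_PATTERNS = ["proposal/*"]


def published_version_status(branch: str, check_passes: bool) -> Optional[str]:
    """Status the published-version test reports for a commit on `branch`.

    Returns None when the branch is skipped, meaning no status is reported.
    """
    if any(fnmatch(branch, pattern) for pattern in SKIP_PATTERNS):
        return None
    return "success" if check_passes else "failure"


def pr_to_master_state(head_branch: str, check_passes: bool) -> str:
    """State of a PR into master, which requires a published-version status."""
    status = published_version_status(head_branch, check_passes)
    if status is None:
        # The required status never arrives, so the PR waits indefinitely.
        return "pending"
    return "mergeable" if status == "success" else "blocked"
```

Under these assumptions, a PR from a branch such as "proposal/spam-checkbox" stays "pending" whether or not its data would pass, which is the "permanently hung" outcome.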

@stuartpb
Member Author

I just realized it'd be a lot easier to rebase against a rolling draft if it were just kept in a flat draft directory. (CI could have a test, like the one I just described for the dataset, that no proposal with a draft directory may get merged to master.)

@stuartpb
Member Author

Okay, here's what I'm thinking:

  • dataset CI will also find SCHEMA_PROPOSAL files with a repo and/or branch name in them, and shallow clone from there to test proposals.
  • dataset CI will note draft/ prefixes on SCHEMA_VERSION and, if that's listed, will check out the corresponding draft/ branch and read from the draft folder (for validating draft data).
  • dataset PRs will still need to have a non-draft-prefix in SCHEMA_VERSION to be valid to merge to master
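The three rules above could be sketched roughly as follows. This assumes SCHEMA_PROPOSAL holds just a branch name (the repo half of "repo and/or branch" is left out), that proposals keep their schemas in a draft directory, and that released versions are read from a directory named after the version on master. `SchemaSource` and `plan_schema_source` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchemaSource:
    ref: str         # branch of the schemata repo to check out
    directory: str   # directory in that checkout to read schemas from
    mergeable: bool  # may a dataset PR in this state merge to master?


def plan_schema_source(schema_version: str,
                       schema_proposal: Optional[str] = None) -> SchemaSource:
    """Decide where dataset CI should read schemas from.

    Arguments are the raw contents of the SCHEMA_VERSION file and, if it
    exists, the SCHEMA_PROPOSAL file.
    """
    version = schema_version.strip()
    proposal = (schema_proposal or "").strip()
    if proposal:
        # Proposal: validate against the named branch (assumed never mergeable).
        return SchemaSource(ref=proposal, directory="draft", mergeable=False)
    if version.startswith("draft/"):
        # Draft: check out the matching draft/ branch, read its draft folder.
        return SchemaSource(ref=version, directory="draft", mergeable=False)
    # Released version: the only state that is valid to merge to master.
    return SchemaSource(ref="master", directory=version, mergeable=True)
```

Treating a lingering SCHEMA_PROPOSAL file as unmergeable is a design guess here; the list above only states the requirement for SCHEMA_VERSION.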

@stuartpb
Member Author

Maybe better: when the validator handles a draft/ version, it checks whether a final release of that version exists in the schemata, and if not, it falls back to the draft. That makes sense to me as an elegant process of failovers:

  • The draft dataset is prepared in a branch against the draft schema.
  • The draft schema is merged and moved from draft to its version number. Tests of previously-constructed branches now test against the final version of the draft (until being updated to point to the next draft, their unmerged or reverted proposals will fail). A branch for the next merge window draft is opened.
  • The draft dataset is updated to point to the final version, signifying that the data is ready to match the published version.
  • A finalization branch is established to change the SCHEMA_VERSION to the released version, tidy up any remaining validation errors, and catch up with any unmerged (unrebased) profile changes.
  • The finalization branch is merged into master, and the lifecycle begins anew.
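The failover at the heart of this lifecycle might look something like the sketch below. It assumes a single checkout of the schemata that contains both released version directories (such as v0.2) and a flat draft directory; `resolve_schema_dir` is a hypothetical helper, not part of any existing validator:

```python
from pathlib import Path


def resolve_schema_dir(schemata_root: Path, schema_version: str) -> Path:
    """Pick the directory to validate against for a SCHEMA_VERSION value.

    A "draft/vX.X" value resolves to the released "vX.X" directory once it
    exists, and falls back to the shared draft directory until then.
    """
    version = schema_version.strip()
    if version.startswith("draft/"):
        version = version[len("draft/"):]
    final_dir = schemata_root / version
    if final_dir.is_dir():
        # The draft has graduated: test against the published version.
        return final_dir
    # Not released yet: validate against the rolling draft.
    return schemata_root / "draft"
```

With this shape, a dataset branch still pointing at "draft/v0.2" would start being tested against the final v0.2 schemas as soon as the draft graduates, matching the second step in the list.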

@stuartpb
Member Author

I'm still not sure if I should be merging drafts in as merges (what my gut tells me) or as squashes. (See the cacophony that is opws/opws-schemata#17 and opws/opws-schemata#18.)

@stuartpb
Member Author

You know what, I think I've got it, hang on.

@stuartpb
Member Author

stuartpb commented Nov 15, 2017

I think the correct solution is to author the "Graduate v(whatever)" commit as git merge --no-commit draft/v0.2 && git mv draft v0.2 from master, then push that as a pull request that gets merged as a rebase?

IDK, maybe graduations can't fit in the pull request system, and maybe that's fine - cutting a release like this should use long-standing issues tracking the release's progress (the way Bootstrap does it) anyway, which then get closed via "Closes #X".

@stuartpb
Member Author

I just realized all of this should really live in opws-schemata's CONTRIBUTING.md, not opws-guidelines.

@stuartpb
Member Author

stuartpb commented Nov 15, 2017

D'oh, I forgot to update the $id properties.

Now that I realize that graduating a draft requires a change to the files in the tree, I'm back on board with the "do a pull request with a merge bubble" idea.

@stuartpb
Member Author

stuartpb commented Nov 15, 2017

OK, one change I'm making in the 0.3 draft is that I'm putting "DRAFT" in ALL CAPS, so as to signify two things:

  1. NOTICE THIS, IF YOU'RE SEEING THIS IN PRODUCTION SOMETHING WENT WRONG
  2. the DRAFT in the $id should be replaced with the final version path

Also, keeping one shared $id for all drafts (on top of being one less thing to migrate when rebasing a proposal to a new draft) makes it extra clear that DRAFTs aren't really versioned: the version for a schema with https://schemata.opws.org/DRAFT/ at the beginning of its $id is advisory.
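As a rough illustration of the $id swap at graduation time: the sketch below assumes the final version path takes the form https://schemata.opws.org/vX.X/ (only the DRAFT prefix is confirmed above), and the helper names and the "profile.json" file name are hypothetical:

```python
DRAFT_PREFIX = "https://schemata.opws.org/DRAFT/"


def is_draft(schema: dict) -> bool:
    """True if the schema's $id still carries the shared DRAFT prefix."""
    return str(schema.get("$id", "")).startswith(DRAFT_PREFIX)


def graduate_id(schema: dict, version: str) -> dict:
    """Return a copy of `schema` with DRAFT in its $id replaced by `version`.

    The released URL layout used here is an assumption for illustration.
    """
    if not is_draft(schema):
        raise ValueError("schema $id does not start with the DRAFT prefix")
    graduated = dict(schema)
    remainder = schema["$id"][len(DRAFT_PREFIX):]
    graduated["$id"] = f"https://schemata.opws.org/{version}/{remainder}"
    return graduated
```

A check like `is_draft` could also back the first point above, flagging any DRAFT $id that turns up somewhere it shouldn't.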
