Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide how to format Misc/NEWS entries #66

Closed
brettcannon opened this issue Mar 31, 2017 · 50 comments
Closed

Decide how to format Misc/NEWS entries #66

brettcannon opened this issue Mar 31, 2017 · 50 comments

Comments

@brettcannon
Copy link
Member

brettcannon commented Mar 31, 2017

Now that we have chosen what NEWS tools to go with, I'm starting a fresh discussion on how exactly we want to format the entries.


@larryhastings said:

I haven't found the time to sit down and write this properly, so here's a quick note on this topic before Brett makes up his mind. sorry if it's a bit long / messy.

First, I don't have a strong opinion about what the input format to blurb should look like. If there was a consensus about "it should look like X", then I'd make it look like X.

We don't seem to have a consensus about what the input format should look like, because I don't think we've reached consensus about what metadata the tool needs. We need to figure that out first.

Obviously necessary:

  1. the Misc/NEWS text
  2. the Misc/NEWS category

I would also like:
3. some datestamp/nonce that ensures the news entries remain in some sort of stable order (I prefer chronological ordering, sadly git doesn't maintain timestamps)

I believe Brett is also asking for:
4. an optional "please consider for the next What's New document" flag
5. a suggested category for "What's New"

By the way, towncrier's approach of pre-created directories named for the categories (2.) is a nice idea. That ensures people don't misspell the category name. blurb could easily switch to that. the only downside I know of is that iiuc git doesn't track directories as first-class objects, so we'd have to have an extra empty file in each directory.

blurb current supports 1-3. it uses the filename for two bits of metadata (stable sorting order and category), and the contents of the file are simply the news entry. but it's kind of reached my comfort level regarding storing metadata in the filename.

if we only want to add 4, a simple "consider for what's new please", then okay I think we could live with sticking that in the filename. Like we add .wn just before the extension, for example.

if we want to add 5, then my inclination is to add a simple metadata blob to the contents of the file:

  • simple name=value (or name: value) pairs
  • # is a line comment
  • empty line or some explicit marker line ("--") ends the metadata blob

If we do that, then my inclination is further to move all the metadata into that blob:

  • category=Library
  • nonce=20170513062235.ef4c88a1
  • # what's new = Improved Modules

(uncomment "what's new" to use it)

The "blurb" tool would make it easy to add these, but users could also create the file by hand using a web page form that formats the output for them. (Making the entry entirely by hand might be tricky, since the nonce should be in a standardized format. Maybe we could give them a short blob of Python they run to generate one?)

If all the metadata lives inside the file, then we don't care what the filename is, it just needs to be unique.


@ncoghlan said:

For metadata-in-the-file, I quite like the format that the Nikola static blog generator uses:

.. title: The Python Packaging Ecosystem
.. slug: python-packaging-ecosystem
.. date: 2016-09-17 03:46:31 UTC
.. tags: python
.. category: python
.. link: 
.. description: Overview of the Python Packaging Ecosystem
.. type: text

Post starts here...

As an added bonus, when the file has the .rst extension, my editor automatically grays out the metadata as line comments, and assuming we're planning to use ReST in the snippets for ease of Sphinx integration, that would also apply here.

Using structured metadata like that would also open up future options for acknowledgements that aren't directly tracked in the git metadata - cases where we built on a patch written by someone else, or someone contributed API design ideas that someone else implemented, etc. At the moment we put that in the snippet body ("Initial patch by ..." and so forth), but a metadata field could more easily feed into ideas like auto-generating Misc/ACKS in addition to Misc/NEWS.

As far as a stable sort algorithm for display goes, we could then define that as:

  • a date field in the snippet metadata (e.g. the date string Nikola uses is just datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S UTC"))
  • the filename used for the snippet (since that has to be non-conflicting or git will complain)
@brettcannon
Copy link
Member Author

I agree with @larryhastings that towncrier's directory for categories is nice, and so we should try and go with it. Larry is also right, though, that we will need some dummy file in the directory so git will track them when we start a new next/, but I think a simple README.rst that explains what the directory for will suffice (and it has the next side-effect of GitHub rendering the README.rst so it's easy to read through the web UI).

As for the metadata for What's New, I would like @1st1 and anyone else who has had to fill in that doc to say whether having the "why" something should be added to What's New is useful, or just simply knowing what needs inclusion? (Category versus flag, IOW, since a flag is much easier to manage than a category.) Another approach is to have a "What's New" and "boring" label for PRs and have a status check that requires one of those two labels be present before one can merge a PR. (Once again, I would want @1st1 or someone to say if seeing a list of PRs or grepping through files would be easier.)

For the file name, I'm fine with a chronological sort, but I don't think a nonce is truly necessary. The most common case is going to be data + issue number, so I think the format is YYYY-MM-DD.bpo-NNNN.rst will keep the vast majority of clashes from occurring and give a good enough chronological sort at the day level (I'm not about to ask people to add the time since if we expect people to do this by hand on occasion then they will inevitably screw up the UTC conversion, especially after a daylight saving switch). We can officially make the format have an optional nonce for actual clashes, but that will probably be extremely rare. So maybe the format should be r"(?P<date>\d{4}-\d{2}-\d{2})\.bpo-(?P<issues>\d+(,\d+)*)(?:\.\w+)?\.rst"?

I do like the idea from @ncoghlan of using reST comments as a way to store metadata. Question becomes what metadata do we want? 😉 If we go with directories for categories ala towncrier and we store the issue number(s) in the file name along with the date, then in the common case nothing is really necessary. We could store who to thank, but I'm assuming that for Misc/ACKS we are either going to continue to have people enter themselves and others manually and maybe supplement it with what Rust and Rails and extract the info from the git metadata (see #7 to discuss that topic). So that only leaves What's New and we need to first decide what we want for that and whether we want it with the news entry or if we want it on the PR.

But since what metadata we want to store is technically optional once we decide on a format for it as long as it's not always required, all we really have to decide upfront is the directory, file name, and file format; we can add metadata later. Do people think we can figure these formatting details in a week so we can have a decision made on Friday, April 7? Then if we are still discussing possible metadata we can just add them in later.

@westurner
Copy link

re: Nikola format
It's really the same issue as with field-lists.

  • STORY: Users can specify an issue record start (like a context manager) and or end in order to concatenate documents for sphinx without unnecessary transformation.

#6 (comment) :


RST :field-lists:
:versions: 2.7, 3.6.1
:tags: bugfix, security
:pr: 123,
:issue: 22
:cve: 2011-1015

@brettcannon
Copy link
Member Author

brettcannon commented Mar 31, 2017

Just an FYI, I just blocked Wes from the Python organization since he did not heed my warning. I would delete the messages he left but I'm using them as evidence as to why I put in the organization-level block.

@JelleZijlstra
Copy link
Member

Would PR contributors be expected to add the NEWS files in their PRs? As I understand it, one of the goals of this change is to minimize the amount of manual work core devs have to do to merge a PR.

But a PR probably won't be merged on the same day it's opened, so the date in the filename will be wrong if we use the date the PR was opened. I can't really think of a better solution than having the core dev edit the filename while merging, but maybe we can do better.

@ncoghlan
Copy link
Contributor

ncoghlan commented Apr 1, 2017

The exact date doesn't matter too much, as the main requirement is to have a robust sort key so regenerating NEWS isn't dependent on things like the directory listing order for the snippets directory.

Probably the simplest way to handle that is to say that formally the date field is just "the date the NEWS snippet was written", and we'd advise folks to leave writing it until relatively late in the PR process for complex patches.

@larryhastings
Copy link
Contributor

Yeah, the date field is there just to provide a unique filename and a reasonably accurate chronological ordering. Misc/NEWS has historically been roughly (but not perfectly) chronologically ordered, and I was trying to preserve that. But it doesn't have to be exact.

@brettcannon
Copy link
Member Author

So what's left to decide? Are we happy with using directories and the proposed file name format? Are we assuming if we decide to add metadata we can do that later? Am I missing anything?

@ncoghlan
Copy link
Contributor

ncoghlan commented Apr 4, 2017

Checking I understand what the current plans actually are:

  • Pending news entries will go into Misc/News/next/{category} and then automatically get moved to Misc/News/{version}/{category} once they're included in a release
  • The snippet name format will be YYYY-MM-DD.bpo-NNNN[.whatever].rst (where the square brackets indicate the human readable slug is optional)
  • If we later decide we need some additional metadata, it will use the .. Name: Value format and be separated from the snippet text by a blank line

If I've understood correctly, then that's a definite +1 from me.

@dhellmann
Copy link
Member

Placing the version information in the file path will make it harder to deal with backports of patches, since any patch with a blurb input file in it will have to be modified to move the file into the right version directory for the branch.

@larryhastings
Copy link
Contributor

If you cherry-pick the original checkin, you'll get the original path in the next directory, which has no version information.

@brettcannon
Copy link
Member Author

@ncoghlan Yep, you understood correctly. The only possible change is supporting multiple issue numbers in the file name, but I'm really not bothered by dropping that if we are going to still state that manually in the message itself.

One thing I just thought of: I guess we don't really need to care about line wrapping if we use textwrap or something in the final output? I bring this up because if we leave out the bullet point -- - -- then people will be off by 2 characters potentially if we want to explicitly wrap at 72 or 80 characters. So does it make sense to say "wrap at 80 characters" knowing full well we will re-wrap on output to take the bullet point and indent into account? What do you think @larryhastings ?

@elprans
Copy link

elprans commented Apr 4, 2017

@brettcannon Is bpo # mandatory? As a What's New editor I'd love to have an unambiguous and reliable connection between the NEWS entry and the actual code changes.

I don't think we should impose the addition of What's New-specific metadata on the committers at this point.

@brettcannon
Copy link
Member Author

@elprans Yes, the idea is to have the issue number be mandatory since anything that's worth having in What's New should have a relevant issue to track the discussion (and then indirectly track the PR/patch that made the change through the issue). Unless there was something else you had in mind?

@brettcannon
Copy link
Member Author

And thanks for the feedback on What's New, @elprans . Obviously if you think there's something that would be helpful just let us know.

@larryhastings
Copy link
Contributor

What I'd like is if users were forced to stay under N columns, where N is 72 or 74 or something. I want the tool ("blurb release" or whatever) to add the "- " at the front. I want the tool ("blurb add") to enforce these rules, but I also want them documented in case people create NEWS entries by hand.

@brettcannon
Copy link
Member Author

@larryhastings so you said "if users were forced". Are you suggesting not having a line length restriction?

@larryhastings
Copy link
Contributor

There's nothing stopping users from creating NEWS entries by hand, or editing them after the tool has written them to disk, or changing them later, or whatever. So barring some sort of github-side verification hook, I don't know how we'd enforce such a restriction.

@ncoghlan
Copy link
Contributor

ncoghlan commented Apr 5, 2017

@larryhastings NEWS snippet formatting can be checked in pre-merge CI, probably as part of the documentation build in Travis (since that renders the NEWS file).

@dhellmann Regarding the expected lifecycle of a bug fix NEWS snippet:

  • snippets will typically start in Misc/NEWS/next on master
  • they're then backported to Misc/NEWS/next on the relevant X.Y maintenance branches
  • for each branch, the snippet gets shuffled into the appropriate directory when a release is made
  • Misc/NEWS itself becomes entirely generated

The only time I'd expect that to get even slightly tricky is if we decided to edit an existing NEWS entry for a previously published release, but we tend not to do that.

@brettcannon
Copy link
Member Author

@larryhastings @ncoghlan Yep, I was actually planning on two checks: one would be to have a status check that a NEWS entry was created (or the PR was marked "trivial"), and two would be to build the NEWS file to verify formatting was accurate. So we could enforce whatever we want in the build step, hence why I'm asking if we think we should enforce anything like a line length or not worry about it since blurb can normalize on output and most news entries are only a sentence or two anyway.

@brettcannon
Copy link
Member Author

To keep this moving forward, I'm going to assume lack of responses means no one has strong feelings about line length, so let's not worry about it in the files and just deal with it in the final output.

@larryhastings is there anything else you need before you can move the code into the repo and we can start using blurb?

@larryhastings
Copy link
Contributor

I do want to restrict line length, yes. I want final result to be < 80 columns, and the input for "blurb" to not contain the two-column prefix ("- " on the first line, two spaces on subsequent lines). "blurb" will add that.

Just to confirm: if we accidentally fed "blurb" files to a ReST parser unedited, the ReST parser would ignore our metadata? ".. name : value" doesn't match any ReST syntax (it requires two colons without a space IIUC), and ReST says that any line starting wtih ".." that it doesn't otherwise syntactically recognize will be ignored, hence, a comment.

@brettcannon
Copy link
Member Author

@larryhastings Setting the line length to 80 seems fine to me.

And yes, I believe the .. name : value is interpreted as a comment due to the lack of double-colon.

@larryhastings
Copy link
Contributor

So, still to decide is exactly how we want to format the blurb input file:

  • Is the bpo issue number required metadata, as in ".. issue: "? Or is it required to be the first thing in the text? If so, how should it be formatted? "Issue #: "? "bpo-: "? Something else?
  • Should we ever allow the bpo issue number to be omitted? If so, is it simply optional, or should there be a separate metadata tag explicitly saying so (".. no-issue: true")?
  • Did we decide that a simple binary flag for "interesting to What's New" is sufficient? If so, should that flag be stored permanently internally as metadata (".. whats-new: true") or as a field in the filename (... .whatsnew.rst)?

I guess it's not a big deal if just I pick something, we decide we don't like it, and I have to redo it. But I'd prefer it if we could solidly nail down the spec before I start.

@ncoghlan
Copy link
Contributor

ncoghlan commented Apr 8, 2017

@larryhastings From #66 (comment), the preference is for all mandatory metadata to be in the path or filename (NEWS category, snippet creation date, bpo number), with leading ReST comments reserved for any future metadata extensions. See #66 (comment) for the exact details.

@larryhastings
Copy link
Contributor

Right, that's what you said, but Brett is the guy actually deciding this stuff.

@ncoghlan
Copy link
Contributor

ncoghlan commented Apr 8, 2017

Aye, that's why the first link is to the comment where Brett said I had correctly understood what he wanted (alas, GH doesn't provide any indication that the two links go to different comments).

@brettcannon
Copy link
Member Author

@larryhastings We're starting off with no metadata required; it's for future needs and right now we have no needs (even for What's New; @elprans said nothing is needed for now).

As for the issue number, at least one must be specified. I'm of the opinion they should be extracted from the file name but I know that @larryhastings thinks it should be in the text, so I don't know what the general opinion is on this (i.e. just file name or both; we aren't leaving them out of the file name as they are acting as a simple nonce for same-day entries).

@larryhastings
Copy link
Contributor

Then I propose that lines starting with ".." (no leading white space) be illegal in the news blurb itself.

And I further propose that the modern incarnation of "tidy" essentially combine all blurb files into one archive file, where the filename is broken out into discrete metadata, and we use ".." on a line by itself as an entry separator.

For example: if we have two files, Library/2017-08-13.bpo-63722.rst and IDLE/2017-08-19.bpo-64121.whatsnew.rst, and the release manager tags 3.6.2rc2, then the two files are combined together into "3.6.2rc2.blurb" (or something) which looks like this:

.. date: 2017-03-13
.. category: Library
.. bpo: 63722

News entry about the change to the library.

..
.. date: 2017-08-19
.. category: IDLE
.. bpo: 64121
.. whatsnew: True

News entry about IDLE.

@larryhastings
Copy link
Contributor

By the way, the reason I want the issue # in the text is for text margin reasons. If it's not in the text, then blurb would have to add it when it formats NEWS.rst. But if we're enforcing a text margin, then that "- Issue #xxxxx: " will make the first line overflow the 80 column margin most of the time.

I guess we have a couple choices:

  • Mandate the margin to be so small that we always have room for prefixing "- Issue #xxxxx: " to the first line. "- Issue #xxxx: " is 16 characters, so this would be a margin <= 64.
  • Let the first line frequently overflow 80 columns, which will be a mild eyesore when reading NEWS.rst.
  • Have blurb reflow the text, which means we needn't bother enforcing line margins in the first place.

If we require the bpo number to be in the filename, then I'd rather not have it in the text, which means we should pick one of these three options, and of those I guess I prefer the third.

@larryhastings
Copy link
Contributor

Brett said:

we aren't leaving [bpo numbers] out of the file name as they are acting as a simple nonce for same-day entries)

If I get to include a more complicated nonce (e.g. some sort of checksum), then we don't need this redundant simple nonce. If you're saying "don't use the checksum anymore, the bpo number is our nonce", then, uh... okay. (Again, I prefer having the bpo number in one place, and that one place being the text, but if it has to be in the filename then so be it.)

@ncoghlan
Copy link
Contributor

+1 for having blurb reflow the text to account for the injected issue numbers, as that also accounts for differences in where folks do their static word wrapping (as even if there was an enforced upper limit, it wouldn't stop people wrapping earlier).

@brettcannon
Copy link
Member Author

brettcannon commented Apr 27, 2017

  • I say leave issue numbers out of the entry and reflow the text (I'm going with DRY here and the fact that issue number(s) are the easiest way to create a unique filename by hand and to find an issue if you're looking at a list of filenames).
  • I think the file name should be date + "issue number(s)" + "whatever it takes to make the filename unique" (which can be a checksum/epoch-at-time-of-generation/something when the file is created by blurb; key point is the filename can still be written by hand for the simple case while blurb can do the better thing by including a forward guarantee of a unique filename with its added nonce).
  • @elprans said we don't have to worry about "What's New", so don't worry about it for now (you used it in your example so I'm just double-checking you don't put time into it unnecessarily).
  • +1 for the "tidy" suggestions by @larryhastings .

Hopefully that sounds reasonable to everyone and with somewhat coherent justifications for my thought process. If people disagree with my running assumption that doing news entries by hand is something worth supporting then speak up now as that means @larryhastings should get what he wants with a fully generated file name and just having the issue numbers in the news entry itself.

@larryhastings
Copy link
Contributor

What should we do for news entries that don't have a bpo number? Even if we enforce them going forward, blurb also needs to handle our legacy NEWS data, and there are plenty of old entries that don't have a bpo number.

I propose: mark those with bpo #0, which is an illegal issue number on bpo, and then act sensibly from there (e.g. suppress the "Issue #0: " that we would otherwise issue).

If we do that, what do we want to do about new entries where people try to use bpo #0? a) permit b) disallow c) scold/verify but permit

@ncoghlan
Copy link
Contributor

Perhaps a dedicated "historical-" or "legacy-" prefix would make sense for those extracted entries, rather than trying to fit them into the "bpo-NNNNN-" format?

That would make it clearer that future NEWS entries are expected to have an associated issue number, and this is just a way of managing the older ones that skipped that step.

@brettcannon
Copy link
Member Author

I'm fine with the bpo-0 solution since it's a historical artifact that won't be perpetuated going forward and with blurb creating the old entries it can add its extra nonce bit to avoid filename collisions. And I say go with option (b) from @larryhastings since we can always ease up later if necessary.

@brettcannon
Copy link
Member Author

While writing out my language summit slides I realized that we should just require a nonce in the filename, even when written by hand to prevent any chance of collision. People can bash on their keyboard for all I care, just as long as there's something to make sure collisions just won't happen.

@larryhastings
Copy link
Contributor

I agree, and I was going to write that in my directions for writing blurb files by hand. There'll be a slot for it in the filename and it should be non-empty.

FWIW I now generate the nonce as follows: compute the MD5 of the body text, convert to urlsafe base64, and use just the first six characters.

@larryhastings
Copy link
Contributor

Oh, and, one persnickety pedantic reply I meant to make @brettcannon:

We're starting off with no metadata required;

Well, we're starting off with no metadata specified inside the file. There are four pieces of metadata: section, bpo, date, and nonce. But we're storing all those in the filename.

The modern version of "tidy" will actually be a single blurb file containing multiple entries, each delimited by .. lines, with this metadata stored as actual metadata (e.g. .. section: Library).

@brettcannon
Copy link
Member Author

@larryhastings Glad we agree on the nonce bit. 😄 And the calculated value sounds 👍

And yes, you're right about no metadata in the file; allergies have hit me hard and so my mental capacity isn't 💯 at the moment.

@larryhastings
Copy link
Contributor

I've updated blurb on github ( https://github.com/larryhastings/blurb ). It's updated to this modern data format. Docs aren't updated yet, sorry.

Other changes:

  • Instead of a zillion files, it maintains one big file per version. All metadata is preserved.
  • When you blurb release a new version, it merges all the next files into a single file for that version.
  • blurb now aggressively reflows text. If you blurb split then blurb merge, you'll have a lot of text-reflowing diffs in Misc/NEWS.

One question: do we prefer "Issue #" or "bpo-"? Both are in use right now. blurb prints "Issue #" because it's old school, but I don't have a strong opinion.

@brettcannon
Copy link
Member Author

@larryhastings we're trying to move to "bpo-" to namespace issue numbers.

@vstinner
Copy link
Member

Hello, what's the progress on that issue? (Sorry, too lazy to read this super long issue.) Maybe send a report to python-dev?

@brettcannon
Copy link
Member Author

brettcannon commented May 16, 2017

@Haypo or just attend the language summit tomorrow. 😉 (And if you can't make it, then just trust that @larryhastings and I are working on rolling this out 😄).

@larryhastings
Copy link
Contributor

In Misc/NEWS a bunch of entries look like this::

- [Security] Issue #27278: Fix os.urandom() implementation using getrandom() on
  Linux.  Truncate size to INT_MAX and loop until we collected enough random
  bytes, instead of casting a directly Py_ssize_t to int.

Apart from screwing up my parser, these entries seem to think that tagging an entry with [Security] is meaningful somehow. I'm not sure it is. Maybe?

I'm going to have to detect the [Security] part in order to parse properly. I can just throw it away, or I could do any of the following:

  • Add a metadata field that says ".. security: True"
  • Add *(Security)* to the end of the entry in Misc/NEWS when I generate it (blurb merge).

My inclination is to add the metadata field now, and we can decide on formatting stuff later.

@brettcannon
Copy link
Member Author

@larryhastings adding at the end is fine by me. I think it's just a marker for Linux vendors and the like to especially pay attention to a specific news entry.

@ncoghlan
Copy link
Contributor

Aye, flagging things specifically as security issues serves as a "You should backport this even if none of your customers explicitly request it" marker for redistributors that standardise on a particular maintenance release as their baseline version. It also helps in keeping the list at http://python-security.readthedocs.io/ up to date.

However, the exact format doesn't matter, so long as its noted somewhere, and can be added after the fact if a bug fix is later determined to have security implications.

@vstinner
Copy link
Member

vstinner commented May 25, 2017 via email

@larryhastings
Copy link
Contributor

Can we close this? python/devguide PR #212 is blocking on it.

@brettcannon
Copy link
Member Author

Yep, we can close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants