On Oct 11, 2019, at 07:11, Lars Kurth <lars.kurth@xxxxxxxxxx> wrote:
On 11/10/2019, 02:24, "Stefano Stabellini" <sstabellini@xxxxxxxxxx> wrote:
On Thu, 10 Oct 2019, Lars Kurth wrote:
* Would we ever include API docs generated from GPLv2 code? E.g. for safety use-cases?
@Stefano, @Artem: I guess this one is for you.
I suppose we would have a similar issue for a safety manual.
I am also assuming we would want to use sphinx docs and rst to generate a future safety manual.
Hi Lars,

Thanks for putting this email together. In terms of formats, I don't have a preference between rst and pandoc, but if we are going to use rst going forward, I'd say to try to use rst for everything, including converting all the old stuff. The fewer different formats, the better.

I think the proposal that needs to follow on from this (which would at some point need to be voted on) would then be to go for rst.

As I mentioned during the FuSa call, I agree with you, Andrew, and others that it would be best to have the docs under a CC license. I do expect that we'll end up copy/pasting snippets of in-code comments into the docs, so I think it is important that we are allowed to do that from a license perspective. It is great that GPLv2 allows it (we need to be sure about this).

The GPL does *not* allow this, but copyright law and fair use clauses do. So typically:
* Referring to function names, signatures, etc. tends to be fine
* Copying large portions of in-line comments would not be fine, but if they are large, they would in most cases be re-written in more suitable language.
So, I think overall, we should be fine. It's a bit of a grey area though.
And as you point out below, most of the code in question is typically BSD.

Yes, I expect that some docs might be automatically generated, but from header files, not from source code. Especially public/ header files, which are typically BSD, not GPLv2. I cannot come up with examples of docs we need to generate from GPLv2-only code at the moment, hopefully there won't be any.

That makes things a lot easier.

I wasn't planning on reusing any of the markup, and wasn't expecting to
use much of the text either. I'm still considering the option of
defining that xen/public/* isn't the canonical description of the ABI,
because C is the wrong tool for the job.
It's fine to provide a C set of headers implementing an ABI, but there is
a very deliberate reason why the canonical migration v2 spec is in a
text document.
@Stefano: as you and I believe Brian will be spending time on improving the
ABI docs, I think we need to build some agreement here on what/how
to do it. I was assuming that generally the consensus was to have
docs close to the code in source, but this does not seem to be the case.
But if we do have stuff separately, ideally we would have a tool that helps
point people editing headers to also look at the relevant docs. Otherwise it will
be hard to keep them in sync.
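
A minimal sketch of what such a reminder tool could look like, purely as an illustration. The mapping, the glob patterns and the doc paths below are all hypothetical and not taken from the Xen tree; a real tool would need an agreed layout first.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from header patterns to the documents describing them.
# Neither the patterns nor the doc paths are taken from the Xen tree.
HEADER_DOCS = {
    "xen/include/public/*.h": ["docs/abi/overview.rst"],
    "xen/include/public/io/*.h": ["docs/abi/io-protocols.rst"],
}


def docs_to_review(changed_files, mapping=HEADER_DOCS):
    """Return {changed header: sorted list of docs that should be re-checked}."""
    reminders = {}
    for path in changed_files:
        docs = set()
        for pattern, targets in mapping.items():
            # Note: fnmatch-style "*" also matches "/", so a broad pattern
            # covers headers in subdirectories as well.
            if fnmatchcase(path, pattern):
                docs.update(targets)
        if docs:
            reminders[path] = sorted(docs)
    return reminders


if __name__ == "__main__":
    changed = ["xen/include/public/grant_table.h", "xen/arch/x86/mm.c"]
    for header, docs in docs_to_review(changed).items():
        print(f"{header}: please also review {', '.join(docs)}")
```

The list of changed files could come from something like `git diff --name-only`, so the check could run as a commit hook or in CI; whether that is the right place for it is an open question.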
In general, it is a good idea to keep the docs close to the code to make it easier to keep them up to date. But there is no one-size-fits-all here. For public ABI descriptions, I agree with Andrew that ideally they should not be defined as C header files. But it is not an issue: any work that we do here won't be wasted. For instance, we could start by adding more comments to the current header files. Then, as a second step, take all the comments and turn them into a proper ABI description document without any C function declarations. It is easy to move English text around, as long as the license allows it -- that is the only potential blocker I can see.

This is likely to be problematic. First of all, we are talking about BSD-3-Clause
or BSD-2-Clause code (the latter is more dominant in headers I believe) in
all known cases.

The main properties of the BSD licenses are
1: Can be pretty much used anywhere for any purpose
2: Can be modified for any purpose
3: But the original license header must be retained in derivatives
This is equivalent to attribution of the copyright owner of the originally created file.
Does *not* have requirements around attribution as CC-BY-4 does: however, as we store everything in git, attribution is handled by us by default
See above, the license header attributes copyright, since BSD was created for "software" and people who work on "software" would typically be looking at source code, hence the primary attribution takes place there, with secondary attribution in EULAs, "About" panels, etc.
CC-BY-4 also has properties 1-3
In addition, it does require that
4: Derived works give appropriate credit to authors
We could clarify in a COPYING file how we prefer to do this
4.1: We could say that "referring to the Xen Project community" is sufficient to comply with the attribution clause
One motivation for CC-BY (with attribution) is to create an incentive (credit) for the creation of documentation, which is not commonly a favorite pastime of developers. Credit typically goes at least to the original author of a section of documentation, with varying ways of crediting subsequent contributors. The documentation can be structured to make crediting easier. The mechanism for crediting can be designed to encourage specific outcomes, along our projected doc lifecycle for safety certification, contributors, evaluators and commercial investors.
4.2: We could require individual authors to be credited: in that case we probably ought to lead by example and list the authors in a credit/license section and extract the information from git logs when we generate it (at some point in the future)
5: You give an indication whether you made changes ... in practice this means you have to state significant changes made to the works
This is also helpful for provenance of changes, which is relevant in safety-oriented documentation. It can be used to clearly delineate CC-licensed content (which may be reused by many companies) from "All Rights Reserved" commercial content that may be added for a specific commercial audience or purpose.
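
For the option in 4.2, the credit section could in principle be generated from git history. A minimal sketch follows, assuming author names as reported by `git log --format=%aN` are an acceptable form of credit. The function names and the rst layout are hypothetical, not an agreed format.

```python
import subprocess
from collections import Counter


def parse_authors(log_output):
    """Count commits per author from `git log --format=%aN` output."""
    names = [line.strip() for line in log_output.splitlines() if line.strip()]
    return Counter(names)


def credits_section(counts, title="Credits"):
    """Render an rst-style credits list: most commits first, ties by name."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    lines = [title, "=" * len(title), ""]
    lines += [f"* {name}" for name, _ in ordered]
    return "\n".join(lines)


def authors_for_path(path, repo="."):
    """Ask git for the author names of all commits touching `path`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%aN", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_authors(out)
```

Calling `credits_section(authors_for_path("docs/"))` inside a checkout would then produce a list that could be pasted into, or generated as, a credit/license section. Ordering by commit count is just one possible policy.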
There is a difference between "software" which "runs on machines" and "documentation" which "runs on humans". Combined software (e.g. BSD code from two origins) is executed identically, despite origin. Humans make value judgements based on the author/origin of content, hence the focus on attribution. Yes, there is a provenance graph in git (software/data), but that's not typically visible to human readers, except as a generated report, i.e. documentation.
As such, BSD-2/3-Clause in our context works similarly to CC-BY-4 from a downstream's perspective. In fact, CC-BY-4 is somewhat stricter.
If we don't want the incentives and provenance properties of CC-BY, there is the option of CC0, which is the equivalent of public domain. This would delegate the task of separating commercial vs CC content to each reader, without any license-required attribution or separation.
Some background on licenses designed for documentation, which has different legal requirements than software:
Rich