public inbox for newlib@sourceware.org
From: John Scott <jscott@posteo.net>
To: newlib@sourceware.org
Subject: Re: (was: Newlib copyright review) and SPDX tagging to REUSE spec RFC
Date: Sat, 12 Aug 2023 19:05:59 +0000
Message-ID: <423d55baa97832044e72c53e7b1182e25f6f9196.camel@posteo.net>
In-Reply-To: <ec61074c-067b-5a4b-0039-f3e7d99971d5@Shaw.ca>

On Fri, 2023-08-11 at 12:23 -0600, Brian Inglis wrote:
> You could provide a few links to REUSE (try web searching that!) and SPDX materials to explain what you are doing to those who have not yet encountered the REUSE and SPDX projects and tools.
You're right. The spec is pioneered by the Free Software Foundation Europe and is available at https://reuse.software/spec/ .

Before I elaborate, I'd like to clarify one thing. It's not that I want Newlib to be REUSE-compliant per se; it's that I want Newlib's copyright and license information to be machine-readable, and REUSE just so happens to be an existing specification for a robust way to do that. I have no objections to deviating from the specification if we so need, but I don't think that will be necessary. This is primarily all about making the lives of Newlib users and packagers easier and reducing duplicated effort.

In Debian, we have something similar to REUSE: machine-readable copyright files. These are files in a simple format, typically written *by hand*, that document the copyright and license notices in every single source code file. This isn't technically required, but it's the gold standard, and since I do not have uploading privileges in Debian (yet), having machine-readable license information is the best way to convince my mentors that I've done my homework and that we really have the right to redistribute Newlib.

Again, normally Debian Developers go through every single file by hand in the upstream source code, and they have to because there's no standardized format (until now) for what a copyright or license notice looks like that can be detected by tools.

The REUSE specification is very simple. You:
 1. Add 'SPDX-License-Identifier: license-name' to every source code file. This doesn't have to replace standard license headers; you can leave those if you want, but a lot of projects like to get rid of them since they're unnecessary.
 2. Add something like 'SPDX-FileCopyrightText: 2023 John Scott <jscott@posteo.net>' to document who the copyright holder actually is.
 3. Put a copy of every license in a top-level LICENSES folder, with each file named appropriately. One problem this solves is that a file just called 'COPYING' can be ambiguous: if it contains, say, a copy of the GPL v3, does that mean GPL v3 only, or GPL v3 or any later version? REUSE says you're supposed to disambiguate the license names.
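
As a rough illustration of what steps 1 and 2 buy a consumer, here is a minimal Python sketch that pulls the two tags out of a file's text. It is not part of any REUSE tooling; the sample header, the BSD-3-Clause choice, and the function name read_spdx_tags are all assumptions made up for the example.

```python
import re

# Hypothetical sample: a C source file annotated per steps 1 and 2.
# The license identifier here is an arbitrary choice for illustration.
SAMPLE = """\
/*
 * SPDX-FileCopyrightText: 2023 John Scott <jscott@posteo.net>
 * SPDX-License-Identifier: BSD-3-Clause
 */
int main(void) { return 0; }
"""

# Matches either tag anywhere on a line, so the comment syntax
# of the host language does not matter.
TAG = re.compile(r"SPDX-(License-Identifier|FileCopyrightText):[ \t]*(.+)")


def read_spdx_tags(text):
    """Collect the values of both SPDX tags found in a file's text."""
    tags = {"License-Identifier": [], "FileCopyrightText": []}
    for line in text.splitlines():
        match = TAG.search(line)
        if match:
            value = match.group(2).strip()
            # Drop a trailing C comment closer if the tag shares its line.
            if value.endswith("*/"):
                value = value[:-2].strip()
            tags[match.group(1)].append(value)
    return tags


if __name__ == "__main__":
    print(read_spdx_tags(SAMPLE))
```

Because the tags have a fixed spelling, a plain line scan like this is enough; no fuzzy matching of license text is involved.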

> REUSE specifies the outdated 7 year old SPDX 2.1 spec: will newer versions (currently 2.3) be allowed and supported?
I'm sure they will be someday; I'll ask.

> Are you okay with providing your changes, including any REUSE and SPDX cataloguing documents you may create which apply to the project, under some non-GPL licence attribution, that allows the library to continue to be used by contributing and other corps for their commercial purposes?
Absolutely, I could make it public domain. I don't think there's much originality in what I'd be doing, so I think adding a copyright notice for myself to every single file would be inappropriate.

> Could you please outline any changes that you contemplate making to the document tree, such as LICENSES, REUSE, SPDX, etc. directory additions and likely contents?
Sure, just let me know what the project would be most likely to accept, then I'll do the work, and then we can hash out any details that didn't come out as we imagined. Again, when a lot of projects adopt SPDX (which specifies the file tags) or REUSE (which specifies the LICENSES/ folder and other little details), they like to use the SPDX copyright and license notices to replace the license headers and notices that were there, to eliminate redundancy. I think what would be most amenable, and what I suggest, is to:
 * add 'SPDX-FileCopyrightText: ' before existing copyright notices, retaining their substance,
 * retain existing license headers and blurbs when adding the license identifiers,
 * and, since we're going to have all the licenses in a folder called LICENSES/, maybe delete or move whole license files that are elsewhere in the tree.

Files that are not in their source form at all, like configure scripts, will not be touched. This technically violates the REUSE specification (the goal of the REUSE spec is to specify copyright and license info for whole repositories), and I know as a Debian packager I sure hate it when autogenerated files are kept in a VCS as though they were source code, but I know you probably want to keep them and so we'll agree to disagree on that.

> Are you using one of the SPDX tools to match the licence texts, as the variations in BSD, MIT, and Verbatim licences can be confusing, and even when it states a name, it may be called something else by SPDX?
I will not use a tool to match license texts: all of my work will be done completely by hand and laying my eyes on every last line of source code. It's tedious, but that's why I don't think Newlib consumers or packagers should have to do it again, when I am willing to do it for Debian anyway.

I will pay attention to variations in licenses. REUSE says a copyright notice can contain whatever as long as it's considered a copyright notice (you don't have to include a year range, author name, or email address, but in my personal opinion you *should* have at least one or two of those), but REUSE specifies standard names for licenses. This is important for machine readability, so in the SPDX-License-Identifier tags, I intend to use those. For example, the "MIT license" can be kind of an ambiguous name; what most people call the "MIT" license we Debianites like to call the Expat license. However, SPDX standardized on the MIT name, probably because it's what more people know it as, so when I sprinkle License-Identifiers, those are the names I would be using.

Debian Policy (at least right now) is that even if Newlib is REUSE compliant, I am still responsible for maintaining a Debian-style copyright file. Fortunately if Newlib is REUSE-compliant or mostly so, that means all of its information is machine-readable, so it should be a trivial matter to automatically read every file and put it in the format Debian likes.
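
To make the "trivial matter" concrete, here is a hedged Python sketch of turning already-extracted tag values into a Files paragraph of a Debian machine-readable copyright file. The file path, the function name debian_stanza, and the two-entry name mapping are assumptions for illustration only; a real conversion would need a complete mapping, handling of compound license expressions, and the accompanying license-text paragraphs.

```python
# Illustrative subset only: SPDX identifier -> Debian short name.
# Identifiers not listed are passed through unchanged.
SPDX_TO_DEBIAN = {
    "MIT": "Expat",
    "GPL-3.0-or-later": "GPL-3+",
}


def debian_stanza(path, copyrights, spdx_license):
    """Build one Files paragraph from a file's SPDX tag values.

    Assumes a single plain SPDX identifier (no AND/OR expressions)
    and at least one copyright notice.
    """
    if not copyrights:
        raise ValueError("at least one copyright notice is required")
    license_name = SPDX_TO_DEBIAN.get(spdx_license, spdx_license)
    first, *rest = copyrights
    lines = ["Files: " + path, "Copyright: " + first]
    # Continuation lines of a multi-line field must start with whitespace;
    # eleven spaces line them up under the first value.
    lines.extend(" " * 11 + notice for notice in rest)
    lines.append("License: " + license_name)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical path and notice, purely for demonstration.
    print(debian_stanza(
        "libc/example.c",
        ["2023 John Scott <jscott@posteo.net>"],
        "MIT",
    ))
```

The mapping table is where the naming difference mentioned above shows up: the tag in the source file says MIT, while the Debian file says Expat.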

> Could you please document the sources of these tools and how you intend to use them?
I'm not going to use any tools; it's going to be just me and Vim, because if I want people to rely on this machine-readable information, I had better guarantee I'm getting it right. I might use tools like licensecheck just as I would in Debian, but they will not be relied upon at all. In Debian, we're not allowed to rely on license scanners that are fuzzy because we are responsible for *guaranteeing* that our representation of the license information is correct. A REUSE-compliant project means that folks *are* safe to use automated tools, as we've done the job upstream of making information machine-readable *and* making sure it is correct.

> What do you plan to do about uncatalogued licence texts: submit them to SPDX for review and (re-)naming, and/or just create a LicenseRef-Debian-NAME or (preferably?) LicenseRef-newlib-NAME or ExceptionRef-newlib-NAME placeholder?
SPDX includes a lot of licenses. If I run across one they don't have, I don't know that it'd be worth submitting, but even if I did want to advocate for its inclusion (and to be honest I've got bigger fish to fry), I would not want that to hold up my work. I see nothing wrong in the slightest with using LicenseRef-newlib-NAME; that's exactly what that syntax is for: cases where SPDX hasn't assigned an identifier to your precise license.
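
For reference, a small Python sketch of how such a placeholder could be formed and where its text would live under LICENSES/. It assumes the SPDX rule that the part after "LicenseRef-" may contain only letters, digits, "." and "-", and it assumes a ".txt" extension for the license file; the helper names and the "example-notice" name are hypothetical.

```python
import re

# Characters SPDX allows in the user-defined part of a LicenseRef.
IDSTRING = re.compile(r"[A-Za-z0-9.-]+")


def newlib_license_ref(name):
    """Return a LicenseRef-newlib-NAME identifier, rejecting bad names."""
    if not IDSTRING.fullmatch(name):
        raise ValueError("invalid characters in license name: %r" % name)
    return "LicenseRef-newlib-" + name


def license_file(identifier):
    """Path of the license text for an identifier (extension assumed)."""
    return "LICENSES/" + identifier + ".txt"


if __name__ == "__main__":
    ref = newlib_license_ref("example-notice")  # hypothetical license name
    print(ref)
    print(license_file(ref))
```

The same identifier then goes into the SPDX-License-Identifier tag of each file under that license, so the tag and the file name in LICENSES/ stay in step.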

If I don't hear back with an authoritative description of how you guys have decided you would like me to annotate the files, then instead of letting that hold me up I'll just use my best judgment within a couple days on a private Git branch. And if you guys want me to change it, I'll change it, because I want this to happen.

I'm just trying to do a service to the Free Software community, that's why I contribute to Debian, and that's why I want to contribute upstream to the Newlib + Cygwin project, so everyone can benefit.

Happy hacking,
John

