public inbox for cygwin-apps@cygwin.com
* [PATCH cygport] Add check of SPDX expression provided by LICENSE variable
@ 2024-04-30 17:45 Christian Franke
  2024-04-30 18:10 ` Brian Inglis
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Franke @ 2024-04-30 17:45 UTC
  To: cygwin-apps

[-- Attachment #1: Type: text/plain, Size: 829 bytes --]

Jon Turney via Cygwin-apps wrote (thread "[PATCH cygport] Add 
repro-finish command"):
> ...
>> PS: I have a local script which checks SPDX Identifiers and 
>> expressions. Any interest to add this to cygport and then check 
>> LICENSE settings?
>
> Oh, yes please. That sounds like a good idea.
>

Attached.

The new script uses the SPDX webpages to create the license file. I 
didn't find a usable single license list at https://github.com/spdx

The data/spdx-licenses file is not included in the patch. It could be 
generated from the source dir with:

$ tools/spdx-check -f data/spdx-licenses -m
...
data/spdx-licenses: created

$ sha1sum data/spdx-licenses
80a19d6891d08bf34113464464ee12308374c792 *data/spdx-licenses

The changes to the meson files are guessed. I didn't test the meson 
build yet.

-- 
Regards,
Christian


[-- Attachment #2: 0001-Add-check-of-SPDX-expression-provided-by-LICENSE-var.patch --]
[-- Type: text/plain, Size: 7345 bytes --]

From 61f75757fa8e9118207cc09cf4a621aac8a4da78 Mon Sep 17 00:00:00 2001
From: Christian Franke <christian.franke@t-online.de>
Date: Tue, 30 Apr 2024 19:28:01 +0200
Subject: [PATCH] Add check of SPDX expression provided by LICENSE variable

The new script 'tools/spdx-check' checks an SPDX license expression.
License identifiers are provided by the new file 'spdx-licenses',
which can be created by the script from the related SPDX webpages.
---
 bin/cygport.in    |  17 ++++
 data/meson.build  |   1 +
 tools/meson.build |   1 +
 tools/spdx-check  | 198 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 217 insertions(+)
 create mode 100644 tools/spdx-check

diff --git a/bin/cygport.in b/bin/cygport.in
index 15bd559e..3166beba 100755
--- a/bin/cygport.in
+++ b/bin/cygport.in
@@ -41,6 +41,7 @@ declare -r  _cygport_version=@VERSION@;
 declare -r _privdatadir=@pkgdatadir@;
 declare -r _privclassdir=@cygclassdir@;
 declare -r _privlibdir=@cygpartdir@;
+declare -r _privtoolsdir=@pkgdatadir@/tools;
 declare -r _privgnuconfigdir=@gnuconfigdir@;
 declare -r _privsysconfdir=@sysconfdir@;
 
@@ -489,6 +490,22 @@ do
 	fi
 done
 
+if [ "${LICENSE+y}" = "y" ]
+then
+	if ! _out=$(${_privtoolsdir}/spdx-check -f ${_privdatadir}/spdx-licenses "${LICENSE}" 2>&1)
+	then
+		warning "LICENSE='${LICENSE}' is invalid:"
+		echo "${_out}"
+	elif [ "${_out:+y}" = "y" ]
+	then
+		warning "LICENSE='${LICENSE}' has warnings:"
+		echo "${_out}"
+	else
+		inform "LICENSE='${LICENSE}' is valid"
+	fi
+	unset _out
+fi
+
 for restrict in ${RESTRICT//,/ }
 do
 	declare _CYGPORT_RESTRICT_${restrict//-/_}_=1
diff --git a/data/meson.build b/data/meson.build
index 51c6a5fd..e83a90fe 100644
--- a/data/meson.build
+++ b/data/meson.build
@@ -2,6 +2,7 @@ datadocs = files('cygport.conf', 'mirrors')
 
 install_data('mirrors',
              'sample.cygport',
+             'spdx-licenses',
              install_dir: pkgdatadir)
 
 install_data('gnuconfig/config.guess',
diff --git a/tools/meson.build b/tools/meson.build
index acd83926..96d8d19e 100644
--- a/tools/meson.build
+++ b/tools/meson.build
@@ -1,6 +1,7 @@
 tools = files(
     'deb2targz',
     'pkgrip',
+    'spdx-check',
     'sysrootize'
 )
 
diff --git a/tools/spdx-check b/tools/spdx-check
new file mode 100644
index 00000000..bffcaae0
--- /dev/null
+++ b/tools/spdx-check
@@ -0,0 +1,198 @@
+#! /bin/bash
+###############################################################################
+#
+# spdx-check - check SPDX license expression
+#
+# Copyright (C) 2024 Christian Franke
+#
+# SPDX-License-Identifier: BSD-3-Clause
+#
+################################################################################
+
+set -e -o pipefail
+myname=$0
+
+# SPDX license list web pages
+spdx_url_lic="https://spdx.org/licenses/index.html"
+spdx_url_exc="https://spdx.org/licenses/exceptions-index.html"
+
+# Default license file
+def_spdx_file="$(dirname "$myname")/spdx-licenses"
+
+usage()
+{
+  cat <<EOF
+Check SPDX license expression.
+
+Usage: $myname [-f FILE] [-m] [-u] 'SPDX_EXPR'
+       $myname [-f FILE] -m|-u
+
+  -f          read license identifiers from FILE
+              [default: $def_spdx_file]
+  -m          create missing license file from SPDX webpages
+  -u          always update the license file
+
+  SPDX_EXPR   check this SPDX license expression
+EOF
+  exit 1
+}
+
+error()
+{
+  echo "Error:" "$@" >&2
+  exit 1
+}
+
+warning()
+{
+  echo "Warning:" "$@" >&2
+}
+
+check_spdx_id()
+{
+  local id=$1
+  local m m_id
+
+  if ! [ -f "$spdx_file" ]; then
+    warning "Missing '$spdx_file' - SPDX identifier '$1' not checked"
+    return 0
+  fi
+
+  # SPDX identifiers are case insensitive but the correct case is recommended
+  m=$(grep -Ei -m 1 "^!?&?${id//+/\\+}\$" "$spdx_file" 2>/dev/null) \
+    || error "Unknown SPDX identifier '$id'"
+
+  # TODO: Distinguish licenses and exceptions
+  m_id=${m#!}; m_id=${m_id#&}
+
+  [ "$m_id" = "$id" ] || warning "It is recommended to use '$m_id' instead of '$id'"
+  [ "$m" = "${m#!}" ] || warning "SPDX identifier '$m_id' is deprecated"
+}
+
+check_spdx_expr()
+{
+  local x=$1
+  local f s t
+
+  # Insert spaces around tokens to simplify parsing
+  x=" $x "; x=${x//(/ ( }; x=${x//)/ ) }
+
+  # Check tokens
+  f=false
+  for t in $x; do
+    f=true
+    case $t in
+      AND|OR|WITH|[\(\)])
+        ;;
+      [Aa][Nn][Dd]|[Oo][Rr]|[Ww][Ii][Tt][Hh])
+        error "Invalid token '$t' - use '${t@U}' instead" ;;
+      [0-9A-Za-z]*)
+        s=${t%+}; s=${s//[-.0-9A-Za-z]/}
+        [ -z "$s" ] || error "Invalid character(s) '$s' in '$t'" ;;
+      *)
+        error "Invalid token '$t'" ;;
+    esac
+  done
+  $f || error "Expression is empty"
+
+  # Check expression syntax heuristically using these replacements:
+  # - all operators -> '%'
+  # - all operands -> '@'
+  # - remove all spaces
+  # - '(@...%@)' -> '@' -- in a 'loop' to restart from the beginning
+  # - '@...%@' -> '' -- syntax error if nonempty
+  s=$(
+    sed -E \
+      -e 's/ (AND|OR|WITH) / % /g' \
+      -e 's/ [^ %()@]+ / @ /g' \
+      -e 's/ //g' \
+      -e ':loop' \
+      -e   's/\(@(%@)*\)/@/g' \
+      -e   't loop' \
+      -e 's/^@(%@)*$//' \
+      <<<"$x"
+  )
+  [ -z "$s" ] || error "Invalid syntax of SPDX expression"
+
+  # Check license identifiers
+  for t in $x; do
+    case $t in
+      AND|OR|WITH|[\(\)]) ;;
+      *) check_spdx_id "$t"
+    esac
+  done
+}
+
+# Extract identifiers from SPDX webpage and prepend flags
+html2id()
+{
+  sed -n \
+      -e '1,/<h2[^>]*>Deprecated/s/^.*<code property="spdx:licenseId">\([^>]*\)<.*$/\1/p' \
+      -e '/<h2[^>]*>Deprecated/,$s/^.*<code property="spdx:licenseId">\([^>]*\)<.*$/!\1/p' \
+      -e '1,/<h2[^>]*>Deprecated/s/^.*<code property="spdx:licenseExceptionId">\([^>]*\)<.*$/\&\1/p' \
+      -e '/<h2[^>]*>Deprecated/,$s/^.*<code property="spdx:licenseExceptionId">\([^>]*\)<.*$/!\&\1/p'
+}
+
+get_spdx_file()
+{
+  local f="$spdx_file.new"
+  local n1 n2
+
+  cat <<EOF >"$f"
+# List of SPDX identifiers
+#
+# Created by '${myname##*/}' from these web pages:
+#   $spdx_url_lic
+#   $spdx_url_exc
+#
+# Flags: '!' - deprecated, '&' - license exception
+#
+EOF
+
+  # Download and extract identifiers
+  wget -O - "$spdx_url_lic"  | html2id >>"$f"
+  n1=$(wc -l <"$f")
+  echo "#" >>"$f"
+  wget -O - "$spdx_url_exc"  | html2id >>"$f"
+  n2=$(wc -l <"$f")
+
+  # Check length to detect download problems
+  { [[ n1 -gt 500 ]] && [[ $((n2-n1)) -gt 50 ]]; } || error "$f: File too small"
+
+  if [ -f "$spdx_file" ]; then
+    # Keep old file if unchanged otherwise keep it as backup
+    if cmp -s "$spdx_file" "$f"; then
+      echo "$spdx_file: unchanged"
+      rm -f "$f"
+    else
+      mv -bf "$f" "$spdx_file"
+      echo "$spdx_file: updated"
+    fi
+  else
+    mv -f "$f" "$spdx_file"
+    echo "$spdx_file: created"
+  fi
+}
+
+# Parse options
+spdx_file=$def_spdx_file
+m_opt=false; u_opt=false
+
+while true; do case $1 in
+  -f) [ $# -gt 1 ] || usage; shift; spdx_file=$1 ;;
+  -m) m_opt=true ;;
+  -u) u_opt=true; m_opt=true ;;
+  -*) usage ;;
+  *) break ;;
+esac; shift; done
+{ [ $# = 0 ] && $m_opt; } || [ $# = 1 ] || usage
+
+# Create or update the license file if requested
+if $u_opt || { $m_opt && ! [ -f "$spdx_file" ]; }; then
+  get_spdx_file
+fi
+
+# Check license expression
+if [ $# = 1 ]; then
+  check_spdx_expr "$1"
+fi
-- 
2.43.0
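
The syntax heuristic described in the comments of check_spdx_expr can be tried in isolation. The following is only a sketch under stated assumptions: the function name `spdx_syntax_residue` is hypothetical and not part of the patch, and the parenthesis spacing is done with sed instead of bash substitutions. It prints whatever remains after the reduction, so an empty result means the expression passed the heuristic. Note that WITH is treated like AND and OR here, as in the patch, so operator placement is only checked loosely.

```shell
# Hypothetical helper, not part of the patch: apply the same reduction
# as check_spdx_expr and print the residue (empty = syntax accepted).
spdx_syntax_residue()
{
  local x=" $1 "

  # Insert spaces around parentheses so every token is space-delimited
  x=$(printf '%s\n' "$x" | sed -e 's/(/ ( /g' -e 's/)/ ) /g')

  # operators -> '%', operands -> '@', drop spaces,
  # then collapse '(@%@...)' groups to '@' until nothing changes
  printf '%s\n' "$x" | sed -E \
    -e 's/ (AND|OR|WITH) / % /g' \
    -e 's/ [^ %()@]+ / @ /g' \
    -e 's/ //g' \
    -e ':loop' \
    -e 's/\(@(%@)*\)/@/g' \
    -e 't loop' \
    -e 's/^@(%@)*$//'
}
```

A caller would treat any non-empty output as a syntax error, for example `[ -z "$(spdx_syntax_residue "$LICENSE")" ] || echo "invalid"`.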



* Re: [PATCH cygport] Add check of SPDX expression provided by LICENSE variable
  2024-04-30 17:45 [PATCH cygport] Add check of SPDX expression provided by LICENSE variable Christian Franke
@ 2024-04-30 18:10 ` Brian Inglis
  2024-04-30 21:07   ` Christian Franke
  0 siblings, 1 reply; 5+ messages in thread
From: Brian Inglis @ 2024-04-30 18:10 UTC
  To: cygwin-apps

On 2024-04-30 11:45, Christian Franke via Cygwin-apps wrote:
> Jon Turney via Cygwin-apps wrote:
>>> PS: I have a local script which checks SPDX Identifiers and expressions. Any 
>>> interest to add this to cygport and then check LICENSE settings?
>> Oh, yes please. That sounds like a good idea.

> Attached.
> The new script uses the SPDX webpages to create the license file. I didn't find 
> a usable single license list at https://github.com/spdx

What about:

	https://spdx.github.io/license-list-data/

and everything under:

	https://github.com/spdx/license-list-data

> The data/spdx-licenses file is not included in the patch. It could be generated 
> from the source dir with:
> $ tools/spdx-check -f data/spdx-licenses -m
> ...
> data/spdx-licenses: created
> $ sha1sum data/spdx-licenses
> 80a19d6891d08bf34113464464ee12308374c792 *data/spdx-licenses
> The changes to the meson files are guessed. I didn't test the meson build yet.

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                 -- Antoine de Saint-Exupéry



* Re: [PATCH cygport] Add check of SPDX expression provided by LICENSE variable
  2024-04-30 18:10 ` Brian Inglis
@ 2024-04-30 21:07   ` Christian Franke
  2024-04-30 22:12     ` Brian Inglis
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Franke @ 2024-04-30 21:07 UTC
  To: cygwin-apps

Brian Inglis via Cygwin-apps wrote:
> On 2024-04-30 11:45, Christian Franke via Cygwin-apps wrote:
> ...
>> Attached.
>> The new script uses the SPDX webpages to create the license file. I 
>> didn't find a usable single license list at https://github.com/spdx
>
> What about:
>
>     https://spdx.github.io/license-list-data/
>

This is apparently a draft version of 
https://spdx.org/licenses/index.html which is used by the script to 
generate the local license file.


> and everything under:
>
>     https://github.com/spdx/license-list-data

I didn't find a single file which lists the licenses there.



* Re: [PATCH cygport] Add check of SPDX expression provided by LICENSE variable
  2024-04-30 21:07   ` Christian Franke
@ 2024-04-30 22:12     ` Brian Inglis
  2024-05-01 10:56       ` Christian Franke
  0 siblings, 1 reply; 5+ messages in thread
From: Brian Inglis @ 2024-04-30 22:12 UTC
  To: cygwin-apps

On 2024-04-30 15:07, Christian Franke via Cygwin-apps wrote:
> Brian Inglis via Cygwin-apps wrote:
>> On 2024-04-30 11:45, Christian Franke via Cygwin-apps wrote:
>>> The new script uses the SPDX webpages to create the license file. I didn't 
>>> find a usable single license list at https://github.com/spdx

As usual, it is easier if you clearly state the purpose of the file you want, 
and its desired properties, like data content, format, etc.

>> What about:
>>     https://spdx.github.io/license-list-data/

> This is apparently a draft version of https://spdx.org/licenses/index.html which 
> is used by the script to generate the local license file.

Strip out the table entries and create what you want with a command or script.

>> and everything under:
>>     https://github.com/spdx/license-list-data

> I didn't find a single file which lists the licenses there.

GH does not always make access easy, with its limited online displays and fixed 
display orders, and searches return a lot of junk, without easy access to better 
searching in context, but try:

	https://github.com/spdx/license-list-data/blob/main/licenses.md

which also has xrefs to the text files; also there are:

	https://github.com/spdx/license-list-data/blob/main/json/licenses.json
	https://github.com/spdx/license-list-data/blob/main/json/exceptions.json

which can be easily processed using `jq`.

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada



* Re: [PATCH cygport] Add check of SPDX expression provided by LICENSE variable
  2024-04-30 22:12     ` Brian Inglis
@ 2024-05-01 10:56       ` Christian Franke
  0 siblings, 0 replies; 5+ messages in thread
From: Christian Franke @ 2024-05-01 10:56 UTC
  To: cygwin-apps

Brian Inglis via Cygwin-apps wrote:
> On 2024-04-30 15:07, Christian Franke via Cygwin-apps wrote:
>> Brian Inglis via Cygwin-apps wrote:
>>> On 2024-04-30 11:45, Christian Franke via Cygwin-apps wrote:
>>>> The new script uses the SPDX webpages to create the license file. I 
>>>> didn't find a usable single license list at https://github.com/spdx
>
> As usual, it is easier if you clearly state the purpose of the file 
> you want, and its desired properties, like data content, format, etc.
>
>>> What about:
>>>     https://spdx.github.io/license-list-data/
>
>> This is apparently a draft version of 
>> https://spdx.org/licenses/index.html which is used by the script to 
>> generate the local license file.
>
> Strip out the table entries and create what you want with a command or 
> script.

The spdx-check script from the patch optionally (-m, -u) downloads 
https://spdx.org/licenses/index.html and creates the local spdx-licenses 
file, which is intended to be distributed with cygport. The file is 
grep'able and reduced to the bare minimum for this use case.
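
To illustrate what "grep'able" means for this format, here is a minimal sketch of a lookup against such a file. The function name `lookup_spdx_id` is hypothetical and not part of the patch. Unlike the script, this variant also escapes '.' before building the regular expression, which is my own assumption about what a stricter match might look like.

```shell
# Hypothetical helper, not part of the patch: print the line of license
# file $1 that matches identifier $2, including its flags
# ('!' = deprecated, '&' = license exception). Fails if nothing matches.
lookup_spdx_id()
{
  local file=$1 id=$2 re

  # Escape the regex metacharacters that can occur in identifiers
  re=$(printf '%s\n' "$id" | sed 's/[.+]/\\&/g')

  # Case-insensitive match of the whole line, optional flags in front
  grep -Ei -m 1 "^!?&?${re}\$" "$file"
}
```

For example, `lookup_spdx_id data/spdx-licenses 'gpl-2.0+'` would print the stored spelling with its flags if the file contains that identifier.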


>
>>> and everything under:
>>>     https://github.com/spdx/license-list-data
>
>> I didn't find a single file which lists the licenses there.
>
> GH does not always make access easy, ...

... including that github.com is still unreachable via IPv6 without 
NAT64 (except for downloads from raw.githubusercontent.com) ...


> ... with its limited online displays and fixed display orders, and 
> searches return a lot of junk, without easy access to better searching 
> in context, but try:
>
>     https://github.com/spdx/license-list-data/blob/main/licenses.md
>
> which also has xrefs to the text files; also there are:
>
>     https://github.com/spdx/license-list-data/blob/main/json/licenses.json 
>
>     https://github.com/spdx/license-list-data/blob/main/json/exceptions.json 
>
>
> which can be easily processed using `jq`.
>

Indeed, thanks. I obviously missed these files when I wrote the 
spdx-check script some months ago.

The current file format used by the script could then be created with:

url="https://raw.githubusercontent.com/spdx/license-list-data/main/json"

wget -O - "$url/licenses.json" \
| jq -j '
     .licenses[] | (
       if .isDeprecatedLicenseId then "!" else "" end,
       .licenseId,
       "\n"
     )'

wget -O - "$url/exceptions.json" \
| jq -j '
     .exceptions[] | (
       if .isDeprecatedLicenseId then "!&" else "&" end,
       .licenseExceptionId,
       "\n"
     )'

This adds these license ids not yet mentioned at 
https://spdx.org/licenses/index.html:
AMD-newlib, BSD-2-clause-first-lines, Catharon, HPND-UC-export-US,
MIT-Khronos-old, NCL, OAR, Sun-PPP-2000, pkgconf, threeparttable, xzoom
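
A difference like the one above could be computed from two license files in the script's format, for instance one built from the web pages and one built from the JSON files. This is only a sketch: the function names `strip_flags` and `new_spdx_ids` are hypothetical, and it is not necessarily how the list above was obtained.

```shell
# Hypothetical helpers, not part of the patch.

# Print the bare identifiers of license file $1: drop comment lines,
# remove the '!' and '&' flags, and sort for use with comm.
strip_flags()
{
  sed -e '/^#/d' -e 's/^!//' -e 's/^&//' "$1" | LC_ALL=C sort -u
}

# Print identifiers that occur in file $2 but not in file $1.
new_spdx_ids()
{
  local old new
  old=$(mktemp) && new=$(mktemp) || return 1
  strip_flags "$1" >"$old"
  strip_flags "$2" >"$new"
  # Column 3 of comm would be common lines, column 1 lines only in $old;
  # suppressing both leaves the lines found only in $new
  LC_ALL=C comm -13 "$old" "$new"
  rm -f "$old" "$new"
}
```

Usage would be along the lines of `new_spdx_ids spdx-licenses.html spdx-licenses.json`, where both file names are placeholders.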

I could provide a new patch with an updated script if desired.


