updatedb broken as of findutils 4.8.0-1 due to bigram.exe no longer being provided

public inbox for cygwin@cygwin.com

* updatedb broken as of findutils 4.8.0-1 due to bigram.exe no longer being provided
       [not found] <986736274.144968.1630167325057.ref@mail.yahoo.com>
@ 2021-08-28 16:15 ` Dan Harkless
  2021-08-28 16:23   ` Dan Harkless
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Harkless @ 2021-08-28 16:15 UTC (permalink / raw)
  To: cygwin

Howdy.  Quick bug to report that I was surprised not to be able to find prior discussion of.  My nightly updatedb job stopped working after 2021-08-03, but I'd been too busy to do much troubleshooting until today.
Looks like it's because in findutils 4.8.0-1, the bigram.exe program is no longer provided, but the /usr/bin/updatedb script (still) depends on it being there:
    [...]
    + for binary in $find $frcode $bigram $code
    + checkbinary /usr/libexec/frcode
    + test -x /usr/libexec/frcode 
    + : ok 
    + for binary in $find $frcode $bigram $code 
    + checkbinary /usr/libexec/bigram 
    + test -x /usr/libexec/bigram 
    + eval echo 'updatedb needs to be able to execute /usr/libexec/bigram, but cannot.' 
    ++ echo updatedb needs to be able to execute /usr/libexec/bigram, but cannot. 
    updatedb needs to be able to execute /usr/libexec/bigram, but cannot. 
    + exit 1 

Reverting to findutils 4.6.0-1 brings back bigram, and updatedb works again. Looking at the 4.6.0 updatedb script, it seems computation of bigrams is an essential step in creating the filename DB (with no bigram man page, I was at first misinterpreting it as "big-ram.exe", rather than "bi-gram.exe").
I thought perhaps bigram.exe had been moved to a different package, but https://cygwin.com/cgi-bin2/package-grep.cgi?grep=bigram.exe&arch=x86_64 finds it only in findutils 4.5.12-1 and 4.6.0-1.
I highly depend on the ability to do quick filename greps across all drives, so hopefully the findutils package will be repaired so that updatedb works again.  In the meantime I'll manually keep findutils unupdated on Cygwin32 and Cygwin64 on my various systems.

Thanks, as always, for maintaining the lifeline to UNIX-esque sanity on Windows that is Cygwin!
--Dan Harkless

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: updatedb broken as of findutils 4.8.0-1 due to bigram.exe no longer being provided
  2021-08-28 16:15 ` updatedb broken as of findutils 4.8.0-1 due to bigram.exe no longer being provided Dan Harkless
@ 2021-08-28 16:23   ` Dan Harkless
  2021-08-29 11:02     ` Hans-Bernhard Bröker
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Harkless @ 2021-08-28 16:23 UTC (permalink / raw)
  To: cygwin

On 8/28/2021 9:15 AM, Dan Harkless wrote:
[...]

Sorry for the Webmail-mangled spacing in that last message. Trying again 
here:

Howdy.  Quick bug to report that I was surprised not to be able to find prior discussion of.  My nightly updatedb job stopped working after 2021-08-03, but I'd been too busy to do much troubleshooting until today.

Looks like it's because in findutils 4.8.0-1, the bigram.exe program is no longer provided, but the /usr/bin/updatedb script (still) depends on it being there:

     [...]
     + for binary in $find $frcode $bigram $code
     + checkbinary /usr/libexec/frcode
     + test -x /usr/libexec/frcode
     + : ok
     + for binary in $find $frcode $bigram $code
     + checkbinary /usr/libexec/bigram
     + test -x /usr/libexec/bigram
     + eval echo 'updatedb needs to be able to execute /usr/libexec/bigram, but cannot.'
     ++ echo updatedb needs to be able to execute /usr/libexec/bigram, but cannot.
     updatedb needs to be able to execute /usr/libexec/bigram, but cannot.
     + exit 1

Reverting to findutils 4.6.0-1 brings back bigram, and updatedb works again. Looking at the 4.6.0 updatedb script, it seems computation of bigrams is an essential step in creating the filename DB (with no bigram man page, I was at first misinterpreting it as "big-ram.exe", rather than "bi-gram.exe").

I thought perhaps bigram.exe had been moved to a different package, but https://cygwin.com/cgi-bin2/package-grep.cgi?grep=bigram.exe&arch=x86_64 finds it only in findutils 4.5.12-1 and 4.6.0-1.

I highly depend on the ability to do quick filename greps across all drives, so hopefully the findutils package will be repaired so that updatedb works again.  In the meantime I'll manually keep findutils unupdated on Cygwin32 and Cygwin64 on my various systems.

Thanks, as always, for maintaining the lifeline to UNIX-esque sanity on Windows that is Cygwin!

--
Dan Harkless


* Re: updatedb broken as of findutils 4.8.0-1 due to bigram.exe no longer being provided
  2021-08-28 16:23   ` Dan Harkless
@ 2021-08-29 11:02     ` Hans-Bernhard Bröker
  2021-08-29 12:06       ` Dan Harkless
  0 siblings, 1 reply; 8+ messages in thread
From: Hans-Bernhard Bröker @ 2021-08-29 11:02 UTC (permalink / raw)
  To: cygwin

On 28.08.2021 at 18:23, Dan Harkless wrote:

> Looks like it's because in findutils 4.8.0-1, the bigram.exe program is 
> no longer provided, but the /usr/bin/updatedb script (still) depends on 
> it being there:

> 
>      [...]
>      + for binary in $find $frcode $bigram $code
>      + checkbinary /usr/libexec/frcode

The version of updatedb in the 4.8.0-1 package does not actually contain 
those lines.  Mention of both $bigram and $code has been removed from 
the loop construct (and from everywhere else in the script).

That's because the old format of find databases, which is the only one 
actually using bigram and code, was removed from updatedb as of 
findutils version 4.7, so there really cannot be a need for the bigram 
tool any more.


* Re: updatedb broken as of findutils 4.8.0-1 due to bigram.exe no longer being provided
  2021-08-29 11:02     ` Hans-Bernhard Bröker
@ 2021-08-29 12:06       ` Dan Harkless
  2021-08-30  0:06         ` Brian Inglis
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Harkless @ 2021-08-29 12:06 UTC (permalink / raw)
  To: cygwin; +Cc: bug-findutils

On 8/29/2021 4:02 AM, Hans-Bernhard Bröker wrote:
> On 28.08.2021 at 18:23, Dan Harkless wrote:
>> Looks like it's because in findutils 4.8.0-1, the bigram.exe program 
>> is no longer provided, but the /usr/bin/updatedb script (still) 
>> depends on it being there:
>      [...]
>>      + for binary in $find $frcode $bigram $code
>>      + checkbinary /usr/libexec/frcode
>
> The version of updatedb in the 4.8.0-1 package does not actually 
> contain those lines.  Mention of both $bigram and $code has been 
> removed from the loop construct (and from everywhere else in the script).
>
> That's because the old format of find databases, which is the only one 
> actually using bigram and code, was removed from updatedb as of 
> findutils version 4.7, so there really cannot be a need for the bigram 
> tool any more.

Argh!  So sorry for the false report!  I completely forgot that years 
back I had made a locally patched version (which is earlier in my path) 
of Cygwin updatedb 4.6.0-1 to troubleshoot and work around problems I 
was having with the tool.
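
A shadowing copy like this is easy to check for. The following is a minimal sketch, assuming a POSIX shell; the helper name `list_in_path` is made up for illustration and is not part of findutils or Cygwin:

```shell
# list_in_path NAME: print every executable called NAME found in $PATH,
# in lookup order.  The first line printed is the one the shell would run.
list_in_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH component means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}
```

Running `list_in_path updatedb` and seeing more than one line, with something other than /usr/bin/updatedb first, would point to a local copy taking precedence.  `command -v updatedb` reports only the first match.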

I have 12M+ pathnames on my main Windows system, and I suddenly started 
having issues with the updatedb going from taking less than an hour, to 
taking more than 24 hours, and running into the next job.

It was very awkward to try to troubleshoot what was going on without a 
'find' log to 'tail', so I patched my local copy of updatedb to write 
to an intermediate file, rather than going direct to 'sort' over a pipe.
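
The general idea can be sketched as follows.  This is not the actual patch, only an illustration: the function name `list_then_sort` and its arguments are hypothetical.

```shell
# list_then_sort DIR LISTFILE: write every path under DIR to LISTFILE,
# then write a case-insensitively sorted copy to LISTFILE.sort.
# While find is still running, LISTFILE can be watched from another
# terminal with:  tail -f LISTFILE
list_then_sort() {
    dir=$1
    list=$2
    find "$dir" -print > "$list" || return 1
    # LC_ALL=C keeps the ordering independent of the caller's locale.
    LC_ALL=C sort -f "$list" > "$list.sort" || return 1
}
```

The trade-off is that the whole list is written to disk twice (unsorted and sorted) instead of flowing through a pipe.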

Another problem I was having was that though I have 24 GB of RAM on my 
system, I would get low-memory popup warnings from the OS when the sort 
would go off.  (The warnings mislay the blame on Firefox, because I 
usually have big sessions running that take even more RAM than the sort.)

I was hoping running sort on a _file_ rather than stdin might allow it 
to reduce the RAM use enough to not get the warning, but unfortunately 
(and unsurprisingly) I still get it with the intermediate file.  This is 
just a warning, though — I haven't had it actually run out of RAM so 
far, I don't think.

The final problem I was addressing in my patched version was some 
missing error-checking, which was causing me to be left with _no_ 
filename DB, when the update would fail, rather than at least being left 
with the one from last time.
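
One way to get that behaviour, sketched under assumptions: build the new database under a temporary name, and replace the live file only if the new one is non-empty, keeping a copy of the old one.  `install_new_db` and the `.prev` suffix are illustrative names here, not existing updatedb features.

```shell
# install_new_db NEW LIVE: replace LIVE with NEW only if NEW is non-empty.
# The previous LIVE (if any) is kept as LIVE.prev.  Returns non-zero and
# leaves LIVE untouched when NEW is missing or empty.
install_new_db() {
    new=$1
    live=$2
    if [ ! -s "$new" ]; then
        echo "install_new_db: $new is missing or empty; keeping $live" >&2
        rm -f "$new"
        return 1
    fi
    if [ -e "$live" ]; then
        cp -p "$live" "$live.prev" || return 1
    fi
    # Within one filesystem mv is a rename, so readers never see a
    # partially written database.
    mv -f "$new" "$live"
}
```

A caller would normally run this only after the step that builds the new file has succeeded.  Note that inside `if ! cmd; then ...`, `$?` holds the status of the negated test (0), so the status of `cmd` has to be saved before negating if it is to be reported.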

I could send along my patches, but I don't know that I've solved these 
issues in a general enough way.  For instance, my 12 million+ pathnames 
come out to about 1.4 GiB of UNIX-linefeed-separated UTF-8 strings.  
Writing that much to my HD is not a concern, but obviously some people 
might not want to write that much every time to, say, a small 
flash-based device.

Thoughts?

-- 
Dan Harkless


* Re: updatedb broken as of findutils 4.8.0-1 due to bigram.exe no longer being provided
  2021-08-29 12:06       ` Dan Harkless
@ 2021-08-30  0:06         ` Brian Inglis
  2022-02-24 16:32           ` Patches to findutils 4.9.0-1's updatedb to do locking, allow filenames with spaces & progress monitoring, exclude /dev on Cygwin, etc Dan Harkless
  0 siblings, 1 reply; 8+ messages in thread
From: Brian Inglis @ 2021-08-30  0:06 UTC (permalink / raw)
  To: cygwin

On 2021-08-29 06:06, Dan Harkless wrote:
> On 8/29/2021 4:02 AM, Hans-Bernhard Bröker wrote:
>> On 28.08.2021 at 18:23, Dan Harkless wrote:
>>> Looks like it's because in findutils 4.8.0-1, the bigram.exe program 
>>> is no longer provided, but the /usr/bin/updatedb script (still) 
>>> depends on it being there:
>>      [...]
>>>      + for binary in $find $frcode $bigram $code
>>>      + checkbinary /usr/libexec/frcode

>> The version of updatedb in the 4.8.0-1 package does not actually 
>> contain those lines.  Mention of both $bigram and $code has been 
>> removed from the loop construct (and from everywhere else in the script).
>>
>> That's because the old format of find databases, which is the only one 
>> actually using bigram and code, was removed from updatedb as of 
>> findutils version 4.7, so there really cannot be a need for the bigram 
>> tool any more.

> Argh!  So sorry for the false report!  I completely forgot that years 
> back I had made a locally patched version (which is earlier in my path) 
> of Cygwin updatedb 4.6.0-1 to troubleshoot and work around problems I 
> was having with the tool.
> 
> I have 12M+ pathnames on my main Windows system, and I suddenly started 
> having issues with the updatedb going from taking less than an hour, to 
> taking more than 24 hours, and running into the next job.
> 
> It was very awkward to try to troubleshoot what was going on without a 
> 'find' log to 'tail', so I patched my  local copy of updatedb to write 
> to an intermediate file, rather than going direct to 'sort' over a pipe.
> 
> Another problem I was having was that though I have 24 GB of RAM on my 
> system, I would get low-memory popup warnings from the OS when the sort 
> would go off.  (The warnings mislay the blame on Firefox, because I 
> usually have big sessions running that take even more RAM than the sort.)
> 
> I was hoping running sort on a _file_ rather than stdin might allow it 
> to reduce the RAM use enough to not get the warning, but unfortunately 
> (and unsurprisingly) I still get it with the intermediate file.  This is 
> just a warning, though — I haven't had it actually run out of RAM so 
> far, I don't think.
> 
> The final problem I was addressing in my patched version was some 
> missing error-checking, which was causing me to be left with _no_ 
> filename DB, when the update would fail, rather than at least being left 
> with the one from last time.
> 
> I could send along my patches, but I don't know that I've solved these 
> issues in a general enough way.  For instance, my 12 million+ pathnames 
> come out to about 1.4 GiB of UNIX-linefeed-separated UTF-8 strings. 
> Writing that much to my HD is not a concern, but obviously some people 
> might not want to write that much every time to, say, a small 
> flash-based device.
> 
> Thoughts?

Thanks for the analysis Hans-Bernhard.

Please recheck the announcement for 4.8 and change info for 4.7: as of 
4.8 locate should still work on old format dbs, but from 4.7 updatedb 
will no longer generate or update them, and in some future release, 
locate will no longer work on them.
The old (pre-GNU Unix) format was deprecated from 4.0 (~25 years ago!) 
and each run of updatedb should have warned you to upgrade, unless you 
patched that out.

See:

	$ info finding databases 'database formats' old

or:

<https://gnu.org/software/findutils/manual/html_node/find_html/Old-Database-Format.html>

I searched for more info on the discussion list archive at:

	<https://lists.gnu.org/archive/html/bug-findutils/>

but could find nothing obviously related to upgrading or migrating, 
although that archive goes back only ~20 years! ;^>

Migration appears to require running the previous 4.6 updatedb without 
--old-format to regenerate the new database in LOCATE02 format?
You should then be able to upgrade to the latest 4.8 findutils and use 
that going forward.
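
To see which format an existing database is in before upgrading, one can look at its first bytes.  The sketch below assumes, based on the findutils manual's description of the format, that a LOCATE02 database starts with a dummy entry: a zero byte, the string "LOCATE02", and another zero byte.  `is_locate02` is a hypothetical helper, not a findutils tool.

```shell
# is_locate02 FILE: succeed if bytes 2-9 of FILE are the string "LOCATE02".
# (Assumption: a LOCATE02 database begins with \0LOCATE02\0.)
is_locate02() {
    magic=$(dd if="$1" bs=1 skip=1 count=8 2>/dev/null)
    [ "$magic" = "LOCATE02" ]
}
```

For example, `is_locate02 /var/locatedb && echo "already LOCATE02"` would report on the default database location used by the updatedb script.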

You could email the discussion list <mailto:bug-findutils@gnu.org> about 
your situation, file sizes, timings, migration path, and issues, and 
cross-post here about anything in the replies we may be able to help you 
with.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]


* Patches to findutils 4.9.0-1's updatedb to do locking, allow filenames with spaces & progress monitoring, exclude /dev on Cygwin, etc.
  2021-08-30  0:06         ` Brian Inglis
@ 2022-02-24 16:32           ` Dan Harkless
  2022-02-27 11:54             ` Bernhard Voelker
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Harkless @ 2022-02-24 16:32 UTC (permalink / raw)
  To: bug-findutils, cygwin

[-- Attachment #1: Type: text/plain, Size: 17521 bytes --]

Howdy.  I posted last August to the Cygwin list about some problems I 
was having with the updatedb script, including that it was taking more 
than 24 hours to complete, and then colliding with the next cron run, 
and that I had no way to monitor progress (short of using Sysinternals 
Process Explorer's very awkward GUI for this).

I'm finally getting around to sending in a patch (to bug-findutils and 
the Cygwin list, to which I'm currently subscribed) to address these 
issues, along with some others, a few of which represent small changes 
in behavior:

1. Changed the direct 'find' -> 'sort' pipelining of the file list to 
instead go to a temporary file first (and then a second one, when 
sorting).  This allows monitoring of progress with 'tail -f'.

2. By default, these .txt and .txt.sort files get deleted on exit (or 
fatal signal), but I've added a --keeptxt option which can be set to 
'sort' or 'both' to preserve them, either for debugging purposes, or for 
running special-purpose text-munging/matching scripts on.

3. The script now does locking, outputting its PID to (by default) 
/var/locatedb.running_updatedb_pid (I was at first just going to call 
it locatedb.lock, but I decided to be more user-friendly).  If that file 
exists when another instance starts to run, the script will abort with 
an error.

4. Improved signal-handling, catching more fatal signals.  I also 
changed SIGHUP to not be treated as fatal, in line with most programs.

5. /dev was not being excluded by default on Cygwin, since there we 
can't recognize it by filesystem type.  This was causing some 
time-consuming looping problems for me, with falsely self-nested paths.

6. I tried a whole bunch of different quoting variations, but I couldn't 
get --prunepaths to work with paths containing spaces (which are of 
course very common on Windows).  The PRUNEREGEX would always end up 
incorrectly splitting on the spaces.  I tried to improve the regexp to 
treat '\ ' differently than ' ', but I couldn't get it to work (I'm more 
expert with Perl regexps than POSIX ones).  In the end, I used a 
not-TOO-ugly kludge: in the first sed -e expression, I change '\ ' to 
'///', and then in the last one I change '///' back to ' ' (the 
backslash isn't needed in the regexp).  Of course '///' should never 
appear in a path; I didn't use simply '//' because it's a relatively 
common artifact of path concatenation.

7. make_tempdir() wasn't being used in the current version of findutils, 
so I removed it.

8. Previously, the only protection the script did of the prior version 
of the database was to write to locatedb.n, and then overwrite locatedb 
with it at the end.  This was a problem for me when trying to debug the 
updatedb issues I was having, since, for instance, if you killed the 
'find', it'd still overwrite the old DB with a partial file list.  The 
script now saves the previous version of the DB as locatedb.prev.  This 
is also quite handy when a file you expect to be able to find goes 
missing, in case it disappeared since the last updatedb run.  (I didn't 
address the underlying problem of the script not aborting if 'find' did; 
the bizarre 4-way 'find' construction with file redirection from the 
middle of the if {} is unlike anything I've seen in a shell script before.)

9. I also made some minor changes, including not outputting the full 
script path on errors, standardizing error formatting, putting quotes 
around arguments in errors, and various cleanups and comment improvements.

10. The most significant of the minor changes I made was standardizing 
the indentation, which was all over the place, making it difficult to 
understand some of the code while debugging. Because of that, in 
addition to the attached 'diff -u' patch, here's the output of 'diff 
-uw' to make it easier to review my changes.  My Linux systems use a 
different version of locate, thus I didn't test there, but I think I 
remained platform-agnostic (and also didn't write any code that wouldn't 
work under Bourne/POSIX shell).  BTW, I assign my copyright to GNU (I 
filled out the official form some years ago when I was co-maintainer of 
Wget), and if you feel my self-credit in the author list comment is 
unwarranted, of course feel free to get rid of it.
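
For the locking in item 3, a check-then-create sequence leaves a small window in which two instances can both pass the existence check.  A possible alternative, sketched here and not taken from the patch, is to create the lockfile with noclobber (`set -C`) in effect, so that the redirection itself fails if the file already exists.  The names `acquire_lock` and `release_lock` are hypothetical.

```shell
# acquire_lock FILE: create FILE containing our PID, failing if it exists.
# With noclobber (set -C), '>' refuses to overwrite an existing file, so
# the existence check and the creation are done by the redirection itself.
acquire_lock() {
    ( set -C; echo "$$" > "$1" ) 2>/dev/null
}

# release_lock FILE: remove the lockfile, but only if it holds our PID.
release_lock() {
    [ -f "$1" ] && [ "$(cat "$1")" = "$$" ] && rm -f "$1"
}
```

A script would call `acquire_lock "$lockfile" || exit 1` early on and release the lock from its exit handler.  SIGKILL cannot be caught, so a stale lockfile can still be left behind after a `kill -9`; storing the PID at least lets a later run check whether that process is still alive.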

Here's that 'diff -uw' output (again, 'diff -u' patch attached):
--- updatedb.orig    2022-02-05 09:37:55.000000000 -0800
+++ updatedb    2022-02-24 03:27:10.749175300 -0800
@@ -15,13 +15,20 @@
  # You should have received a copy of the GNU General Public License
  # along with this program.  If not, see <https://www.gnu.org/licenses/>.

-# csh original by James Woods; sh conversion by David MacKenzie.
+# csh original by James Woods; sh conversion by David MacKenzie;
+# cleanup and enhancements by Dan Harkless.

  #exec 2> /tmp/updatedb-trace.txt
  #set -x

+ourname=`basename $0`  # don't verbosely report path to the script in errors
+
+stderr() {
+    echo "$ourname: $*" >&2
+}
+
  version='
-updatedb (GNU findutils) 4.9.0
+updatedb (GNU findutils) 4.9.0+patches
  Copyright (C) 1994-2022 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
@@ -47,11 +54,12 @@
  # (correctly) points to https://www.gnu.org/software/findutils/ instead
  # of the bug reporting page.
  usage="\
-Usage: $0 [--findoptions='-option1 -option2...']
+Usage: $ourname [--findoptions='-option1 -option2...']
         [--localpaths='dir1 dir2...'] [--netpaths='dir1 dir2...']
         [--prunepaths='dir1 dir2...'] [--prunefs='fs1 fs2...']
         [--output=dbfile] [--netuser=user] [--localuser=user]
-       [--dbformat] [--version] [--help]
+       [--dbformat=(LOCATE02|slocate)] [--keeptxt=(sort|both)]
+       [--version] [--help]

  Please see also the documentation at https://www.gnu.org/software/findutils/.
  Report (and track progress on fixing) bugs in the updatedb
@@ -61,8 +69,7 @@
  "
  changeto=/

-for arg
-do
+for arg; do
    # If we are unable to fork, the back-tick operator will
    # fail (and the shell will emit an error message).  When
    # this happens, we exit with error value 71 (EX_OSERR).
@@ -80,10 +87,11 @@
      --localuser) LOCALUSER="$val" ;;
      --changecwd)  changeto="$val" ;;
      --dbformat)   dbformat="$val" ;;
-    --version) fail=0; echo "$version" || fail=1; exit $fail ;;
-    --help)    fail=0; echo "$usage"   || fail=1; exit $fail ;;
-    *) echo "updatedb: invalid option $opt
-Try '$0 --help' for more information." >&2
+        --keeptxt)     keeptxt="$val" ;;
+        --version) fail=0; echo "$version" >&2 || fail=1; exit $fail ;;
+        --help)    fail=0; echo "$usage"   >&2 || fail=1; exit $fail ;;
+        *) stderr 'Invalid option "'$opt'".'
+           echo "          Try '$ourname --help' for more information." >&2
         exit 1 ;;
    esac
  done
@@ -100,13 +108,87 @@
          ;;
      *)
          # The "old" database format is no longer supported.
-        echo "Unsupported locate database format ${dbformat}: Supported formats are:" >&2
-        echo "LOCATE02, slocate" >&2
+        stderr 'Unsupported locate database format "'$dbformat'".'
+        echo '          Supported formats are "LOCATE02" or "slocate".' >&2
          exit 1
  esac

+# The database file to build (overridable via commandline or environment var.).
+: ${LOCATE_DB=/var/locatedb}
+LOCATE_DB_DIR=`dirname $LOCATE_DB`
+
+# Prevent overlapping with ourselves.  Large filesystem collections can easily
+# take over 24 hours to complete, even on pretty speedy systems / hard drives.
+# Ideally this would go in /var/run on systems that have that, but this is OK.
+lockfile=$LOCATE_DB.running_updatedb_pid
+
+if [ -e $lockfile ]; then
+    stderr "Aborting since prior run's lockfile still exists:"
+    ls -lF $lockfile >&2
+    exit 1
+fi
+
+keeptxt=neither
+reported_lockfile_failure=0
+
+cleanup_on_exit_or_signal() {
+    rm -f $LOCATE_DB.n
+
+    if [ $reported_lockfile_failure -ne 1 ]; then
+        # We didn't already have a failure trying to initially create the
+        # lockfile, so we can assume the temporary .txt files are ours (not
+        # saved on a previous run with --keeptxt), and it's safe to
+        # (optionally) delete them.
+        if [ x"$keeptxt" = x"sort" ]; then
+            rm -f $LOCATE_DB.txt
+        elif [ x"$keeptxt" != x"both" ]; then
+            # TBD: Report undefined values of --keeptxt?
+            rm -f $LOCATE_DB.txt $LOCATE_DB.txt.sort
+        fi
+    fi
+
+    if ! rm -f $lockfile; then
+        report_lockfile_failure "remove"
+    fi
+}
+
+report_lockfile_failure() {
+    if [ $reported_lockfile_failure -ne 1 ]; then
+        echo -n "$ourname: Failed to $1 lockfile $lockfile" >&2
+
+        if [ -e $lockfile ]; then
+            echo ":" >&2
+            ls -lF $lockfile >&2
+        else
+            echo " in dir:" >&2
+            ls -dlF $LOCATE_DB_DIR >&2
+        fi
+
+        reported_lockfile_failure=1
+    fi
+}
+
+# Now that we've checked for a previous lockfile above, it's safe to install
+# cleanup signal handler.  We'll try to catch all potentially fatal signals,
+# along with exit.  From CentOS 7's /usr/include/asm/signal.h:
+#[shell exit=0] SIGHUP=1        SIGINT=2        SIGQUIT=3 SIGILL=4
+#SIGTRAP=5      SIGABRT=6       SIGIOT=6        SIGBUS=7 SIGFPE=8
+#SIGKILL=9      SIGUSR1=10      SIGSEGV=11      SIGUSR2=12 SIGPIPE=13
+#SIGALRM=14     SIGTERM=15      SIGSTKFLT=16    SIGCHLD=17 SIGCONT=18
+#SIGSTOP=19     SIGTSTP=20      SIGTTIN=21      SIGTTOU=22 SIGURG=23
+#SIGXCPU=24     SIGXFSZ=25      SIGVTALRM=26    SIGPROF=27 SIGWINCH=28
+#SIGIO=29       SIGPOLL=SIGIO   SIGLOST=29      SIGPWR=30 SIGSYS=31
+trap cleanup_on_exit_or_signal 0 2 3 4 6 7 8 9 11 15 16 30 31
+
+# Now that we've installed the signal handler, it's safe to create lockfile.
+if ! echo $$ > $lockfile; then
+    report_lockfile_failure "write to"
+    exit 1
+fi

-if true
+# Don't use NUL as a path separator, now that we write to a temporary text file
+# (may want to make that controllable with a commandline option in the future).
+if false
  then
      sort="/usr/bin/sort -z"
      print_option="-print0"
@@ -123,7 +205,7 @@
      id | cut -d'(' -f 1 | cut -d'=' -f2
  }

-# figure out if su supports the -s option
+# Figure out if su supports the -s option.
  select_shell() {
      if su "$1" -s $SHELL -c false < /dev/null  ; then
      # No.
@@ -140,8 +222,7 @@
      fi
  }

-
-# You can set these in the environment, or use command-line options,
+# You can set these in the environment, or use command-line options
  # to override their defaults:

  # Any global options for find?
@@ -156,10 +237,13 @@
  # Network (NFS, AFS, RFS, etc.) directories to put in the database.
  : ${NETPATHS=}

-# Directories to not put in the database, which would otherwise be.
+# Default list of directories (overridable with options) to be omitted from the
+# database.  Note that /dev and /proc need to be specified "redundantly" here,
+# since on Cygwin, they can't be detected based on filesystem type.
  : ${PRUNEPATHS="
  /afs
  /amd
+/dev
  /proc
  /sfs
  /tmp
@@ -167,24 +251,26 @@
  /var/tmp
  "}

-# Trailing slashes result in regex items that are never matched, which
-# is not what the user will expect.   Therefore we now reject such
-# constructs.
+# Trailing slashes result in regex items that are never matched, which is
+# not what the user will expect.  Therefore we now reject such constructs.
+# TBD: Just remove any trailing slashes instead?
  for p in $PRUNEPATHS; do
      case "$p" in
-    /*/)   echo "$0: $p: pruned paths should not contain trailing slashes" >&2
+        /*/) stderr "Prune path '$p' has a trailing slash, which isn't allowed."
             exit 1
      esac
  done

-# The same, in the form of a regex that find can use.
+# Convert $PRUNEPATHS to a regex that find can use.  Note that to allow paths
+# containing spaces, the first -e changes '\ ' to '///' ('//' isn't used since
+# it's a semi-common artifact of path concatenation), and then the last -e
+# changes '///' back to ' ' (it doesn't need backslashing in the regex).
  test -z "$PRUNEREGEX" &&
-  PRUNEREGEX=`echo $PRUNEPATHS|sed -e 's,^,\\\(^,' -e 's, ,$\\\)\\\|\\\(^,g' -e 's,$,$\\\),'`
+  PRUNEREGEX=`echo $PRUNEPATHS | sed -e 's,\\\ ,///,g' -e 's,^,\\\(^,' -e 's, ,$\\\)\\\|\\\(^,g' -e 's,$,$\\\),' -e 's,///, ,g'`

-# The database file to build.
-: ${LOCATE_DB=/var/locatedb}
-
-# Directory to hold intermediate files.
+# Directory for sort (& possibly other executables) to hold intermediate files.
+# The script's own temporary files go in the same directory as the database,
+# since they aren't always temporary (--keeptxt or left-behind lockfile).
  if test -z "$TMPDIR"; then
    if test -d /var/tmp; then
      : ${TMPDIR=/var/tmp}
@@ -217,42 +303,19 @@
  : ${find:=${BINDIR}/find}
  : ${frcode:=${LIBEXECDIR}/frcode}

-make_tempdir () {
-    # This implementation is adapted from the GNU Autoconf manual.
-    {
-        tmp=`
-    (umask 077 && mktemp -d "$TMPDIR/updatedbXXXXXX") 2>/dev/null
-    ` &&
-        test -n "$tmp" && test -d "$tmp"
-    } || {
-    # This method is less secure than mktemp -d, but it's a fallback.
-    #
-    # We use $$ as well as $RANDOM since $RANDOM may not be available.
-    # We also add a time-dependent suffix.  This is actually somewhat
-    # predictable, but then so is $$.  POSIX does not require date to
-    # support +%N.
-    ts=`date +%N%S || date +%S 2>/dev/null`
-        tmp="$TMPDIR"/updatedb"$$"-"${RANDOM:-}${ts}"
-        (umask 077 && mkdir "$tmp")
-    }
-    echo "$tmp"
-}
-
  checkbinary () {
      if test -x "$1" ; then
      : ok
      else
-      eval echo "updatedb needs to be able to execute $1, but cannot." >&2
+        stderr "We need to be able to execute $1, but cannot."
        exit 1
      fi
  }

-for binary in $find $frcode
-do
+for binary in $find $frcode; do
    checkbinary $binary
  done

-
  : ${PRUNEFS="
  9P
  NFS
@@ -283,10 +346,20 @@
  fi

  # Make and code the file list.
-# Sort case insensitively for users' convenience.

-rm -f $LOCATE_DB.n
-trap 'rm -f $LOCATE_DB.n; exit' HUP TERM
+if ! echo test > $LOCATE_DB.n; then
+    stderr "Failed to write to temporary database file $LOCATE_DB.n."
+    exit 1
+fi
+
+# We now write to a temporary text file instead of going direct over a pipe, as
+# the latter makes it very difficult to monitor progress and to debug failures.
+if ! echo test > $LOCATE_DB.txt; then
+    stderr "Failed to write to text list of files $LOCATE_DB.txt."
+    exit 1
+fi
+
+failed_to_generate_locate_db=0

  if {
  cd "$changeto"
@@ -314,29 +387,43 @@
      exit $?
    else
      # : A4
-    $find $NETPATHS $FINDOPTIONS \( -type d -regex "$PRUNEREGEX" -prune \) -o $print_option ||
+            $find $NETPATHS $FINDOPTIONS \( -type d -regex "$PRUNEREGEX" \
+              -prune \) -o $print_option ||
      exit $?
    fi
  fi
-} | $sort | $frcode $frcode_options > $LOCATE_DB.n
+} > $LOCATE_DB.txt
  then
-    : OK so far
-    true
+    # OK, find completed.  Going through all the files is very time-consuming
+    # on some systems, so (try to) save a copy of the previous DB in case
+    # something goes wrong at this point.
+    cp -fp $LOCATE_DB $LOCATE_DB.prev
+
+    # Now sort results, case-insensitively for user convenience, then generate
+    # the new DB.
+    if ! $sort -f $LOCATE_DB.txt > $LOCATE_DB.txt.sort; then
+        failed_return_value=$?
+        failed_to_generate_locate_db=1
+    elif ! $frcode $frcode_options < $LOCATE_DB.txt.sort > $LOCATE_DB.n; then
+        failed_return_value=$?
+        failed_to_generate_locate_db=1
+    fi
  else
-    rv=$?
-    echo "Failed to generate $LOCATE_DB.n" >&2
+    failed_to_generate_locate_db=1
+fi
+
+if [ $failed_to_generate_locate_db -eq 1 ]; then
+    stderr "Failed to generate new database temp file $LOCATE_DB.n."
      rm -f $LOCATE_DB.n
-    exit $rv
+    exit $failed_return_value
  fi

-# To avoid breaking locate while this script is running, put the
+# To avoid breaking locate while this script is running, we put the
  # results in a temp file, then rename it atomically.
  if test -s $LOCATE_DB.n; then
-  chmod 644 ${LOCATE_DB}.n
-  mv ${LOCATE_DB}.n $LOCATE_DB
+    chmod 644 $LOCATE_DB.n
+    mv -f $LOCATE_DB.n $LOCATE_DB
  else
-  echo "updatedb: new database would be empty" >&2
+    stderr "New database would be empty, so not creating it."
    rm -f $LOCATE_DB.n
  fi
-
-exit 0

Dan Harkless
http://harkless.org/dan/

[-- Attachment #2: updatedb.patch --]
[-- Type: text/plain, Size: 16632 bytes --]

--- updatedb.orig	2022-02-05 09:37:55.000000000 -0800
+++ updatedb	2022-02-24 03:27:10.749175300 -0800
@@ -15,13 +15,20 @@
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see <https://www.gnu.org/licenses/>.
 
-# csh original by James Woods; sh conversion by David MacKenzie.
+# csh original by James Woods; sh conversion by David MacKenzie;
+# cleanup and enhancements by Dan Harkless.
 
 #exec 2> /tmp/updatedb-trace.txt
 #set -x
 
+ourname=`basename $0`  # don't verbosely report path to the script in errors
+
+stderr() {
+    echo "$ourname: $*" >&2
+}
+
 version='
-updatedb (GNU findutils) 4.9.0
+updatedb (GNU findutils) 4.9.0+patches
 Copyright (C) 1994-2022 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
 This is free software: you are free to change and redistribute it.
@@ -47,11 +54,12 @@
 # (correctly) points to https://www.gnu.org/software/findutils/ instead
 # of the bug reporting page.
 usage="\
-Usage: $0 [--findoptions='-option1 -option2...']
+Usage: $ourname [--findoptions='-option1 -option2...']
        [--localpaths='dir1 dir2...'] [--netpaths='dir1 dir2...']
        [--prunepaths='dir1 dir2...'] [--prunefs='fs1 fs2...']
        [--output=dbfile] [--netuser=user] [--localuser=user]
-       [--dbformat] [--version] [--help]
+       [--dbformat=(LOCATE02|slocate)] [--keeptxt=(sort|both)] 
+       [--version] [--help]
 
 Please see also the documentation at https://www.gnu.org/software/findutils/.
 Report (and track progress on fixing) bugs in the updatedb
@@ -61,31 +69,31 @@
 "
 changeto=/
 
-for arg
-do
-  # If we are unable to fork, the back-tick operator will
-  # fail (and the shell will emit an error message).  When
-  # this happens, we exit with error value 71 (EX_OSERR).
-  # Alternative candidate - 75, EX_TEMPFAIL.
-  opt=`echo $arg|sed 's/^\([^=]*\).*/\1/'`  || exit 71
-  val=`echo $arg|sed 's/^[^=]*=\(.*\)/\1/'` || exit 71
-  case "$opt" in
-    --findoptions) FINDOPTIONS="$val" ;;
-    --localpaths) SEARCHPATHS="$val" ;;
-    --netpaths) NETPATHS="$val" ;;
-    --prunepaths) PRUNEPATHS="$val" ;;
-    --prunefs) PRUNEFS="$val" ;;
-    --output) LOCATE_DB="$val" ;;
-    --netuser) NETUSER="$val" ;;
-    --localuser) LOCALUSER="$val" ;;
-    --changecwd)  changeto="$val" ;;
-    --dbformat)   dbformat="$val" ;;
-    --version) fail=0; echo "$version" || fail=1; exit $fail ;;
-    --help)    fail=0; echo "$usage"   || fail=1; exit $fail ;;
-    *) echo "updatedb: invalid option $opt
-Try '$0 --help' for more information." >&2
-       exit 1 ;;
-  esac
+for arg; do
+    # If we are unable to fork, the back-tick operator will
+    # fail (and the shell will emit an error message).  When
+    # this happens, we exit with error value 71 (EX_OSERR).
+    # Alternative candidate - 75, EX_TEMPFAIL.
+    opt=`echo $arg | sed 's/^\([^=]*\).*/\1/'`  || exit 71
+    val=`echo $arg | sed 's/^[^=]*=\(.*\)/\1/'` || exit 71
+    case "$opt" in
+        --findoptions) FINDOPTIONS="$val" ;;
+        --localpaths) SEARCHPATHS="$val" ;;
+        --netpaths) NETPATHS="$val" ;;
+        --prunepaths) PRUNEPATHS="$val" ;;
+        --prunefs) PRUNEFS="$val" ;;
+        --output) LOCATE_DB="$val" ;;
+        --netuser) NETUSER="$val" ;;
+        --localuser) LOCALUSER="$val" ;;
+        --changecwd)  changeto="$val" ;;
+        --dbformat)   dbformat="$val" ;;
+        --keeptxt)     keeptxt="$val" ;;
+        --version) fail=0; echo "$version" >&2 || fail=1; exit $fail ;;
+        --help)    fail=0; echo "$usage"   >&2 || fail=1; exit $fail ;;
+        *) stderr 'Invalid option "'$opt'".'
+           echo "          Try '$ourname --help' for more information." >&2
+           exit 1 ;;
+    esac
 done
 
 frcode_options=""
@@ -100,13 +108,87 @@
         ;;
     *)
         # The "old" database format is no longer supported.
-        echo "Unsupported locate database format ${dbformat}: Supported formats are:" >&2
-        echo "LOCATE02, slocate" >&2
+        stderr 'Unsupported locate database format "'$dbformat'".'
+        echo '          Supported formats are "LOCATE02" or "slocate".' >&2
         exit 1
 esac
 
+# The database file to build (overridable via commandline or environment var.).
+: ${LOCATE_DB=/var/locatedb}
+LOCATE_DB_DIR=`dirname $LOCATE_DB`
+
+# Prevent overlapping with ourselves.  Large filesystem collections can easily 
+# take over 24 hours to complete, even on pretty speedy systems / hard drives.
+# Ideally this would go in /var/run on systems that have that, but this is OK.
+lockfile=$LOCATE_DB.running_updatedb_pid
+
+if [ -e $lockfile ]; then
+    stderr "Aborting since prior run's lockfile still exists:"
+    ls -lF $lockfile >&2
+    exit 1
+fi
+
+keeptxt=neither
+reported_lockfile_failure=0
+
+cleanup_on_exit_or_signal() {
+    rm -f $LOCATE_DB.n
+
+    if [ $reported_lockfile_failure -ne 1 ]; then
+        # We didn't already have a failure trying to initially create the 
+        # lockfile, so we can assume the temporary .txt files are ours (not 
+        # saved on a previous run with --keeptxt), and it's safe to 
+        # (optionally) delete them.
+        if [ x"$keeptxt" = x"sort" ]; then
+            rm -f $LOCATE_DB.txt
+        elif [ x"$keeptxt" != x"both" ]; then
+            # TBD: Report undefined values of --keeptxt?
+            rm -f $LOCATE_DB.txt $LOCATE_DB.txt.sort
+        fi
+    fi
 
-if true
+    if ! rm -f $lockfile; then
+        report_lockfile_failure "remove"
+    fi
+}
+
+report_lockfile_failure() {
+    if [ $reported_lockfile_failure -ne 1 ]; then
+        echo -n "$ourname: Failed to $1 lockfile $lockfile" >&2
+
+        if [ -e $lockfile ]; then
+            echo ":" >&2
+            ls -lF $lockfile >&2
+        else
+            echo " in dir:" >&2
+            ls -dlF $LOCATE_DB_DIR >&2
+        fi
+
+        reported_lockfile_failure=1
+    fi
+}
+
+# Now that we've checked for a previous lockfile above, it's safe to install 
+# cleanup signal handler.  We'll try to catch all potentially fatal signals, 
+# along with exit.  From CentOS 7's /usr/include/asm/signal.h:
+#[shell exit=0] SIGHUP=1        SIGINT=2        SIGQUIT=3       SIGILL=4
+#SIGTRAP=5      SIGABRT=6       SIGIOT=6        SIGBUS=7        SIGFPE=8
+#SIGKILL=9      SIGUSR1=10      SIGSEGV=11      SIGUSR2=12      SIGPIPE=13
+#SIGALRM=14     SIGTERM=15      SIGSTKFLT=16    SIGCHLD=17      SIGCONT=18
+#SIGSTOP=19     SIGTSTP=20      SIGTTIN=21      SIGTTOU=22      SIGURG=23
+#SIGXCPU=24     SIGXFSZ=25      SIGVTALRM=26    SIGPROF=27      SIGWINCH=28
+#SIGIO=29       SIGPOLL=SIGIO   SIGLOST=29      SIGPWR=30       SIGSYS=31
+trap cleanup_on_exit_or_signal 0 2 3 4 6 7 8 9 11 15 16 30 31 
+
+# Now that we've installed the signal handler, it's safe to create lockfile.
+if ! echo $$ > $lockfile; then
+    report_lockfile_failure "write to"
+    exit 1
+fi
+
+# Don't use NUL as a path separator, now that we write to a temporary text file
+# (may want to make that controllable with a commandline option in the future).
+if false
 then
     sort="/usr/bin/sort -z"
     print_option="-print0"
@@ -123,25 +205,24 @@
     id | cut -d'(' -f 1 | cut -d'=' -f2
 }
 
-# figure out if su supports the -s option
+# Figure out if su supports the -s option.
 select_shell() {
-    if su "$1" -s $SHELL -c false < /dev/null  ; then
-	# No.
-	echo ""
+    if su "$1" -s $SHELL -c false < /dev/null; then
+        # No.
+        echo ""
     else
-	if su "$1" -s $SHELL -c true < /dev/null  ; then
-	    # Yes.
-	    echo "-s $SHELL"
+        if su "$1" -s $SHELL -c true < /dev/null; then
+            # Yes.
+            echo "-s $SHELL"
         else
-	    # su is unconditionally failing.  We won't be able to
-	    # figure out what is wrong, so be conservative.
-	    echo ""
-	fi
+            # su is unconditionally failing.  We won't be able to
+            # figure out what is wrong, so be conservative.
+            echo ""
+        fi
     fi
 }
 
-
-# You can set these in the environment, or use command-line options,
+# You can set these in the environment, or use command-line options
 # to override their defaults:
 
 # Any global options for find?
@@ -156,10 +237,13 @@
 # Network (NFS, AFS, RFS, etc.) directories to put in the database.
 : ${NETPATHS=}
 
-# Directories to not put in the database, which would otherwise be.
+# Default list of directories (overridable with options) to be omitted from the
+# database.  Note that /dev and /proc need to be specified "redundantly" here,
+# since on Cygwin, they can't be detected based on filesystem type.
 : ${PRUNEPATHS="
 /afs
 /amd
+/dev
 /proc
 /sfs
 /tmp
@@ -167,32 +251,34 @@
 /var/tmp
 "}
 
-# Trailing slashes result in regex items that are never matched, which
-# is not what the user will expect.   Therefore we now reject such
-# constructs.
+# Trailing slashes result in regex items that are never matched, which is
+# not what the user will expect.  Therefore we now reject such constructs.
+# TBD: Just remove any trailing slashes instead?
 for p in $PRUNEPATHS; do
     case "$p" in
-	/*/)   echo "$0: $p: pruned paths should not contain trailing slashes" >&2
-	       exit 1
+        /*/) stderr "Prune path '$p' has a trailing slash, which isn't allowed."
+             exit 1
     esac
 done
 
-# The same, in the form of a regex that find can use.
+# Convert $PRUNEPATHS to a regex that find can use.  Note that to allow paths
+# containing spaces, the first -e changes '\ ' to '///' ('//' isn't used since
+# it's a semi-common artifact of path concatenation), and then the last -e
+# changes '///' back to ' ' (it doesn't need backslashing in the regex).
 test -z "$PRUNEREGEX" &&
-  PRUNEREGEX=`echo $PRUNEPATHS|sed -e 's,^,\\\(^,' -e 's, ,$\\\)\\\|\\\(^,g' -e 's,$,$\\\),'`
+  PRUNEREGEX=`echo $PRUNEPATHS | sed -e 's,\\\ ,///,g' -e 's,^,\\\(^,' -e 's, ,$\\\)\\\|\\\(^,g' -e 's,$,$\\\),' -e 's,///, ,g'`
 
-# The database file to build.
-: ${LOCATE_DB=/var/locatedb}
-
-# Directory to hold intermediate files.
+# Directory for sort (& possibly other executables) to hold intermediate files.
+# The script's own temporary files go in the same directory as the database,
+# since they aren't always temporary (--keeptxt or left-behind lockfile).
 if test -z "$TMPDIR"; then
-  if test -d /var/tmp; then
-    : ${TMPDIR=/var/tmp}
-  elif test -d /usr/tmp; then
-    : ${TMPDIR=/usr/tmp}
-  else
-    : ${TMPDIR=/tmp}
-  fi
+    if test -d /var/tmp; then
+        : ${TMPDIR=/var/tmp}
+    elif test -d /usr/tmp; then
+        : ${TMPDIR=/usr/tmp}
+    else
+        : ${TMPDIR=/tmp}
+    fi
 fi
 export TMPDIR
 
@@ -200,14 +286,14 @@
 : ${NETUSER=daemon}
 
 # The directory containing the subprograms.
-if test -n "$LIBEXECDIR" ; then
+if test -n "$LIBEXECDIR"; then
     : LIBEXECDIR already set, do nothing
 else
     : ${LIBEXECDIR=/usr/libexec}
 fi
 
 # The directory containing find.
-if test -n "$BINDIR" ; then
+if test -n "$BINDIR"; then
     : BINDIR already set, do nothing
 else
     : ${BINDIR=/usr/bin}
@@ -217,42 +303,19 @@
 : ${find:=${BINDIR}/find}
 : ${frcode:=${LIBEXECDIR}/frcode}
 
-make_tempdir () {
-    # This implementation is adapted from the GNU Autoconf manual.
-    {
-        tmp=`
-    (umask 077 && mktemp -d "$TMPDIR/updatedbXXXXXX") 2>/dev/null
-    ` &&
-        test -n "$tmp" && test -d "$tmp"
-    } || {
-	# This method is less secure than mktemp -d, but it's a fallback.
-	#
-	# We use $$ as well as $RANDOM since $RANDOM may not be available.
-	# We also add a time-dependent suffix.  This is actually somewhat
-	# predictable, but then so is $$.  POSIX does not require date to
-	# support +%N.
-	ts=`date +%N%S || date +%S 2>/dev/null`
-        tmp="$TMPDIR"/updatedb"$$"-"${RANDOM:-}${ts}"
-        (umask 077 && mkdir "$tmp")
-    }
-    echo "$tmp"
-}
-
-checkbinary () {
-    if test -x "$1" ; then
-	: ok
+checkbinary() {
+    if test -x "$1"; then
+        : ok
     else
-      eval echo "updatedb needs to be able to execute $1, but cannot." >&2
-      exit 1
+        stderr "We need to be able to execute $1, but cannot."
+        exit 1
     fi
 }
 
-for binary in $find $frcode
-do
-  checkbinary $binary
+for binary in $find $frcode; do
+    checkbinary $binary
 done
 
-
 : ${PRUNEFS="
 9P
 NFS
@@ -276,67 +339,91 @@
 "}
 
 if test -n "$PRUNEFS"; then
-prunefs_exp=`echo $PRUNEFS |sed -e 's/\([^ ][^ ]*\)/-o -fstype \1/g' \
- -e 's/-o //' -e 's/$/ -o/'`
+    prunefs_exp=`echo $PRUNEFS | sed -e 's/\([^ ][^ ]*\)/-o -fstype \1/g' \
+      -e 's/-o //' -e 's/$/ -o/'`
 else
-  prunefs_exp=''
+    prunefs_exp=''
 fi
 
 # Make and code the file list.
-# Sort case insensitively for users' convenience.
 
-rm -f $LOCATE_DB.n
-trap 'rm -f $LOCATE_DB.n; exit' HUP TERM
+if ! echo test > $LOCATE_DB.n; then
+    stderr "Failed to write to temporary database file $LOCATE_DB.n."
+    exit 1
+fi
 
-if {
-cd "$changeto"
-if test -n "$SEARCHPATHS"; then
-  if [ "$LOCALUSER" != "" ]; then
-    # : A1
-    su $LOCALUSER `select_shell $LOCALUSER` -c \
-    "$find $SEARCHPATHS $FINDOPTIONS \
-     \\( $prunefs_exp \
-     -type d -regex '$PRUNEREGEX' \\) -prune -o $print_option"
-  else
-    # : A2
-    $find $SEARCHPATHS $FINDOPTIONS \
-     \( $prunefs_exp \
-     -type d -regex "$PRUNEREGEX" \) -prune -o $print_option
-  fi
-fi
-
-if test -n "$NETPATHS"; then
-myuid=`getuid`
-if [ "$myuid" = 0 ]; then
-    # : A3
-    su $NETUSER `select_shell $NETUSER` -c \
-     "$find $NETPATHS $FINDOPTIONS \\( -type d -regex '$PRUNEREGEX' -prune \\) -o $print_option" ||
-    exit $?
-  else
-    # : A4
-    $find $NETPATHS $FINDOPTIONS \( -type d -regex "$PRUNEREGEX" -prune \) -o $print_option ||
-    exit $?
-  fi
+# We now write to a temporary text file instead of going direct over a pipe, as
+# the latter makes it very difficult to monitor progress and to debug failures.
+if ! echo test > $LOCATE_DB.txt; then
+    stderr "Failed to write to text list of files $LOCATE_DB.txt."
+    exit 1
 fi
-} | $sort | $frcode $frcode_options > $LOCATE_DB.n
+
+failed_to_generate_locate_db=0
+
+if {
+    cd "$changeto"
+    if test -n "$SEARCHPATHS"; then
+        if [ "$LOCALUSER" != "" ]; then
+            # : A1
+            su $LOCALUSER `select_shell $LOCALUSER` -c \
+              "$find $SEARCHPATHS $FINDOPTIONS \
+              \\( $prunefs_exp \
+              -type d -regex '$PRUNEREGEX' \\) -prune -o $print_option"
+        else
+            # : A2
+            $find $SEARCHPATHS $FINDOPTIONS \
+              \( $prunefs_exp \
+              -type d -regex "$PRUNEREGEX" \) -prune -o $print_option
+        fi
+    fi
+
+    if test -n "$NETPATHS"; then
+        myuid=`getuid`
+        if [ "$myuid" = 0 ]; then
+            # : A3
+            su $NETUSER `select_shell $NETUSER` -c \
+              "$find $NETPATHS $FINDOPTIONS \\( -type d -regex '$PRUNEREGEX' -prune \\) -o $print_option" ||
+              exit $?
+        else
+            # : A4
+            $find $NETPATHS $FINDOPTIONS \( -type d -regex "$PRUNEREGEX" \
+              -prune \) -o $print_option ||
+            exit $?
+        fi
+    fi
+} > $LOCATE_DB.txt
 then
-    : OK so far
-    true
+    # OK, find completed.  Going through all the files is very time-consuming 
+    # on some systems, so (try to) save a copy of the previous DB in case
+    # something goes wrong at this point.
+    cp -fp $LOCATE_DB $LOCATE_DB.prev
+
+    # Now sort results, case-insensitively for user convenience, then generate
+    # the new DB.
+    if ! $sort -f $LOCATE_DB.txt > $LOCATE_DB.txt.sort; then
+        failed_return_value=$?
+        failed_to_generate_locate_db=1
+    elif ! $frcode $frcode_options < $LOCATE_DB.txt.sort > $LOCATE_DB.n; then
+        failed_return_value=$?
+        failed_to_generate_locate_db=1
+    fi
 else
-    rv=$?
-    echo "Failed to generate $LOCATE_DB.n" >&2
+    failed_to_generate_locate_db=1
+fi
+
+if [ $failed_to_generate_locate_db -eq 1 ]; then
+    stderr "Failed to generate new database temp file $LOCATE_DB.n."
     rm -f $LOCATE_DB.n
-    exit $rv
+    exit $failed_return_value
 fi
 
-# To avoid breaking locate while this script is running, put the
+# To avoid breaking locate while this script is running, we put the
 # results in a temp file, then rename it atomically.
 if test -s $LOCATE_DB.n; then
-  chmod 644 ${LOCATE_DB}.n
-  mv ${LOCATE_DB}.n $LOCATE_DB
+    chmod 644 $LOCATE_DB.n
+    mv -f $LOCATE_DB.n $LOCATE_DB
 else
-  echo "updatedb: new database would be empty" >&2
-  rm -f $LOCATE_DB.n
+    stderr "New database would be empty, so not creating it."
+    rm -f $LOCATE_DB.n
 fi
-
-exit 0
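
Two details of the locking code in the patch are worth a note. The '[ -e $lockfile ]' check and the later 'echo $$ > $lockfile' are separate steps, so two runs starting at nearly the same time could both pass the check. Also, signal 9 (SIGKILL) cannot be caught, and a trap handler that returns without calling 'exit' normally lets the script resume after the signal. One possible arrangement, sketched here and not part of the submitted patch, creates the lockfile with the shell's noclobber option, keeps cleanup in the EXIT trap, and has the signal traps only call 'exit'. The names lock_dir, lockfile, have_lock, acquire_lock and cleanup are assumptions made for the illustration.

```shell
#!/bin/sh
# Sketch only: lock_dir and lockfile are hypothetical stand-ins for the
# $LOCATE_DB-based names used in the patch.
lock_dir=$(mktemp -d) || exit 1
lockfile=$lock_dir/locatedb.running_updatedb_pid
have_lock=0

# With noclobber (set -C), '>' refuses to overwrite an existing file, so the
# existence check and the creation are a single step.  The subshell keeps
# the option from leaking into the rest of the script.
acquire_lock() {
    if ( set -C; echo $$ > "$lockfile" ) 2>/dev/null; then
        have_lock=1
    else
        return 1
    fi
}

# Only remove the lockfile if this process created it.
cleanup() {
    if [ "$have_lock" -eq 1 ]; then
        rm -f "$lockfile"
    fi
    rmdir "$lock_dir" 2>/dev/null
}

# Cleanup lives in the EXIT trap.  The signal traps just exit with the
# conventional 128+N status, which in turn runs the EXIT trap.
trap cleanup EXIT
trap 'exit 129' HUP
trap 'exit 130' INT
trap 'exit 143' TERM

# Demonstration: the second attempt finds the lock already held.
if acquire_lock; then first=acquired; else first=busy; fi
if acquire_lock; then second=acquired; else second=busy; fi
echo "first attempt: $first, second attempt: $second"
```

Tracking ownership in have_lock matters because the EXIT trap also runs when the lock could not be obtained, and in that case the file belongs to another run.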

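The comment in the patch about spaces in prune paths can be followed step by step with a small input. The sketch below uses $( ) rather than backticks, so each '\\\' in the patch's sed expressions is written '\\' here (backticks consume one level of backslashes). The path list is a made-up example, and prune_paths and prune_regex are illustrative names.

```shell
#!/bin/sh
# Hypothetical prune list; the middle entry contains a backslash-escaped space.
prune_paths='/proc /my\ dir /tmp'

# The same five sed steps as in the patch:
#   1. hide each '\ ' as '///'
#   2. open the first group with '\(^'
#   3. turn every remaining space into '$\)\|\(^'
#   4. close the last group with '$\)'
#   5. restore '///' to a plain space
prune_regex=$(printf '%s\n' "$prune_paths" |
    sed -e 's,\\ ,///,g' \
        -e 's,^,\\(^,' \
        -e 's, ,$\\)\\|\\(^,g' \
        -e 's,$,$\\),' \
        -e 's,///, ,g')

printf '%s\n' "$prune_regex"
```

For this input the result is \(^/proc$\)\|\(^/my dir$\)\|\(^/tmp$\), i.e. one anchored alternative per path with the embedded space preserved.
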
* Re: Patches to findutils 4.9.0-1's updatedb to do locking, allow filenames with spaces & progress monitoring, exclude /dev on Cygwin, etc.
  2022-02-24 16:32           ` Patches to findutils 4.9.0-1's updatedb to do locking, allow filenames with spaces & progress monitoring, exclude /dev on Cygwin, etc Dan Harkless
@ 2022-02-27 11:54             ` Bernhard Voelker
  2022-02-27 12:06               ` Dan Harkless
  0 siblings, 1 reply; 8+ messages in thread
From: Bernhard Voelker @ 2022-02-27 11:54 UTC (permalink / raw)
  To: Dan Harkless, bug-findutils, cygwin

On 2/24/22 17:32, Dan Harkless wrote:
> I'm finally getting around to sending in a patch (to bug-findutils and 
> the Cygwin list, to which I'm currently subscribed) to address these 
> issues, along with some others, a few of which represent small changes 
> in behavior:

Thanks for the patch ... but:

a) The patch does not cleanly apply:

  ~/findutils/locate> patch -t < /tmp/updatedb.patch
  patching file updatedb
  Reversed (or previously applied) patch detected!  Assuming -R.
  Hunk #2 succeeded at 47 with fuzz 2.
  Hunk #7 FAILED at 167.
  Hunk #8 FAILED at 202.
  Hunk #9 succeeded at 217 (offset -2 lines).
  Hunk #10 succeeded at 276 (offset -2 lines).
  2 out of 10 hunks FAILED -- saving rejects to file updatedb.rej

b) The patch changes the file 'updatedb' which is created at build time
instead of the file 'updatedb.sh' which is under version control.

c) The description says that there are 10 more or less non-trivial
changes in it.  A squashed diff of 500 lines on a file with 342 lines
makes reviewing and discussing each topic impossible.

Would you mind re-sending as separate Git patches?

Have a nice day,
Berny
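
One way such a series could be produced is sketched below in a throwaway repository. Everything in it is a placeholder (file contents, commit subjects, author identity); only the file name 'updatedb.sh' comes from the message above. The idea is one commit per topic, then 'git format-patch' to emit one mail-ready patch per commit.

```shell
#!/bin/sh
# Sketch in a scratch repository; contents and commit subjects are placeholders.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

# Baseline commit standing in for the version-controlled script.
printf 'echo one\n' > updatedb.sh
git add updatedb.sh
git commit -q -m "updatedb: baseline"
base=$(git rev-parse HEAD)

# One commit per topic, so each change can be reviewed on its own.
printf 'echo one\necho two\n' > updatedb.sh
git commit -q -a -m "updatedb: add lockfile handling"
printf 'echo one\necho two\necho three\n' > updatedb.sh
git commit -q -a -m "updatedb: prune /dev by default"

# Emit one numbered patch file per commit made after the baseline.
git format-patch -q -o patches "$base"
patch_count=$(ls patches | wc -l | tr -d ' ')
echo "generated $patch_count patch files"
```

In a real working tree with all changes already made, 'git add -p' can be used to stage one topic at a time before each commit.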

* Re: Patches to findutils 4.9.0-1's updatedb to do locking, allow filenames with spaces & progress monitoring, exclude /dev on Cygwin, etc.
  2022-02-27 11:54             ` Bernhard Voelker
@ 2022-02-27 12:06               ` Dan Harkless
  0 siblings, 0 replies; 8+ messages in thread
From: Dan Harkless @ 2022-02-27 12:06 UTC (permalink / raw)
  To: Bernhard Voelker, bug-findutils, cygwin

On 2/27/2022 3:54 AM, Bernhard Voelker wrote:
> On 2/24/22 17:32, Dan Harkless wrote:
>> I'm finally getting around to sending in a patch (to bug-findutils and
>> the Cygwin list, to which I'm currently subscribed) to address these
>> issues, along with some others, a few of which represent small changes
>> in behavior:
> Thanks for the patch ... but:

Thanks for taking a look at it.

> a) The patch does not cleanly apply:
>
>    ~/findutils/locate> patch -t < /tmp/updatedb.patch
>    patching file updatedb
>    Reversed (or previously applied) patch detected!  Assuming -R.
>    Hunk #2 succeeded at 47 with fuzz 2.
>    Hunk #7 FAILED at 167.
>    Hunk #8 FAILED at 202.
>    Hunk #9 succeeded at 217 (offset -2 lines).
>    Hunk #10 succeeded at 276 (offset -2 lines).
>    2 out of 10 hunks FAILED -- saving rejects to file updatedb.rej

Ah.  As I mentioned, my patch was against Cygwin's findutils 4.9.0-1, 
and since my Linux systems use a different version of locate, I hadn't 
tested there (nor did I have time to look at the original 4.9.0 
source).  I'd been hoping any Cygwin patches wouldn't invalidate it; pity.

> b) The patch changes the file 'updatedb' which is created at build time
> instead of the file 'updatedb.sh' which is under version control.

Gotcha.

> c) The description says that there are 10 more or less non-trivial
> changes in it.  A squashed diff of 500 lines on a file with 342 lines
> makes reviewing and discussing each topic impossible.

Impossible?  Since the bulk of the changes are spacing changes to 
standardize code indentation across the file, I was hoping the separate 
'diff -uw' listing would be sufficient to enable easy review and discussion.

> Would you mind re-sending as separate Git patches?

It'll be a while before I have time to get set up to do that, but will do.

> Have a nice day,
> Berny

Thanks again,
Dan Harkless
http://harkless.org/dan/


Thread overview: 8+ messages
     [not found] <986736274.144968.1630167325057.ref@mail.yahoo.com>
2021-08-28 16:15 ` updatedb broken as of findutils 4.8.0-1 due to bigram.exe no longer being provided Dan Harkless
2021-08-28 16:23   ` Dan Harkless
2021-08-29 11:02     ` Hans-Bernhard Bröker
2021-08-29 12:06       ` Dan Harkless
2021-08-30  0:06         ` Brian Inglis
2022-02-24 16:32           ` Patches to findutils 4.9.0-1's updatedb to do locking, allow filenames with spaces & progress monitoring, exclude /dev on Cygwin, etc Dan Harkless
2022-02-27 11:54             ` Bernhard Voelker
2022-02-27 12:06               ` Dan Harkless
