public inbox for cygwin@cygwin.com
From: Dan Harkless <cygwin-list21@harkless.org>
To: "cygwin@cygwin.com" <cygwin@cygwin.com>
Cc: bug-findutils@gnu.org
Subject: Re: updatedb broken as of findutils 4.8.0-1 due to bigram.exe no longer being provided
Date: Sun, 29 Aug 2021 05:06:37 -0700
Message-ID: <525a832a-78fd-5a32-e195-5747120da922@harkless.org>
In-Reply-To: <3457cee1-18b5-2916-adee-afdfaf9769ea@t-online.de>

On 8/29/2021 4:02 AM, Hans-Bernhard Bröker wrote:
> On 28.08.2021 at 18:23, Dan Harkless wrote:
>> Looks like it's because in findutils 4.8.0-1, the bigram.exe program 
>> is no longer provided, but the /usr/bin/updatedb script (still) 
>> depends on it being there:
>      [...]
>>      + for binary in $find $frcode $bigram $code
>>      + checkbinary /usr/libexec/frcode
>
> The version of updatedb in the 4.8.0-1 package does not actually 
> contain those lines.  Mention of both $bigram and $code has been 
> removed from the loop construct (and from everywhere else in the script).
>
> That's because the old format of find databases, which is the only one 
> actually using bigram and code, was removed from updatedb as of 
> findutils version 4.7, so there really cannot be a need for the bigram 
> tool any more.

Argh!  So sorry for the false report!  I completely forgot that years 
back I had made a locally patched version (which is earlier in my path) 
of Cygwin updatedb 4.6.0-1 to troubleshoot and work around problems I 
was having with the tool.
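
A quick way to check for this kind of shadowing is to ask the shell which
copy it resolves first. The sketch below is only an illustration: the
directories and the dummy updatedb scripts are made up, and on a real
system one would simply run 'command -v updatedb' (or 'type -a updatedb'
in bash) against the actual PATH.

```shell
#!/bin/sh
# Sketch only: the directories and the dummy scripts are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/local/bin" "$demo/usr/bin"
for d in "$demo/local/bin" "$demo/usr/bin"; do
    printf '#!/bin/sh\necho "updatedb from %s"\n' "$d" > "$d/updatedb"
    chmod +x "$d/updatedb"
done

# The directory listed first in PATH wins, so the "local" copy
# shadows the packaged one.
PATH="$demo/local/bin:$demo/usr/bin:$PATH"
resolved=$(command -v updatedb)
echo "shell will run: $resolved"
```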

I have 12M+ pathnames on my main Windows system, and updatedb suddenly 
went from taking less than an hour to taking more than 24 hours, so 
that it ran into the next job.

It was very awkward to try to troubleshoot what was going on without a 
'find' log to 'tail', so I patched my local copy of updatedb to write 
to an intermediate file, rather than going directly to 'sort' over a pipe.
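
A minimal sketch of that idea follows. It is not the actual patch: the
function name list_then_sort, its arguments, and the demo tree are all
hypothetical, and the real script does more than this between 'find' and
the final database.

```shell
#!/bin/sh
# Sketch only: list_then_sort and its arguments are hypothetical names,
# not taken from the real updatedb script.
list_then_sort() {
    root=$1      # directory tree to scan
    listfile=$2  # intermediate file that can be watched with 'tail -f'
    sorted=$3    # sorted output, ready for the encoding step

    # Step 1: write the raw find output to a file instead of a pipe.
    find "$root" -print > "$listfile" || return 1

    # Step 2: sort the file in a separate step.
    LC_ALL=C sort -o "$sorted" "$listfile"
}

# Demo on a throwaway tree.
demo=$(mktemp -d)
mkdir -p "$demo/tree/b" "$demo/tree/a"
touch "$demo/tree/b/file2" "$demo/tree/a/file1"
list_then_sort "$demo/tree" "$demo/paths.txt" "$demo/paths.sorted"
```

While the find step is running, 'tail -f' on the intermediate file shows
which part of the tree is currently being walked.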

Another problem I was having was that, though I have 24 GB of RAM on my 
system, I would get low-memory popup warnings from the OS when the sort 
kicked in.  (The warnings misplace the blame on Firefox, because I 
usually have big sessions running that take even more RAM than the sort.)

I was hoping running sort on a _file_ rather than stdin might allow it 
to reduce the RAM use enough to avoid the warning, but unfortunately 
(and unsurprisingly) I still get it with the intermediate file.  This is 
just a warning, though; as far as I know, it hasn't actually run out of 
RAM so far.
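
For reference, GNU sort has a -S (--buffer-size) option that caps its
in-memory buffer and a -T option that selects the directory for its
temporary files. Whether that would silence the warning on this system
is an assumption, not something tested in this thread. A sketch with
made-up sizes and paths:

```shell
#!/bin/sh
# Sketch only: the 1M cap and the temp paths are illustrative values.
work=$(mktemp -d)
printf '%s\n' /usr/bin/find /bin/sh /etc/passwd > "$work/paths.txt"

# -S caps sort's memory buffer; -T says where spill files are written.
LC_ALL=C sort -S 1M -T "$work" -o "$work/paths.sorted" "$work/paths.txt"
first=$(head -n 1 "$work/paths.sorted")
echo "first sorted entry: $first"
```

With a small buffer, sort spills to temporary files sooner, trading RAM
for extra disk I/O in the -T directory.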

The final problem I was addressing in my patched version was some 
missing error-checking, which left me with _no_ filename DB when the 
update failed, rather than at least leaving me with the one from last 
time.
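
One common way to get that behavior is to build the new database in a
temporary file and only move it into place on success. The sketch below
shows that general pattern, not the actual patch; safe_update and the
file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: safe_update and the file names are hypothetical.
# $1 = database path; remaining args = a command that writes the new
# database to stdout.
safe_update() {
    db=$1; shift
    tmp="$db.new.$$"

    # Build the new database beside the old one; on failure, discard
    # the partial file and leave the old database untouched.
    if "$@" > "$tmp"; then
        mv -f "$tmp" "$db"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Demo: a failing generator must not clobber the existing database.
dir=$(mktemp -d)
echo "old contents" > "$dir/locatedb"
if safe_update "$dir/locatedb" false; then
    failed_status=0
else
    failed_status=$?
fi
after_failure=$(cat "$dir/locatedb")

# Demo: a successful generator replaces it.
safe_update "$dir/locatedb" echo "new contents"
after_success=$(cat "$dir/locatedb")
```

Note that in a pipeline, plain sh only reports the exit status of the
last command, so a real version would have to check each stage
separately.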

I could send along my patches, but I don't know that I've solved these 
issues in a general enough way.  For instance, my 12 million+ pathnames 
come out to about 1.4 GiB of UNIX-linefeed-separated UTF-8 strings.  
Writing that much to my HD is not a concern, but obviously some people 
might not want to write that much every time to, say, a small 
flash-based device.
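
As a rough sanity check on those figures, the average pathname length
can be worked out with shell integer arithmetic. The inputs are the
approximate numbers quoted above, so the result is only a ballpark.

```shell
#!/bin/sh
# Figures from the message: about 1.4 GiB for about 12 million pathnames.
paths=12000000
# 1.4 GiB in bytes, using integer arithmetic (14/10 of 2^30).
total_bytes=$(( 14 * 1024 * 1024 * 1024 / 10 ))
avg_bytes=$(( total_bytes / paths ))
echo "about $avg_bytes bytes per pathname, including the linefeed"
```

That works out to roughly 125 bytes per pathname.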

Thoughts?

-- 
Dan Harkless


