public inbox for cygwin@cygwin.com
* locate and updatedb
@ 2016-02-11 14:01 Byron Boulton
  2016-02-11 18:17 ` cyg Simple
  0 siblings, 1 reply; 10+ messages in thread
From: Byron Boulton @ 2016-02-11 14:01 UTC (permalink / raw)
  To: cygwin

Does anyone here have success using `updatedb` and `locate` in cygwin? I 
use `locate` heavily on my Linux machines, but every time I've tried to
run `updatedb` on cygwin I've given up and killed the process because it
is taking too long. Is there something wrong with cygwin's
implementation of `updatedb` making it not work at all or making it
slower than on my Linux machines? Or are there others who have success
using it on cygwin?

Byron


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: locate and updatedb
  2016-02-11 14:01 locate and updatedb Byron Boulton
@ 2016-02-11 18:17 ` cyg Simple
  2016-02-11 18:34   ` Byron Boulton
  0 siblings, 1 reply; 10+ messages in thread
From: cyg Simple @ 2016-02-11 18:17 UTC (permalink / raw)
  To: cygwin

On 2/11/2016 9:00 AM, Byron Boulton wrote:
> Does anyone here have success using `updatedb` and `locate` in cygwin? I
> use `locate` heavily on my Linux machines, but every time I've tried to
> run `updatedb` on cygwin I've given up and killed the process because it
> is taking too long. Is there something wrong with cygwin's
> implementation of `updatedb` making it not work at all or making it
> slower than on my Linux machines? Or are there others who have success
> using it on cygwin?

Processing every file on the drive will be slow just because it's
Windows.  Initializing the database with updatedb will require a large
amount of time.  There are processes such as AntiVirus intrusion
protection that might make it even slower.

-- 
cyg Simple


* Re: locate and updatedb
  2016-02-11 18:17 ` cyg Simple
@ 2016-02-11 18:34   ` Byron Boulton
  2016-02-11 22:39     ` Marco Atzeri
  0 siblings, 1 reply; 10+ messages in thread
From: Byron Boulton @ 2016-02-11 18:34 UTC (permalink / raw)
  To: cygwin

On 2/11/2016 1:18 PM, cyg Simple wrote:
> On 2/11/2016 9:00 AM, Byron Boulton wrote:
>> Does anyone here have success using `updatedb` and `locate` in cygwin? I
>> use `locate` heavily on my Linux machines, but every time I've tried to
>> run `updatedb` on cygwin I've given up and killed the process because it
>> is taking too long. Is there something wrong with cygwin's
>> implementation of `updatedb` making it not work at all or making it
>> slower than on my Linux machines? Or are there others who have success
>> using it on cygwin?
>
> Processing every file on the drive will be slow just because it's
> Windows.  Initializing the database with updatedb will require a large
> amount of time.  There are processes such as AntiVirus intrusion
> protection that might make it even slower.
>
Hmmm, the reason the slowness is particularly strange to me is that in
place of using `locate` from my cygwin terminal, I have to use a program
called "Everything Search Engine" available at www.voidtools.com. The
first time I install it, it takes maybe a few minutes to index the hard
drive; then every once in a while when I open the program it takes a few
seconds to update the index. In general its performance for indexing
and searching the index is comparable to `updatedb` and `locate` on a
Linux machine, so it's possible to do on Windows.

Byron



* Re: locate and updatedb
  2016-02-11 18:34   ` Byron Boulton
@ 2016-02-11 22:39     ` Marco Atzeri
  2016-02-13 12:15       ` Linda Walsh
  0 siblings, 1 reply; 10+ messages in thread
From: Marco Atzeri @ 2016-02-11 22:39 UTC (permalink / raw)
  To: cygwin

On 11/02/2016 19:33, Byron Boulton wrote:
> On 2/11/2016 1:18 PM, cyg Simple wrote:
>> On 2/11/2016 9:00 AM, Byron Boulton wrote:
>>> Does anyone here have success using `updatedb` and `locate` in cygwin? I
>>> use `locate` heavily on my Linux machines, but every time I've tried to
>>> run `updatedb` on cygwin I've given up and killed the process because it
>>> is taking too long. Is there something wrong with cygwin's
>>> implementation of `updatedb` making it not work at all or making it
>>> slower than on my Linux machines? Or are there others who have success
>>> using it on cygwin?
>>
>> Processing every file on the drive will be slow just because it's
>> Windows.  Initializing the database with updatedb will require a large
>> amount of time.  There are processes such as AntiVirus intrusion
>> protection that might make it even slower.
>>
> Hmmm, the reason the slowness is particularly strange to me is that in
> place of using `locate` from my cygwin terminal, I have to use a program
> called "Everything Search Engine" available at www.voidtools.com. The
> first time I install it, it takes maybe a few minutes to index the hard
> drive; then every once in a while when I open the program it takes a few
> seconds to update the index. In general its performance for indexing
> and searching the index is comparable to `updatedb` and `locate` on a
> Linux machine, so it's possible to do on Windows.
>
> Byron
>

the time taken by updatedb is mainly due to
the execution time of "find" on the disks.

It takes ~ 70 minutes for my 500 GB of data,
and likely the AV is impacting the execution.

I suspect voidtools is using MS disk indexing
to speed things up.
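If most of updatedb's time really goes into the find walk, timing a bare traversal of the same tree should account for most of a full run. A minimal sketch, assuming bash and GNU find; the function name time_find_walk is hypothetical, not part of findutils:

```bash
#!/bin/bash
# Hypothetical helper: time a bare find walk of one directory tree,
# i.e. the traversal updatedb performs, without building a database.
time_find_walk() {
    local dir=$1 start count
    start=$SECONDS
    # find prints the starting directory itself plus every entry below it.
    # -noleaf: don't assume Unix directory link counts (non-Unix filesystems)
    count=$(find "$dir" -noleaf 2>/dev/null | wc -l)
    # $((count)) strips any padding wc may put around the number
    echo "$((count)) entries in $((SECONDS - start)) s"
}
```

Usage: `time_find_walk /cygdrive/c`. Comparing its time with a full updatedb run over the same path shows how much of the total is the walk itself (SECONDS only has one-second resolution, which is fine for runs measured in minutes).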


Regards
Marco





* Re: locate and updatedb
  2016-02-11 22:39     ` Marco Atzeri
@ 2016-02-13 12:15       ` Linda Walsh
  2016-02-16 22:55         ` Buchbinder, Barry (NIH/NIAID) [E]
  0 siblings, 1 reply; 10+ messages in thread
From: Linda Walsh @ 2016-02-13 12:15 UTC (permalink / raw)
  To: cygwin

Marco Atzeri wrote:
> On 11/02/2016 19:33, Byron Boulton wrote:
>> On 2/11/2016 1:18 PM, cyg Simple wrote:
>>> On 2/11/2016 9:00 AM, Byron Boulton wrote:
>>>> Does anyone here have success using `updatedb` and `locate` in
>>>> cygwin? I use `locate` heavily on my Linux machines, but every time
>>>> I've tried to run `updatedb` on cygwin I've given up and killed the
>>>> process because it is taking too long.
---
	There's a reason why on Linux it is usually set to run
when you are asleep.  ;-)

>>>> Is there something wrong with cygwin's
>>>> implementation of `updatedb` making it not work at all or making it
>>>> slower than on my Linux machines? Or are there others who have success
>>>> using it on cygwin?

But it might have to do with disk speed and memory.  Laptop drives
are usually among the slowest.


I ran it just now (this is with MS's Home Essentials
real-time protection turned on).

law.Bliss/bin> time index_files.sh
670592 (process ID) old priority 0, new priority 19
44.21sec 15.06usr 28.30sys (98.09% cpu)
> locate / >/tmp/all
> wc /tmp/all
  1479146   4014375 133322318 /tmp/all
> df .
Filesystem      Size  Used Avail Use% Mounted on
C:              949G  585G  365G  62% /
----
 

So ~1.48 million files... using the following exclusions:

---(index_files.sh)----
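#!/bin/bash
# Lowest CPU priority, so indexing doesn't compete with interactive work
renice +19 $$
Local="/"
# 32-bit Cygwin on 64-bit Windows: also index the real System32 via sysnative
if [[ -d /windows/sysnative/. ]]; then
  Local+=" /windows/sysnative/."
fi
# Space-separated regexes to skip; [[:space:]] matches a space inside a name
Prunepaths='/.usr /proc /C /B /H /I /M /D /P /System[[:space:]]Volume[[:space:]]Information /Windows/CSC /pagefile.sys /Music /Pictures /Share /Media /home /Doc /$RECYCLE.BIN /cygdrive'
Net=""   # no network paths are indexed

# -noleaf: don't assume Unix directory link counts when walking non-Unix filesystems
/bin/updatedb --findoptions=-noleaf --localpaths="$Local" --prunepaths="$Prunepaths" --netpaths="$Net"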
----
Most of those paths are pruned either because they are redundant
or because they are on a local network server...

That's fairly fast compared with the full MS Home Essentials malware
scan I run once a week, which takes ~8-16 hours (it scans a few of
my network directories as well).

* RE: locate and updatedb
  2016-02-13 12:15       ` Linda Walsh
@ 2016-02-16 22:55         ` Buchbinder, Barry (NIH/NIAID) [E]
  2016-02-17 13:43           ` Byron Boulton
  0 siblings, 1 reply; 10+ messages in thread
From: Buchbinder, Barry (NIH/NIAID) [E] @ 2016-02-16 22:55 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 4771 bytes --]

Linda Walsh sent the following at Saturday, February 13, 2016 7:15 AM
>Marco Atzeri wrote:
>> the time taken by updatedb is mainly due to
>> the execution time of "find" on the disks.
>>
>> It takes ~ 70 minutes for my 500 GB of data,
>> and likely the AV is impacting the execution.
>>
>> I suspect voidtools is using MS disk indexing
>> to speed things up.

This is technically OT, since it involves a non-Cygwin tool.

find is slow compared with a non-Cygwin tool, specifically dir (cmd.exe).

Compare find with cmd.exe's dir.  Note that even with the benefit of
caching (compare the 1st and 2nd times), find takes about twice as long
as an uncached dir.  Comparing cached times (2nd vs 3rd), dir is about
3X faster.

$ time cmd /c dir /s /b 'C:\usr' > /dev/null ; \
time find /c/usr > /dev/null ; \
time cmd /c dir /s /b 'C:\usr' > /dev/null

real    0m1.326s
user    0m0.000s
sys     0m0.047s

real    0m2.465s
user    0m0.280s
sys     0m2.184s

real    0m0.874s
user    0m0.000s
sys     0m0.031s

(Note: c:\usr has nothing to do with /usr.)

Here's how I use dir *in the abstract* for drives C: and D:.  (Note: the
/a: option of dir lists all files, including hidden ones; /o:n sorts by
name.)

for D in /c /d
do
    "$(cygpath "${COMSPEC}")" /c dir /s /b /a: /o:n "$(cygpath -w "$D")"
done | \
tr -s '\r\n' '\n' | \
cygpath -u -f - | \
sed -e '/^$/d' -e 's,/\+,/,g' \
sort -u \
/usr/libexec/frcode > /tmp/updatedb.tmp
chmod --reference /var/locatedb /tmp/updatedb.tmp
mv /tmp/updatedb.tmp /var/locatedb

What I actually do (attached) is more complicated.  My script chooses
which directories are scanned, does them in parallel, and prints pretty
messages.  I get error messages for very long paths (> ~250 bytes).  It
works well enough for me; YMMV.
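For reference, the same dir-based approach can be written as a small bash sketch, with a pipe after every stage (the fragment above needs one after the sed and sort stages). It assumes a Cygwin shell (cygpath, $COMSPEC), findutils' frcode at /usr/libexec/frcode and an already existing /var/locatedb, all taken from the fragment above; the function names list_with_dir, crlf_to_lf, clean_paths and build_locatedb are hypothetical. /c and /d are Barry's mount points; a default Cygwin install would use /cygdrive/c and /cygdrive/d.

```bash
#!/bin/bash
# Sketch only: rebuild the locate database from cmd.exe "dir" listings
# instead of find.

# List every file and directory (including hidden ones) under each start
# point as Windows paths, one per line, with CRLF line endings.
list_with_dir() {
    local d
    for d in "$@"; do
        "$(cygpath "$COMSPEC")" /c dir /s /b /a: /o:n "$(cygpath -w "$d")"
    done
}

# CRLF -> LF; -s also squeezes the blank lines this creates.
crlf_to_lf() {
    tr -s '\r\n' '\n'
}

# Drop empty lines, collapse repeated slashes (GNU sed \+),
# then sort and de-duplicate for frcode.
clean_paths() {
    sed -e '/^$/d' -e 's,/\+,/,g' | sort -u
}

# Full pipeline. /var/locatedb is replaced only if frcode succeeded;
# chmod --reference copies the existing database's permissions.
build_locatedb() {
    local tmp=/tmp/updatedb.tmp
    list_with_dir "$@" |
        crlf_to_lf |
        cygpath -u -f - |
        clean_paths |
        /usr/libexec/frcode > "$tmp" &&
        chmod --reference /var/locatedb "$tmp" &&
        mv "$tmp" /var/locatedb
}
```

Usage, from a Cygwin shell after defining the functions: `build_locatedb /c /d`, then query with locate as usual. Keeping the text cleanup in separate functions means those stages can be checked on plain text without cmd.exe.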

- Barry
  Disclaimer: Statements made herein are not made on behalf of NIAID.


[-- Attachment #2: updatedb.sh --]
[-- Type: application/octet-stream, Size: 6116 bytes --]


* Re: locate and updatedb
  2016-02-16 22:55         ` Buchbinder, Barry (NIH/NIAID) [E]
@ 2016-02-17 13:43           ` Byron Boulton
  2016-02-17 16:01             ` Buchbinder, Barry (NIH/NIAID) [E]
  0 siblings, 1 reply; 10+ messages in thread
From: Byron Boulton @ 2016-02-17 13:43 UTC (permalink / raw)
  To: cygwin

On 2/16/2016 5:55 PM, Buchbinder, Barry (NIH/NIAID) [E] wrote:
> Here's how I use dir *in the abstract* for drives C: and D:.  (Note: the
> /a: option of dir lists all files, including hidden ones; /o:n sorts by
> name.)
>
> for D in /c /d
> do
>      "$(cygpath "${COMSPEC}")" /c dir /s /b /a: /o:n "$(cygpath -w "$D")"
> done | \
> tr -s '\r\n' '\n' | \
> cygpath -u -f - | \
> sed -e '/^$/d' -e 's,/\+,/,g' \
> sort -u \
> /usr/libexec/frcode > /tmp/updatedb.tmp
> chmod --reference /var/locatedb /tmp/updatedb.tmp
> mv /tmp/updatedb.tmp /var/locatedb
>
> What I actually do (attached) is more complicated.  My script chooses
> which directories are scanned, does them in parallel, and prints pretty
> messages.  I get error messages for very long paths (> ~250 bytes).  It
> works well enough for me; YMMV.
>
> - Barry
>    Disclaimer: Statements made herein are not made on behalf of NIAID.

Barry,

Are you using dir in some sort of custom way to build the database used 
by locate? Or are you saying that rather than ever using the find 
command to find files, you use a custom script which uses dir?

Byron



* RE: locate and updatedb
  2016-02-17 13:43           ` Byron Boulton
@ 2016-02-17 16:01             ` Buchbinder, Barry (NIH/NIAID) [E]
  2016-02-17 16:21               ` Byron Boulton
  0 siblings, 1 reply; 10+ messages in thread
From: Buchbinder, Barry (NIH/NIAID) [E] @ 2016-02-17 16:01 UTC (permalink / raw)
  To: cygwin; +Cc: 'Byron Boulton'

Byron Boulton sent the following at Wednesday, February 17, 2016 8:43 AM
>Are you using dir in some sort of custom way to build the database
>used by locate? Or are you saying that rather than ever using the find
>command to find files, you use a custom script which uses dir?

I use dir only to generate the locate database, because scanning the
better part of several disks takes so long.  I do not substitute dir for
find for other purposes.  One could, but usually locate does what I need,
and when it doesn't, I use find.

Best wishes,

- Barry
  Disclaimer: Statements made herein are not made on behalf of NIAID.


* Re: locate and updatedb
  2016-02-17 16:01             ` Buchbinder, Barry (NIH/NIAID) [E]
@ 2016-02-17 16:21               ` Byron Boulton
  2016-02-17 16:49                 ` Buchbinder, Barry (NIH/NIAID) [E]
  0 siblings, 1 reply; 10+ messages in thread
From: Byron Boulton @ 2016-02-17 16:21 UTC (permalink / raw)
  To: cygwin

On 2/17/2016 11:00 AM, Buchbinder, Barry (NIH/NIAID) [E] wrote:
> I use dir only to generate the locate database, because scanning the
> better part of several disks takes so long.  I do not substitute dir for
> find for other purposes.  One could, but usually locate does what I need,
> and when it doesn't, I use find.
Does locate understand how to read this custom database? If I read your
updatedb.sh script properly, it produces a file which is just a sorted
text file with one line per file found by updatedb.sh.

Byron



* RE: locate and updatedb
  2016-02-17 16:21               ` Byron Boulton
@ 2016-02-17 16:49                 ` Buchbinder, Barry (NIH/NIAID) [E]
  0 siblings, 0 replies; 10+ messages in thread
From: Buchbinder, Barry (NIH/NIAID) [E] @ 2016-02-17 16:49 UTC (permalink / raw)
  To: cygwin; +Cc: 'Byron Boulton'

Byron Boulton sent the following at Wednesday, February 17, 2016 11:21 AM
>Does locate understand how to read this custom database? If I read your
>updatedb.sh script properly, it produces a file which is just a sorted
>text file with one line per file found by updatedb.sh.

Sorry.  In the example in the email text I forgot the pipe signs after
sed and after sort, which feeds into /usr/libexec/frcode; frcode converts
the sorted list into locate's database format.  That fragment should have
been as follows.

sed -e '/^$/d' -e 's,/\+,/,g' | \
sort -u | \
/usr/libexec/frcode > /tmp/updatedb.tmp

It's really been so long since I put updatedb.sh together that I would need
to study it to make detailed comments.  Indeed, my memories of putting it
together are lost in the mists of time.

What I'd advise is to use the script that comes with findutils,
/usr/bin/updatedb, as your model: substitute dir for find, adjust the
starting points (drives or directories), convert line endings, run the
output through cygpath, and make any other necessary changes before
piping it into frcode.

Sorry that I cannot be of more help.

Good luck.

- Barry
  Disclaimer: Statements made herein are not made on behalf of NIAID.



end of thread

Thread overview: 10+ messages
2016-02-11 14:01 locate and updatedb Byron Boulton
2016-02-11 18:17 ` cyg Simple
2016-02-11 18:34   ` Byron Boulton
2016-02-11 22:39     ` Marco Atzeri
2016-02-13 12:15       ` Linda Walsh
2016-02-16 22:55         ` Buchbinder, Barry (NIH/NIAID) [E]
2016-02-17 13:43           ` Byron Boulton
2016-02-17 16:01             ` Buchbinder, Barry (NIH/NIAID) [E]
2016-02-17 16:21               ` Byron Boulton
2016-02-17 16:49                 ` Buchbinder, Barry (NIH/NIAID) [E]
