From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact cygwin-help@cygwin.com; run by ezmlm
Sender: cygwin-owner@cygwin.com
Mail-Followup-To: cygwin@cygwin.com
From: "Buchbinder, Barry (NIH/NIAID) [E]"
To: "cygwin@cygwin.com"
CC: 'Byron Boulton'
Subject: RE: locate and updatedb
Date: Wed, 17 Feb 2016 16:49:00 -0000
Message-ID: <6CF2FC1279D0844C9357664DC5A08BA21BD3305C@msgb09.nih.gov>
References: <56BC940F.6070109@zoho.com>
 <56BCD05C.2040409@gmail.com> <56BCD414.2010304@zoho.com>
 <56BD0D87.6030008@gmail.com> <56BF1E4D.5000901@tlinx.org>
 <6CF2FC1279D0844C9357664DC5A08BA21BD2FA07@msgb09.nih.gov>
 <56C478E0.70904@zoho.com>
 <6CF2FC1279D0844C9357664DC5A08BA21BD31EAE@msgb09.nih.gov>
 <56C49DFD.5060700@zoho.com>
In-Reply-To: <56C49DFD.5060700@zoho.com>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0

Byron Boulton sent the following at Wednesday, February 17, 2016 11:21 AM
>On 2/17/2016 11:00 AM, Buchbinder, Barry (NIH/NIAID) [E] wrote:
>> Byron Boulton sent the following at Wednesday, February 17, 2016 8:43 AM
>>> On 2/16/2016 5:55 PM, Buchbinder, Barry (NIH/NIAID) [E] wrote:
>>>>
>>>> This is technically OT since this involved a non-cygwin tool.
>>>>
>>>> find is slow compared with a non-Cygwin tool, specifically dir (cmd.exe).
>>>>
>>>> Compare find with cmd.exe's dir.  Note that even with the benefit of
>>>> caching (compare the 1st and 3rd times), find takes twice as long as dir.
>>>> Comparing cached times (2nd vs 3rd), dir is 3X faster.
>>>>
>>>> $ time cmd /c dir /s /b 'C:\usr' > /dev/null ; \
>>>>   time find /c/usr > /dev/null ; \
>>>>   time cmd /c dir /s /b 'C:\usr' > /dev/null
>>>>
>>>> real 0m1.326s
>>>> user 0m0.000s
>>>> sys 0m0.047s
>>>>
>>>> real 0m2.465s
>>>> user 0m0.280s
>>>> sys 0m2.184s
>>>>
>>>> real 0m0.874s
>>>> user 0m0.000s
>>>> sys 0m0.031s
>>>>
>>>> (Note: c:\usr has nothing to do with /usr.)
>>>>
>>>> Here's how I use dir *in the abstract* for drives C: and D:.  (Note: the
>>>> /a: option of dir lists all files, including hidden ones; /o:n sorts by
>>>> name.)
>>>>
>>>> for D in /c /d
>>>> do
>>>>   "$(cygpath "${COMSPEC}")" /c dir /s /b /a: /o:n "$(cygpath -w "$D")"
>>>> done | \
>>>>   tr -s '\r\n' '\n' | \
>>>>   cygpath -u -f - | \
>>>>   sed -e '/^$/d' -e 's,/\+,/,g' \
>>>>   sort -u \
>>>>   /usr/libexec/frcode > /tmp/updatedb.tmp
>>>> chmod --reference /var/locatedb /tmp/updatedb.tmp
>>>> mv /tmp/updatedb.tmp /var/locatedb
>>>>
>>>> What I actually do (attached) is more complicated.  My script
>>>> chooses which directories are scanned, does them in parallel, and
>>>> prints pretty messages.  I get error messages for very long paths (>
>>>> ~250 bytes).  It works well enough for me; YMMV.
>>>
>>> Are you using dir in some sort of custom way to build the database
>>> used by locate?  Or are you saying that rather than ever using the
>>> find command to find files, you use a custom script which uses dir?
>>
>> I use dir only to generate the locate database, because scanning the
>> better part of several disks takes so long.  I do not substitute dir
>> for find for other purposes.  One could, but usually locate does what
>> I need, and when it doesn't, I use find.
>
>[...] locate [...]
>understands how to read this custom database?  If I read your updatedb.sh
>script properly, it produces a file which is just a sorted text file
>with one line per file found by updatedb.sh.

Sorry.  In the example in the email text I forgot the pipe sign after
sort that feeds into /usr/libexec/frcode, which converts the list to the
locatedb format.  That fragment should have been as follows.

  sort -u | \
  /usr/libexec/frcode > /tmp/updatedb.tmp

It's really been so long since I put updatedb.sh together that I would
need to study it to make detailed comments.  Indeed, my memories of
putting it together are lost in the mists of time.

What I'd advise is to use the script that comes with findutils,
/usr/bin/updatedb, as your model.
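
For anyone who wants to try the text-cleanup stages of the quoted
pipeline on their own, here is a minimal sketch.  It is not taken from
the attached updatedb.sh: the function name normalize_paths is
hypothetical, the cygpath and frcode stages are left out so it can be
run anywhere, LC_ALL=C is added only to make the sort order
reproducible, and the sed expression uses '//*' in place of '/\+' on the
assumption that a portable form is wanted.  Note that in a working
pipeline the sed stage, like sort, has to be piped into the next
command.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): normalize a list
# of paths read from stdin, one path per line.
normalize_paths() {
    # Map CR and LF to LF and squeeze runs, so CRLF becomes one newline.
    tr -s '\r\n' '\n' |
    # Drop empty lines and collapse repeated slashes into one.
    sed -e '/^$/d' -e 's,//*,/,g' |
    # Sort and remove duplicates; LC_ALL=C is an assumption of this
    # sketch, used only to get a fixed byte-wise order.
    LC_ALL=C sort -u
}
```

For example, feeding it

  printf 'C:/usr//bin\r\n\r\nC:/usr//bin\r\n/c/tmp\r\n' | normalize_paths

should leave two lines, /c/tmp and C:/usr/bin, which could then be
handed to frcode.
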
Substitute dir for find, adjust the start points (drives or
directories), convert line endings, run the output through cygpath, and
make any other necessary changes before running it through frcode.

Sorry that I cannot be of more help.  Good luck.

- Barry
  Disclaimer: Statements made herein are not made on behalf of NIAID.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple