public inbox for cygwin@cygwin.com
* RE: find.exe vs. cmd.exe dir command vs. filesystem object in vbs script
@ 2012-05-05 22:27 Buchbinder, Barry (NIH/NIAID) [E]
  0 siblings, 0 replies; 3+ messages in thread
From: Buchbinder, Barry (NIH/NIAID) [E] @ 2012-05-05 22:27 UTC (permalink / raw)
  To: cygwin, 'Cary Lewis'

Cary Lewis sent the following at Friday, April 27, 2012 10:29 AM
>I have a system that makes use of a number of directories which contain
>hundreds of thousands of files.
>
>The sheer number of files in the directories makes it very difficult to
>do simple things using cygwin.
>
>For example the find command takes a very long time to start outputting
>filenames.
>
>However, in a cmd.exe window, the dir.exe command immediately starts
>outputting files.
>
>I would like to find out which api calls the CMD dir.exe command is
>using vs. the cygwin find.exe program.
>
>In the end I want to build an efficient delete files utility based on
>date, type, etc. I also need to compare files in the filesystem with
>references in a database
>
>I am starting to think that I should use the CMD dir.exe command and by
>parsing its output, take appropriate action.
>
>Performance is further hampered by the files residing on a SAN.

I use cmd's DIR to just get file & directory names, finding it much faster
than find.

$ "$(cygpath -u "${COMSPEC}")" /c dir /s /b /a: /o:n "$(cygpath -w "${CygwinPath}")" | \
tr -s '\r\n' '\n' | \
cygpath -u -f -

(There might be a speed advantage to working up a sed script instead of using
cygpath.  Based on *no data*, I've assumed that cmd's speed advantage over find
is due to not calling stat() on files.  If cygpath stats files, sed might be
faster.)
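
A minimal sketch of that sed idea, for illustration only.  The function
name win_to_cygdrive is hypothetical, and the sketch assumes GNU sed (for
\r and \L) and the default /cygdrive prefix.  Unlike cygpath, it does not
consult the Cygwin mount table, so paths under the Cygwin root come out as
/cygdrive/c/... rather than /usr/... or /home/....

```shell
# Hypothetical helper: rewrite "C:\dir\file" lines on stdin as
# "/cygdrive/c/dir/file".  Assumes GNU sed and the default /cygdrive
# prefix; the Cygwin mount table is NOT consulted.
win_to_cygdrive() {
    # 1. strip the trailing CR that cmd.exe emits
    # 2. turn every backslash into a forward slash
    # 3. replace the leading "X:" with "/cygdrive/x" (\L lowercases)
    sed -e 's/\r$//' \
        -e 's|\\|/|g' \
        -e 's|^\([A-Za-z]\):|/cygdrive/\L\1|'
}
```

It would take the place of the tr and cygpath stages in the pipeline
above, e.g. "... /c dir /s /b ... | win_to_cygdrive".  Whether it is
actually faster than cygpath is untested.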

While you might be able to get cmd /c DIR to give you dates, that will
probably require use of gawk or the like.
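
As a rough sketch of what that could look like: the function below
(dir_dates, a hypothetical name) reads the output of something like
"cmd /c dir /s /a:-d /t:w" and prints the date and full Windows path of
each file, separated by a tab.  It assumes the US-English DIR layout
(date, time, AM/PM, size, name, with " Directory of ..." header lines).
Other locales or a 24-hour clock change the columns, so treat the field
handling as an assumption to check against your own output.

```shell
# Hypothetical helper: extract "date<TAB>full path" from the long-format
# output of cmd.exe DIR.  Assumes lines shaped like
#   04/27/2012  10:29 AM             1,234 some file.txt
# under " Directory of C:\..." headers (US-English layout).
dir_dates() {
    tr -d '\r' | awk '
        # Remember the directory named in each " Directory of ..." header.
        /^ Directory of / { dir = substr($0, 15); next }
        # Entry lines start with a date such as 04/27/2012.
        /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9] / {
            if ($4 == "<DIR>") next      # skip directory entries
            name = $0
            # Drop date, time, AM/PM and size; keep the name, spaces and all.
            sub(/^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ /, "", name)
            print $1 "\t" dir "\\" name
        }'
}
```

The summary lines ("1 File(s) ...") and the volume header do not start
with a date, so they fall through without being printed.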

- Barry
  Disclaimer:  Statements made herein are not made on behalf of NIAID.


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: find.exe vs. cmd.exe dir command vs. filesystem object in vbs script
  2012-04-27 14:29 Cary Lewis
@ 2012-04-27 18:28 ` Keith Christian
  0 siblings, 0 replies; 3+ messages in thread
From: Keith Christian @ 2012-04-27 18:28 UTC (permalink / raw)
  To: cygwin

On Fri, Apr 27, 2012 at 8:28 AM, Cary Lewis <cary.lewis@gmail.com> wrote:
> I have a system that makes use of a number of directories which
> contain hundreds of thousands of files.


Cary, I can't comment on any API references, but here is a possible workaround.

The "locate" command works similarly to the "find" command, but
consults a special database (which you can re-generate at any time)
for quick access.

Once a separate "locatedb" database has been built for each directory,
you can search for a directory name or file name with the "locate"
command, which will return results very quickly.

Note - whenever files are added, deleted, or renamed in any of the
hypothetical "somedir_0n" directories, the "updatedb" command will
have to be run again to create the locatedb databases.

The "locate -S -d /var/locatedb_somedir_0n" command outputs statistics
on the database just created, showing the number of filenames in it
and other stats.


Example:

1. Suppose the following directories exist on your SAN, and each
"somedir_nn" directory contains about 100,000 files.

	/san_main_dir/corp_files/somedir_01
	/san_main_dir/corp_files/somedir_02
	/san_main_dir/corp_files/somedir_03
	/san_main_dir/corp_files/somedir_04
		[ .....ad nauseam..... ]
	/san_main_dir/corp_files/somedir_nn


2. Put the following lines into a script (e.g.,
create_san_locatedb.sh), which will create a separate "locatedb"
database for each subdirectory:


	#!/bin/bash
	# Create separate locatedb databases for directories containing a
	# large number of files on a SAN.

	time updatedb --localpaths='/san_main_dir/corp_files/somedir_01' \
		--output=/var/locatedb_somedir_01 2>/dev/null
	echo "locatedb_somedir_01 created..."
	echo
	locate -S -d /var/locatedb_somedir_01
	echo
	time updatedb --localpaths='/san_main_dir/corp_files/somedir_02' \
		--output=/var/locatedb_somedir_02 2>/dev/null
	echo "locatedb_somedir_02 created..."
	echo
	locate -S -d /var/locatedb_somedir_02
	echo
	time updatedb --localpaths='/san_main_dir/corp_files/somedir_03' \
		--output=/var/locatedb_somedir_03 2>/dev/null
	echo "locatedb_somedir_03 created..."
	echo
	locate -S -d /var/locatedb_somedir_03
	echo
	time updatedb --localpaths='/san_main_dir/corp_files/somedir_04' \
		--output=/var/locatedb_somedir_04 2>/dev/null
	echo "locatedb_somedir_04 created..."
	echo
	locate -S -d /var/locatedb_somedir_04
	echo
	echo "Custom locatedb directories created"
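
With many "somedir_nn" directories, the repeated blocks above could be
folded into a loop.  This is only a sketch: build_locatedbs is a
hypothetical helper name, and it assumes the directories follow the
somedir_* naming used in this example and contain no spaces (the
--localpaths option takes a space-separated list of paths).

```shell
# Hypothetical helper: build one locate database per somedir_* directory.
# Usage: build_locatedbs BASE_DIR DB_DIR
build_locatedbs() {
    base=$1
    dbdir=$2
    for dir in "$base"/somedir_*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        name=$(basename "$dir")
        db="$dbdir/locatedb_$name"
        # Same updatedb options as the script above; paths containing
        # spaces are not handled.
        updatedb --localpaths="${dir%/}" --output="$db" 2>/dev/null
        echo "locatedb_$name created..."
        echo
        locate -S -d "$db"                 # print statistics for the new database
        echo
    done
    echo "Custom locatedb directories created"
}
```

For the layout in step 1 the call would be
"build_locatedbs /san_main_dir/corp_files /var".  Prefix it with "time"
in bash to see how long the whole run takes.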



3. Now that the databases are created, here are some example commands
to find the directory and filenames within:


	Show a list of files ending in "DAT2012" from the database locatedb_somedir_03:

		locate --database=/var/locatedb_somedir_03 "*DAT2012"


	Show a list of files (ignoring cAsE sEnSiTiViTy) with "dAt2012"
	anywhere in the directory path or in the filename, from the
	database locatedb_somedir_03:

		locate -i --database=/var/locatedb_somedir_03 "dat2012"


	Show a list of files ending in "DAT2012" that have a preceding
	directory name containing "uncooked", from the database
	locatedb_somedir_01:

		locate --database=/var/locatedb_somedir_01 "*uncooked*/*DAT2012"


	Show a list of files ending in "DAT2012" with a preceding file or
	directory name containing "uncooked", from the database
	locatedb_somedir_01:

		locate --database=/var/locatedb_somedir_01 "*uncooked*DAT2012"


	Search across all four hypothetical locatedb_somedir_0n databases
	using four separate command lines:
		
		locate --database=/var/locatedb_somedir_01 "DAT2012"
		locate --database=/var/locatedb_somedir_02 "DAT2012"
		locate --database=/var/locatedb_somedir_03 "DAT2012"
		locate --database=/var/locatedb_somedir_04 "DAT2012"
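
GNU locate also accepts a colon-separated list of databases for
--database, so the four commands could be collapsed into one.  A small
sketch, assuming the locatedb_somedir_* naming from step 2 (join_dbs is a
hypothetical helper name):

```shell
# Hypothetical helper: print every locatedb_somedir_* file found in
# directory $1 as a single colon-separated list.
join_dbs() {
    list=
    for db in "$1"/locatedb_somedir_*; do
        [ -f "$db" ] || continue           # glob matched nothing
        list="${list:+$list:}$db"          # add ":" only between entries
    done
    printf '%s\n' "$list"
}

# Example use (not run here):
#   locate --database="$(join_dbs /var)" "DAT2012"
```

The helper only assembles the list; the search itself is still a plain
locate call.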


4. If the directory names or file names below /san_main_dir/corp_files
change after the "locatedb_somedir_0n" databases are created, new
files won't be found using "locate," and items deleted since the
previous steps in (2) above will still appear until the databases are
re-created.  Re-run Step 2 if this is the case.  The "time" commands
before each updatedb command will help gauge how long it takes to
create the "locatedb" databases.
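
One cheap way to guess whether a database needs rebuilding is to compare
its timestamp with that of the directory it indexes.  The sketch below
uses a hypothetical helper name (db_is_stale) and rests on two
assumptions: that the filesystem updates a directory's modification time
when entries are added, removed, or renamed in it, and that the files
sit directly in "somedir_nn" rather than in nested subdirectories, whose
changes this check would miss.

```shell
# Hypothetical helper: succeed if the database $2 is missing or older
# than the directory $1 it was built from.
# Only the directory's own mtime is checked; changes in nested
# subdirectories are not detected.
db_is_stale() {
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

# Example use (not run here):
#   if db_is_stale /san_main_dir/corp_files/somedir_01 /var/locatedb_somedir_01
#   then
#       echo "locatedb_somedir_01 is out of date; re-run Step 2"
#   fi
```

The check reads two timestamps and does not walk the directory, so it
should stay fast even with hundreds of thousands of files.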


========Keith


* find.exe vs. cmd.exe dir command vs. filesystem object in vbs script
@ 2012-04-27 14:29 Cary Lewis
  2012-04-27 18:28 ` Keith Christian
  0 siblings, 1 reply; 3+ messages in thread
From: Cary Lewis @ 2012-04-27 14:29 UTC (permalink / raw)
  To: cygwin

I have a system that makes use of a number of directories which
contain hundreds of thousands of files.

I know this is a bad design. I inherited it.

The sheer number of files in the directories makes it very difficult
to do simple things using cygwin.

For example the find command takes a very long time to start
outputting filenames.

However, in a cmd.exe window, the dir.exe command immediately starts
outputting files.

I would like to find out which api calls the CMD dir.exe command is
using vs. the cygwin find.exe program.

In the end I want to build an efficient delete files utility based on
date, type, etc. I also need to compare files in the filesystem with
references in a database

I am starting to think that I should use the CMD dir.exe command and
by parsing its output, take appropriate action.

Performance is further hampered by the files residing on a SAN.

Any thoughts / suggestions?

Thanks.

