public inbox for cygwin@cygwin.com
From: Jeffrey Altman <jaltman@secure-endpoints.com>
To: cygwin@cygwin.com
Subject: Re: ls/stat on OneDrive causes download of files
Date: Wed, 6 Mar 2024 13:55:17 -0500
Message-ID: <7d9fe460-5704-424b-a89b-e34ef2176d38@secure-endpoints.com>
In-Reply-To: <ZeilkJK7Csryuzkc@calimero.vinschen.de>

On 3/6/2024 12:19 PM, Corinna Vinschen via Cygwin wrote:
> We can add an explicit call to
>
>    RtlSetProcessPlaceholderCompatibilityMode (PHCM_EXPOSE_PLACEHOLDERS);
>
> and we can recognize the IO_REPARSE_TAG_FILE_PLACEHOLDER and
> IO_REPARSE_TAG_CLOUD_* tags during symlink evaluation, but even then
> we still have to know what the reparse point buffer actually contains.
>
> Given that the content of reparse points with these reparse tags are
> undocumented, some people using cloud services should examine these
> reparse points so we can add some suitable code to Cygwin.
>
>
> Corinna

I'm not an expert in this area by any means, but here are my
recollections from when Microsoft presented in person on cloud
placeholders to filter and file system developers many years ago.

Files and directories that are placeholders should have either the 
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS or FILE_ATTRIBUTE_RECALL_ON_OPEN 
file attributes set. When these attributes are set, applications and 
minifilters are advised not to "read" or "open" the files or
directories unless they absolutely need to because doing so will cause 
the placeholder to be replaced by an object containing the actual data 
which might take a long time to fetch, might cost the end user money, or 
might fail depending upon the network connectivity. In particular, 
anti-malware should ignore them during scans and only analyze the data 
when it is fetched locally by an end user application.
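A minimal sketch of the attribute test described above. The two constant values are the ones the Windows SDK headers define for these attributes; the helper name `recall_would_fetch` is hypothetical, and the sketch assumes the attribute mask was obtained without opening the object (for example from a directory listing).

```c
#include <stdbool.h>
#include <stdint.h>

/* Attribute values as defined in the Windows SDK (winnt.h). */
#ifndef FILE_ATTRIBUTE_RECALL_ON_OPEN
#define FILE_ATTRIBUTE_RECALL_ON_OPEN        0x00040000u
#endif
#ifndef FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS
#define FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS 0x00400000u
#endif

/* Hypothetical helper: true if the object is a placeholder that the
   cloud provider would have to fetch when it is opened or read.
   Callers should avoid opening or reading such objects unless the
   data is really needed. */
static bool recall_would_fetch(uint32_t attrs)
{
    return (attrs & (FILE_ATTRIBUTE_RECALL_ON_OPEN
                     | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS)) != 0;
}
```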

I believe that IO_REPARSE_TAG_FILE_PLACEHOLDER was replaced by
IO_REPARSE_TAG_CLOUD_1 through IO_REPARSE_TAG_CLOUD_F. Any reparse tag
attached to a placeholder object is for the interpretation of the filter
associated with the back-end storage and not for the consumption of
applications. The reparse data behind these tags can be proprietary to
each back end, with different data for OneDrive, iCloud, Dropbox, etc.
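A minimal sketch of recognizing these tags without ever interpreting the reparse buffer. The constant values are those published in the Windows SDK headers; the helper name `is_cloud_placeholder_tag` is hypothetical. IO_REPARSE_TAG_CLOUD and IO_REPARSE_TAG_CLOUD_1 through IO_REPARSE_TAG_CLOUD_F differ only in the nibble covered by IO_REPARSE_TAG_CLOUD_MASK, so one masked comparison covers all of them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reparse tag values as defined in the Windows SDK. */
#ifndef IO_REPARSE_TAG_CLOUD
#define IO_REPARSE_TAG_CLOUD            0x9000001Au
#endif
#ifndef IO_REPARSE_TAG_CLOUD_MASK
#define IO_REPARSE_TAG_CLOUD_MASK       0x0000F000u
#endif
#ifndef IO_REPARSE_TAG_FILE_PLACEHOLDER
#define IO_REPARSE_TAG_FILE_PLACEHOLDER 0x80000015u /* older, deprecated */
#endif

/* Hypothetical helper: true for the older placeholder tag and for
   IO_REPARSE_TAG_CLOUD .. IO_REPARSE_TAG_CLOUD_F. Only the tag is
   examined; the reparse data belongs to the provider's filter and
   is deliberately left alone. */
static bool is_cloud_placeholder_tag(uint32_t tag)
{
    if (tag == IO_REPARSE_TAG_FILE_PLACEHOLDER)
        return true;
    return (tag & ~IO_REPARSE_TAG_CLOUD_MASK) == IO_REPARSE_TAG_CLOUD;
}
```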

The default ProcessPlaceholderCompatibilityMode is
PHCM_EXPOSE_PLACEHOLDERS, which makes the FILE_ATTRIBUTE flags and
reparse tags visible. Microsoft maintains a database of processes for
which PHCM_DISGUISE_PLACEHOLDER is set, which hides that information. It's
unclear to me that explicitly setting the placeholder compatibility mode
is useful.
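If an explicit call were wanted anyway, one option is to query the current mode first and only set PHCM_EXPOSE_PLACEHOLDERS when the process is not already exposing placeholders. This is a sketch under assumptions: RtlQueryProcessPlaceholderCompatibilityMode and RtlSetProcessPlaceholderCompatibilityMode are the ntdll exports Microsoft documents, assumed here to return (and, for the setter, take) a CHAR mode; the helper names `should_force_expose` and `ensure_placeholders_exposed` are hypothetical. The functions are looked up at run time so the sketch does not depend on headers that declare them.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Mode values as documented for the placeholder compatibility APIs. */
#ifndef PHCM_APPLICATION_DEFAULT
#define PHCM_APPLICATION_DEFAULT  0
#endif
#ifndef PHCM_DISGUISE_PLACEHOLDER
#define PHCM_DISGUISE_PLACEHOLDER 1
#endif
#ifndef PHCM_EXPOSE_PLACEHOLDERS
#define PHCM_EXPOSE_PLACEHOLDERS  2
#endif

/* Hypothetical policy: override only when the current mode is not
   already "expose"; otherwise leave the process default alone. */
static int should_force_expose(int current_mode)
{
    return current_mode != PHCM_EXPOSE_PLACEHOLDERS;
}

#ifdef _WIN32
typedef CHAR (NTAPI *query_mode_fn)(void);
typedef CHAR (NTAPI *set_mode_fn)(CHAR);

/* Returns the mode in effect afterwards, or -1 if the ntdll exports
   are unavailable or the query reports an error. */
static int ensure_placeholders_exposed(void)
{
    HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");
    if (!ntdll)
        return -1;
    query_mode_fn query = (query_mode_fn)(void (*)(void))
        GetProcAddress(ntdll, "RtlQueryProcessPlaceholderCompatibilityMode");
    set_mode_fn set = (set_mode_fn)(void (*)(void))
        GetProcAddress(ntdll, "RtlSetProcessPlaceholderCompatibilityMode");
    if (!query || !set)
        return -1;
    CHAR mode = query();
    if (mode < PHCM_APPLICATION_DEFAULT)
        return -1; /* negative values are error codes */
    if (should_force_expose(mode)) {
        set(PHCM_EXPOSE_PLACEHOLDERS);
        mode = query();
    }
    return mode;
}
#endif
```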

I'm not sure that exposing the object as a symlink is a good idea. A
POSIX symlink is an object whose type and target information cannot
change. A placeholder, by contrast, is silently replaced by the actual
object either when it is opened or when its data is accessed. An
application that believes the object is a symlink will be mighty
confused when it turns out to be a file or a directory.
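A minimal sketch of the alternative implied above: report only a genuine NTFS symlink as a symlink, and report a placeholder by its directory attribute, so its POSIX type stays the same once it is hydrated. The attribute and tag values are from the Windows SDK; the enum and the helper `posix_type_for` are hypothetical stand-ins for S_IFREG/S_IFDIR/S_IFLNK. The sketch assumes the directory attribute on a placeholder reflects the type of the real object, and it ignores the other symlink-like tags Cygwin also understands.

```c
#include <stdint.h>

/* Values as defined in the Windows SDK. */
#define FILE_ATTRIBUTE_DIRECTORY     0x00000010u
#define FILE_ATTRIBUTE_REPARSE_POINT 0x00000400u
#define IO_REPARSE_TAG_SYMLINK       0xA000000Cu

/* Hypothetical stand-ins for S_IFREG, S_IFDIR and S_IFLNK. */
enum posix_type { PT_REG, PT_DIR, PT_LNK };

/* Only IO_REPARSE_TAG_SYMLINK becomes a symlink. Cloud placeholders,
   like any other reparse point, fall through and are classified by
   the directory attribute, so the reported type does not change when
   the provider replaces the placeholder with the real object. */
static enum posix_type posix_type_for(uint32_t attrs, uint32_t reparse_tag)
{
    if ((attrs & FILE_ATTRIBUTE_REPARSE_POINT)
        && reparse_tag == IO_REPARSE_TAG_SYMLINK)
        return PT_LNK;
    return (attrs & FILE_ATTRIBUTE_DIRECTORY) ? PT_DIR : PT_REG;
}
```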

Perhaps the question that needs to be asked is whether there are opens
that can be skipped when an object is known not to be locally present
(that is, when either of the FILE_ATTRIBUTE flags is set).
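One way to picture the skipped open is a stat path that answers from directory enumeration data whenever either recall flag is set, and only falls back to an open otherwise. Everything here except the two attribute values is hypothetical: `struct dir_entry`, `struct stat_info`, the `open_query_fn` callback (standing in for whatever open-based query the real code performs) and `stat_entry` are illustrations, not Cygwin internals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define FILE_ATTRIBUTE_RECALL_ON_OPEN        0x00040000u
#define FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS 0x00400000u

/* Hypothetical metadata as delivered by a directory enumeration. */
struct dir_entry {
    uint32_t attrs;
    uint64_t size;
    time_t   mtime;
};

/* Hypothetical subset of what a stat call reports. */
struct stat_info {
    uint64_t size;
    time_t   mtime;
    bool     from_enumeration;
};

/* Hypothetical open-based query used for locally present objects. */
typedef bool (*open_query_fn)(const char *path, struct stat_info *out);

static bool stat_entry(const char *path, const struct dir_entry *ent,
                       open_query_fn open_query, struct stat_info *out)
{
    if (ent->attrs & (FILE_ATTRIBUTE_RECALL_ON_OPEN
                      | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS)) {
        /* Not locally present: answer from the enumeration data and
           skip the open that could trigger a download. */
        out->size = ent->size;
        out->mtime = ent->mtime;
        out->from_enumeration = true;
        return true;
    }
    out->from_enumeration = false;
    return open_query != NULL && open_query(path, out);
}
```

The trade-off is that fields only obtainable through an open (file IDs, link counts and the like) would have to be filled with defaults for such objects.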

Jeffrey Altman