From: "Matt D."
Date: Tue, 11 Oct 2022 03:53:01 -0400
Subject: Cygwin triggers integrity scrubbing on ReFS filesystems, making searching files impossible on large datasets
To: cygwin@cygwin.com

I formatted a drive today with ReFS on a Storage Pool mirror with integrity streams enabled, then copied data over from a backup. The data includes several million files, which I search often with tools like find and grep.

After the copy finished, I tried a simple find:

    time find . -iname file.png

The search took much longer than expected, and I gave up after waiting more than 20 minutes. I confirmed that the same search over the same data on an external USB3 drive formatted as NTFS takes 1 to 1.5 minutes.

To verify that this is in fact an incompatibility with ReFS's integrity streams, I formatted the same pool with this feature disabled and copied the files back over. Without integrity streams, the find operation took about 30 seconds. I confirmed this further by formatting the pool as NTFS, with a similar result. I then formatted the pool one last time as ReFS with integrity streams enabled, and the problem returned.

Although the behavior looks like a program hang, find is just very slow at searching, not actually frozen.
It continues to respond to Ctrl-C and, if a more permissive pattern is used, output can be seen arriving slowly during the search.

I believe the issue has to do with how Cygwin or find accesses these files as it searches, triggering the integrity scrubber on each visit and making the search unbearably slow. Using Windows search on the same disk does not have this problem. I haven't done any performance comparison with grep, but I would expect the experience to be similarly poor or worse. It's interesting that the scrubber is triggered in this example with find, as I'm only examining the names of files, not reading their contents.

See here for more information on ReFS integrity streams:

    https://learn.microsoft.com/en-us/windows-server/storage/refs/integrity-streams

To format a disk with this feature, PowerShell must be used, as it's not enabled by default or accessible from the GUI:

    Format-Volume -DriveLetter D -FileSystem REFS -SetIntegrityStreams $true

The hardware I used was two Crucial MX500 2TB SSDs, recently trimmed, in a RAID1 mirror configuration in Storage Spaces on Windows 10 Professional for Workstations. The system was freshly formatted and fully updated. Cygwin was also a fresh download and fully updated. The system is otherwise very fast, with a Ryzen 1800X and 64GB of memory.

At this point, I am unable to use Cygwin at all on any disk formatted as ReFS with integrity streams enabled for any performance-sensitive workload that involves I/O on a large number of files.
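
One way to narrow down where the time goes would be to compare a walk that only enumerates directory entries against a walk that also queries metadata for every entry. The following is a minimal sketch, assuming Python 3 is available; it is not part of the tests described above, and the names walk and timed are hypothetical helpers invented for illustration. Whether find under Cygwin does the equivalent of a per-file metadata query here is an assumption, not something confirmed.

```python
import os
import sys
import time


def walk(root, stat_each=False):
    """Count entries under root; optionally lstat every entry (hypothetical helper)."""
    count = 0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    count += 1
                    if stat_each:
                        try:
                            # Explicit per-entry metadata query.
                            os.lstat(entry.path)
                        except OSError:
                            pass
                    # is_dir() normally uses the type reported by the directory
                    # listing; it may fall back to a stat call if that is unknown.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            continue  # skip directories that cannot be read
    return count


def timed(root, stat_each=False):
    """Return (entry_count, elapsed_seconds) for one walk (hypothetical helper)."""
    start = time.perf_counter()
    count = walk(root, stat_each=stat_each)
    return count, time.perf_counter() - start


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for stat_each in (False, True):
        count, seconds = timed(root, stat_each)
        label = "enumerate + lstat" if stat_each else "enumerate only"
        print(f"{label}: {count} entries in {seconds:.2f}s")
```

If the names-only pass is fast and the lstat pass is slow on the ReFS volume with integrity streams, that would point at per-file metadata access rather than directory enumeration. Caching can skew the second pass, so it is worth repeating the run with the order swapped before drawing any conclusion.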