From: Adonis Ling
Date: Tue, 28 Jun 2022 23:23:52 +0800
Subject: Re: Why does different types of array subscript used to iterate affect auto vectorization
To: Alexander Monakov
Cc: gcc-help@gcc.gnu.org
In-Reply-To: <3eb44329-3b12-896c-14c4-3473d43aed3d@ispras.ru>

Hi Alexander, thanks for your reply.

On Tue, Jun 28, 2022 at 9:06 PM Alexander Monakov wrote:

> On Mon, 27 Jun 2022, Adonis Ling via Gcc-help wrote:
>
> > Hi all,
> >
> > Recently, I met an issue with auto vectorization.
> >
> > As following code shows, why uint32_t prevents the compiler (GCC 12.1 + O3)
> > from optimizing by auto vectorization. See https://godbolt.org/z/a3GfaKEq6.
> >
> > #include <stdint.h>
> >
> > // no auto vectorization
> > void test32(uint32_t *array, uint32_t &nread, uint32_t from, uint32_t to) {
> >   for (uint32_t i = from; i < to; i++) {
> >     array[nread++] = i;
> >   }
> > }
>
> Here the main problem is '*array' and 'nread' have the same type, so they might
> overlap. Ideally the compiler would recognize that that cannot happen because it
> would make 'array[nread++] = i' undefined due to unsequenced modifications, but
> GCC is not sufficiently smart (yet). The secondary issue is the same as below:

I see your point. I then tried adding __restrict__ to nread, as shown
below, but GCC still does not vectorize the loop.

#include <stdint.h>

// no auto vectorization
void test32(uint32_t *array, uint32_t & __restrict__ nread, uint32_t from, uint32_t to) {
  for (uint32_t i = from; i < to; i++) {
    array[nread++] = i;
  }
}

However, when I compiled the same code with Clang, it was vectorized.
See https://godbolt.org/z/eEz9W7o9z .
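
To make the overlap concern concrete, here is a minimal sketch that is not
from the thread; store_then_bump, aliased_call and separate_call are
hypothetical names. The counter reference points into the array itself, and
because the store and the increment are separate statements the behavior is
well defined. This is the kind of caller a compiler has to assume possible
when '*array' and 'nread' have the same type.

```cpp
#include <stdint.h>

// Hypothetical helper: store at the current position, then advance the counter.
// The store and the increment are separate statements, so nothing is unsequenced.
void store_then_bump(uint32_t *array, uint32_t &nread, uint32_t value) {
    array[nread] = value;  // may write to the very object nread refers to
    nread += 1;            // so nread has to be reloaded after the store
}

// The counter lives inside the array: the store overwrites it before the increment.
uint32_t aliased_call() {
    uint32_t buf[4] = {0, 0, 0, 0};
    store_then_bump(buf, buf[0], 7);  // buf[0] = 7, then buf[0] += 1
    return buf[0];
}

// The counter is a separate object: the store cannot change it.
uint32_t separate_call() {
    uint32_t buf[4] = {0, 0, 0, 0};
    uint32_t n = 0;
    store_then_bump(buf, n, 7);  // buf[0] = 7, then n += 1
    return n;
}
```

The two callers end with different counter values (8 and 1), which is why the
counter cannot simply be kept in a register across the store unless the
compiler can prove the two objects are distinct.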
> > // no auto vectorization
> > void test_another_32(uint32_t *array, uint32_t &nread, uint32_t from,
> > uint32_t to) {
> >   uint32_t index = nread;
> >   for (uint32_t i = from; i < to; i++) {
> >     array[index++] = i;
> >   }
> >   nread = index;
> > }
>
> ... here: the issue is that index is unsigned and shorter than pointer type, it
> can wrap around from 0xffffffff to 0, making the access non-consecutive. When
> you compile for 32-bit x86, this loop is vectorized.
>
> Alexander

Clang also optimizes this function. See https://godbolt.org/z/eEz9W7o9z .

-- 
Best regards,
Adonis
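
A footnote on the wrap-around point quoted above, as a rough sketch that is
not from the thread: wraps_to_zero and fill_wide_index are hypothetical names.
The first function shows a 32-bit unsigned index wrapping from 0xffffffff to
0. The second is a variant of test_another_32 that keeps the index in a
size_t, which is as wide as a pointer on 64-bit x86. It assumes the caller
guarantees the index never needs to wrap; if it would wrap, this variant
behaves differently from the original. Whether a particular GCC version
vectorizes it should be checked on Compiler Explorer rather than taken from
this note.

```cpp
#include <stddef.h>
#include <stdint.h>

// Unsigned arithmetic is modular: incrementing 0xffffffff yields 0.
uint32_t wraps_to_zero() {
    uint32_t index = 0xffffffffu;
    index++;
    return index;
}

// Variant of test_another_32 with a pointer-width index.
// Assumption: the index never needs to wrap past 0xffffffff.
void fill_wide_index(uint32_t *array, uint32_t &nread, uint32_t from, uint32_t to) {
    size_t index = nread;  // widened once, outside the loop
    for (uint32_t i = from; i < to; i++) {
        array[index++] = i;  // consecutive stores, no 32-bit wrap-around
    }
    nread = static_cast<uint32_t>(index);  // narrowed back for the caller
}
```

Called with nread == 2, from == 10 and to == 14, fill_wide_index writes 10
through 13 into array[2] through array[5] and leaves nread at 6, the same
result as the original whenever no wrap-around occurs.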