From: Alexander Monakov
To: Adonis Ling
Cc: gcc-help@gcc.gnu.org
Date: Tue, 28 Jun 2022 16:06:13 +0300 (MSK)
Subject: Re: Why does different types of array subscript used to iterate affect auto vectorization
Message-ID: <3eb44329-3b12-896c-14c4-3473d43aed3d@ispras.ru>

On Mon, 27 Jun 2022, Adonis Ling via Gcc-help wrote:

> Hi all,
>
> Recently, I met an issue with auto vectorization.
>
> As following code shows, why uint32_t prevents the compiler (GCC 12.1 + O3)
> from optimizing by auto vectorization. See https://godbolt.org/z/a3GfaKEq6.
>
> #include
>
> // no auto vectorization
> void test32(uint32_t *array, uint32_t &nread, uint32_t from, uint32_t to) {
>   for (uint32_t i = from; i < to; i++) {
>     array[nread++] = i;
>   }
> }

Here the main problem is '*array' and 'nread' have the same type, so they
might overlap.
Ideally the compiler would recognize that that cannot happen because it would
make 'array[nread++] = i' undefined due to unsequenced modifications, but GCC
is not sufficiently smart (yet).

The secondary issue is the same as below:

> // no auto vectorization
> void test_another_32(uint32_t *array, uint32_t &nread, uint32_t from,
>                      uint32_t to) {
>   uint32_t index = nread;
>   for (uint32_t i = from; i < to; i++) {
>     array[index++] = i;
>   }
>   nread = index;
> }

... here: the issue is that index is unsigned and narrower than the pointer
type, so it can wrap around from 0xffffffff to 0, making the access
non-consecutive. When you compile for 32-bit x86, this loop is vectorized.

Alexander
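
P.S. A minimal sketch of the wraparound point, under stated assumptions. The
names test_wide_index and next_index are hypothetical and do not appear in
the code quoted above. The first function widens the index to size_t, so on
a 64-bit target it cannot wrap within a valid array. Whether GCC then
vectorizes the loop is an assumption that would need checking in the
compiler output. The second function only shows the modulo-2^32 step that a
uint32_t index is permitted to take.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical variant of test_another_32: the index has pointer width,
// so consecutive iterations always touch consecutive elements.
// Assumption: this removes the wraparound obstacle; it is not verified
// here that GCC actually vectorizes the loop.
void test_wide_index(uint32_t *array, uint32_t &nread, uint32_t from,
                     uint32_t to) {
    std::size_t index = nread;           // local copy, cannot alias *array
    for (uint32_t i = from; i < to; i++) {
        array[index++] = i;
    }
    nread = static_cast<uint32_t>(index); // write back once, after the loop
}

// Unsigned arithmetic is modulo 2^32, so the successor of 0xffffffff is 0.
uint32_t next_index(uint32_t index) {
    return index + 1;
}
```

Note that the write-back truncates to 32 bits, so the caller still sees a
uint32_t count; only the indexing inside the loop is done at full width.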