From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ej1-x62d.google.com (mail-ej1-x62d.google.com [IPv6:2a00:1450:4864:20::62d]) by sourceware.org (Postfix) with ESMTPS id E17F03857410 for ; Fri, 27 May 2022 21:12:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E17F03857410 Received: by mail-ej1-x62d.google.com with SMTP id wh22so10936451ejb.7 for ; Fri, 27 May 2022 14:12:34 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=Hn7QSCNQlyaWWG7snk6Fy7GkS4eN2ep25IUmz14biS8=; b=A4jqVrO/8qyUDCnIDXSu7bnRL2HbzXj2hC/YtLs7rWj+PjmY8qLh46+sgIwnT31q0f bY+6vn8bSm8jQvs2jkbbO6A4g4WMwJ5QMy/JkngjTXFs8UwSSY5KG03ydQi6SVfDwZ+1 9CT5Rq1rk66sP6LscHdiRXBCZKnnYH3H5sCBpT9q1xNR1GaLSfzIq+jrqC0Fsi7pBbXv klQEdltmVIO5L9+Jpybdl9grTGajUMOwomUBSMQAdDdnLv+XTnultayc0bINiGXO8pUp pStycAj0kX8T6tjIhFPKSfIacouNRi9wFbB0OpfSqrO9AWjTHyuq8kUKX2PETw+TR1qf nzlQ== X-Gm-Message-State: AOAM530fijrIKa6y3Am4hKNYsGk6tEPthJa/DxpSOi4XM8LiPM4BroNF +x7AcGocMF3DdOje/dtNNK1tmAdCIhM8VzVtOXiSYMcaKpJ4DQ9z X-Google-Smtp-Source: ABdhPJzO3xi/6qtxOosRNXtYwi63oBCA2S1U4xGaxTn0Bb9ES6J6IFmC5tLoQgT0yvfigkndhK3aiHBTDd6FeiRnjKg= X-Received: by 2002:a17:907:6e88:b0:6fa:888d:74a7 with SMTP id sh8-20020a1709076e8800b006fa888d74a7mr38997506ejc.335.1653685953202; Fri, 27 May 2022 14:12:33 -0700 (PDT) MIME-Version: 1.0 From: Laleh Beni Date: Fri, 27 May 2022 14:12:04 -0700 Message-ID: Subject: Loop splitting based on constant prefix of an array To: gcc@gcc.gnu.org X-Spam-Status: No, score=-1.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, HTML_MESSAGE, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 May 2022 21:12:37 -0000 GCC compiler is able to understand if the prefix of an array holds constant/static data and apply compiler optimizations on that partial constant part of the array, however, it seems that it is not leveraging this information in all cases. On understanding the behavior of compiler optimization for partially constant arrays and especially how the loop splitting pass could have an influence on the potential constant related optimizations such as constant folding I am using the following example: Considering an array where the prefix of that array is compile-time constant data, and the rest of the array is runtime data, should the compiler be able to optimize the calculation for the first part of the array? Let's look at the below example: You can see the code and its assembly here: https://godbolt.org/z/xjxbz431b #include inline int sum(const int array[], size_t len) { int res =3D 0; for (size_t i =3D 0; i < len; i++) { res +=3D array[i]; } return res; } int main(int argc, char** argv) { int arr1[6] =3D {200,2,3, argc, argc+1, argc+2}; return sum(arr1, 6); } In our sum function we are measuring the some of the array elements, where the first half of it is static compile-time constants and the second half are dynamic data. When we compile this with the "x86-64 GCC 12.1" compiler with "-O3 -std=3Dc++2a " flags, we get the following assembly code: main: mov rax, QWORD PTR .LC0[rip] mov DWORD PTR [rsp-28], edi mov DWORD PTR [rsp-32], 3 movq xmm1, QWORD PTR [rsp-32] mov QWORD PTR [rsp-40], rax movq xmm0, QWORD PTR [rsp-40] lea eax, [rdi+1] add edi, 2 mov DWORD PTR [rsp-24], eax paddd xmm0, xmm1 mov DWORD PTR [rsp-20], edi movq xmm1, QWORD PTR [rsp-24] paddd xmm0, xmm1 movd eax, xmm0 pshufd xmm2, xmm0, 0xe5 movd edx, xmm2 add eax, edx ret .LC0: .long 200 .long 2 However, if we add an =E2=80=9Cif=E2=80=9D condition in the loop for calcul= ating the result of the sum, the if condition seems to enable the loop splitting pass: You can see the code and its assembly here: https://godbolt.org/z/ejecbjMK= G #include inline int sum(const int array[], size_t len) { int res =3D 0; for (size_t i =3D 0; i < len; i++) { if (i < 1) res +=3D array[i]; else res +=3D array[i]; } return res; } int main(int argc, char** argv) { int arr1[6] =3D {200,2,3, argc, argc+1, argc+2}; return sum(arr1, 6); } we get the following assembly code: main: lea eax, [rdi+208+rdi*2] ret As you can see the =E2=80=9Cif=E2=80=9D condition has the same calculation = for both the =E2=80=9Cif=E2=80=9D and =E2=80=9Celse=E2=80=9D branch in calculating the s= um over the array, however, it seems that it is triggering the =E2=80=9Cloop splitting pass=E2=80=9D which= results in further optimizations such as constant folding of the whole computation and resulting in such a smaller and faster assembly code. My question is, why the compiler is not able to take advantage of constantans in the prefix of the array in the first place? Also adding a not necessary =E2=80=9Cif condition=E2=80=9D which is just re= peating the same code for "if" and "else", doesn=E2=80=99t seem to be the best way to hint t= he compiler to take advantage of this optimization; so is there another way to make the compiler aware of this? ( I used the -fsplit-loops flag and it didn't have any effect for this example.) As a next step if we use an array that has some constant values in the prefix but not a compile time constant length such as the following example= : Code link is here: https://godbolt.org/z/3qGqshzn9 #include inline int sum(const int array[], size_t len) { int res =3D 0; for (size_t i =3D 0; i < len; i++) { if (i < 1) res +=3D array[i]; else res +=3D array[i]; } return res; } int main(int argc, char** argv) { size_t len =3D argc+3; int arr3[len] =3D {600,10,1}; for (unsigned int i =3D 3; i < len; i++) arr3[i] =3D argc+i; return sum(arr3, 2); } In this case the GCC compiler is not able to apply constant folding on the first part of the array! In general is there anyway that the GCC compiler would understand this and apply constant folding optimizations here?