From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id B98893858CDA; Mon, 22 Apr 2024 22:15:18 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B98893858CDA
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1713824118;
	bh=up6MyTOgopH2IypHvjwRJx0AY1X3hFJH7STpKiOEz0Y=;
	h=From:To:Subject:Date:From;
	b=AepQZthCDylaXkDgUCmf14Zdul8Ge0ULycA3fRnWGK7DLqCcI0RxWgfW6Bj1eZNxf
	 YFVpxpLmuKDnsmtNcYzRlDLlIYepn7i36ceUv+3wNCq4z/0Kz+Pp83QJm1gPhp9cWO
	 BNbvGsaRsdIUC2fwvsazpZKn2dVDlZJ+BzlPvGT4=
From: "pinskia at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114814] New: Reduction sum of comparison should be better
Date: Mon, 22 Apr 2024 22:15:18 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: pinskia at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status keywords bug_severity priority component assigned_to reporter target_milestone cf_gcctarget
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114814

            Bug ID: 114814
           Summary: Reduction sum of comparison should be better
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Take the example from PR
114809:
```
#include <stddef.h>

size_t count_chars(const char *src, size_t len, char c) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        count += src[i] == c;
    }
    return count;
}
```

For aarch64 we currently produce the following inner loop:
```
.L4:
        ldr     q31, [x3], 16
        cmeq    v31.16b, v31.16b, v22.16b
        and     v31.16b, v23.16b, v31.16b
        zip1    v27.16b, v31.16b, v29.16b
        zip2    v31.16b, v31.16b, v29.16b
        zip1    v25.8h, v27.8h, v29.8h
        zip2    v27.8h, v27.8h, v29.8h
        zip1    v26.8h, v31.8h, v29.8h
        zip2    v31.8h, v31.8h, v29.8h
        zip2    v30.4s, v25.4s, v29.4s
        zip2    v28.4s, v27.4s, v29.4s
        uaddw   v30.2d, v30.2d, v25.2s
        uaddw   v28.2d, v28.2d, v27.2s
        uaddw   v30.2d, v30.2d, v26.2s
        uaddw2  v28.2d, v28.2d, v26.4s
        uaddw   v30.2d, v30.2d, v31.2s
        uaddw2  v31.2d, v28.2d, v31.4s
        add     v31.2d, v30.2d, v31.2d
        add     v24.2d, v24.2d, v31.2d
        cmp     x5, x3
        bne     .L4
```

But instead we should be able to just do:
```
.L4:
        ldr     q31, [x3], 16
        cmeq    v31.16b, v31.16b, v22.16b
        and     v31.16b, v23.16b, v31.16b
        addv    b31, v31.16b
        fmov    x0, d31
        add     x1, x1, x0
        cmp     x5, x3
        bne     .L4
```

That is, do the reduction of the sum of the comparison results inside the loop
rather than outside.
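
To make the proposed loop shape concrete, here is a rough scalar model of it in C. This is only a sketch: `count_chars_blockwise` is a hypothetical name, the inner 16-iteration loop stands in for the `cmeq`/`and`/`addv` sequence, and it assumes `v23` holds a 1 in every byte lane so that each lane contributes 0 or 1. It is not meant as what the vectorizer would or should emit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scalar model of reducing inside the loop. */
size_t count_chars_blockwise(const char *src, size_t len, char c)
{
    size_t count = 0;
    size_t i = 0;

    /* Main loop: one 16-byte block per iteration, like one q-register load. */
    for (; i + 16 <= len; i += 16) {
        /* Per-block sum is at most 16, so a single byte cannot overflow. */
        uint8_t block_sum = 0;
        for (size_t j = 0; j < 16; j++) {
            /* Models cmeq + and: each lane contributes 0 or 1. */
            block_sum = (uint8_t)(block_sum + (src[i + j] == c));
        }
        /* Models addv + fmov + add: widen once per block, not per lane. */
        count += block_sum;
    }

    /* Scalar tail for the remaining len % 16 bytes. */
    for (; i < len; i++) {
        count += src[i] == c;
    }

    return count;
}
```

Since a block of 16 lanes can contribute at most 16, the per-block sum fits in a byte, which is why a byte-wide across-lanes add followed by a single 64-bit scalar add would be enough here and the chain of `zip`/`uaddw` widening steps would not be needed.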