From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 09D2E3858C74; Sat, 20 Jan 2024 05:42:34 +0000 (GMT)
From: "pinskia at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/113517] New: vector SLP cost model should be improved
Date: Sat, 20 Jan 2024 05:42:33 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: pinskia at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 keywords bug_severity priority component assigned_to reporter
 target_milestone cf_gcctarget
Message-ID: <bug-113517-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113517

            Bug ID: 113517
           Summary: vector SLP cost model should be improved
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Take:
```
void f(short *a, signed int *b)
{
        int sum = 0;
        sum += a[0];
        sum += a[1];
        sum += a[2];
        sum += a[3];
        *b = sum;
}
```

Right now by default this produces:
```
        ldrsh   w3, [x0]
        ldrsh   w4, [x0, 2]
        ldrsh   w2, [x0, 4]
        add     w3, w3, w4
        ldrsh   w0, [x0, 6]
        add     w2, w2, w3
        add     w0, w0, w2
        str     w0, [x1]
```

But with the cost model disabled, we get:
```
        ldr     d31, [x0]
        saddlv  s31, v31.4h
        str     s31, [x1]
        ret
```

(Note that this is better code than LLVM produces, since LLVM uses
sshll followed by addv.)

For most cores, one floating-point/SIMD (vector) load, one vector instruction
and one vector store are cheaper than four scalar loads, three scalar adds and
one scalar store.  This holds even on ThunderX 1 and Cortex-A57.