From: "liuhongt at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/113079] New: [x86] Fails to generate dot_prod instructions for 64-bit vector.
Date: Tue, 19 Dec 2023 05:05:55 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113079

            Bug ID: 113079
           Summary: [x86] Fails to generate dot_prod instructions for 64-bit vector.
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liuhongt at gcc dot gnu.org
  Target Milestone: ---

int
foo (int n, unsigned char* p, char* pi)
{
  int sum = 0;
  for (int i = 0; i != 8; i++)
    {
      sum += p[i] * pi[i];
    }
  return sum;
}

We can use the 128-bit dot_prod instruction and clear the upper 64 bits.
Currently, gcc generates a long instruction sequence:

        vmovq   xmm0, QWORD PTR [rsi]
        vmovq   xmm2, QWORD PTR [rdx]
        vpmovzxbw       xmm1, xmm0
        vpsrlq  xmm0, xmm0, 32
        vpmovsxbw       xmm3, xmm2
        vpmullw xmm1, xmm1, xmm3
        vpsrlq  xmm2, xmm2, 32
        vpmovzxbw       xmm0, xmm0
        vpmovsxbw       xmm2, xmm2
        vpmullw xmm0, xmm0, xmm2
        vpmovsxwd       xmm2, xmm1
        vpsrlq  xmm1, xmm1, 32
        vpmovsxwd       xmm1, xmm1
        vpaddd  xmm2, xmm2, xmm1
        vpmovsxwd       xmm1, xmm0
        vpsrlq  xmm0, xmm0, 32
        vpmovsxwd       xmm0, xmm0
        vpaddd  xmm1, xmm1, xmm2
        vpxor   xmm2, xmm2, xmm2
        vpshufb xmm2, xmm2, XMMWORD PTR .LC1[rip]
        vpaddd  xmm0, xmm0, xmm1
        vpshufb xmm1, xmm0, XMMWORD PTR .LC0[rip]
        vpor    xmm1, xmm1, xmm2
        vpaddd  xmm0, xmm0, xmm1
        vmovd   eax, xmm0
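
To illustrate why zeroing the upper 64 bits is enough, here is a minimal portable C sketch. It is not GCC's implementation and uses no intrinsics. The helper names (dot8_scalar, model_usdot_128, dot8_via_128) are hypothetical, and model_usdot_128 is only a scalar model of a non-saturating unsigned-by-signed byte dot product over a 128-bit vector, which is what an instruction such as VNNI's vpdpbusd is assumed to provide here. The sketch also assumes char is signed, as on x86, and spells that out with signed char.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference: the loop from the report, with char written as signed char
   (assumption: plain char is signed, as it is on x86).  */
static int
dot8_scalar (const unsigned char *p, const signed char *pi)
{
  int sum = 0;
  for (int i = 0; i != 8; i++)
    sum += p[i] * pi[i];
  return sum;
}

/* Hypothetical scalar model of one 128-bit u8 x s8 dot-product step:
   each of the four 32-bit lanes accumulates four byte products,
   without saturation.  */
static void
model_usdot_128 (int32_t acc[4], const uint8_t a[16], const int8_t b[16])
{
  for (int lane = 0; lane < 4; lane++)
    for (int k = 0; k < 4; k++)
      acc[lane] += (int32_t) a[4 * lane + k] * (int32_t) b[4 * lane + k];
}

/* The 64-bit case: copy 8 bytes into the low half of zero-initialized
   128-bit buffers, run the 128-bit model once, then add the lanes.
   The zeroed upper 8 bytes contribute nothing, so lanes 2 and 3 stay 0.  */
static int
dot8_via_128 (const unsigned char *p, const signed char *pi)
{
  assert (p != NULL && pi != NULL);

  uint8_t a[16] = { 0 };
  int8_t b[16] = { 0 };
  int32_t acc[4] = { 0, 0, 0, 0 };

  memcpy (a, p, 8);
  memcpy (b, pi, 8);
  model_usdot_128 (acc, a, b);

  return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Under these assumptions both functions return the same value for any 8-byte inputs, since every product involving a zeroed byte is 0. In the assembly above, the two vmovq loads already leave the upper 64 bits of xmm0 and xmm2 zeroed.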