From mboxrd@z Thu Jan 1 00:00:00 1970
From: "roger at nextmovesoftware dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114767] gfortran AVX2 complex multiplication by (0d0,1d0) suboptimal
Date: Thu, 18 Apr 2024 18:01:39 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Status: NEW

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767

--- Comment #5 from Roger Sayle ---
Another interesting (simpler) case of -ffast-math pessimization is:

    void foo(_Complex double *c)
    {
      for (int i=0; i<16; i++)
        c[i] += __builtin_complex(1.0,0.0);
    }

Again without -ffast-math we vectorize consecutive additions, but with
-ffast-math we (not so) cleverly avoid every second addition by producing
significantly larger code that shuffles the real/imaginary parts around.
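To make the two strategies concrete, here is a minimal scalar sketch of what each one computes when the array is viewed as 32 consecutive doubles. The function names (foo_all_lanes, foo_real_only, foo_forms_agree) are made up for illustration and are not taken from GCC; the sketch only models the arithmetic, not the code GCC actually emits.

```c
#define N 16

/* Lane-wise form: add the repeating pattern {1.0, 0.0} to all 2*N doubles.
   This models "vectorize consecutive additions".  It relies on a complex
   double being laid out as {real, imag}, which C guarantees.  */
void foo_all_lanes(_Complex double *c)
{
  double *d = (double *)c;          /* re0, im0, re1, im1, ... */
  for (int i = 0; i < 2 * N; i++)
    d[i] += (i % 2 == 0) ? 1.0 : 0.0;
}

/* Strided form: skip the "+ 0.0" and touch only the real parts.
   This models "avoid every second addition".  */
void foo_real_only(_Complex double *c)
{
  double *d = (double *)c;
  for (int i = 0; i < N; i++)
    d[2 * i] += 1.0;
}

/* Hypothetical helper: returns 1 if both forms produce equal values.
   The comparison uses ==, so it does not distinguish -0.0 from +0.0.  */
int foo_forms_agree(double re, double im)
{
  _Complex double a[N], b[N];
  double *da = (double *)a, *db = (double *)b;
  for (int i = 0; i < N; i++) {
    da[2 * i] = db[2 * i] = re + i;
    da[2 * i + 1] = db[2 * i + 1] = im - i;
  }
  foo_all_lanes(a);
  foo_real_only(b);
  for (int i = 0; i < 2 * N; i++)
    if (da[i] != db[i])
      return 0;
  return 1;
}
```

The two forms give equal values for finite inputs. They can differ in the sign of a zero imaginary part, because -0.0 + 0.0 yields +0.0 under the default rounding mode, which is presumably why the second form is only chosen when signed zeros may be ignored, as with -ffast-math.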
This even suggests a missed-optimization for:

    void bar(_Complex double *c, double x)
    {
      for (int i=0; i<16; i++)
        c[i] += x;
    }

which may be more efficiently implemented (when safe) by:

    void bar(_Complex double *c, double x)
    {
      for (int i=0; i<16; i++)
        c[i] += __builtin_complex(x,0.0);
    }

i.e. insert/interleave a no-op zero addition, to simplify the vectorization.
The existence of a suitable identity operation (+0, *1.0, &~0, |0, ^0) can be
used to avoid shuffling/permuting values/lanes out of vectors, when it's
possible for the vector operation to leave the other values unchanged.
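The same idea can be sketched for one of the integer identities listed above (^0), where there is no signed-zero caveat. The names flip_even_strided, flip_even_padded and flip_forms_agree are hypothetical, chosen only for this illustration, and whether a given compiler vectorizes the padded form better is not something this sketch demonstrates.

```c
#include <stdint.h>

#define LANES 8

/* Strided form: XOR the mask into every second element only.  */
void flip_even_strided(uint32_t *v, uint32_t mask)
{
  for (int i = 0; i < LANES; i += 2)
    v[i] ^= mask;
}

/* Padded form: XOR every element, using the identity value 0 for the
   odd lanes so that they are left unchanged.  */
void flip_even_padded(uint32_t *v, uint32_t mask)
{
  for (int i = 0; i < LANES; i++)
    v[i] ^= (i % 2 == 0) ? mask : 0u;
}

/* Hypothetical helper: returns 1 if both forms give identical results
   on the same arbitrary input data.  */
int flip_forms_agree(uint32_t seed, uint32_t mask)
{
  uint32_t a[LANES], b[LANES];
  for (int i = 0; i < LANES; i++)
    a[i] = b[i] = seed + (uint32_t)i * 2654435761u;
  flip_even_strided(a, mask);
  flip_even_padded(b, mask);
  for (int i = 0; i < LANES; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Because x ^ 0 == x holds exactly for every integer x, the padded form is always equivalent here, unlike the floating-point +0 case where the equivalence depends on ignoring signed zeros.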