From: "tnfchris at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114635] New: OpenMP reductions fail dependency analysis
Date: Mon, 08 Apr 2024 09:58:35 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635

            Bug ID: 114635
           Summary: OpenMP reductions fail dependency analysis
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

The following testcase, reduced from an HPC workload:

#include <math.h>
#define RESTRICT restrict

void work(int n, float *RESTRICT x, float *RESTRICT y, float *RESTRICT z,
          float *RESTRICT mass, float x0, float y0, float z0,
          float *RESTRICT ax, float *RESTRICT ay, float *RESTRICT az)
{
  float lax = 0.0f, lay = 0.0f, laz = 0.0f;
#if _OPENMP >= 201307
#pragma omp simd reduction(+:lax,lay,laz)
#endif
  for (int i = 0; i < n; ++i) {
    float dx = x[i] - x0;
    float dy = y[i] - y0;
    float dz = z[i] - z0;
    float r2 = dx + dy + dz;
    if (r2 == 0.0f) continue;
    float f = (1.0f / (r2 * sqrtf(r2))) * mass[i];
    lax += f * dx;
    lay += f * dy;
    laz += f * dz;
  }
  *ax += lax;
  *ay += lay;
  *az += laz;
}

vectorizes as expected when compiled with -Ofast -march=armv9-a -fopenmp-simd,
but when the pragma is in effect, e.g. with -Ofast -march=armv9-a -fopenmp,
the main loop fails to vectorize with:

(compute_affine_dependence
  ref_a: D.5962[_33], stmt_a: _69 = D.5962[_33];
  ref_b: D.5962[_33], stmt_b: D.5962[_33] = _ifc__147;
) -> dependence analysis failed
/app/example.c:16:17: missed: bad data dependence.
/app/example.c:16:17: note: ***** Analysis failed with vector mode VNx4SF

This doesn't seem to happen with just 2 reductions, but with 3 reductions
dependency analysis seems to fail.

I don't know much about OpenMP, but my understanding is that this pragma is
intended for architectures that don't have masking support. It works by
splitting the loop and removing the reductions from the main loop, creating
OpenMP "workers" which each work on one thread.

The reduction values are turned into local arrays, and these threads then write
into their own slots in these arrays.

The reduction itself is then done as a final post step. It looks like the only
thing we can vectorize is the post step.

I wonder, since the compiler is the one introducing these local arrays, can we
not mark them safe from inter dependencies?
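
For illustration, the transformation as I understand it can be sketched by hand in C. This is only a rough model of the idea described above, not GCC's actual lowering: SLOTS, struct accel and both function names are hypothetical, and the inputs are taken as already-computed per-element contributions to keep the example short.

```c
#include <assert.h>

/* Hypothetical number of private slots. In a real lowering this would
   follow the vectorization factor, which is not known in the source. */
#define SLOTS 4

struct accel { float ax, ay, az; };

/* Plain form: three scalar accumulators carried across iterations. */
struct accel accumulate_scalar(int n, const float *dx, const float *dy,
                               const float *dz)
{
  assert(n >= 0);
  struct accel r = {0.0f, 0.0f, 0.0f};
  for (int i = 0; i < n; ++i) {
    r.ax += dx[i];
    r.ay += dy[i];
    r.az += dz[i];
  }
  return r;
}

/* Privatized form: each accumulator becomes a small local array. */
struct accel accumulate_privatized(int n, const float *dx, const float *dy,
                                   const float *dz)
{
  assert(n >= 0);
  float pax[SLOTS] = {0.0f}, pay[SLOTS] = {0.0f}, paz[SLOTS] = {0.0f};

  /* Main loop: iteration i only reads and writes slot i % SLOTS, so the
     load and store of pax[slot] refer to the same element and the three
     arrays never alias one another. */
  for (int i = 0; i < n; ++i) {
    int slot = i % SLOTS;
    pax[slot] += dx[i];
    pay[slot] += dy[i];
    paz[slot] += dz[i];
  }

  /* Post step: fold the private slots back into scalars. */
  struct accel r = {0.0f, 0.0f, 0.0f};
  for (int s = 0; s < SLOTS; ++s) {
    r.ax += pax[s];
    r.ay += pay[s];
    r.az += paz[s];
  }
  return r;
}
```

In this hand-written form the read and write of each private array in the main loop use the same index, which is the same shape as the D.5962[_33] load/store pair in the dump above. Note that the privatized version reassociates the floating-point additions, so the two functions are only guaranteed to agree bit-for-bit when the sums are exactly representable.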