From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 90B5F3858C56; Tue, 27 Feb 2024 06:02:56 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 90B5F3858C56
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1709013776;
	bh=q+yTCo03mt8/HHnGobXY7DBVSTkf04j8CsX93hC5DNI=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=yYpkoILGPe/BReGkBTqUDPyFV2Eiy10ElNvyW8h9Z2zfuU3wCMNOMf2pAy63yx3Q1
	 MVhcmXJ1kjffrFBV3RK8j4zCtIKsSGCklxfv0bjPpR9sAejOoqlLBittpJDgOYhsFb
	 YCbWqCCKrUa16jOOXu5CvbpSi1SQI9ZSp3QQG3p8=
From: "liuhongt at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/112325] Missed vectorization of reduction after unrolling
Date: Tue, 27 Feb 2024 06:02:56 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: liuhongt at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325

--- Comment #9 from Hongtao Liu ---
The original case is a little different from the one in PR.
It comes from ggml:

#include <stdint.h>
#include <string.h>

typedef uint16_t ggml_fp16_t;
static float table_f32_f16[1 << 16];

inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
    uint16_t s;
    memcpy(&s, &f, sizeof(uint16_t));
    return table_f32_f16[s];
}

typedef struct {
    ggml_fp16_t d;
    ggml_fp16_t m;
    uint8_t qh[4];
    uint8_t qs[32 / 2];
} block_q5_1;

typedef struct {
    float d;
    float s;
    int8_t qs[32];
} block_q8_1;

void ggml_vec_dot_q5_1_q8_1(const int n, float * restrict s,
                            const void * restrict vx,
                            const void * restrict vy) {
    const int qk = 32;
    const int nb = n / qk;

    const block_q5_1 * restrict x = vx;
    const block_q8_1 * restrict y = vy;

    float sumf = 0.0;

    for (int i = 0; i < nb; i++) {
        uint32_t qh;
        memcpy(&qh, x[i].qh, sizeof(qh));

        int sumi = 0;

        for (int j = 0; j < qk/2; ++j) {
            const uint8_t xh_0 = ((qh >> (j + 0)) << 4) & 0x10;
            const uint8_t xh_1 = ((qh >> (j + 12))     ) & 0x10;

            const int32_t x0 = (x[i].qs[j] & 0xF) | xh_0;
            const int32_t x1 = (x[i].qs[j] >> 4) | xh_1;

            sumi += (x0 * y[i].qs[j]) + (x1 * y[i].qs[j + qk/2]);
        }

        sumf += (ggml_lookup_fp16_to_fp32(x[i].d)*y[i].d)*sumi
              + ggml_lookup_fp16_to_fp32(x[i].m)*y[i].s;
    }

    *s = sumf;
}
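For anyone who wants to look at the integer reduction on its own, here is a minimal sketch that lifts the inner j-loop for a single block into a standalone function. The name q5_1_block_sum and its flat-array parameters are hypothetical and not part of ggml or of the testcase above; the arithmetic is copied unchanged. Whether this reduced form still shows the same vectorizer behavior as the full function is not something this sketch establishes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not in ggml, not in the report): the inner
   reduction of ggml_vec_dot_q5_1_q8_1 for one block.
   qh_bytes holds the 32 "fifth bits", qs holds 16 bytes of packed
   low nibbles (two 4-bit values per byte), ys holds 32 int8 values. */
int q5_1_block_sum(const uint8_t qh_bytes[4],
                   const uint8_t qs[16],
                   const int8_t ys[32])
{
    const int qk = 32;

    uint32_t qh;
    memcpy(&qh, qh_bytes, sizeof(qh));

    int sumi = 0;

    for (int j = 0; j < qk / 2; ++j) {
        /* Bit j of qh becomes bit 4 of the first value,
           bit j + 16 becomes bit 4 of the second value. */
        const uint8_t xh_0 = ((qh >> (j + 0)) << 4) & 0x10;
        const uint8_t xh_1 = ((qh >> (j + 12))) & 0x10;

        /* Rebuild two 5-bit values from one packed byte. */
        const int32_t x0 = (qs[j] & 0xF) | xh_0;
        const int32_t x1 = (qs[j] >> 4) | xh_1;

        sumi += (x0 * ys[j]) + (x1 * ys[j + qk / 2]);
    }

    return sumi;
}
```

Compiling this as an object file with something like gcc -O3 -fopt-info-vec-missed -c should print the vectorizer's notes for the loop, assuming the file is saved under a name of your choosing. As a sanity check, with every bit of qh and qs set and every ys entry equal to 1, each of the 32 reconstructed values is 31, so the function returns 992.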