From: "tkoenig at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/108227] New: Unnecessary division when looping over array with size of elements not a power of two
Date: Mon, 26 Dec 2022 09:20:16 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108227

            Bug ID: 108227
           Summary: Unnecessary division when looping over array with size
                    of elements not a power of two
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tkoenig at gcc dot gnu.org
  Target Milestone: ---

Consider

typedef struct coord {
  double x, y, z;
} coord;

void foo(coord *from, coord *to)
{
  unsigned long int n = to - from;
  for (unsigned long int i=0; i < n; i++)
    {
      from[i].x = from[i].x + 1.0;
    }
}

void bar (coord *from, coord *to)
{
  char *c_from = (char *) from, *c_to = (char *) to;
  coord *p = from;
  long int c_n = c_to - c_from;
  for (long int i=0; i < c_n; i+= sizeof(coord))
    {
      p->x = p->x + 1.0;
      p++;
    }
}

The code is functionally equivalent, but the assembly is somewhat
different: foo has

foo:
.LFB0:
        .cfi_startproc
        movabsq $-6148914691236517205, %rax
        movq    %rsi, %rdx
        subq    %rdi, %rdx
        sarq    $3, %rdx
        imulq   %rax, %rdx
        cmpq    %rdi, %rsi
        je      .L1
        movsd   .LC0(%rip), %xmm1
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L3:
        movsd   (%rdi), %xmm0
        addq    $1, %rax
        addq    $24, %rdi
        addsd   %xmm1, %xmm0
        movsd   %xmm0, -24(%rdi)
        cmpq    %rdx, %rax
        jb      .L3
.L1:
        ret

so it first divides the byte difference by 24, i.e. sizeof(coord),
efficiently (an arithmetic shift right by 3 followed by a multiplication
by the modular inverse of 3) to determine n. There are 7 instructions in
the loop itself.

bar has

bar:
.LFB1:
        .cfi_startproc
        subq    %rdi, %rsi
        testq   %rsi, %rsi
        jle     .L6
        movsd   .LC0(%rip), %xmm1
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L8:
        movsd   (%rdi,%rax), %xmm0
        addsd   %xmm1, %xmm0
        movsd   %xmm0, (%rdi,%rax)
        addq    $24, %rax
        cmpq    %rax, %rsi
        jg      .L8
.L6:
        ret

There is no need to divide, and there is one instruction less in the
loop. I would expect foo to match bar.
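
For reference, the sarq/imulq pair in foo can be reproduced by hand. The
sketch below is only an illustration, not GCC's implementation: the names
exact_div24 and INV3 are made up here, and it assumes sizeof(coord) == 24
(three 8-byte doubles, as on x86-64). It also assumes that converting an
out-of-range uint64_t to int64_t wraps, which GCC documents but ISO C
before C23 leaves implementation-defined. Because to - from is an exact
division, the compiler can shift right by 3 and then multiply by the
inverse of 3 modulo 2^64 instead of issuing a real divide.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplicative inverse of 3 modulo 2^64: 3 * INV3 == 1 (mod 2^64).
   Read as a signed 64-bit value it is -6148914691236517205, which is
   the constant loaded by movabsq in foo. */
#define INV3 UINT64_C(0xAAAAAAAAAAAAAAAB)

/* Hypothetical helper: exact division of a byte difference by 24 = 8 * 3.
   The result is only meaningful when byte_diff is a multiple of 24. */
static int64_t exact_div24(int64_t byte_diff)
{
    /* Divide by 8 first. For a multiple of 8 this is exact, so it
       matches what the arithmetic shift (sarq $3) computes. */
    int64_t eighths = byte_diff / 8;

    /* Multiply by the inverse of 3 in wrapping unsigned arithmetic
       (imulq). Assumption: the conversion back to int64_t wraps. */
    return (int64_t)((uint64_t)eighths * INV3);
}
```

For example, exact_div24(48) yields 2 and exact_div24(-24) yields -1. The
trick gives garbage for inputs that are not multiples of 24, which is why
it is only usable for pointer differences and similar exact divisions.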