From: "richard.yao at alumni dot stonybrook.edu"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/109901] Optimization opportunity: ((((a) > (b)) - ((a) < (b))) < 0) -> ((a) < (b))
Date: Sat, 27 May 2023 03:33:39 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109901

--- Comment #8 from Richard Yao ---

Created attachment 55177
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55177&action=edit
Source code for micro-benchmark.

Here is an example of how not having this optimization slows us down:

https://gcc.godbolt.org/z/nxdrb17G7

custom_binary_search_slow() and custom_binary_search_fast() are the same function.
The only difference is that I manually applied the ((((a) > (b)) - ((a) < (b))) <= 0) -> ((a) <= (b)) transformation to see what GCC would generate if it were able to do this transformation itself. This optimization alone makes binary search ~78% faster on my Ryzen 7 5800X:

Benchmark: array size: 1024, runs: 1000, repetitions: 10000, seed: 1685158101, density: 10

Even distribution with 1024 32-bit integers, random access

| Name                      | Items | Hits | Misses | Time     |
| ------------------------- | ----- | ---- | ------ | -------- |
| custom_binary_search_slow | 1024  | 983  | 9017   | 0.000313 |
| custom_binary_search_fast | 1024  | 983  | 9017   | 0.000176 |

I modified the micro-benchmark from scandum/binary_search to better suit a workload that I am micro-optimizing:

https://github.com/scandum/binary_search

Specifically, I wanted to test on small arrays while avoiding cache effects that would contaminate the results. One could easily add the search functions from my modified version into the original to get numbers for bigger array sizes.

I have attached the source code for the modified micro-benchmark. The above run was done after compiling with -O2 -fno-inline. Compiling with just -O2 does not make much difference: I deleted the code where -fno-inline makes a difference from that file, since it was not relevant to this issue.
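To make the comparison concrete, here is a minimal sketch of what the two variants might look like. This is an illustrative reconstruction, not the code from attachment 55177: the function names are taken from the comment above, but the bodies are my assumption of a typical lower-bound binary search built on a branchless three-way comparator.

```c
#include <assert.h>
#include <stddef.h>

/* Branchless three-way comparator: returns -1, 0, or 1.
 * This idiom is common in qsort()/bsearch() callbacks. */
static int cmp3(int a, int b)
{
    return (a > b) - (a < b);
}

/* "Slow" variant: tests the sign of the three-way result. Without the
 * requested transformation, GCC keeps the subtraction and the sign test. */
static int custom_binary_search_slow(const int *arr, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cmp3(arr[mid], key) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && arr[lo] == key) ? (int)lo : -1;
}

/* "Fast" variant: ((a > b) - (a < b)) < 0 manually rewritten to (a < b),
 * i.e. the transformation this report asks GCC to perform. */
static int custom_binary_search_fast(const int *arr, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (arr[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && arr[lo] == key) ? (int)lo : -1;
}
```

The two functions return identical results for every input; only the form of the loop condition differs, which is why any speed difference is attributable to the missing fold.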