From: "richard.yao at alumni dot stonybrook.edu"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/109901] Optimization opportunity: ((((a) > (b)) - ((a) < (b))) < 0) -> ((a) < (b))
Date: Sat, 27 May 2023 03:33:39 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109901

--- Comment #8 from Richard Yao ---

Created attachment 55177
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55177&action=edit
Source code for micro-benchmark.

Here is an example of how not having this optimization slows us down:

https://gcc.godbolt.org/z/nxdrb17G7

custom_binary_search_slow() and custom_binary_search_fast() are the same function.
The only difference is that I manually applied the ((((a) > (b)) - ((a) < (b))) <= 0) -> ((a) <= (b)) transformation to see what GCC would generate if it were able to do this transformation itself. This optimization alone makes binary search ~78% faster on my Ryzen 7 5800X:

Benchmark: array size: 1024, runs: 1000, repetitions: 10000, seed: 1685158101, density: 10

Even distribution with 1024 32-bit integers, random access

| Name                      | Items | Hits | Misses | Time     |
| ------------------------- | ----- | ---- | ------ | -------- |
| custom_binary_search_slow | 1024  | 983  | 9017   | 0.000313 |
| custom_binary_search_fast | 1024  | 983  | 9017   | 0.000176 |

I modified the micro-benchmark from scandum/binary_search to better suit a workload that I am micro-optimizing:

https://github.com/scandum/binary_search

Specifically, I wanted to test on small arrays while avoiding cache effects that would contaminate the results. One could easily add the search functions from my modified version into the original to get numbers for bigger array sizes.

I have attached the source code for the modified micro-benchmark. The above run was done after compiling with -O2 -fno-inline. Compiling with just -O2 does not make much difference: I deleted the code where -fno-inline makes a difference from that file, since it was not relevant to this issue.
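To make the comparison concrete, here is a minimal sketch of what the two variants might look like. This is an illustrative reconstruction, not the code from attachment 55177: the function names are taken from the comment above, but the bodies are my assumption of a typical lower-bound binary search built on a branchless three-way comparator.

```c
#include <assert.h>
#include <stddef.h>

/* Branchless three-way comparator: returns -1, 0, or 1.
 * This idiom is common in qsort()/bsearch() callbacks. */
static int cmp3(int a, int b)
{
    return (a > b) - (a < b);
}

/* "Slow" variant: tests the sign of the three-way result. Without the
 * requested transformation, GCC keeps the subtraction and the sign test. */
static int custom_binary_search_slow(const int *arr, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cmp3(arr[mid], key) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && arr[lo] == key) ? (int)lo : -1;
}

/* "Fast" variant: ((a > b) - (a < b)) < 0 manually rewritten to (a < b),
 * i.e. the transformation this report asks GCC to perform. */
static int custom_binary_search_fast(const int *arr, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (arr[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && arr[lo] == key) ? (int)lo : -1;
}
```

The two functions return identical results for every input; only the form of the loop condition differs, which is why any speed difference is attributable to the missing fold.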