From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 426B83858CDA; Sun, 26 Mar 2023 14:43:54 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 426B83858CDA
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1679841834;
	bh=nEUB7x54mau/Y7A6NKqGIKoMhtNehq77YCuBcmAP8oc=;
	h=From:To:Subject:Date:From;
	b=G1p9XsQDHZj7RyJw76JaLDLVm4QDOnXyMlxLKJKZBlHfqCKbrdzljpqDxJ7w39I4A
	 YSGobZc1u0xIRQp1zUPb6222lP0QUZK/qVc/rizlEP2NumdsK10jmyTg196PJu26sr
	 JnjQ2AFqm+q+sCqZuClzBd97tpbgwZN4TCpjcQSY=
From: "milasudril at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/109287] New: Optimizing sal shr pairs when inlining
 function
Date: Sun, 26 Mar 2023 14:43:53 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c++
X-Bugzilla-Version: 12.2.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: milasudril at gmail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_file_loc
 bug_status keywords bug_severity priority component assigned_to reporter
 target_milestone cf_gcctarget
Message-ID: <bug-109287-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109287

            Bug ID: 109287
           Summary: Optimizing sal shr pairs when inlining function
           Product: gcc
           Version: 12.2.0
               URL: https://gcc.godbolt.org/z/aPTsjc1sM
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: milasudril at gmail dot com
  Target Milestone: ---
            Target: x86-64_linux_gnu

I was trying to construct a span type to be used for working with a tile-based
image:

```
#include <cstdint>
#include <type_traits>
#include <cstddef>

template<class T, size_t TileSize>
class span_2d_tiled
{
public:
    using IndexType = size_t;

    static constexpr size_t tile_size()
    {
        return TileSize;
    }

    constexpr explicit span_2d_tiled(): span_2d_tiled{0u, 0u, nullptr} {}

    constexpr explicit span_2d_tiled(IndexType w, IndexType h, T* ptr):
        m_tilecount_x{1 + (w - 1)/TileSize},
        m_tilecount_y{1 + (h - 1)/TileSize},
        m_ptr{ptr}
    {}

    constexpr auto tilecount_x() const { return m_tilecount_x; }

    constexpr auto tilecount_y() const { return m_tilecount_y; }

    constexpr T& operator()(IndexType x, IndexType y) const
    {
        auto const x_tile = x/TileSize;
        auto const y_tile = y/TileSize;
        auto const x_offset = x%TileSize;
        auto const y_offset = y%TileSize;
        auto const tile_start = y_tile*m_tilecount_x + x_tile;

        return *(m_ptr + tile_start + y_offset*TileSize + x_offset);
    }

private:
    IndexType m_tilecount_x;
    IndexType m_tilecount_y;
    T* m_ptr;
};

template<size_t TileSize, class Func>
void visit_tiles(size_t x_count, size_t y_count, Func&& f)
{
    for(size_t k = 0; k != y_count; ++k)
    {
        for(size_t l = 0; l != x_count; ++l)
        {
            for(size_t y = 0; y != TileSize; ++y)
            {
                for(size_t x = 0; x != TileSize; ++x)
                {
                    f(l*TileSize + x, k*TileSize + y);
                }
            }
        }
    }
}

void do_stuff(float);

void call_do_stuff(span_2d_tiled<float, 16> foo)
{
    visit_tiles<decltype(foo)::tile_size()>(foo.tilecount_x(),
foo.tilecount_y(), [foo](size_t x, size_t y){
        do_stuff(foo(x, y));
    });
}
```

Here, the user of this API wants to access individual pixels. Thus, the
coordinates are transformed before calling f: we multiply the tile index by
TileSize and add the offset within the tile. In the callback, the pixel value
is looked up. But now we must find out which tile the pixel belongs to, and its
offset within that tile, which means that the inverse transformation must be
applied. As can be seen in the Godbolt link, GCC does not fully see through
this round trip after inlining. However, the latest clang appears to do a much
better job with the same settings. It also unrolls the inner loop, much better
than if I used

```
#pragma GCC unroll 16
```
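
To make the round trip described above concrete, here is a minimal sketch of
the identity I would expect the optimizer to exploit. The names tile_size,
to_pixel and round_trips are made up for this illustration and do not appear in
the code above, and the sketch assumes that tile*tile_size does not wrap
around:

```cpp
#include <cstddef>

// Hypothetical names, for illustration only.
constexpr std::size_t tile_size = 16;

// Forward transformation, as done in visit_tiles before calling f.
constexpr std::size_t to_pixel(std::size_t tile, std::size_t offset)
{
    return tile*tile_size + offset;
}

// Inverse transformation, as done in operator(). Returns true if
// dividing and taking the remainder recovers the original pair.
constexpr bool round_trips(std::size_t tile, std::size_t offset)
{
    auto const p = to_pixel(tile, offset);
    return p/tile_size == tile && p%tile_size == offset;
}

// Holds whenever offset < tile_size (and the multiplication does not wrap).
static_assert(round_trips(0, 0));
static_assert(round_trips(3, 15));
static_assert(round_trips(1000, 7));

// Does not hold once the offset spills into the next tile.
static_assert(!round_trips(3, 16));
```

Since the inner loops only produce offsets below TileSize, the division and
remainder in operator() should in principle reduce to the loop variables
themselves.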