public inbox for gcc-bugs@sourceware.org
* [Bug c++/109287] New: Optimizing sal shr pairs when inlining function
@ 2023-03-26 14:43 milasudril at gmail dot com
  2023-05-20  1:24 ` [Bug tree-optimization/109287] " pinskia at gcc dot gnu.org
  2023-05-20  1:30 ` pinskia at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: milasudril at gmail dot com @ 2023-03-26 14:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109287

            Bug ID: 109287
           Summary: Optimizing sal shr pairs when inlining function
           Product: gcc
           Version: 12.2.0
               URL: https://gcc.godbolt.org/z/aPTsjc1sM
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: milasudril at gmail dot com
  Target Milestone: ---
            Target: x86-64_linux_gnu

I was trying to construct a span type to be used for working with a tile-based
image:

```
#include <cstdint>
#include <type_traits>
#include <cstddef>

template<class T, size_t TileSize>
class span_2d_tiled
{
public:
    using IndexType = size_t;

    static constexpr size_t tile_size()
    {
        return TileSize;
    }

    constexpr explicit span_2d_tiled(): span_2d_tiled{0u, 0u, nullptr} {}

    constexpr explicit span_2d_tiled(IndexType w, IndexType h, T* ptr):
        m_tilecount_x{1 + (w - 1)/TileSize},
        m_tilecount_y{1 + (h - 1)/TileSize},
        m_ptr{ptr}
    {}

    constexpr auto tilecount_x() const { return m_tilecount_x; }

    constexpr auto tilecount_y() const { return m_tilecount_y; }

    constexpr T& operator()(IndexType x, IndexType y) const
    {
        auto const x_tile = x/TileSize;
        auto const y_tile = y/TileSize;
        auto const x_offset = x%TileSize;
        auto const y_offset = y%TileSize;
        auto const tile_start = y_tile*m_tilecount_x + x_tile;

        return *(m_ptr + tile_start + y_offset*TileSize + x_offset);
    }

private:
    IndexType m_tilecount_x;
    IndexType m_tilecount_y;
    T* m_ptr;
};

template<size_t TileSize, class Func>
void visit_tiles(size_t x_count, size_t y_count, Func&& f)
{
    for(size_t k = 0; k != y_count; ++k)
    {
        for(size_t l = 0; l != x_count; ++l)
        {
            for(size_t y = 0; y != TileSize; ++y)
            {
                for(size_t x = 0; x != TileSize; ++x)
                {
                    f(l*TileSize + x, k*TileSize + y);
                }
            }
        }
    }
}

void do_stuff(float);

void call_do_stuff(span_2d_tiled<float, 16> foo)
{
    visit_tiles<decltype(foo)::tile_size()>(foo.tilecount_x(),
        foo.tilecount_y(), [foo](size_t x, size_t y){
        do_stuff(foo(x, y));
    });
}
```

Here, the user of this API wants to access individual pixels. Thus, the
coordinates are transformed before calling f: we multiply the tile index by
TileSize and add the appropriate offset. In the callback, the pixel value is
looked up. But now we must find out which tile the pixel is in, and the offset
within that tile, which means that the inverse transformation must be applied.
As can be seen in the Godbolt link, GCC does not simplify this round trip
(hence the sal/shr pairs in the summary). However, the latest clang appears to
do a much better job with the same settings. It also unrolls the inner loop,
with a much better result than when I used

```
#pragma GCC unroll 16
```
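
The round trip described above can be written down in isolation. The following
is only a sketch and is not taken from the Godbolt link: the names `tile_coord`,
`to_pixel` and `to_tile` are hypothetical, and it assumes that the offset is
smaller than the tile size and that the multiplication does not wrap around.

```cpp
#include <cstddef>

// Hypothetical helper type, for illustration only.
struct tile_coord
{
    std::size_t tile;
    std::size_t offset;
};

// Forward transformation, as in visit_tiles:
// (tile index, offset within tile) -> pixel coordinate.
template<std::size_t TileSize>
constexpr std::size_t to_pixel(std::size_t tile, std::size_t offset)
{
    return tile*TileSize + offset;
}

// Inverse transformation, as in span_2d_tiled::operator():
// pixel coordinate -> (tile index, offset within tile).
template<std::size_t TileSize>
constexpr tile_coord to_tile(std::size_t pixel)
{
    return tile_coord{pixel/TileSize, pixel%TileSize};
}

// Assuming offset < TileSize and no wraparound in tile*TileSize,
// the inverse recovers exactly the values that went in.
static_assert(to_tile<16>(to_pixel<16>(3, 5)).tile == 3, "tile index survives");
static_assert(to_tile<16>(to_pixel<16>(3, 5)).offset == 5, "offset survives");
```

Under those assumptions the division and the remainder return the inputs of the
multiply-and-add unchanged, which is the simplification one would hope for
after inlining.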


* [Bug tree-optimization/109287] Optimizing sal shr pairs when inlining function
  2023-03-26 14:43 [Bug c++/109287] New: Optimizing sal shr pairs when inlining function milasudril at gmail dot com
@ 2023-05-20  1:24 ` pinskia at gcc dot gnu.org
  2023-05-20  1:30 ` pinskia at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-20  1:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109287

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-05-20
          Component|middle-end                  |tree-optimization
           Severity|normal                      |enhancement

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Reduced down to:
unsigned f(unsigned t, unsigned b, unsigned *tt)
{
        t *= 16;
        t+= b;
        unsigned ttt =  t/16;
        *tt = t%16;
        return ttt;
}

Confirmed.


* [Bug tree-optimization/109287] Optimizing sal shr pairs when inlining function
  2023-03-26 14:43 [Bug c++/109287] New: Optimizing sal shr pairs when inlining function milasudril at gmail dot com
  2023-05-20  1:24 ` [Bug tree-optimization/109287] " pinskia at gcc dot gnu.org
@ 2023-05-20  1:30 ` pinskia at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-20  1:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109287

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Actually it is closer to:
unsigned f(unsigned t, unsigned b, unsigned *tt)
{
        if (b >= 16) __builtin_unreachable();
        t *= 16;
        t+= b;
        *tt = t%16;
        unsigned ttt =  t/16;
        return ttt;
}

As we know the range of b will be [0,15] due to the loop.
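
For illustration, here is a sketch of the form the function above could
ideally be reduced to when b is known to be in [0,15]. It is not taken from any
GCC patch; the names `div_mod`, `f_reference` and `f_expected` are
hypothetical. Since t*16 has its low four bits clear, adding b cannot carry, so
the remainder is b itself. The quotient is t with its top four bits dropped,
because those bits are lost when t*16 wraps around.

```cpp
// Hypothetical result type: quotient and remainder returned together
// instead of through a pointer, so everything can be checked at compile time.
struct div_mod
{
    unsigned quot;
    unsigned rem;
};

// The computation from the reduced test case, written out literally.
constexpr div_mod f_reference(unsigned t, unsigned b)
{
    return div_mod{(t*16 + b)/16, (t*16 + b)%16};
}

// What it is equivalent to, assuming b < 16:
// a mask for the quotient and no arithmetic at all for the remainder.
constexpr div_mod f_expected(unsigned t, unsigned b)
{
    return div_mod{t & (~0u >> 4), b};
}

// The two agree when b is in range, including when t*16 wraps around.
static_assert(f_reference(5, 7).quot == f_expected(5, 7).quot, "");
static_assert(f_reference(5, 7).rem == f_expected(5, 7).rem, "");
static_assert(f_reference(~0u, 15).quot == f_expected(~0u, 15).quot, "");
static_assert(f_reference(~0u, 15).rem == f_expected(~0u, 15).rem, "");

// Without the range information they differ: b == 16 carries into the quotient.
static_assert(f_reference(0, 16).quot != f_expected(0, 16).quot, "");
```

The last assertion shows why the range of b matters: without it, the quotient
cannot be computed from t alone.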

