public inbox for gcc-bugs@sourceware.org
* [Bug c++/109287] New: Optimizing sal shr pairs when inlining function
@ 2023-03-26 14:43 milasudril at gmail dot com
2023-05-20 1:24 ` [Bug tree-optimization/109287] " pinskia at gcc dot gnu.org
2023-05-20 1:30 ` pinskia at gcc dot gnu.org
0 siblings, 2 replies; 3+ messages in thread
From: milasudril at gmail dot com @ 2023-03-26 14:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109287
Bug ID: 109287
Summary: Optimizing sal shr pairs when inlining function
Product: gcc
Version: 12.2.0
URL: https://gcc.godbolt.org/z/aPTsjc1sM
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: milasudril at gmail dot com
Target Milestone: ---
Target: x86-64_linux_gnu
I was trying to construct a span type to be used for working with a tile-based
image:
```
#include <cstdint>
#include <type_traits>
#include <cstddef>

template<class T, size_t TileSize>
class span_2d_tiled
{
public:
    using IndexType = size_t;

    static constexpr size_t tile_size()
    {
        return TileSize;
    }

    constexpr explicit span_2d_tiled(): span_2d_tiled{0u, 0u, nullptr} {}

    constexpr explicit span_2d_tiled(IndexType w, IndexType h, T* ptr):
        m_tilecount_x{1 + (w - 1)/TileSize},
        m_tilecount_y{1 + (h - 1)/TileSize},
        m_ptr{ptr}
    {}

    constexpr auto tilecount_x() const { return m_tilecount_x; }

    constexpr auto tilecount_y() const { return m_tilecount_y; }

    constexpr T& operator()(IndexType x, IndexType y) const
    {
        auto const x_tile = x/TileSize;
        auto const y_tile = y/TileSize;
        auto const x_offset = x%TileSize;
        auto const y_offset = y%TileSize;
        auto const tile_start = y_tile*m_tilecount_x + x_tile;

        return *(m_ptr + tile_start + y_offset*TileSize + x_offset);
    }

private:
    IndexType m_tilecount_x;
    IndexType m_tilecount_y;
    T* m_ptr;
};

template<size_t TileSize, class Func>
void visit_tiles(size_t x_count, size_t y_count, Func&& f)
{
    for(size_t k = 0; k != y_count; ++k)
    {
        for(size_t l = 0; l != x_count; ++l)
        {
            for(size_t y = 0; y != TileSize; ++y)
            {
                for(size_t x = 0; x != TileSize; ++x)
                {
                    f(l*TileSize + x, k*TileSize + y);
                }
            }
        }
    }
}

void do_stuff(float);

void call_do_stuff(span_2d_tiled<float, 16> foo)
{
    visit_tiles<decltype(foo)::tile_size()>(foo.tilecount_x(),
        foo.tilecount_y(), [foo](size_t x, size_t y){
            do_stuff(foo(x, y));
        });
}
```
Here, the user of this API wants to access individual pixels. Thus, the
coordinates are transformed before calling f: we multiply the tile index by
TileSize and add the appropriate offset. In the callback, the pixel value is
looked up. But now we must find out which tile the pixel belongs to, and its
offset within that tile, which means that the inverse transformation (divide
and modulo by TileSize) must be applied. As can be seen in the Godbolt link,
GCC does not see that the two transformations cancel after inlining, and
leaves sal/shr pairs in the output. However, the latest clang appears to do a
much better job with the same settings. It also unrolls the inner loop, much
better than if I used
```
#pragma GCC unroll 16
```
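To make the redundancy concrete, here is a minimal sketch of the round trip
that happens once the lambda is inlined into visit_tiles. The names
`to_pixel`, `split_pixel`, `tile_coord` and `check_round_trip` are
hypothetical and not part of the code above, and a tile size of 16 is assumed
to match `span_2d_tiled<float, 16>`. As long as the offset is below the tile
size and the multiplication does not wrap, splitting a pixel coordinate gives
back exactly the tile index and offset it was built from:

```cpp
#include <cassert>
#include <cstddef>

// Assumed to match span_2d_tiled<float, 16> in the report.
constexpr std::size_t tile_size = 16;

struct tile_coord
{
    std::size_t tile;
    std::size_t offset;
};

// Forward transform, as done by visit_tiles before calling f.
constexpr std::size_t to_pixel(std::size_t tile, std::size_t offset)
{
    return tile*tile_size + offset;
}

// Inverse transform, as done by operator() inside the callback.
constexpr tile_coord split_pixel(std::size_t pixel)
{
    return tile_coord{pixel/tile_size, pixel%tile_size};
}

// Compile-time spot check of one round trip.
static_assert(split_pixel(to_pixel(3, 5)).tile == 3);
static_assert(split_pixel(to_pixel(3, 5)).offset == 5);

// Run-time check over a small range of tiles and every in-tile offset.
void check_round_trip()
{
    for(std::size_t tile = 0; tile != 8; ++tile)
    {
        for(std::size_t offset = 0; offset != tile_size; ++offset)
        {
            auto const c = split_pixel(to_pixel(tile, offset));
            assert(c.tile == tile);
            assert(c.offset == offset);
        }
    }
}
```

Since the loop bounds in visit_tiles guarantee that the offset is below
TileSize, the division and modulo in the callback carry no new information.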
* [Bug tree-optimization/109287] Optimizing sal shr pairs when inlining function
2023-03-26 14:43 [Bug c++/109287] New: Optimizing sal shr pairs when inlining function milasudril at gmail dot com
@ 2023-05-20 1:24 ` pinskia at gcc dot gnu.org
2023-05-20 1:30 ` pinskia at gcc dot gnu.org
1 sibling, 0 replies; 3+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-20 1:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109287
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-05-20
          Component|middle-end                  |tree-optimization
           Severity|normal                      |enhancement
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Reduced down to:
unsigned f(unsigned t, unsigned b, unsigned *tt)
{
  t *= 16;
  t += b;
  unsigned ttt = t/16;
  *tt = t%16;
  return ttt;
}
Confirmed.
* [Bug tree-optimization/109287] Optimizing sal shr pairs when inlining function
2023-03-26 14:43 [Bug c++/109287] New: Optimizing sal shr pairs when inlining function milasudril at gmail dot com
2023-05-20 1:24 ` [Bug tree-optimization/109287] " pinskia at gcc dot gnu.org
@ 2023-05-20 1:30 ` pinskia at gcc dot gnu.org
1 sibling, 0 replies; 3+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-20 1:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109287
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Actually it is closer to:
unsigned f(unsigned t, unsigned b, unsigned *tt)
{
  if (b >= 16) __builtin_unreachable();
  t *= 16;
  t += b;
  *tt = t%16;
  unsigned ttt = t/16;
  return ttt;
}
We know that the range of b will be [0,15] due to the loop.
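
For illustration only, a hand-simplified form of the function above could look
like the sketch below. This is an assumption about what a fully folded version
might be, not what GCC or clang actually emits. The names `f_simplified` and
`same_result` are hypothetical. Because unsigned multiplication wraps, the
quotient is t with its top four bits cleared rather than t itself, and the
remainder is just b:

```cpp
#include <limits>

// Reduced test case from comment #2, with the same behaviour.
unsigned f(unsigned t, unsigned b, unsigned *tt)
{
  if (b >= 16) __builtin_unreachable();
  t *= 16;
  t += b;
  *tt = t%16;
  unsigned ttt = t/16;
  return ttt;
}

// Hypothetical hand-simplified equivalent, valid only for b in [0,15].
unsigned f_simplified(unsigned t, unsigned b, unsigned *tt)
{
  // The low four bits of t*16 are zero and b < 16, so adding b cannot
  // carry into the upper bits: the remainder is b.
  *tt = b;
  // t*16 wraps modulo 2^N, which discards the top four bits of t;
  // dividing by 16 again yields t with those bits cleared.
  return t & (std::numeric_limits<unsigned>::max() >> 4);
}

// Compares both versions for one input pair; b must be below 16.
bool same_result(unsigned t, unsigned b)
{
  unsigned r1 = 0;
  unsigned r2 = 0;
  unsigned const q1 = f(t, b, &r1);
  unsigned const q2 = f_simplified(t, b, &r2);
  return q1 == q2 && r1 == r2;
}
```

In the original report the index type is size_t and the tile counts are small,
so the masking step is where knowledge about the range of the tile index would
matter; the sketch above does not assume any such range.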