public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/104950] New: GCC does not emit branchless code
@ 2022-03-16  8:51 vincenzo.innocente at cern dot ch
  2022-03-16  9:08 ` [Bug rtl-optimization/104950] GCC does not emit branchless code for load next to each other pinskia at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2022-03-16  8:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104950

            Bug ID: 104950
           Summary: GCC does not emit branchless code
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

In this example GCC fails to emit branchless code while Clang does.
In the actual application, measurements show a slowdown of up to a factor of 2.
I managed to force branchless code (-DBL), but the code is pretty unfriendly.
Godbolt link (GCC, Clang, GCC -DBL):

https://godbolt.org/z/KWY1rjhhY



And here is the code inlined:

#include <vector>
const float defaultBaseResponse = 0.5;
class DForest {
public:
    //based on FastForest::evaluate() and BDTree::parseTree()
    DForest() {
    }
    float evaluate(const float* features) const;

    std::vector<int> rootIndices_;
    //"node" layout: cut, index, left, right
    struct Node{
        float v; int i,l,r;
        constexpr int eval(float const * f) const {
#ifdef BL 
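          // m is 0 (take l) or 1 (take r)
          auto m = f[i] > v;
          // relies on r being laid out directly after l; indexing past &l
          // is formally undefined behaviour
          return *((&l) + int(m));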
#else
          return f[i] > v ? r : l;
#endif
        }
    };
    std::vector<Node> nodes_;
    std::vector<float> responses_;
    std::vector<float> baseResponses_;
};

float DForest::evaluate(const float* features) const{
    float sum{defaultBaseResponse + baseResponses_[0]};
    for(int index : rootIndices_){
        do {
            index = nodes_[index].eval(features);
        } while (index>0);
        sum += responses_[-index];
    }
    return sum;
}
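
The -DBL variant above selects the child by pointer arithmetic on the member l. A minimal sketch of a possibly friendlier formulation follows, assuming the node layout may be changed so that both children sit in a two-element array indexed by the comparison result. NodeBL, child and walk are hypothetical names, not part of the original code. Whether a given GCC version turns this into a load instead of a branch is not guaranteed and should be checked in the generated assembly.

```cpp
#include <vector>

// Hypothetical variant of DForest::Node (not in the original report):
// the two children are stored in an array, so the comparison result can
// index it directly without pointer arithmetic on separate members.
struct NodeBL {
    float v;       // cut value
    int i;         // feature index
    int child[2];  // assumed layout: child[0] = left, child[1] = right
    constexpr int eval(const float* f) const {
        return child[f[i] > v];  // bool converts to 0 or 1
    }
};

// Same traversal as the inner loop of DForest::evaluate(), for one tree:
// follow children until a non-positive index (a leaf) is reached.
inline int walk(const std::vector<NodeBL>& nodes, int index,
                const float* features) {
    do {
        index = nodes[index].eval(features);
    } while (index > 0);
    return index;
}
```

The returned index is non-positive, so it would be used as responses_[-index], exactly as in the original loop.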


Thread overview: 7+ messages
2022-03-16  8:51 [Bug tree-optimization/104950] New: GCC does not emit branchless code vincenzo.innocente at cern dot ch
2022-03-16  9:08 ` [Bug rtl-optimization/104950] GCC does not emit branchless code for load next to each other pinskia at gcc dot gnu.org
2022-03-16  9:18 ` rguenth at gcc dot gnu.org
2022-03-16  9:21 ` rguenth at gcc dot gnu.org
2022-03-16  9:27 ` crazylht at gmail dot com
2022-03-16  9:31 ` pinskia at gcc dot gnu.org
2022-03-16  9:58 ` crazylht at gmail dot com
