From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 108635 invoked by alias); 10 May 2018 17:41:08 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 108538 invoked by uid 89); 10 May 2018 17:41:05 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-6.6 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_3,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=D*ru, H*RU:209.85.128.193, Hx-spam-relays-external:209.85.128.193, trading X-HELO: mail-wr0-f193.google.com Received: from mail-wr0-f193.google.com (HELO mail-wr0-f193.google.com) (209.85.128.193) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 10 May 2018 17:41:02 +0000 Received: by mail-wr0-f193.google.com with SMTP id p18-v6so2816804wrm.1 for ; Thu, 10 May 2018 10:41:02 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:user-agent:in-reply-to:references :mime-version:content-transfer-encoding:subject:to:from:message-id; bh=/XwGAiqEQWTC1yxVwbCyByxvxeNGDMecejoS8DnmS38=; b=iatn1SWzz+Uku7LKbSrVHxZM9ttxq+2kmETAZwWxVxoZaKzk3OwSE0iuW/9NWS+Wkj 3qO+sv1fxx5kJOVtutuCdfTgCVkoFA549c5DM/yN4TiI475uv0OPdAOGT2W1ar/QsWlc S306to1QXw2TNPRmo8gJDtar3T+EFe7y/lCgSmpjuWDWRiBgy1d6vOlOEOX3mIOHT9hc j5Y+7zAkC1FPRrJ/xfJ7iDOQN2S4jB1T85wZJsZBL0DwKIr69LUbkMw9jGLw+InSKifG 5b/2HUKVFyzwJ/8BxpqZomTMQBdw99XUXTfKo97KeimNeQSyikcnnlx5FYioHTlZ1S8L xBrQ== X-Gm-Message-State: ALKqPwdpI6dVkgXkg2Avrbs1ztT2P9+rzTzRczlUjJvCxCOZVvM6OTlr 8z/SiP7/nxXKubti7zWK4zAaFT0V X-Google-Smtp-Source: AB8JxZr6PJVf63eidu3ZmroL0Mhx63EHokWCYmqOpAvPiBPdS0RJrZn1X4FuWZE8KCGl7s54oiRlJw== X-Received: by 2002:adf:cd08:: with SMTP id w8-v6mr2098348wrm.187.1525974060380; Thu, 10 May 2018 10:41:00 -0700 (PDT) Received: from [192.168.178.32] (p2E530C99.dip0.t-ipconnect.de. [46.83.12.153]) by smtp.gmail.com with ESMTPSA id n23-v6sm1266907wmc.23.2018.05.10.10.40.59 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 10 May 2018 10:40:59 -0700 (PDT) Date: Thu, 10 May 2018 17:43:00 -0000 User-Agent: K-9 Mail for Android In-Reply-To: <20180510155641.2950-1-amonakov@ispras.ru> References: <20180510155641.2950-1-amonakov@ispras.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [PATCH 0/2] Introduce gcc_qsort To: gcc-patches@gcc.gnu.org,Alexander Monakov From: Richard Biener Message-ID: <82E69EBE-92EB-41AB-8E1B-ADBD65516D73@gmail.com> X-IsSubscribed: yes X-SW-Source: 2018-05/txt/msg00487.txt.bz2 On May 10, 2018 5:56:39 PM GMT+02:00, Alexander Monakov wrote: >Hello. > >This introduces a replacement for qsort() in GCC. The main selling >point is >reproducibility (currently compiler output may change depending on how >libc >qsort reorders not-bitwise-identical elements that compare equal) with >a=20 >small improvement speed-wise and small code growth (under 2K on >x86-64). > >The opening comment in sort.cc gives a brief implementation overview: > >/* This implements a sort function suitable for GCC use cases: > - signature-compatible to C qsort, but relaxed contract: > - may apply the comparator to elements in a temporary buffer What consequences has this or rather how is this observable and makes compa= rators behave differently?=20 Otherwise thanks for doing this. Will review tomorrow.=20 Richard.=20 > - may abort on allocation failure > - deterministic (but not necessarily stable) > - fast, especially for common cases (0-5 elements of size 8 or 4) > > The implementation uses a network sort for up to 5 elements and > a merge sort on top of that. Neither stage has branches depending on >comparator result, trading extra arithmetic for branch mispredictions.=20 >*/ > >I used a Sandy Bridge CPU to collect statistics on tramp3d -O2 >compilation. > >Overall the new implementation is roughly 30% faster compared to Glibc >qsort, >with 2x or more speedup for cases with tiny element count. I see one >instance >where the new approach is significantly (1.5x) slower: it is ipa-icf.c: >sort_congruence_class_groups_by_decl_uid. It sorts a big array (1500 >entries) >and needs 14 indirect loads just to reach values to compare, so when >branch >prediction manages to guess correctly, it allows to overlap execution >of two >comparators and better hide their cache miss latency. > >Overall GCC spends about 0.3% time under qsort, but this doesn't >automatically >mean that this brings a 0.1% speed improvement: it may be larger or >smaller >depending on how new code affects cache behavior and branch predictors >in >other code, and it's not trivial to measure precisely. > >I can go into more detail about measured stats if there's interest :) > >Makefile.in changes are separated to patch 2 in the hope it'd make >review >easier, but the two patches will need to be applied together. > >Bootstrapped/regtested on x86-64, OK for trunk? > >Alexander Monakov (2): > gcc_qsort: source code changes > gcc_qsort: build system changes > > gcc/Makefile.in | 9 ++- >gcc/sort.cc | 232 >++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > gcc/system.h | 7 +- > gcc/vec.c | 2 +- > 4 files changed, 243 insertions(+), 7 deletions(-) > create mode 100644 gcc/sort.cc