From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 7322 invoked by alias); 5 May 2003 09:36:01 -0000
Mailing-List: contact gcc-prs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-prs-owner@gcc.gnu.org
Received: (qmail 7280 invoked by uid 71); 5 May 2003 09:36:00 -0000
Resent-Date: 5 May 2003 09:36:00 -0000
Resent-Message-ID: <20030505093600.7279.qmail@sources.redhat.com>
Resent-From: gcc-gnats@gcc.gnu.org (GNATS Filer)
Resent-Cc: gcc-prs@gcc.gnu.org, gcc-bugs@gcc.gnu.org
Resent-Reply-To: gcc-gnats@gcc.gnu.org, thome@lix.polytechnique.fr
Received: (qmail 4364 invoked by uid 48); 5 May 2003 09:31:26 -0000
Message-Id: <20030505093126.4363.qmail@sources.redhat.com>
Date: Mon, 05 May 2003 09:36:00 -0000
From: thome@lix.polytechnique.fr
Reply-To: thome@lix.polytechnique.fr
To: gcc-gnats@gcc.gnu.org
X-Send-Pr-Version: gnatsweb-2.9.3 (1.1.1.1.2.31)
Subject: optimization/10625: -freduce-all-givs has negative effect
X-SW-Source: 2003-05/txt/msg00264.txt.bz2
List-Id:

>Number:         10625
>Category:       optimization
>Synopsis:       -freduce-all-givs has negative effect
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    unassigned
>State:          open
>Class:          pessimizes-code
>Submitter-Id:   net
>Arrival-Date:   Mon May 05 09:36:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     thome@lix.polytechnique.fr
>Release:        gcc-3.2
>Organization:
>Environment:
Red Hat Linux 8.0
>Description:
Currently, on my P3 866MHz, the attached code takes

200ms with: -mcpu=pentiumpro -O3 -funroll-loops,
and 232ms with: -mcpu=pentiumpro -O3 -funroll-loops -freduce-all-givs.

-freduce-all-givs impacts negatively on the performance by around 15%.

The code also suffers from the inability of unroll-loops to expand the
two nested loops (the inner loop becomes constant when the outer is
expanded).

For the record, a hand-made unrolling of these loops can yield a 25%
speedup (hence 150ms).
Further hand-tweaking of the assembly code can improve the performance
by an additional 20% or so.

This code could probably be optimized better by gcc than it is now.

The gcc version is given below. I know, that's a CVS version patched
with some Red Hat specifics. Bash me if Red Hat put in patches that mess
with the areas of the code concerned, but I consider this unlikely.

Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --host=i386-redhat-linux --with-system-zlib --enable-__cxa_atexit
Thread model: posix
gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7)
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
----gnatsweb-attachment----
Content-Type: application/octet-stream; name="gccbug.c"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="gccbug.c"

Ci8qCiAqIEN1cnJlbnRseSwgb24gbXkgUDMgODY2TUh6LCB0aGlzIGNvZGUgdGFrZXMKICogCiAq IDIwMG1zIHdpdGggLW1jcHU9cGVudGl1bXBybyAtTzMgLWZ1bnJvbGwtbG9vcHMsCiAqIGFuZCAy MzJtcyB3aXRoIC1tY3B1PXBlbnRpdW1wcm8gLU8zIC1mdW5yb2xsLWxvb3BzIC1mcmVkdWNlLWFs bC1naXZzLgogKgogKiAtZnJlZHVjZS1hbGwtZ2l2cyBpbXBhY3RzIG5lZ2F0aXZlbHkgb24gdGhl IHBlcmZvcm1hbmNlIGJ5IGFyb3VuZCAxNSUuCiAqCiAqIFRoZSBjb2RlIGFsc28gc3VmZmVycyBm cm9tIHRoZSBpbmFiaWxpdHkgb2YgdW5yb2xsLWxvb3BzIHRvIGV4cGFuZCB0aGUKICogdHdvIGlt YnJpY2F0ZWQgbG9vcHMgKHRoZSBpbm5lciBsb29wIGJlY29tZXMgY29uc3RhbnQgd2hlbiB0aGUg b3V0ZXIgaXMKICogZXhwYW5kZWQpLgogKgogKiBGb3IgdGhlIHJlY29yZCwgYSBoYW5kLW1hZGUg dW5yb2xsaW5nIG9mIHRoZXNlIGxvb3BzIGNhbiB5aWVsZCBhIDI1JQogKiBzcGVlZHVwIChoZW5j ZSAxNTBtcykuIEZ1cnRoZXIgaGFuZC10d2Vha2luZyBvZiB0aGUgYXNzZW1ibHkgY29kZSBjYW4K ICogaW1wcm92ZSB0aGUgcGVyZm9ybWFjZSBieSBhbiBhZGRpdGlvbmFsIDIwJSBvciBzby4KICoK ICogVGhpcyBjb2RlIGNvdWxkIHByb2JhYmx5IGJlIGJldHRlciBvcHRpbWl6ZWQgYnkgZ2NjIHRo YW4gd2hhdCBoYXBwZW5zCiAqIG5vdy4KICoKICogZ2NjIHZlcnNpb24gaXMgaGVyZSA7IEkga25v
dywgdGhhdCdzIGEgY3ZzIHZlcnNpb24gcGF0Y2hlZCB3aXRoIHNvbWUKICogcmggc3BlY2lmaXRp ZXMuIEJhc2ggbWUgaWYgcmVkaGF0IHNwZXdlZCBpbiBwYXRjaGVzIG1lc3NpbmcgdXAgd2l0aAog KiB0aGUgYXJlYXMgb2YgdGhlIGNvZGUgY29uY2VybmVkLCBidXQgSSBjb25zaWRlciB0aGlzIHVu bGlrZWx5LgoKUmVhZGluZyBzcGVjcyBmcm9tIC91c3IvbGliL2djYy1saWIvaTM4Ni1yZWRoYXQt bGludXgvMy4yL3NwZWNzCkNvbmZpZ3VyZWQgd2l0aDogLi4vY29uZmlndXJlIC0tcHJlZml4PS91 c3IgLS1tYW5kaXI9L3Vzci9zaGFyZS9tYW4gLS1pbmZvZGlyPS91c3Ivc2hhcmUvaW5mbyAtLWVu YWJsZS1zaGFyZWQgLS1lbmFibGUtdGhyZWFkcz1wb3NpeCAtLWRpc2FibGUtY2hlY2tpbmcgLS1o b3N0PWkzODYtcmVkaGF0LWxpbnV4IC0td2l0aC1zeXN0ZW0temxpYiAtLWVuYWJsZS1fX2N4YV9h dGV4aXQKVGhyZWFkIG1vZGVsOiBwb3NpeApnY2MgdmVyc2lvbiAzLjIgMjAwMjA5MDMgKFJlZCBI YXQgTGludXggOC4wIDMuMi03KQoKICovCiNpbmNsdWRlIDxzdGRsaWIuaD4KCnR5cGVkZWYgdW5z aWduZWQgbG9uZyBtcF9saW1iX3Q7CgojZGVmaW5lIHVtdWxfcHBtbSh3MSwgdzAsIHUsIHYpIFwK ICBfX2FzbV9fICgibXVsbCAlMyIJCQkJCQkJXAoJICAgOiAiPWEiICh3MCksICI9ZCIgKHcxKQkJ CQkJXAoJICAgOiAiJTAiICh1KSwgInJtIiAodikpCiNkZWZpbmUgY2FycnlfdGVybWluYXRlXzIo dzEsdzAsYykJXAogIGFzbSAoICIJYWRkbCAlMiwgJTBcbiIJCVwKCSIJYWRjbCAkMCwgJTFcbiIJ CVwKCQkgIDogIityIiAodzApLCAiK3JtIiAodzEpIDogInJtIiAoYykpCiNkZWZpbmUgYWRkMl9z dHJlYW0oZDEsZDAsczEsczAsaW4pCVwKICBhc20gKAkiCWFkZGwJJTQsJTFcbiIJXAoJIglhZGNs CSUzLCUwXG4iCVwKCSIJYWRjbAkkMCwlMlxuIglcCgkJOiAiK3IiIChkMSksICIrciIgKGQwKSwg IityIiAoczEpIDogInJtIiAoczApLCAicm0iIChpbikpCgppbmxpbmUgdm9pZCBNVUw1KG1wX2xp bWJfdCAqIHIsY29uc3QgbXBfbGltYl90ICogczEsY29uc3QgbXBfbGltYl90ICogczIpCnsKCW1w X2xpbWJfdCB1LHYsdyx6OwoKCWludCBpLGo7Cgljb25zdCBpbnQgbj01OwoJCgkvKiBGaWxsIGlu IHRoZSBibGFua3MgZmlyc3QuLi4gKi8KCgl1bXVsX3BwbW0odSxyWzBdLHMxWzBdLHMyWzBdKTsK CS8qIHNvbWUgc3R1ZmYgcGVuZGluZyBpbiB1ICovCgoJZm9yKGk9MTtpPD1uLTI7aSsrKSB7CgkJ dW11bF9wcG1tKHcsdixzMVtpXSxzMlswXSk7CgkJY2FycnlfdGVybWluYXRlXzIodyx2LHUpOwoJ CXJbaV09djsKCQl1PXc7CgkJLyogc29tZSBzdHVmZiBwZW5kaW5nIGluIHUgKi8KCQkvKiByWzAu LmldIGhhdmUgYmVlbiB3cml0dGVuICovCgl9CgoJcltuLTFdPXUrczFbbi0xXSpzMlswXTsKCgkv 
KioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKiovCgoJZm9yKGo9MTtqPD1uLTI7 aisrKSB7CgkJdW11bF9wcG1tKHYsdSxzMVswXSxzMltqXSk7CgkJLyogc29tZSBzdHVmZiBwZW5k aW5nIGluIHYsdSA7IHcseiBhdmFpbGFibGUgKi8KCQlmb3IoaT0xO2krai0xPG4tMjtpKyspIHsK CQkJdW11bF9wcG1tKHosdyxzMVtpXSxzMltqXSk7CgkJCWFkZDJfc3RyZWFtKHYscltpK2otMV0s eix3LHUpOwoJCQkvKiBhZGQgdSB0byByW2krai0xXSAod2FzIHBlbmRpbmcpLiBBZGQgdyBhbmQg dGhlCgkJCSAqIGNhcnJ5IG91dCB0byB2LiBBZGQgdGhlIGNhcnJ5IG91dCB0byB6LgoJCQkgKi8K CQkJdT12OwoJCQl2PXo7CgkJCS8qIHNvbWUgc3R1ZmYgcGVuZGluZyBpbiB2LHUgOyB3LHogYXZh aWxhYmxlLgoJCQkgKiBBZGQtdXAgdG8gcltpK2pdIGlzIHBlbmRpbmcgKGluIHUpICovCgoJCQkv KiBXcml0ZXMgaW4gcltqLi5qK2ktMV0gaGF2ZSBiZWVuIHBlcmZvcm1lZCAqLwoJCX0KCgkJLyoK CQlBU1NFUlQoaitpLTEgPT0gbi0yKTsKCQlBU1NFUlQoaitpID09IG4tMSk7CgkJKi8KCgkJLyog d3JpdGUgdG8gcltuLTJdIGlzIHBlbmRpbmcsIGRhdGEgaXMgaW4gdS4gKi8KCQoJCXc9czFbaV0q czJbal07CgkJY2FycnlfdGVybWluYXRlXzIodyxyW24tMl0sdSk7CgkJcltuLTFdKz13K3Y7Cgl9 CgoJcltuLTFdKz1zMVswXSpzMltuLTFdOwp9CgppbnQgbWFpbigpCnsKCWludCBpOwoKCW1wX2xp bWJfdCBhWzVdLGJbNV0sY1s1XTsKCglmb3IoaT0wO2k8NTtpKyspIHsKCQlhW2ldPXJhbmQoKTsK CQliW2ldPXJhbmQoKTsKCX0KCglmb3IoaT0wO2k8MTAwMDAwMDtpKyspIHsKCQlNVUw1KGMsYSxi KTsKCQlhWzBdKz1jWzRdOwoJfQoJcmV0dXJuIGNbNF07Cn0K