From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 36781 invoked by alias); 12 Nov 2015 21:52:32 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 36686 invoked by uid 89); 12 Nov 2015 21:52:31 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.4 required=5.0 tests=AWL,BAYES_40,KAM_LAZY_DOMAIN_SECURITY autolearn=no version=3.3.2
X-HELO: eggs.gnu.org
Received: from eggs.gnu.org (HELO eggs.gnu.org) (208.118.235.92) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-SHA encrypted) ESMTPS; Thu, 12 Nov 2015 21:52:28 +0000
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Zwzn4-0007d9-GI for gcc-patches@gcc.gnu.org; Thu, 12 Nov 2015 16:52:25 -0500
Received: from mel.act-europe.fr ([194.98.77.210]:37430 helo=smtp.eu.adacore.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Zwzn4-0007d3-6s for gcc-patches@gcc.gnu.org; Thu, 12 Nov 2015 16:52:22 -0500
Received: from localhost (localhost [127.0.0.1]) by filtered-smtp.eu.adacore.com (Postfix) with ESMTP id BCD4E330DB9A for ; Thu, 12 Nov 2015 22:52:20 +0100 (CET)
Received: from smtp.eu.adacore.com ([127.0.0.1]) by localhost (smtp.eu.adacore.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5L6Kqw-aFMVK for ; Thu, 12 Nov 2015 22:52:20 +0100 (CET)
Received: from polaris.localnet (bon31-6-88-161-99-133.fbx.proxad.net [88.161.99.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.eu.adacore.com (Postfix) with ESMTPSA id 8C648330DB94 for ; Thu, 12 Nov 2015 22:52:20 +0100 (CET)
From: Eric Botcazou
To: gcc-patches@gcc.gnu.org
Subject: [i386] Rotate stack checking loop
Date: Thu, 12 Nov 2015 21:52:00 -0000
Message-ID: <1476590.igsxHmG0Ly@polaris>
User-Agent: KMail/4.14.9 (Linux/3.16.7-29-desktop; KDE/4.14.9; x86_64; ; )
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="nextPart14476345.vo4XT66hL9"
Content-Transfer-Encoding: 7Bit
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x
X-Received-From: 194.98.77.210
X-SW-Source: 2015-11/txt/msg01603.txt.bz2

This is a multi-part message in MIME format.

--nextPart14476345.vo4XT66hL9
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
Content-length: 996

Hi,

this patch rotates the loop generated in the prologue to do stack checking
when -fstack-check is specified, thereby saving one branch instruction. It
was initially implemented as a WHILE loop to match the generic implementation
but can be turned into a DO-WHILE loop because the amount of stack to be
checked is known at compile time (since it's the static part of the frame).

The patch also changes a mov+sub pair into an lea in the common case on Linux,
saving one more instruction in the process.

Tested on x86/Linux & x86-64/Linux (ix86_adjust_stack_and_probe path) and
x86/Solaris (ix86_emit_probe_stack_range path). OK for the mainline?


2015-11-12 Eric Botcazou

        * config/i386/i386.c (ix86_adjust_stack_and_probe): Adjust and use
        an lea instruction when possible.
        (output_adjust_stack_and_probe): Rotate the loop and simplify.
        (ix86_emit_probe_stack_range): Adjust.
        (output_probe_stack_range): Rotate the loop and simplify.


-- 
Eric Botcazou

--nextPart14476345.vo4XT66hL9
Content-Disposition: attachment; filename="rotate_i386.diff"
Content-Transfer-Encoding: 7Bit
Content-Type: text/x-patch; charset="utf-8"; name="rotate_i386.diff"
Content-length: 5491

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 230245)
+++ config/i386/i386.c (working copy)
@@ -12137,10 +12137,10 @@ ix86_adjust_stack_and_probe (const HOST_
   rtx size_rtx = GEN_INT (size), last;
 
   /* See if we have a constant small number of probes to generate. If so,
-     that's the easy case. The run-time loop is made up of 11 insns in the
+     that's the easy case. The run-time loop is made up of 9 insns in the
      generic case while the compile-time loop is made up of 3+2*(n-1) insns
      for n # of intervals. */
-  if (size <= 5 * PROBE_INTERVAL)
+  if (size <= 4 * PROBE_INTERVAL)
     {
       HOST_WIDE_INT i, adjust;
       bool first_probe = true;
@@ -12207,19 +12207,27 @@ ix86_adjust_stack_and_probe (const HOST_
                                            - (PROBE_INTERVAL + dope))));
 
       /* LAST_ADDR = SP_0 + PROBE_INTERVAL + ROUNDED_SIZE. */
-      emit_move_insn (sr.reg, GEN_INT (-rounded_size));
-      emit_insn (gen_rtx_SET (sr.reg,
-                              gen_rtx_PLUS (Pmode, sr.reg,
-                                            stack_pointer_rtx)));
+      if (rounded_size <= (HOST_WIDE_INT_1 << 31))
+        emit_insn (gen_rtx_SET (sr.reg,
+                                plus_constant (Pmode, stack_pointer_rtx,
+                                               -rounded_size)));
+      else
+        {
+          emit_move_insn (sr.reg, GEN_INT (-rounded_size));
+          emit_insn (gen_rtx_SET (sr.reg,
+                                  gen_rtx_PLUS (Pmode, sr.reg,
+                                                stack_pointer_rtx)));
+        }
 
 
       /* Step 3: the loop
 
-         while (SP != LAST_ADDR)
+         do
            {
              SP = SP + PROBE_INTERVAL
              probe at SP
            }
+         while (SP != LAST_ADDR)
 
          adjusts SP and probes to PROBE_INTERVAL + N * PROBE_INTERVAL for
          values of N from 1 until it is equal to ROUNDED_SIZE. */
@@ -12275,23 +12283,16 @@ const char *
 output_adjust_stack_and_probe (rtx reg)
 {
   static int labelno = 0;
-  char loop_lab[32], end_lab[32];
+  char loop_lab[32];
   rtx xops[2];
 
-  ASM_GENERATE_INTERNAL_LABEL (loop_lab, "LPSRL", labelno);
-  ASM_GENERATE_INTERNAL_LABEL (end_lab, "LPSRE", labelno++);
+  ASM_GENERATE_INTERNAL_LABEL (loop_lab, "LPSRL", labelno++);
 
+  /* Loop. */
   ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, loop_lab);
 
-  /* Jump to END_LAB if SP == LAST_ADDR. */
-  xops[0] = stack_pointer_rtx;
-  xops[1] = reg;
-  output_asm_insn ("cmp%z0\t{%1, %0|%0, %1}", xops);
-  fputs ("\tje\t", asm_out_file);
-  assemble_name_raw (asm_out_file, end_lab);
-  fputc ('\n', asm_out_file);
-
   /* SP = SP + PROBE_INTERVAL. */
+  xops[0] = stack_pointer_rtx;
   xops[1] = GEN_INT (PROBE_INTERVAL);
   output_asm_insn ("sub%z0\t{%1, %0|%0, %1}", xops);
 
@@ -12299,12 +12300,16 @@ output_adjust_stack_and_probe (rtx reg)
   xops[1] = const0_rtx;
   output_asm_insn ("or%z0\t{%1, (%0)|DWORD PTR [%0], %1}", xops);
 
-  fprintf (asm_out_file, "\tjmp\t");
+  /* Test if SP == LAST_ADDR. */
+  xops[0] = stack_pointer_rtx;
+  xops[1] = reg;
+  output_asm_insn ("cmp%z0\t{%1, %0|%0, %1}", xops);
+
+  /* Branch. */
+  fputs ("\tjne\t", asm_out_file);
   assemble_name_raw (asm_out_file, loop_lab);
   fputc ('\n', asm_out_file);
 
-  ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, end_lab);
-
   return "";
 }
 
@@ -12315,10 +12320,10 @@ static void
 ix86_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size)
 {
   /* See if we have a constant small number of probes to generate. If so,
-     that's the easy case. The run-time loop is made up of 7 insns in the
+     that's the easy case. The run-time loop is made up of 6 insns in the
      generic case while the compile-time loop is made up of n insns for n #
      of intervals. */
-  if (size <= 7 * PROBE_INTERVAL)
+  if (size <= 6 * PROBE_INTERVAL)
     {
       HOST_WIDE_INT i;
 
@@ -12362,11 +12367,12 @@ ix86_emit_probe_stack_range (HOST_WIDE_I
 
       /* Step 3: the loop
 
-         while (TEST_ADDR != LAST_ADDR)
+         do
            {
              TEST_ADDR = TEST_ADDR + PROBE_INTERVAL
              probe at TEST_ADDR
            }
+         while (TEST_ADDR != LAST_ADDR)
 
          probes at FIRST + N * PROBE_INTERVAL for values of N from 1
          until it is equal to ROUNDED_SIZE. */
@@ -12398,23 +12404,16 @@ const char *
 output_probe_stack_range (rtx reg, rtx end)
 {
   static int labelno = 0;
-  char loop_lab[32], end_lab[32];
+  char loop_lab[32];
   rtx xops[3];
 
-  ASM_GENERATE_INTERNAL_LABEL (loop_lab, "LPSRL", labelno);
-  ASM_GENERATE_INTERNAL_LABEL (end_lab, "LPSRE", labelno++);
+  ASM_GENERATE_INTERNAL_LABEL (loop_lab, "LPSRL", labelno++);
 
+  /* Loop. */
   ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, loop_lab);
 
-  /* Jump to END_LAB if TEST_ADDR == LAST_ADDR. */
-  xops[0] = reg;
-  xops[1] = end;
-  output_asm_insn ("cmp%z0\t{%1, %0|%0, %1}", xops);
-  fputs ("\tje\t", asm_out_file);
-  assemble_name_raw (asm_out_file, end_lab);
-  fputc ('\n', asm_out_file);
-
   /* TEST_ADDR = TEST_ADDR + PROBE_INTERVAL. */
+  xops[0] = reg;
   xops[1] = GEN_INT (PROBE_INTERVAL);
   output_asm_insn ("sub%z0\t{%1, %0|%0, %1}", xops);
 
@@ -12424,12 +12423,16 @@ output_probe_stack_range (rtx reg, rtx e
   xops[2] = const0_rtx;
   output_asm_insn ("or%z0\t{%2, (%0,%1)|DWORD PTR [%0+%1], %2}", xops);
 
-  fprintf (asm_out_file, "\tjmp\t");
+  /* Test if TEST_ADDR == LAST_ADDR. */
+  xops[0] = reg;
+  xops[1] = end;
+  output_asm_insn ("cmp%z0\t{%1, %0|%0, %1}", xops);
+
+  /* Branch. */
+  fputs ("\tjne\t", asm_out_file);
   assemble_name_raw (asm_out_file, loop_lab);
   fputc ('\n', asm_out_file);
 
-  ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, end_lab);
-
   return "";
 }
 
--nextPart14476345.vo4XT66hL9--
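
The loop rotation described in the message can be modelled outside GCC with a
small C sketch. Nothing below is code from the patch: struct probe_stats,
probe_while, probe_do_while and the PROBE_INTERVAL value of 4096 are
hypothetical names and values chosen for illustration. The model counts the
branches executed by each loop shape, assuming the rounded size is a nonzero
multiple of the probe interval, which is the property that makes the DO-WHILE
form safe.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration; the real interval is target-dependent.  */
#define PROBE_INTERVAL 4096

/* Hypothetical counters for the simulated instruction stream.  */
struct probe_stats
{
  unsigned probes;    /* Number of simulated "or $0, (sp)" probes.  */
  unsigned branches;  /* Number of branch instructions executed.  */
};

/* Old shape: test at the top (cmp + je), unconditional jmp at the bottom.  */
static struct probe_stats
probe_while (uint64_t sp, uint64_t rounded_size)
{
  struct probe_stats s = { 0, 0 };
  uint64_t last_addr = sp - rounded_size;

  for (;;)
    {
      s.branches++;             /* cmp + je end_lab */
      if (sp == last_addr)
        break;
      sp -= PROBE_INTERVAL;     /* sub */
      s.probes++;               /* or (the probe itself) */
      s.branches++;             /* jmp loop_lab */
    }
  return s;
}

/* Rotated shape: a single cmp + jne at the bottom of the loop.  */
static struct probe_stats
probe_do_while (uint64_t sp, uint64_t rounded_size)
{
  struct probe_stats s = { 0, 0 };
  uint64_t last_addr = sp - rounded_size;

  /* The rotation is only valid when at least one iteration is needed.  */
  assert (rounded_size > 0 && rounded_size % PROBE_INTERVAL == 0);

  do
    {
      sp -= PROBE_INTERVAL;     /* sub */
      s.probes++;               /* or (the probe itself) */
      s.branches++;             /* cmp + jne loop_lab */
    }
  while (sp != last_addr);
  return s;
}
```

Calling both functions with a starting address of 0x100000 and a size of
5 * PROBE_INTERVAL gives five probes in each case, but 11 executed branches
for the WHILE form (five untaken je, five jmp, one taken je) against 5 for the
DO-WHILE form. In the emitted code the saving shows up as one branch
instruction fewer in the loop.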