* [Bug other/64928] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
2015-02-03 21:09 [Bug other/64928] New: unreasonable cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs lucier at math dot purdue.edu
@ 2015-02-03 21:11 ` lucier at math dot purdue.edu
2015-02-03 21:33 ` pinskia at gcc dot gnu.org
` (38 subsequent siblings)
39 siblings, 0 replies; 41+ messages in thread
From: lucier at math dot purdue.edu @ 2015-02-03 21:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #1 from lucier at math dot purdue.edu ---
Created attachment 34660
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34660&action=edit
Input file for bug
* [Bug other/64928] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: pinskia at gcc dot gnu.org @ 2015-02-03 21:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that "phase opt and generate" is a top-level timing area.
The passes which take most of the time are:
 tree DSE                :   2.80 ( 8%) usr   0.00 ( 0%) sys   2.80 ( 8%) wall       0 kB ( 0%) ggc
 out of ssa              :   5.90 (16%) usr   0.50 (59%) sys   6.41 (17%) wall      26 kB ( 0%) ggc
 CSE                     :   7.53 (21%) usr   0.01 ( 1%) sys   7.53 (20%) wall   30934 kB ( 6%) ggc
 reload CSE regs         :   5.74 (16%) usr   0.00 ( 0%) sys   5.73 (16%) wall   12325 kB ( 2%) ggc
 scheduling 2            :   1.79 ( 5%) usr   0.01 ( 1%) sys   1.77 ( 5%) wall     299 kB ( 0%) ggc
* [Bug other/64928] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: pinskia at gcc dot gnu.org @ 2015-02-03 21:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |compile-time-hog
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I think this is just an issue with computed goto (indirect gotos).
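For readers unfamiliar with the construct named above, the following is a minimal sketch of computed goto, assuming the GNU C "labels as values" extension (`&&label` and `goto *ptr`). The opcodes and the function `run` are hypothetical illustrations and are not taken from the attached input file. Each `goto *` may transfer control to any label whose address is taken, so a function with many such jumps and many such labels can have a very large number of control-flow edges; whether that is the actual cause of the slowdown reported here is only the suspicion voiced in this comment.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy interpreter (illustration only). */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Executes `code` starting from accumulator `acc` and returns the final
   accumulator. Relies on the GNU C labels-as-values extension, which is
   supported by GCC and Clang but is not part of ISO C. */
int run(const unsigned char *code, int acc)
{
    /* Table of label addresses, indexed by opcode. */
    static void *const dispatch[] = {
        &&do_inc, &&do_dec, &&do_double, &&do_halt
    };
    size_t pc = 0;

    /* Indirect jump: the target is only known at run time. */
    goto *dispatch[code[pc++]];

do_inc:
    acc += 1;
    goto *dispatch[code[pc++]];   /* may reach any of the four labels */
do_dec:
    acc -= 1;
    goto *dispatch[code[pc++]];
do_double:
    acc *= 2;
    goto *dispatch[code[pc++]];
do_halt:
    return acc;
}
```

In this sketch there are four indirect jumps and four address-taken labels, so a compiler that models every possible target has up to sixteen indirect edges to consider. Arc profiling (`-fprofile-arcs`) instruments control-flow arcs, which is one plausible reason such code could become more expensive to compile in coverage mode.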
* [Bug other/64928] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: lucier at math dot purdue.edu @ 2015-02-03 21:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #4 from lucier at math dot purdue.edu ---
On 02/03/2015 04:32 PM, pinskia at gcc dot gnu.org wrote:
> --- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
> Note phase opt and generate is a toplevel time area.
> The passes which take most of the time are:
I'm also concerned about excessive memory usage; the largest passes (> 20 MB) are
 alias analysis            :   0.21 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall   23934 kB ( 5%) ggc
 tree SSA incremental      :   0.23 ( 1%) usr   0.01 ( 1%) sys   0.26 ( 1%) wall   27481 kB ( 5%) ggc
 dominator optimization    :   0.22 ( 1%) usr   0.01 ( 1%) sys   0.22 ( 1%) wall   27417 kB ( 5%) ggc
 tree loop invariant motion:   0.16 ( 0%) usr   0.03 ( 4%) sys   0.19 ( 1%) wall   64219 kB (12%) ggc
 expand                    :   0.39 ( 1%) usr   0.02 ( 2%) sys   0.40 ( 1%) wall   87038 kB (17%) ggc
 CSE                       :   7.53 (21%) usr   0.01 ( 1%) sys   7.53 (20%) wall   30934 kB ( 6%) ggc
 integrated RA             :   0.87 ( 2%) usr   0.00 ( 0%) sys   0.99 ( 3%) wall   48097 kB ( 9%) ggc
 LRA non-specific          :   1.61 ( 4%) usr   0.01 ( 1%) sys   1.63 ( 4%) wall   37254 kB ( 7%) ggc
This also affects the 4.8 branch and the mainline.
* [Bug other/64928] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: lucier at math dot purdue.edu @ 2015-02-06 5:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #5 from lucier at math dot purdue.edu ---
Created attachment 34681
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34681&action=edit
_io.i.gz: larger test file
With this compiler:
firefly:~/Downloads/gambit/lib> /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-mainline/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline/libexec/gcc/x86_64-unknown-linux-gnu/5.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../../gcc-devel/configure --prefix=/pkgs/gcc-mainline
--enable-languages=c --enable-checking=release
Thread model: posix
gcc version 5.0.0 20150206 (experimental) [trunk revision 220467] (GCC)
and the input file _io.c, I find
/pkgs/gcc-mainline/bin/gcc -Q -save-temps -Wno-unused -Wno-write-strings -O1
-fno-math-errno -fschedule-insns2 -fno-strict-aliasing -fno-trapping-math
-fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fprofile-arcs
-ftest-coverage -I"../include" -c -o "_io.o" -I. -DHAVE_CONFIG_H
-D___GAMBCDIR="\"/usr/local/Gambit-C\"" -D___SYS_TYPE_CPU="\"x86_64\""
-D___SYS_TYPE_VENDOR="\"unknown\"" -D___SYS_TYPE_OS="\"linux-gnu\""
-D___CONFIGURE_COMMAND="\"./configure 'CC=/pkgs/gcc-mainline/bin/gcc -Q
-save-temps' '--enable-coverage' '--enable-track-scheme'"\"
-D___OBJ_EXTENSION="\".o\"" -D___EXE_EXTENSION="\"\"" -D___BAT_EXTENSION="\"\""
-D___PRIMAL _io.c -D___LIBRARY
Execution times (seconds)
 phase setup             :   0.78 (100%) usr   0.04 (100%) sys   0.83 (100%) wall  156905 kB (100%) ggc
 TOTAL                   :   0.78             0.04             0.83              156922 kB
btowc wctob mbrlen __signbitf __signbit __signbitl ___H__20___io
___H__23__23_fail_2d_check_2d_datum_2d_parsing_2d_exception
___H_datum_2d_parsing_2d_exception_3f_
___H_datum_2d_parsing_2d_exception_2d_kind
___H_datum_2d_parsing_2d_exception_2d_readenv
___H_datum_2d_parsing_2d_exception_2d_parameters
___H__23__23_raise_2d_datum_2d_parsing_2d_exception
___H__23__23_fail_2d_check_2d_unterminated_2d_process_2d_exception
___H_unterminated_2d_process_2d_exception_3f_
___H_unterminated_2d_process_2d_exception_2d_procedure
___H_unterminated_2d_process_2d_exception_2d_arguments
___H__23__23_raise_2d_unterminated_2d_process_2d_exception
___H__23__23_fail_2d_check_2d_nonempty_2d_input_2d_port_2d_character_2d_buffer_2d_exception
___H_nonempty_2d_input_2d_port_2d_character_2d_buffer_2d_exception_3f_
___H_nonempty_2d_input_2d_port_2d_character_2d_buffer_2d_exception_2d_procedure
___H_nonempty_2d_input_2d_port_2d_character_2d_buffer_2d_exception_2d_arguments
___H__23__23_raise_2d_nonempty_2d_input_2d_port_2d_character_2d_buffer_2d_exception
___H__23__23_fail_2d_check_2d_no_2d_such_2d_file_2d_or_2d_directory_2d_exception
___H_no_2d_such_2d_file_2d_or_2d_directory_2d_exception_3f_
___H_no_2d_such_2d_file_2d_or_2d_directory_2d_exception_2d_procedure
___H_no_2d_such_2d_file_2d_or_2d_directory_2d_exception_2d_arguments
___H__23__23_raise_2d_no_2d_such_2d_file_2d_or_2d_directory_2d_exception
___H__23__23_raise_2d_os_2d_io_2d_exception
___H__23__23_raise_2d_io_2d_exception ___H__23__23_fail_2d_check_2d_settings
___H__23__23_fail_2d_check_2d_exact_2d_integer_2d_or_2d_string_2d_or_2d_settings
___H__23__23_fail_2d_check_2d_string_2d_or_2d_ip_2d_address
___H__23__23_make_2d_writeenv ___H__23__23_make_2d_readenv
___H__23__23_readenv_2d_current_2d_filepos
___H__23__23_readenv_2d_relative_2d_filepos ___H__23__23_make_2d_psettings
___H__23__23_parse_2d_psettings_21_ ___H__23__23_psettings_2d__3e_roptions
___H__23__23_psettings_2d__3e_woptions
___H__23__23_psettings_2d__3e_input_2d_readtable
___H__23__23_psettings_2d__3e_output_2d_readtable
___H__23__23_psettings_2d_options_2d__3e_options
___H__23__23_psettings_2d__3e_device_2d_flags
___H__23__23_psettings_2d__3e_permissions
___H__23__23_psettings_2d__3e_output_2d_width ___H__23__23_port_3f_
___H_port_3f_ ___H__23__23_input_2d_port_3f_ ___H_input_2d_port_3f_
___H__23__23_output_2d_port_3f_ ___H_output_2d_port_3f_
___H__23__23_fail_2d_check_2d_port ___H__23__23_fail_2d_check_2d_input_2d_port
___H__23__23_fail_2d_check_2d_output_2d_port
___H__23__23_fail_2d_check_2d_character_2d_input_2d_port
___H__23__23_fail_2d_check_2d_character_2d_output_2d_port
___H__23__23_fail_2d_check_2d_byte_2d_port
___H__23__23_fail_2d_check_2d_byte_2d_input_2d_port
___H__23__23_fail_2d_check_2d_byte_2d_output_2d_port
___H__23__23_fail_2d_check_2d_device_2d_input_2d_port
___H__23__23_fail_2d_check_2d_device_2d_output_2d_port
___H__23__23_make_2d_io_2d_condvar ___H__23__23_io_2d_condvar_3f_
___H__23__23_io_2d_condvar_2d_for_2d_writing_3f_
___H__23__23_io_2d_condvar_2d_port
___H__23__23_io_2d_condvar_2d_port_2d_set_21_
___H__23__23_make_2d_dummy_2d_port ___H_open_2d_dummy
___H__23__23_make_2d_device_2d_port ___H__23__23_make_2d_rdevice_2d_condvar
___H__23__23_make_2d_wdevice_2d_condvar
___H__23__23_make_2d_device_2d_port_2d_from_2d_single_2d_device
___H__23__23_close_2d_device ___H__23__23_input_2d_port_2d_byte_2d_position
___H_input_2d_port_2d_byte_2d_position
___H__23__23_output_2d_port_2d_byte_2d_position
___H_output_2d_port_2d_byte_2d_position
___H__23__23_device_2d_port_2d_wait_2d_for_2d_input_21_
___H__23__23_device_2d_port_2d_wait_2d_for_2d_output_21_
___H__23__23_char_2d_rbuf_2d_fill ___H__23__23_byte_2d_rbuf_2d_fill
___H__23__23_char_2d_wbuf_2d_drain_2d_no_2d_reset
___H__23__23_char_2d_wbuf_2d_drain
___H__23__23_byte_2d_wbuf_2d_drain_2d_no_2d_reset
___H__23__23_byte_2d_wbuf_2d_drain ___H__23__23_vect_2d_port_2d_options
___H__23__23_fail_2d_check_2d_vector_2d_input_2d_port
___H__23__23_fail_2d_check_2d_vector_2d_output_2d_port
___H__23__23_fail_2d_check_2d_vector_2d_or_2d_settings
___H__23__23_subvector_2d__3e_fifo ___H__23__23_fifo_2d__3e_vector
___H__23__23_open_2d_vector_2d_generic ___H__23__23_open_2d_vector
___H_open_2d_vector ___H__23__23_make_2d_vector_2d_pipe_2d_port
___H__23__23_open_2d_vector_2d_pipe_2d_generic
___H__23__23_open_2d_vector_2d_pipe ___H_open_2d_vector_2d_pipe
___H__23__23_open_2d_input_2d_vector ___H_open_2d_input_2d_vector
___H__23__23_open_2d_output_2d_vector ___H_open_2d_output_2d_vector
___H__23__23_get_2d_output_2d_vector ___H_get_2d_output_2d_vector
___H_call_2d_with_2d_input_2d_vector ___H_call_2d_with_2d_output_2d_vector
___H_with_2d_input_2d_from_2d_vector ___H_with_2d_output_2d_to_2d_vector
___H__23__23_make_2d_vector_2d_port
___H__23__23_fail_2d_check_2d_string_2d_input_2d_port
___H__23__23_fail_2d_check_2d_string_2d_output_2d_port
___H__23__23_fail_2d_check_2d_string_2d_or_2d_settings
___H__23__23_substring_2d__3e_fifo ___H__23__23_fifo_2d__3e_string
___H__23__23_open_2d_string_2d_generic ___H__23__23_open_2d_string
___H_open_2d_string ___H__23__23_make_2d_string_2d_pipe_2d_port
___H__23__23_open_2d_string_2d_pipe_2d_generic
___H__23__23_open_2d_string_2d_pipe ___H_open_2d_string_2d_pipe
___H__23__23_open_2d_input_2d_string ___H_open_2d_input_2d_string
___H__23__23_open_2d_output_2d_string ___H_open_2d_output_2d_string
___H__23__23_get_2d_output_2d_string ___H_get_2d_output_2d_string
___H_call_2d_with_2d_input_2d_string ___H_call_2d_with_2d_output_2d_string
___H_with_2d_input_2d_from_2d_string ___H_with_2d_output_2d_to_2d_string
___H__23__23_make_2d_string_2d_port
___H__23__23_fail_2d_check_2d_u8vector_2d_input_2d_port
___H__23__23_fail_2d_check_2d_u8vector_2d_output_2d_port
___H__23__23_fail_2d_check_2d_u8vector_2d_or_2d_settings
___H__23__23_subu8vector_2d__3e_fifo ___H__23__23_fifo_2d__3e_u8vector
___H__23__23_open_2d_u8vector_2d_generic ___H__23__23_open_2d_u8vector
___H_open_2d_u8vector ___H__23__23_make_2d_u8vector_2d_pipe_2d_port
___H__23__23_open_2d_u8vector_2d_pipe_2d_generic
___H__23__23_open_2d_u8vector_2d_pipe ___H_open_2d_u8vector_2d_pipe
___H__23__23_open_2d_input_2d_u8vector ___H_open_2d_input_2d_u8vector
___H__23__23_open_2d_output_2d_u8vector ___H_open_2d_output_2d_u8vector
___H__23__23_get_2d_output_2d_u8vector ___H_get_2d_output_2d_u8vector
___H_call_2d_with_2d_input_2d_u8vector ___H_call_2d_with_2d_output_2d_u8vector
___H_with_2d_input_2d_from_2d_u8vector ___H_with_2d_output_2d_to_2d_u8vector
___H__23__23_make_2d_u8vector_2d_port ___H__23__23_port_2d_of_2d_kind_3f_
___H__23__23_port_2d_kind ___H__23__23_port_2d_device ___H__23__23_port_2d_name
___H__23__23_read ___H_read
___H__23__23_write_2d_generic_2d_to_2d_character_2d_port ___H__23__23_write
___H_write ___H__23__23_display ___H_display ___H__23__23_pretty_2d_print
___H_pretty_2d_print ___H__23__23_print ___H_print ___H_println
___H__23__23_newline ___H_newline ___H__23__23_flush_2d_input_2d_buffering
___H__23__23_force_2d_output ___H_force_2d_output
___H__23__23_close_2d_input_2d_port ___H_close_2d_input_2d_port
___H__23__23_close_2d_output_2d_port ___H_close_2d_output_2d_port
___H__23__23_close_2d_port ___H_close_2d_port ___H_input_2d_port_2d_readtable
___H_input_2d_port_2d_readtable_2d_set_21_ ___H_output_2d_port_2d_readtable
___H_output_2d_port_2d_readtable_2d_set_21_
___H__23__23_input_2d_port_2d_timeout_2d_set_21_
___H_input_2d_port_2d_timeout_2d_set_21_
___H__23__23_output_2d_port_2d_timeout_2d_set_21_
___H_output_2d_port_2d_timeout_2d_set_21_
___H__23__23_port_2d_io_2d_exception_2d_handler_2d_set_21_
___H_port_2d_io_2d_exception_2d_handler_2d_set_21_
___H__23__23_input_2d_port_2d_char_2d_position
___H_input_2d_port_2d_char_2d_position
___H__23__23_output_2d_port_2d_char_2d_position
___H_output_2d_port_2d_char_2d_position
___H__23__23_input_2d_port_2d_line_2d_set_21_
___H__23__23_input_2d_port_2d_line ___H_input_2d_port_2d_line
___H__23__23_input_2d_port_2d_column_2d_set_21_
___H__23__23_input_2d_port_2d_column ___H_input_2d_port_2d_column
___H__23__23_output_2d_port_2d_line_2d_set_21_
___H__23__23_output_2d_port_2d_line ___H_output_2d_port_2d_line
___H__23__23_output_2d_port_2d_column_2d_set_21_
___H__23__23_output_2d_port_2d_column ___H_output_2d_port_2d_column
___H__23__23_output_2d_port_2d_width ___H_output_2d_port_2d_width
___H__23__23_object_2d__3e_truncated_2d_string
___H__23__23_object_2d__3e_string ___H_object_2d__3e_string
___H__23__23_string_2d__3e_limited_2d_string
___H__23__23_force_2d_limited_2d_string_21_
___H__23__23_input_2d_port_2d_characters_2d_buffered
___H_input_2d_port_2d_characters_2d_buffered ___H__23__23_char_2d_ready_3f_
___H_char_2d_ready_3f_ ___H__23__23_peek_2d_char ___H_peek_2d_char
___H__23__23_read_2d_char ___H_read_2d_char ___H__23__23_read_2d_substring
___H_read_2d_substring ___H__23__23_read_2d_line ___H_read_2d_line
___H__23__23_read_2d_all ___H_read_2d_all
___H__23__23_read_2d_all_2d_as_2d_a_2d_begin_2d_expr_2d_from_2d_path
___H__23__23_read_2d_all_2d_as_2d_a_2d_begin_2d_expr_2d_from_2d_psettings
___H__23__23_read_2d_all_2d_as_2d_a_2d_begin_2d_expr_2d_from_2d_port
___H__23__23_write_2d_char ___H_write_2d_char ___H__23__23_write_2d_substring
___H_write_2d_substring ___H__23__23_write_2d_string
___H__23__23_input_2d_port_2d_bytes_2d_buffered
___H_input_2d_port_2d_bytes_2d_buffered ___H__23__23_read_2d_u8 ___H_read_2d_u8
___H__23__23_read_2d_subu8vector ___H_read_2d_subu8vector
___H__23__23_write_2d_u8 ___H_write_2d_u8 ___H__23__23_write_2d_subu8vector
___H_write_2d_subu8vector ___H__23__23_options_2d_set_21_
___H__23__23_port_2d_settings_2d_set_21_ ___H_port_2d_settings_2d_set_21_
___H__23__23_fail_2d_check_2d_tty_2d_port ___H__23__23_tty_3f_ ___H_tty_3f_
___H__23__23_tty_2d_type_2d_set_21_ ___H_tty_2d_type_2d_set_21_
___H__23__23_tty_2d_text_2d_attributes_2d_set_21_
___H_tty_2d_text_2d_attributes_2d_set_21_ ___H__23__23_tty_2d_history
___H_tty_2d_history ___H__23__23_tty_2d_history_2d_set_21_
___H_tty_2d_history_2d_set_21_
___H__23__23_tty_2d_history_2d_max_2d_length_2d_set_21_
___H_tty_2d_history_2d_max_2d_length_2d_set_21_
___H__23__23_tty_2d_paren_2d_balance_2d_duration_2d_set_21_
___H_tty_2d_paren_2d_balance_2d_duration_2d_set_21_
___H__23__23_tty_2d_mode_2d_set_21_ ___H_tty_2d_mode_2d_set_21_
___H__23__23_fail_2d_check_2d_process_2d_port
___H__23__23_make_2d_process_2d_psettings
___H__23__23_open_2d_process_2d_generic ___H__23__23_open_2d_process
___H_open_2d_process ___H__23__23_open_2d_input_2d_process
___H_open_2d_input_2d_process ___H__23__23_open_2d_output_2d_process
___H_open_2d_output_2d_process ___H_call_2d_with_2d_input_2d_process
___H_call_2d_with_2d_output_2d_process ___H_with_2d_input_2d_from_2d_process
___H_with_2d_output_2d_to_2d_process ___H__23__23_process_2d_pid
___H_process_2d_pid ___H__23__23_process_2d_status ___H_process_2d_status
___H__23__23_fail_2d_check_2d_host_2d_info ___H_host_2d_info_3f_
___H_host_2d_info_2d_name ___H_host_2d_info_2d_aliases
___H_host_2d_info_2d_addresses ___H__23__23_host_2d_info ___H_host_2d_info
___H__23__23_host_2d_name ___H_host_2d_name
___H__23__23_string_2d_or_2d_ip_2d_address_3f_ ___H__23__23_ip_2d_address_3f_
___H__23__23_fail_2d_check_2d_service_2d_info ___H_service_2d_info_3f_
___H_service_2d_info_2d_name ___H_service_2d_info_2d_aliases
___H_service_2d_info_2d_port_2d_number ___H_service_2d_info_2d_protocol
___H__23__23_service_2d_info ___H_service_2d_info
___H__23__23_fail_2d_check_2d_protocol_2d_info ___H_protocol_2d_info_3f_
___H_protocol_2d_info_2d_name ___H_protocol_2d_info_2d_aliases
___H_protocol_2d_info_2d_number ___H__23__23_protocol_2d_info
___H_protocol_2d_info ___H__23__23_fail_2d_check_2d_network_2d_info
___H_network_2d_info_3f_ ___H_network_2d_info_2d_name
___H_network_2d_info_2d_aliases ___H_network_2d_info_2d_number
___H__23__23_network_2d_info ___H_network_2d_info
___H__23__23_fail_2d_check_2d_tcp_2d_client_2d_port
___H__23__23_make_2d_tcp_2d_psettings
___H__23__23_make_2d_tcp_2d_client_2d_port ___H__23__23_open_2d_tcp_2d_client
___H_open_2d_tcp_2d_client ___H__23__23_fail_2d_check_2d_socket_2d_info
___H_socket_2d_info_3f_ ___H_socket_2d_info_2d_family
___H_socket_2d_info_2d_port_2d_number ___H_socket_2d_info_2d_address
___H__23__23_socket_2d_info_2d_setup_21_
___H__23__23_tcp_2d_client_2d_socket_2d_info
___H__23__23_tcp_2d_client_2d_self_2d_socket_2d_info
___H_tcp_2d_client_2d_self_2d_socket_2d_info
___H__23__23_tcp_2d_client_2d_peer_2d_socket_2d_info
___H_tcp_2d_client_2d_peer_2d_socket_2d_info
___H__23__23_fail_2d_check_2d_address_2d_info ___H_address_2d_info_3f_
___H_address_2d_info_2d_family ___H_address_2d_info_2d_socket_2d_type
___H_address_2d_info_2d_protocol ___H_address_2d_info_2d_socket_2d_info
___H__23__23_net_2d_family_2d_encode ___H__23__23_net_2d_family_2d_decode
___H__23__23_net_2d_socket_2d_type_2d_encode
___H__23__23_net_2d_socket_2d_type_2d_decode
___H__23__23_net_2d_protocol_2d_encode ___H__23__23_net_2d_protocol_2d_decode
___H__23__23_address_2d_info_2d_setup_21_ ___H__23__23_address_2d_infos
___H_address_2d_infos ___H__23__23_fail_2d_check_2d_tcp_2d_server_2d_port
___H__23__23_make_2d_tcp_2d_server_2d_port
___H__23__23_process_2d_tcp_2d_server_2d_psettings
___H__23__23_open_2d_tcp_2d_server_2d_aux ___H__23__23_open_2d_tcp_2d_server
___H_open_2d_tcp_2d_server ___H__23__23_tcp_2d_server_2d_socket_2d_info
___H_tcp_2d_server_2d_socket_2d_info
___H__23__23_string_2d__3e_address_2d_and_2d_port_2d_number
___H__23__23_fail_2d_check_2d_directory_2d_port
___H__23__23_make_2d_directory_2d_psettings
___H__23__23_make_2d_directory_2d_port ___H__23__23_open_2d_directory
___H_open_2d_directory ___H__23__23_fail_2d_check_2d_event_2d_queue_2d_port
___H__23__23_make_2d_event_2d_queue_2d_port ___H__23__23_open_2d_event_2d_queue
___H_open_2d_event_2d_queue ___H__23__23_make_2d_path_2d_psettings
___H__23__23_make_2d_input_2d_path_2d_psettings
___H__23__23_open_2d_file_2d_generic
___H__23__23_open_2d_file_2d_generic_2d_from_2d_psettings
___H__23__23_path_2d_reference ___H__23__23_open_2d_file ___H_open_2d_file
___H__23__23_open_2d_input_2d_file ___H_open_2d_input_2d_file
___H__23__23_open_2d_output_2d_file ___H_open_2d_output_2d_file
___H_call_2d_with_2d_input_2d_file ___H_call_2d_with_2d_output_2d_file
___H_with_2d_input_2d_from_2d_file ___H_with_2d_output_2d_to_2d_file
___H_with_2d_input_2d_from_2d_port ___H_with_2d_output_2d_to_2d_port
___H__23__23_open_2d_predefined ___H_console_2d_port
___H__23__23_open_2d_all_2d_predefined
___H__23__23_force_2d_output_2d_on_2d_predefined ___H__23__23_make_2d_filepos
___H__23__23_filepos_2d_line ___H__23__23_filepos_2d_col
___H__23__23_fail_2d_check_2d_readtable ___H__23__23_readtable_3f_
___H_readtable_3f_ ___H__23__23_readtable_2d_copy_2d_shallow
___H__23__23_readtable_2d_copy ___H_readtable_2d_case_2d_conversion_3f_
___H_readtable_2d_case_2d_conversion_3f__2d_set
___H_readtable_2d_keywords_2d_allowed_3f_
___H_readtable_2d_keywords_2d_allowed_3f__2d_set
___H_readtable_2d_sharing_2d_allowed_3f_
___H_readtable_2d_sharing_2d_allowed_3f__2d_set
___H_readtable_2d_eval_2d_allowed_3f_
___H_readtable_2d_eval_2d_allowed_3f__2d_set
___H_readtable_2d_write_2d_extended_2d_read_2d_macros_3f_
___H_readtable_2d_write_2d_extended_2d_read_2d_macros_3f__2d_set
___H_readtable_2d_write_2d_cdr_2d_read_2d_macros_3f_
___H_readtable_2d_write_2d_cdr_2d_read_2d_macros_3f__2d_set
___H_readtable_2d_max_2d_write_2d_level
___H_readtable_2d_max_2d_write_2d_level_2d_set
___H_readtable_2d_max_2d_write_2d_length
___H_readtable_2d_max_2d_write_2d_length_2d_set
___H_readtable_2d_max_2d_unescaped_2d_char
___H_readtable_2d_max_2d_unescaped_2d_char_2d_set
___H_readtable_2d_comment_2d_handler
___H_readtable_2d_comment_2d_handler_2d_set ___H_readtable_2d_start_2d_syntax
___H_readtable_2d_start_2d_syntax_2d_set
___H__23__23_extract_2d_language_2d_and_2d_tail
___H__23__23_readtable_2d_setup_2d_for_2d_language_21_
___H__23__23_readtable_2d_setup_2d_for_2d_standard_2d_level_21_
___H__23__23_make_2d_readtable_2d_parameter ___H__23__23_start_2d_main
___H__23__23_make_2d_marktable ___H__23__23_marktable_2d_mark_21_
___H__23__23_marktable_2d_lookup_21_ ___H__23__23_marktable_2d_save
___H__23__23_marktable_2d_restore_21_
___H__23__23_might_2d_write_2d_differently_3f_ ___H__23__23_default_2d_wr
___H__23__23_wr_2d_str ___H__23__23_wr_2d_substr ___H__23__23_wr_2d_ch
___H__23__23_wr_2d_filler ___H__23__23_wr_2d_spaces ___H__23__23_wr_2d_indent
___H__23__23_shifted_2d_column ___H__23__23_wr_2d_sn
___H__23__23_wr_2d_no_2d_display ___H__23__23_wr_2d_mark
___H__23__23_wr_2d_stamp ___H__23__23_wr_2d_symbol
___H__23__23_escape_2d_symbol_3f_ ___H__23__23_escape_2d_symkey_3f_
___H__23__23_wr_2d_keyword ___H__23__23_escape_2d_keyword_3f_
___H__23__23_wr_2d_pair ___H__23__23_print_2d_marker
___H__23__23_wr_2d_one_2d_line_2d_pretty_2d_print
___H__23__23_wr_2d_fits_2d_on_2d_line ___H__23__23_wr_2d_complex
___H__23__23_wr_2d_char ___H__23__23_wr_2d_hex ___H__23__23_wr_2d_oct
___H__23__23_wr_2d_string ___H__23__23_wr_2d_escaped_2d_string
___H__23__23_reader_2d__3e_open_2d_close ___H__23__23_head_2d__3e_open_2d_close
___H__23__23_wr_2d_vector ___H__23__23_wr_2d_vector_2d_aux1
___H__23__23_wr_2d_vector_2d_aux2 ___H__23__23_wr_2d_vector_2d_aux3
___H__23__23_wr_2d_foreign ___H__23__23_explode_2d_object
___H__23__23_implode_2d_object ___H__23__23_explode_2d_structure
___H__23__23_implode_2d_structure ___H__23__23_implode_2d_frame
___H__23__23_implode_2d_continuation ___H__23__23_explode_2d_procedure
___H__23__23_explode_2d_closure ___H__23__23_explode_2d_subprocedure
___H__23__23_implode_2d_procedure
___H__23__23_implode_2d_procedure_2d_or_2d_return
___H__23__23_explode_2d_return ___H__23__23_implode_2d_return
___H__23__23_wr_2d_opaque ___H__23__23_wr_2d_serialize
___H__23__23_wr_2d_s8vector ___H__23__23_wr_2d_u8vector
___H__23__23_wr_2d_s16vector ___H__23__23_wr_2d_u16vector
___H__23__23_wr_2d_s32vector ___H__23__23_wr_2d_u32vector
___H__23__23_wr_2d_s64vector ___H__23__23_wr_2d_u64vector
___H__23__23_wr_2d_f32vector ___H__23__23_wr_2d_f64vector
___H__23__23_wr_2d_structure ___H__23__23_wr_2d_gc_2d_hash_2d_table
___H__23__23_explode_2d_gc_2d_hash_2d_table
___H__23__23_implode_2d_gc_2d_hash_2d_table ___H__23__23_wr_2d_meroon
___H__23__23_wr_2d_jazz ___H__23__23_wr_2d_frame
___H__23__23_wr_2d_continuation ___H__23__23_wr_2d_promise
___H__23__23_explode_2d_promise ___H__23__23_implode_2d_promise
___H__23__23_wr_2d_will ___H__23__23_wr_2d_procedure ___H__23__23_wr_2d_return
___H__23__23_wr_2d_box ___H__23__23_wr_2d_other ___H__23__23_eof_2d_object_3f_
___H_eof_2d_object_3f_ ___H_transcript_2d_on ___H_transcript_2d_off
___H__23__23_make_2d_chartable ___H__23__23_chartable_2d_copy
___H__23__23_chartable_2d_ref ___H__23__23_chartable_2d_set_21_
___H__23__23_readtable_2d_char_2d_delimiter_3f_
___H__23__23_readtable_2d_char_2d_delimiter_3f__2d_set_21_
___H__23__23_readtable_2d_char_2d_handler
___H__23__23_readtable_2d_char_2d_handler_2d_set_21_
___H__23__23_readtable_2d_char_2d_sharp_2d_handler
___H__23__23_readtable_2d_char_2d_sharp_2d_handler_2d_set_21_
___H__23__23_readtable_2d_char_2d_class_2d_set_21_
___H__23__23_readtable_2d_convert_2d_case
___H__23__23_readtable_2d_string_2d_convert_2d_case_21_
___H__23__23_readtable_2d_parse_2d_keyword
___H__23__23_read_2d_datum_2d_or_2d_eof
___H__23__23_read_2d_datum_2d_or_2d_label
___H__23__23_read_2d_datum_2d_or_2d_label_2d_or_2d_none
___H__23__23_read_2d_datum_2d_or_2d_label_2d_or_2d_none_2d_or_2d_dot
___H__23__23_script_2d_marker ___H__23__23_none_2d_marker
___H__23__23_dot_2d_marker ___H__23__23_label_2d_marker_3f_
___H__23__23_label_2d_marker_2d_enter_21_
___H__23__23_label_2d_marker_2d_reference
___H__23__23_label_2d_marker_2d_fixup_2d_handler_2d_add_21_
___H__23__23_label_2d_marker_2d_define
___H__23__23_label_2d_marker_2d_fixup_21_
___H__23__23_read_2d_check_2d_labels_21_ ___H__23__23_build_2d_list
___H__23__23_read_2d_next_2d_char_2d_expecting ___H__23__23_build_2d_vector
___H__23__23_build_2d_delimited_2d_string
___H__23__23_build_2d_delimited_2d_number_2f_keyword_2f_symbol
___H__23__23_string_2d__3e_number_2f_keyword_2f_symbol
___H__23__23_char_2d_octal_3f_ ___H__23__23_char_2d_hexadecimal_3f_
___H__23__23_build_2d_escaped_2d_string_2d_up_2d_to
___H__23__23_build_2d_decimal_2d_integer ___H__23__23_build_2d_read_2d_macro
___H__23__23_skip_2d_extended_2d_comment
___H__23__23_skip_2d_single_2d_line_2d_comment
___H__23__23_skip_2d_comment_2d_done ___H__23__23_read_2d_sharp
___H__23__23_read_2d_sharp_2d_aux ___H__23__23_read_2d_sharp_2d_vector
___H__23__23_read_2d_sharp_2d_char ___H__23__23_read_2d_sharp_2d_comment
___H__23__23_read_2d_sharp_2d_bang
___H__23__23_read_2d_sharp_2d_keyword_2f_symbol
___H__23__23_read_2d_sharp_2d_colon ___H__23__23_read_2d_sharp_2d_semicolon
___H__23__23_read_2d_sharp_2d_quotation ___H__23__23_read_2d_sharp_2d_ampersand
___H__23__23_read_2d_sharp_2d_dot ___H__23__23_read_2d_sharp_2d_less
___H__23__23_read_2d_sharp_2d_digit ___H__23__23_wrap ___H__23__23_wrap_2d_op
___H__23__23_wrap_2d_op0 ___H__23__23_wrap_2d_op1 ___H__23__23_wrap_2d_op1_2a_
___H__23__23_wrap_2d_op2 ___H__23__23_wrap_2d_op3 ___H__23__23_wrap_2d_op4
___H__23__23_read_2d_sharp_2d_other ___H__23__23_read_2d_whitespace
___H__23__23_read_2d_single_2d_line_2d_comment
___H__23__23_read_2d_escaped_2d_string ___H__23__23_read_2d_quotation
___H__23__23_closing_2d_parenthesis_2d_for
___H__23__23_read_2d_vector_2d_or_2d_list ___H__23__23_read_2d_list
___H__23__23_read_2d_vector ___H__23__23_read_2d_other
___H__23__23_read_2d_none ___H__23__23_read_2d_illegal ___H__23__23_read_2d_dot
___H__23__23_read_2d_number_2f_keyword_2f_symbol
___H__23__23_read_2d_assoc_2d_string_3d__3f_
___H__23__23_read_2d_string_3d__3f_ ___H__23__23_read_2d_six
___H__23__23_read_2d_six_2d_datum_2d_or_2d_eof ___H__23__23_six_2d_type_3f_
___H__23__23_make_2d_standard_2d_readtable ___setup_mod ___init_mod ____20___io
Analyzing compilation unit
Performing interprocedural optimizations
<*free_lang_data> <visibility> <build_ssa_passes> <chkp_passes>
<opt_local_passes> <free-inline-summary> <profile> <whole-program>
<profile_estimate> <inline> <pure-const> <static-var> <single-use>
<comdats>Assembling functions:
___setup_mod ___init_mod ___H__23__23_make_2d_standard_2d_readtable
___H__23__23_six_2d_type_3f_ ___H__23__23_read_2d_six_2d_datum_2d_or_2d_eof {GC
1963188k -> 1911014k}^Cmakefile:150: recipe for target '_io.o' failed
make: *** [_io.o] Interrupt
When I killed it, top was reporting:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8760 lucier 20 0 37.918g 0.029t 584 D 4.7 95.6 34:11.14 cc1
(I don't remember seeing resident memory measured in terabytes before ;-)
I'm having similar problems with the 4.8 branch.
I'm including _io.i.gz
* [Bug other/64928] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: lucier at math dot purdue.edu @ 2015-02-06 5:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #6 from lucier at math dot purdue.edu ---
The problem does not appear with this compiler:
maclaurin-271% gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
--enable-bootstrap --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk
--disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
--enable-libgcj-multifile --enable-java-maintainer-mode
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib
--with-ppl --with-cloog --with-tune=generic --with-arch_32=i686
--build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)
so it appears to be a regression.
Brad
* [Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: rguenth at gcc dot gnu.org @ 2015-02-09 14:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |memory-hog
Component|other |middle-end
Known to work| |4.4.7
Target Milestone|--- |4.8.5
Summary|Inordinate cpu time and |[4.8/4.9/5 Regression]
|memory usage in "phase opt |Inordinate cpu time and
|and generate" with |memory usage in "phase opt
|-ftest-coverage |and generate" with
|-fprofile-arcs |-ftest-coverage
| |-fprofile-arcs
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
From the description, I take it that compiling without profiling/coverage is fine.
* [Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: rguenth at gcc dot gnu.org @ 2015-02-09 15:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2015-02-09
Ever confirmed|0 |1
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, so the memory is used by out-of-SSA it seems
#5 0x0000000000c9eebc in coalesce_ssa_name ()
at /space/rguenther/src/svn/gcc-4_9-branch/gcc/tree-ssa-coalesce.c:1330
1330 graph = build_ssa_conflict_graph (liveinfo);
(gdb) p *cl->list.htab
$10 = {entries = 0x2b19b30, size = 524287, n_elements = 77146, n_deleted = 0,
searches = 122189, collisions = 6508, size_prime_index = 16}
where we malloc(!) 77146 entries of size 12.
But the real problem, of course, is the conflict graph, with 76063 bitmaps
eating up around 1GB of memory for the first testcase (function
___H__23__23_u8vector_2d__3e_object).
That's likely caused by the change to more aggressively coalesce anonymous
SSA names.
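As a rough sanity check on the numbers above, the following Python sketch compares the two allocations. The entry count, entry size, and partition count come from the gdb session; the dense bit-matrix model is an assumption made only for illustration, since GCC's bitmaps are sparse and their real footprint depends on how many conflicts are recorded.

```python
# Back-of-envelope sizes for the figures quoted in the gdb session above.
# Assumption: the conflict graph is modelled as a dense n x n bit matrix.
# GCC's real bitmaps are sparse, so this is only an order-of-magnitude guide.

def hash_entries_bytes(n_elements, entry_size):
    """Payload bytes for n_elements separately allocated entries."""
    return n_elements * entry_size


def dense_conflict_bytes(n_partitions):
    """Bytes for a dense n x n bit matrix, one bit per ordered pair."""
    bits = n_partitions * n_partitions
    return (bits + 7) // 8


if __name__ == "__main__":
    entries = hash_entries_bytes(77146, 12)
    graph = dense_conflict_bytes(76063)
    print(f"hash table entries:    {entries / 2**20:.1f} MiB")
    print(f"dense conflict matrix: {graph / 2**20:.0f} MiB")
```

Under these assumptions the malloc'ed hash entries total under 1 MiB, while a dense matrix over 76063 partitions would take several hundred MiB, the same order of magnitude as the roughly 1GB observed.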
* [Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: law at redhat dot com @ 2015-02-16 19:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
Jeffrey A. Law <law at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |law at redhat dot com
--- Comment #10 from Jeffrey A. Law <law at redhat dot com> ---
Might want to look at 65076 as well where phase opt and generate is taking 89%
of the compile time. Might be a better testcase to work with.
* [Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: rguenth at gcc dot gnu.org @ 2015-03-05 17:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so it is already calculate_live_ranges that takes a lot of memory. I have a
small patch to improve that somewhat.
But what we really need is to get the "must coalesce" stuff "coalesced" with
respect to both live and conflict computation. That is, map must-coalesce
SSA vars to the same partition. That loses the SSA corruption testing, so it
might be much more controversial (silent wrong-code instead of an ICE).
Unfortunately in the testcase there are only 2750 must-coalesces but
109493 partitions participating in the coalescing (so at least 50000 want
coalesces).
The good news is of course that we can simply choose to _not_ coalesce that
many variables, but say only the important ones.
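The idea of mapping must-coalesce SSA names to the same partition before computing live ranges and conflicts can be sketched with a union-find structure. This is a minimal Python illustration, not GCC code; the class, the function, and the `must_pairs` input are hypothetical names.

```python
# Sketch: merge "must coalesce" SSA names into one partition up front, so
# that later live/conflict computation sees fewer partitions.
# All names here are hypothetical, not GCC identifiers.

class Partitions:
    def __init__(self, n):
        self.parent = list(range(n))
        self.count = n  # number of distinct partitions

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        self.count -= 1
        return True


def precoalesce(n_names, must_pairs):
    """Union every must-coalesce pair; return the resulting partitions."""
    parts = Partitions(n_names)
    for a, b in must_pairs:
        parts.union(a, b)
    return parts
```

Each successful union removes at most one partition, so with only 2750 must-coalesces among 109493 partitions, as reported above, this step alone can shrink the problem by at most 2750 partitions.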
* [Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: steven at gcc dot gnu.org @ 2015-03-05 23:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
Steven Bosscher <steven at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |steven at gcc dot gnu.org
--- Comment #12 from Steven Bosscher <steven at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #9)
> It seems that loop invariant motion is responsible for most of the abnormals,
> thus -fno-tree-loop-im restores performance.
>
> The loop LIM detects is of style
>
> <bb 6>: (header)
> # ___fp_3(ab) = PHI <___fp_41(4), ___fp_5(21)>
> # ___r1_7(ab) = PHI <___r1_42(4), ___r1_9(21)>
> # ___r2_11(ab) = PHI <___r2_43(4), ___r3_17(21)>
> # ___r3_19(ab) = PHI <___r3_44(4), ___r3_23(21)>
> # ___r4_25 = PHI <___r4_45(4), ___r4_26(21)>
> # gotovar.17_29 = PHI <_51(4), _69(21)>
> goto gotovar.17_29;
Perhaps disable LIM (and maybe PRE) if the CFG has a large edge/bb ratio (i.e.
dense CFG)? There's probably no benefit in such cases anyway.
* [Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: law at redhat dot com @ 2015-03-06 0:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #13 from Jeffrey A. Law <law at redhat dot com> ---
I think we've done similar things for Brad's large testcases in the past. You
want to look at both the edge/bb density and the overall size, i.e., a
high density doesn't really hurt if the total CFG is small.
See "is_too_expensive" in gcse.c for the current heuristics to avoid trying
global opts on these kinds of testcases.
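A gate that looks at both density and overall size, as described above, could look roughly like the following Python sketch. The function name and both thresholds are made-up placeholders for illustration; they are not the values or the logic used by "is_too_expensive" in gcse.c.

```python
# Sketch of an "is this CFG too expensive?" check that considers both the
# edge/bb density and the overall size of the graph.
# The thresholds are hypothetical placeholders, not taken from gcse.c.

def cfg_too_expensive(n_blocks, n_edges, max_density=4.0, min_edges=20000):
    """Return True only when the CFG is both large and dense."""
    if n_blocks == 0:
        return False
    dense = n_edges / n_blocks > max_density
    large = n_edges > min_edges
    return dense and large
```

With these placeholder limits, a small but dense graph such as `cfg_too_expensive(10, 80)` is still optimized, while a large dense one such as `cfg_too_expensive(10000, 100000)` is skipped.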
* [Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: rguenth at gcc dot gnu.org @ 2015-03-06 10:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note that if we fix out-of-SSA coalescing (patch in testing) then RTL CSE
explodes via DF.
* [Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: rguenth at gcc dot gnu.org @ 2015-03-06 12:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Fri Mar 6 12:34:28 2015
New Revision: 221237
URL: https://gcc.gnu.org/viewcvs?rev=221237&root=gcc&view=rev
Log:
2015-03-06 Richard Biener <rguenther@suse.de>
PR middle-end/64928
* tree-ssa-live.h (struct tree_live_info_d): Add livein_obstack
and liveout_obstack members.
(calculate_live_on_exit): Remove.
(calculate_live_ranges): Change declaration.
* tree-ssa-live.c (liveness_bitmap_obstack): Remove global var.
(new_tree_live_info): Adjust.
(calculate_live_ranges): Delete livein when not wanted.
(calculate_live_ranges): Do not initialize liveness_bitmap_obstack.
Deal with partly deleted live info.
(loe_visit_block): Remove temporary bitmap by using
bitmap_ior_and_compl_into.
(live_worklist): Adjust accordingly.
(calculate_live_on_exit): Make static.
* tree-ssa-coalesce.c (coalesce_ssa_name): Tell calculate_live_ranges
we do not need livein.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-ssa-coalesce.c
trunk/gcc/tree-ssa-live.c
trunk/gcc/tree-ssa-live.h
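The ChangeLog entry for loe_visit_block mentions replacing a temporary bitmap with bitmap_ior_and_compl_into. As a rough illustration of that kind of fused update, A |= (B & ~C), here is a Python sketch using integers as bit sets. The function and argument names are illustrative only, and because Python integers are immutable this returns the new value rather than updating in place.

```python
# Sketch of the fused update A |= (B & ~C) on Python ints used as bit sets.
# Names are illustrative; this is not GCC's bitmap API.

def ior_and_compl(a, b, c):
    """Return (a | (b & ~c), changed), with no named temporary for b & ~c."""
    new_a = a | (b & ~c)
    return new_a, new_a != a
```

The "changed" flag is what a worklist-style liveness solver would use to decide whether a block needs to be revisited.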
* [Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: rguenth at gcc dot gnu.org @ 2015-03-06 12:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 34974
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34974&action=edit
Patch to limit coalescing amount
The committed patch improves peak memory usage from 7.6GB to 5.8GB for the
small testcase.
The attached patch reduces memory usage from SSA coalescing further (to ~300MB)
by simply doing less coalescing. Unfortunately the generated RTL puts a bigger
load on CSE/DF and thus we need 7.6GB again (one could possibly find an optimal
--param max-out-of-ssa-coalesce-names value, but that is probably highly
testcase-specific).
In theory you can iterate on coalescing piecewise as well, but the overhead
for doing this might be too big (basically up to computing live/conflict
for each coalesce pair separately, taking into account previous coalesces).
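How the attached patch limits coalescing is not shown in this thread. One plausible reading of "doing less coalescing" is to keep only the most valuable candidate pairs until a cap on participating names is reached. The Python sketch below illustrates that reading only; `max_names` and the `(cost, a, b)` tuples are hypothetical stand-ins, not GCC data structures.

```python
# Sketch: keep the highest-cost coalesce candidates while capping the number
# of distinct names that take part in coalescing.
# `max_names` and the (cost, a, b) tuples are hypothetical stand-ins.

def limit_coalesce_candidates(candidates, max_names):
    """Pick candidate pairs by descending cost, touching at most max_names names."""
    chosen = []
    names = set()
    for cost, a, b in sorted(candidates, key=lambda c: c[0], reverse=True):
        extra = {a, b} - names
        if len(names) + len(extra) > max_names:
            continue  # this pair would exceed the cap, so skip it
        names |= extra
        chosen.append((cost, a, b))
    return chosen
```

The trade-off described above still applies: fewer coalesces mean smaller live/conflict problems, but more copies are left in the generated RTL for later passes to handle.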
* [Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: rguenth at gcc dot gnu.org @ 2015-03-06 12:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 34975
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34975&action=edit
do not compute live/conflict for abnormal coalesces
This is the other idea of simply not computing live/conflict for abnormal
coalesces we know to always succeed. This shrinks the following live/conflict
problem for the regular coalesces by unifying some partitions.
Doesn't help this particular testcase much.
* [Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: rguenth at gcc dot gnu.org @ 2015-03-06 13:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #17)
> Created attachment 34975 [details]
> do not compute live/conflict for abnormal coalesces
>
> This is the other idea of simply not computing live/conflict for abnormal
> coalesces we know to always succeed. This shrinks the following
> live/conflict
> problem for the regular coalesces by unifying some partitions.
>
> Doesn't help this particular testcase much.
But it fixes PR63155 ...
* [Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: rguenth at gcc dot gnu.org @ 2015-03-18 12:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P2
* [Bug middle-end/64928] [4.8/4.9/5/6 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: wellnhofer at aevum dot de @ 2015-05-20 14:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
Bug 64928 depends on bug 66209, which changed state.
Bug 66209 Summary: Out of memory when compiling with --coverage and optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66209
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |DUPLICATE
* [Bug middle-end/64928] [4.8/4.9/5/6 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: wellnhofer at aevum dot de @ 2015-05-20 14:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
Nick Wellnhofer <wellnhofer at aevum dot de> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |wellnhofer at aevum dot de
--- Comment #19 from Nick Wellnhofer <wellnhofer at aevum dot de> ---
*** Bug 66209 has been marked as a duplicate of this bug. ***
* [Bug middle-end/64928] [4.8/4.9/5/6 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: rguenth at gcc dot gnu.org @ 2015-06-23 8:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.8.5 |4.9.3
--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
* [Bug middle-end/64928] [4.9/5/6 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: jakub at gcc dot gnu.org @ 2015-06-26 19:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #21 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.9.3 has been released.
* [Bug middle-end/64928] [4.9/5/6 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: jakub at gcc dot gnu.org @ 2015-06-26 20:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.9.3 |4.9.4
* [Bug middle-end/64928] [8/9/10/11 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: lucier at math dot purdue.edu @ 2020-09-29 0:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #30 from lucier at math dot purdue.edu ---
I'm coming back to this project.
I naively thought "Well, I don't need arc profiling, I'll just set
-ftest-coverage without -fprofile-arcs" but it appears that I can't do that,
the gcda files are generated by -fprofile-arcs.
It seems to me that test coverage could be implemented simply by instrumenting
each basic block in an algorithm that's linear in the number of basic blocks.
Is it possible to do this?
Brad
* [Bug middle-end/64928] [8/9/10/11 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: rguenth at gcc dot gnu.org @ 2020-09-29 7:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #31 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to lucier from comment #30)
> I'm coming back to this project.
>
> I naively thought "Well, I don't need arc profiling, I'll just set
> -ftest-coverage without -fprofile-arcs" but it appears that I can't do that,
> the gcda files are generated by -fprofile-arcs.
>
> It seems to me that test coverage could be implemented simply by
> instrumenting each basic block in an algorithm that's linear in the number
> of basic blocks. Is it possible to do this?
>
> Brad
I don't think the instrumentation itself is the problem - it's already
doing better than one counter per block. It's simply that the large
source runs into multiple non-linearities in core pieces of the compiler
that cannot be turned off ...
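The remark that the instrumentation already does better than one counter per block refers to the approach described in the GCC documentation for -fprofile-arcs: only arcs that are not on a spanning tree of the flow graph need a counter, and the remaining counts follow from flow conservation. The Python sketch below illustrates the counting argument only. The CFG encoding and the explicit exit-to-entry closing edge are assumptions of this sketch, and GCC may choose a different spanning tree.

```python
# Sketch: find which arcs need a counter when only arcs outside a spanning
# tree of the flow graph are instrumented.
# The CFG encoding (block ids plus an edge list) is a hypothetical
# simplification, and the tree chosen here is arbitrary.

def instrumented_arcs(n_blocks, edges):
    """Return the edges left outside a spanning forest built with union-find."""
    parent = list(range(n_blocks))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    needs_counter = []
    for src, dst in edges:
        rs, rd = find(src), find(dst)
        if rs == rd:
            needs_counter.append((src, dst))  # closes a cycle: instrument it
        else:
            parent[rd] = rs                   # tree edge: count is derivable
    return needs_counter


# Diamond-shaped function: 0 = entry, 1 and 2 = branches, 3 = exit.
# The (3, 0) edge closes the graph so that flow conservation applies.
DIAMOND = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 0)]
```

For the diamond above, `instrumented_arcs(4, DIAMOND)` leaves two arcs to instrument for four blocks. In general the number of counters is E - V + 1 for a connected graph, so on a very dense CFG it can exceed the block count.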
* [Bug middle-end/64928] [8/9/10/11 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: lucier at math dot purdue.edu @ 2020-09-29 12:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #32 from lucier at math dot purdue.edu ---
I don't know precisely what you're saying, but it compiles fine without the
instrumentation.
* [Bug middle-end/64928] [8/9/10/11 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: rguenther at suse dot de @ 2020-09-29 13:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #33 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 29 Sep 2020, lucier at math dot purdue.edu wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
>
> --- Comment #32 from lucier at math dot purdue.edu ---
> I don't know precisely what you're saying, but it compiles fine without the
> instrumentation.
Yes - the instrumentation does complicate the IL but the instrumentation
should be already better than linear in the blocks.
* [Bug middle-end/64928] [8/9/10/11 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: lucier at math dot purdue.edu @ 2021-03-10 2:10 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #34 from lucier at math dot purdue.edu ---
I decided to approach this a bit more methodically by generating a series of
synthetic programs, each twice as long as the previous one, and measuring the
compilation time. I'll attach the associated .i files here.
Each .i file was generated from a Scheme file with 2^k copies, k=1,...,5, of a
simple recursive definition of the Fibonacci function, suitably renamed. So
these are not large files by my standards.
The short summary is that CPU time seems to grow quadratically with the length
of the code. The required memory grows very quickly, too; I killed the
compilation with k=5 (so 32 copies of the Fibonacci function) because the
computation filled 32GB of RAM and 32GB of swap.
Perhaps these parameterized input files might be of help.
Brad
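For a doubling experiment like the one described above, the growth exponent can be estimated from successive timings: if t(n) grows like n**p, then log2(t(2n)/t(n)) is approximately p. The Python sketch below shows the calculation; the function name is hypothetical, and the sample values in the usage note are synthetic, not measurements from this report.

```python
# Sketch: estimate the growth exponent from timings taken at doubling sizes.
# If t(n) ~ n**p, then log2(t(2n) / t(n)) ~ p.
import math


def doubling_exponents(times):
    """Given times for sizes n, 2n, 4n, ..., return log2 ratios of neighbours."""
    return [math.log2(later / earlier)
            for earlier, later in zip(times, times[1:])]
```

With synthetic timings `[1.0, 4.0, 16.0]` every ratio is 4, giving exponents of 2 (quadratic growth); timings that merely double at each step would give exponents of 1.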
I downloaded the git sources for gcc:
heine:~/programs/gcc/gcc-mainline> git log
commit 7eef9a66018e23677058fec421229e3fa435a1a3 (HEAD -> master, origin/master,
origin/HEAD)
Author: Joel Brobecker <brobecker@adacore.com>
Date: Mon Mar 8 23:59:37 2021 -0300
I configured and built gcc with
heine:~/programs/gcc/gcc-mainline> /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/pkgs/gcc-mainline/bin/gcc
COLLECT_LTO_WRAPPER=/pkgs/gcc-mainline/libexec/gcc/x86_64-pc-linux-gnu/11.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../../gcc-mainline/configure --prefix=/pkgs/gcc-mainline
--enable-languages=c --enable-checking=release
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.1 20210309 (experimental) (GCC)
The program names are fib-1.c to fib-5.c; fib-k.c contains 2^k copies of the
Fibonacci function.
/pkgs/gcc-mainline/bin/gcc -march=native -D___CAN_IMPORT_CLIB_DYNAMICALLY -O1
-Wno-unused -Wno-write-strings -Wdisabled-optimization -fwrapv
-fno-strict-aliasing -fno-trapping-math -fno-math-errno -fschedule-insns2
-fomit-frame-pointer -fPIC -fno-common -mpc64 -rdynamic -shared
-D___SINGLE_HOST -D___DYNAMIC
-I"/home/lucier/programs/gambit/gambit-profiled/include" -o 'fib-1.o1' -Q
-fprofile-arcs -ftest-coverage -save-temps 'fib-1.c'
Time variable                     usr           sys          wall             GGC
phase setup                    : 0.02 (100%)   0.00 (  0%)   0.03 (100%)    5039k (100%)
TOTAL                          : 0.02          0.00          0.03           5049k
btowc wctob mbrlen ___H_fib_2d_1 ___setup_mod ___init_mod
___LNK_fib_2d_1_2e_o1
Analyzing compilation unit
Performing interprocedural optimizations
<*free_lang_data> {heap 1240k} <visibility> {heap 1240k} <build_ssa_passes>
{heap 1240k} <opt_local_passes> {heap 1240k} <remove_symbols> {heap 2468k}
<targetclone> {heap 2468k} <profile> {heap 2468k} <free-fnsummary> {heap
2468k}Streaming LTO
<whole-program> {heap 2468k} <profile_estimate> {heap 2468k} <fnsummary> {heap
2468k} <inline> {heap 2468k} <pure-const> {heap 2468k} <modref> {heap 2468k}
<free-fnsummary> {heap 2468k} <static-var> {heap 2468k} <single-use> {heap
2468k} <comdats> {heap 2468k}Assembling functions:
<simdclone> {heap 2468k} ___setup_mod ___init_mod ___H_fib_2d_1
___LNK_fib_2d_1_2e_o1 _sub_I_00100_0 _sub_D_00100_1
Time variable                     usr           sys          wall             GGC
phase setup                    : 0.00 (  0%)   0.00 (  0%)   0.01 (  1%)    1519k (  6%)
phase parsing                  : 0.06 (  8%)   0.01 ( 20%)   0.08 ( 10%)    2072k (  8%)
phase opt and generate         : 0.67 ( 92%)   0.04 ( 80%)   0.70 ( 89%)      22M ( 86%)
dump files                     : 0.01 (  1%)   0.00 (  0%)   0.00 (  0%)        0 (  0%)
callgraph functions expansion  : 0.66 ( 90%)   0.03 ( 60%)   0.69 ( 87%)      21M ( 82%)
callgraph ipa passes           : 0.01 (  1%)   0.00 (  0%)   0.01 (  1%)     570k (  2%)
cfg cleanup                    : 0.00 (  0%)   0.00 (  0%)   0.04 (  5%)       64 (  0%)
trivially dead code            : 0.00 (  0%)   0.01 ( 20%)   0.00 (  0%)        0 (  0%)
df live regs                   : 0.01 (  1%)   0.00 (  0%)   0.02 (  3%)        0 (  0%)
df live&initialized regs       : 0.02 (  3%)   0.00 (  0%)   0.02 (  3%)        0 (  0%)
df reg dead/unused notes       : 0.02 (  3%)   0.00 (  0%)   0.01 (  1%)     305k (  1%)
alias analysis                 : 0.01 (  1%)   0.00 (  0%)   0.01 (  1%)    1482k (  6%)
alias stmt walking             : 0.02 (  3%)   0.01 ( 20%)   0.02 (  3%)     7280 (  0%)
rebuild jump labels            : 0.01 (  1%)   0.00 (  0%)   0.00 (  0%)        0 (  0%)
preprocessing                  : 0.02 (  3%)   0.00 (  0%)   0.01 (  1%)     240k (  1%)
lexical analysis               : 0.02 (  3%)   0.01 ( 20%)   0.00 (  0%)        0 (  0%)
parser (global)                : 0.01 (  1%)   0.00 (  0%)   0.04 (  5%)    1239k (  5%)
parser struct body             : 0.01 (  1%)   0.00 (  0%)   0.01 (  1%)     359k (  1%)
parser function body           : 0.00 (  0%)   0.00 (  0%)   0.02 (  3%)     201k (  1%)
tree gimplify                  : 0.00 (  0%)   0.01 ( 20%)   0.00 (  0%)     297k (  1%)
tree copy propagation          : 0.01 (  1%)   0.00 (  0%)   0.01 (  1%)      13k (  0%)
tree SSA rewrite               : 0.00 (  0%)   0.00 (  0%)   0.01 (  1%)     356k (  1%)
tree SSA incremental           : 0.01 (  1%)   0.00 (  0%)   0.00 (  0%)    2918k ( 11%)
tree operand scan              : 0.01 (  1%)   0.00 (  0%)   0.00 (  0%)     314k (  1%)
dominator optimization         : 0.03 (  4%)   0.01 ( 20%)   0.04 (  5%)     531k (  2%)
tree FRE                       : 0.00 (  0%)   0.00 (  0%)   0.01 (  1%)      36k (  0%)
tree forward propagate         : 0.02 (  3%)   0.00 (  0%)   0.00 (  0%)      34k (  0%)
tree conservative DCE          : 0.00 (  0%)   0.00 (  0%)   0.01 (  1%)     6224 (  0%)
tree DSE                       : 0.03 (  4%)   0.00 (  0%)   0.04 (  5%)        0 (  0%)
tree loop invariant motion     : 0.01 (  1%)   0.00 (  0%)   0.03 (  4%)    2496k (  9%)
tree strlen optimization       : 0.01 (  1%)   0.00 (  0%)   0.01 (  1%)      83k (  0%)
dominance computation          : 0.02 (  3%)   0.00 (  0%)   0.00 (  0%)        0 (  0%)
out of ssa                     : 0.03 (  4%)   0.00 (  0%)   0.02 (  3%)      64k (  0%)
expand                         : 0.00 (  0%)   0.00 (  0%)   0.01 (  1%)    2473k (  9%)
forward prop                   : 0.02 (  3%)   0.00 (  0%)   0.02 (  3%)      81k (  0%)
CSE                            : 0.01 (  1%)   0.00 (  0%)   0.00 (  0%)     211k (  1%)
dead store elim2               : 0.01 (  1%)   0.00 (  0%)   0.02 (  3%)     701k (  3%)
loop init                      : 0.01 (  1%)   0.00 (  0%)   0.00 (  0%)      29k (  0%)
loop fini                      : 0.00 (  0%)   0.00 (  0%)   0.01 (  1%)     116k (  0%)
combiner                       : 0.01 (  1%)   0.00 (  0%)   0.01 (  1%)     108k (  0%)
if-conversion                  : 0.02 (  3%)   0.00 (  0%)   0.00 (  0%)     666k (  3%)
integrated RA                  : 0.06 (  8%)   0.00 (  0%)   0.05 (  6%)    3986k ( 15%)
LRA non-specific               : 0.05 (  7%)   0.00 (  0%)   0.06 (  8%)    1324k (  5%)
LRA reload inheritance         : 0.01 (  1%)   0.00 (  0%)   0.01 (  1%)      224 (  0%)
LRA create live ranges         : 0.09 ( 12%)   0.00 (  0%)   0.08 ( 10%)     241k (  1%)
LRA hard reg assignment        : 0.02 (  3%)   0.00 (  0%)   0.02 (  3%)        0 (  0%)
reload CSE regs                : 0.02 (  3%)   0.00 (  0%)   0.02 (  3%)     368k (  1%)
thread pro- & epilogue         : 0.01 (  1%)   0.00 (  0%)   0.00 (  0%)      10k (  0%)
hard reg cprop                 : 0.00 (  0%)   0.00 (  0%)   0.01 (  1%)      288 (  0%)
scheduling 2                   : 0.04 (  5%)   0.00 (  0%)   0.04 (  5%)     149k (  1%)
shorten branches               : 0.00 (  0%)   0.00 (  0%)   0.01 (  1%)        0 (  0%)
final                          : 0.01 (  1%)   0.00 (  0%)   0.00 (  0%)     816k (  3%)
initialize rtl                 : 0.00 (  0%)   0.00 (  0%)   0.01 (  1%)      12k (  0%)
rest of compilation            : 0.00 (  0%)   0.00 (  0%)   0.02 (  3%)      66k (  0%)
TOTAL                          : 0.73          0.05          0.79             25M
/pkgs/gcc-mainline/bin/gcc -march=native -D___CAN_IMPORT_CLIB_DYNAMICALLY -O1
-Wno-unused -Wno-write-strings -Wdisabled-optimization -fwrapv
-fno-strict-aliasing -fno-trapping-math -fno-math-errno -fschedule-insns2
-fomit-frame-pointer -fPIC -fno-common -mpc64 -rdynamic -shared
-D___SINGLE_HOST -D___DYNAMIC
-I"/home/lucier/programs/gambit/gambit-profiled/include" -o 'fib-2.o1' -Q
-fprofile-arcs -ftest-coverage -save-temps 'fib-2.c'
Time variable usr sys wall
GGC
phase setup : 0.01 (100%) 0.02 (100%) 0.04 (100%)
7596k (100%)
TOTAL : 0.01 0.02 0.04
7606k
btowc wctob mbrlen ___H_fib_2d_2 ___setup_mod ___init_mod
___LNK_fib_2d_2_2e_o1
Analyzing compilation unit
Performing interprocedural optimizations
<*free_lang_data> {heap 1432k} <visibility> {heap 1432k} <build_ssa_passes>
{heap 1432k} <opt_local_passes> {heap 1432k} <remove_symbols> {heap 3104k}
<targetclone> {heap 3104k} <profile> {heap 3104k} <free-fnsummary> {heap
3104k}Streaming LTO
<whole-program> {heap 3104k} <profile_estimate> {heap 3104k} <fnsummary> {heap
3104k} <inline> {heap 3104k} <pure-const> {heap 3104k} <modref> {heap 3104k}
<free-fnsummary> {heap 3104k} <static-var> {heap 3104k} <single-use> {heap
3104k} <comdats> {heap 3104k}Assembling functions:
<simdclone> {heap 3104k} ___setup_mod ___init_mod ___H_fib_2d_2
___LNK_fib_2d_2_2e_o1 _sub_I_00100_0 _sub_D_00100_1
Time variable usr sys wall
GGC
phase setup : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
1519k ( 2%)
phase parsing : 0.04 ( 1%) 0.05 ( 36%) 0.10 ( 3%)
2500k ( 4%)
phase opt and generate : 2.78 ( 99%) 0.09 ( 64%) 2.88 ( 97%)
62M ( 94%)
callgraph construction : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
26k ( 0%)
callgraph functions expansion : 2.75 ( 98%) 0.09 ( 64%) 2.85 ( 96%)
61M ( 92%)
callgraph ipa passes : 0.02 ( 1%) 0.00 ( 0%) 0.02 ( 1%)
939k ( 1%)
ipa pure const : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
cfg cleanup : 0.04 ( 1%) 0.00 ( 0%) 0.04 ( 1%)
64 ( 0%)
trivially dead code : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
df scan insns : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
288 ( 0%)
df reaching defs : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
df live regs : 0.07 ( 2%) 0.00 ( 0%) 0.10 ( 3%)
0 ( 0%)
df live&initialized regs : 0.08 ( 3%) 0.00 ( 0%) 0.07 ( 2%)
0 ( 0%)
df reg dead/unused notes : 0.05 ( 2%) 0.01 ( 7%) 0.06 ( 2%)
935k ( 1%)
register information : 0.04 ( 1%) 0.00 ( 0%) 0.03 ( 1%)
0 ( 0%)
alias analysis : 0.02 ( 1%) 0.00 ( 0%) 0.00 ( 0%)
2960k ( 4%)
alias stmt walking : 0.13 ( 5%) 0.02 ( 14%) 0.10 ( 3%)
7472 ( 0%)
rebuild jump labels : 0.01 ( 0%) 0.00 ( 0%) 0.03 ( 1%)
0 ( 0%)
preprocessing : 0.00 ( 0%) 0.03 ( 21%) 0.03 ( 1%)
250k ( 0%)
lexical analysis : 0.02 ( 1%) 0.02 ( 14%) 0.06 ( 2%)
0 ( 0%)
parser (global) : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
1252k ( 2%)
parser struct body : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
359k ( 1%)
parser function body : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
608k ( 1%)
inline parameters : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
39k ( 0%)
tree gimplify : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
505k ( 1%)
tree CFG cleanup : 0.02 ( 1%) 0.01 ( 7%) 0.02 ( 1%)
320k ( 0%)
tree copy propagation : 0.04 ( 1%) 0.00 ( 0%) 0.05 ( 2%)
24k ( 0%)
tree PTA : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
13k ( 0%)
tree SSA rewrite : 0.02 ( 1%) 0.00 ( 0%) 0.02 ( 1%)
605k ( 1%)
tree SSA incremental : 0.05 ( 2%) 0.00 ( 0%) 0.06 ( 2%)
9895k ( 14%)
tree operand scan : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
882k ( 1%)
dominator optimization : 0.13 ( 5%) 0.00 ( 0%) 0.16 ( 5%)
1261k ( 2%)
tree split crit edges : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
1410k ( 2%)
tree reassociation : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
48 ( 0%)
tree code sinking : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
1680k ( 2%)
tree forward propagate : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 1%)
63k ( 0%)
tree conservative DCE : 0.02 ( 1%) 0.00 ( 0%) 0.02 ( 1%)
8288 ( 0%)
tree aggressive DCE : 0.03 ( 1%) 0.00 ( 0%) 0.02 ( 1%)
40 ( 0%)
tree DSE : 0.11 ( 4%) 0.00 ( 0%) 0.12 ( 4%)
0 ( 0%)
tree loop invariant motion : 0.09 ( 3%) 0.01 ( 7%) 0.09 ( 3%)
7961k ( 12%)
tree iv optimization : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
22k ( 0%)
tree SSA uncprop : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
tree strlen optimization : 0.02 ( 1%) 0.00 ( 0%) 0.02 ( 1%)
149k ( 0%)
tree modref : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
2800 ( 0%)
dominance computation : 0.02 ( 1%) 0.00 ( 0%) 0.05 ( 2%)
0 ( 0%)
out of ssa : 0.11 ( 4%) 0.01 ( 7%) 0.13 ( 4%)
752 ( 0%)
expand : 0.03 ( 1%) 0.00 ( 0%) 0.02 ( 1%)
7567k ( 11%)
post expand cleanups : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
49k ( 0%)
varconst : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
1024 ( 0%)
forward prop : 0.09 ( 3%) 0.00 ( 0%) 0.09 ( 3%)
255k ( 0%)
CSE : 0.02 ( 1%) 0.00 ( 0%) 0.02 ( 1%)
659k ( 1%)
dead code elimination : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
dead store elim1 : 0.02 ( 1%) 0.00 ( 0%) 0.03 ( 1%)
467k ( 1%)
dead store elim2 : 0.04 ( 1%) 0.00 ( 0%) 0.03 ( 1%)
2157k ( 3%)
loop init : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
36k ( 0%)
loop fini : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
352k ( 1%)
combiner : 0.02 ( 1%) 0.00 ( 0%) 0.02 ( 1%)
260k ( 0%)
if-conversion : 0.03 ( 1%) 0.00 ( 0%) 0.04 ( 1%)
2511k ( 4%)
integrated RA : 0.21 ( 7%) 0.01 ( 7%) 0.22 ( 7%)
9272k ( 14%)
LRA non-specific : 0.18 ( 6%) 0.01 ( 7%) 0.16 ( 5%)
4240k ( 6%)
LRA virtuals elimination : 0.03 ( 1%) 0.00 ( 0%) 0.02 ( 1%)
1264k ( 2%)
LRA reload inheritance : 0.04 ( 1%) 0.00 ( 0%) 0.04 ( 1%)
0 ( 0%)
LRA create live ranges : 0.41 ( 15%) 0.00 ( 0%) 0.44 ( 15%)
757k ( 1%)
LRA hard reg assignment : 0.08 ( 3%) 0.01 ( 7%) 0.09 ( 3%)
0 ( 0%)
reload CSE regs : 0.05 ( 2%) 0.00 ( 0%) 0.05 ( 2%)
1113k ( 2%)
thread pro- & epilogue : 0.02 ( 1%) 0.00 ( 0%) 0.02 ( 1%)
10k ( 0%)
if-conversion 2 : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
combine stack adjustments : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
hard reg cprop : 0.02 ( 1%) 0.00 ( 0%) 0.02 ( 1%)
432 ( 0%)
scheduling 2 : 0.11 ( 4%) 0.00 ( 0%) 0.12 ( 4%)
457k ( 1%)
reorder blocks : 0.02 ( 1%) 0.00 ( 0%) 0.01 ( 0%)
370k ( 1%)
shorten branches : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
final : 0.03 ( 1%) 0.00 ( 0%) 0.03 ( 1%)
2482k ( 4%)
straight-line strength reduction : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
4440 ( 0%)
rest of compilation : 0.08 ( 3%) 0.00 ( 0%) 0.03 ( 1%)
179k ( 0%)
remove unused locals : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
repair loop structures : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
0 ( 0%)
TOTAL : 2.82 0.14 2.98
66M
/pkgs/gcc-mainline/bin/gcc -march=native -D___CAN_IMPORT_CLIB_DYNAMICALLY -O1
-Wno-unused -Wno-write-strings -Wdisabled-optimization -fwrapv
-fno-strict-aliasing -fno-trapping-math -fno-math-errno -fschedule-insns2
-fomit-frame-pointer -fPIC -fno-common -mpc64 -rdynamic -shared
-D___SINGLE_HOST -D___DYNAMIC
-I"/home/lucier/programs/gambit/gambit-profiled/include" -o 'fib-3.o1' -Q
-fprofile-arcs -ftest-coverage -save-temps 'fib-3.c'
Time variable usr sys wall
GGC
phase setup : 0.04 (100%) 0.00 ( 0%) 0.04 (100%)
8613k (100%)
TOTAL : 0.04 0.00 0.04
8624k
btowc wctob mbrlen ___H_fib_2d_3 ___setup_mod ___init_mod
___LNK_fib_2d_3_2e_o1
Analyzing compilation unit
Performing interprocedural optimizations
<*free_lang_data> {heap 1436k} <visibility> {heap 1436k} <build_ssa_passes>
{heap 1436k} <opt_local_passes> {heap 1436k} <remove_symbols> {heap 3060k}
<targetclone> {heap 3060k} <profile> {heap 3060k} <free-fnsummary> {heap
3060k}Streaming LTO
<whole-program> {heap 3060k} <profile_estimate> {heap 3060k} <fnsummary> {heap
3060k} <inline> {heap 3060k} <pure-const> {heap 3060k} <modref> {heap 3060k}
<free-fnsummary> {heap 3060k} <static-var> {heap 3060k} <single-use> {heap
3060k} <comdats> {heap 3060k}Assembling functions:
<simdclone> {heap 3060k} ___setup_mod ___init_mod ___H_fib_2d_3
___LNK_fib_2d_3_2e_o1 _sub_I_00100_0 _sub_D_00100_1
Time variable usr sys wall
GGC
phase setup : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
1519k ( 1%)
phase parsing : 0.09 ( 1%) 0.05 ( 11%) 0.14 ( 1%)
2845k ( 1%)
phase opt and generate : 13.80 ( 99%) 0.42 ( 89%) 14.22 ( 99%)
220M ( 98%)
callgraph functions expansion : 13.76 ( 99%) 0.42 ( 89%) 14.17 ( 99%)
216M ( 97%)
callgraph ipa passes : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%)
1687k ( 1%)
ipa function summary : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
176k ( 0%)
ipa profile : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
300k ( 0%)
ipa pure const : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
cfg construction : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
82k ( 0%)
cfg cleanup : 0.20 ( 1%) 0.01 ( 2%) 0.19 ( 1%)
64 ( 0%)
trivially dead code : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%)
0 ( 0%)
df scan insns : 0.03 ( 0%) 0.00 ( 0%) 0.03 ( 0%)
288 ( 0%)
df reaching defs : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
0 ( 0%)
df live regs : 0.37 ( 3%) 0.00 ( 0%) 0.40 ( 3%)
0 ( 0%)
df live&initialized regs : 0.37 ( 3%) 0.01 ( 2%) 0.38 ( 3%)
0 ( 0%)
df reg dead/unused notes : 0.17 ( 1%) 0.01 ( 2%) 0.18 ( 1%)
3229k ( 1%)
register information : 0.15 ( 1%) 0.00 ( 0%) 0.17 ( 1%)
0 ( 0%)
alias analysis : 0.07 ( 1%) 0.00 ( 0%) 0.05 ( 0%)
11M ( 5%)
alias stmt walking : 1.02 ( 7%) 0.02 ( 4%) 0.93 ( 6%)
7856 ( 0%)
rebuild jump labels : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
0 ( 0%)
preprocessing : 0.03 ( 0%) 0.00 ( 0%) 0.04 ( 0%)
268k ( 0%)
lexical analysis : 0.04 ( 0%) 0.02 ( 4%) 0.03 ( 0%)
0 ( 0%)
parser (global) : 0.00 ( 0%) 0.01 ( 2%) 0.03 ( 0%)
1275k ( 1%)
parser struct body : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
359k ( 0%)
parser function body : 0.01 ( 0%) 0.02 ( 4%) 0.04 ( 0%)
911k ( 0%)
tree gimplify : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
937k ( 0%)
tree CFG cleanup : 0.11 ( 1%) 0.00 ( 0%) 0.14 ( 1%)
1373k ( 1%)
tree copy propagation : 0.17 ( 1%) 0.00 ( 0%) 0.17 ( 1%)
48k ( 0%)
tree PTA : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
23k ( 0%)
tree SSA rewrite : 0.13 ( 1%) 0.00 ( 0%) 0.13 ( 1%)
1877k ( 1%)
tree SSA other : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
952 ( 0%)
tree SSA incremental : 0.24 ( 2%) 0.01 ( 2%) 0.24 ( 2%)
34M ( 15%)
tree operand scan : 0.01 ( 0%) 0.02 ( 4%) 0.03 ( 0%)
2882k ( 1%)
dominator optimization : 0.43 ( 3%) 0.01 ( 2%) 0.58 ( 4%)
4002k ( 2%)
tree CCP : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
47k ( 0%)
tree split crit edges : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
5019k ( 2%)
tree reassociation : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
48 ( 0%)
tree FRE : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
110k ( 0%)
tree code sinking : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
6070k ( 3%)
tree linearize phis : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
6432 ( 0%)
tree forward propagate : 0.20 ( 1%) 0.02 ( 4%) 0.21 ( 1%)
119k ( 0%)
tree conservative DCE : 0.06 ( 0%) 0.00 ( 0%) 0.05 ( 0%)
16k ( 0%)
tree aggressive DCE : 0.08 ( 1%) 0.00 ( 0%) 0.07 ( 0%)
40 ( 0%)
tree DSE : 0.47 ( 3%) 0.00 ( 0%) 0.47 ( 3%)
0 ( 0%)
tree loop invariant motion : 0.61 ( 4%) 0.04 ( 9%) 0.65 ( 5%)
27M ( 12%)
complete unrolling : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
544 ( 0%)
tree iv optimization : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
47k ( 0%)
tree SSA uncprop : 0.03 ( 0%) 0.00 ( 0%) 0.03 ( 0%)
0 ( 0%)
tree strlen optimization : 0.09 ( 1%) 0.00 ( 0%) 0.10 ( 1%)
281k ( 0%)
tree modref : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
2800 ( 0%)
dominance computation : 0.16 ( 1%) 0.00 ( 0%) 0.14 ( 1%)
0 ( 0%)
out of ssa : 0.72 ( 5%) 0.12 ( 26%) 0.85 ( 6%)
512k ( 0%)
expand : 0.10 ( 1%) 0.02 ( 4%) 0.11 ( 1%)
25M ( 11%)
post expand cleanups : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%)
89k ( 0%)
forward prop : 0.35 ( 3%) 0.01 ( 2%) 0.35 ( 2%)
888k ( 0%)
CSE : 0.10 ( 1%) 0.00 ( 0%) 0.11 ( 1%)
2302k ( 1%)
dead code elimination : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%)
0 ( 0%)
dead store elim1 : 0.08 ( 1%) 0.00 ( 0%) 0.09 ( 1%)
1532k ( 1%)
dead store elim2 : 0.13 ( 1%) 0.00 ( 0%) 0.14 ( 1%)
7464k ( 3%)
loop init : 0.08 ( 1%) 0.00 ( 0%) 0.11 ( 1%)
50k ( 0%)
loop invariant motion : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
58k ( 0%)
loop fini : 0.03 ( 0%) 0.00 ( 0%) 0.03 ( 0%)
928k ( 0%)
combiner : 0.06 ( 0%) 0.00 ( 0%) 0.06 ( 0%)
736k ( 0%)
if-conversion : 0.10 ( 1%) 0.00 ( 0%) 0.09 ( 1%)
9292k ( 4%)
integrated RA : 1.16 ( 8%) 0.01 ( 2%) 1.15 ( 8%)
37M ( 17%)
LRA non-specific : 0.93 ( 7%) 0.01 ( 2%) 0.95 ( 7%)
10M ( 5%)
LRA virtuals elimination : 0.06 ( 0%) 0.00 ( 0%) 0.07 ( 0%)
4366k ( 2%)
LRA reload inheritance : 0.23 ( 2%) 0.00 ( 0%) 0.23 ( 2%)
0 ( 0%)
LRA create live ranges : 2.41 ( 17%) 0.00 ( 0%) 2.41 ( 17%)
2648k ( 1%)
LRA hard reg assignment : 0.78 ( 6%) 0.02 ( 4%) 0.78 ( 5%)
0 ( 0%)
reload : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
144 ( 0%)
reload CSE regs : 0.16 ( 1%) 0.01 ( 2%) 0.16 ( 1%)
3807k ( 2%)
thread pro- & epilogue : 0.06 ( 0%) 0.00 ( 0%) 0.05 ( 0%)
10k ( 0%)
if-conversion 2 : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
combine stack adjustments : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
0 ( 0%)
hard reg cprop : 0.07 ( 1%) 0.02 ( 4%) 0.08 ( 1%)
720 ( 0%)
scheduling 2 : 0.36 ( 3%) 0.01 ( 2%) 0.35 ( 2%)
1590k ( 1%)
machine dep reorg : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
0 ( 0%)
reorder blocks : 0.06 ( 0%) 0.00 ( 0%) 0.05 ( 0%)
1180k ( 1%)
shorten branches : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%)
0 ( 0%)
final : 0.07 ( 1%) 0.01 ( 2%) 0.08 ( 1%)
8569k ( 4%)
straight-line strength reduction : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%)
8232 ( 0%)
rest of compilation : 0.13 ( 1%) 0.03 ( 6%) 0.18 ( 1%)
342k ( 0%)
remove unused locals : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
0 ( 0%)
address taken : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
TOTAL : 13.89 0.47 14.36
224M
/pkgs/gcc-mainline/bin/gcc -march=native -D___CAN_IMPORT_CLIB_DYNAMICALLY -O1
-Wno-unused -Wno-write-strings -Wdisabled-optimization -fwrapv
-fno-strict-aliasing -fno-trapping-math -fno-math-errno -fschedule-insns2
-fomit-frame-pointer -fPIC -fno-common -mpc64 -rdynamic -shared
-D___SINGLE_HOST -D___DYNAMIC
-I"/home/lucier/programs/gambit/gambit-profiled/include" -o 'fib-4.o1' -Q
-fprofile-arcs -ftest-coverage -save-temps 'fib-4.c'
Time variable usr sys wall
GGC
phase setup : 0.05 (100%) 0.00 ( 0%) 0.06 (100%)
10M (100%)
TOTAL : 0.05 0.00 0.06
10M
btowc wctob mbrlen ___H_fib_2d_4 ___setup_mod ___init_mod
___LNK_fib_2d_4_2e_o1
Analyzing compilation unit
Performing interprocedural optimizations
<*free_lang_data> {heap 1652k} <visibility> {heap 1652k} <build_ssa_passes>
{heap 1652k} <opt_local_passes> {heap 1652k} <remove_symbols> {heap 4168k}
<targetclone> {heap 4168k} <profile> {heap 4168k} <free-fnsummary> {heap
4168k}Streaming LTO
<whole-program> {heap 4168k} <profile_estimate> {heap 4168k} <fnsummary> {heap
4168k} <inline> {heap 4168k} <pure-const> {heap 4168k} <modref> {heap 4168k}
<free-fnsummary> {heap 4168k} <static-var> {heap 4168k} <single-use> {heap
4168k} <comdats> {heap 4168k}Assembling functions:
<simdclone> {heap 4168k} ___setup_mod ___init_mod ___H_fib_2d_4 {GC
madv_dontneed 556k} {GC 264M -> 260M} {GC madv_dontneed 116k} {GC 526M -> 302M}
___LNK_fib_2d_4_2e_o1 _sub_I_00100_0 _sub_D_00100_1
Time variable usr sys wall
GGC
phase setup : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
1519k ( 0%)
phase parsing : 0.16 ( 0%) 0.08 ( 3%) 0.23 ( 0%)
4049k ( 1%)
phase lang. deferred : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
96 ( 0%)
phase opt and generate : 55.79 (100%) 2.22 ( 97%) 58.03 (100%)
712M ( 99%)
garbage collection : 0.38 ( 1%) 0.00 ( 0%) 0.38 ( 1%)
0 ( 0%)
dump files : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
callgraph construction : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
1108k ( 0%)
callgraph optimization : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
19k ( 0%)
callgraph functions expansion : 55.71 (100%) 2.21 ( 96%) 57.94 ( 99%)
706M ( 98%)
callgraph ipa passes : 0.07 ( 0%) 0.01 ( 0%) 0.09 ( 0%)
3221k ( 0%)
ipa function summary : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
335k ( 0%)
ipa inlining heuristics : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
16 ( 0%)
ipa profile : 0.00 ( 0%) 0.01 ( 0%) 0.01 ( 0%)
605k ( 0%)
ipa pure const : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
0 ( 0%)
cfg construction : 0.06 ( 0%) 0.00 ( 0%) 0.05 ( 0%)
159k ( 0%)
cfg cleanup : 0.68 ( 1%) 0.02 ( 1%) 0.69 ( 1%)
48 ( 0%)
trivially dead code : 0.11 ( 0%) 0.00 ( 0%) 0.11 ( 0%)
0 ( 0%)
df scan insns : 0.09 ( 0%) 0.01 ( 0%) 0.11 ( 0%)
288 ( 0%)
df live regs : 1.30 ( 2%) 0.04 ( 2%) 1.36 ( 2%)
0 ( 0%)
df live&initialized regs : 1.52 ( 3%) 0.03 ( 1%) 1.56 ( 3%)
0 ( 0%)
df reg dead/unused notes : 0.52 ( 1%) 0.01 ( 0%) 0.54 ( 1%)
11M ( 2%)
register information : 0.34 ( 1%) 0.00 ( 0%) 0.34 ( 1%)
0 ( 0%)
alias analysis : 0.20 ( 0%) 0.00 ( 0%) 0.20 ( 0%)
26M ( 4%)
alias stmt walking : 7.31 ( 13%) 0.11 ( 5%) 7.32 ( 13%)
8624 ( 0%)
register scan : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
9008 ( 0%)
rebuild jump labels : 0.07 ( 0%) 0.00 ( 0%) 0.05 ( 0%)
0 ( 0%)
preprocessing : 0.02 ( 0%) 0.02 ( 1%) 0.07 ( 0%)
306k ( 0%)
lexical analysis : 0.06 ( 0%) 0.03 ( 1%) 0.10 ( 0%)
0 ( 0%)
parser (global) : 0.03 ( 0%) 0.02 ( 1%) 0.02 ( 0%)
1323k ( 0%)
parser function body : 0.05 ( 0%) 0.01 ( 0%) 0.05 ( 0%)
2029k ( 0%)
inline parameters : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
131k ( 0%)
tree gimplify : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
1802k ( 0%)
tree CFG construction : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
578k ( 0%)
tree CFG cleanup : 0.41 ( 1%) 0.00 ( 0%) 0.42 ( 1%)
5686k ( 1%)
tree copy propagation : 0.68 ( 1%) 0.00 ( 0%) 0.67 ( 1%)
96k ( 0%)
tree PTA : 0.01 ( 0%) 0.01 ( 0%) 0.02 ( 0%)
43k ( 0%)
tree PHI insertion : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
866k ( 0%)
tree SSA rewrite : 0.57 ( 1%) 0.00 ( 0%) 0.57 ( 1%)
10M ( 1%)
tree SSA incremental : 1.15 ( 2%) 0.05 ( 2%) 1.20 ( 2%)
118M ( 16%)
tree operand scan : 0.10 ( 0%) 0.06 ( 3%) 0.25 ( 0%)
10M ( 1%)
dominator optimization : 3.64 ( 7%) 0.04 ( 2%) 3.82 ( 7%)
13M ( 2%)
tree CCP : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
94k ( 0%)
tree split crit edges : 0.04 ( 0%) 0.00 ( 0%) 0.03 ( 0%)
18M ( 3%)
tree reassociation : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%)
48 ( 0%)
tree FRE : 0.01 ( 0%) 0.00 ( 0%) 0.03 ( 0%)
208k ( 0%)
tree code sinking : 0.07 ( 0%) 0.00 ( 0%) 0.07 ( 0%)
18M ( 3%)
tree linearize phis : 0.04 ( 0%) 0.00 ( 0%) 0.03 ( 0%)
6432 ( 0%)
tree backward propagate : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
tree forward propagate : 1.65 ( 3%) 0.01 ( 0%) 1.66 ( 3%)
232k ( 0%)
tree conservative DCE : 0.29 ( 1%) 0.00 ( 0%) 0.29 ( 0%)
31k ( 0%)
tree aggressive DCE : 0.30 ( 1%) 0.00 ( 0%) 0.24 ( 0%)
40 ( 0%)
tree DSE : 1.88 ( 3%) 0.00 ( 0%) 1.89 ( 3%)
0 ( 0%)
tree loop invariant motion : 5.00 ( 9%) 0.15 ( 7%) 5.10 ( 9%)
103M ( 14%)
tree iv optimization : 0.01 ( 0%) 0.01 ( 0%) 0.02 ( 0%)
95k ( 0%)
tree SSA uncprop : 0.13 ( 0%) 0.00 ( 0%) 0.15 ( 0%)
0 ( 0%)
tree strlen optimization : 0.62 ( 1%) 0.00 ( 0%) 0.62 ( 1%)
547k ( 0%)
tree modref : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
2800 ( 0%)
dominance frontiers : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%)
0 ( 0%)
dominance computation : 0.58 ( 1%) 0.02 ( 1%) 0.59 ( 1%)
0 ( 0%)
out of ssa : 5.62 ( 10%) 1.11 ( 48%) 6.73 ( 12%)
2049k ( 0%)
expand vars : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
407k ( 0%)
expand : 0.39 ( 1%) 0.01 ( 0%) 0.42 ( 1%)
92M ( 13%)
post expand cleanups : 0.12 ( 0%) 0.00 ( 0%) 0.13 ( 0%)
169k ( 0%)
lower subreg : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
forward prop : 1.25 ( 2%) 0.05 ( 2%) 1.29 ( 2%)
3301k ( 0%)
CSE : 0.28 ( 1%) 0.00 ( 0%) 0.27 ( 0%)
8571k ( 1%)
dead code elimination : 0.08 ( 0%) 0.00 ( 0%) 0.08 ( 0%)
0 ( 0%)
dead store elim1 : 0.32 ( 1%) 0.00 ( 0%) 0.32 ( 1%)
5493k ( 1%)
dead store elim2 : 0.41 ( 1%) 0.00 ( 0%) 0.43 ( 1%)
23M ( 3%)
loop analysis : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%)
0 ( 0%)
loop init : 0.20 ( 0%) 0.00 ( 0%) 0.21 ( 0%)
62k ( 0%)
loop fini : 0.07 ( 0%) 0.02 ( 1%) 0.10 ( 0%)
3776k ( 1%)
combiner : 0.22 ( 0%) 0.00 ( 0%) 0.22 ( 0%)
2378k ( 0%)
if-conversion : 0.38 ( 1%) 0.01 ( 0%) 0.37 ( 1%)
36M ( 5%)
integrated RA : 5.43 ( 10%) 0.02 ( 1%) 5.44 ( 9%)
96M ( 13%)
LRA non-specific : 3.61 ( 6%) 0.01 ( 0%) 3.64 ( 6%)
21M ( 3%)
LRA virtuals elimination : 0.18 ( 0%) 0.01 ( 0%) 0.16 ( 0%)
15M ( 2%)
LRA create live ranges : 3.08 ( 6%) 0.01 ( 0%) 3.09 ( 5%)
2027k ( 0%)
LRA hard reg assignment : 0.07 ( 0%) 0.00 ( 0%) 0.07 ( 0%)
0 ( 0%)
reload : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
144 ( 0%)
reload CSE regs : 0.51 ( 1%) 0.00 ( 0%) 0.51 ( 1%)
13M ( 2%)
thread pro- & epilogue : 0.10 ( 0%) 0.00 ( 0%) 0.11 ( 0%)
9680 ( 0%)
if-conversion 2 : 0.05 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
24 ( 0%)
combine stack adjustments : 0.04 ( 0%) 0.00 ( 0%) 0.03 ( 0%)
0 ( 0%)
hard reg cprop : 0.21 ( 0%) 0.10 ( 4%) 0.31 ( 1%)
3288 ( 0%)
scheduling 2 : 1.36 ( 2%) 0.04 ( 2%) 1.38 ( 2%)
5904k ( 1%)
machine dep reorg : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%)
0 ( 0%)
reorder blocks : 0.19 ( 0%) 0.00 ( 0%) 0.23 ( 0%)
4176k ( 1%)
shorten branches : 0.14 ( 0%) 0.00 ( 0%) 0.14 ( 0%)
0 ( 0%)
final : 0.27 ( 0%) 0.01 ( 0%) 0.29 ( 0%)
31M ( 4%)
straight-line strength reduction : 0.10 ( 0%) 0.00 ( 0%) 0.10 ( 0%)
33k ( 0%)
rest of compilation : 0.93 ( 2%) 0.24 ( 10%) 1.15 ( 2%)
1158k ( 0%)
remove unused locals : 0.07 ( 0%) 0.00 ( 0%) 0.07 ( 0%)
0 ( 0%)
address taken : 0.09 ( 0%) 0.00 ( 0%) 0.09 ( 0%)
0 ( 0%)
repair loop structures : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%)
0 ( 0%)
TOTAL : 55.95 2.30 58.28
718M
heine:~/programs/gambit/gambit-profiled> /pkgs/gcc-mainline/bin/gcc
-march=native -D___CAN_IMPORT_CLIB_DYNAMICALLY -O1 -Wno-unused
-Wno-write-strings -Wdisabled-optimization -fwrapv -fno-strict-aliasing
-fno-trapping-math -fno-math-errno -fschedule-insns2 -fomit-frame-pointer -fPIC
-fno-common -mpc64 -rdynamic -shared -D___SINGLE_HOST -D___DYNAMIC
-I"/home/lucier/programs/gambit/gambit-profiled/include" -o 'fib-5.o1' -Q
-fprofile-arcs -ftest-coverage -save-temps 'fib-5.c'
Time variable usr sys wall
GGC
phase setup : 0.08 (100%) 0.02 (100%) 0.13 ( 93%)
22M (100%)
TOTAL : 0.08 0.02 0.14
22M
btowc wctob mbrlen ___H_fib_2d_5 ___setup_mod ___init_mod
___LNK_fib_2d_5_2e_o1
Analyzing compilation unit
Performing interprocedural optimizations
<*free_lang_data> {heap 2884k} <visibility> {heap 2884k} <build_ssa_passes>
{heap 2884k} <opt_local_passes> {heap 3032k} <remove_symbols> {heap 7436k}
<targetclone> {heap 7436k} <profile> {heap 7436k} <free-fnsummary> {heap
7436k}Streaming LTO
<whole-program> {heap 7436k} <profile_estimate> {heap 7436k} <fnsummary> {heap
7436k} <inline> {heap 7436k} <pure-const> {heap 7436k} <modref> {heap 7436k}
<free-fnsummary> {heap 7436k} <static-var> {heap 7436k} <single-use> {heap
7436k} <comdats> {heap 7436k}Assembling functions:
<simdclone> {heap 7436k} ___setup_mod ___init_mod ___H_fib_2d_5gcc: fatal
error: Killed signal terminated program cc1
compilation terminated.
* [Bug middle-end/64928] [8/9/10/11 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: lucier at math dot purdue.edu @ 2021-03-10 2:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #35 from lucier at math dot purdue.edu ---
Created attachment 50345
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50345&action=edit
Parametrized input files for test coverage testing.
These are the .i files that go with my previous comment.
* [Bug middle-end/64928] [8/9/10/11 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: rguenth at gcc dot gnu.org @ 2021-03-10 9:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #36 from Richard Biener <rguenth at gcc dot gnu.org> ---
So the issue is still the same. One thing I noticed is that store-motion also
adds a flag for each counter update to avoid introducing store data races.
-fallow-store-data-races mitigates that part and speeds up the compilation
quite a bit. In case there are threads involved you'd want
-fprofile-update=atomic
which then causes store-motion to give up, and the compile time is great
overall.
The original trigger of the regression is likely the marking of the profile
counters as not being aliased. We might want to introduce another flag to
tell that store data races for the particular decl are not a consideration
(maybe even have some user-visible attribute for this).
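To illustrate the flag described in this comment, here is a hand-written C
sketch of what conditional store motion roughly does to a single counter
update in a loop. It is not GCC output: arc_counter, count_plain and
count_hoisted are hypothetical names standing in for one -fprofile-arcs
counter and for the code before and after the transformation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one arc counter emitted by -fprofile-arcs. */
static long arc_counter;

/* Before store motion: the counter is loaded and stored in memory on
   every iteration in which the arc is taken. */
long count_plain(const int *v, size_t n)
{
    arc_counter = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] > 0)
            arc_counter++;
    return arc_counter;
}

/* After store motion (hand-written approximation): the counter lives in
   a local inside the loop, and a flag records whether the original loop
   would have stored to it, so no store is introduced on a path that had
   none. */
long count_hoisted(const int *v, size_t n)
{
    arc_counter = 0;
    long tmp = arc_counter;
    bool stored = false;
    for (size_t i = 0; i < n; i++)
        if (v[i] > 0) {
            tmp++;
            stored = true;
        }
    if (stored)
        arc_counter = tmp;   /* single store after the loop */
    return arc_counter;
}
```

Both functions return the same count. With one such temporary and flag per
counter, the number of extra variables grows with the number of instrumented
arcs in the loop, which is presumably where much of the cost reported here
comes from.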
Otherwise re-confirmed (I stripped options down to -O -fPIC -fprofile-arcs
-ftest-coverage):
rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O
-fPIC -fprofile-arcs -ftest-coverage fib-2.o1-fib-2.i
1.84user 0.05system 0:01.90elapsed 99%CPU (0avgtext+0avgdata
160764maxresident)k
0inputs+0outputs (0major+58129minor)pagefaults 0swaps
rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O
-fPIC -fprofile-arcs -ftest-coverage fib-3.o1-fib-3.i
10.15user 0.17system 0:10.32elapsed 99%CPU (0avgtext+0avgdata
726688maxresident)k
0inputs+0outputs (0major+265008minor)pagefaults 0swaps
rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O
-fPIC -fprofile-arcs -ftest-coverage fib-4.o1-fib-4.i
43.60user 1.06system 0:44.68elapsed 99%CPU (0avgtext+0avgdata
6107260maxresident)k
0inputs+0outputs (0major+1765217minor)pagefaults 0swaps
rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O
-fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i
gcc: fatal error: Killed signal terminated program cc1
compilation terminated.
Command exited with non-zero status 1
143.09user 3.93system 2:28.29elapsed 99%CPU (0avgtext+0avgdata
24636148maxresident)k
37504inputs+0outputs (31major+6133278minor)pagefaults 0swaps
On the last one, which runs OOM, adding -fallow-store-data-races gives
rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O
-fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i -fallow-store-data-races
123.06user 0.45system 2:03.59elapsed 99%CPU (0avgtext+0avgdata
1777700maxresident)k
57304inputs+0outputs (68major+535127minor)pagefaults 0swaps
and -fprofile-update=atomic
rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O
-fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i -fprofile-update=atomic
0.61user 0.02system 0:00.63elapsed 100%CPU (0avgtext+0avgdata
73236maxresident)k
72inputs+0outputs (0major+18284minor)pagefaults 0swaps
and -fno-tree-loop-im
rguenther@ryzen:/tmp> /usr/bin/time ~/install/gcc-11.0/usr/local/bin/gcc -S -O
-fPIC -fprofile-arcs -ftest-coverage fib-5.o1-fib-5.i -fno-tree-loop-im
1.06user 0.01system 0:01.07elapsed 99%CPU (0avgtext+0avgdata 90672maxresident)k
0inputs+0outputs (0major+24331minor)pagefaults 0swaps
I still wonder if you can produce an even smaller testcase where visualizing
the CFG is possible. Unfortunately the source is mechanically generated
and following it is hard. Something like a testcase that retains the basic
structure but ends up with just a few (2, or at any rate fewer than 10)
computed gotos?
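For readers unfamiliar with the construct, a minimal sketch of a function
with two computed gotos might look like the following. It relies on the GNU C
labels-as-values extension, so it needs GCC or a compatible compiler.
fib_dispatch and its dispatch table are made up for illustration and are not
taken from the Gambit-generated source.

```c
/* Iterative Fibonacci driven by a tiny dispatch table, using the GNU C
   "labels as values" extension (&&label and goto *ptr). */
long fib_dispatch(int n)
{
    /* Index 0 continues the loop, index 1 leaves it. */
    static void *const table[] = { &&step, &&done };
    long a = 0, b = 1;
    int i = 0;

    goto *table[i >= n];     /* first computed goto */

step:
    {
        long t = a + b;
        a = b;
        b = t;
        i++;
    }
    goto *table[i >= n];     /* second computed goto */

done:
    return a;
}
```

Every computed goto can in principle reach every label whose address is
taken, so the CFG gets edges from each dispatch point to each such label.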
* [Bug middle-end/64928] [8/9/10/11 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: lucier at math dot purdue.edu @ 2021-03-10 14:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #37 from lucier at math dot purdue.edu ---
Created attachment 50352
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50352&action=edit
Smaller parameterized test file
This file is generated from a single copy of the Fibonacci function, and is
simplified a bit otherwise. I believe it has two computed gotos.
* [Bug middle-end/64928] [8/9/10/11 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: rguenth at gcc dot gnu.org @ 2021-03-10 15:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #38 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 50354
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50354&action=edit
SVG of the CFG at LIM
This is an SVG of the CFG as created by dot at the point of the first LIM pass.
The CFG isn't too special and I guess a switch instead of the computed goto
would present us with the same issues.
I suppose putting a hard limit on the number of stores to move and then
ordering candidates based on their importance (execution frequency) is the
way to go.
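A rough sketch of that idea, assuming each store candidate carries an
estimated execution frequency: sort the candidates by frequency and keep only
the first `limit` of them. struct store_cand, by_freq_desc and pick_stores
are hypothetical names for illustration; this is not how GCC's LIM pass is
implemented.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical store-motion candidate: an identifier plus an estimated
   execution frequency of the store. */
struct store_cand {
    int id;
    long freq;
};

/* Order by descending frequency; break ties by ascending id so the
   result does not depend on qsort's handling of equal elements. */
static int by_freq_desc(const void *pa, const void *pb)
{
    const struct store_cand *a = pa;
    const struct store_cand *b = pb;
    if (a->freq != b->freq)
        return (a->freq < b->freq) ? 1 : -1;
    return (a->id > b->id) - (a->id < b->id);
}

/* Sort the candidates in place and return how many of them, counted
   from the front of the array, should be moved under the hard limit. */
size_t pick_stores(struct store_cand *c, size_t n, size_t limit)
{
    qsort(c, n, sizeof *c, by_freq_desc);
    return n < limit ? n : limit;
}
```

The caller would then perform store motion only for the first
pick_stores(...) entries and leave the remaining stores in the loop.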
* [Bug middle-end/64928] [9/10/11/12 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
From: jakub at gcc dot gnu.org @ 2021-05-14 9:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|8.5 |9.4
--- Comment #39 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 8 branch is being closed.
* [Bug middle-end/64928] [9/10/11/12 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
2015-02-03 21:09 [Bug other/64928] New: unreasonable cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs lucier at math dot purdue.edu
` (32 preceding siblings ...)
2021-05-14 9:47 ` [Bug middle-end/64928] [9/10/11/12 " jakub at gcc dot gnu.org
@ 2021-06-01 8:06 ` rguenth at gcc dot gnu.org
2022-05-27 9:35 ` [Bug middle-end/64928] [10/11/12/13 " rguenth at gcc dot gnu.org
` (5 subsequent siblings)
39 siblings, 0 replies; 41+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-06-01 8:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|9.4 |9.5
--- Comment #40 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
^ permalink raw reply [flat|nested] 41+ messages in thread
* [Bug middle-end/64928] [10/11/12/13 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
2015-02-03 21:09 [Bug other/64928] New: unreasonable cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs lucier at math dot purdue.edu
` (33 preceding siblings ...)
2021-06-01 8:06 ` rguenth at gcc dot gnu.org
@ 2022-05-27 9:35 ` rguenth at gcc dot gnu.org
2022-06-28 10:31 ` jakub at gcc dot gnu.org
` (4 subsequent siblings)
39 siblings, 0 replies; 41+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-05-27 9:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|9.5 |10.4
--- Comment #41 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9 branch is being closed.
^ permalink raw reply [flat|nested] 41+ messages in thread
* [Bug middle-end/64928] [10/11/12/13 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
2015-02-03 21:09 [Bug other/64928] New: unreasonable cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs lucier at math dot purdue.edu
` (34 preceding siblings ...)
2022-05-27 9:35 ` [Bug middle-end/64928] [10/11/12/13 " rguenth at gcc dot gnu.org
@ 2022-06-28 10:31 ` jakub at gcc dot gnu.org
2023-07-07 10:30 ` [Bug middle-end/64928] [11/12/13/14 " rguenth at gcc dot gnu.org
` (3 subsequent siblings)
39 siblings, 0 replies; 41+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-06-28 10:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|10.4 |10.5
--- Comment #42 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
^ permalink raw reply [flat|nested] 41+ messages in thread
* [Bug middle-end/64928] [11/12/13/14 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
2015-02-03 21:09 [Bug other/64928] New: unreasonable cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs lucier at math dot purdue.edu
` (35 preceding siblings ...)
2022-06-28 10:31 ` jakub at gcc dot gnu.org
@ 2023-07-07 10:30 ` rguenth at gcc dot gnu.org
2023-09-28 7:06 ` [Bug middle-end/64928] [11 " rguenth at gcc dot gnu.org
` (2 subsequent siblings)
39 siblings, 0 replies; 41+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-07 10:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|10.5 |11.5
--- Comment #43 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 10 branch is being closed.
^ permalink raw reply [flat|nested] 41+ messages in thread
* [Bug middle-end/64928] [11 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
2015-02-03 21:09 [Bug other/64928] New: unreasonable cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs lucier at math dot purdue.edu
` (36 preceding siblings ...)
2023-07-07 10:30 ` [Bug middle-end/64928] [11/12/13/14 " rguenth at gcc dot gnu.org
@ 2023-09-28 7:06 ` rguenth at gcc dot gnu.org
2023-10-02 0:26 ` lucier at math dot purdue.edu
2023-10-04 6:39 ` rguenth at gcc dot gnu.org
39 siblings, 0 replies; 41+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-09-28 7:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|rguenth at gcc dot gnu.org |unassigned at gcc dot gnu.org
Known to fail| |11.4.0
CC| |rguenth at gcc dot gnu.org
Known to work| |12.1.0, 13.1.0, 14.0
Status|ASSIGNED |NEW
Summary|[11/12/13/14 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs |[11 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
--- Comment #44 from Richard Biener <rguenth at gcc dot gnu.org> ---
I tried the first input file with GCC 13.2 on a Ryzen 9 7900X and get a memory
usage of 105MB and 1.1s compile-time. The larger testcase needs 360MB peak
and 6.3s to compile. Both show a mostly flat -ftime-report profile.
Upping to -O2 shows the same memory peak but 13.1s for the larger testcase. We
then see
PRE : 2.09 ( 16%) 0.01 ( 1%) 2.15 ( 15%) 288k ( 0%)
as the biggest thing sticking out (similar for the small testcase).
I think we've come a long way here. GCC 12.3 behaves the same. With GCC 11.4
I stopped the larger testcase at -O2 after 3 minutes; the small testcase at -O1
takes 44s and 5GB of memory.
Fixed for GCC 12+. I'm not going to look at identifying what to backport (I
usually backported compile-time/memory-usage improvements when reasonable, so
I suspect this was a bigger change).
^ permalink raw reply [flat|nested] 41+ messages in thread
* [Bug middle-end/64928] [11 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
2015-02-03 21:09 [Bug other/64928] New: unreasonable cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs lucier at math dot purdue.edu
` (37 preceding siblings ...)
2023-09-28 7:06 ` [Bug middle-end/64928] [11 " rguenth at gcc dot gnu.org
@ 2023-10-02 0:26 ` lucier at math dot purdue.edu
2023-10-04 6:39 ` rguenth at gcc dot gnu.org
39 siblings, 0 replies; 41+ messages in thread
From: lucier at math dot purdue.edu @ 2023-10-02 0:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #45 from lucier at math dot purdue.edu ---
I confirm that I no longer have this problem with
> gcc-12 -v
Using built-in specs.
COLLECT_GCC=gcc-12
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
12.3.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-12-ALHxjy/gcc-12-12.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-ALHxjy/gcc-12-12.3.0/debian/tmp-gcn/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04)
A different example procedure still took > 45 minutes and > 3.5 GB to compile
with -ftest-coverage -fprofile-arcs (it had finished when I came back from
lunch) but it was quite large (even by my standards!).
If this is a "won't fix" for earlier versions of gcc, then I'm OK with closing
this PR.
^ permalink raw reply [flat|nested] 41+ messages in thread
* [Bug middle-end/64928] [11 Regression] Inordinate cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs
2015-02-03 21:09 [Bug other/64928] New: unreasonable cpu time and memory usage in "phase opt and generate" with -ftest-coverage -fprofile-arcs lucier at math dot purdue.edu
` (38 preceding siblings ...)
2023-10-02 0:26 ` lucier at math dot purdue.edu
@ 2023-10-04 6:39 ` rguenth at gcc dot gnu.org
39 siblings, 0 replies; 41+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-10-04 6:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928
--- Comment #46 from Richard Biener <rguenth at gcc dot gnu.org> ---
It'll get closed when we close the GCC 11 branch; there's still the opportunity
for somebody to bisect what fixed it in GCC 12 in case it was something
trivial.
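
For anyone picking this up, the bisection amounts to a binary search for the first revision where the testcase compiles quickly. A minimal sketch, assuming the revisions are ordered oldest to newest, the oldest is known slow, the newest is known fast, and the fix is never reverted in between; `first_fixed` and `is_fixed` are hypothetical names, and the integers below stand in for real revisions:

```python
def first_fixed(revisions, is_fixed):
    """Return the index of the first revision for which is_fixed() is true.

    Assumes revisions[0] is known bad, revisions[-1] is known good, and
    every revision after the fix is also good.
    """
    if len(revisions) < 2:
        raise ValueError("need at least one bad and one good revision")
    lo, hi = 0, len(revisions) - 1  # lo: known bad, hi: known good
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(revisions[mid]):
            hi = mid  # fix landed at or before mid
        else:
            lo = mid  # fix landed after mid
    return hi


# Stand-in data: 100 numbered revisions, with the fix landing at number 37.
revisions = list(range(100))
found = first_fixed(revisions, lambda rev: rev >= 37)
print(revisions[found])
```

In practice `is_fixed` would build the compiler at that revision and compile the attached testcase under a time limit. The search needs only about log2(N) builds for N revisions, and it is the same search `git bisect` performs.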
^ permalink raw reply [flat|nested] 41+ messages in thread