From: "tejohnson at google dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug bootstrap/55051] [4.8 Regression] profiledbootstrap failed
Date: Thu, 15 Nov 2012 22:42:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: bootstrap
X-Bugzilla-Severity: normal
X-Bugzilla-Who: tejohnson at google dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 4.8.0
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55051

--- Comment #26 from Teresa Johnson 2012-11-15 22:42:12 UTC ---
On Thu, Nov 15, 2012 at 6:33 AM, Teresa Johnson wrote:
>
> On Thu, Nov 15, 2012 at 2:56 AM, hubicka at ucw dot cz wrote:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55051
>>
>> --- Comment #24 from Jan Hubicka 2012-11-15 10:56:53 UTC ---
>> > Note though that this is not an assert. It just emits a message to
>> > stderr. Do you think a better error message is appropriate? I'm not
>> > sure the "some data files may have been removed" is an accurate
>> > description of the issue. Perhaps something like "Profile data file
>> > mismatch may indicate corrupt profile data"?
>>
>> Well, we should figure out why sum_all starts to diverge.
>> If we had problems mixing cc1 and cc1plus executions, we should get
>> mismatches in number of counters.
>
> Right, it doesn't appear to be different executables since the number of
> counters is identical. I'll instrument it and see if I can figure out why
> they diverge.
>
>> What happens after the miscompare?
>
> A flag is set so that the error is emitted at most once per merge, and then
> we continue on with the merge and ignore it. Basically what it is doing is
> saving the first merged summary (for the first object file's gcda we merge
> into), and then for each additional object file that gets its counters
> merged the resulting program summary is compared against the saved program
> summary. But only if the number of runs is the same as the saved summary.
> This could happen if the gcda files are walked in a different order during
> updates (i.e. the gcov_list is in a different order for different processes
> of the same executable), but I am not sure if that can happen.

It appears that this is what is happening, and I think it makes sense that
it can. We're essentially doing this:

  /* Now merge each file.  */
  for (gi_ptr = gcov_list; gi_ptr; gi_ptr = gi_ptr->next)
    {
      // Open existing gcda file for gi_ptr
      // Find program summary corresponding to this executable -> save in prg
      // Merge execution counts for each function
      // Merge program summary
      //   - If this is the first merged file for this execution, save
      //     merged summary in all_prg
      //   - Otherwise, if #runs is the same in prg and all_prg, print an
      //     error message if prg != all_prg.
      // Write merged gcda
    }

I found that in a couple of cases we printed the error message for
libcpp/directives.gcda, where the saved all_prg summary was from
gcc/gcc.gcda. I then instrumented the code so that each time we merge into
one of these two gcda files I emit the pids, the number of runs, the number
of counters and the merged sum_all.
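
For readers who want to reproduce this kind of trace, instrumentation along these lines might look roughly like the sketch below. It is not the code that was actually added to libgcov: the helper names `format_merge_trace` and `emit_merge_trace`, the argument types and the buffer size are all assumptions, and only the line layout is taken from the trace lines quoted in this message.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>   /* getpid, getppid (POSIX) */

/* Hypothetical helper (not part of libgcov): format one trace line in the
   same layout as the log lines quoted in this message.  Returns the value
   returned by snprintf.  */
int format_merge_trace(char *buf, size_t len, long pid, long ppid,
                       const char *gcda_path, unsigned runs, unsigned num,
                       unsigned long long sum_all)
{
  assert(buf != NULL && gcda_path != NULL);
  return snprintf(buf, len,
                  "pid %ld ppid %ld Merging summary for %s"
                  " with runs %u num %u sum_all %llu",
                  pid, ppid, gcda_path, runs, num, sum_all);
}

/* Hypothetical helper: emit the trace line for the current process on
   stderr.  The 512-byte buffer is an arbitrary choice; longer lines are
   truncated by snprintf but remain NUL-terminated.  */
void emit_merge_trace(const char *gcda_path, unsigned runs, unsigned num,
                      unsigned long long sum_all)
{
  char line[512];
  int n = format_merge_trace(line, sizeof line, (long) getpid(),
                             (long) getppid(), gcda_path, runs, num, sum_all);
  if (n > 0)
    fprintf(stderr, "%s\n", line);
}
```

Calling `emit_merge_trace` once per merged gcda file, right after the program summary is merged, would give one line per (process, file) pair, which is what makes the ordering comparison possible.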
Comparing the results from all the merges to these two gcda files I see that
most of the time the merges proceed in the same order, but there are a few
cases where the order is different, resulting in a different sum_all with the
same number of runs, and then things go back to normal and the sum_all
matches again. E.g., here is one place where things get out of order briefly,
resulting in one of the error messages being printed:

...
pid 28432 ppid 28429 Merging summary for /home/tejohnson/extra/gcc_trunk_3_obj/gcc/gcc.gcda with runs 254 num 13193 sum_all 17058327
pid 28437 ppid 28365 Merging summary for /home/tejohnson/extra/gcc_trunk_3_obj/gcc/gcc.gcda with runs 255 num 13193 sum_all 17064832
pid 28439 ppid 28367 Merging summary for /home/tejohnson/extra/gcc_trunk_3_obj/gcc/gcc.gcda with runs 256 num 13193 sum_all 17071340
pid 28440 ppid 28436 Merging summary for /home/tejohnson/extra/gcc_trunk_3_obj/gcc/gcc.gcda with runs 257 num 13193 sum_all 17177525
...

vs

...
pid 28432 ppid 28429 Merging summary for /home/tejohnson/extra/gcc_trunk_3_obj/libcpp/directives.gcda with runs 254 num 13193 sum_all 17058327
pid 28439 ppid 28367 Merging summary for /home/tejohnson/extra/gcc_trunk_3_obj/libcpp/directives.gcda with runs 255 num 13193 sum_all 17064835
pid 28437 ppid 28365 Merging summary for /home/tejohnson/extra/gcc_trunk_3_obj/libcpp/directives.gcda with runs 256 num 13193 sum_all 17071340
pid 28440 ppid 28436 Merging summary for /home/tejohnson/extra/gcc_trunk_3_obj/libcpp/directives.gcda with runs 257 num 13193 sum_all 17177525
...

Notice the middle two pids are flipped, resulting in the sum_all being
different after run 255, and back to the same after run 256. I believe this
could happen if pids 28437 and 28439 finished near-simultaneously, waited for
the lock for gcc.gcda, and 28437 won first, but then by some luck of timing
they subsequently both attempted to open directives.gcda at around the same
time and 28439 happened to win the lock in the fcntl loop first.
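
As a rough illustration of the flip described above, the sketch below replays the two merge orders with the sum_all values from the log. It assumes sum_all is merged by plain addition, which the logged values are consistent with. The per-process contributions are not printed in the log; they are inferred here by subtracting consecutive sum_all values (6505 for pid 28437, 6508 for pid 28439). The type and function names are hypothetical and not part of libgcov.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the type of sum_all.  */
typedef unsigned long long sum_t;

/* Fold CONTRIB[0..N-1] into BASE in the given order and record the running
   sum after each merge in OUT[0..N-1].  */
void running_sums(sum_t base, const sum_t *contrib, size_t n, sum_t *out)
{
  assert(contrib != NULL && out != NULL);
  sum_t acc = base;
  for (size_t i = 0; i < n; i++)
    {
      acc += contrib[i];   /* assumption: merging sum_all is plain addition */
      out[i] = acc;
    }
}

/* Return 1 if the two sequences of running sums disagree at some
   intermediate step but agree after the final merge, 0 otherwise.  */
int diverges_then_converges(const sum_t *a, const sum_t *b, size_t n)
{
  assert(a != NULL && b != NULL && n > 0);
  int differed = 0;
  for (size_t i = 0; i + 1 < n; i++)
    if (a[i] != b[i])
      differed = 1;
  return differed && a[n - 1] == b[n - 1];
}
```

Starting from 17058327 (the sum_all at 254 runs), the order {6505, 6508} gives 17064832 and then 17071340, while the order {6508, 6505} gives 17064835 and then 17071340. The two files disagree only at 255 runs, which is exactly where the check compares them.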
I believe it is also possible for object files to be in different orders in
the gcov_list in different processes, since they are added to the head of
that list in __gcov_init, which is invoked when running an object file's
global constructors, according to the header comment. And for C++ at least,
the order of initialization across translation units is unspecified. That
could also cause the sum_all to go temporarily out of sync between the gcda
files of different object files.

Overall, I think it makes sense to remove this check altogether. Would you
agree? I am testing the patch to remove it right now.

Teresa

>
> Teresa
>
>>
>> Honza

--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413