public inbox for gcc-patches@gcc.gnu.org
* [PING] [PATCH] _Cilk_for for C and C++
@ 2014-01-27 20:41 Iyer, Balaji V
  2014-01-27 20:53 ` Jakub Jelinek
  0 siblings, 1 reply; 26+ messages in thread
From: Iyer, Balaji V @ 2014-01-27 20:41 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Jason Merrill, 'Jeff Law', 'Aldy Hernandez',
	'gcc-patches@gcc.gnu.org', 'rth@redhat.com'

Hi Jakub et al.,

	Did you get a chance to look at this _Cilk_for patch? 

Thanks,

Balaji V. Iyer.

> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
> owner@gcc.gnu.org] On Behalf Of Iyer, Balaji V
> Sent: Friday, January 24, 2014 3:34 PM
> To: Jakub Jelinek
> Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'rth@redhat.com'
> Subject: RE: [PATCH] _Cilk_for for C and C++
> 
> 
> 
> > -----Original Message-----
> > From: Jakub Jelinek [mailto:jakub@redhat.com]
> > Sent: Friday, January 24, 2014 2:42 PM
> > To: Iyer, Balaji V
> > Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez';
> > 'gcc-patches@gcc.gnu.org'; 'rth@redhat.com'
> > Subject: Re: [PATCH] _Cilk_for for C and C++
> >
> > On Thu, Jan 23, 2014 at 04:38:53PM +0000, Iyer, Balaji V wrote:
> > > 	This is how I started to think of it at first, but then when I
> > > thought about it ... in _Cilk_for, unlike the #pragma simd's for, the
> > > for statement - not the body - (e.g. "_Cilk_for (int ii = 0; ii < 10;
> > > ii++)") doesn't really do anything, nor does it belong in the child
> > > function. It is really mostly used to calculate the loop count and
> > > capture the step-size and starting point.
> > >
> > > 	The child function has its own loop that will have a step size of 1
> > > regardless of your step size. You use the step-size to find the correct
> > > spot. Let me give you an example:
> > >
> > > _Cilk_for (int ii = 0; ii < 10; ii = ii  + 2) {
> > > 	Array [ii] = 5;
> > > }
> > >
> > > This is translated to the following (assume grain is something that
> > > the user input):
> > >
> > > data_ptr.start = 0;
> > > data_ptr.end = 10;
> > > data_ptr.step_size = 2;
> > > __cilkrts_cilk_for_32 (child_function, &data_ptr, (10-0)/2, grain);
> > >
> > > Child_function (void *data_ptr, int high, int low) {
> > > 	for (xx = low; xx < high; xx++)
> > > 	 {
> > > 		Tmp_var = (xx * data_ptr->step_size) + data_ptr->start;
> > > 		// Note: if the _Cilk_for was (ii = 9; ii >= 0; ii -= 2), we would
> > > 		// have something like this:
> > > 		// Tmp_var = data_ptr->end - (xx * data_ptr->step_size)
> > > 		// The for-loop above won't change.
> > > 		Array[Tmp_var] = 5;
> > > 	}
> > > }
> >
> > This isn't really much different from
> > #pragma omp parallel for schedule(runtime, N) (i.e. the combined
> > construct), when it is combined, we also don't emit a call to
> > GOMP_parallel but to some other function to which we pass the number
> > of iterations and chunk size (== grain in Cilk+ terminology), the only
> > (minor) difference is that for OpenMP when you handle the whole low ...
> > high range the child function doesn't exit, but calls a function to
> > give it the next pair of low/high and only when that function tells it
> > there is no further work to do, it returns.  But, the Cilk+ case is
> > clearly the same thing with just implicit telling there is no further
> > work in the current function.
> >
> > So, I'd strongly prefer if you swap the parallel with Cilk_for, just
> > set the flag that the two are combined like OpenMP already has for
> > tons of constructs, and during expansion you just treat it together.
> 
> Hi Jakub,
> 	What you are suggesting here would require a significant rewrite of
> the code. This version of _Cilk_for works, and it shares a significant amount
> of work with the OMP routines, as requested by other GCC developers. Given
> the time constraints, let's try to get this version accepted so that the feature
> will be available to users, and we will look into moving toward your
> suggestion when phase 1 opens again.
> 
> Thanks,
> 
> Balaji V. Iyer.
> 
> 
> >
> > 	Jakub
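
The index mapping in the quoted example can be sketched in plain C++ as
follows. This is only an illustration under stated assumptions: LoopData,
child_function and serial_cilk_for_stub are hypothetical names, the stub runs
the chunks serially and is not the real __cilkrts_cilk_for_32 entry point, and
the decrementing case here counts down from the start value.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the loop data captured by the parent function.
struct LoopData {
    int start;      // initial value of the induction variable
    int end;        // bound from the loop condition (unused by this sketch)
    int step_size;  // magnitude of the stride, assumed positive
    bool forward;   // true for an incrementing loop, false for a decrementing one
    std::vector<int>* array;
};

// Child function: walks the normalized range [low, high) with a stride of 1
// and maps each normalized index back to the user's induction variable.
void child_function(void* data, int low, int high) {
    LoopData* d = static_cast<LoopData*>(data);
    for (int xx = low; xx < high; ++xx) {
        int ii = d->forward ? d->start + xx * d->step_size
                            : d->start - xx * d->step_size;
        (*d->array)[ii] = 5;
    }
}

// Serial stand-in for the runtime call (assumption: the real runtime hands out
// [low, high) chunks of at most `grain` iterations, possibly in parallel).
void serial_cilk_for_stub(void (*body)(void*, int, int), void* data,
                          int count, int grain) {
    if (grain < 1) grain = 1;
    for (int low = 0; low < count; low += grain) {
        int high = (low + grain < count) ? low + grain : count;
        body(data, low, high);
    }
}
```

Calling serial_cilk_for_stub (child_function, &data, (10 - 0) / 2, grain) with
start = 0 and step_size = 2 writes to elements 0, 2, 4, 6 and 8, whatever
chunk boundaries the grain produces.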


* Re: [PING] [PATCH] _Cilk_for for C and C++
  2014-01-27 20:41 [PING] [PATCH] _Cilk_for for C and C++ Iyer, Balaji V
@ 2014-01-27 20:53 ` Jakub Jelinek
  2014-01-27 21:36   ` Iyer, Balaji V
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Jelinek @ 2014-01-27 20:53 UTC (permalink / raw)
  To: Iyer, Balaji V
  Cc: Jason Merrill, 'Jeff Law', 'Aldy Hernandez',
	'gcc-patches@gcc.gnu.org', 'rth@redhat.com'

On Mon, Jan 27, 2014 at 08:41:14PM +0000, Iyer, Balaji V wrote:
> 	Did you get a chance to look at this _Cilk_for patch? 

IMHO it is not as much work as you are fearing, at most a few hours of work
to get it right, and well worth doing.  So, please at least try it out
and if you get stuck with it, explain why.

	Jakub


* RE: [PING] [PATCH] _Cilk_for for C and C++
  2014-01-27 20:53 ` Jakub Jelinek
@ 2014-01-27 21:36   ` Iyer, Balaji V
  2014-01-28 16:55     ` Iyer, Balaji V
  0 siblings, 1 reply; 26+ messages in thread
From: Iyer, Balaji V @ 2014-01-27 21:36 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Jason Merrill, 'Jeff Law', 'Aldy Hernandez',
	'gcc-patches@gcc.gnu.org', 'rth@redhat.com'

> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
> owner@gcc.gnu.org] On Behalf Of Jakub Jelinek
> Sent: Monday, January 27, 2014 3:50 PM
> To: Iyer, Balaji V
> Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'rth@redhat.com'
> Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> 
> On Mon, Jan 27, 2014 at 08:41:14PM +0000, Iyer, Balaji V wrote:
> > 	Did you get a chance to look at this _Cilk_for patch?
> 
> IMHO it is not as much work as you are fearing, at most a few hours of work
> to get it right, and well worth doing.  So, please at least try it out and if you
> get stuck with it, explain why.

Hi Jakub,
	I tried it that way in the original patch submission for C (http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01369.html), but it hit a dead-end when I was trying to get STL iterators working for C++. This is why I re-structured things this way to get them both working.

Thanks,

Balaji V. Iyer.

> 
> 	Jakub


* RE: [PING] [PATCH] _Cilk_for for C and C++
  2014-01-27 21:36   ` Iyer, Balaji V
@ 2014-01-28 16:55     ` Iyer, Balaji V
  2014-01-29 11:31       ` Jakub Jelinek
  0 siblings, 1 reply; 26+ messages in thread
From: Iyer, Balaji V @ 2014-01-28 16:55 UTC (permalink / raw)
  To: 'Jakub Jelinek'
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'




Hi Jakub,
	I thought about it a bit more, and the main issue here is that we need access to the _Cilk_for loop's components both inside the child function and the parent function.

	So, at the moment, I have modelled the _Cilk_for as something like this:

#pragma omp for  schedule (runtime: grain)
_Cilk_for (vector<int>::iterator ii = array.begin (); ii != array.end (); ii++)
#pragma omp parallel 
{
	<body>
}

From what I understand, you feel this is a bit ugly and you want this to be modelled something like this?

#pragma omp parallel for schedule (runtime: grain)
_Cilk_for (vector<int>::iterator ii = array.begin (); ii != array.end(); ii++)
{
	<body>
}

Am I right?

As it stands, doing it the way you suggested did not work when we have iterators, since the iterator expansion is pushed inside the child function and its expanded variables are not accessible outside the child function by gimplify_omp_for. That is, the expansion is placed after the #pragma omp parallel for, so all of it is pulled into the child function, and the information needed to compute the count is lost to the parent function.

There is a hack that I think may get around this. This is a bit ugly and really is not the way I would think of _Cilk_fors. I am OK with trying this if you will accept it.
 
If I do something like this:
     
#pragma omp parallel for schedule (runtime:grain) if ((array.end() - array.begin ())/1)
_Cilk_for (vector <int>::iterator ii = array.begin (); ii != array.end (); ii++)
{
	<body>
}

The new addition is the if clause, of the form "if ((<end> - <start>) / <step>)".

Then, in expand_parallel_task, I can extract the if (...) clause and pass its expression as the loop-count parameter. Yes, it's a bit ugly, but if you are willing to accept it, I can try to implement this.
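
As a rough illustration of the "(<end> - <start>) / <step>" expression, the
sketch below computes a loop count for integer loops. The helper name
trip_count is hypothetical, and the rounding is an assumption of this sketch:
it rounds up so that a partial final stride still counts as an iteration,
which agrees with the plain quotient whenever the step divides the range
evenly. Inclusive conditions (<= or >=) are assumed to have had their bound
moved by one beforehand.

```cpp
#include <cassert>

// Hypothetical helper: number of iterations of
//   for (i = start; i < end_exclusive; i += step)   when step > 0
//   for (i = start; i > end_exclusive; i += step)   when step < 0
long trip_count(long start, long end_exclusive, long step) {
    assert(step != 0);
    long span = end_exclusive - start;
    // Bias the numerator so that truncating division rounds the quotient up
    // when span and step have the same sign.
    long adjust = (step > 0) ? step - 1 : step + 1;
    long count = (span + adjust) / step;
    // A loop whose condition is false on entry runs zero times.
    return count > 0 ? count : 0;
}
```

For the iterator loop above, the start is array.begin (), the end is
array.end () and the step is 1, so the count is simply the distance between
the two iterators.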

Please let me know.

Thanks,

Balaji V. Iyer.


* Re: [PING] [PATCH] _Cilk_for for C and C++
  2014-01-28 16:55     ` Iyer, Balaji V
@ 2014-01-29 11:31       ` Jakub Jelinek
  2014-01-29 15:54         ` Iyer, Balaji V
  2014-02-05  5:27         ` Iyer, Balaji V
  0 siblings, 2 replies; 26+ messages in thread
From: Jakub Jelinek @ 2014-01-29 11:31 UTC (permalink / raw)
  To: Iyer, Balaji V
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

On Tue, Jan 28, 2014 at 04:55:38PM +0000, Iyer, Balaji V wrote:
> 	I thought about it a bit more, and the main issue here is that we
> need access to the _Cilk_for loop's components both inside the child
> function and the parent function.

I guess for the C++ iterators, if in the _Cilk_for model you need to provide
number of iterations before parallelization, it really depends on what the
standard allows and what you want to do exactly.
If you need to provide the iteration count before spawning the threads and
the standard allows you that, then just lower it in the C++ FE already
so that you do:
  vector<int>::iterator temp = array.begin ();
  sizetype tempcount = (array.end () - temp);
before the parallel, and then
  #pragma omp parallel firstprivate(temp, tempcount)
    _Cilk_for (sizetype temp2 = 0; temp2 < tempcount; temp2++)
      {
        vector<int>::iterator ii = temp + temp2;
        <body>
      }
or similar.  The C++ FE needs to lower the C++ iterators anyway, the
middle-end can really only work with integral or pointer iterators, and it
depends on how exactly the Cilk+ standard defines _Cilk_for with iterators
(what methods must be implemented on the iterators and what methods and in
what order should be called).
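
The lowering described above can be written out by hand in ordinary C++ to
show the shape of the result. This is a minimal serial sketch, not what the
C++ FE actually emits: lowered_iterator_loop and the Body callback are
hypothetical names, std::size_t stands in for sizetype, and the integral loop
is a plain for where the _Cilk_for would be.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical hand-lowered form of
//   _Cilk_for (vector<int>::iterator ii = array.begin (); ii != array.end (); ii++)
//     <body>
template <typename Body>
void lowered_iterator_loop(std::vector<int>& array, Body body) {
    // Computed once, before any parallel region would start.
    std::vector<int>::iterator temp = array.begin();
    std::size_t tempcount = static_cast<std::size_t>(array.end() - temp);

    // The loop the middle-end sees only has an integral induction variable.
    for (std::size_t temp2 = 0; temp2 < tempcount; ++temp2) {
        // The user-visible iterator is rebuilt inside the body.
        std::vector<int>::iterator ii = temp + static_cast<std::ptrdiff_t>(temp2);
        body(ii);
    }
}
```

Because the count depends only on temp and array.end (), it is available in
the parent function, and each iteration needs only temp and its own index to
recover ii.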

	Jakub


* RE: [PING] [PATCH] _Cilk_for for C and C++
  2014-01-29 11:31       ` Jakub Jelinek
@ 2014-01-29 15:54         ` Iyer, Balaji V
  2014-01-31 15:39           ` Iyer, Balaji V
  2014-02-05  5:27         ` Iyer, Balaji V
  1 sibling, 1 reply; 26+ messages in thread
From: Iyer, Balaji V @ 2014-01-29 15:54 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

Hi Jakub,

> -----Original Message-----
> From: Jakub Jelinek [mailto:jakub@redhat.com]
> Sent: Wednesday, January 29, 2014 6:31 AM
> To: Iyer, Balaji V
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'rth@redhat.com'
> Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> 
> On Tue, Jan 28, 2014 at 04:55:38PM +0000, Iyer, Balaji V wrote:
> > 	I thought about it a bit more, and the main issue here is that we
> > need access to the _Cilk_for loop's components both inside the child
> > function and the parent function.
> 
> I guess for the C++ iterators, if in the _Cilk_for model you need to provide
> number of iterations before parallelization, it really depends on what the
> standard allows and what you want to do exactly.

Yes, I need the value before the parallelization context hits. This is why in my last patch I had the parallel around the body and the omp for around the _Cilk_for statement.


> If you need to provide the iteration count before spawning the threads and
> the standard allows you that, then just lower it in the C++ FE already so that
> you do:
>   vector<int>::iterator temp = array.begin ();
>   sizetype tempcount = (array.end () - temp);
> before the parallel, and then
>   #pragma omp parallel firstprivate(temp, tempcount)
>     _Cilk_for (sizetype temp2 = 0; temp2 < tempcount; temp2++)
>       {
>         vector<int>::iterator ii = temp + temp2;
>         <body>
>       }

This is kind of what I did (or at least what I tried to accomplish). I can look into doing this, but is it possible for you to accept the patch as-is, and we will look into fixing it in the future?

Thanks,

Balaji V. Iyer.

> or similar.  The C++ FE needs to lower the C++ iterators anyway, the
> middle-end can really only work with integral or pointer iterators, and it
> depends on how exactly the Cilk+ standard defines _Cilk_for with iterators
> (what methods must be implemented on the iterators and what methods and in
> what order should be called).
> 
> 	Jakub


* RE: [PING] [PATCH] _Cilk_for for C and C++
  2014-01-29 15:54         ` Iyer, Balaji V
@ 2014-01-31 15:39           ` Iyer, Balaji V
  0 siblings, 0 replies; 26+ messages in thread
From: Iyer, Balaji V @ 2014-01-31 15:39 UTC (permalink / raw)
  To: 'Jakub Jelinek'
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

Hello Everyone,
	Did anyone get a chance to look at this patch (link to patches: http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01612.html)? I tried to do as Jakub suggested, but it hits a roadblock when it comes to iterators due to variable scoping issues.
	This patch does not disrupt any other parts of the code, and all of the code here is wrapped inside a check that Cilk Plus is enabled. It also passes all the tests and does not disrupt any existing failing or passing ones in both 32 and 64 bit modes on my x86_64 machine.

It is the last feature to make Cilk Plus feature-complete. Is the patch OK for trunk?

Thanks,

Balaji V. Iyer.



* RE: [PING] [PATCH] _Cilk_for for C and C++
  2014-01-29 11:31       ` Jakub Jelinek
  2014-01-29 15:54         ` Iyer, Balaji V
@ 2014-02-05  5:27         ` Iyer, Balaji V
  2014-02-07 14:02           ` Jakub Jelinek
  1 sibling, 1 reply; 26+ messages in thread
From: Iyer, Balaji V @ 2014-02-05  5:27 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

[-- Attachment #1: Type: text/plain, Size: 411 bytes --]

Hello Jakub,
	Attached, please find a fixed patch (diff.txt) that will do as you requested (model _Cilk_for like a #pragma omp parallel for). Along with this, I have also attached two Changelog entries (1 for C and 1 for C++).
	It passes all the tests on my x86_64 box (both 32 and 64 bit modes) and does not affect any other tests in the testsuite.
	Is this Ok for trunk?

Thanks,

Balaji V. Iyer.



[-- Attachment #2: c-ChangeLogs --]
[-- Type: application/octet-stream, Size: 3875 bytes --]

gcc/ChangeLog
2014-02-04  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* cilk-common.c (declare_cilk_for_builtin): New function.
	(cilk_init_builtins): Added two new built-in functions for _Cilk_for
	support.
	* cilk.h (enum cilk_tree_index): Added two new enumerators called
	CILK_TI_F_LOOP_32 and CILK_TI_F_LOOP_64.
	(cilk_for_32_fndecl): New define.
	(cilk_for_64_fndecl): Likewise.
	* gimple-pretty-print.c (dump_gimple_omp_parallel): Added a new
	parameter.  If it is printing a _Cilk_for statement, then do not 
	print OMP's pragmas.
	(dump_gimple_omp_for): Added GF_OMP_FOR_KIND_CILK_FOR.  Printed out
	_Cilk_for statements without the #pragmas.  Also, added NE_EXPR case.
	* tree-pretty-print.c (dump_generic_node): Added CILK_FOR case.
	Print "_Cilk_for" if the node is of type CILK_FOR.
	(dump_omp_clause): Added a new case called OMP_CLAUSE_SCHEDULE_CILKFOR.
	* gimple.h (enum gf_mask): Added new value: GF_OMP_FOR_KIND_CILKFOR.
	Readjusted other values to satisfy the masking rules.
	(gimple_cilk_for_induction_var): New function.
	* gimplify.c (gimplify_scan_omp_clauses): Added a new parameter called
	is_cilk_for.  If is_cilk_for is true then do not boolify the 
	IF_CLAUSE's expression.
	(gimplify_omp_parallel): Added check to see if we are gimplifying
	a _Cilk_for statement.
	(gimplify_omp_for): Added support to gimplify a _Cilk_for statement.
	(gimplify_expr): Added CILK_FOR case.
	* omp-low.c (extract_omp_for_data): Added a check for CILK_FOR and
	set the schedule kind accordingly.  Added a check for CILK_FOR trees
	wherever CILKSIMD is checked.
	(create_omp_child_function_name): Added a new parameter: is_cilk_for.
	(find_cilk_for_stmt): New function.
	(is_cilk_for_stmt): Likewise.
	(cilk_for_check_loop_diff_type): Likewise.
	(expand_cilk_for_call): Likewise.
	(expand_cilk_for): Likewise.
	(create_omp_child_function): Added support to create _Cilk_for's
	child function by adding two additional parameters.
	(expand_omp_taskreg): Extracted the high and low parameters from the
	child function and set them accordingly in the child function.
	(expand_omp_for): Added a call to expand_cilk_for.
	* tree.def (CILK_FOR): New tree.
	* tree-core.h (enum omp_clause_schedule_kind): Added a new enumerator
	field OMP_CLAUSE_SCHEDULE_CILKFOR.
	* cilk-builtins.def (BUILT_IN_CILK_FOR_32): New built-in function.
	(BUILT_IN_CILK_FOR_64): Likewise.
	
gcc/c-family/ChangeLog
2014-02-04  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-cilkplus.c (find_cilk_for): New function.
	(cilk_for_move_clauses_upward): Likewise.
	* c-common.c (c_common_reswords[]): Added a new field called _Cilk_for.
	* c-common.h (enum rid): Added new enumerator called RID_CILK_FOR.
	* c-omp.c (c_finish_omp_for): Added a new parameter called count.
	Computed the value of loop-count based on initial, condition and
	increment information.
	* c-pragma.c (init_pragma): Added cilk grainsize pragma.
	* c-pragma.h (enum pragma_kind): Added new enumerator called
	PRAGMA_CILK_GRAINSIZE.

gcc/c/ChangeLog
2014-02-04  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-parser.c (c_parser_statement_after_labels): Added RID_CILK_FOR
	case.
	(c_parser_pragma): Added PRAGMA_CILK_GRAINSIZE case.
	(c_parser_omp_for_loop): Added grain parameter.  Also, modified
	the function to parse _Cilk_for statement.
	(c_parser_cilk_grainsize): New function.
	(c_parser_cilk_simd): Added a new parameter called is_cilk_for.
	Modified the function to handle CILK_FOR.

gcc/testsuite/ChangeLog
2014-02-04  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-c++-common/cilk-plus/CK/cilk-fors.c: New testcase.
	* c-c++-common/cilk-plus/CK/nested_cilk_for.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_grain.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_errors.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c: Likewise.

[-- Attachment #3: cp-ChangeLogs --]
[-- Type: application/octet-stream, Size: 1707 bytes --]

gcc/cp/ChangeLog
2014-02-04  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* cp-cilkplus.c (copy_tree_till_cilk_for): New function.
	(find_vars): Likewise.
	(find_killed_vars): Likewise.
	(insert_init_exprs): Likewise.
	(insert_opp_before_cfor): Likewise.
	(cilk_for_create_bind_expr): Likewise.
	* cp-tree.h (copy_tree_till_cilk_for): New prototype.
	(cilk_for_create_bind_expr): Likewise.
	* parser.c (cp_parser_statement): Added a RID_CILK_FOR case.
	(cp_parser_omp_for_cond): Added a check for CILK_FOR tree along with
	CILK_SIMD tree.
	(cp_parser_omp_for_loop): Added a new parameter: cfor_block.  Added
	support for parsing a _Cilk_for statement.  Removed statements
	between _Cilk_for statement and the #pragma omp parallel to move
	them upward.
	(cp_parser_cilk_grainsize): New function.
	(cp_parser_pragma): Added PRAGMA_CILK_GRAINSIZE pragma.
	(cp_parser_cilk_simd): Added a new parameter called grain.  Added
	support to handle _Cilk_for statement along with #pragma simd.
	* pt.c (tsubst_expr): For _Cilk_for statement, move certain clauses
	upward to #pragma parallel statement.  Added a CILK_FOR case.
	* semantics.c (handle_omp_for_class_iterator): Added a NE_EXPR case.
	(finish_omp_for): For a _Cilk_for statement, added an IF clause.
	
gcc/testsuite/ChangeLog
2014-02-04  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-c++-common/cilk-plus/CK/cilk_for_errors.c: Made certain error
	tags C specific and inserted their C++ equivalents.
	* c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c: Likewise.
	* g++.dg/cilk-plus/CK/cilk-for-tplt.cc: New testcase.
	* g++.dg/cilk-plus/CK/stl_iter.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_rev_iter.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_test.cc: Likewise.

[-- Attachment #4: diff.txt --]
[-- Type: text/plain, Size: 89417 bytes --]

diff --git a/gcc/c-family/c-cilkplus.c b/gcc/c-family/c-cilkplus.c
index 1a16f66..328f014 100644
--- a/gcc/c-family/c-cilkplus.c
+++ b/gcc/c-family/c-cilkplus.c
@@ -91,3 +91,53 @@ c_finish_cilk_clauses (tree clauses)
     }
   return clauses;
 }
+
+/* Structure used to pass information into a walk_tree function and
+   find_cilk_for.  */
+struct clause_struct
+{
+  bool is_set;
+  tree clauses;
+};
+
+/* Helper function for walk_tree used in cilk_for_move_clauses_upward.
+   If *TP is a CILK_FOR statement, then set *DATA (type-casted to 
+   struct clause_struct) with its clauses.  */
+
+static tree
+find_cilk_for (tree *tp, int *walk_subtrees, void *data)
+{
+  struct clause_struct *cstruct = (struct clause_struct *) data;
+  if (*tp && TREE_CODE (*tp) == CILK_FOR && !cstruct->is_set)
+    {
+      cstruct->is_set = true;
+      cstruct->clauses = OMP_FOR_CLAUSES (*tp);
+      *walk_subtrees = 0;
+      OMP_FOR_CLAUSES (*tp) = NULL_TREE;
+    }
+  return NULL_TREE;
+}
+
+/* Moves the IF-CLAUSE and SCHEDULE clause from _CILK_FOR statement in
+   STMT into *PARALLEL_CLAUSES.  */
+ 
+void
+cilk_for_move_clauses_upward (tree *parallel_clauses, tree stmt)
+{
+  struct clause_struct cstruct;
+  cstruct.is_set = false;
+  cstruct.clauses = NULL_TREE;
+  walk_tree (&stmt, find_cilk_for, (void *) &cstruct, NULL);
+
+  tree clauses = cstruct.clauses;
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SCHEDULE
+	|| OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IF)
+      {
+	if (*parallel_clauses)
+	  OMP_CLAUSE_CHAIN (*parallel_clauses) = c;
+	else
+	  *parallel_clauses = c;
+      }
+}
+
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index fc12788..816529d 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -414,6 +414,7 @@ const struct c_common_resword c_common_reswords[] =
   { "_Complex",		RID_COMPLEX,	0 },
   { "_Cilk_spawn",      RID_CILK_SPAWN, 0 },
   { "_Cilk_sync",       RID_CILK_SYNC,  0 },
+  { "_Cilk_for",        RID_CILK_FOR,   0 },
   { "_Imaginary",	RID_IMAGINARY, D_CONLY },
   { "_Decimal32",       RID_DFLOAT32,  D_CONLY | D_EXT },
   { "_Decimal64",       RID_DFLOAT64,  D_CONLY | D_EXT },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index f074ab1..33e1929 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -149,7 +149,7 @@ enum rid
   RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
 
   /* Cilk Plus keywords.  */
-  RID_CILK_SPAWN, RID_CILK_SYNC,
+  RID_CILK_SPAWN, RID_CILK_SYNC, RID_CILK_FOR,
   
   /* Objective-C ("AT" reserved words - they are only keywords when
      they follow '@')  */
@@ -1203,7 +1203,7 @@ extern void c_finish_omp_flush (location_t);
 extern void c_finish_omp_taskwait (location_t);
 extern void c_finish_omp_taskyield (location_t);
 extern tree c_finish_omp_for (location_t, enum tree_code, tree, tree, tree,
-			      tree, tree, tree);
+			      tree, tree, tree, tree *);
 extern void c_omp_split_clauses (location_t, enum tree_code, omp_clause_mask,
 				 tree, tree *);
 extern tree c_omp_declare_simd_clauses_to_numbers (tree, tree);
@@ -1389,4 +1389,5 @@ extern tree make_cilk_frame (tree);
 extern tree create_cilk_function_exit (tree, bool, bool);
 extern tree cilk_install_body_pedigree_operations (tree);
 extern void cilk_outline (tree, tree *, void *);
+extern void cilk_for_move_clauses_upward (tree *, tree);
 #endif /* ! GCC_C_COMMON_H */
diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index 4ce51e4..8a84030 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -387,17 +387,18 @@ c_omp_for_incr_canonicalize_ptr (location_t loc, tree decl, tree incr)
    INITV, CONDV and INCRV are vectors containing initialization
    expressions, controlling predicates and increment expressions.
    BODY is the body of the loop and PRE_BODY statements that go before
-   the loop.  */
+   the loop.  *COUNT is the loop-count used solely by a _Cilk_for statement.  */
 
 tree
 c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
-		  tree initv, tree condv, tree incrv, tree body, tree pre_body)
+		  tree initv, tree condv, tree incrv, tree body,
+		  tree pre_body, tree *count)
 {
   location_t elocus;
   bool fail = false;
   int i;
-
-  if (code == CILK_SIMD
+  tree orig_init = NULL_TREE, orig_end = NULL_TREE, orig_step = NULL_TREE;
+  if ((code == CILK_SIMD || code == CILK_FOR) 
       && !c_check_cilk_loop (locus, TREE_VEC_ELT (declv, 0)))
     fail = true;
 
@@ -423,6 +424,8 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	  fail = true;
 	}
 
+      if (TREE_CODE (init) == MODIFY_EXPR)
+	orig_init = TREE_OPERAND (init, 1);
       /* In the case of "for (int i = 0...)", init will be a decl.  It should
 	 have a DECL_INITIAL that we can turn into an assignment.  */
       if (init == decl)
@@ -437,6 +440,7 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      fail = true;
 	    }
 
+	  orig_init = init;
 	  init = build_modify_expr (elocus, decl, NULL_TREE, NOP_EXPR,
 	      			    /* FIXME diagnostics: This should
 				       be the location of the INIT.  */
@@ -527,9 +531,20 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 					    0))
 		    TREE_SET_CODE (cond, TREE_CODE (cond) == NE_EXPR
 					 ? LT_EXPR : GE_EXPR);
-		  else if (code != CILK_SIMD)
+		  else if (code != CILK_SIMD && code != CILK_FOR)
 		    cond_ok = false;
 		}
+	      if (flag_cilkplus && code == CILK_FOR)
+		{ 
+		  orig_end = TREE_OPERAND (cond, 1); 
+		  tree add_expr = build_zero_cst (TREE_TYPE (orig_end)); 
+		  if (TREE_CODE (cond) == LE_EXPR) 
+		    add_expr = build_one_cst (TREE_TYPE (orig_end)); 
+		  else if (TREE_CODE (cond) == GE_EXPR) 
+		    add_expr = build_int_cst (TREE_TYPE (orig_end), -1); 
+		  orig_end = fold_build2 (PLUS_EXPR, TREE_TYPE (orig_end), 
+					  orig_end, add_expr);
+		}
 	    }
 
 	  if (!cond_ok)
@@ -562,6 +577,18 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      if (TREE_OPERAND (incr, 0) != decl)
 		break;
 
+	      if (TREE_CODE (incr) == POSTINCREMENT_EXPR
+		  || TREE_CODE (incr) == PREINCREMENT_EXPR)
+		orig_step = build_one_cst (TREE_TYPE (incr));
+	      else
+		orig_step = integer_minus_one_node;
+ 
+	      if (POINTER_TYPE_P (TREE_TYPE (incr)))
+		{
+		  tree unit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (incr)));
+		  orig_step = fold_build2 (MULT_EXPR, TREE_TYPE (orig_step),
+					   orig_step, unit);
+		}
 	      incr_ok = true;
 	      incr = c_omp_for_incr_canonicalize_ptr (elocus, decl, incr);
 	      break;
@@ -580,14 +607,24 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      if (TREE_CODE (TREE_OPERAND (incr, 1)) == PLUS_EXPR
 		  && (TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl
 		      || TREE_OPERAND (TREE_OPERAND (incr, 1), 1) == decl))
-		incr_ok = true;
+		{
+		  if (TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl)
+		    orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 1);
+		  else
+		    orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 0);
+		  incr_ok = true;
+		}
 	      else if ((TREE_CODE (TREE_OPERAND (incr, 1)) == MINUS_EXPR
 			|| (TREE_CODE (TREE_OPERAND (incr, 1))
 			    == POINTER_PLUS_EXPR))
 		       && TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl)
-		incr_ok = true;
+		{
+		  orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 1);
+		  incr_ok = true;
+		}
 	      else
 		{
+		  orig_step = TREE_OPERAND (incr, 1);
 		  tree t = check_omp_for_incr_expr (elocus,
 						    TREE_OPERAND (incr, 1),
 						    decl);
@@ -610,6 +647,17 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	    }
 	}
 
+      /* These variables could be NULL if an error occurred.  */
+      if (flag_cilkplus && code == CILK_FOR 
+	  && orig_end && orig_init && orig_step)
+	{
+	  /* Count is used by _Cilk_for and that will always have
+	     collapse = 1.  */
+	  *count = fold_build2 (MINUS_EXPR, TREE_TYPE (orig_end), orig_end,
+				orig_init);
+	  *count = fold_build2 (TRUNC_DIV_EXPR, TREE_TYPE (*count), *count,
+				orig_step);
+	}
       TREE_VEC_ELT (initv, i) = init;
       TREE_VEC_ELT (incrv, i) = incr;
     }
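
For reference, the count arithmetic in the hunk above can be sketched in plain C. This is only an illustration under assumptions: `trip_count` and `enum cmp_kind` are hypothetical names that do not appear in the patch, and `long` stands in for TREE_TYPE (orig_end). It mirrors the end adjustment (+1 for <=, -1 for >=) followed by (end - init) / step with truncating division, as TRUNC_DIV_EXPR does.

```c
/* Hypothetical stand-in for the comparison code of the loop condition.  */
enum cmp_kind { CMP_LT, CMP_LE, CMP_GT, CMP_GE };

/* Mirror of the count computation: adjust the end bound for inclusive
   comparisons, subtract the initial value, then divide by the step.
   C integer division truncates toward zero, like TRUNC_DIV_EXPR.  */
static long
trip_count (long init, long end, long step, enum cmp_kind cmp)
{
  long adjust = 0;

  if (cmp == CMP_LE)
    adjust = 1;
  else if (cmp == CMP_GE)
    adjust = -1;

  return ((end + adjust) - init) / step;
}
```

With these definitions, trip_count (0, 10, 2, CMP_LT) is 5, which matches the loop from the earlier example in this thread. When the range is not a multiple of the step the division rounds down: trip_count (0, 9, 2, CMP_LT) is 4, although a source loop "for (ii = 0; ii < 9; ii += 2)" runs 5 times. Whether a later stage compensates for that is not visible from this hunk.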
diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index 07d23ac..e0f3561 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -1394,6 +1394,11 @@ init_pragma (void)
 
   cpp_register_deferred_pragma (parse_in, "GCC", "ivdep", PRAGMA_IVDEP, false,
 				false);
+
+  if (flag_cilkplus && !flag_preprocess_only)
+    cpp_register_deferred_pragma (parse_in, "cilk", "grainsize",
+				  PRAGMA_CILK_GRAINSIZE, true, false);
+
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 6f1bf74..b9f09ba 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -55,6 +55,9 @@ typedef enum pragma_kind {
   /* Top level clause to handle all Cilk Plus pragma simd clauses.  */
   PRAGMA_CILK_SIMD,
 
+  /* This pragma handles setting of grainsize for a _Cilk_for.  */
+  PRAGMA_CILK_GRAINSIZE,
+
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
 
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 8a4868b..83e53fd 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1248,10 +1248,11 @@ static bool c_parser_objc_diagnose_bad_element_prefix
   (c_parser *, struct c_declspecs *);
 
 /* Cilk Plus supporting routines.  */
-static void c_parser_cilk_simd (c_parser *);
+static void c_parser_cilk_simd (c_parser *, bool, tree);
 static bool c_parser_cilk_verify_simd (c_parser *, enum pragma_context);
 static tree c_parser_array_notation (location_t, c_parser *, tree, tree);
 static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, bool);
+static void c_parser_cilk_grainsize (c_parser *);
 
 /* Parse a translation unit (C90 6.7, C99 6.9).
 
@@ -4878,6 +4879,16 @@ c_parser_statement_after_labels (c_parser *parser)
 	case RID_FOR:
 	  c_parser_for_statement (parser, false);
 	  break;
+	case RID_CILK_FOR:
+	  if (!flag_cilkplus)
+	    {
+	      error_at (c_parser_peek_token (parser)->location,
+			"-fcilkplus must be enabled to use %<_Cilk_for%>");
+	      c_parser_skip_to_end_of_block_or_statement (parser);
+	    }
+	  else
+	    c_parser_cilk_simd (parser, true, integer_zero_node);
+	  break;
 	case RID_CILK_SYNC:
 	  c_parser_consume_token (parser);
 	  c_parser_skip_until_found (parser, CPP_SEMICOLON, "expected %<;%>");
@@ -9496,7 +9507,24 @@ c_parser_pragma (c_parser *parser, enum pragma_context context)
       if (!c_parser_cilk_verify_simd (parser, context))
 	return false;
       c_parser_consume_pragma (parser);
-      c_parser_cilk_simd (parser);
+      c_parser_cilk_simd (parser, false, NULL_TREE);
+      return false;
+    case PRAGMA_CILK_GRAINSIZE:
+      if (!flag_cilkplus)
+	{
+	  warning (0, "%<#pragma cilk grainsize%> ignored because "
+		   "-fcilkplus is not enabled");
+	  c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
+	  return false;
+	}
+      if (context == pragma_external)
+	{
+	  error_at (c_parser_peek_token (parser)->location,
+		    "%<#pragma cilk grainsize%> must be inside a function");
+	  c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
+	  return false;
+	}
+      c_parser_cilk_grainsize (parser);
       return false;
 
     default:
@@ -11591,7 +11619,7 @@ c_parser_omp_flush (c_parser *parser)
 
 static tree
 c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
-		       tree clauses, tree *cclauses)
+		       tree clauses, tree grain, tree *cclauses)
 {
   tree decl, cond, incr, save_break, save_cont, body, init, stmt, cl;
   tree declv, condv, incrv, initv, ret = NULL;
@@ -11599,6 +11627,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   int i, collapse = 1, nbraces = 0;
   location_t for_loc;
   vec<tree, va_gc> *for_block = make_tree_vector ();
+  tree count = NULL_TREE;
 
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
     if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
@@ -11611,11 +11640,18 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   condv = make_tree_vec (collapse);
   incrv = make_tree_vec (collapse);
 
-  if (!c_parser_next_token_is_keyword (parser, RID_FOR))
+  if (code != CILK_FOR
+      && !c_parser_next_token_is_keyword (parser, RID_FOR))
     {
       c_parser_error (parser, "for statement expected");
       return NULL;
     }
+  if (code == CILK_FOR
+      && !c_parser_next_token_is_keyword (parser, RID_CILK_FOR))
+    {
+      c_parser_error (parser, "_Cilk_for statement expected");
+      return NULL;
+    }
   for_loc = c_parser_peek_token (parser)->location;
   c_parser_consume_token (parser);
 
@@ -11693,7 +11729,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 	    case LE_EXPR:
 	      break;
 	    case NE_EXPR:
-	      if (code == CILK_SIMD)
+	      if (code == CILK_SIMD || code == CILK_FOR)
 		break;
 	      /* FALLTHRU.  */
 	    default:
@@ -11827,7 +11863,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   if (!fail)
     {
       stmt = c_finish_omp_for (loc, code, declv, initv, condv,
-			       incrv, body, NULL);
+			       incrv, body, NULL, &count);
       if (stmt)
 	{
 	  if (cclauses != NULL
@@ -11867,6 +11903,24 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 		  }
 	    }
 	  OMP_FOR_CLAUSES (stmt) = clauses;
+	  /* If it is a _Cilk_for statement, then the OMP_FOR_CLAUSES location
+	     stores the user-defined grain value or an integer_zero_node
+	     indicating that the runtime must compute a suitable grain, inside
+	     a SCHEDULE clause.  Similarly the loop-count is also stored in
+	     an IF clause.  These clauses do not make sense for _Cilk_for;
+	     they are used only to transmit information.  */
+	  if (code == CILK_FOR)
+	    {
+	      tree l = build_omp_clause (EXPR_LOCATION (grain),
+					 OMP_CLAUSE_SCHEDULE);
+	      OMP_CLAUSE_SCHEDULE_KIND (l) = OMP_CLAUSE_SCHEDULE_CILKFOR;
+	      OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (l) = grain;
+	      OMP_CLAUSE_CHAIN (l) = OMP_FOR_CLAUSES (stmt);
+	      tree c = build_omp_clause (EXPR_LOCATION (count), OMP_CLAUSE_IF);
+	      OMP_CLAUSE_IF_EXPR (c) = count;
+	      OMP_CLAUSE_CHAIN (c) = l;
+	      OMP_FOR_CLAUSES (stmt) = c;
+	    }
 	}
       ret = stmt;
     }
@@ -11931,7 +11985,8 @@ c_parser_omp_simd (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_SIMD, clauses, cclauses);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_SIMD, clauses, NULL_TREE,
+			       cclauses);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -12011,7 +12066,8 @@ c_parser_omp_for (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_FOR, clauses, cclauses);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_FOR, clauses, NULL_TREE,
+			       cclauses);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -12494,7 +12550,8 @@ c_parser_omp_distribute (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_DISTRIBUTE, clauses, NULL);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_DISTRIBUTE, clauses, NULL_TREE,
+			       NULL);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -13771,18 +13828,84 @@ c_parser_cilk_all_clauses (c_parser *parser)
   return c_finish_cilk_clauses (clauses);
 }
 
-/* Main entry point for parsing Cilk Plus <#pragma simd> for
-   loops.  */
+/* This function helps parse the grainsize pragma for a _Cilk_for statement. 
+   Here is the correct syntax of this pragma: 
+	    #pragma cilk grainsize = <EXP> 
+ */
 
 static void
-c_parser_cilk_simd (c_parser *parser)
+c_parser_cilk_grainsize (c_parser *parser)
 {
-  tree clauses = c_parser_cilk_all_clauses (parser);
+  extern tree convert_to_integer (tree, tree);
+
+  /* Consume the 'grainsize' keyword.  */
+  c_parser_consume_pragma (parser);
+
+  if (c_parser_require (parser, CPP_EQ, "expected %<=%>") != 0)
+    {
+      struct c_expr g_expr = c_parser_binary_expression (parser, NULL, NULL);
+      if (g_expr.value && TREE_CODE (g_expr.value) == C_MAYBE_CONST_EXPR)
+	{
+	  error_at (input_location, "cannot convert grain to long integer");
+	  c_parser_skip_to_pragma_eol (parser);
+	}   
+      else if (g_expr.value && g_expr.value != error_mark_node)
+	{
+	  c_parser_skip_to_pragma_eol (parser);
+	  c_token *token = c_parser_peek_token (parser);
+	  if (token && token->type == CPP_KEYWORD
+	      && token->keyword == RID_CILK_FOR)
+	    {
+	      /* Remove EXCESS_PRECISION_EXPR since we are going to convert
+		 it to long int.  */
+	      if (TREE_CODE (g_expr.value) == EXCESS_PRECISION_EXPR)
+		g_expr.value = TREE_OPERAND (g_expr.value, 0);
+	      tree grain = convert_to_integer (long_integer_type_node,
+					       g_expr.value);
+	      if (grain && grain != error_mark_node) 
+		c_parser_cilk_simd (parser, true, grain);
+	    }
+	  else
+	    warning (0, "grainsize pragma is not followed by %<_Cilk_for%>");
+	}
+      else
+	c_parser_skip_to_pragma_eol (parser);
+    }
+  else
+    c_parser_skip_to_pragma_eol (parser);
+}
+
+/* Main entry point for parsing Cilk Plus <#pragma simd> for and
+   _Cilk_for loops.  If IS_CILK_FOR is true then it is a _Cilk_for loop 
+   and GRAIN is the grain value passed in through pragma or 0.  */
+
+static void
+c_parser_cilk_simd (c_parser *parser, bool is_cilk_for, tree grain)
+{
+  tree super_block = NULL_TREE;
+  tree clauses = NULL_TREE;
+  
+  if (!is_cilk_for)
+    clauses = c_parser_cilk_all_clauses (parser);
+  else
+    super_block = c_begin_omp_parallel ();
   tree block = c_begin_compound_stmt (true);
   location_t loc = c_parser_peek_token (parser)->location;
-  c_parser_omp_for_loop (loc, parser, CILK_SIMD, clauses, NULL);
+  enum tree_code code = is_cilk_for ? CILK_FOR : CILK_SIMD;
+  c_parser_omp_for_loop (loc, parser, code, clauses, grain, NULL);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
+  if (is_cilk_for)
+    {
+      /* Move all the clauses from the #pragma omp for to #pragma omp parallel.
+	 This is because if these values are not integers and they are placed
+	 in OMP_FOR then the compiler will insert value chains for them.  */
+      tree parallel_clauses = NULL_TREE;
+      cilk_for_move_clauses_upward (&parallel_clauses, super_block);
+      /* The term super_block is not used in scheduling terms but in
+	 set-theory, i.e. set vs. super-set.  */
+      c_finish_omp_parallel (loc, parallel_clauses, super_block);
+    }
 }
 \f
 /* Parse a transaction attribute (GCC Extension).
diff --git a/gcc/cilk-builtins.def b/gcc/cilk-builtins.def
index 9f3240a..bf319d5 100644
--- a/gcc/cilk-builtins.def
+++ b/gcc/cilk-builtins.def
@@ -31,3 +31,5 @@ DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_SYNC, "__cilkrts_sync")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_LEAVE_FRAME, "__cilkrts_leave_frame")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_POP_FRAME, "__cilkrts_pop_frame")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_SAVE_FP, "__cilkrts_save_fp_ctrl_state")
+DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_FOR_32, "__cilkrts_cilk_for_32")
+DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_FOR_64, "__cilkrts_cilk_for_64")
diff --git a/gcc/cilk-common.c b/gcc/cilk-common.c
index afe88c9..1277a25 100644
--- a/gcc/cilk-common.c
+++ b/gcc/cilk-common.c
@@ -106,6 +106,27 @@ install_builtin (const char *name, tree fntype, enum built_in_function code,
   return fndecl;
 }
 
+/* Returns a FUNCTION_DECL named NAME whose loop count has type TYPE.  */
+
+static tree 
+declare_cilk_for_builtin (const char *name, tree type, 
+			  enum built_in_function code)
+{
+  tree cb, ft, fn;
+
+  cb = build_function_type_list (void_type_node,
+                                 ptr_type_node, type, type,
+                                 NULL_TREE);
+  cb = build_pointer_type (cb);
+  ft = build_function_type_list (void_type_node,
+                                 cb, ptr_type_node, type,
+                                 integer_type_node, NULL_TREE);
+  fn = install_builtin (name, ft, code, false);
+  TREE_NOTHROW (fn) = 0;
+
+  return fn;
+}
+
 /* Creates and initializes all the built-in Cilk keywords functions and three
    structures: __cilkrts_stack_frame, __cilkrts_pedigree and __cilkrts_worker.
    Detailed information about __cilkrts_stack_frame and
@@ -269,6 +290,15 @@ cilk_init_builtins (void)
   cilk_save_fp_fndecl = install_builtin ("__cilkrts_save_fp_ctrl_state", 
 					 fptr_fun, BUILT_IN_CILK_SAVE_FP,
 					 false);
+  /* __cilkrts_cilk_for_32 (...);  */
+  cilk_for_32_fndecl = declare_cilk_for_builtin ("__cilkrts_cilk_for_32", 
+						 unsigned_intSI_type_node, 
+						 BUILT_IN_CILK_FOR_32);
+  /* __cilkrts_cilk_for_64 (...);  */
+  cilk_for_64_fndecl = declare_cilk_for_builtin ("__cilkrts_cilk_for_64", 
+						 unsigned_intDI_type_node, 
+						 BUILT_IN_CILK_FOR_64);
+
 }
 
 /* Get the appropriate frame arguments for CALL that is of type CALL_EXPR.  */
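
To make the two function types built in declare_cilk_for_builtin easier to read, here is a plain C rendering of the 32-bit variant together with a serial stand-in. This is a sketch under assumptions: the parameter names (data, low, high, count, grain), the half-open [low, high) chunk convention, and the helpers `serial_cilk_for_32`, `struct tally` and `tally_body` are illustrative only. The patch itself fixes just the parameter types, and the real __cilkrts_cilk_for_32 lives in the Cilk runtime and schedules chunks in parallel.

```c
#include <stdint.h>

/* Callback type: void (*) (void *, T, T) with T = 32-bit unsigned.  */
typedef void (*cilk_for_body_32) (void *data, uint32_t low, uint32_t high);

/* Serial stand-in with the same parameter list as the builtin:
   (callback, void *, T, int).  It calls BODY on consecutive chunks
   of at most GRAIN iterations; a grain of 0 or less is treated as 1
   here (the real runtime picks its own value for 0).  */
static void
serial_cilk_for_32 (cilk_for_body_32 body, void *data,
		    uint32_t count, int grain)
{
  uint32_t g = grain > 0 ? (uint32_t) grain : 1;
  uint32_t low = 0;

  while (low < count)
    {
      /* Clamp the last chunk without overflowing low + g.  */
      uint32_t high = count - low < g ? count : low + g;
      body (data, low, high);
      low = high;
    }
}

/* Example callback: records how many chunks and iterations it saw.  */
struct tally
{
  uint32_t calls;
  uint32_t iterations;
};

static void
tally_body (void *data, uint32_t low, uint32_t high)
{
  struct tally *t = (struct tally *) data;
  t->calls++;
  t->iterations += high - low;
}
```

Calling serial_cilk_for_32 (tally_body, &t, 10, 3) on a zeroed struct tally invokes the body on [0,3), [3,6), [6,9) and [9,10), that is 4 calls covering 10 iterations.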
diff --git a/gcc/cilk.h b/gcc/cilk.h
index ae96f53..1fee929 100644
--- a/gcc/cilk.h
+++ b/gcc/cilk.h
@@ -40,6 +40,9 @@ enum cilk_tree_index  {
   CILK_TI_F_POP,                      /* __cilkrts_pop_frame (...).  */
   CILK_TI_F_RETHROW,                  /* __cilkrts_rethrow (...).  */
   CILK_TI_F_SAVE_FP,                  /* __cilkrts_save_fp_ctrl_state (...).  */
+  CILK_TI_F_LOOP_32,                  /* __cilkrts_cilk_for_32 (...).  */
+  CILK_TI_F_LOOP_64,                  /* __cilkrts_cilk_for_64 (...).  */
+
   /* __cilkrts_stack_frame struct fields.  */
   CILK_TI_FRAME_FLAGS,                /* stack_frame->flags.  */
   CILK_TI_FRAME_PARENT,               /* stack_frame->parent.  */
@@ -77,6 +80,8 @@ extern GTY (()) tree cilk_trees[CILK_TI_MAX];
 #define cilk_rethrow_fndecl           cilk_trees[CILK_TI_F_RETHROW]
 #define cilk_pop_fndecl               cilk_trees[CILK_TI_F_POP]
 #define cilk_save_fp_fndecl           cilk_trees[CILK_TI_F_SAVE_FP]
+#define cilk_for_32_fndecl            cilk_trees[CILK_TI_F_LOOP_32]
+#define cilk_for_64_fndecl            cilk_trees[CILK_TI_F_LOOP_64]
 
 #define cilk_worker_type_fndecl       cilk_trees[CILK_TI_WORKER_TYPE]
 #define cilk_frame_type_decl          cilk_trees[CILK_TI_FRAME_TYPE]
diff --git a/gcc/cp/cp-cilkplus.c b/gcc/cp/cp-cilkplus.c
index f3a2aff..143890c 100644
--- a/gcc/cp/cp-cilkplus.c
+++ b/gcc/cp/cp-cilkplus.c
@@ -143,3 +143,150 @@ cilk_install_body_with_frame_cleanup (tree fndecl, tree orig_body, void *wd)
 			    &list);
 }
 
+/* Returns all the statements till CILK_FOR statement in *STMT_LIST.  Removes
+   those statements from STMT_LIST and updates STMT_LIST accordingly.  */
+
+tree
+copy_tree_till_cilk_for (tree *stmt_list)
+{
+  gcc_assert (TREE_CODE (*stmt_list) == STATEMENT_LIST);
+  tree new_stmt_list  = alloc_stmt_list ();
+  tree_stmt_iterator tsi;
+  for (tsi = tsi_start (*stmt_list); !tsi_end_p (tsi);)
+    if (TREE_CODE (tsi_stmt (tsi)) != CILK_FOR)
+      {
+	append_to_statement_list (tsi_stmt (tsi), &new_stmt_list); 
+	tsi_delink (&tsi);
+      }
+    else
+      tsi_next (&tsi);
+    
+  return new_stmt_list;
+}
+
+/* Structure to hold the list of variables that are being killed in a
+   statement list.  This structure is only used in a WALK_TREE function.  */
+struct cilk_for_var_list
+{
+  vec <tree, va_gc> *list;
+};
+
+/* Helper function for WALK_TREE used in find_killed_vars function.  
+   Returns all the variables that are being killed (or set) in *TP.  
+   *DATA holds the structure to hold the variable list.  */
+
+static tree
+find_vars (tree *tp, int *walk_subtrees, void *data)
+{
+  struct cilk_for_var_list *vlist = (struct cilk_for_var_list *) data;
+
+  if (!tp || !*tp)
+    return NULL_TREE;
+
+  if (TREE_CODE (*tp) == INIT_EXPR || TREE_CODE (*tp) == MODIFY_EXPR)
+    {
+      vec_safe_push (vlist->list, TREE_OPERAND (*tp, 0));
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Returns a vector of trees holding the variables that are
+   killed (i.e. written or set) in STMT_LIST.  */
+
+static vec <tree, va_gc> *
+find_killed_vars (tree stmt_list)
+{
+  struct cilk_for_var_list vlist;
+  memset (&vlist, 0, sizeof (vlist));
+  cp_walk_tree (&stmt_list, find_vars, &vlist, NULL);
+  return vlist.list;
+}
+
+/* Returns STATEMENT_LIST that contains STMT_LIST along with init. expressions
+   to save the variables in *LIST vector.  *OPP_INITS is a STATEMENT_LIST that
+   contains init. exprs.  */
+ 
+static tree
+insert_init_exprs (tree stmt_list, vec<tree, va_gc> *list, tree *opp_inits)
+{
+  if (vec_safe_is_empty (list))
+    return stmt_list;
+
+  unsigned int ix;
+  tree rhs;
+  tree opp_inits_t = alloc_stmt_list ();
+  FOR_EACH_VEC_SAFE_ELT (list, ix, rhs)
+    {
+      tree new_var = create_temporary_var (TREE_TYPE (rhs));
+      pushdecl (new_var);
+      add_decl_expr (new_var);
+      tree new_tree = build2 (INIT_EXPR, void_type_node, new_var, rhs);
+      append_to_statement_list (new_tree, &stmt_list);
+      tree new_opp_tree = build2 (INIT_EXPR, void_type_node, rhs, new_var);
+      append_to_statement_list (new_opp_tree, &opp_inits_t);
+    }
+  *opp_inits = opp_inits_t;
+  return stmt_list;
+}
+
+/* Helper function for cp_walk_tree.  If *TP is a CILK_FOR expression then
+   insert DATA (cast to a tree of type STATEMENT_LIST) before the
+   CILK_FOR expression and update *TP accordingly.  */
+
+static tree
+insert_opp_before_cfor (tree *tp, int *walk_subtrees, void *data)
+{
+  tree *list = (tree *) data;
+  tree stmt_list = *list;
+  gcc_assert (TREE_CODE (stmt_list) == STATEMENT_LIST);
+  
+  if (TREE_CODE (*tp) == CILK_FOR)
+    {
+      append_to_statement_list (*tp, &stmt_list);
+      *tp = stmt_list;
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+  
+/* Returns a BIND_EXPR with BIND_EXPR_VARS holding VARS and BIND_EXPR_BODY
+   containing STMT_LIST and CFOR_PAR_LIST.  */
+
+tree
+cilk_for_create_bind_expr (tree vars, tree stmt_list, tree cfor_par_list)
+{
+  gcc_assert (TREE_CODE (stmt_list) == STATEMENT_LIST);
+  tree_stmt_iterator tsi;
+  tree return_expr = make_node (BIND_EXPR);
+  BIND_EXPR_BODY (return_expr) = alloc_stmt_list ();
+  bool found = false; 
+  tree opp_stmt_list = NULL_TREE;
+  vec <tree, va_gc> *cfor_vars = find_killed_vars (stmt_list);
+
+  stmt_list = insert_init_exprs (stmt_list, cfor_vars, &opp_stmt_list);
+  
+  /* If there is a supplied list of vars then there is no reason to find them 
+     again.  */
+  if (vars != NULL_TREE)
+    found = true;
+
+  BIND_EXPR_VARS (return_expr) = vars;
+  for (tsi = tsi_start (stmt_list); !tsi_end_p (tsi); tsi_next (&tsi))
+    {
+      /* Only do the adding of BIND_EXPR_VARS the first time since they are
+	 already "chained-on."  */
+      if (!found && TREE_CODE (tsi_stmt (tsi)) == DECL_EXPR)
+	{
+	  tree var = DECL_EXPR_DECL (tsi_stmt (tsi));
+	  BIND_EXPR_VARS (return_expr) = var;
+	  found = true;
+	}
+      else
+	append_to_statement_list (tsi_stmt (tsi),
+				  &BIND_EXPR_BODY (return_expr));
+    }
+  cp_walk_tree (&cfor_par_list, insert_opp_before_cfor, &opp_stmt_list, NULL);
+  append_to_statement_list (cfor_par_list, &BIND_EXPR_BODY (return_expr));
+  return return_expr;
+}
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 7681b27..c665384 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6206,6 +6206,8 @@ extern void vtv_build_vtable_verify_fndecl      (void);
 
 /* In cp-cilkplus.c.  */
 extern bool cpp_validate_cilk_plus_loop		(tree);
+extern tree copy_tree_till_cilk_for             (tree *);
+extern tree cilk_for_create_bind_expr           (tree, tree, tree);
 
 /* In cp/cp-array-notations.c */
 extern tree expand_array_notation_exprs         (tree);
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 57001c6..e4c9b57 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -237,8 +237,8 @@ static void cp_parser_initial_pragma
 static tree cp_literal_operator_id
   (const char *);
 
-static void cp_parser_cilk_simd
-  (cp_parser *, cp_token *);
+static tree cp_parser_cilk_simd
+  (cp_parser *, cp_token *, tree);
 static bool cp_parser_omp_declare_reduction_exprs
   (tree, cp_parser *);
 static tree cp_parser_cilk_simd_vectorlength 
@@ -9368,6 +9368,18 @@ cp_parser_statement (cp_parser* parser, tree in_statement_expr,
 	  statement = cp_parser_iteration_statement (parser, false);
 	  break;
 
+	case RID_CILK_FOR:
+	  if (!flag_cilkplus)
+	    {
+	      error_at (cp_lexer_peek_token (parser->lexer)->location,
+			"-fcilkplus must be enabled to use %<_Cilk_for%>");
+	      cp_lexer_consume_token (parser->lexer);
+	      statement = error_mark_node;
+	    }
+	  else
+	    statement = cp_parser_cilk_simd (parser, NULL, integer_zero_node);
+	  break;
+	  
 	case RID_BREAK:
 	case RID_CONTINUE:
 	case RID_RETURN:
@@ -28835,7 +28847,7 @@ cp_parser_omp_for_cond (cp_parser *parser, tree decl, enum tree_code code)
     case LE_EXPR:
       break;
     case NE_EXPR:
-      if (code == CILK_SIMD)
+      if (code == CILK_SIMD || code == CILK_FOR)
 	break;
       /* Fall through: OpenMP disallows NE_EXPR.  */
     default:
@@ -29131,7 +29143,7 @@ cp_parser_omp_for_loop_init (cp_parser *parser,
 
 static tree
 cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
-			tree *cclauses)
+			tree *cclauses, tree *cfor_block)
 {
   tree init, cond, incr, body, decl, pre_body = NULL_TREE, ret;
   tree real_decl, initv, condv, incrv, declv;
@@ -29160,11 +29172,18 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
       bool add_private_clause = false;
       location_t loc;
 
-      if (!cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
+      if (code == CILK_SIMD
+	  && !cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
 	{
 	  cp_parser_error (parser, "for statement expected");
 	  return NULL;
 	}
+      if (code == CILK_FOR
+	  && !cp_lexer_next_token_is_keyword (parser->lexer, RID_CILK_FOR))
+	{
+	  cp_parser_error (parser, "_Cilk_for statement expected");
+	  return NULL;
+	}
       loc = cp_lexer_consume_token (parser->lexer)->location;
 
       if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
@@ -29173,13 +29192,26 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
       init = decl = real_decl = NULL;
       this_pre_body = push_stmt_list ();
 
+      if (code == CILK_FOR
+	  && cp_lexer_next_token_is_keyword (parser->lexer, RID_STATIC))
+	{
+	  error_at (cp_lexer_peek_token (parser->lexer)->location,
+		    "induction variable cannot be static");
+	  cp_lexer_consume_token (parser->lexer);
+	}
       add_private_clause
 	|= cp_parser_omp_for_loop_init (parser,
-					/*parsing_openmp=*/code != CILK_SIMD,
+					/*parsing_openmp=*/
+					(code != CILK_SIMD && code != CILK_FOR),
 					this_pre_body, for_block,
 					init, decl, real_decl);
 
-      cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
+      if (!cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON)
+	  && code == CILK_FOR)
+	{
+	  cp_parser_skip_to_end_of_statement (parser);
+	  cp_parser_consume_semicolon_at_end_of_statement (parser);
+	}
       if (this_pre_body)
 	{
 	  this_pre_body = pop_stmt_list (this_pre_body);
@@ -29337,7 +29369,7 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
 
   /* Note that we saved the original contents of this flag when we entered
      the structured block, and so we don't need to re-save it here.  */
-  if (code == CILK_SIMD)
+  if (code == CILK_SIMD || code == CILK_FOR)
     parser->in_statement = IN_CILK_SIMD_FOR;
   else
     parser->in_statement = IN_OMP_FOR;
@@ -29378,7 +29410,17 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
     }
 
   while (!for_block->is_empty ())
-    add_stmt (pop_stmt_list (for_block->pop ()));
+    {
+      tree t = pop_stmt_list (for_block->pop ());
+
+      /* Remove all the statements between the head of statement list and
+	 _Cilk_for statement and store them in *cfor_block.  These statements
+	 are hoisted above the #pragma omp parallel.  */
+      if (code == CILK_FOR && cfor_block != NULL)
+	*cfor_block = copy_tree_till_cilk_for (&t);
+      add_stmt (t);
+
+    }
   release_tree_vector (for_block);
 
   return ret;
@@ -29434,7 +29476,7 @@ cp_parser_omp_simd (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_SIMD, clauses, cclauses);
+  ret = cp_parser_omp_for_loop (parser, OMP_SIMD, clauses, cclauses, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -29522,7 +29564,7 @@ cp_parser_omp_for (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_FOR, clauses, cclauses);
+  ret = cp_parser_omp_for_loop (parser, OMP_FOR, clauses, cclauses, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -29994,7 +30036,7 @@ cp_parser_omp_distribute (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_DISTRIBUTE, clauses, NULL);
+  ret = cp_parser_omp_for_loop (parser, OMP_DISTRIBUTE, clauses, NULL, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -31282,6 +31324,38 @@ cp_parser_initial_pragma (cp_token *first_token)
   cp_lexer_get_preprocessor_token (NULL, first_token);
 }
 
+/* Parses the grainsize pragma for the _Cilk_for statement.
+   Syntax:
+   #pragma cilk grainsize = <VALUE>.  */
+
+static void
+cp_parser_cilk_grainsize (cp_parser *parser, cp_token *pragma_tok)
+{
+  if (cp_parser_require (parser, CPP_EQ, RT_EQ))
+    {
+      tree exp = cp_parser_binary_expression (parser, false, false,
+                                              PREC_NOT_OPERATOR, NULL);
+      cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+      if (!exp || exp == error_mark_node)
+        {
+          error_at (pragma_tok->location, "invalid grainsize for _Cilk_for");
+          return;
+        }
+      cp_token *n_tok = cp_lexer_peek_token (parser->lexer);
+
+      /* Make sure the next token is _Cilk_for, it is invalid otherwise.  */
+      if (n_tok && n_tok->type == CPP_KEYWORD
+	  && n_tok->keyword == RID_CILK_FOR) 
+	cp_parser_cilk_simd (parser, NULL, exp);
+      else
+	warning_at (cp_lexer_peek_token (parser->lexer)->location, 0,
+		    "%<#pragma cilk grainsize%> is not followed by "
+		    "%<_Cilk_for%>");
+      return;
+    }
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+}
+
 /* Normal parsing of a pragma token.  Here we can (and must) use the
    regular lexer.  */
 
@@ -31461,9 +31535,30 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context)
 		    "%<#pragma simd%> must be inside a function");
 	  break;
 	}
-      cp_parser_cilk_simd (parser, pragma_tok);
+      cp_parser_cilk_simd (parser, pragma_tok, NULL_TREE);
       return true;
 
+    case PRAGMA_CILK_GRAINSIZE:
+      if (context == pragma_external)
+        {
+          error_at (pragma_tok->location,
+                    "%<#pragma cilk grainsize%> must be inside a function");
+          break;
+        }
+
+      /* Reject the pragma if Cilk Plus is not enabled.  */
+      if (flag_cilkplus)
+        {
+          cp_parser_cilk_grainsize (parser, pragma_tok);
+          return true;
+        }
+      else
+        {
+          error_at (pragma_tok->location, "-fcilkplus must be enabled to use "
+                    "%<#pragma cilk grainsize%>");
+          break;
+	}
+
     default:
       gcc_assert (id >= PRAGMA_FIRST_EXTERNAL);
       c_invoke_pragma_handler (id);
@@ -31781,31 +31876,102 @@ cp_parser_cilk_simd_all_clauses (cp_parser *parser, cp_token *pragma_token)
     return c_finish_cilk_clauses (clauses);
 }
 
-/* Main entry-point for parsing Cilk Plus <#pragma simd> for loops.  */
+/* Main entry-point for parsing Cilk Plus <#pragma simd> for and _Cilk_for
+   loops.  This function returns NULL_TREE whenever it is parsing the
+   <#pragma simd> for because the caller does not check the return value.
+   _Cilk_for's caller checks this value, so for _Cilk_for it returns
+   error_mark_node when errors happen and a valid tree otherwise.  */
 
-static void
-cp_parser_cilk_simd (cp_parser *parser, cp_token *pragma_token)
+static tree
+cp_parser_cilk_simd (cp_parser *parser, cp_token *pragma_token, tree grain)
 {
-  tree clauses = cp_parser_cilk_simd_all_clauses (parser, pragma_token);
-
+  bool is_cilk_for = pragma_token == NULL;
+  
+  tree clauses = NULL_TREE;
+  if (!is_cilk_for)
+    clauses = cp_parser_cilk_simd_all_clauses (parser, pragma_token);
+  
   if (clauses == error_mark_node)
-    return;
+    return NULL_TREE;
   
-  if (cp_lexer_next_token_is_not_keyword (parser->lexer, RID_FOR))
+  if (!is_cilk_for
+      && cp_lexer_next_token_is_not_keyword (parser->lexer, RID_FOR))
     {
       error_at (cp_lexer_peek_token (parser->lexer)->location,
 		"for statement expected");
-      return;
+      return NULL_TREE;
+    }
+  if (is_cilk_for
+      && cp_lexer_next_token_is_not_keyword (parser->lexer, RID_CILK_FOR))
+    {
+      error_at (cp_lexer_peek_token (parser->lexer)->location,
+		"_Cilk_for statement expected");
+      return error_mark_node;
     }
 
+  tree top_block = NULL_TREE, topmost_blk = NULL_TREE;
+  if (is_cilk_for)
+    {
+      topmost_blk = push_stmt_list ();
+      top_block = begin_omp_parallel ();
+    }
+  
   tree sb = begin_omp_structured_block ();
   int save = cp_parser_begin_omp_structured_block (parser);
-  tree ret = cp_parser_omp_for_loop (parser, CILK_SIMD, clauses, NULL);
+   
+  enum tree_code code = is_cilk_for ? CILK_FOR : CILK_SIMD;
+  tree cfor_blk = NULL_TREE;
+  tree ret = cp_parser_omp_for_loop (parser, code, clauses, NULL, &cfor_blk);
   if (ret)
     cpp_validate_cilk_plus_loop (OMP_FOR_BODY (ret));
+  
+  /* For _Cilk_for statements, the grain value is stored in a SCHEDULE
+     clause.  */
+  if (is_cilk_for && ret)
+    {
+      tree l = build_omp_clause (EXPR_LOCATION (grain), OMP_CLAUSE_SCHEDULE);
+      OMP_CLAUSE_SCHEDULE_KIND (l) = OMP_CLAUSE_SCHEDULE_CILKFOR;
+      OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (l) = grain;
+      OMP_CLAUSE_CHAIN (l) = OMP_FOR_CLAUSES (ret);
+      OMP_FOR_CLAUSES (ret) = l;
+    }
   cp_parser_end_omp_structured_block (parser, save);
-  add_stmt (finish_omp_structured_block (sb));
-  return;
+
+  if (!is_cilk_for)
+    {
+      add_stmt (finish_omp_structured_block (sb));
+      return NULL_TREE;
+    }
+
+  tree sb_block = finish_omp_structured_block (sb);
+  tree vars = NULL_TREE, sb_blk_body = sb_block;
+
+  /* For iterators, cfor_blk holds the mapping from original vector
+     iterators to the integer ones that c_finish_omp_for remaps.
+     This info must be pushed above the #pragma omp parallel so that
+     the IF clause (that holds the loop-count) can use them to compute the
+     loop-count.  */
+  if (TREE_CODE (sb_block) == BIND_EXPR && cfor_blk != NULL_TREE)
+    {
+      vars = BIND_EXPR_VARS (sb_block);
+      sb_blk_body = BIND_EXPR_BODY (sb_block);
+    }
+
+  add_stmt (sb_blk_body);
+  tree parallel_clauses = NULL_TREE;
+  cilk_for_move_clauses_upward (&parallel_clauses, ret);
+  tree stmt = finish_omp_parallel (parallel_clauses, top_block);
+  OMP_PARALLEL_COMBINED (stmt) = 1;
+  topmost_blk = pop_stmt_list (topmost_blk);
+
+  if (cfor_blk != NULL_TREE)
+    {
+      tree bind_expr = cilk_for_create_bind_expr (vars, cfor_blk, topmost_blk);
+      add_stmt (bind_expr);
+      return bind_expr;
+    }
+  add_stmt (topmost_blk);
+  return topmost_blk;
 }
 
 /* Create an identifier for a generic parameter type (a synthesized
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 7967db8..7b60b6e 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13584,6 +13584,9 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
 				args, complain, in_decl);
       stmt = begin_omp_parallel ();
       RECUR (OMP_PARALLEL_BODY (t));
+      if (flag_cilkplus
+	  && TREE_CODE (OMP_PARALLEL_BODY (t)) == CILK_FOR)
+	cilk_for_move_clauses_upward (&tmp, stmt);
       OMP_PARALLEL_COMBINED (finish_omp_parallel (tmp, stmt))
 	= OMP_PARALLEL_COMBINED (t);
       break;
@@ -13599,6 +13602,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
     case OMP_FOR:
     case OMP_SIMD:
     case CILK_SIMD:
+    case CILK_FOR:
     case OMP_DISTRIBUTE:
       {
 	tree clauses, body, pre_body;
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 9fb4fc0..8388a6b 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -6058,6 +6058,7 @@ handle_omp_for_class_iterator (int i, location_t locus, tree declv, tree initv,
     case GE_EXPR:
     case LT_EXPR:
     case LE_EXPR:
+    case NE_EXPR:
       if (TREE_OPERAND (cond, 1) == iter)
 	cond = build2 (swap_tree_comparison (TREE_CODE (cond)),
 		       TREE_TYPE (cond), iter, TREE_OPERAND (cond, 0));
@@ -6470,12 +6471,20 @@ finish_omp_for (location_t locus, enum tree_code code, tree declv, tree initv,
   if (IS_EMPTY_STMT (pre_body))
     pre_body = NULL;
 
+  tree count = NULL_TREE;
   omp_for = c_finish_omp_for (locus, code, declv, initv, condv, incrv,
-			      body, pre_body);
+			      body, pre_body, &count);
 
   if (omp_for == NULL)
     return NULL;
 
+  if (code == CILK_FOR)
+    {
+      tree c = build_omp_clause (EXPR_LOCATION (count), OMP_CLAUSE_IF);
+      OMP_CLAUSE_IF_EXPR (c) = count;
+      clauses = chainon (clauses, c);
+    }
+
   for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INCR (omp_for)); i++)
     {
       decl = TREE_OPERAND (TREE_VEC_ELT (OMP_FOR_INIT (omp_for), i), 0);
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 2d1e1c7..6feb8f1 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "value-prof.h"
 #include "trans-mem.h"
 
+static void dump_gimple_omp_parallel (pretty_printer *, gimple, int, int,
+				      bool);
 #define INDENT(SPACE)							\
   do { int i; for (i = 0; i < SPACE; i++) pp_space (buffer); } while (0)
 
@@ -1124,6 +1126,10 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	case GF_OMP_FOR_KIND_DISTRIBUTE:
 	  kind = " distribute";
 	  break;
+	case GF_OMP_FOR_KIND_CILKFOR:
+	  gcc_assert (flag_cilkplus);
+	  kind = "";
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -1158,6 +1164,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	case GF_OMP_FOR_KIND_DISTRIBUTE:
 	  pp_string (buffer, "#pragma omp distribute");
 	  break;
+	case GF_OMP_FOR_KIND_CILKFOR:
+	  gcc_assert (flag_cilkplus);
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -1167,7 +1176,11 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	  if (i)
 	    spc += 2;
 	  newline_and_indent (buffer, spc);
-	  pp_string (buffer, "for (");
+	  if (flag_cilkplus 
+	      && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
+	    pp_string (buffer, "_Cilk_for (");
+	  else
+	    pp_string (buffer, "for (");
 	  dump_generic_node (buffer, gimple_omp_for_index (gs, i), spc,
 			     flags, false);
 	  pp_string (buffer, " = ");
@@ -1192,6 +1205,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	    case GE_EXPR:
 	      pp_greater_equal (buffer);
 	      break;
+	    case NE_EXPR:
+	      pp_string (buffer, "!=");
+	      break;
 	    default:
 	      gcc_unreachable ();
 	    }
@@ -1211,11 +1227,18 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
       if (!gimple_seq_empty_p (gimple_omp_body (gs)))
 	{
 	  newline_and_indent (buffer, spc + 2);
-	  pp_left_brace (buffer);
-	  pp_newline (buffer);
-	  dump_gimple_seq (buffer, gimple_omp_body (gs), spc + 4, flags);
-	  newline_and_indent (buffer, spc + 2);
-	  pp_right_brace (buffer);
+	  if (flag_cilkplus
+	      && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
+	    dump_gimple_seq (buffer, gimple_omp_body (gs),
+			     spc + 4, flags);
+	  else
+	    {
+	      pp_left_brace (buffer);
+	      pp_newline (buffer);
+	      dump_gimple_seq (buffer, gimple_omp_body (gs), spc + 4, flags);
+	      newline_and_indent (buffer, spc + 2);
+	      pp_right_brace (buffer);
+	    }
 	}
     }
 }
@@ -1846,7 +1869,7 @@ dump_gimple_phi (pretty_printer *buffer, gimple phi, int spc, bool comment,
 
 static void
 dump_gimple_omp_parallel (pretty_printer *buffer, gimple gs, int spc,
-                          int flags)
+                          int flags, bool is_cilk_for)
 {
   if (flags & TDF_RAW)
     {
@@ -1860,7 +1883,10 @@ dump_gimple_omp_parallel (pretty_printer *buffer, gimple gs, int spc,
   else
     {
       gimple_seq body;
-      pp_string (buffer, "#pragma omp parallel");
+      if (is_cilk_for) 
+	pp_string (buffer, "compiler-inserted clauses for cilk-for body: ");
+      else
+	pp_string (buffer, "#pragma omp parallel");
       dump_omp_clauses (buffer, gimple_omp_parallel_clauses (gs), spc, flags);
       if (gimple_omp_parallel_child_fn (gs))
 	{
@@ -2137,7 +2163,7 @@ pp_gimple_stmt_1 (pretty_printer *buffer, gimple gs, int spc, int flags)
       break;
 
     case GIMPLE_OMP_PARALLEL:
-      dump_gimple_omp_parallel (buffer, gs, spc, flags);
+      dump_gimple_omp_parallel (buffer, gs, spc, flags, false);
       break;
 
     case GIMPLE_OMP_TASK:
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 0e80d2e..194045c 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -91,13 +91,14 @@ enum gf_mask {
     GF_CALL_ALLOCA_FOR_VAR	= 1 << 5,
     GF_CALL_INTERNAL		= 1 << 6,
     GF_OMP_PARALLEL_COMBINED	= 1 << 0,
-    GF_OMP_FOR_KIND_MASK	= 3 << 0,
+    GF_OMP_FOR_KIND_MASK	= 7 << 0,
     GF_OMP_FOR_KIND_FOR		= 0 << 0,
     GF_OMP_FOR_KIND_DISTRIBUTE	= 1 << 0,
     GF_OMP_FOR_KIND_SIMD	= 2 << 0,
     GF_OMP_FOR_KIND_CILKSIMD	= 3 << 0,
-    GF_OMP_FOR_COMBINED		= 1 << 2,
-    GF_OMP_FOR_COMBINED_INTO	= 1 << 3,
+    GF_OMP_FOR_KIND_CILKFOR	= 4 << 0,
+    GF_OMP_FOR_COMBINED		= 1 << 3,
+    GF_OMP_FOR_COMBINED_INTO	= 1 << 4,
     GF_OMP_TARGET_KIND_MASK	= 3 << 0,
     GF_OMP_TARGET_KIND_REGION	= 0 << 0,
     GF_OMP_TARGET_KIND_DATA	= 1 << 0,
@@ -4563,6 +4564,16 @@ gimple_omp_for_set_pre_body (gimple gs, gimple_seq pre_body)
   omp_for_stmt->pre_body = pre_body;
 }
 
+/* Returns the induction variable of type TREE from GS that is of type 
+   GIMPLE_STATEMENT_OMP_FOR.  */
+
+static inline tree
+gimple_cilk_for_induction_var (const_gimple gs)
+{
+  const gimple_statement_omp_for *cilk_for_stmt =
+    as_a <const gimple_statement_omp_for> (gs);
+  return cilk_for_stmt->iter->index;
+}
 
 /* Return the clauses associated with OMP_PARALLEL GS.  */
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 9c9998d..8f5e15d 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -5849,7 +5849,8 @@ omp_check_private (struct gimplify_omp_ctx *ctx, tree decl, bool copyprivate)
 
 static void
 gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
-			   enum omp_region_type region_type)
+			   enum omp_region_type region_type,
+			   bool is_cilk_for)
 {
   struct gimplify_omp_ctx *ctx, *outer_ctx;
   tree c;
@@ -6079,6 +6080,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 
 	case OMP_CLAUSE_FINAL:
 	case OMP_CLAUSE_IF:
+	  /* In _Cilk_for we insert an IF clause as a mechanism to
+	     pass in the count information.  So, there is no reason to
+	     boolify it.  */
+	  if (!is_cilk_for)
-	  OMP_CLAUSE_OPERAND (c, 0)
-	    = gimple_boolify (OMP_CLAUSE_OPERAND (c, 0));
+	    OMP_CLAUSE_OPERAND (c, 0)
+	      = gimple_boolify (OMP_CLAUSE_OPERAND (c, 0));
 	  /* Fall through.  */
@@ -6458,11 +6463,17 @@ gimplify_omp_parallel (tree *expr_p, gimple_seq *pre_p)
   tree expr = *expr_p;
   gimple g;
   gimple_seq body = NULL;
+  bool is_cilk_for = false;
 
+  for (tree c = OMP_PARALLEL_CLAUSES (expr); c; c = OMP_CLAUSE_CHAIN (c))
+    if (flag_cilkplus && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SCHEDULE
+	&& OMP_CLAUSE_SCHEDULE_KIND (c) == OMP_CLAUSE_SCHEDULE_CILKFOR)
+      is_cilk_for = true;
+  
   gimplify_scan_omp_clauses (&OMP_PARALLEL_CLAUSES (expr), pre_p,
 			     OMP_PARALLEL_COMBINED (expr)
 			     ? ORT_COMBINED_PARALLEL
-			     : ORT_PARALLEL);
+			     : ORT_PARALLEL, is_cilk_for);
 
   push_gimplify_context ();
 
@@ -6498,7 +6509,7 @@ gimplify_omp_task (tree *expr_p, gimple_seq *pre_p)
   gimplify_scan_omp_clauses (&OMP_TASK_CLAUSES (expr), pre_p,
 			     find_omp_clause (OMP_TASK_CLAUSES (expr),
 					      OMP_CLAUSE_UNTIED)
-			     ? ORT_UNTIED_TASK : ORT_TASK);
+			     ? ORT_UNTIED_TASK : ORT_TASK, false);
 
   push_gimplify_context ();
 
@@ -6563,8 +6574,9 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 
   simd = TREE_CODE (for_stmt) == OMP_SIMD
     || TREE_CODE (for_stmt) == CILK_SIMD;
-  gimplify_scan_omp_clauses (&OMP_FOR_CLAUSES (for_stmt), pre_p,
-			     simd ? ORT_SIMD : ORT_WORKSHARE);
+  gimplify_scan_omp_clauses (&OMP_FOR_CLAUSES (for_stmt), pre_p,
+			     simd ? ORT_SIMD : ORT_WORKSHARE,
+			     TREE_CODE (for_stmt) == CILK_FOR);
 
   /* Handle OMP_FOR_INIT.  */
   for_pre_body = NULL;
@@ -6825,6 +6837,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
     case OMP_FOR: kind = GF_OMP_FOR_KIND_FOR; break;
     case OMP_SIMD: kind = GF_OMP_FOR_KIND_SIMD; break;
     case CILK_SIMD: kind = GF_OMP_FOR_KIND_CILKSIMD; break;
+    case CILK_FOR: kind = GF_OMP_FOR_KIND_CILKFOR; break;
     case OMP_DISTRIBUTE: kind = GF_OMP_FOR_KIND_DISTRIBUTE; break;
     default:
       gcc_unreachable ();
@@ -6895,7 +6908,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
     default:
       gcc_unreachable ();
     }
-  gimplify_scan_omp_clauses (&OMP_CLAUSES (expr), pre_p, ort);
+  gimplify_scan_omp_clauses (&OMP_CLAUSES (expr), pre_p, ort, false);
   if (ort == ORT_TARGET || ort == ORT_TARGET_DATA)
     {
       push_gimplify_context ();
@@ -6955,7 +6968,7 @@ gimplify_omp_target_update (tree *expr_p, gimple_seq *pre_p)
   gimple stmt;
 
   gimplify_scan_omp_clauses (&OMP_TARGET_UPDATE_CLAUSES (expr), pre_p,
-			     ORT_WORKSHARE);
+			     ORT_WORKSHARE, false);
   gimplify_adjust_omp_clauses (&OMP_TARGET_UPDATE_CLAUSES (expr));
   stmt = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_UPDATE,
 				  OMP_TARGET_UPDATE_CLAUSES (expr));
@@ -7897,6 +7910,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	case OMP_FOR:
 	case OMP_SIMD:
 	case CILK_SIMD:
+	case CILK_FOR:
 	case OMP_DISTRIBUTE:
 	  ret = gimplify_omp_for (expr_p, pre_p);
 	  break;
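
The kind selected in gimplify_omp_for is stored in the same subcode word
as the COMBINED flags, which is why the gimple.h hunk widens
GF_OMP_FOR_KIND_MASK from 3 to 7 and shifts the flags up by one bit.  A
minimal standalone sketch of that packing follows.  The enumerator values
are copied from the hunk; the short names and the set_kind / get_kind /
get_kind_old_mask helpers are hypothetical and are not GCC functions.

```c
#include <assert.h>

/* Values mirror the gf_mask hunk in gimple.h; names are shortened
   for this sketch only.  */
enum
{
  KIND_MASK = 7 << 0,
  KIND_FOR = 0 << 0,
  KIND_DISTRIBUTE = 1 << 0,
  KIND_SIMD = 2 << 0,
  KIND_CILKSIMD = 3 << 0,
  KIND_CILKFOR = 4 << 0,
  FLAG_COMBINED = 1 << 3,
  FLAG_COMBINED_INTO = 1 << 4
};

/* Hypothetical helper: replace the kind bits of SUBCODE with KIND,
   leaving the flag bits alone.  */
static unsigned
set_kind (unsigned subcode, unsigned kind)
{
  assert ((kind & ~(unsigned) KIND_MASK) == 0);
  return (subcode & ~(unsigned) KIND_MASK) | kind;
}

/* Hypothetical helper: extract the kind with the widened mask.  */
static unsigned
get_kind (unsigned subcode)
{
  return subcode & (unsigned) KIND_MASK;
}

/* Hypothetical helper: extract the kind with the previous 2-bit mask,
   to show that the value 4 would not survive it.  */
static unsigned
get_kind_old_mask (unsigned subcode)
{
  return subcode & 3u;
}
```

With the old two-bit mask, a statement of kind 4 would read back as kind 0,
so it could not be told apart from a plain OMP for.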
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d7589aa..95f352f 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -71,6 +71,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-prop.h"
 #include "tree-nested.h"
 #include "tree-eh.h"
+#include "cilk.h"
 
 
 /* Lowering of OpenMP parallel and workshare constructs proceeds in two
@@ -198,6 +199,12 @@ struct omp_for_data
   struct omp_for_data_loop *loops;
 };
 
+/* A structure with the necessary elements from a _Cilk_for statement.
+   A pointer to it is passed to the walker in WALK_STMT_INFO->INFO.  */
+typedef struct cilk_for_information {
+  bool found;
+  tree induction_var;
+} cilk_for_info;
 
 static splay_tree all_contexts;
 static int taskreg_nesting_level;
@@ -314,6 +321,8 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
   fd->have_ordered = false;
   fd->sched_kind = OMP_CLAUSE_SCHEDULE_STATIC;
   fd->chunk_size = NULL_TREE;
+  if (gimple_omp_for_kind (fd->for_stmt) == GF_OMP_FOR_KIND_CILKFOR)
+    fd->sched_kind = OMP_CLAUSE_SCHEDULE_CILKFOR;
   collapse_iter = NULL;
   collapse_count = NULL;
 
@@ -392,7 +401,9 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
 	  break;
 	case NE_EXPR:
 	  gcc_assert (gimple_omp_for_kind (for_stmt)
-		      == GF_OMP_FOR_KIND_CILKSIMD);
+		      == GF_OMP_FOR_KIND_CILKSIMD
+		      || gimple_omp_for_kind (for_stmt)
+		      == GF_OMP_FOR_KIND_CILKFOR);
 	  break;
 	case LE_EXPR:
 	  if (POINTER_TYPE_P (TREE_TYPE (loop->n2)))
@@ -1818,27 +1829,120 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	scan_omp (&OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (c), ctx);
 }
 
-/* Create a new name for omp child function.  Returns an identifier.  */
+/* Create a new name for omp child function.  Returns an identifier.  If
+   IS_CILK_FOR is true then the suffix for the child function is
+   "_cilk_for_fn".  */
 
 static tree
-create_omp_child_function_name (bool task_copy)
+create_omp_child_function_name (bool task_copy, bool is_cilk_for)
 {
+  if (is_cilk_for)
+    return clone_function_name (current_function_decl, "_cilk_for_fn");
   return (clone_function_name (current_function_decl,
 			       task_copy ? "_omp_cpyfn" : "_omp_fn"));
 }
 
+/* Helper function for walk_gimple_seq.  *GSI_P is the gimple statement
+   iterator passed by walk_gimple_seq and WI->INFO points to the
+   CILK_FOR_INFO structure.  This function fills in that structure if it
+   finds a _Cilk_for statement at *GSI_P.  HANDLED_OPS_P is unused.  */
+
+static tree
+find_cilk_for_stmt (gimple_stmt_iterator *gsi_p,
+		    bool *handled_ops_p ATTRIBUTE_UNUSED,
+		    struct walk_stmt_info *wi)
+{
+  cilk_for_info *cf_info = (cilk_for_info *) wi->info;
+  gimple stmt = gsi_stmt (*gsi_p);
+
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR
+      && (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_CILKFOR)
+      /* For nested _Cilk_for statements, just look into the
+	 outer-most one.  */
+      && cf_info->found == false)
+    {
+      cf_info->found = true;
+      cf_info->induction_var = gimple_cilk_for_induction_var (stmt);
+    }
+  return NULL_TREE;
+}
+
+/* Returns true if STMT contains a CILK_FOR statement.  If one is found
+   and IND_VAR is not NULL, *IND_VAR is set to the induction variable
+   of the outermost _Cilk_for.  If no CILK_FOR statement is found,
+   *IND_VAR is left untouched.  The loop count is not returned here; it
+   is carried by the IF clause of the enclosing parallel.  */
+
+static bool
+is_cilk_for_stmt (gimple stmt, tree *ind_var)
+{
+  if (!flag_cilkplus)
+    return false;
+  if (gimple_code (stmt) == GIMPLE_OMP_PARALLEL)
+    stmt = gimple_omp_body (stmt);
+  if (gimple_code (stmt) == GIMPLE_BIND)
+    {
+      gimple_seq body = gimple_bind_body (stmt);
+      struct walk_stmt_info wi;
+      cilk_for_info cf_info;
+      memset (&cf_info, 0, sizeof (cilk_for_info));
+      memset (&wi, 0, sizeof (wi));
+      wi.info = &cf_info;
+      walk_gimple_seq (body, find_cilk_for_stmt, NULL, &wi);
+      if (cf_info.found)
+	{
+	  if (ind_var)
+	    *ind_var = cf_info.induction_var;
+	  return true;
+	}
+    }
+  return false;
+}
+
+/* Returns the type of the induction variable for the child function for
+   _Cilk_for and the types for __high and __low variables based on TYPE.  */
+
+static tree
+cilk_for_check_loop_diff_type (tree type)
+{
+  if (type == integer_type_node)
+    return type;
+  else if (TYPE_PRECISION (type) <= TYPE_PRECISION (uint32_type_node))
+    { 
+      if (TYPE_UNSIGNED (type)) 
+	return uint32_type_node;
+      else
+	return integer_type_node;
+    }
+  else
+    {
+      if (TYPE_UNSIGNED (type)) 
+	return uint64_type_node;
+      else
+	return long_long_integer_type_node;
+    }
+  gcc_unreachable ();
+}
+
 /* Build a decl for the omp child function.  It'll not contain a body
    yet, just the bare decl.  */
 
 static void
 create_omp_child_function (omp_context *ctx, bool task_copy)
 {
-  tree decl, type, name, t;
+  tree decl, type, name, t, ind_var = NULL_TREE;
 
-  name = create_omp_child_function_name (task_copy);
+  bool is_cilk_for = is_cilk_for_stmt (ctx->stmt, &ind_var);
+  tree cilk_var_type = (is_cilk_for ?
+    cilk_for_check_loop_diff_type (TREE_TYPE (ind_var)) : NULL_TREE);
+  
+  name = create_omp_child_function_name (task_copy, is_cilk_for);
   if (task_copy)
     type = build_function_type_list (void_type_node, ptr_type_node,
 				     ptr_type_node, NULL_TREE);
+  else if (is_cilk_for)
+    type = build_function_type_list (void_type_node, ptr_type_node,
+				     cilk_var_type, cilk_var_type, NULL_TREE);
   else
     type = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
 
@@ -1888,13 +1992,44 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
   DECL_CONTEXT (t) = decl;
   DECL_RESULT (decl) = t;
 
-  t = build_decl (DECL_SOURCE_LOCATION (decl),
-		  PARM_DECL, get_identifier (".omp_data_i"), ptr_type_node);
+  /* _Cilk_for's child function requires two extra parameters called
+     __low and __high that are set by the Cilk runtime when it calls this
+     function.  */
+  if (is_cilk_for)
+    {
+      t = build_decl (DECL_SOURCE_LOCATION (decl),
+		      PARM_DECL, get_identifier ("__high"), cilk_var_type);
+      DECL_ARTIFICIAL (t) = 1;
+      DECL_NAMELESS (t) = 1;
+      DECL_ARG_TYPE (t) = ptr_type_node;
+      DECL_CONTEXT (t) = current_function_decl;
+      TREE_USED (t) = 1;
+      TREE_ADDRESSABLE (t) = 1;
+      DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
+      DECL_ARGUMENTS (decl) = t;
+
+      t = build_decl (DECL_SOURCE_LOCATION (decl),
+		      PARM_DECL, get_identifier ("__low"), cilk_var_type);
+      DECL_ARTIFICIAL (t) = 1;
+      DECL_NAMELESS (t) = 1;
+      DECL_ARG_TYPE (t) = ptr_type_node;
+      DECL_CONTEXT (t) = current_function_decl;
+      TREE_USED (t) = 1;
+      TREE_ADDRESSABLE (t) = 1;
+      DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
+      DECL_ARGUMENTS (decl) = t;
+    }
+
+  tree data_name = get_identifier (".omp_data_i");
+  t = build_decl (DECL_SOURCE_LOCATION (decl), PARM_DECL, data_name,
+		  ptr_type_node);
   DECL_ARTIFICIAL (t) = 1;
   DECL_NAMELESS (t) = 1;
   DECL_ARG_TYPE (t) = ptr_type_node;
   DECL_CONTEXT (t) = current_function_decl;
   TREE_USED (t) = 1;
+  if (is_cilk_for)
+    DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
   DECL_ARGUMENTS (decl) = t;
   if (!task_copy)
     ctx->receiver_decl = t;
@@ -4313,6 +4448,43 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
 			    false, GSI_CONTINUE_LINKING);
 }
 
+/* Insert a function call whose name is FUNC_NAME with the information from
+   ENTRY_STMT into the basic_block BB.  */
+
+static void
+expand_cilk_for_call (basic_block bb, gimple entry_stmt, tree func_name)
+{
+  tree t, t1, t2;
+  gimple_stmt_iterator gsi;
+  vec <tree, va_gc> *args;
+
+  tree clauses = gimple_omp_parallel_clauses (entry_stmt);
+  tree grain = find_omp_clause (clauses, OMP_CLAUSE_SCHEDULE);
+  gcc_assert (grain != NULL_TREE);
+  grain = OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (grain);
+
+  tree count = find_omp_clause (clauses, OMP_CLAUSE_IF);
+  gcc_assert (count != NULL_TREE);
+  count = OMP_CLAUSE_IF_EXPR (count);
+  
+  gsi = gsi_last_bb (bb);
+  t = gimple_omp_parallel_data_arg (entry_stmt);
+  if (t == NULL)
+    t1 = null_pointer_node;
+  else
+    t1 = build_fold_addr_expr (t);
+  t2 = build_fold_addr_expr (gimple_omp_parallel_child_fn (entry_stmt));
+  
+  vec_alloc (args, 4);
+  args->quick_push (t2);
+  args->quick_push (t1);
+  args->quick_push (count);
+  args->quick_push (grain);
+  t = build_call_expr_loc_vec (UNKNOWN_LOCATION, func_name, args);
+
+  force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, false, 
+			    GSI_CONTINUE_LINKING);
+}
 
 /* Build the function call to GOMP_task to actually
    generate the task operation.  BB is the block where to insert the code.  */
@@ -4648,7 +4820,37 @@ expand_omp_taskreg (struct omp_region *region)
   entry_bb = region->entry;
   exit_bb = region->exit;
 
-  if (is_combined_parallel (region))
+  /* The way _Cilk_for is constructed in this compiler can be thought of
+     as a parallel omp_for.  But the inner workings between them are very
+     different so we need a way to differentiate between them.  Thus, we
+     added a new schedule type called OMP_CLAUSE_SCHEDULE_CILKFOR, which
+     marks the region as a _Cilk_for statement rather than a parallel
+     omp for.  */
+  bool is_cilk_for =
+    (flag_cilkplus && region->inner &&
+     (region->inner->sched_kind == OMP_CLAUSE_SCHEDULE_CILKFOR));
+
+  /* Extract the __high and __low parameter from the function.  */
+  tree high_arg = NULL_TREE, low_arg = NULL_TREE;
+  if (is_cilk_for)
+    {
+      for (tree ii_arg = DECL_ARGUMENTS (child_fn); ii_arg != NULL_TREE;
+	   ii_arg = TREE_CHAIN (ii_arg))
+	{
+	  if (!strcmp (IDENTIFIER_POINTER (DECL_NAME (ii_arg)), "__high"))
+	    high_arg = ii_arg;
+	  if (!strcmp (IDENTIFIER_POINTER (DECL_NAME (ii_arg)), "__low"))
+	    low_arg = ii_arg;
+	}
+      gcc_assert (high_arg);
+      gcc_assert (low_arg);
+    }
+  
+  if (is_cilk_for) 
+    /* If it is a _Cilk_for statement, it is modelled *like* a parallel for,
+       and the inner statement contains the name of the built-in function.  */
+    ws_args = region->inner->ws_args;
+  else if (is_combined_parallel (region))
     ws_args = region->ws_args;
   else
     ws_args = NULL;
@@ -4755,6 +4957,49 @@ expand_omp_taskreg (struct omp_region *region)
 	    }
 	}
 
+      /* For _Cilk_for, remove the placeholder calls to GET_NUM_THREADS
+	 and GET_THREAD_NUM emitted by expand_cilk_for and replace them
+	 with the __low and __high parameter values, respectively.  */
+      gimple high_assign = NULL, low_assign = NULL;
+      if (is_cilk_for)
+	{
+	  gimple_stmt_iterator gsi2 = gsi_start_bb (single_succ (entry_bb));
+	  while (!gsi_end_p (gsi2))
+	    {
+	      gimple stmt = gsi_stmt (gsi2);
+	
+	      if (gimple_call_builtin_p (stmt, BUILT_IN_OMP_GET_NUM_THREADS))
+		{
+		  /* There can only be one call to each of these two
+		     functions.  If there are multiple, then something
+		     went wrong somewhere.  */
+		  gcc_assert (low_assign == NULL);
+		  tree ltype = TREE_TYPE (gimple_get_lhs (stmt));
+		  tree tmp2 = create_tmp_reg (TREE_TYPE (low_arg), NULL);
+		  low_assign = gimple_build_assign 
+		    (gimple_get_lhs (stmt), fold_convert (ltype, tmp2));
+		  gsi_remove (&gsi2, true);
+		  gimple tmp_stmt = gimple_build_assign (tmp2, low_arg);
+		  gsi_insert_before (&gsi2, low_assign, GSI_NEW_STMT);
+		  gsi_insert_before (&gsi2, tmp_stmt, GSI_NEW_STMT);
+		}
+	      else if (gimple_call_builtin_p (stmt,
+					      BUILT_IN_OMP_GET_THREAD_NUM))
+		{
+		  gcc_assert (high_assign == NULL);
+		  tree htype = TREE_TYPE (gimple_get_lhs (stmt));
+		  tree tmp2 = create_tmp_reg (TREE_TYPE (high_arg), NULL);
+		  
+		  high_assign = gimple_build_assign 
+		    (gimple_get_lhs (stmt), fold_convert (htype, tmp2));
+		  gsi_remove (&gsi2, true);
+		  gimple tmp_stmt = gimple_build_assign (tmp2, high_arg);
+		  gsi_insert_before (&gsi2, high_assign, GSI_NEW_STMT);
+		  gsi_insert_before (&gsi2, tmp_stmt, GSI_NEW_STMT);
+		}
+	      gsi_next (&gsi2);
+	    }
+	}      
       /* Declare local variables needed in CHILD_CFUN.  */
       block = DECL_INITIAL (child_fn);
       BLOCK_VARS (block) = vec2chain (child_cfun->local_decls);
@@ -4830,7 +5075,7 @@ expand_omp_taskreg (struct omp_region *region)
 	}
       if (dstidx != num)
 	vec_safe_truncate (child_cfun->local_decls, dstidx);
-
+ 
       /* Inform the callgraph about the new function.  */
       DECL_STRUCT_FUNCTION (child_fn)->curr_properties = cfun->curr_properties;
       cgraph_add_new_function (child_fn, true);
@@ -4860,9 +5105,10 @@ expand_omp_taskreg (struct omp_region *region)
 	update_ssa (TODO_update_ssa);
       pop_cfun ();
     }
-
   /* Emit a library call to launch the children threads.  */
-  if (gimple_code (entry_stmt) == GIMPLE_OMP_PARALLEL)
+  if (is_cilk_for)
+    expand_cilk_for_call (new_bb, entry_stmt, (*ws_args)[0]);
+  else if (gimple_code (entry_stmt) == GIMPLE_OMP_PARALLEL)
     expand_parallel_call (region, new_bb, entry_stmt, ws_args);
   else
     expand_task_call (new_bb, entry_stmt);
@@ -6540,6 +6786,218 @@ expand_omp_for_static_chunk (struct omp_region *region,
     }
 }
 
+/* A subroutine of expand_omp_for.  Generate code for a _Cilk_for loop.
+   Given parameters:
+   for (V = N1; V cond N2; V += STEP) BODY;
+
+   where COND is "<" or ">", we generate pseudocode
+
+   for (ind_var = low; ind_var < high; ind_var++)
+     {
+       if (COND is "<")
+	 V = N1 + (ind_var * STEP);
+       else
+	 V = N1 - (ind_var * STEP);
+
+       <BODY>
+     }
+
+   In the above pseudocode, low and high are parameters of the child
+   function.  The function below emits temporaries that are initialized
+   by calls to two OMP builtins that cannot otherwise appear in the
+   body of a _Cilk_for (since OMP_FOR cannot be mixed with _Cilk_for).
+   expand_omp_taskreg later replaces these calls with the __low and
+   __high parameters.  */
+
+
+static void
+expand_cilk_for (struct omp_region *region, struct omp_for_data *fd)
+{
+  bool broken_loop = region->cont == NULL;
+  tree type = cilk_for_check_loop_diff_type (TREE_TYPE (fd->loop.v));
+  basic_block entry_bb = region->entry;
+  basic_block cont_bb = region->cont;
+  
+  gcc_assert (EDGE_COUNT (entry_bb->succs) == 2);
+  gcc_assert (broken_loop
+	      || BRANCH_EDGE (entry_bb)->dest == FALLTHRU_EDGE (cont_bb)->dest);
+  basic_block l0_bb = FALLTHRU_EDGE (entry_bb)->dest;
+  basic_block l1_bb, l2_bb;
+
+  if (!broken_loop)
+    {
+      gcc_assert (BRANCH_EDGE (cont_bb)->dest == l0_bb);
+      gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
+      l1_bb = split_block (cont_bb, last_stmt (cont_bb))->dest;
+      l2_bb = BRANCH_EDGE (entry_bb)->dest;
+    }
+  else
+    {
+      BRANCH_EDGE (entry_bb)->flags &= ~EDGE_ABNORMAL;
+      l1_bb = split_edge (BRANCH_EDGE (entry_bb));
+      l2_bb = single_succ (l1_bb);
+    }
+  basic_block exit_bb = region->exit;
+  basic_block l2_dom_bb = NULL;
+
+  gimple_stmt_iterator gsi = gsi_last_bb (entry_bb);
+
+  /* Below statements until the "tree high_val = ..." are pseudo statements 
+     used to pass information to be used by expand_omp_taskreg.
+     low_val and high_val will be replaced by the __low and __high
+     parameter from the child function.
+
+     The call_exprs part is a place-holder, it is mainly used 
+     to distinctly identify to the top-level part that this is
+     where we should put low and high (reasoning given in header 
+     comment).  */
+
+  tree t = build_call_expr
+    (builtin_decl_explicit (BUILT_IN_OMP_GET_NUM_THREADS), 0);
+  t = fold_convert (type, t);
+  tree low_val = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+					   GSI_SAME_STMT);
+  t = build_call_expr (builtin_decl_explicit (BUILT_IN_OMP_GET_THREAD_NUM),
+		       0);
+  t = fold_convert (TREE_TYPE (fd->loop.v), t);
+  tree high_val = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+					   GSI_SAME_STMT);
+
+  tree ind_var = create_tmp_reg (type, "__cilk_ind_var");
+  gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_FOR);
+  
+  /* Not needed in SSA form right now.  */
+  gcc_assert (!gimple_in_ssa_p (cfun));
+  if (l2_dom_bb == NULL)
+    l2_dom_bb = l1_bb;
+
+  tree n1 = low_val;
+  tree n2 = high_val;
+  
+  expand_omp_build_assign (&gsi, ind_var, n1);
+
+  /* Remove the GIMPLE_OMP_FOR statement.  */
+  gsi_remove (&gsi, true);
+
+  gimple stmt;
+  if (!broken_loop)
+    {
+      /* Code to control the increment goes in the CONT_BB.  */
+      gsi = gsi_last_bb (cont_bb);
+      stmt = gsi_stmt (gsi);
+      gcc_assert (gimple_code (stmt) == GIMPLE_OMP_CONTINUE);
+      enum tree_code code = PLUS_EXPR;
+      if (POINTER_TYPE_P (type))
+	t = fold_build_pointer_plus (ind_var, build_one_cst (type)); 
+      else
+	t = fold_build2 (code, type, ind_var, build_one_cst (type));
+      expand_omp_build_assign (&gsi, ind_var, t);
+
+      /* Remove GIMPLE_OMP_CONTINUE.  */
+      gsi_remove (&gsi, true);
+    }
+
+  /* Emit the condition in L1_BB.  */
+  gsi = gsi_start_bb (l1_bb);
+
+  tree step = fold_convert (type, fd->loop.step);
+  if (TREE_CODE (step) == INTEGER_CST && tree_int_cst_sgn (step) < 1)
+    step = fold_build1_loc (UNKNOWN_LOCATION, NEGATE_EXPR, type, step);
+  
+  t = build2 (MULT_EXPR, type, ind_var, step);
+  tree tmp = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp, t), GSI_NEW_STMT);
+
+  tree tmp2 = create_tmp_reg (type, NULL);
+  tree cvtd = fold_convert (type, fd->loop.n1);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp2, cvtd), GSI_NEW_STMT);
+  
+  if (fd->loop.cond_code == GE_EXPR || fd->loop.cond_code == GT_EXPR)
+    t = fold_build2 (MINUS_EXPR, type, tmp2, tmp);
+  else
+    t = fold_build2 (PLUS_EXPR, type, tmp2, tmp);
+
+  tmp = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp, t), GSI_NEW_STMT);
+
+  cvtd = fold_convert (TREE_TYPE (fd->loop.v), tmp);
+  gsi_insert_after (&gsi, gimple_build_assign (fd->loop.v, cvtd), 
+		    GSI_NEW_STMT);
+  
+  t = fold_convert (type, n2);
+  t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
+				false, GSI_CONTINUE_LINKING);
+  /* The condition is always '<' since the runtime will fill in the low
+     and high values.  */
+  t = build2 (LT_EXPR, boolean_type_node, ind_var, t);
+  stmt = gimple_build_cond_empty (t);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+  if (walk_tree (gimple_cond_lhs_ptr (stmt), expand_omp_regimplify_p,
+		 NULL, NULL)
+      || walk_tree (gimple_cond_rhs_ptr (stmt), expand_omp_regimplify_p,
+		    NULL, NULL))
+    {
+      gsi = gsi_for_stmt (stmt);
+      gimple_regimplify_operands (stmt, &gsi);
+    }
+
+  /* Remove GIMPLE_OMP_RETURN.  */
+  gsi = gsi_last_bb (exit_bb);
+  gsi_remove (&gsi, true);
+
+  /* Connect the new blocks.  */
+  remove_edge (FALLTHRU_EDGE (entry_bb));
+
+  edge e, ne;
+  if (!broken_loop)
+    {
+      remove_edge (BRANCH_EDGE (entry_bb));
+      make_edge (entry_bb, l1_bb, EDGE_FALLTHRU);
+
+      e = BRANCH_EDGE (l1_bb);
+      ne = FALLTHRU_EDGE (l1_bb);
+      e->flags = EDGE_TRUE_VALUE;
+    }
+  else
+    {
+      single_succ_edge (entry_bb)->flags = EDGE_FALLTHRU;
+
+      ne = single_succ_edge (l1_bb);
+      e = make_edge (l1_bb, l0_bb, EDGE_TRUE_VALUE);
+
+    }
+  ne->flags = EDGE_FALSE_VALUE;
+  e->probability = REG_BR_PROB_BASE * 7 / 8;
+  ne->probability = REG_BR_PROB_BASE / 8;
+
+  set_immediate_dominator (CDI_DOMINATORS, l1_bb, entry_bb);
+  set_immediate_dominator (CDI_DOMINATORS, l2_bb, l2_dom_bb);
+  set_immediate_dominator (CDI_DOMINATORS, l0_bb, l1_bb);
+
+  if (!broken_loop)
+    {
+      struct loop *loop = alloc_loop ();
+      loop->header = l1_bb;
+      loop->latch = cont_bb;
+      add_loop (loop, l1_bb->loop_father);
+      loop->safelen = INT_MAX;
+    }
+
+  /* Pick the correct library function based on the precision of the
+     induction variable type.  */
+  tree lib_fun = NULL_TREE;
+  if (TYPE_PRECISION (type) == 32)
+    lib_fun = cilk_for_32_fndecl;
+  else if (TYPE_PRECISION (type) == 64)
+    lib_fun = cilk_for_64_fndecl;
+  else
+    gcc_unreachable ();
+  
+  /* WS_ARGS contains the library function flavor to call:
+     __libcilkrts_cilk_for_64 or __libcilkrts_cilk_for_32.  */
+  vec_alloc (region->ws_args, 1);
+  region->ws_args->quick_push (lib_fun);
+}
 
 /* A subroutine of expand_omp_for.  Generate code for a simd non-worksharing
    loop.  Given parameters:
@@ -6880,6 +7338,8 @@ expand_omp_for (struct omp_region *region, gimple inner_stmt)
 
   if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_KIND_SIMD)
     expand_omp_simd (region, &fd);
+  else if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_KIND_CILKFOR)
+    expand_cilk_for (region, &fd);
   else if (fd.sched_kind == OMP_CLAUSE_SCHEDULE_STATIC
 	   && !fd.have_ordered)
     {
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c
new file mode 100644
index 0000000..8b6112b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c
@@ -0,0 +1,87 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <stdio.h>
+#endif
+
+static void check (int *Array, int start, int end, int incr, int value)
+{
+  int ii = 0;
+  for (ii = start;  ii < end; ii = ii + incr)
+    if (Array[ii] != value)
+      __builtin_abort ();
+#if HAVE_IO
+  printf ("Passed\n");
+#endif
+}
+
+static void check_reverse (int *Array, int start, int end, int incr, int value)
+{
+  int ii = 0;
+  for (ii = start; ii >= end; ii = ii - incr)
+    if (Array[ii] != value)
+      __builtin_abort ();
+#if HAVE_IO
+  printf ("Passed\n");
+#endif
+}
+
+
+int main (void)
+{
+  int Array[10];
+  int x = 9, y = 0, z = 3;
+
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array[ii] = 1133;
+  check (Array, 0, 10, 1, 1133);
+
+  _Cilk_for (int ii = 0; ii < 10; ++ii)
+    Array[ii] = 3311;
+  check (Array, 0, 10, 1, 3311);
+
+  _Cilk_for (int ii = 9; ii > -1; ii--)
+    Array[ii] = 4433;
+  check_reverse (Array, 9, 0, 1, 4433);
+
+  _Cilk_for (int ii = 9; ii > -1; --ii)
+    Array[ii] = 9988;
+  check_reverse (Array, 9, 0, 1, 9988);
+
+  _Cilk_for (int ii = 0; ii < 10; ++ii)
+    Array[ii] = 3311;
+  check (Array, 0, 10, 1, 3311);
+
+  _Cilk_for (int ii = 0; ii < 10; ii += 2)
+    Array[ii] = 1328;
+  check (Array, 0, 10, 2, 1328);
+
+  _Cilk_for (int ii = 9; ii >= 0; ii -= 2)
+    Array[ii] = 1738;
+  check_reverse (Array, 9, 0, 2, 1738);
+
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    {
+      if (ii % 2)
+	Array[ii] = 1343;
+      else
+	Array[ii] = 3413;
+    }
+
+  check (Array, 1, 10, 2, 1343); 
+  check (Array, 0, 10, 2, 3413); 
+
+  _Cilk_for (short cc = 0; cc < 10; cc++) 
+    Array[cc] = 1343;
+  check (Array, 0, 10,  1,1343);
+
+  _Cilk_for (short cc = 9; cc >= 0; cc--)
+    Array[cc] = 1348;
+  check_reverse (Array, 9, 0, 1, 1348);
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c
new file mode 100644
index 0000000..ed73c34
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=c99" { target c } } */
+
+
+int main (void)
+{
+  int q = 0, ii = 0, jj = 0;
+
+  _Cilk_for (int ii; ii < 10; ii++) /* { dg-error "is not initialized" "" { target c } } */
+    /* { dg-error "expected" "" { target c++ } 10 } */
+    q = 5;
+
+  _Cilk_for (; ii < 10; ii++) /* { dg-error "expected iteration declaration" } */
+    q = 2;
+
+  _Cilk_for (int ii = 0; ; ii++) /* { dg-error "missing controlling predicate" } */
+    q = 2;
+
+  _Cilk_for (int ii = 0; ii < 10, jj < 10; ii++)  /* { dg-error "expected ';' before ',' token" "" { target c } } */
+    /* { dg-error "invalid controlling predicate" "" { target c++ }  20 } */
+    q = 5;
+
+  _Cilk_for (int ii = 0; ii < 10; ) /* { dg-error "missing increment" } */
+    q = 5;
+
+
+  _Cilk_for (int ii = 0, jj = 0; ii < 10; ii++) /* { dg-error "expected" } */ 
+    q = 5;
+
+  _Cilk_for (volatile int vii = 0; vii < 10; vii++) /* { dg-error "iteration variable cannot be volatile" } */
+    q = 5;
+
+ 
+  _Cilk_for (static int sii = 0; sii < 10; sii++) /* { dg-error "static" } */
+
+    q = 5;
+
+
+  _Cilk_for (float fii = 3.47; fii < 5.23; fii++) /* { dg-error "invalid type for iteration variable" } */
+    q = 5;
+
+
+  _Cilk_for (int ii = 0; 10 > jj; ii++) /* { dg-error "invalid controlling predicate" } */
+    q = 5;
+
+  _Cilk_for (int ii = 0; ii < 10; ii >> 1) /* { dg-error "invalid increment expression" } */
+    q = 5;
+
+  _Cilk_for (int ii = 10; ii >= 0; ii--) /* This is OK!  */
+    q = 5;
+
+  _Cilk_for (int ii; ii < 10; ii++) /* { dg-error "is not initialized" "" { target c } } */ 
+    /* { dg-error "expected" "" { target c++ }  53 } */
+    q = 5;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c
new file mode 100644
index 0000000..6cb9b03
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c
@@ -0,0 +1,35 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+int grain_value = 2;
+int main (void)
+{
+  int Array1[200], Array1_Serial[200];
+
+  for (int ii = 0; ii < 200; ii++)
+    {
+      Array1_Serial[ii] = 2;
+      Array1[ii] = 1;
+    }
+
+#pragma cilk grainsize = 2
+  _Cilk_for (int ii = 0; ii < 200; ii++)
+    Array1[ii] = 2;
+
+  for (int ii = 0; ii < 200; ii++)
+    if (Array1[ii] != Array1_Serial[ii])
+      return (ii+1);
+
+#pragma cilk grainsize = grain_value
+  _Cilk_for (int ii = 0; ii < 200; ii++) 
+    Array1[ii] = 2;
+
+  for (int ii = 0; ii < 200; ii++)
+    if (Array1[ii] != Array1_Serial[ii])
+      return (ii+1);
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c
new file mode 100644
index 0000000..e1e3217
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus -Wunknown-pragmas" } */
+/* { dg-additional-options "-std=c99" { target c } } */
+
+
+char Array1[26];
+
+#pragma cilk grainsize = 2 /* { dg-error "must be inside a function" } */
+
+int main(int argc, char **argv)
+{
+/* This is OK.  */
+#pragma cilk grainsize = 2
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsize 2 /* { dg-error "expected '=' before numeric constant" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsiz = 2 /* { dg-warning "ignoring #pragma cilk grainsiz" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+
+/* This is OK, it will do a type conversion to long int.  */
+#pragma cilk grainsize = 0.5 
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsize = 1 
+  while (Array1[5] != 0) /* { dg-warning "is not followed by" } */
+    {
+    /* Blah */
+    }
+
+#pragma cilk grainsize = 1 
+  int q = 0; /* { dg-warning "is not followed by" } */
+  _Cilk_for (q = 0; q < 10; q++)
+    Array1[q]  = 5;
+
+  while (Array1[5] != 0)
+    {
+    /* Blah */
+    }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c
new file mode 100644
index 0000000..7a779f7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c
@@ -0,0 +1,41 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+
+/* <feature> loop control variable must have integer, pointer or class type
+   </feature>
+*/
+
+#define ARRAY_SIZE 10000
+int a[ARRAY_SIZE];
+
+int main(void)
+{ 
+  int ii = 0;
+
+#if 1
+  for (ii =0; ii < ARRAY_SIZE; ii++)
+    a[ii] = 5;
+#endif
+  _Cilk_for(int *aa = a; aa < a + ARRAY_SIZE; aa++) 
+    *aa = 0;
+#if 1
+  for (ii = 0; ii < ARRAY_SIZE; ii++) 
+    if (a[ii] != 0) 
+      __builtin_abort ();
+#endif
+
+  _Cilk_for (int *aa = a; aa < a + ARRAY_SIZE; aa = aa + 2)
+    *aa = 4;
+
+#if 1
+  for (ii = 0; ii < ARRAY_SIZE; ii = ii + 2) 
+    if (a[ii] != 4) 
+      __builtin_abort ();
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c
new file mode 100644
index 0000000..cffe17e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c
@@ -0,0 +1,79 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <stdio.h>
+#endif
+
+int main (void)
+{
+  int Array[10][10];
+
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 10; jj++)
+	{
+	  Array[ii][jj] = 0;
+	}
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 0; jj < 5; jj++)
+      Array[ii][jj] = 5;
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 5; jj++)
+      if (Array[ii][jj] != 5)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+
+  /* One goes up and one goes down.  */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 9; jj >= 0; jj--)
+      Array[ii][jj] = 7;
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 9; jj >= 0; jj--)
+      if (Array[ii][jj] != 7)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  /* different step sizes.  */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 0; jj < 10; jj += 2)
+      Array[ii][jj] = 9;
+  
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 10; jj += 2)
+      if (Array[ii][jj] != 9)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  /* different step sizes.  */
+  _Cilk_for (int ii = 0; ii < 10; ii += 2)
+    _Cilk_for (int jj = 5; jj < 9; jj++)
+      Array[ii][jj] = 11; 
+  
+  for (int ii = 0; ii < 10; ii += 2)
+    for (int jj = 5; jj < 9; jj++)
+      if (Array[ii][jj] != 11)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc
new file mode 100644
index 0000000..8221371
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc
@@ -0,0 +1,25 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#define SIZE 100
+#define CHECK_VALUE 5
+
+template <class T>
+int func (T start, T end)
+{
+  int Array[SIZE];
+  _Cilk_for (T ii = 0; ii < end; ii++)
+    Array[ii] = CHECK_VALUE;
+  
+  for (T ii = 0; ii < end; ii++)
+    if (Array[ii] != CHECK_VALUE)
+      __builtin_abort ();
+
+  return 0;
+}
+
+int main (void)
+{
+  return func <int> (0, 100) + func <long> (0, 100);
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc
new file mode 100644
index 0000000..2ac8c72
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc
@@ -0,0 +1,52 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#include <vector>
+#include <cstdio>
+#include <iostream>
+#include <algorithm>
+
+using namespace std;
+
+
+int main(void)
+{
+vector <int> array;
+vector <int> array_serial;
+
+#if 1
+for (int ii = -1; ii < 10; ii++)
+{   
+  array.push_back(ii);
+  array_serial.push_back (ii);
+}
+#endif
+_Cilk_for (vector<int>::iterator iter = array.begin(); iter != array.end();
+          iter++)
+{
+   if (*iter  == 6) 
+     *iter = 13;
+}
+for (vector<int>::iterator iter = array_serial.begin(); 
+     iter != array_serial.end(); iter++)
+{
+   if (*iter  == 6) 
+     *iter = 13;
+}
+sort (array.begin(), array.end());
+sort (array_serial.begin(), array_serial.end());
+
+vector <int>::iterator iter = array.begin ();
+vector <int>::iterator iter_serial = array_serial.begin ();
+
+while (iter != array.end () && iter_serial != array_serial.end ())
+{
+  if (*iter != *iter_serial)
+    __builtin_abort ();
+  iter++;
+  iter_serial++;
+}
+
+return 0;
+}   
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc
new file mode 100644
index 0000000..1cf3301
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc
@@ -0,0 +1,72 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#include <vector>
+#include <cstdio>
+#include <iostream>
+#include <algorithm>
+
+using namespace std;
+
+
+int main(void)
+{
+vector <int> array,array_serial;
+
+#if 1
+for (int ii = -1; ii < 10; ii++)
+{   
+  array.push_back(ii);
+  array_serial.push_back(ii);
+}
+#endif
+_Cilk_for (vector<int>::reverse_iterator iter4 = array.rbegin(); 
+	   iter4 != array.rend(); iter4++)
+{
+  if (*iter4 == 0x8) {
+    *iter4 = 9;
+  }
+}
+
+_Cilk_for (vector<int>::reverse_iterator iter4 = array_serial.rbegin(); 
+	   iter4 != array_serial.rend(); iter4++)
+{
+  if (*iter4 == 0x8) {
+    *iter4 = 9;
+  }
+}
+_Cilk_for (vector<int>::reverse_iterator iter2 = array.rbegin(); 
+	   iter2 != array.rend();
+          iter2 += 1) 
+{
+   if ((*iter2 == 0x4) || (*iter2 == 0x7)) {
+    *iter2 = 0x3;
+   }
+}
+for (vector<int>::reverse_iterator iter2 = array_serial.rbegin(); 
+     iter2 != array_serial.rend();
+          iter2 += 1) 
+{
+   if ((*iter2 == 0x4) || (*iter2 == 0x7)) {
+    *iter2 = 0x3;
+   }
+}
+sort (array.begin(), array.end());
+sort (array_serial.begin(), array_serial.end());
+
+vector <int>::iterator iter = array.begin ();
+vector <int>::iterator iter_serial = array_serial.begin ();
+while (iter != array.end () && iter_serial != array_serial.end ())
+{
+  if (*iter != *iter_serial)
+    __builtin_abort ();
+  iter++;
+  iter_serial++;
+}
+
+return 0;
+}   
+
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc
new file mode 100644
index 0000000..d4327ad
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc
@@ -0,0 +1,50 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#include <iostream>
+#include <cstdio>
+#include <cstdlib>
+#include <vector>
+#include <algorithm>
+#include <list>
+
+using namespace std;
+
+
+int main(int argc, char **argv)
+{
+  vector <int> number_list, number_list_serial;
+  int new_number = 0;
+  int no_elements = 0;
+  
+  if (argc != 2)
+  {
+    no_elements = 10000;
+  }
+
+
+  number_list.clear();
+  number_list_serial.clear();
+  for (int ii = 0; ii < no_elements; ii++)
+  {
+    number_list.push_back(new_number);
+    number_list_serial.push_back(new_number);
+  }
+
+  _Cilk_for (int jj = 0; jj < no_elements; jj++)
+  {
+    number_list[jj] = jj + no_elements;
+  }
+  for (int jj = 0; jj < no_elements; jj++)
+  {
+    number_list_serial[jj] = jj + no_elements;
+  }
+
+  for (int jj = 0; jj < no_elements; jj++)
+    if (number_list_serial[jj] != number_list[jj])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index e548a0d..d8c14e3 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -351,6 +351,7 @@ enum omp_clause_schedule_kind {
   OMP_CLAUSE_SCHEDULE_GUIDED,
   OMP_CLAUSE_SCHEDULE_AUTO,
   OMP_CLAUSE_SCHEDULE_RUNTIME,
+  OMP_CLAUSE_SCHEDULE_CILKFOR,
   OMP_CLAUSE_SCHEDULE_LAST
 };
 
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 0595499..e21d60d 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -411,6 +411,9 @@ dump_omp_clause (pretty_printer *buffer, tree clause, int spc, int flags)
 	case OMP_CLAUSE_SCHEDULE_AUTO:
 	  pp_string (buffer, "auto");
 	  break;
+	case OMP_CLAUSE_SCHEDULE_CILKFOR:
+	  pp_string (buffer, "cilk-for grain");
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -2391,6 +2394,9 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
     case CILK_SIMD:
       pp_string (buffer, "#pragma simd");
       goto dump_omp_loop;
+    
+    case CILK_FOR:
+      goto dump_omp_loop;
 
     case OMP_DISTRIBUTE:
       pp_string (buffer, "#pragma omp distribute");
@@ -2440,7 +2446,10 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
 		{
 		  spc += 2;
 		  newline_and_indent (buffer, spc);
-		  pp_string (buffer, "for (");
+		  if (TREE_CODE (node) == CILK_FOR)
+		    pp_string (buffer, "_Cilk_for (");
+		  else 
+		    pp_string (buffer, "for (");
 		  dump_generic_node (buffer,
 				     TREE_VEC_ELT (OMP_FOR_INIT (node), i),
 				     spc, flags, false);
diff --git a/gcc/tree.def b/gcc/tree.def
index f8d6444..558d7c8 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1051,6 +1051,10 @@ DEFTREECODE (OMP_SIMD, "omp_simd", tcc_statement, 6)
    Operands like for OMP_FOR.  */
 DEFTREECODE (CILK_SIMD, "cilk_simd", tcc_statement, 6)
 
+/* Cilk Plus - _Cilk_for (..)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (CILK_FOR, "cilk_for", tcc_statement, 6)
+
 /* OpenMP - #pragma omp distribute [clause1 ... clauseN]
    Operands like for OMP_FOR.  */
 DEFTREECODE (OMP_DISTRIBUTE, "omp_distribute", tcc_statement, 6)


* Re: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-05  5:27         ` Iyer, Balaji V
@ 2014-02-07 14:02           ` Jakub Jelinek
  2014-02-07 14:33             ` Iyer, Balaji V
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Jelinek @ 2014-02-07 14:02 UTC (permalink / raw)
  To: Iyer, Balaji V
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

[-- Attachment #1: Type: text/plain, Size: 2847 bytes --]

On Wed, Feb 05, 2014 at 05:27:26AM +0000, Iyer, Balaji V wrote:
> 	Attached, please find a fixed patch (diff.txt) that will do as you requested (model _Cilk_for like a #pragma omp parallel for). Along with this, I have also attached two Changelog entries (1 for C and 1 for C++).
> 	It passes all the tests on my x86_64 box (both 32 and 64 bit modes) and does not affect any other tests in the testsuite.
> 	Is this Ok for trunk?

A step in the right direction, but I still see issues just from looking at
the *.gimple dump:

For the first testcase, I see:
            iter = std::vector<int>::begin (array); [return slot optimization]
            iter.1 = iter;
            D.13615 = std::vector<int>::end (array); [return slot optimization]
            try
              {
                retval.0 = __gnu_cxx::operator-<int*, std::vector<int> > (&D.13615, &iter);
              }
            finally
              {
                D.13615 = {CLOBBER};
              }
            #pragma omp parallel schedule(cilk-for grain,0) if(retval.0)
            #shared(iter.1) shared(D.13632) shared(D.13615) shared(iter)
              {
                difference_type retval.2;
                const difference_type D.13633;
                int D.13725;
                struct __normal_iterator & D.13726;
                bool retval.3;
                int & D.13728;
                int D.13729;
                int & D.13732;

                iter = iter.1;
                 private(D.13631)
                _Cilk_for (D.13631 = 0; D.13631 != retval.2; D.13631 = D.13631 + 1)
                                      D.13725 = D.13631 - D.13632;

So, the issues I see:
1) what is iter.1, why do you have it at all, and, after all, the iterator
is a class that needs to be constructed/destructed in the general way, so
creating any further copies of something is both costly and undesirable

2) the schedule clause doesn't belong on the omp parallel, but on the _Cilk_for

3) iter should be firstprivate, and there should be no explicit private var
with assignment during gimplification, just handle it like any other
firstprivate during omp lowering

4) the printing looks weird for _Cilk_for, as I said earlier, the clauses
should probably be printed after the closing ) of _Cilk_for rather than
after nothing on the previous line; also, there is no {} printed around the
_Cilk_for body and the next line is weirdly indented

But more importantly, if I create some testcase with a generic C++
conforming iterator (copied over from
libgomp/testsuite/libgomp.c++/for-1.C), as in the second testcase, the
*.gimple dump shows that _Cilk_for is still around the #pragma omp parallel.
The intent of the second testcase is that you can really eyeball all the
ctors/dtors/copy ctors etc. that should happen, and at -O0 they shouldn't
really be inlined.

	Jakub

[-- Attachment #2: CF.C --]
[-- Type: text/plain, Size: 196 bytes --]

#include <vector>

void
foo (std::vector<int> &array)
{
  _Cilk_for (std::vector<int>::iterator iter = array.begin(); iter != array.end(); iter++)
  {
    if (*iter  == 6)
      *iter = 13;
  }
}

[-- Attachment #3: CF3.C --]
[-- Type: text/plain, Size: 4541 bytes --]

typedef __PTRDIFF_TYPE__ ptrdiff_t;

template <typename T>
class I
{
public:
  typedef ptrdiff_t difference_type;
  I ();
  ~I ();
  I (T *);
  I (const I &);
  T &operator * ();
  T *operator -> ();
  T &operator [] (const difference_type &) const;
  I &operator = (const I &);
  I &operator ++ ();
  I operator ++ (int);
  I &operator -- ();
  I operator -- (int);
  I &operator += (const difference_type &);
  I &operator -= (const difference_type &);
  I operator + (const difference_type &) const;
  I operator - (const difference_type &) const;
  template <typename S> friend bool operator == (I<S> &, I<S> &);
  template <typename S> friend bool operator == (const I<S> &, const I<S> &);
  template <typename S> friend bool operator < (I<S> &, I<S> &);
  template <typename S> friend bool operator < (const I<S> &, const I<S> &);
  template <typename S> friend bool operator <= (I<S> &, I<S> &);
  template <typename S> friend bool operator <= (const I<S> &, const I<S> &);
  template <typename S> friend bool operator > (I<S> &, I<S> &);
  template <typename S> friend bool operator > (const I<S> &, const I<S> &);
  template <typename S> friend bool operator >= (I<S> &, I<S> &);
  template <typename S> friend bool operator >= (const I<S> &, const I<S> &);
  template <typename S> friend typename I<S>::difference_type operator - (I<S> &, I<S> &);
  template <typename S> friend typename I<S>::difference_type operator - (const I<S> &, const I<S> &);
  template <typename S> friend I<S> operator + (typename I<S>::difference_type , const I<S> &);
private:
  T *p;
};
template <typename T> I<T>::I () : p (0) {}
template <typename T> I<T>::~I () {}
template <typename T> I<T>::I (T *x) : p (x) {}
template <typename T> I<T>::I (const I &x) : p (x.p) {}
template <typename T> T &I<T>::operator * () { return *p; }
template <typename T> T *I<T>::operator -> () { return p; }
template <typename T> T &I<T>::operator [] (const difference_type &x) const { return p[x]; }
template <typename T> I<T> &I<T>::operator = (const I &x) { p = x.p; return *this; }
template <typename T> I<T> &I<T>::operator ++ () { ++p; return *this; }
template <typename T> I<T> I<T>::operator ++ (int) { return I (p++); }
template <typename T> I<T> &I<T>::operator -- () { --p; return *this; }
template <typename T> I<T> I<T>::operator -- (int) { return I (p--); }
template <typename T> I<T> &I<T>::operator += (const difference_type &x) { p += x; return *this; }
template <typename T> I<T> &I<T>::operator -= (const difference_type &x) { p -= x; return *this; }
template <typename T> I<T> I<T>::operator + (const difference_type &x) const { return I (p + x); }
template <typename T> I<T> I<T>::operator - (const difference_type &x) const { return I (p - x); }
template <typename T> bool operator == (I<T> &x, I<T> &y) { return x.p == y.p; }
template <typename T> bool operator == (const I<T> &x, const I<T> &y) { return x.p == y.p; }
template <typename T> bool operator != (I<T> &x, I<T> &y) { return !(x == y); }
template <typename T> bool operator != (const I<T> &x, const I<T> &y) { return !(x == y); }
template <typename T> bool operator < (I<T> &x, I<T> &y) { return x.p < y.p; }
template <typename T> bool operator < (const I<T> &x, const I<T> &y) { return x.p < y.p; }
template <typename T> bool operator <= (I<T> &x, I<T> &y) { return x.p <= y.p; }
template <typename T> bool operator <= (const I<T> &x, const I<T> &y) { return x.p <= y.p; }
template <typename T> bool operator > (I<T> &x, I<T> &y) { return x.p > y.p; }
template <typename T> bool operator > (const I<T> &x, const I<T> &y) { return x.p > y.p; }
template <typename T> bool operator >= (I<T> &x, I<T> &y) { return x.p >= y.p; }
template <typename T> bool operator >= (const I<T> &x, const I<T> &y) { return x.p >= y.p; }
template <typename T> typename I<T>::difference_type operator - (I<T> &x, I<T> &y) { return x.p - y.p; }
template <typename T> typename I<T>::difference_type operator - (const I<T> &x, const I<T> &y) { return x.p - y.p; }
template <typename T> I<T> operator + (typename I<T>::difference_type x, const I<T> &y) { return I<T> (x + y.p); }

template <typename T>
class J
{
public:
  J(const I<T> &x, const I<T> &y) : b (x), e (y) {}
  const I<T> &begin ();
  const I<T> &end ();
private:
  I<T> b, e;
};

template <typename T> const I<T> &J<T>::begin () { return b; }
template <typename T> const I<T> &J<T>::end () { return e; }

template <typename T>
void baz (I<T> &i);

void
foo (J<int> j)
{
  _Cilk_for (I<int> i = j.begin (); i < j.end (); i += 2)
    baz (i);
}


* RE: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-07 14:02           ` Jakub Jelinek
@ 2014-02-07 14:33             ` Iyer, Balaji V
  2014-02-07 14:53               ` Jakub Jelinek
  0 siblings, 1 reply; 26+ messages in thread
From: Iyer, Balaji V @ 2014-02-07 14:33 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'



> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
> owner@gcc.gnu.org] On Behalf Of Jakub Jelinek
> Sent: Friday, February 7, 2014 9:03 AM
> To: Iyer, Balaji V
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'rth@redhat.com'
> Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> 
> On Wed, Feb 05, 2014 at 05:27:26AM +0000, Iyer, Balaji V wrote:
> > 	Attached, please find a fixed patch (diff.txt) that will do as you
> requested (model _Cilk_for like a #pragma omp parallel for). Along with this,
> I have also attached two Changelog entries (1 for C and 1 for C++).
> > 	It passes all the tests on my x86_64 box (both 32 and 64 bit modes)
> and does not affect any other tests in the testsuite.
> > 	Is this Ok for trunk?
> 
> A step in the right direction, but I still see issues just from looking at the
> *.gimple dump:
> 
> For the first testcase, I see:
>             iter = std::vector<int>::begin (array); [return slot optimization]
>             iter.1 = iter;
>             D.13615 = std::vector<int>::end (array); [return slot optimization]
>             try
>               {
>                 retval.0 = __gnu_cxx::operator-<int*, std::vector<int> > (&D.13615,
> &iter);
>               }
>             finally
>               {
>                 D.13615 = {CLOBBER};
>               }
>             #pragma omp parallel schedule(cilk-for grain,0) if(retval.0)
>             #shared(iter.1) shared(D.13632) shared(D.13615) shared(iter)
>               {
>                 difference_type retval.2;
>                 const difference_type D.13633;
>                 int D.13725;
>                 struct __normal_iterator & D.13726;
>                 bool retval.3;
>                 int & D.13728;
>                 int D.13729;
>                 int & D.13732;
> 
>                 iter = iter.1;
>                  private(D.13631)
>                 _Cilk_for (D.13631 = 0; D.13631 != retval.2; D.13631 = D.13631 + 1)
>                                       D.13725 = D.13631 - D.13632;
> 
> So, the issues I see:
> 1) what is iter.1, why do you have it at all, and, after all, the iterator is a class
> that needs to be constructed/destructed in the general way, so creating any
> further copies of something is both costly and undesirable
> 

Well, to get the loop count, I need to calculate it using operator-(array.end (), &iter).

Now, if I do that, iter is already set, so I need to reset iter back to the original value (array.begin ()) in the child function. This is why I used a temporary variable, iter.1.



> 2) the schedule clause doesn't belong on the omp parallel, but on the
> _Cilk_for
> 

What if grain is a variable, say "x"? If I have it in the _Cilk_for, then won't it create omp_data_i->x? That is not correct. It should just emit "x". But let me look into this to make sure...

> 3) iter should be firstprivate, and there should be no explicit private var with
> assignment during gimplification, just handle it like any other firstprivate
> during omp lowering
> 

Do you mean that I should manually insert a firstprivate clause for iter rather than let the system figure out that it is shared?


> 4) the printing looks weird for _Cilk_for, as I said earlier, the clauses should
> probably be printed after the closing ) of _Cilk_for rather than after nothing
> on the previous line; also, there is no {} printed around the _Cilk_for body
> and the next line is weirdly indented
> 

OK, I will look into this.

> But more importantly, if I create some testcase with a generic C++
> conforming iterator (copied over from libgomp/testsuite/libgomp.c++/for-
> 1.C), as in the second testcase, the *.gimple dump shows that _Cilk_for is still
> around the #pragma omp parallel.
> The intent of the second testcase is that you can really eyeball all the
> ctors/dtors/copy ctors etc. that should happen, and for -O0 shouldn't be
> really inlined.
> 
> 	Jakub


* Re: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-07 14:33             ` Iyer, Balaji V
@ 2014-02-07 14:53               ` Jakub Jelinek
  2014-02-07 22:14                 ` Iyer, Balaji V
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Jelinek @ 2014-02-07 14:53 UTC (permalink / raw)
  To: Iyer, Balaji V
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

On Fri, Feb 07, 2014 at 02:33:41PM +0000, Iyer, Balaji V wrote:
> > So, the issues I see:
> > 1) what is iter.1, why do you have it at all, and, after all, the iterator is a class
> > that needs to be constructed/destructed in the general way, so creating any
> > further copies of something is both costly and undesirable
> > 
> 
> Well, to get the loop count, I need to calculate it using operator-(array.end (), &iter).
> 
> Now, if I do that iter is already set. I need to reset iter back to the
> original one (array.begin ()) in the child function.  This is why I used a
> temporary variable called iter1.

operator- shouldn't really change iter; if it does, it is purely the user's
fault, isn't it?  It isn't operator -=, so it shouldn't really change
array.end () either.
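
As an illustration only (this is not the patch's lowering), the trip count
can be computed once with operator- and the body driven from a plain
integral counter, similar in spirit to the D.13631 counter in the dump
above.  The names trip_count and run_normalized are hypothetical, and a
positive step is assumed:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of iterations of
//   for (it = first; it < last; it += step)
// for a positive step.  Both iterators are taken by const reference,
// so computing the difference cannot modify them.
template <typename Iter>
std::ptrdiff_t trip_count (const Iter &first, const Iter &last,
                           std::ptrdiff_t step)
{
  assert (step > 0);
  std::ptrdiff_t dist = last - first;
  return dist <= 0 ? 0 : (dist + step - 1) / step;
}

// Hypothetical driver: the loop runs over an integral counter with
// step 1, and each iteration recomputes its own iterator from FIRST.
// FIRST itself is never advanced, so nothing has to be reset.
template <typename Iter, typename Body>
void run_normalized (const Iter &first, const Iter &last,
                     std::ptrdiff_t step, Body body)
{
  const std::ptrdiff_t count = trip_count (first, last, step);
  for (std::ptrdiff_t i = 0; i < count; ++i)
    body (first + i * step);
}
```

Because the original iterator is only read, a sketch like this needs no
saved copy of it before the count is computed.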

> > 2) the schedule clause doesn't belong on the omp parallel, but on the
> > _Cilk_for
> > 
> 
> What if grain is a variable say "x"? If I have it in the _Cilk_for, then
> won't it create omp_data_i->x.  That is not correct.  It should just emit
> "x." But let me look into this to make sure...

You certainly should gimplify the clause operand before the omp parallel, it
must be an integral anyway, right?  So just use get_temp_regvar?
Then simply use firstprivate on the #pragma omp parallel.  When you actually
omp expand, you'll still be able to find the original variable and look it
up on the parallel.  But, if you can't make it work, guess I could live with
the clause on the parallel.
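
The suggestion above can be modelled in plain C++ as a rough sketch. This is
only an analogy for the semantics, not GCC internals: the function
`run_with_grain` and the variable `grain_tmp` are hypothetical. The grain
expression is evaluated once into a temporary before the parallel region, and
the child receives a by-value copy, which is roughly what firstprivate
provides.

```cpp
#include <cassert>

// Hypothetical model: evaluate the grain once, then hand a copy to the child.
int run_with_grain(int &x) {
    // Evaluated once, before the "parallel region" begins.
    const int grain_tmp = x;

    // By-value capture stands in for firstprivate: the child owns a copy
    // and never reaches back to x through a shared data record.
    auto child = [grain_tmp]() { return grain_tmp; };

    // A later change to x is not visible to the child.
    x = 99;
    return child();
}
```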

> > 3) iter should be firstprivate, and there should be no explicit private var with
> > assignment during gimplification, just handle it like any other firstprivate
> > during omp lowering
> > 
> 
> Do you mean to say I should manually insert a firstprivate for iter and
> not the system figure out that it is shared?

Yes.  The class iterator is quite a special thing, because already the C++ FE
lowers it to an integral iterator instead.  And when you make it
firstprivate, omp lowering/expansion should take care of running the copy
constructor/destructor in the parallel for you.
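
As a rough illustration of what firstprivate means for a class-type iterator,
the sketch below uses a hypothetical `CountingIter` class and a hypothetical
`child` function (neither is from the patch). Passing the iterator by value
stands in for the private copy: the copy constructor runs when the copy is
made, the destructor runs when the copy goes away, and the original is left
untouched.

```cpp
#include <cassert>

// Hypothetical iterator-like class that counts copies and destructions.
struct CountingIter {
    static int copies;
    static int destroyed;
    int pos;

    explicit CountingIter(int p) : pos(p) {}
    CountingIter(const CountingIter &o) : pos(o.pos) { ++copies; }
    ~CountingIter() { ++destroyed; }
};

int CountingIter::copies = 0;
int CountingIter::destroyed = 0;

// The by-value parameter models the firstprivate copy inside the child.
int child(CountingIter it) {
    it.pos += 5;  // Only the private copy advances.
    return it.pos;
}
```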

	Jakub


* RE: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-07 14:53               ` Jakub Jelinek
@ 2014-02-07 22:14                 ` Iyer, Balaji V
  2014-02-10 17:57                   ` Jakub Jelinek
  0 siblings, 1 reply; 26+ messages in thread
From: Iyer, Balaji V @ 2014-02-07 22:14 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

[-- Attachment #1: Type: text/plain, Size: 2930 bytes --]

Hi Jakub,
	Attached, please find a fixed patch. Along with it, I have also added 2 changelog files for C and C++ respectively.

Thanks,

Balaji V. Iyer.

> -----Original Message-----
> From: Jakub Jelinek [mailto:jakub@redhat.com]
> Sent: Friday, February 7, 2014 9:53 AM
> To: Iyer, Balaji V
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'rth@redhat.com'
> Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> 
> On Fri, Feb 07, 2014 at 02:33:41PM +0000, Iyer, Balaji V wrote:
> > > So, the issues I see:
> > > 1) what is iter.1, why do you have it at all, and, after all, the
> > > iterator is a class that needs to be constructed/destructed in the
> > > general way, so creating any further copies of something is both
> > > costly and undesirable
> > >
> >
> > > Well, to get the loop count, I need to calculate it using
> > > operator-(array.end (), &iter).
> >
> > Now, if I do that iter is already set. I need to reset iter back to
> > the original one (array.begin ()) in the child function.  This is why
> > I used a temporary variable called iter1.
> 
> operator- shouldn't really change iter, if it does, it is purely the user's fault,
> isn't it?  It isn't operator -=, so it shouldn't really change array.end () either.
> 

This is fixed. Instead of creating a variable and doing the manual copying, I added a FIRSTPRIVATE clause.

> > > 2) the schedule clause doesn't belong on the omp parallel, but on
> > > the _Cilk_for
> > >
> >
> > What if grain is a variable say "x"? If I have it in the _Cilk_for,
> > then won't it create omp_data_i->x.  That is not correct.  It should
> > just emit "x." But let me look into this to make sure...
> 
> You certainly should gimplify the clause operand before the omp parallel, it
> must be an integral anyway, right?  So just use get_temp_regvar?
> Then simply use firstprivate on the #pragma omp parallel.  When you actually
> omp expand, you'll still be able to find the original variable and look it up on
> the parallel.  But, if you can't make it work, guess I could live with the clause
> on the parallel.
> 

This is fixed too.

> > > 3) iter should be firstprivate, and there should be no explicit
> > > private var with assignment during gimplification, just handle it
> > > like any other firstprivate during omp lowering
> > >

Yes this is what I did.

> >
> > Do you mean to say I should manually insert a firstprivate for iter
> > and not the system figure out that it is shared?
> 
> Yes.  The class iterator is quite a special thing, because already the C++ FE
> lowers it to an integral iterator instead.  And when you make it firstprivate,
> omp lowering/expansion should take care of running the copy
> constructor/destructor in the parallel for you.
> 

I have also fixed the gimple/tree pretty print issue. Is this OK?

Thanks,

Balaji V. Iyer.

[-- Attachment #2: c-ChangeLogs --]
[-- Type: application/octet-stream, Size: 3875 bytes --]

gcc/ChangeLog
2014-02-07  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* cilk-common.c (declare_cilk_for_builtin): New function.
	(cilk_init_builtins): Added two new built-in functions for _Cilk_for
	support.
	* cilk.h (enum cilk_tree_index): Added two new enumerators called
	CILK_TI_F_LOOP_32 and CILK_TI_F_LOOP_64.
	(cilk_for_32_fndecl): New define.
	(cilk_for_64_fndecl): Likewise.
	* gimple-pretty-print.c (dump_gimple_omp_parallel): Added a new
	parameter.  If it is printing a _Cilk_for statement, then do not 
	print OMP's pragmas.
	(dump_gimple_omp_for): Added GF_OMP_FOR_KIND_CILK_FOR.  Printed out
	_Cilk_for statements without the #pragmas.  Also, added NE_EXPR case.
	* tree-pretty-print.c (dump_generic_node): Added CILK_FOR case.
	Print "_Cilk_for" if the node is of type CILK_FOR.
	(dump_omp_clause): Added a new case called OMP_CLAUSE_SCHEDULE_CILKFOR.
	* gimple.h (enum gf_mask): Added new value: GF_OMP_FOR_KIND_CILKFOR.
	Readjusted other values to satisfy the masking rules.
	(gimple_cilk_for_induction_var): New function.
	* gimplify.c (gimplify_scan_omp_clauses): Added a new parameter called
	is_cilk_for.  If is_cilk_for is true then do not boolify the 
	IF_CLAUSE's expression.
	(gimplify_omp_parallel): Added check to see if we are gimplifying
	a _Cilk_for statement.
	(gimplify_omp_for): Added support to gimplify a _Cilk_for statement.
	(gimplify_expr): Added CILK_FOR case.
	* omp-low.c (extract_omp_for_data): Added a check for CILK_FOR and
	set the schedule kind accordingly.  Added a check for CILK_FOR trees
	wherever CILKSIMD is checked.
	(create_omp_child_function_name): Added a new parameter: is_cilk_for.
	(find_cilk_for_stmt): New function.
	(is_cilk_for_stmt): Likewise.
	(cilk_for_check_loop_diff_type): Likewise.
	(expand_cilk_for_call): Likewise.
	(expand_cilk_for): Likewise.
	(create_omp_child_function): Added support to create _Cilk_for's
	child function by adding two additional parameters.
	(expand_omp_taskreg): Extracted the high and low parameters from the
	child function and set them accordingly in the child function.
	(expand_omp_for): Added a call to expand_cilk_for.
	* tree.def (CILK_FOR): New tree.
	* tree-core.h (enum omp_clause_schedule_kind): Added a new enumerator
	field OMP_CLAUSE_SCHEDULE_CILKFOR.
	* cilk-builtins.def (BUILT_IN_CILK_FOR_32): New built-in function.
	(BUILT_IN_CILK_FOR_64): Likewise.
	
gcc/c-family/ChangeLog
2014-02-07  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-cilkplus.c (find_cilk_for): New function.
	(cilk_for_move_clauses_upward): Likewise.
	* c-common.c (c_common_reswords[]): Added a new field called _Cilk_for.
	* c-common.h (enum rid): Added new enumerator called RID_CILK_FOR.
	* c-omp.c (c_finish_omp_for): Added a new parameter called count.
	Computed the value of loop-count based on initial, condition and
	increment information.
	* c-pragma.c (init_pragma): Added cilk grainsize pragma.
	* c-pragma.h (enum pragma_kind): Added new enumerator called
	PRAGMA_CILK_GRAINSIZE.
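
A minimal sketch of the loop-count arithmetic described in the c-omp.c entry,
for integer loops only. The names `Cond` and `cilk_for_count` are hypothetical;
the arithmetic mirrors the c_finish_omp_for hunk of the posted patch: the end
value is adjusted by +1 for `<=` and by -1 for `>=`, the initial value is
subtracted, and the result is divided by the step with truncation. The result
is exact when the step divides the span evenly; this sketch makes no attempt
to handle other cases.

```cpp
#include <cassert>

// Hypothetical names; only the arithmetic follows the posted patch.
enum class Cond { LT, LE, GT, GE, NE };

long cilk_for_count(long init, long end, Cond cond, long step) {
    long adjust = 0;
    if (cond == Cond::LE)
        adjust = 1;   // "i <= end" also covers end itself.
    else if (cond == Cond::GE)
        adjust = -1;  // "i >= end" also covers end itself, counting down.
    // Truncating division, as with TRUNC_DIV_EXPR.
    return ((end + adjust) - init) / step;
}
```

For the loop `_Cilk_for (int ii = 0; ii < 10; ii = ii + 2)`, this gives
(10 - 0) / 2 = 5 iterations.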

gcc/c/ChangeLog
2014-02-07  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-parser.c (c_parser_statement_after_labels): Added RID_CILK_FOR
	case.
	(c_parser_pragma): Added PRAGMA_CILK_GRAINSIZE case.
	(c_parser_omp_for_loop): Added grain parameter.  Also, modified
	the function to parse _Cilk_for statement.
	(c_parser_cilk_grainsize): New function.
	(c_parser_cilk_simd): Added a new parameter called is_cilk_for.
	Modified the function to handle CILK_FOR.

gcc/testsuite/ChangeLog
2014-02-07  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-c++-common/cilk-plus/CK/cilk-fors.c: New testcase.
	* c-c++-common/cilk-plus/CK/nested_cilk_for.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_grain.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_errors.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c: Likewise.

[-- Attachment #3: cp-ChangeLogs --]
[-- Type: application/octet-stream, Size: 1677 bytes --]

gcc/cp/ChangeLog
2014-02-07  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* cp-cilkplus.c (copy_tree_till_cilk_for): New function.
	(find_vars): Likewise.
	(find_killed_vars): Likewise.
	(insert_firstpriv_clauses): Likewise.
	(cilk_for_create_bind_expr): Likewise.
	* cp-tree.h (copy_tree_till_cilk_for): New prototype.
	(cilk_for_create_bind_expr): Likewise.
	* parser.c (cp_parser_statement): Added a RID_CILK_FOR case.
	(cp_parser_omp_for_cond): Added a check for CILK_FOR tree along with
	CILK_SIMD tree.
	(cp_parser_omp_for_loop): Added a new parameter: cfor_block.  Added
	support for parsing a _Cilk_for statement.  Removed statements
	between _Cilk_for statement and the #pragma omp parallel to move
	them upward.
	(cp_parser_cilk_grainsize): New function.
	(cp_parser_pragma): Added PRAGMA_CILK_GRAINSIZE pragma.
	(cp_parser_cilk_simd): Added a new parameter called grain.  Added
	support to handle _Cilk_for statement along with #pragma simd.
	* pt.c (tsubst_expr): For _Cilk_for statement, move certain clauses
	upward to #pragma parallel statement.  Added a CILK_FOR case.
	* semantics.c (handle_omp_for_class_iterator): Added a NE_EXPR case.
	(finish_omp_for): For _Cilk_for statement, added an IF-CLAUSE.
	
gcc/testsuite/ChangeLog
2014-02-07  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-c++-common/cilk-plus/CK/cilk_for_errors.c: Made certain error
	tags C specific and inserted their C++ equivalents.
	* c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c: Likewise.
	* g++.dg/cilk-plus/CK/cilk-for-tplt.cc: New testcase.
	* g++.dg/cilk-plus/CK/stl_iter.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_rev_iter.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_test.cc: Likewise.

[-- Attachment #4: patch_cilk_for.txt --]
[-- Type: text/plain, Size: 89930 bytes --]

diff --git a/gcc/c-family/c-cilkplus.c b/gcc/c-family/c-cilkplus.c
index 1a16f66..328f014 100644
--- a/gcc/c-family/c-cilkplus.c
+++ b/gcc/c-family/c-cilkplus.c
@@ -91,3 +91,53 @@ c_finish_cilk_clauses (tree clauses)
     }
   return clauses;
 }
+
+/* Structure used to pass information into a walk_tree function and
+   find_cilk_for.  */
+struct clause_struct
+{
+  bool is_set;
+  tree clauses;
+};
+
+/* Helper function for walk_tree used in cilk_for_move_clauses_upward.
+   If *TP is a CILK_FOR statement, then set *DATA (type-casted to 
+   struct clause_struct) with its clauses.  */
+
+static tree
+find_cilk_for (tree *tp, int *walk_subtrees, void *data)
+{
+  struct clause_struct *cstruct = (struct clause_struct *) data;
+  if (*tp && TREE_CODE (*tp) == CILK_FOR && !cstruct->is_set)
+    {
+      cstruct->is_set = true;
+      cstruct->clauses = OMP_FOR_CLAUSES (*tp);
+      *walk_subtrees = 0;
+      OMP_FOR_CLAUSES (*tp) = NULL_TREE;
+    }
+  return NULL_TREE;
+}
+
+/* Moves the IF-CLAUSE and SCHEDULE clause from _CILK_FOR statement in
+   STMT into *PARALLEL_CLAUSES.  */
+ 
+void
+cilk_for_move_clauses_upward (tree *parallel_clauses, tree stmt)
+{
+  struct clause_struct cstruct;
+  cstruct.is_set = false;
+  cstruct.clauses = NULL_TREE;
+  walk_tree (&stmt, find_cilk_for, (void *) &cstruct, NULL);
+
+  tree clauses = cstruct.clauses;
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SCHEDULE
+	|| OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IF)
+      {
+	if (*parallel_clauses)
+	  OMP_CLAUSE_CHAIN (*parallel_clauses) = c;
+	else
+	  *parallel_clauses = c;
+      }
+}
+
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 50cc848..514a084 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -414,6 +414,7 @@ const struct c_common_resword c_common_reswords[] =
   { "_Complex",		RID_COMPLEX,	0 },
   { "_Cilk_spawn",      RID_CILK_SPAWN, 0 },
   { "_Cilk_sync",       RID_CILK_SYNC,  0 },
+  { "_Cilk_for",        RID_CILK_FOR,   0 },
   { "_Imaginary",	RID_IMAGINARY, D_CONLY },
   { "_Decimal32",       RID_DFLOAT32,  D_CONLY | D_EXT },
   { "_Decimal64",       RID_DFLOAT64,  D_CONLY | D_EXT },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index f074ab1..33e1929 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -149,7 +149,7 @@ enum rid
   RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
 
   /* Cilk Plus keywords.  */
-  RID_CILK_SPAWN, RID_CILK_SYNC,
+  RID_CILK_SPAWN, RID_CILK_SYNC, RID_CILK_FOR,
   
   /* Objective-C ("AT" reserved words - they are only keywords when
      they follow '@')  */
@@ -1203,7 +1203,7 @@ extern void c_finish_omp_flush (location_t);
 extern void c_finish_omp_taskwait (location_t);
 extern void c_finish_omp_taskyield (location_t);
 extern tree c_finish_omp_for (location_t, enum tree_code, tree, tree, tree,
-			      tree, tree, tree);
+			      tree, tree, tree, tree *);
 extern void c_omp_split_clauses (location_t, enum tree_code, omp_clause_mask,
 				 tree, tree *);
 extern tree c_omp_declare_simd_clauses_to_numbers (tree, tree);
@@ -1389,4 +1389,5 @@ extern tree make_cilk_frame (tree);
 extern tree create_cilk_function_exit (tree, bool, bool);
 extern tree cilk_install_body_pedigree_operations (tree);
 extern void cilk_outline (tree, tree *, void *);
+extern void cilk_for_move_clauses_upward (tree *, tree);
 #endif /* ! GCC_C_COMMON_H */
diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index dd0a45d..8259979 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -386,17 +386,18 @@ c_omp_for_incr_canonicalize_ptr (location_t loc, tree decl, tree incr)
    INITV, CONDV and INCRV are vectors containing initialization
    expressions, controlling predicates and increment expressions.
    BODY is the body of the loop and PRE_BODY statements that go before
-   the loop.  */
+   the loop.  *COUNT is the loop-count used solely by a _Cilk_for statement.  */
 
 tree
 c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
-		  tree initv, tree condv, tree incrv, tree body, tree pre_body)
+		  tree initv, tree condv, tree incrv, tree body,
+		  tree pre_body, tree *count)
 {
   location_t elocus;
   bool fail = false;
   int i;
-
-  if (code == CILK_SIMD
+  tree orig_init = NULL_TREE, orig_end = NULL_TREE, orig_step = NULL_TREE;
+  if ((code == CILK_SIMD || code == CILK_FOR) 
       && !c_check_cilk_loop (locus, TREE_VEC_ELT (declv, 0)))
     fail = true;
 
@@ -422,6 +423,8 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	  fail = true;
 	}
 
+      if (TREE_CODE (init) == MODIFY_EXPR)
+	orig_init = TREE_OPERAND (init, 1);
       /* In the case of "for (int i = 0...)", init will be a decl.  It should
 	 have a DECL_INITIAL that we can turn into an assignment.  */
       if (init == decl)
@@ -436,6 +439,7 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      fail = true;
 	    }
 
+	  orig_init = init;
 	  init = build_modify_expr (elocus, decl, NULL_TREE, NOP_EXPR,
 	      			    /* FIXME diagnostics: This should
 				       be the location of the INIT.  */
@@ -526,9 +530,20 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 					    0))
 		    TREE_SET_CODE (cond, TREE_CODE (cond) == NE_EXPR
 					 ? LT_EXPR : GE_EXPR);
-		  else if (code != CILK_SIMD)
+		  else if (code != CILK_SIMD && code != CILK_FOR)
 		    cond_ok = false;
 		}
+	      if (flag_cilkplus && code == CILK_FOR)
+		{ 
+		  orig_end = TREE_OPERAND (cond, 1); 
+		  tree add_expr = build_zero_cst (TREE_TYPE (orig_end)); 
+		  if (TREE_CODE (cond) == LE_EXPR) 
+		    add_expr = build_one_cst (TREE_TYPE (orig_end)); 
+		  else if (TREE_CODE (cond) == GE_EXPR) 
+		    add_expr = build_int_cst (TREE_TYPE (orig_end), -1); 
+		  orig_end = fold_build2 (PLUS_EXPR, TREE_TYPE (orig_end), 
+					  orig_end, add_expr);
+		}
 	    }
 
 	  if (!cond_ok)
@@ -561,6 +576,18 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      if (TREE_OPERAND (incr, 0) != decl)
 		break;
 
+	      if (TREE_CODE (incr) == POSTINCREMENT_EXPR
+		  || TREE_CODE (incr) == PREINCREMENT_EXPR)
+		orig_step = build_one_cst (TREE_TYPE (incr));
+	      else
+		orig_step = integer_minus_one_node;
+ 
+	      if (POINTER_TYPE_P (TREE_TYPE (incr)))
+		{
+		  tree unit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (incr)));
+		  orig_step = fold_build2 (MULT_EXPR, TREE_TYPE (orig_step),
+					   orig_step, unit);
+		}
 	      incr_ok = true;
 	      incr = c_omp_for_incr_canonicalize_ptr (elocus, decl, incr);
 	      break;
@@ -579,14 +606,24 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      if (TREE_CODE (TREE_OPERAND (incr, 1)) == PLUS_EXPR
 		  && (TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl
 		      || TREE_OPERAND (TREE_OPERAND (incr, 1), 1) == decl))
-		incr_ok = true;
+		{
+		  if (TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl)
+		    orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 1);
+		  else
+		    orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 0);
+		  incr_ok = true;
+		}
 	      else if ((TREE_CODE (TREE_OPERAND (incr, 1)) == MINUS_EXPR
 			|| (TREE_CODE (TREE_OPERAND (incr, 1))
 			    == POINTER_PLUS_EXPR))
 		       && TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl)
-		incr_ok = true;
+		{
+		  orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 1);
+		  incr_ok = true;
+		}
 	      else
 		{
+		  orig_step = TREE_OPERAND (incr, 1);
 		  tree t = check_omp_for_incr_expr (elocus,
 						    TREE_OPERAND (incr, 1),
 						    decl);
@@ -609,6 +646,17 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	    }
 	}
 
+      /* These variables could be NULL if an error occurred.  */
+      if (flag_cilkplus && code == CILK_FOR 
+	  && orig_end && orig_init && orig_step)
+	{
+	  /* Count is used by _Cilk_for and that will always have
+	     collapse = 1.  */
+	  *count = fold_build2 (MINUS_EXPR, TREE_TYPE (orig_end), orig_end,
+				orig_init);
+	  *count = fold_build2 (TRUNC_DIV_EXPR, TREE_TYPE (*count), *count,
+				orig_step);
+	}
       TREE_VEC_ELT (initv, i) = init;
       TREE_VEC_ELT (incrv, i) = incr;
     }
diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index 07d23ac..e0f3561 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -1394,6 +1394,11 @@ init_pragma (void)
 
   cpp_register_deferred_pragma (parse_in, "GCC", "ivdep", PRAGMA_IVDEP, false,
 				false);
+
+  if (flag_cilkplus && !flag_preprocess_only)
+    cpp_register_deferred_pragma (parse_in, "cilk", "grainsize",
+				  PRAGMA_CILK_GRAINSIZE, true, false);
+
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 6f1bf74..b9f09ba 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -55,6 +55,9 @@ typedef enum pragma_kind {
   /* Top level clause to handle all Cilk Plus pragma simd clauses.  */
   PRAGMA_CILK_SIMD,
 
+  /* This pragma handles setting of grainsize for a _Cilk_for.  */
+  PRAGMA_CILK_GRAINSIZE,
+
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
 
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 8a4868b..83e53fd 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1248,10 +1248,11 @@ static bool c_parser_objc_diagnose_bad_element_prefix
   (c_parser *, struct c_declspecs *);
 
 /* Cilk Plus supporting routines.  */
-static void c_parser_cilk_simd (c_parser *);
+static void c_parser_cilk_simd (c_parser *, bool, tree);
 static bool c_parser_cilk_verify_simd (c_parser *, enum pragma_context);
 static tree c_parser_array_notation (location_t, c_parser *, tree, tree);
 static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, bool);
+static void c_parser_cilk_grainsize (c_parser *);
 
 /* Parse a translation unit (C90 6.7, C99 6.9).
 
@@ -4878,6 +4879,16 @@ c_parser_statement_after_labels (c_parser *parser)
 	case RID_FOR:
 	  c_parser_for_statement (parser, false);
 	  break;
+	case RID_CILK_FOR:
+	  if (!flag_cilkplus)
+	    {
+	      error_at (c_parser_peek_token (parser)->location,
+			"-fcilkplus must be enabled to use %<_Cilk_for%>");
+	      c_parser_skip_to_end_of_block_or_statement (parser);
+	    }
+	  else
+	    c_parser_cilk_simd (parser, true, integer_zero_node);
+	  break;
 	case RID_CILK_SYNC:
 	  c_parser_consume_token (parser);
 	  c_parser_skip_until_found (parser, CPP_SEMICOLON, "expected %<;%>");
@@ -9496,7 +9507,24 @@ c_parser_pragma (c_parser *parser, enum pragma_context context)
       if (!c_parser_cilk_verify_simd (parser, context))
 	return false;
       c_parser_consume_pragma (parser);
-      c_parser_cilk_simd (parser);
+      c_parser_cilk_simd (parser, false, NULL_TREE);
+      return false;
+    case PRAGMA_CILK_GRAINSIZE:
+      if (!flag_cilkplus)
+	{
+	  warning (0, "%<#pragma grainsize%> ignored because -fcilkplus is not"
+		   " enabled");
+	  c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
+	  return false;
+	}
+      if (context == pragma_external)
+	{
+	  error_at (c_parser_peek_token (parser)->location,
+		    "%<#pragma grainsize%> must be inside a function");
+	  c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
+	  return false;
+	}
+      c_parser_cilk_grainsize (parser);
       return false;
 
     default:
@@ -11591,7 +11619,7 @@ c_parser_omp_flush (c_parser *parser)
 
 static tree
 c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
-		       tree clauses, tree *cclauses)
+		       tree clauses, tree grain, tree *cclauses)
 {
   tree decl, cond, incr, save_break, save_cont, body, init, stmt, cl;
   tree declv, condv, incrv, initv, ret = NULL;
@@ -11599,6 +11627,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   int i, collapse = 1, nbraces = 0;
   location_t for_loc;
   vec<tree, va_gc> *for_block = make_tree_vector ();
+  tree count = NULL_TREE;
 
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
     if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
@@ -11611,11 +11640,18 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   condv = make_tree_vec (collapse);
   incrv = make_tree_vec (collapse);
 
-  if (!c_parser_next_token_is_keyword (parser, RID_FOR))
+  if (code != CILK_FOR
+      && !c_parser_next_token_is_keyword (parser, RID_FOR))
     {
       c_parser_error (parser, "for statement expected");
       return NULL;
     }
+  if (code == CILK_FOR
+      && !c_parser_next_token_is_keyword (parser, RID_CILK_FOR))
+    {
+      c_parser_error (parser, "_Cilk_for statement expected");
+      return NULL;
+    }
   for_loc = c_parser_peek_token (parser)->location;
   c_parser_consume_token (parser);
 
@@ -11693,7 +11729,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 	    case LE_EXPR:
 	      break;
 	    case NE_EXPR:
-	      if (code == CILK_SIMD)
+	      if (code == CILK_SIMD || code == CILK_FOR)
 		break;
 	      /* FALLTHRU.  */
 	    default:
@@ -11827,7 +11863,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   if (!fail)
     {
       stmt = c_finish_omp_for (loc, code, declv, initv, condv,
-			       incrv, body, NULL);
+			       incrv, body, NULL, &count);
       if (stmt)
 	{
 	  if (cclauses != NULL
@@ -11867,6 +11903,24 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 		  }
 	    }
 	  OMP_FOR_CLAUSES (stmt) = clauses;
+	  /* If it is a _Cilk_for statement, then the OMP_FOR_CLAUSES location
+	     stores the user-defined grain value or an integer_zero_node 
+	     indicating that the runtime must compute a suitable grain, inside
+	     a SCHEDULE clause.  Similarly the loop-count is also stored in
+	     a IF clause.  These clauses do not make sense for _Cilk_for but
+	     it is just used to transmit information.  */
+	  if (code == CILK_FOR)
+	    {
+	      tree l = build_omp_clause (EXPR_LOCATION (grain),
+					 OMP_CLAUSE_SCHEDULE);
+	      OMP_CLAUSE_SCHEDULE_KIND (l) = OMP_CLAUSE_SCHEDULE_CILKFOR;
+	      OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (l) = grain;
+	      OMP_CLAUSE_CHAIN (l) = OMP_FOR_CLAUSES (stmt);
+	      tree c = build_omp_clause (EXPR_LOCATION (count), OMP_CLAUSE_IF);
+	      OMP_CLAUSE_IF_EXPR (c) = count;
+	      OMP_CLAUSE_CHAIN (c) = l;
+	      OMP_FOR_CLAUSES (stmt) = c;
+	    }
 	}
       ret = stmt;
     }
@@ -11931,7 +11985,8 @@ c_parser_omp_simd (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_SIMD, clauses, cclauses);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_SIMD, clauses, NULL_TREE,
+			       cclauses);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -12011,7 +12066,8 @@ c_parser_omp_for (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_FOR, clauses, cclauses);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_FOR, clauses, NULL_TREE,
+			       cclauses);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -12494,7 +12550,8 @@ c_parser_omp_distribute (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_DISTRIBUTE, clauses, NULL);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_DISTRIBUTE, clauses, NULL_TREE,
+			       NULL);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -13771,18 +13828,84 @@ c_parser_cilk_all_clauses (c_parser *parser)
   return c_finish_cilk_clauses (clauses);
 }
 
-/* Main entry point for parsing Cilk Plus <#pragma simd> for
-   loops.  */
+/* This function helps parse the grainsize pragma for a _Cilk_for statement. 
+   Here is the correct syntax of this pragma: 
+	    #pragma cilk grainsize = <EXP> 
+ */
 
 static void
-c_parser_cilk_simd (c_parser *parser)
+c_parser_cilk_grainsize (c_parser *parser)
 {
-  tree clauses = c_parser_cilk_all_clauses (parser);
+  extern tree convert_to_integer (tree, tree);
+
+  /* consume the 'grainsize' keyword.  */
+  c_parser_consume_pragma (parser);
+
+  if (c_parser_require (parser, CPP_EQ, "expected %<=%>") != 0)
+    {
+      struct c_expr g_expr = c_parser_binary_expression (parser, NULL, NULL);
+      if (g_expr.value && TREE_CODE (g_expr.value) == C_MAYBE_CONST_EXPR)
+	{
+	  error_at (input_location, "cannot convert grain to long integer.\n");
+	  c_parser_skip_to_pragma_eol (parser);
+	}   
+      else if (g_expr.value && g_expr.value != error_mark_node)
+	{
+	  c_parser_skip_to_pragma_eol (parser);
+	  c_token *token = c_parser_peek_token (parser);
+	  if (token && token->type == CPP_KEYWORD
+	      && token->keyword == RID_CILK_FOR)
+	    {
+	      /* Remove EXCESS_PRECISION_EXPR since we are going to convert
+		 it to long int.  */
+	      if (TREE_CODE (g_expr.value) == EXCESS_PRECISION_EXPR)
+		g_expr.value = TREE_OPERAND (g_expr.value, 0);
+	      tree grain = convert_to_integer (long_integer_type_node,
+					       g_expr.value);
+	      if (grain && grain != error_mark_node) 
+		c_parser_cilk_simd (parser, true, grain);
+	    }
+	  else
+	    warning (0, "grainsize pragma is not followed by %<_Cilk_for%>");
+	}
+      else
+	c_parser_skip_to_pragma_eol (parser);
+    }
+  else
+    c_parser_skip_to_pragma_eol (parser);
+}
+
+/* Main entry point for parsing Cilk Plus <#pragma simd> for and
+   _Cilk_for loops.  If IS_CILK_FOR is true then it is a _Cilk_for loop 
+   and GRAIN is the grain value passed in through pragma or 0.  */
+
+static void
+c_parser_cilk_simd (c_parser *parser, bool is_cilk_for, tree grain)
+{
+  tree super_block = NULL_TREE;
+  tree clauses = NULL_TREE;
+  
+  if (!is_cilk_for)
+    clauses = c_parser_cilk_all_clauses (parser);
+  else
+    super_block = c_begin_omp_parallel ();
   tree block = c_begin_compound_stmt (true);
   location_t loc = c_parser_peek_token (parser)->location;
-  c_parser_omp_for_loop (loc, parser, CILK_SIMD, clauses, NULL);
+  enum tree_code code = is_cilk_for ? CILK_FOR : CILK_SIMD;
+  c_parser_omp_for_loop (loc, parser, code, clauses, grain, NULL);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
+  if (is_cilk_for)
+    {
+      /* Move all the clauses from the #pragma OMP for to #pragma omp parallel.
+	 This is because if these values are not integers and it is placed in
+	 OMP_FOR then the compiler will insert value chains for them.  */
+      tree parallel_clauses = NULL_TREE;
+      cilk_for_move_clauses_upward (&parallel_clauses, super_block);
+    /* The term super_block is not used in scheduling terms but in 
+       set-theory, i.e. set vs. super-set.  */ 
+      c_finish_omp_parallel (loc, parallel_clauses, super_block);
+    }
 }
 \f
 /* Parse a transaction attribute (GCC Extension).
diff --git a/gcc/cilk-builtins.def b/gcc/cilk-builtins.def
index 9f3240a..bf319d5 100644
--- a/gcc/cilk-builtins.def
+++ b/gcc/cilk-builtins.def
@@ -31,3 +31,5 @@ DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_SYNC, "__cilkrts_sync")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_LEAVE_FRAME, "__cilkrts_leave_frame")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_POP_FRAME, "__cilkrts_pop_frame")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_SAVE_FP, "__cilkrts_save_fp_ctrl_state")
+DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_FOR_32, "__cilkrts_cilk_for_32")
+DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_FOR_64, "__cilkrts_cilk_for_64")
diff --git a/gcc/cilk-common.c b/gcc/cilk-common.c
index afe88c9..bf4e83a 100644
--- a/gcc/cilk-common.c
+++ b/gcc/cilk-common.c
@@ -106,6 +106,27 @@ install_builtin (const char *name, tree fntype, enum built_in_function code,
   return fndecl;
 }
 
+/* Returns a FUNCTION_DECL of type TYPE whose name is *NAME.  */
+
+static tree 
+declare_cilk_for_builtin (const char *name, tree type, 
+			  enum built_in_function code)
+{
+  tree cb, ft, fn;
+
+  cb = build_function_type_list (void_type_node,
+                                 ptr_type_node, type, type,
+                                 NULL_TREE);
+  cb = build_pointer_type (cb);
+  ft = build_function_type_list (void_type_node,
+                                 cb, ptr_type_node, type,
+                                 integer_type_node, NULL_TREE);
+  fn = install_builtin (name, ft, code, false);
+  TREE_NOTHROW (fn) = 0;
+
+  return fn;
+}
+
 /* Creates and initializes all the built-in Cilk keywords functions and three
    structures: __cilkrts_stack_frame, __cilkrts_pedigree and __cilkrts_worker.
    Detailed information about __cilkrts_stack_frame and
@@ -269,6 +290,14 @@ cilk_init_builtins (void)
   cilk_save_fp_fndecl = install_builtin ("__cilkrts_save_fp_ctrl_state", 
 					 fptr_fun, BUILT_IN_CILK_SAVE_FP,
 					 false);
+  /* __cilkrts_cilk_for_32 (...);  */
+  cilk_for_32_fndecl = declare_cilk_for_builtin ("__cilkrts_cilk_for_32", 
+						 unsigned_intSI_type_node, 
+						 BUILT_IN_CILK_FOR_32);
+  /* __cilkrts_cilk_for_64 (...);  */
+  cilk_for_64_fndecl = declare_cilk_for_builtin ("__cilkrts_cilk_for_64", 
+						 unsigned_intDI_type_node, 
+						 BUILT_IN_CILK_FOR_64);
 }
 
 /* Get the appropriate frame arguments for CALL that is of type CALL_EXPR.  */
diff --git a/gcc/cilk.h b/gcc/cilk.h
index ae96f53..1fee929 100644
--- a/gcc/cilk.h
+++ b/gcc/cilk.h
@@ -40,6 +40,9 @@ enum cilk_tree_index  {
   CILK_TI_F_POP,                      /* __cilkrts_pop_frame (...).  */
   CILK_TI_F_RETHROW,                  /* __cilkrts_rethrow (...).  */
   CILK_TI_F_SAVE_FP,                  /* __cilkrts_save_fp_ctrl_state (...).  */
+  CILK_TI_F_LOOP_32,                  /* __cilkrts_cilk_for_32 (...).  */
+  CILK_TI_F_LOOP_64,                  /* __cilkrts_cilk_for_64 (...).  */
+
   /* __cilkrts_stack_frame struct fields.  */
   CILK_TI_FRAME_FLAGS,                /* stack_frame->flags.  */
   CILK_TI_FRAME_PARENT,               /* stack_frame->parent.  */
@@ -77,6 +80,8 @@ extern GTY (()) tree cilk_trees[CILK_TI_MAX];
 #define cilk_rethrow_fndecl           cilk_trees[CILK_TI_F_RETHROW]
 #define cilk_pop_fndecl               cilk_trees[CILK_TI_F_POP]
 #define cilk_save_fp_fndecl           cilk_trees[CILK_TI_F_SAVE_FP]
+#define cilk_for_32_fndecl            cilk_trees[CILK_TI_F_LOOP_32]
+#define cilk_for_64_fndecl            cilk_trees[CILK_TI_F_LOOP_64]
 
 #define cilk_worker_type_fndecl       cilk_trees[CILK_TI_WORKER_TYPE]
 #define cilk_frame_type_decl          cilk_trees[CILK_TI_FRAME_TYPE]
diff --git a/gcc/cp/cp-cilkplus.c b/gcc/cp/cp-cilkplus.c
index f3a2aff..29661ab 100644
--- a/gcc/cp/cp-cilkplus.c
+++ b/gcc/cp/cp-cilkplus.c
@@ -143,3 +143,122 @@ cilk_install_body_with_frame_cleanup (tree fndecl, tree orig_body, void *wd)
 			    &list);
 }
 
+/* Returns all the statements till CILK_FOR statement in *STMT_LIST.  Removes
+   those statements from STMT_LIST and updates STMT_LIST accordingly.  */
+
+tree
+copy_tree_till_cilk_for (tree *stmt_list)
+{
+  gcc_assert (TREE_CODE (*stmt_list) == STATEMENT_LIST);
+  tree new_stmt_list  = alloc_stmt_list ();
+  tree_stmt_iterator tsi;
+  for (tsi = tsi_start (*stmt_list); !tsi_end_p (tsi);)
+    if (TREE_CODE (tsi_stmt (tsi)) != CILK_FOR)
+      {
+	append_to_statement_list (tsi_stmt (tsi), &new_stmt_list); 
+	tsi_delink (&tsi);
+      }
+    else
+      tsi_next (&tsi);
+    
+  return new_stmt_list;
+}
+
+/* Structure to hold the list of variables that are being killed in a
+   statement list.  This structure is only used in a WALK_TREE function.  */
+struct cilk_for_var_list
+{
+  vec <tree, va_gc> *list;
+};
+
+/* Helper function for WALK_TREE used in find_killed_vars function.
+   Records all the variables that are being killed (or set) in *TP.
+   DATA points to the structure that holds the variable list.  */
+
+static tree
+find_vars (tree *tp, int *walk_subtrees, void *data)
+{
+  struct cilk_for_var_list *vlist = (struct cilk_for_var_list *) data;
+
+  if (!tp || !*tp)
+    return NULL_TREE;
+
+  if (TREE_CODE (*tp) == INIT_EXPR || TREE_CODE (*tp) == MODIFY_EXPR)
+    {
+      vec_safe_push (vlist->list, TREE_OPERAND (*tp, 0));
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Returns a vector of trees that holds the variables that
+   are killed (i.e. written or set) in STMT_LIST.  */
+
+static vec <tree, va_gc> *
+find_killed_vars (tree stmt_list)
+{
+  struct cilk_for_var_list vlist;
+  memset (&vlist, 0, sizeof (vlist));
+  cp_walk_tree (&stmt_list, find_vars, &vlist, NULL);
+  return vlist.list;
+}
+
+/* Inserts OMP_CLAUSE_FIRSTPRIVATE clauses into *CLAUSES for each variable
+   in LIST.  */
+
+static void
+insert_firstpriv_clauses (vec <tree, va_gc> *list, tree *clauses)
+{
+  if (vec_safe_is_empty (list))
+    return;
+
+  tree lhs;
+  unsigned ix;
+  FOR_EACH_VEC_SAFE_ELT (list, ix, lhs)
+    {
+      tree new_clause = build_omp_clause (EXPR_LOCATION (lhs),
+					  OMP_CLAUSE_FIRSTPRIVATE);
+      OMP_CLAUSE_DECL (new_clause) = lhs;
+      OMP_CLAUSE_CHAIN (new_clause) = *clauses;
+      *clauses = new_clause;
+    }
+}
+
+/* Returns a BIND_EXPR with BIND_EXPR_VARS holding VARS and BIND_EXPR_BODY
+   containing STMT_LIST and CFOR_PAR_LIST.  */
+
+tree
+cilk_for_create_bind_expr (tree vars, tree stmt_list, tree cfor_par_list)
+{
+  gcc_assert (TREE_CODE (stmt_list) == STATEMENT_LIST);
+  tree_stmt_iterator tsi;
+  tree return_expr = make_node (BIND_EXPR);
+  BIND_EXPR_BODY (return_expr) = alloc_stmt_list ();
+  bool found = false; 
+  vec <tree, va_gc> *cfor_vars = find_killed_vars (stmt_list);
+
+  insert_firstpriv_clauses (cfor_vars, &OMP_PARALLEL_CLAUSES (cfor_par_list));
+
+  /* If there is a supplied list of vars then there is no reason to find them 
+     again.  */
+  if (vars != NULL_TREE)
+    found = true;
+
+  BIND_EXPR_VARS (return_expr) = vars;
+  for (tsi = tsi_start (stmt_list); !tsi_end_p (tsi); tsi_next (&tsi))
+    {
+      /* Only do the adding of BIND_EXPR_VARS the first time since they are
+	 already "chained-on."  */
+      if (!found && TREE_CODE (tsi_stmt (tsi)) == DECL_EXPR)
+	{
+	  tree var = DECL_EXPR_DECL (tsi_stmt (tsi));
+	  BIND_EXPR_VARS (return_expr) = var;
+	  found = true;
+	}
+      else
+	append_to_statement_list (tsi_stmt (tsi),
+				  &BIND_EXPR_BODY (return_expr));
+    }
+  append_to_statement_list (cfor_par_list, &BIND_EXPR_BODY (return_expr));
+  return return_expr;
+}
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 7681b27..c665384 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6206,6 +6206,8 @@ extern void vtv_build_vtable_verify_fndecl      (void);
 
 /* In cp-cilkplus.c.  */
 extern bool cpp_validate_cilk_plus_loop		(tree);
+extern tree copy_tree_till_cilk_for             (tree *);
+extern tree cilk_for_create_bind_expr           (tree, tree, tree);
 
 /* In cp/cp-array-notations.c */
 extern tree expand_array_notation_exprs         (tree);
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index f0722d6..d661d4b 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -237,8 +237,8 @@ static void cp_parser_initial_pragma
 static tree cp_literal_operator_id
   (const char *);
 
-static void cp_parser_cilk_simd
-  (cp_parser *, cp_token *);
+static tree cp_parser_cilk_simd
+  (cp_parser *, cp_token *, tree);
 static bool cp_parser_omp_declare_reduction_exprs
   (tree, cp_parser *);
 static tree cp_parser_cilk_simd_vectorlength 
@@ -9368,6 +9368,18 @@ cp_parser_statement (cp_parser* parser, tree in_statement_expr,
 	  statement = cp_parser_iteration_statement (parser, false);
 	  break;
 
+	case RID_CILK_FOR:
+	  if (!flag_cilkplus)
+	    {
+	      error_at (cp_lexer_peek_token (parser->lexer)->location,
+			"-fcilkplus must be enabled to use %<_Cilk_for%>");
+	      cp_lexer_consume_token (parser->lexer);
+	      statement = error_mark_node;
+	    }
+	  else
+	    statement = cp_parser_cilk_simd (parser, NULL, integer_zero_node);
+	  break;
+	  
 	case RID_BREAK:
 	case RID_CONTINUE:
 	case RID_RETURN:
@@ -28835,7 +28847,7 @@ cp_parser_omp_for_cond (cp_parser *parser, tree decl, enum tree_code code)
     case LE_EXPR:
       break;
     case NE_EXPR:
-      if (code == CILK_SIMD)
+      if (code == CILK_SIMD || code == CILK_FOR)
 	break;
       /* Fall through: OpenMP disallows NE_EXPR.  */
     default:
@@ -29131,7 +29143,7 @@ cp_parser_omp_for_loop_init (cp_parser *parser,
 
 static tree
 cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
-			tree *cclauses)
+			tree *cclauses, tree *cfor_block)
 {
   tree init, cond, incr, body, decl, pre_body = NULL_TREE, ret;
   tree real_decl, initv, condv, incrv, declv;
@@ -29160,11 +29172,18 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
       bool add_private_clause = false;
       location_t loc;
 
-      if (!cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
+      if (code == CILK_SIMD
+	  && !cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
 	{
 	  cp_parser_error (parser, "for statement expected");
 	  return NULL;
 	}
+      if (code == CILK_FOR
+	  && !cp_lexer_next_token_is_keyword (parser->lexer, RID_CILK_FOR))
+	{
+	  cp_parser_error (parser, "_Cilk_for statement expected");
+	  return NULL;
+	}
       loc = cp_lexer_consume_token (parser->lexer)->location;
 
       if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
@@ -29173,13 +29192,26 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
       init = decl = real_decl = NULL;
       this_pre_body = push_stmt_list ();
 
+      if (code == CILK_FOR
+	  && cp_lexer_next_token_is_keyword (parser->lexer, RID_STATIC))
+	{
+	  error_at (cp_lexer_peek_token (parser->lexer)->location,
+		    "induction variable cannot be static");
+	  cp_lexer_consume_token (parser->lexer);
+	}
       add_private_clause
 	|= cp_parser_omp_for_loop_init (parser,
-					/*parsing_openmp=*/code != CILK_SIMD,
+					/*parsing_openmp=*/
+					(code != CILK_SIMD && code != CILK_FOR),
 					this_pre_body, for_block,
 					init, decl, real_decl);
 
-      cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
+      if (!cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON)
+	  && code == CILK_FOR)
+	{
+	  cp_parser_skip_to_end_of_statement (parser);
+	  cp_parser_consume_semicolon_at_end_of_statement (parser);
+	}
       if (this_pre_body)
 	{
 	  this_pre_body = pop_stmt_list (this_pre_body);
@@ -29337,7 +29369,7 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
 
   /* Note that we saved the original contents of this flag when we entered
      the structured block, and so we don't need to re-save it here.  */
-  if (code == CILK_SIMD)
+  if (code == CILK_SIMD || code == CILK_FOR)
     parser->in_statement = IN_CILK_SIMD_FOR;
   else
     parser->in_statement = IN_OMP_FOR;
@@ -29378,7 +29410,17 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
     }
 
   while (!for_block->is_empty ())
-    add_stmt (pop_stmt_list (for_block->pop ()));
+    {
+      tree t = pop_stmt_list (for_block->pop ());
+
+      /* Remove all the statements between the head of statement list and
+	 _Cilk_for statement and store them in *cfor_block.  These statements
+	 are hoisted above the #pragma parallel.  */
+      if (code == CILK_FOR && cfor_block != NULL)
+	*cfor_block = copy_tree_till_cilk_for (&t);
+      add_stmt (t);
+
+    }
   release_tree_vector (for_block);
 
   return ret;
@@ -29434,7 +29476,7 @@ cp_parser_omp_simd (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_SIMD, clauses, cclauses);
+  ret = cp_parser_omp_for_loop (parser, OMP_SIMD, clauses, cclauses, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -29522,7 +29564,7 @@ cp_parser_omp_for (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_FOR, clauses, cclauses);
+  ret = cp_parser_omp_for_loop (parser, OMP_FOR, clauses, cclauses, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -29994,7 +30036,7 @@ cp_parser_omp_distribute (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_DISTRIBUTE, clauses, NULL);
+  ret = cp_parser_omp_for_loop (parser, OMP_DISTRIBUTE, clauses, NULL, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -31290,6 +31332,38 @@ cp_parser_initial_pragma (cp_token *first_token)
   cp_lexer_get_preprocessor_token (NULL, first_token);
 }
 
+/* Parses the grainsize pragma for the _Cilk_for statement.
+   Syntax:
+   #pragma cilk grainsize = <VALUE>.  */
+
+static void
+cp_parser_cilk_grainsize (cp_parser *parser, cp_token *pragma_tok)
+{
+  if (cp_parser_require (parser, CPP_EQ, RT_EQ))
+    {
+      tree exp = cp_parser_binary_expression (parser, false, false,
+                                              PREC_NOT_OPERATOR, NULL);
+      cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+      if (!exp || exp == error_mark_node)
+        {
+          error_at (pragma_tok->location, "invalid grainsize for _Cilk_for");
+          return;
+        }
+      cp_token *n_tok = cp_lexer_peek_token (parser->lexer);
+
+      /* Make sure the next token is _Cilk_for; it is invalid otherwise.  */
+      if (n_tok && n_tok->type == CPP_KEYWORD
+	  && n_tok->keyword == RID_CILK_FOR) 
+	cp_parser_cilk_simd (parser, NULL, exp);
+      else
+	warning_at (cp_lexer_peek_token (parser->lexer)->location, 0,
+		    "%<#pragma cilk grainsize%> is not followed by "
+		    "%<_Cilk_for%>");
+      return;
+    }
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+}
+
 /* Normal parsing of a pragma token.  Here we can (and must) use the
    regular lexer.  */
 
@@ -31469,9 +31543,30 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context)
 		    "%<#pragma simd%> must be inside a function");
 	  break;
 	}
-      cp_parser_cilk_simd (parser, pragma_tok);
+      cp_parser_cilk_simd (parser, pragma_tok, NULL_TREE);
       return true;
 
+    case PRAGMA_CILK_GRAINSIZE:
+      if (context == pragma_external)
+        {
+          error_at (pragma_tok->location,
+                    "%<#pragma cilk grainsize%> must be inside a function");
+          break;
+        }
+
+      /* Report an error if Cilk Plus is not enabled.  */
+      if (flag_cilkplus)
+        {
+          cp_parser_cilk_grainsize (parser, pragma_tok);
+          return true;
+        }
+      else
+        {
+          error_at (pragma_tok->location, "-fcilkplus must be enabled to use "
+                    "%<#pragma cilk grainsize%>");
+          break;
+	}
+
     default:
       gcc_assert (id >= PRAGMA_FIRST_EXTERNAL);
       c_invoke_pragma_handler (id);
@@ -31789,31 +31884,102 @@ cp_parser_cilk_simd_all_clauses (cp_parser *parser, cp_token *pragma_token)
     return c_finish_cilk_clauses (clauses);
 }
 
-/* Main entry-point for parsing Cilk Plus <#pragma simd> for loops.  */
+/* Main entry-point for parsing Cilk Plus <#pragma simd> for and _Cilk_for
+   loops.  This function returns NULL_TREE whenever it is parsing the
+   <#pragma simd> for because the caller does not check the return value.
+   _Cilk_for's caller checks this value, so this function returns
+   error_mark_node when errors happen and a valid value when things go well.  */
 
-static void
-cp_parser_cilk_simd (cp_parser *parser, cp_token *pragma_token)
+static tree
+cp_parser_cilk_simd (cp_parser *parser, cp_token *pragma_token, tree grain)
 {
-  tree clauses = cp_parser_cilk_simd_all_clauses (parser, pragma_token);
-
+  bool is_cilk_for = !pragma_token ? true : false;
+  
+  tree clauses = NULL_TREE;
+  if (!is_cilk_for)
+    clauses = cp_parser_cilk_simd_all_clauses (parser, pragma_token);
+  
   if (clauses == error_mark_node)
-    return;
+    return NULL_TREE;
   
-  if (cp_lexer_next_token_is_not_keyword (parser->lexer, RID_FOR))
+  if (!is_cilk_for
+      && cp_lexer_next_token_is_not_keyword (parser->lexer, RID_FOR))
     {
       error_at (cp_lexer_peek_token (parser->lexer)->location,
 		"for statement expected");
-      return;
+      return NULL_TREE;
+    }
+  if (is_cilk_for
+      && cp_lexer_next_token_is_not_keyword (parser->lexer, RID_CILK_FOR))
+    {
+      error_at (cp_lexer_peek_token (parser->lexer)->location,
+		"_Cilk_for statement expected");
+      return error_mark_node;
     }
 
+  tree top_block = NULL_TREE, topmost_blk = NULL_TREE;
+  if (is_cilk_for)
+    {
+      topmost_blk = push_stmt_list ();
+      top_block = begin_omp_parallel ();
+    }
+  
   tree sb = begin_omp_structured_block ();
   int save = cp_parser_begin_omp_structured_block (parser);
-  tree ret = cp_parser_omp_for_loop (parser, CILK_SIMD, clauses, NULL);
+   
+  enum tree_code code = is_cilk_for ? CILK_FOR : CILK_SIMD;
+  tree cfor_blk = NULL_TREE;
+  tree ret = cp_parser_omp_for_loop (parser, code, clauses, NULL, &cfor_blk);
   if (ret)
     cpp_validate_cilk_plus_loop (OMP_FOR_BODY (ret));
+  
+  /* For _Cilk_for statements, the grain value is stored in a SCHEDULE
+     clause.  */
+  if (is_cilk_for && ret)
+    {
+      tree l = build_omp_clause (EXPR_LOCATION (grain), OMP_CLAUSE_SCHEDULE);
+      OMP_CLAUSE_SCHEDULE_KIND (l) = OMP_CLAUSE_SCHEDULE_CILKFOR;
+      OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (l) = grain;
+      OMP_CLAUSE_CHAIN (l) = OMP_FOR_CLAUSES (ret);
+      OMP_FOR_CLAUSES (ret) = l;
+    }
   cp_parser_end_omp_structured_block (parser, save);
-  add_stmt (finish_omp_structured_block (sb));
-  return;
+
+  if (!is_cilk_for)
+    {
+      add_stmt (finish_omp_structured_block (sb));
+      return NULL_TREE;
+    }
+
+  tree sb_block = finish_omp_structured_block (sb);
+  tree vars = NULL_TREE, sb_blk_body = sb_block;
+
+  /* For iterators, cfor_blk holds the mapping from the original vector
+     iterators to the integer ones that c_finish_omp_for remaps.
+     This information must be pushed above the #pragma omp parallel so that
+     the IF_CLAUSE (that holds the loop-count) can use them to compute the
+     loop-count.  */
+  if (TREE_CODE (sb_block) == BIND_EXPR && cfor_blk != NULL_TREE)
+    {
+      vars = BIND_EXPR_VARS (sb_block);
+      sb_blk_body = BIND_EXPR_BODY (sb_block);
+    }
+
+  add_stmt (sb_blk_body);
+  tree parallel_clauses = NULL_TREE;
+  cilk_for_move_clauses_upward (&parallel_clauses, ret);
+  tree stmt = finish_omp_parallel (parallel_clauses, top_block);
+  OMP_PARALLEL_COMBINED (stmt) = 1;
+  topmost_blk = pop_stmt_list (topmost_blk);
+
+  if (cfor_blk != NULL_TREE)
+    {
+      tree bind_expr = cilk_for_create_bind_expr (vars, cfor_blk, topmost_blk);
+      add_stmt (bind_expr);
+      return bind_expr;
+    }
+  add_stmt (topmost_blk);
+  return topmost_blk;
 }
 
 /* Create an identifier for a generic parameter type (a synthesized
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 7967db8..7b60b6e 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13584,6 +13584,9 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
 				args, complain, in_decl);
       stmt = begin_omp_parallel ();
       RECUR (OMP_PARALLEL_BODY (t));
+      if (flag_cilkplus
+	  && TREE_CODE (OMP_PARALLEL_BODY (t)) == CILK_FOR)
+	cilk_for_move_clauses_upward (&tmp, stmt);
       OMP_PARALLEL_COMBINED (finish_omp_parallel (tmp, stmt))
 	= OMP_PARALLEL_COMBINED (t);
       break;
@@ -13599,6 +13602,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
     case OMP_FOR:
     case OMP_SIMD:
     case CILK_SIMD:
+    case CILK_FOR:
     case OMP_DISTRIBUTE:
       {
 	tree clauses, body, pre_body;
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 9fb4fc0..8388a6b 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -6058,6 +6058,7 @@ handle_omp_for_class_iterator (int i, location_t locus, tree declv, tree initv,
     case GE_EXPR:
     case LT_EXPR:
     case LE_EXPR:
+    case NE_EXPR:
       if (TREE_OPERAND (cond, 1) == iter)
 	cond = build2 (swap_tree_comparison (TREE_CODE (cond)),
 		       TREE_TYPE (cond), iter, TREE_OPERAND (cond, 0));
@@ -6470,12 +6471,20 @@ finish_omp_for (location_t locus, enum tree_code code, tree declv, tree initv,
   if (IS_EMPTY_STMT (pre_body))
     pre_body = NULL;
 
+  tree count = NULL_TREE;
   omp_for = c_finish_omp_for (locus, code, declv, initv, condv, incrv,
-			      body, pre_body);
+			      body, pre_body, &count);
 
   if (omp_for == NULL)
     return NULL;
 
+  if (code == CILK_FOR)
+    {
+      tree c = build_omp_clause (EXPR_LOCATION (count), OMP_CLAUSE_IF);
+      OMP_CLAUSE_IF_EXPR (c) = count;
+      clauses = chainon (clauses, c);
+    }
+
   for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INCR (omp_for)); i++)
     {
       decl = TREE_OPERAND (TREE_VEC_ELT (OMP_FOR_INIT (omp_for), i), 0);
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 2d1e1c7..f87c0cf 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "value-prof.h"
 #include "trans-mem.h"
 
+static void dump_gimple_omp_parallel (pretty_printer *, gimple, int, int,
+				      bool);
 #define INDENT(SPACE)							\
   do { int i; for (i = 0; i < SPACE; i++) pp_space (buffer); } while (0)
 
@@ -1124,6 +1126,10 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	case GF_OMP_FOR_KIND_DISTRIBUTE:
 	  kind = " distribute";
 	  break;
+	case GF_OMP_FOR_KIND_CILKFOR:
+	  gcc_assert (flag_cilkplus);
+	  kind = "";
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -1158,16 +1164,25 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	case GF_OMP_FOR_KIND_DISTRIBUTE:
 	  pp_string (buffer, "#pragma omp distribute");
 	  break;
+	case GF_OMP_FOR_KIND_CILKFOR:
+	  gcc_assert (flag_cilkplus);
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
-      dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags);
+      if (!flag_cilkplus
+	  || gimple_omp_for_kind (gs) != GF_OMP_FOR_KIND_CILKFOR) 
+	dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags);
       for (i = 0; i < gimple_omp_for_collapse (gs); i++)
 	{
 	  if (i)
 	    spc += 2;
 	  newline_and_indent (buffer, spc);
-	  pp_string (buffer, "for (");
+	  if (flag_cilkplus 
+	      && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
+	    pp_string (buffer, "_Cilk_for (");
+	  else
+	    pp_string (buffer, "for (");
 	  dump_generic_node (buffer, gimple_omp_for_index (gs, i), spc,
 			     flags, false);
 	  pp_string (buffer, " = ");
@@ -1192,6 +1207,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	    case GE_EXPR:
 	      pp_greater_equal (buffer);
 	      break;
+	    case NE_EXPR:
+	      pp_string (buffer, "!=");
+	      break;
 	    default:
 	      gcc_unreachable ();
 	    }
@@ -1210,6 +1228,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 
       if (!gimple_seq_empty_p (gimple_omp_body (gs)))
 	{
+	  if (flag_cilkplus
+	      && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR) 
+	    dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags); 
 	  newline_and_indent (buffer, spc + 2);
 	  pp_left_brace (buffer);
 	  pp_newline (buffer);
@@ -1846,7 +1867,7 @@ dump_gimple_phi (pretty_printer *buffer, gimple phi, int spc, bool comment,
 
 static void
 dump_gimple_omp_parallel (pretty_printer *buffer, gimple gs, int spc,
-                          int flags)
+                          int flags, bool is_cilk_for)
 {
   if (flags & TDF_RAW)
     {
@@ -1860,7 +1881,10 @@ dump_gimple_omp_parallel (pretty_printer *buffer, gimple gs, int spc,
   else
     {
       gimple_seq body;
-      pp_string (buffer, "#pragma omp parallel");
+      if (is_cilk_for) 
+	pp_string (buffer, "compiler-inserted clauses for cilk-for body: ");
+      else
+	pp_string (buffer, "#pragma omp parallel");
       dump_omp_clauses (buffer, gimple_omp_parallel_clauses (gs), spc, flags);
       if (gimple_omp_parallel_child_fn (gs))
 	{
@@ -2137,7 +2161,7 @@ pp_gimple_stmt_1 (pretty_printer *buffer, gimple gs, int spc, int flags)
       break;
 
     case GIMPLE_OMP_PARALLEL:
-      dump_gimple_omp_parallel (buffer, gs, spc, flags);
+      dump_gimple_omp_parallel (buffer, gs, spc, flags, false);
       break;
 
     case GIMPLE_OMP_TASK:
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 0e80d2e..194045c 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -91,13 +91,14 @@ enum gf_mask {
     GF_CALL_ALLOCA_FOR_VAR	= 1 << 5,
     GF_CALL_INTERNAL		= 1 << 6,
     GF_OMP_PARALLEL_COMBINED	= 1 << 0,
-    GF_OMP_FOR_KIND_MASK	= 3 << 0,
+    GF_OMP_FOR_KIND_MASK	= 7 << 0,
     GF_OMP_FOR_KIND_FOR		= 0 << 0,
     GF_OMP_FOR_KIND_DISTRIBUTE	= 1 << 0,
     GF_OMP_FOR_KIND_SIMD	= 2 << 0,
     GF_OMP_FOR_KIND_CILKSIMD	= 3 << 0,
-    GF_OMP_FOR_COMBINED		= 1 << 2,
-    GF_OMP_FOR_COMBINED_INTO	= 1 << 3,
+    GF_OMP_FOR_KIND_CILKFOR     = 4 << 0,
+    GF_OMP_FOR_COMBINED		= 1 << 3,
+    GF_OMP_FOR_COMBINED_INTO	= 1 << 4,
     GF_OMP_TARGET_KIND_MASK	= 3 << 0,
     GF_OMP_TARGET_KIND_REGION	= 0 << 0,
     GF_OMP_TARGET_KIND_DATA	= 1 << 0,
@@ -4563,6 +4564,16 @@ gimple_omp_for_set_pre_body (gimple gs, gimple_seq pre_body)
   omp_for_stmt->pre_body = pre_body;
 }
 
+/* Returns the induction variable of type TREE from GS that is of type 
+   GIMPLE_STATEMENT_OMP_FOR.  */
+
+static inline tree
+gimple_cilk_for_induction_var (const_gimple gs)
+{
+  const gimple_statement_omp_for *cilk_for_stmt =
+    as_a <const gimple_statement_omp_for> (gs);
+  return cilk_for_stmt->iter->index;
+}
 
 /* Return the clauses associated with OMP_PARALLEL GS.  */
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 9c9998d..d223b7a 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -5849,7 +5849,8 @@ omp_check_private (struct gimplify_omp_ctx *ctx, tree decl, bool copyprivate)
 
 static void
 gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
-			   enum omp_region_type region_type)
+			   enum omp_region_type region_type,
+			   bool is_cilk_for)
 {
   struct gimplify_omp_ctx *ctx, *outer_ctx;
   tree c;
@@ -6079,8 +6080,12 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 
 	case OMP_CLAUSE_FINAL:
 	case OMP_CLAUSE_IF:
-	  OMP_CLAUSE_OPERAND (c, 0)
-	    = gimple_boolify (OMP_CLAUSE_OPERAND (c, 0));
+	  /* In _Cilk_for we insert an IF clause as a mechanism to
+	     pass in the count information.  So, there is no reason to
+	     boolify it.  */
+	  if (!is_cilk_for) 
+	    OMP_CLAUSE_OPERAND (c, 0) 
+	      = gimple_boolify (OMP_CLAUSE_OPERAND (c, 0));
 	  /* Fall through.  */
 
 	case OMP_CLAUSE_SCHEDULE:
@@ -6447,6 +6452,20 @@ gimplify_adjust_omp_clauses (tree *list_p)
   delete_omp_context (ctx);
 }
 
+static void
+omp_remove_clause (tree c, tree *list_p)
+{
+  tree ii = NULL_TREE;
+  while ((ii = *list_p) != NULL)
+    {
+      if (simple_cst_equal (ii, c) == 1)
+	*list_p = OMP_CLAUSE_CHAIN (ii);
+      else
+	list_p = &OMP_CLAUSE_CHAIN (ii);
+    }
+}
+	
+
 /* Gimplify the contents of an OMP_PARALLEL statement.  This involves
    gimplification of the body, as well as scanning the body for used
    variables.  We need to do this scan now, because variable-sized
@@ -6458,11 +6477,29 @@ gimplify_omp_parallel (tree *expr_p, gimple_seq *pre_p)
   tree expr = *expr_p;
   gimple g;
   gimple_seq body = NULL;
-
+  bool is_cilk_for = false;
+  tree c = NULL_TREE;
+  for (c = OMP_PARALLEL_CLAUSES (expr); c; c = OMP_CLAUSE_CHAIN (c))
+    if (flag_cilkplus && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SCHEDULE
+	&& OMP_CLAUSE_SCHEDULE_KIND (c) == OMP_CLAUSE_SCHEDULE_CILKFOR)
+      {
+	/* The schedule clause is kept up to this point so that it can
+	   indicate whether this #pragma omp parallel is something a
+	   _Cilk_for statement inserted.  If so, then indicate
+	   is_cilk_for is true so that gimplify_scan_omp_clauses does
+	   not boolify the IF clause, which stores the count value.  */
+	gcc_assert (flag_cilkplus);
+	is_cilk_for = true;
+	break;
+      } 
+  
+  /* The SCHEDULE clause is not necessary anymore.  */
+  if (is_cilk_for) 
+    omp_remove_clause (c, &OMP_PARALLEL_CLAUSES (expr));
   gimplify_scan_omp_clauses (&OMP_PARALLEL_CLAUSES (expr), pre_p,
 			     OMP_PARALLEL_COMBINED (expr)
 			     ? ORT_COMBINED_PARALLEL
-			     : ORT_PARALLEL);
+			     : ORT_PARALLEL, is_cilk_for);
 
   push_gimplify_context ();
 
@@ -6498,7 +6535,7 @@ gimplify_omp_task (tree *expr_p, gimple_seq *pre_p)
   gimplify_scan_omp_clauses (&OMP_TASK_CLAUSES (expr), pre_p,
 			     find_omp_clause (OMP_TASK_CLAUSES (expr),
 					      OMP_CLAUSE_UNTIED)
-			     ? ORT_UNTIED_TASK : ORT_TASK);
+			     ? ORT_UNTIED_TASK : ORT_TASK, false);
 
   push_gimplify_context ();
 
@@ -6563,8 +6600,9 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 
   simd = TREE_CODE (for_stmt) == OMP_SIMD
     || TREE_CODE (for_stmt) == CILK_SIMD;
-  gimplify_scan_omp_clauses (&OMP_FOR_CLAUSES (for_stmt), pre_p,
-			     simd ? ORT_SIMD : ORT_WORKSHARE);
+  gimplify_scan_omp_clauses (&OMP_FOR_CLAUSES (for_stmt), pre_p,
+			     simd ? ORT_SIMD : ORT_WORKSHARE,
+			     TREE_CODE (for_stmt) == CILK_FOR);
 
   /* Handle OMP_FOR_INIT.  */
   for_pre_body = NULL;
@@ -6825,6 +6863,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
     case OMP_FOR: kind = GF_OMP_FOR_KIND_FOR; break;
     case OMP_SIMD: kind = GF_OMP_FOR_KIND_SIMD; break;
     case CILK_SIMD: kind = GF_OMP_FOR_KIND_CILKSIMD; break;
+    case CILK_FOR: kind = GF_OMP_FOR_KIND_CILKFOR; break;
     case OMP_DISTRIBUTE: kind = GF_OMP_FOR_KIND_DISTRIBUTE; break;
     default:
       gcc_unreachable ();
@@ -6895,7 +6934,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
     default:
       gcc_unreachable ();
     }
-  gimplify_scan_omp_clauses (&OMP_CLAUSES (expr), pre_p, ort);
+  gimplify_scan_omp_clauses (&OMP_CLAUSES (expr), pre_p, ort, false);
   if (ort == ORT_TARGET || ort == ORT_TARGET_DATA)
     {
       push_gimplify_context ();
@@ -6955,7 +6994,7 @@ gimplify_omp_target_update (tree *expr_p, gimple_seq *pre_p)
   gimple stmt;
 
   gimplify_scan_omp_clauses (&OMP_TARGET_UPDATE_CLAUSES (expr), pre_p,
-			     ORT_WORKSHARE);
+			     ORT_WORKSHARE, false);
   gimplify_adjust_omp_clauses (&OMP_TARGET_UPDATE_CLAUSES (expr));
   stmt = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_UPDATE,
 				  OMP_TARGET_UPDATE_CLAUSES (expr));
@@ -7897,6 +7936,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	case OMP_FOR:
 	case OMP_SIMD:
 	case CILK_SIMD:
+	case CILK_FOR:
 	case OMP_DISTRIBUTE:
 	  ret = gimplify_omp_for (expr_p, pre_p);
 	  break;
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 2c35751..cbc8549
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -71,6 +71,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-prop.h"
 #include "tree-nested.h"
 #include "tree-eh.h"
+#include "cilk.h"
 
 
 /* Lowering of OpenMP parallel and workshare constructs proceeds in two
@@ -198,6 +199,13 @@ struct omp_for_data
   struct omp_for_data_loop *loops;
 };
 
+/* A structure with necessary elements from _Cilk_for statement.  This
+   structure is passed in through WALK_STMT_INFO->INFO.  */
+struct cilk_for_info 
+{
+  bool found;
+  tree induction_var;
+};
 
 static splay_tree all_contexts;
 static int taskreg_nesting_level;
@@ -314,6 +322,8 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
   fd->have_ordered = false;
   fd->sched_kind = OMP_CLAUSE_SCHEDULE_STATIC;
   fd->chunk_size = NULL_TREE;
+  if (gimple_omp_for_kind (fd->for_stmt) ==  GF_OMP_FOR_KIND_CILKFOR)
+    fd->sched_kind = OMP_CLAUSE_SCHEDULE_CILKFOR;
   collapse_iter = NULL;
   collapse_count = NULL;
 
@@ -392,7 +402,9 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
 	  break;
 	case NE_EXPR:
 	  gcc_assert (gimple_omp_for_kind (for_stmt)
-		      == GF_OMP_FOR_KIND_CILKSIMD);
+		      == GF_OMP_FOR_KIND_CILKSIMD
+		      || gimple_omp_for_kind (for_stmt)
+		      == GF_OMP_FOR_KIND_CILKFOR);
 	  break;
 	case LE_EXPR:
 	  if (POINTER_TYPE_P (TREE_TYPE (loop->n2)))
@@ -1818,27 +1830,120 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	scan_omp (&OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (c), ctx);
 }
 
-/* Create a new name for omp child function.  Returns an identifier.  */
+/* Create a new name for omp child function.  Returns an identifier.  If 
+   IS_CILK_FOR is true then the suffix for the child function is 
+   "_cilk_for_fn."  */
 
 static tree
-create_omp_child_function_name (bool task_copy)
+create_omp_child_function_name (bool task_copy, bool is_cilk_for)
 {
+  if (is_cilk_for)
+    return clone_function_name (current_function_decl, "_cilk_for_fn");
   return (clone_function_name (current_function_decl,
 			       task_copy ? "_omp_cpyfn" : "_omp_fn"));
 }
 
+/* Helper function for walk_gimple_seq function.  *GSI_P is the gimple
+   statement iterator passed by walk_gimple_seq and WI->INFO holds the
+   CILK_FOR_INFO structure.  This function sets the values inside this
+   structure if it finds a _Cilk_for at *GSI_P.  HANDLED_OPS_P is unused.  */
+
+static tree
+find_cilk_for_stmt (gimple_stmt_iterator *gsi_p,
+		    bool *handled_ops_p ATTRIBUTE_UNUSED,
+		    struct walk_stmt_info *wi)
+{
+  struct cilk_for_info *cf_info = (struct cilk_for_info *) wi->info;
+  gimple stmt = gsi_stmt (*gsi_p);
+
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR
+      && (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_CILKFOR)
+      /* For nested _Cilk_for statements, just look into the
+	 outer-most one.  */
+      && cf_info->found == false)
+    {
+      cf_info->found = true;
+      cf_info->induction_var = gimple_cilk_for_induction_var (stmt);
+    }
+  return NULL_TREE;
+}
+
+/* Returns true if STMT contains a CILK_FOR statement.  If found then
+   populate *IND_VAR with the induction variable of the outer-most
+   _Cilk_for.  Otherwise *IND_VAR remains untouched.  IND_VAR can be
+   NULL, in which case only the presence of a _Cilk_for statement is
+   reported.  */
+
+static bool
+is_cilk_for_stmt (gimple stmt, tree *ind_var)
+{
+  if (!flag_cilkplus)
+    return false;
+  if (gimple_code (stmt) == GIMPLE_OMP_PARALLEL)
+    stmt = gimple_omp_body (stmt);
+  if (gimple_code (stmt) == GIMPLE_BIND)
+    {
+      gimple_seq body = gimple_bind_body (stmt);
+      struct walk_stmt_info wi;
+      struct cilk_for_info cf_info;
+      memset (&cf_info, 0, sizeof (struct cilk_for_info));
+      memset (&wi, 0, sizeof (wi));
+      wi.info = &cf_info;
+      walk_gimple_seq (body, find_cilk_for_stmt, NULL, &wi);
+      if (cf_info.found)
+	{
+	  if (ind_var)
+	    *ind_var = cf_info.induction_var;
+	  return true;
+	}
+    }
+  return false;
+}
+
+/* Returns the type of the induction variable for the child function for
+   _Cilk_for and the types for _high and _low variables based on TYPE.  */
+
+static tree
+cilk_for_check_loop_diff_type (tree type)
+{
+  if (type == integer_type_node)
+    return type;
+  else if (TYPE_PRECISION (type) <= TYPE_PRECISION (uint32_type_node))
+    { 
+      if (TYPE_UNSIGNED (type)) 
+	return uint32_type_node;
+      else
+	return integer_type_node;
+    }
+  else
+    {
+      if (TYPE_UNSIGNED (type)) 
+	return uint64_type_node;
+      else
+	return long_long_integer_type_node;
+    }
+  gcc_unreachable ();
+}
+
 /* Build a decl for the omp child function.  It'll not contain a body
    yet, just the bare decl.  */
 
 static void
 create_omp_child_function (omp_context *ctx, bool task_copy)
 {
-  tree decl, type, name, t;
+  tree decl, type, name, t, ind_var = NULL_TREE;
 
-  name = create_omp_child_function_name (task_copy);
+  bool is_cilk_for = is_cilk_for_stmt (ctx->stmt, &ind_var);
+  tree cilk_var_type = (is_cilk_for ?
+    cilk_for_check_loop_diff_type (TREE_TYPE (ind_var)) : NULL_TREE);
+  
+  name = create_omp_child_function_name (task_copy, is_cilk_for);
   if (task_copy)
     type = build_function_type_list (void_type_node, ptr_type_node,
 				     ptr_type_node, NULL_TREE);
+  else if (is_cilk_for)
+    type = build_function_type_list (void_type_node, ptr_type_node,
+				     cilk_var_type, cilk_var_type, NULL_TREE);
   else
     type = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
 
@@ -1888,13 +1993,44 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
   DECL_CONTEXT (t) = decl;
   DECL_RESULT (decl) = t;
 
-  t = build_decl (DECL_SOURCE_LOCATION (decl),
-		  PARM_DECL, get_identifier (".omp_data_i"), ptr_type_node);
+  /* _Cilk_for's child function requires two extra parameters called 
+     __low and __high that are set by the Cilk runtime when it calls this
+     function.  */
+  if (is_cilk_for)
+    {
+      t = build_decl (DECL_SOURCE_LOCATION (decl),
+		      PARM_DECL, get_identifier ("__high"), cilk_var_type);
+      DECL_ARTIFICIAL (t) = 1;
+      DECL_NAMELESS (t) = 1;
+      DECL_ARG_TYPE (t) = ptr_type_node;
+      DECL_CONTEXT (t) = current_function_decl;
+      TREE_USED (t) = 1;
+      TREE_ADDRESSABLE (t) = 1;
+      DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
+      DECL_ARGUMENTS (decl) = t;
+
+      t = build_decl (DECL_SOURCE_LOCATION (decl),
+		      PARM_DECL, get_identifier ("__low"), cilk_var_type);
+      DECL_ARTIFICIAL (t) = 1;
+      DECL_NAMELESS (t) = 1;
+      DECL_ARG_TYPE (t) = ptr_type_node;
+      DECL_CONTEXT (t) = current_function_decl;
+      TREE_USED (t) = 1;
+      TREE_ADDRESSABLE (t) = 1;
+      DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
+      DECL_ARGUMENTS (decl) = t;
+    }
+
+  tree data_name = get_identifier (".omp_data_i");
+  t = build_decl (DECL_SOURCE_LOCATION (decl), PARM_DECL, data_name,
+		  ptr_type_node);
   DECL_ARTIFICIAL (t) = 1;
   DECL_NAMELESS (t) = 1;
   DECL_ARG_TYPE (t) = ptr_type_node;
   DECL_CONTEXT (t) = current_function_decl;
   TREE_USED (t) = 1;
+  if (is_cilk_for)
+    DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
   DECL_ARGUMENTS (decl) = t;
   if (!task_copy)
     ctx->receiver_decl = t;
@@ -4313,6 +4449,44 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
 			    false, GSI_CONTINUE_LINKING);
 }
 
+/* Insert a call to the function in WS_ARGS[0], using the information from
+   ENTRY_STMT and the grain in WS_ARGS[1], into the basic_block BB.  */
+
+static void
+expand_cilk_for_call (basic_block bb, gimple entry_stmt,
+		      vec <tree, va_gc> *ws_args)
+{
+  tree t, t1, t2;
+  gimple_stmt_iterator gsi;
+  vec <tree, va_gc> *args;
+
+  gcc_assert (vec_safe_length (ws_args) == 2);
+  tree func_name = (*ws_args)[0];
+  tree grain = (*ws_args)[1];
+
+  tree clauses = gimple_omp_parallel_clauses (entry_stmt); 
+  tree count = find_omp_clause (clauses, OMP_CLAUSE_IF);
+  gcc_assert (count != NULL_TREE);
+  count = OMP_CLAUSE_IF_EXPR (count);
+  
+  gsi = gsi_last_bb (bb);
+  t = gimple_omp_parallel_data_arg (entry_stmt);
+  if (t == NULL)
+    t1 = null_pointer_node;
+  else
+    t1 = build_fold_addr_expr (t);
+  t2 = build_fold_addr_expr (gimple_omp_parallel_child_fn (entry_stmt));
+  
+  vec_alloc (args, 4);
+  args->quick_push (t2);
+  args->quick_push (t1);
+  args->quick_push (count);
+  args->quick_push (grain);
+  t = build_call_expr_loc_vec (UNKNOWN_LOCATION, func_name, args);
+
+  force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, false, 
+			    GSI_CONTINUE_LINKING);
+}
 
 /* Build the function call to GOMP_task to actually
    generate the task operation.  BB is the block where to insert the code.  */
@@ -4648,7 +4822,38 @@ expand_omp_taskreg (struct omp_region *region)
   entry_bb = region->entry;
   exit_bb = region->exit;
 
-  if (is_combined_parallel (region))
+  /* The way _Cilk_for is constructed in this compiler can be thought of
+     as a parallel omp_for.  But the inner workings between them are very
+     different so we need a way to differentiate between them.  Thus, we
+     added a new schedule type called OMP_CLAUSE_SCHEDULE_CILKFOR, which 
+     pretty much says that this is not a parallel omp for but a _Cilk_for
+     statement.  */
+  bool is_cilk_for =
+    (flag_cilkplus && region->inner &&
+     (region->inner->sched_kind == OMP_CLAUSE_SCHEDULE_CILKFOR));
+
+  /* Extract the __high and __low parameters from the function.  */
+  tree high_arg = NULL_TREE, low_arg = NULL_TREE;
+  if (is_cilk_for)
+    {
+      for (tree ii_arg = DECL_ARGUMENTS (child_fn); ii_arg != NULL_TREE;
+	   ii_arg = TREE_CHAIN (ii_arg))
+	{
+	  if (!strcmp (IDENTIFIER_POINTER (DECL_NAME (ii_arg)), "__high"))
+	    high_arg = ii_arg;
+	  if (!strcmp (IDENTIFIER_POINTER (DECL_NAME (ii_arg)), "__low"))
+	    low_arg = ii_arg;
+	}
+      gcc_assert (high_arg);
+      gcc_assert (low_arg);
+    }
+  
+  if (is_cilk_for) 
+    /* If it is a _Cilk_for statement, it is modelled *like* a parallel for,
+       and the inner statement contains the name of the built-in function
+       and grain.  */
+    ws_args = region->inner->ws_args;
+  else if (is_combined_parallel (region))
     ws_args = region->ws_args;
   else
     ws_args = NULL;
@@ -4755,6 +4960,49 @@ expand_omp_taskreg (struct omp_region *region)
 	    }
 	}
 
+      /* Remove the placeholder calls to GET_NUM_THREADS and GET_THREAD_NUM
+	 inserted by expand_cilk_for and replace them with the __low and
+	 __high parameter values, respectively.  */
+      gimple high_assign = NULL, low_assign = NULL;
+      if (is_cilk_for)
+	{
+	  gimple_stmt_iterator gsi2 = gsi_start_bb (single_succ (entry_bb));
+	  while (!gsi_end_p (gsi2))
+	    {
+	      gimple stmt = gsi_stmt (gsi2);
+	
+	      if (gimple_call_builtin_p (stmt, BUILT_IN_OMP_GET_NUM_THREADS))
+		{
+		  /* There can only be one call to each of these two
+		     functions.  If there are multiple, then something went
+		     wrong somewhere.  */
+		  gcc_assert (low_assign == NULL);
+		  tree ltype = TREE_TYPE (gimple_get_lhs (stmt));
+		  tree tmp2 = create_tmp_reg (TREE_TYPE (low_arg), NULL);
+		  low_assign = gimple_build_assign 
+		    (gimple_get_lhs (stmt), fold_convert (ltype, tmp2));
+		  gsi_remove (&gsi2, true);
+		  gimple tmp_stmt = gimple_build_assign (tmp2, low_arg);
+		  gsi_insert_before (&gsi2, low_assign, GSI_NEW_STMT);
+		  gsi_insert_before (&gsi2, tmp_stmt, GSI_NEW_STMT);
+		}
+	      else if (gimple_call_builtin_p (stmt,
+					      BUILT_IN_OMP_GET_THREAD_NUM))
+		{
+		  gcc_assert (high_assign == NULL);
+		  tree htype = TREE_TYPE (gimple_get_lhs (stmt));
+		  tree tmp2 = create_tmp_reg (TREE_TYPE (high_arg), NULL);
+		  
+		  high_assign = gimple_build_assign 
+		    (gimple_get_lhs (stmt), fold_convert (htype, tmp2));
+		  gsi_remove (&gsi2, true);
+		  gimple tmp_stmt = gimple_build_assign (tmp2, high_arg);
+		  gsi_insert_before (&gsi2, high_assign, GSI_NEW_STMT);
+		  gsi_insert_before (&gsi2, tmp_stmt, GSI_NEW_STMT);
+		}
+	      gsi_next (&gsi2);
+	    }
+	}      
       /* Declare local variables needed in CHILD_CFUN.  */
       block = DECL_INITIAL (child_fn);
       BLOCK_VARS (block) = vec2chain (child_cfun->local_decls);
@@ -4862,7 +5110,9 @@ expand_omp_taskreg (struct omp_region *region)
     }
 
   /* Emit a library call to launch the children threads.  */
-  if (gimple_code (entry_stmt) == GIMPLE_OMP_PARALLEL)
+  if (is_cilk_for)
+    expand_cilk_for_call (new_bb, entry_stmt, ws_args);
+  else if (gimple_code (entry_stmt) == GIMPLE_OMP_PARALLEL)
     expand_parallel_call (region, new_bb, entry_stmt, ws_args);
   else
     expand_task_call (new_bb, entry_stmt);
@@ -6540,6 +6790,223 @@ expand_omp_for_static_chunk (struct omp_region *region,
     }
 }
 
+/* A subroutine of expand_omp_for.  Generate code for _Cilk_for loop.  
+   Given parameters: 
+   for (V = N1; V cond N2; V += STEP) BODY; 
+   
+   where COND is "<" or ">", we generate pseudocode
+    
+   for (ind_var = low; ind_var < high; ind_var++)
+   {  
+      if (n1 < n2)
+	V = n1 + (ind_var * STEP);
+      else
+	V = n1 - (ind_var * STEP);
+
+      <BODY>
+    }  
+  
+    In the above pseudocode, low and high are function parameters of the
+    child function.  In the function below, we insert temporary variables
+    initialized by calls to two OMP functions that cannot otherwise appear
+    in the body of a _Cilk_for (since OMP_FOR cannot be mixed with
+    _Cilk_for).  These calls are replaced with low and high by the
+    function that handles taskreg.  */
+
+
+static void
+expand_cilk_for (struct omp_region *region, struct omp_for_data *fd)
+{
+  bool broken_loop = region->cont == NULL;
+  tree type = cilk_for_check_loop_diff_type (TREE_TYPE (fd->loop.v));
+  basic_block entry_bb = region->entry;
+  basic_block cont_bb = region->cont;
+  
+  gcc_assert (EDGE_COUNT (entry_bb->succs) == 2);
+  gcc_assert (broken_loop
+	      || BRANCH_EDGE (entry_bb)->dest == FALLTHRU_EDGE (cont_bb)->dest);
+  basic_block l0_bb = FALLTHRU_EDGE (entry_bb)->dest;
+  basic_block l1_bb, l2_bb;
+
+  if (!broken_loop)
+    {
+      gcc_assert (BRANCH_EDGE (cont_bb)->dest == l0_bb);
+      gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
+      l1_bb = split_block (cont_bb, last_stmt (cont_bb))->dest;
+      l2_bb = BRANCH_EDGE (entry_bb)->dest;
+    }
+  else
+    {
+      BRANCH_EDGE (entry_bb)->flags &= ~EDGE_ABNORMAL;
+      l1_bb = split_edge (BRANCH_EDGE (entry_bb));
+      l2_bb = single_succ (l1_bb);
+    }
+  basic_block exit_bb = region->exit;
+  basic_block l2_dom_bb = NULL;
+
+  gimple_stmt_iterator gsi = gsi_last_bb (entry_bb);
+
+  /* Below statements until the "tree high_val = ..." are pseudo statements 
+     used to pass information to be used by expand_omp_taskreg.
+     low_val and high_val will be replaced by the __low and __high
+     parameter from the child function.
+
+     The call_exprs part is a place-holder; it is mainly used 
+     to distinctly identify to the top-level part that this is
+     where we should put low and high (reasoning given in header 
+     comment).  */
+
+  tree t = build_call_expr
+    (builtin_decl_explicit (BUILT_IN_OMP_GET_NUM_THREADS), 0);
+  t = fold_convert (type, t);
+  tree low_val = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+					   GSI_SAME_STMT);
+  t = build_call_expr (builtin_decl_explicit (BUILT_IN_OMP_GET_THREAD_NUM),
+		       0);
+  t = fold_convert (TREE_TYPE (fd->loop.v), t);
+  tree high_val = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+					   GSI_SAME_STMT);
+
+  tree ind_var = create_tmp_reg (type, "__cilk_ind_var");
+  gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_FOR);
+  
+  /* Not needed in SSA form right now.  */
+  gcc_assert (!gimple_in_ssa_p (cfun));
+  if (l2_dom_bb == NULL)
+    l2_dom_bb = l1_bb;
+
+  tree n1 = low_val;
+  tree n2 = high_val;
+  
+  expand_omp_build_assign (&gsi, ind_var, n1);
+
+  /* Remove the GIMPLE_OMP_FOR statement.  */
+  gsi_remove (&gsi, true);
+
+  gimple stmt;
+  if (!broken_loop)
+    {
+      /* Code to control the increment goes in the CONT_BB.  */
+      gsi = gsi_last_bb (cont_bb);
+      stmt = gsi_stmt (gsi);
+      gcc_assert (gimple_code (stmt) == GIMPLE_OMP_CONTINUE);
+      enum tree_code code = PLUS_EXPR;
+      if (POINTER_TYPE_P (type))
+	t = fold_build_pointer_plus (ind_var, build_one_cst (type)); 
+      else
+	t = fold_build2 (code, type, ind_var, build_one_cst (type));
+      expand_omp_build_assign (&gsi, ind_var, t);
+
+      /* Remove GIMPLE_OMP_CONTINUE.  */
+      gsi_remove (&gsi, true);
+    }
+
+  /* Emit the condition in L1_BB.  */
+  gsi = gsi_start_bb (l1_bb);
+
+  tree step = fold_convert (type, fd->loop.step);
+  if ((TREE_CODE (step) == INTEGER_CST && tree_int_cst_sgn (step) < 1)) 
+    step = fold_build1_loc (UNKNOWN_LOCATION, NEGATE_EXPR, type, step);
+  
+  t = build2 (MULT_EXPR, type, ind_var, step);
+  tree tmp = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp, t), GSI_NEW_STMT);
+
+  tree tmp2 = create_tmp_reg (type, NULL);
+  tree cvtd = fold_convert (type, fd->loop.n1);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp2, cvtd), GSI_NEW_STMT);
+  
+  if (fd->loop.cond_code == GE_EXPR || fd->loop.cond_code == GT_EXPR)
+    t = fold_build2 (MINUS_EXPR, type, tmp2, tmp);
+  else
+    t = fold_build2 (PLUS_EXPR, type, tmp2, tmp);
+
+  tmp = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp, t), GSI_NEW_STMT);
+
+  cvtd = fold_convert (TREE_TYPE (fd->loop.v), tmp);
+  gsi_insert_after (&gsi, gimple_build_assign (fd->loop.v, cvtd), 
+		    GSI_NEW_STMT);
+  
+  t = fold_convert (type, n2);
+  t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
+				false, GSI_CONTINUE_LINKING);
+  /* The condition is always '<' since the runtime will fill in the low
+     and high values.  */
+  t = build2 (LT_EXPR, boolean_type_node, ind_var, t);
+  stmt = gimple_build_cond_empty (t);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+  if (walk_tree (gimple_cond_lhs_ptr (stmt), expand_omp_regimplify_p,
+		 NULL, NULL)
+      || walk_tree (gimple_cond_rhs_ptr (stmt), expand_omp_regimplify_p,
+		    NULL, NULL))
+    {
+      gsi = gsi_for_stmt (stmt);
+      gimple_regimplify_operands (stmt, &gsi);
+    }
+
+  /* Remove GIMPLE_OMP_RETURN.  */
+  gsi = gsi_last_bb (exit_bb);
+  gsi_remove (&gsi, true);
+
+  /* Connect the new blocks.  */
+  remove_edge (FALLTHRU_EDGE (entry_bb));
+
+  edge e, ne;
+  if (!broken_loop)
+    {
+      remove_edge (BRANCH_EDGE (entry_bb));
+      make_edge (entry_bb, l1_bb, EDGE_FALLTHRU);
+
+      e = BRANCH_EDGE (l1_bb);
+      ne = FALLTHRU_EDGE (l1_bb);
+      e->flags = EDGE_TRUE_VALUE;
+    }
+  else
+    {
+      single_succ_edge (entry_bb)->flags = EDGE_FALLTHRU;
+
+      ne = single_succ_edge (l1_bb);
+      e = make_edge (l1_bb, l0_bb, EDGE_TRUE_VALUE);
+
+    }
+  ne->flags = EDGE_FALSE_VALUE;
+  e->probability = REG_BR_PROB_BASE * 7 / 8;
+  ne->probability = REG_BR_PROB_BASE / 8;
+
+  set_immediate_dominator (CDI_DOMINATORS, l1_bb, entry_bb);
+  set_immediate_dominator (CDI_DOMINATORS, l2_bb, l2_dom_bb);
+  set_immediate_dominator (CDI_DOMINATORS, l0_bb, l1_bb);
+
+  if (!broken_loop)
+    {
+      struct loop *loop = alloc_loop ();
+      loop->header = l1_bb;
+      loop->latch = cont_bb;
+      add_loop (loop, l1_bb->loop_father);
+      loop->safelen = INT_MAX;
+    }
+
+  /* Pick the correct library function based on the precision of the
+     induction variable type.  */
+  tree lib_fun = NULL_TREE;
+  if (TYPE_PRECISION (type) == 32)
+    lib_fun = cilk_for_32_fndecl;
+  else if (TYPE_PRECISION (type) == 64)
+    lib_fun = cilk_for_64_fndecl;
+  else
+    gcc_unreachable ();
+
+  gcc_assert (fd->sched_kind == OMP_CLAUSE_SCHEDULE_CILKFOR);
+  
+  /* WS_ARGS contains the library function flavor to call
+     (__libcilkrts_cilk_for_64 or __libcilkrts_cilk_for_32), and the
+     user-defined grain value.   If the user does not define one, then zero
+     is passed in by the parser.  */
+  vec_alloc (region->ws_args, 2);
+  region->ws_args->quick_push (lib_fun);
+  region->ws_args->quick_push (fd->chunk_size);
+}
 
 /* A subroutine of expand_omp_for.  Generate code for a simd non-worksharing
    loop.  Given parameters:
@@ -6880,6 +7347,8 @@ expand_omp_for (struct omp_region *region, gimple inner_stmt)
 
   if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_KIND_SIMD)
     expand_omp_simd (region, &fd);
+  else if (gimple_omp_for_kind (fd.for_stmt) == GF_OMP_FOR_KIND_CILKFOR)
+    expand_cilk_for (region, &fd);
   else if (fd.sched_kind == OMP_CLAUSE_SCHEDULE_STATIC
 	   && !fd.have_ordered)
     {
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c
new file mode 100644
index 0000000..8b6112b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c
@@ -0,0 +1,87 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <stdio.h>
+#endif
+
+static void check (int *Array, int start, int end, int incr, int value)
+{
+  int ii = 0;
+  for (ii = start;  ii < end; ii = ii + incr)
+    if (Array[ii] != value)
+      __builtin_abort ();
+#if HAVE_IO
+  printf ("Passed\n");
+#endif
+}
+
+static void check_reverse (int *Array, int start, int end, int incr, int value)
+{
+  int ii = 0;
+  for (ii = start; ii >= end; ii = ii - incr)
+    if (Array[ii] != value)
+      __builtin_abort ();
+#if HAVE_IO
+  printf ("Passed\n");
+#endif
+}
+
+
+int main (void)
+{
+  int Array[10];
+  int x = 9, y = 0, z = 3;
+
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array[ii] = 1133;
+  check (Array, 0, 10, 1, 1133);
+
+  _Cilk_for (int ii = 0; ii < 10; ++ii)
+    Array[ii] = 3311;
+  check (Array, 0, 10, 1, 3311);
+
+  _Cilk_for (int ii = 9; ii > -1; ii--)
+    Array[ii] = 4433;
+  check_reverse (Array, 9, 0, 1, 4433);
+
+  _Cilk_for (int ii = 9; ii > -1; --ii)
+    Array[ii] = 9988;
+  check_reverse (Array, 9, 0, 1, 9988);
+
+  _Cilk_for (int ii = 0; ii < 10; ++ii)
+    Array[ii] = 3311;
+  check (Array, 0, 10, 1, 3311);
+
+  _Cilk_for (int ii = 0; ii < 10; ii += 2)
+    Array[ii] = 1328;
+  check (Array, 0, 10, 2, 1328);
+
+  _Cilk_for (int ii = 9; ii >= 0; ii -= 2)
+    Array[ii] = 1738;
+  check_reverse (Array, 9, 0, 2, 1738);
+
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    {
+      if (ii % 2)
+	Array[ii] = 1343;
+      else
+	Array[ii] = 3413;
+    }
+
+  check (Array, 1, 10, 2, 1343); 
+  check (Array, 0, 10, 2, 3413); 
+
+  _Cilk_for (short cc = 0; cc < 10; cc++) 
+    Array[cc] = 1343;
+  check (Array, 0, 10,  1,1343);
+
+  _Cilk_for (short cc = 9; cc >= 0; cc--)
+    Array[cc] = 1348;
+  check_reverse (Array, 9, 0, 1, 1348);
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c
new file mode 100644
index 0000000..ed73c34
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=c99" { target c } } */
+
+
+int main (void)
+{
+  int q = 0, ii = 0, jj = 0;
+
+  _Cilk_for (int ii; ii < 10; ii++) /* { dg-error "is not initialized" "" { target c } } */
+    /* { dg-error "expected" "" { target c++ } 10 } */
+    q = 5;
+
+  _Cilk_for (; ii < 10; ii++) /* { dg-error "expected iteration declaration" } */
+    q = 2;
+
+  _Cilk_for (int ii = 0; ; ii++) /* { dg-error "missing controlling predicate" } */
+    q = 2;
+
+  _Cilk_for (int ii = 0; ii < 10, jj < 10; ii++)  /* { dg-error "expected ';' before ',' token" "" { target c } } */
+    /* { dg-error "invalid controlling predicate" "" { target c++ }  20 } */
+    q = 5;
+
+  _Cilk_for (int ii = 0; ii < 10; ) /* { dg-error "missing increment" } */
+    q = 5;
+
+
+  _Cilk_for (int ii = 0, jj = 0; ii < 10; ii++) /* { dg-error "expected" } */ 
+    q = 5;
+
+  _Cilk_for (volatile int vii = 0; vii < 10; vii++) /* { dg-error "iteration variable cannot be volatile" } */
+    q = 5;
+
+ 
+  _Cilk_for (static int sii = 0; sii < 10; sii++) /* { dg-error "static" } */
+
+    q = 5;
+
+
+  _Cilk_for (float fii = 3.47; fii < 5.23; fii++) /* { dg-error "invalid type for iteration variable" } */
+    q = 5;
+
+
+  _Cilk_for (int ii = 0; 10 > jj; ii++) /* { dg-error "invalid controlling predicate" } */
+    q = 5;
+
+  _Cilk_for (int ii = 0; ii < 10; ii >> 1) /* { dg-error "invalid increment expression" } */
+    q = 5;
+
+  _Cilk_for (int ii = 10; ii >= 0; ii--) /* This is OK!  */
+    q = 5;
+
+  _Cilk_for (int ii; ii < 10; ii++) /* { dg-error "is not initialized" "" { target c } } */ 
+    /* { dg-error "expected" "" { target c++ }  53 } */
+    q = 5;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c
new file mode 100644
index 0000000..6cb9b03
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c
@@ -0,0 +1,35 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+int grain_value = 2;
+int main (void)
+{
+  int Array1[200], Array1_Serial[200];
+
+  for (int ii = 0; ii < 200; ii++)
+    {
+      Array1_Serial[ii] = 2;
+      Array1[ii] = 1;
+    }
+
+#pragma cilk grainsize = 2
+  _Cilk_for (int ii = 0; ii < 200; ii++)
+    Array1[ii] = 2;
+
+  for (int ii = 0; ii < 200; ii++)
+    if (Array1[ii] != Array1_Serial[ii])
+      return (ii+1);
+
+#pragma cilk grainsize = grain_value
+  _Cilk_for (int ii = 0; ii < 200; ii++) 
+    Array1[ii] = 2;
+
+  for (int ii = 0; ii < 200; ii++)
+    if (Array1[ii] != Array1_Serial[ii])
+      return (ii+1);
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c
new file mode 100644
index 0000000..e1e3217
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus -Wunknown-pragmas" } */
+/* { dg-additional-options "-std=c99" { target c } } */
+
+
+char Array1[26];
+
+#pragma cilk grainsize = 2 /* { dg-error "must be inside a function" } */
+
+int main(int argc, char **argv)
+{
+/* This is OK.  */
+#pragma cilk grainsize = 2
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsize 2 /* { dg-error "expected '=' before numeric constant" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsiz = 2 /* { dg-warning "ignoring #pragma cilk grainsiz" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+
+/* This is OK, it will do a type conversion to long int.  */
+#pragma cilk grainsize = 0.5 
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsize = 1 
+  while (Array1[5] != 0) /* { dg-warning "is not followed by" } */
+    {
+    /* Blah */
+    }
+
+#pragma cilk grainsize = 1 
+  int q = 0; /* { dg-warning "is not followed by" } */
+  _Cilk_for (q = 0; q < 10; q++)
+    Array1[q]  = 5;
+
+  while (Array1[5] != 0)
+    {
+    /* Blah */
+    }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c
new file mode 100644
index 0000000..8cf1b4e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c
@@ -0,0 +1,41 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+
+/* <feature> loop control variable must have integer, pointer or class type
+   </feature>
+*/
+
+#define ARRAY_SIZE 10
+int a[ARRAY_SIZE];
+
+int main(void)
+{ 
+  int ii = 0;
+
+#if 1
+  for (ii =0; ii < ARRAY_SIZE; ii++)
+    a[ii] = 5;
+#endif
+  _Cilk_for(int *aa = a; aa < a + ARRAY_SIZE; aa++) 
+    *aa = 0;
+#if 1
+  for (ii = 0; ii < ARRAY_SIZE; ii++) 
+    if (a[ii] != 0) 
+      __builtin_abort ();
+#endif
+
+  _Cilk_for (int *aa = a; aa < a + ARRAY_SIZE; aa = aa + 2)
+    *aa = 4;
+
+#if 1
+  for (ii = 0; ii < ARRAY_SIZE; ii = ii + 2) 
+    if (a[ii] != 4) 
+      __builtin_abort ();
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c
new file mode 100644
index 0000000..cffe17e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c
@@ -0,0 +1,79 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <stdio.h>
+#endif
+
+int main (void)
+{
+  int Array[10][10];
+
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 10; jj++)
+	{
+	  Array[ii][jj] = 0;
+	}
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 0; jj < 5; jj++)
+      Array[ii][jj] = 5;
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 5; jj++)
+      if (Array[ii][jj] != 5)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+
+  /* One goes up and one goes down.  */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 9; jj >= 0; jj--)
+      Array[ii][jj] = 7;
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 9; jj >= 0; jj--)
+      if (Array[ii][jj] != 7)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  /* different step sizes.  */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 0; jj < 10; jj += 2)
+      Array[ii][jj] = 9;
+  
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 10; jj += 2)
+      if (Array[ii][jj] != 9)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  /* different step sizes.  */
+  _Cilk_for (int ii = 0; ii < 10; ii += 2)
+    _Cilk_for (int jj = 5; jj < 9; jj++)
+      Array[ii][jj] = 11; 
+  
+  for (int ii = 0; ii < 10; ii += 2)
+    for (int jj = 5; jj < 9; jj++)
+      if (Array[ii][jj] != 11)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc
new file mode 100644
index 0000000..8221371
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc
@@ -0,0 +1,25 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#define SIZE 100
+#define CHECK_VALUE 5
+
+template <class T>
+int func (T start, T end)
+{
+  int Array[SIZE];
+  _Cilk_for (T ii = 0; ii < end; ii++)
+    Array[ii] = CHECK_VALUE;
+  
+  for (T ii = 0; ii < end; ii++)
+    if (Array[ii] != CHECK_VALUE)
+      __builtin_abort ();
+
+  return 0;
+}
+
+int main (void)
+{
+  return func <int> (0, 100) + func <long> (0, 100);
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc
new file mode 100644
index 0000000..2ac8c72
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc
@@ -0,0 +1,52 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#include <vector>
+#include <cstdio>
+#include <iostream>
+#include <algorithm>
+
+using namespace std;
+
+
+int main(void)
+{
+vector <int> array;
+vector <int> array_serial;
+
+#if 1
+for (int ii = -1; ii < 10; ii++)
+{   
+  array.push_back(ii);
+  array_serial.push_back (ii);
+}
+#endif
+_Cilk_for (vector<int>::iterator iter = array.begin(); iter != array.end();
+          iter++)
+{
+   if (*iter  == 6) 
+     *iter = 13;
+}
+for (vector<int>::iterator iter = array_serial.begin(); 
+     iter != array_serial.end(); iter++)
+{
+   if (*iter  == 6) 
+     *iter = 13;
+}
+sort (array.begin(), array.end());
+sort (array_serial.begin(), array_serial.end());
+
+vector <int>::iterator iter = array.begin ();
+vector <int>::iterator iter_serial = array_serial.begin ();
+
+while (iter != array.end () && iter_serial != array_serial.end ())
+{
+  if (*iter != *iter_serial)
+    __builtin_abort ();
+  iter++;
+  iter_serial++;
+}
+
+return 0;
+}   
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc
new file mode 100644
index 0000000..1cf3301
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc
@@ -0,0 +1,72 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#include <vector>
+#include <cstdio>
+#include <iostream>
+#include <algorithm>
+
+using namespace std;
+
+
+int main(void)
+{
+vector <int> array,array_serial;
+
+#if 1
+for (int ii = -1; ii < 10; ii++)
+{   
+  array.push_back(ii);
+  array_serial.push_back(ii);
+}
+#endif
+_Cilk_for (vector<int>::reverse_iterator iter4 = array.rbegin(); 
+	   iter4 != array.rend(); iter4++)
+{
+  if (*iter4 == 0x8) {
+    *iter4 = 9;
+  }
+}
+
+_Cilk_for (vector<int>::reverse_iterator iter4 = array_serial.rbegin(); 
+	   iter4 != array_serial.rend(); iter4++)
+{
+  if (*iter4 == 0x8) {
+    *iter4 = 9;
+  }
+}
+_Cilk_for (vector<int>::reverse_iterator iter2 = array.rbegin(); 
+	   iter2 != array.rend();
+          iter2 += 1) 
+{
+   if ((*iter2 == 0x4) || (*iter2 == 0x7)) {
+    *iter2 = 0x3;
+   }
+}
+for (vector<int>::reverse_iterator iter2 = array_serial.rbegin(); 
+     iter2 != array_serial.rend();
+          iter2 += 1) 
+{
+   if ((*iter2 == 0x4) || (*iter2 == 0x7)) {
+    *iter2 = 0x3;
+   }
+}
+sort (array.begin(), array.end());
+sort (array_serial.begin(), array_serial.end());
+
+vector <int>::iterator iter = array.begin ();
+vector <int>::iterator iter_serial = array_serial.begin ();
+while (iter != array.end () && iter_serial != array_serial.end ())
+{
+  if (*iter != *iter_serial)
+    __builtin_abort ();
+  iter++;
+  iter_serial++;
+}
+
+return 0;
+}   
+
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc
new file mode 100644
index 0000000..8d2e61e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc
@@ -0,0 +1,50 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#include <iostream>
+#include <cstdio>
+#include <cstdlib>
+#include <vector>
+#include <algorithm>
+#include <list>
+
+using namespace std;
+
+
+int main(int argc, char **argv)
+{
+  vector <int> number_list, number_list_serial;
+  int new_number = 0;
+  int no_elements = 0;
+  
+  if (argc != 2)
+  {
+    no_elements = 10;
+  }
+
+
+  number_list.clear();
+  number_list_serial.clear();
+  for (int ii = 0; ii < no_elements; ii++)
+  {
+    number_list.push_back(new_number);
+    number_list_serial.push_back(new_number);
+  }
+
+  _Cilk_for (int jj = 0; jj < no_elements; jj++)
+  {
+    number_list[jj] = jj + no_elements;
+  }
+  for (int jj = 0; jj < no_elements; jj++)
+  {
+    number_list_serial[jj] = jj + no_elements;
+  }
+
+  for (int jj = 0; jj < no_elements; jj++)
+    if (number_list_serial[jj] != number_list[jj])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index e548a0d..d8c14e3 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -351,6 +351,7 @@ enum omp_clause_schedule_kind {
   OMP_CLAUSE_SCHEDULE_GUIDED,
   OMP_CLAUSE_SCHEDULE_AUTO,
   OMP_CLAUSE_SCHEDULE_RUNTIME,
+  OMP_CLAUSE_SCHEDULE_CILKFOR,
   OMP_CLAUSE_SCHEDULE_LAST
 };
 
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 0595499..91efd9f 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -411,6 +411,9 @@ dump_omp_clause (pretty_printer *buffer, tree clause, int spc, int flags)
 	case OMP_CLAUSE_SCHEDULE_AUTO:
 	  pp_string (buffer, "auto");
 	  break;
+	case OMP_CLAUSE_SCHEDULE_CILKFOR:
+	  pp_string (buffer, "cilk-for grain");
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -2392,6 +2395,12 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
       pp_string (buffer, "#pragma simd");
       goto dump_omp_loop;
 
+    case CILK_FOR:
+      /* This label points one line after dumping the clauses.  
+	 For _Cilk_for the clauses are dumped after the _Cilk_for (...) 
+	 parameters are printed out.  */
+      goto dump_omp_loop_cilk_for;
+
     case OMP_DISTRIBUTE:
       pp_string (buffer, "#pragma omp distribute");
       goto dump_omp_loop;
@@ -2420,6 +2429,8 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
     dump_omp_loop:
       dump_omp_clauses (buffer, OMP_FOR_CLAUSES (node), spc, flags);
 
+    dump_omp_loop_cilk_for:
+
       if (!(flags & TDF_SLIM))
 	{
 	  int i;
@@ -2440,7 +2451,10 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
 		{
 		  spc += 2;
 		  newline_and_indent (buffer, spc);
-		  pp_string (buffer, "for (");
+		  if (TREE_CODE (node) == CILK_FOR)
+		    pp_string (buffer, "_Cilk_for (");
+		  else 
+		    pp_string (buffer, "for (");
 		  dump_generic_node (buffer,
 				     TREE_VEC_ELT (OMP_FOR_INIT (node), i),
 				     spc, flags, false);
@@ -2454,6 +2468,8 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
 				     spc, flags, false);
 		  pp_right_paren (buffer);
 		}
+	      if (TREE_CODE (node) == CILK_FOR) 
+		dump_omp_clauses (buffer, OMP_FOR_CLAUSES (node), spc, flags);
 	    }
 	  if (OMP_FOR_BODY (node))
 	    {
diff --git a/gcc/tree.def b/gcc/tree.def
index f8d6444..558d7c8 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1051,6 +1051,10 @@ DEFTREECODE (OMP_SIMD, "omp_simd", tcc_statement, 6)
    Operands like for OMP_FOR.  */
 DEFTREECODE (CILK_SIMD, "cilk_simd", tcc_statement, 6)
 
+/* Cilk Plus - _Cilk_for (..)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (CILK_FOR, "cilk_for", tcc_statement, 6)
+
 /* OpenMP - #pragma omp distribute [clause1 ... clauseN]
    Operands like for OMP_FOR.  */
 DEFTREECODE (OMP_DISTRIBUTE, "omp_distribute", tcc_statement, 6)


* Re: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-07 22:14                 ` Iyer, Balaji V
@ 2014-02-10 17:57                   ` Jakub Jelinek
  2014-02-10 22:07                     ` Iyer, Balaji V
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Jelinek @ 2014-02-10 17:57 UTC (permalink / raw)
  To: Iyer, Balaji V
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

On Fri, Feb 07, 2014 at 10:14:21PM +0000, Iyer, Balaji V wrote:
> 	Attached, please find a fixed patch. Along with it, I have also
> added 2 changelog files for C and C++ respectively.

Have you even looked at the second testcase I've posted?
gimplification ICEs on it with your latest patch, because firstprivate
clause is added for the same variable multiple times, and it seems parallel
still isn't around _Cilk_for.

	Jakub
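
The ICE described above comes from one variable receiving more than one
firstprivate clause. As a rough illustration only (a standalone C++ model,
not GCC code; the names Assignment and unique_killed_vars are hypothetical
and do not appear in the patch), one way to avoid duplicates is to record
each written variable only the first time it is seen:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a write (INIT_EXPR / MODIFY_EXPR) to a variable.
struct Assignment { std::string lhs; };

// Collect each written variable once, in first-seen order, so that a later
// pass would build at most one firstprivate clause per variable.
std::vector<std::string>
unique_killed_vars(const std::vector<Assignment>& stmts) {
    std::vector<std::string> vars;
    std::unordered_set<std::string> seen;
    for (const Assignment& s : stmts)
        if (seen.insert(s.lhs).second)  // true only on first insertion
            vars.push_back(s.lhs);
    return vars;
}
```

With the writes a, b, a as input, this model yields the two entries a and b,
so a clause builder iterating over the result would see each variable once.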


* RE: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-10 17:57                   ` Jakub Jelinek
@ 2014-02-10 22:07                     ` Iyer, Balaji V
  2014-02-12 14:59                       ` Jakub Jelinek
  0 siblings, 1 reply; 26+ messages in thread
From: Iyer, Balaji V @ 2014-02-10 22:07 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

[-- Attachment #1: Type: text/plain, Size: 1120 bytes --]

Hi Jakub,

> -----Original Message-----
> From: Jakub Jelinek [mailto:jakub@redhat.com]
> Sent: Monday, February 10, 2014 12:58 PM
> To: Iyer, Balaji V
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'rth@redhat.com'
> Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> 
> On Fri, Feb 07, 2014 at 10:14:21PM +0000, Iyer, Balaji V wrote:
> > 	Attached, please find a fixed patch. Along with it, I have also added
> > 2 changelog files for C and C++ respectively.
> 
> Have you even looked at the second testcase I've posted?
> gimplification ICEs on it with your latest patch, because firstprivate clause is
> added for the same variable multiple times, and it seems parallel still isn't
> around _Cilk_for.

I looked at both but forgot to test them with my implementation. Sorry about this. I have fixed the ICE issue. To make sure this does not happen again, I have added your test cf3.C to the test suite (renamed to cf3.cc). I hope that is OK with you.

I have attached a fixed patch and Changelogs. Is this OK?

Thanks,

Balaji V. Iyer.

> 
> 	Jakub
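
For readers of the attached patch: the c_finish_omp_for hunk derives the
_Cilk_for loop count from the initial value, the end bound and the step. The
end bound is adjusted by +1 for a <= condition and by -1 for a >= condition,
and the count is then (end - init) / step using truncating division. The
sketch below mirrors only that arithmetic in standalone C++; the names Cond,
end_adjust and cilk_for_count are hypothetical and are not part of GCC.

```cpp
// Comparison kinds that the hunk distinguishes when adjusting the end bound.
enum class Cond { LT, LE, GT, GE, NE };

// +1 for <=, -1 for >=, 0 otherwise, as in the c_finish_omp_for hunk.
constexpr long end_adjust(Cond c) {
    return c == Cond::LE ? 1 : (c == Cond::GE ? -1 : 0);
}

// count = ((end + adjust) - init) / step, with truncating division
// (the patch builds this with MINUS_EXPR and TRUNC_DIV_EXPR).
constexpr long cilk_for_count(long init, long end, long step, Cond c) {
    return ((end + end_adjust(c)) - init) / step;
}

// The loop from the start of this thread: ii = 0; ii < 10; ii = ii + 2.
static_assert(cilk_for_count(0, 10, 2, Cond::LT) == 5, "five iterations");
```

Because the division truncates, a range that is not a multiple of the step
rounds down in this formula. The sketch reproduces the formula as written in
the hunk and does not try to model anything done in later passes.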

[-- Attachment #2: diff.txt --]
[-- Type: text/plain, Size: 95510 bytes --]

diff --git a/gcc/c-family/c-cilkplus.c b/gcc/c-family/c-cilkplus.c
index 1a16f66..328f014 100644
--- a/gcc/c-family/c-cilkplus.c
+++ b/gcc/c-family/c-cilkplus.c
@@ -91,3 +91,53 @@ c_finish_cilk_clauses (tree clauses)
     }
   return clauses;
 }
+
+/* Structure used to pass information into a walk_tree function and
+   find_cilk_for.  */
+struct clause_struct
+{
+  bool is_set;
+  tree clauses;
+};
+
+/* Helper function for walk_tree used in cilk_for_move_clauses_upward.
+   If *TP is a CILK_FOR statement, then set *DATA (type-casted to 
+   struct clause_struct) with its clauses.  */
+
+static tree
+find_cilk_for (tree *tp, int *walk_subtrees, void *data)
+{
+  struct clause_struct *cstruct = (struct clause_struct *) data;
+  if (*tp && TREE_CODE (*tp) == CILK_FOR && !cstruct->is_set)
+    {
+      cstruct->is_set = true;
+      cstruct->clauses = OMP_FOR_CLAUSES (*tp);
+      *walk_subtrees = 0;
+      OMP_FOR_CLAUSES (*tp) = NULL_TREE;
+    }
+  return NULL_TREE;
+}
+
+/* Moves the IF-CLAUSE and SCHEDULE clause from _CILK_FOR statement in
+   STMT into *PARALLEL_CLAUSES.  */
+ 
+void
+cilk_for_move_clauses_upward (tree *parallel_clauses, tree stmt)
+{
+  struct clause_struct cstruct;
+  cstruct.is_set = false;
+  cstruct.clauses = NULL_TREE;
+  walk_tree (&stmt, find_cilk_for, (void *) &cstruct, NULL);
+
+  tree clauses = cstruct.clauses;
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SCHEDULE
+	|| OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IF)
+      {
+	if (*parallel_clauses)
+	  OMP_CLAUSE_CHAIN (*parallel_clauses) = c;
+	else
+	  *parallel_clauses = c;
+      }
+}
+
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 5cf285b..413f162 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -416,6 +416,7 @@ const struct c_common_resword c_common_reswords[] =
   { "_Complex",		RID_COMPLEX,	0 },
   { "_Cilk_spawn",      RID_CILK_SPAWN, 0 },
   { "_Cilk_sync",       RID_CILK_SYNC,  0 },
+  { "_Cilk_for",        RID_CILK_FOR,   0 },
   { "_Imaginary",	RID_IMAGINARY, D_CONLY },
   { "_Decimal32",       RID_DFLOAT32,  D_CONLY | D_EXT },
   { "_Decimal64",       RID_DFLOAT64,  D_CONLY | D_EXT },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index f074ab1..33e1929 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -149,7 +149,7 @@ enum rid
   RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
 
   /* Cilk Plus keywords.  */
-  RID_CILK_SPAWN, RID_CILK_SYNC,
+  RID_CILK_SPAWN, RID_CILK_SYNC, RID_CILK_FOR,
   
   /* Objective-C ("AT" reserved words - they are only keywords when
      they follow '@')  */
@@ -1203,7 +1203,7 @@ extern void c_finish_omp_flush (location_t);
 extern void c_finish_omp_taskwait (location_t);
 extern void c_finish_omp_taskyield (location_t);
 extern tree c_finish_omp_for (location_t, enum tree_code, tree, tree, tree,
-			      tree, tree, tree);
+			      tree, tree, tree, tree *);
 extern void c_omp_split_clauses (location_t, enum tree_code, omp_clause_mask,
 				 tree, tree *);
 extern tree c_omp_declare_simd_clauses_to_numbers (tree, tree);
@@ -1389,4 +1389,5 @@ extern tree make_cilk_frame (tree);
 extern tree create_cilk_function_exit (tree, bool, bool);
 extern tree cilk_install_body_pedigree_operations (tree);
 extern void cilk_outline (tree, tree *, void *);
+extern void cilk_for_move_clauses_upward (tree *, tree);
 #endif /* ! GCC_C_COMMON_H */
diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index dd0a45d..8259979 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -386,17 +386,18 @@ c_omp_for_incr_canonicalize_ptr (location_t loc, tree decl, tree incr)
    INITV, CONDV and INCRV are vectors containing initialization
    expressions, controlling predicates and increment expressions.
    BODY is the body of the loop and PRE_BODY statements that go before
-   the loop.  */
+   the loop.  *COUNT is the loop-count used solely by a _Cilk_for statement.  */
 
 tree
 c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
-		  tree initv, tree condv, tree incrv, tree body, tree pre_body)
+		  tree initv, tree condv, tree incrv, tree body,
+		  tree pre_body, tree *count)
 {
   location_t elocus;
   bool fail = false;
   int i;
-
-  if (code == CILK_SIMD
+  tree orig_init = NULL_TREE, orig_end = NULL_TREE, orig_step = NULL_TREE;
+  if ((code == CILK_SIMD || code == CILK_FOR) 
       && !c_check_cilk_loop (locus, TREE_VEC_ELT (declv, 0)))
     fail = true;
 
@@ -422,6 +423,8 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	  fail = true;
 	}
 
+      if (TREE_CODE (init) == MODIFY_EXPR)
+	orig_init = TREE_OPERAND (init, 1);
       /* In the case of "for (int i = 0...)", init will be a decl.  It should
 	 have a DECL_INITIAL that we can turn into an assignment.  */
       if (init == decl)
@@ -436,6 +439,7 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      fail = true;
 	    }
 
+	  orig_init = init;
 	  init = build_modify_expr (elocus, decl, NULL_TREE, NOP_EXPR,
 	      			    /* FIXME diagnostics: This should
 				       be the location of the INIT.  */
@@ -526,9 +530,20 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 					    0))
 		    TREE_SET_CODE (cond, TREE_CODE (cond) == NE_EXPR
 					 ? LT_EXPR : GE_EXPR);
-		  else if (code != CILK_SIMD)
+		  else if (code != CILK_SIMD && code != CILK_FOR)
 		    cond_ok = false;
 		}
+	      if (flag_cilkplus && code == CILK_FOR)
+		{ 
+		  orig_end = TREE_OPERAND (cond, 1); 
+		  tree add_expr = build_zero_cst (TREE_TYPE (orig_end)); 
+		  if (TREE_CODE (cond) == LE_EXPR) 
+		    add_expr = build_one_cst (TREE_TYPE (orig_end)); 
+		  else if (TREE_CODE (cond) == GE_EXPR) 
+		    add_expr = build_int_cst (TREE_TYPE (orig_end), -1); 
+		  orig_end = fold_build2 (PLUS_EXPR, TREE_TYPE (orig_end), 
+					  orig_end, add_expr);
+		}
 	    }
 
 	  if (!cond_ok)
@@ -561,6 +576,18 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      if (TREE_OPERAND (incr, 0) != decl)
 		break;
 
+	      if (TREE_CODE (incr) == POSTINCREMENT_EXPR
+		  || TREE_CODE (incr) == PREINCREMENT_EXPR)
+		orig_step = build_one_cst (TREE_TYPE (incr));
+	      else
+		orig_step = integer_minus_one_node;
+ 
+	      if (POINTER_TYPE_P (TREE_TYPE (incr)))
+		{
+		  tree unit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (incr)));
+		  orig_step = fold_build2 (MULT_EXPR, TREE_TYPE (orig_step),
+					   orig_step, unit);
+		}
 	      incr_ok = true;
 	      incr = c_omp_for_incr_canonicalize_ptr (elocus, decl, incr);
 	      break;
@@ -579,14 +606,24 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      if (TREE_CODE (TREE_OPERAND (incr, 1)) == PLUS_EXPR
 		  && (TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl
 		      || TREE_OPERAND (TREE_OPERAND (incr, 1), 1) == decl))
-		incr_ok = true;
+		{
+		  if (TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl)
+		    orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 1);
+		  else
+		    orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 0);
+		  incr_ok = true;
+		}
 	      else if ((TREE_CODE (TREE_OPERAND (incr, 1)) == MINUS_EXPR
 			|| (TREE_CODE (TREE_OPERAND (incr, 1))
 			    == POINTER_PLUS_EXPR))
 		       && TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl)
-		incr_ok = true;
+		{
+		  orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 1);
+		  incr_ok = true;
+		}
 	      else
 		{
+		  orig_step = TREE_OPERAND (incr, 1);
 		  tree t = check_omp_for_incr_expr (elocus,
 						    TREE_OPERAND (incr, 1),
 						    decl);
@@ -609,6 +646,17 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	    }
 	}
 
+      /* These variables could be NULL if an error occurred.  */
+      if (flag_cilkplus && code == CILK_FOR 
+	  && orig_end && orig_init && orig_step)
+	{
+	  /* Count is used by _Cilk_for and that will always have
+	     collapse = 1.  */
+	  *count = fold_build2 (MINUS_EXPR, TREE_TYPE (orig_end), orig_end,
+				orig_init);
+	  *count = fold_build2 (TRUNC_DIV_EXPR, TREE_TYPE (*count), *count,
+				orig_step);
+	}
       TREE_VEC_ELT (initv, i) = init;
       TREE_VEC_ELT (incrv, i) = incr;
     }
diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index 07d23ac..e0f3561 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -1394,6 +1394,11 @@ init_pragma (void)
 
   cpp_register_deferred_pragma (parse_in, "GCC", "ivdep", PRAGMA_IVDEP, false,
 				false);
+
+  if (flag_cilkplus && !flag_preprocess_only)
+    cpp_register_deferred_pragma (parse_in, "cilk", "grainsize",
+				  PRAGMA_CILK_GRAINSIZE, true, false);
+
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 6f1bf74..b9f09ba 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -55,6 +55,9 @@ typedef enum pragma_kind {
   /* Top level clause to handle all Cilk Plus pragma simd clauses.  */
   PRAGMA_CILK_SIMD,
 
+  /* This pragma handles setting of grainsize for a _Cilk_for.  */
+  PRAGMA_CILK_GRAINSIZE,
+
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
 
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 66625aa..d8c12b1 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1248,10 +1248,11 @@ static bool c_parser_objc_diagnose_bad_element_prefix
   (c_parser *, struct c_declspecs *);
 
 /* Cilk Plus supporting routines.  */
-static void c_parser_cilk_simd (c_parser *);
+static void c_parser_cilk_simd (c_parser *, bool, tree);
 static bool c_parser_cilk_verify_simd (c_parser *, enum pragma_context);
 static tree c_parser_array_notation (location_t, c_parser *, tree, tree);
 static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, bool);
+static void c_parser_cilk_grainsize (c_parser *);
 
 /* Parse a translation unit (C90 6.7, C99 6.9).
 
@@ -4878,6 +4879,16 @@ c_parser_statement_after_labels (c_parser *parser)
 	case RID_FOR:
 	  c_parser_for_statement (parser, false);
 	  break;
+	case RID_CILK_FOR:
+	  if (!flag_cilkplus)
+	    {
+	      error_at (c_parser_peek_token (parser)->location,
+			"-fcilkplus must be enabled to use %<_Cilk_for%>");
+	      c_parser_skip_to_end_of_block_or_statement (parser);
+	    }
+	  else
+	    c_parser_cilk_simd (parser, true, integer_zero_node);
+	  break;
 	case RID_CILK_SYNC:
 	  c_parser_consume_token (parser);
 	  c_parser_skip_until_found (parser, CPP_SEMICOLON, "expected %<;%>");
@@ -9496,7 +9507,24 @@ c_parser_pragma (c_parser *parser, enum pragma_context context)
       if (!c_parser_cilk_verify_simd (parser, context))
 	return false;
       c_parser_consume_pragma (parser);
-      c_parser_cilk_simd (parser);
+      c_parser_cilk_simd (parser, false, NULL_TREE);
+      return false;
+    case PRAGMA_CILK_GRAINSIZE:
+      if (!flag_cilkplus)
+	{
+	  warning (0, "%<#pragma grainsize%> ignored because -fcilkplus is not"
+		   " enabled");
+	  c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
+	  return false;
+	}
+      if (context == pragma_external)
+	{
+	  error_at (c_parser_peek_token (parser)->location,
+		    "%<#pragma grainsize%> must be inside a function");
+	  c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
+	  return false;
+	}
+      c_parser_cilk_grainsize (parser);
       return false;
 
     default:
@@ -11591,7 +11619,7 @@ c_parser_omp_flush (c_parser *parser)
 
 static tree
 c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
-		       tree clauses, tree *cclauses)
+		       tree clauses, tree grain, tree *cclauses)
 {
   tree decl, cond, incr, save_break, save_cont, body, init, stmt, cl;
   tree declv, condv, incrv, initv, ret = NULL;
@@ -11599,6 +11627,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   int i, collapse = 1, nbraces = 0;
   location_t for_loc;
   vec<tree, va_gc> *for_block = make_tree_vector ();
+  tree count = NULL_TREE;
 
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
     if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
@@ -11611,11 +11640,18 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   condv = make_tree_vec (collapse);
   incrv = make_tree_vec (collapse);
 
-  if (!c_parser_next_token_is_keyword (parser, RID_FOR))
+  if (code != CILK_FOR
+      && !c_parser_next_token_is_keyword (parser, RID_FOR))
     {
       c_parser_error (parser, "for statement expected");
       return NULL;
     }
+  if (code == CILK_FOR
+      && !c_parser_next_token_is_keyword (parser, RID_CILK_FOR))
+    {
+      c_parser_error (parser, "_Cilk_for statement expected");
+      return NULL;
+    }
   for_loc = c_parser_peek_token (parser)->location;
   c_parser_consume_token (parser);
 
@@ -11693,7 +11729,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 	    case LE_EXPR:
 	      break;
 	    case NE_EXPR:
-	      if (code == CILK_SIMD)
+	      if (code == CILK_SIMD || code == CILK_FOR)
 		break;
 	      /* FALLTHRU.  */
 	    default:
@@ -11827,7 +11863,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   if (!fail)
     {
       stmt = c_finish_omp_for (loc, code, declv, initv, condv,
-			       incrv, body, NULL);
+			       incrv, body, NULL, &count);
       if (stmt)
 	{
 	  if (cclauses != NULL
@@ -11867,6 +11903,24 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 		  }
 	    }
 	  OMP_FOR_CLAUSES (stmt) = clauses;
+	  /* If it is a _Cilk_for statement, then the OMP_FOR_CLAUSES location
+	     stores the user-defined grain value or an integer_zero_node 
+	     indicating that the runtime must compute a suitable grain, inside
+	     a SCHEDULE clause.  Similarly the loop-count is also stored in
+	     an IF clause.  These clauses do not make sense for _Cilk_for but
+	     they are just used to transmit information.  */
+	  if (code == CILK_FOR)
+	    {
+	      tree l = build_omp_clause (EXPR_LOCATION (grain),
+					 OMP_CLAUSE_SCHEDULE);
+	      OMP_CLAUSE_SCHEDULE_KIND (l) = OMP_CLAUSE_SCHEDULE_CILKFOR;
+	      OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (l) = grain;
+	      OMP_CLAUSE_CHAIN (l) = OMP_FOR_CLAUSES (stmt);
+	      tree c = build_omp_clause (EXPR_LOCATION (count), OMP_CLAUSE_IF);
+	      OMP_CLAUSE_IF_EXPR (c) = count;
+	      OMP_CLAUSE_CHAIN (c) = l;
+	      OMP_FOR_CLAUSES (stmt) = c;
+	    }
 	}
       ret = stmt;
     }
@@ -11931,7 +11985,8 @@ c_parser_omp_simd (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_SIMD, clauses, cclauses);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_SIMD, clauses, NULL_TREE,
+			       cclauses);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -12011,7 +12066,8 @@ c_parser_omp_for (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_FOR, clauses, cclauses);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_FOR, clauses, NULL_TREE,
+			       cclauses);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -12494,7 +12550,8 @@ c_parser_omp_distribute (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_DISTRIBUTE, clauses, NULL);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_DISTRIBUTE, clauses, NULL_TREE,
+			       NULL);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -13771,18 +13828,84 @@ c_parser_cilk_all_clauses (c_parser *parser)
   return c_finish_cilk_clauses (clauses);
 }
 
-/* Main entry point for parsing Cilk Plus <#pragma simd> for
-   loops.  */
+/* This function helps parse the grainsize pragma for a _Cilk_for statement. 
+   Here is the correct syntax of this pragma: 
+	    #pragma cilk grainsize = <EXP> 
+ */
 
 static void
-c_parser_cilk_simd (c_parser *parser)
+c_parser_cilk_grainsize (c_parser *parser)
 {
-  tree clauses = c_parser_cilk_all_clauses (parser);
+  extern tree convert_to_integer (tree, tree);
+
+  /* consume the 'grainsize' keyword.  */
+  c_parser_consume_pragma (parser);
+
+  if (c_parser_require (parser, CPP_EQ, "expected %<=%>") != 0)
+    {
+      struct c_expr g_expr = c_parser_binary_expression (parser, NULL, NULL);
+      if (g_expr.value && TREE_CODE (g_expr.value) == C_MAYBE_CONST_EXPR)
+	{
+	  error_at (input_location, "cannot convert grain to long integer.\n");
+	  c_parser_skip_to_pragma_eol (parser);
+	}   
+      else if (g_expr.value && g_expr.value != error_mark_node)
+	{
+	  c_parser_skip_to_pragma_eol (parser);
+	  c_token *token = c_parser_peek_token (parser);
+	  if (token && token->type == CPP_KEYWORD
+	      && token->keyword == RID_CILK_FOR)
+	    {
+	      /* Remove EXCESS_PRECISION_EXPR since we are going to convert
+		 it to long int.  */
+	      if (TREE_CODE (g_expr.value) == EXCESS_PRECISION_EXPR)
+		g_expr.value = TREE_OPERAND (g_expr.value, 0);
+	      tree grain = convert_to_integer (long_integer_type_node,
+					       g_expr.value);
+	      if (grain && grain != error_mark_node) 
+		c_parser_cilk_simd (parser, true, grain);
+	    }
+	  else
+	    warning (0, "grainsize pragma is not followed by %<_Cilk_for%>");
+	}
+      else
+	c_parser_skip_to_pragma_eol (parser);
+    }
+  else
+    c_parser_skip_to_pragma_eol (parser);
+}
+
+/* Main entry point for parsing Cilk Plus <#pragma simd> for and
+   _Cilk_for loops.  If IS_CILK_FOR is true then it is a _Cilk_for loop 
+   and GRAIN is the grain value passed in through pragma or 0.  */
+
+static void
+c_parser_cilk_simd (c_parser *parser, bool is_cilk_for, tree grain)
+{
+  tree super_block = NULL_TREE;
+  tree clauses = NULL_TREE;
+  
+  if (!is_cilk_for)
+    clauses = c_parser_cilk_all_clauses (parser);
+  else
+    super_block = c_begin_omp_parallel ();
   tree block = c_begin_compound_stmt (true);
   location_t loc = c_parser_peek_token (parser)->location;
-  c_parser_omp_for_loop (loc, parser, CILK_SIMD, clauses, NULL);
+  enum tree_code code = is_cilk_for ? CILK_FOR : CILK_SIMD;
+  c_parser_omp_for_loop (loc, parser, code, clauses, grain, NULL);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
+  if (is_cilk_for)
+    {
+      /* Move all the clauses from the #pragma omp for to #pragma omp parallel.
+	 This is because if these values are not integers and are placed in
+	 OMP_FOR, then the compiler will insert value chains for them.  */
+      tree parallel_clauses = NULL_TREE;
+      cilk_for_move_clauses_upward (&parallel_clauses, super_block);
+    /* The term super_block is not used in scheduling terms but in 
+       set-theory, i.e. set vs. super-set.  */ 
+      c_finish_omp_parallel (loc, parallel_clauses, super_block);
+    }
 }
 \f
 /* Parse a transaction attribute (GCC Extension).
diff --git a/gcc/cilk-builtins.def b/gcc/cilk-builtins.def
index 9f3240a..bf319d5 100644
--- a/gcc/cilk-builtins.def
+++ b/gcc/cilk-builtins.def
@@ -31,3 +31,5 @@ DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_SYNC, "__cilkrts_sync")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_LEAVE_FRAME, "__cilkrts_leave_frame")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_POP_FRAME, "__cilkrts_pop_frame")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_SAVE_FP, "__cilkrts_save_fp_ctrl_state")
+DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_FOR_32, "__cilkrts_cilk_for_32")
+DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_FOR_64, "__cilkrts_cilk_for_64")
diff --git a/gcc/cilk-common.c b/gcc/cilk-common.c
index f2a3b75..50cd7c7 100644
--- a/gcc/cilk-common.c
+++ b/gcc/cilk-common.c
@@ -106,6 +106,27 @@ install_builtin (const char *name, tree fntype, enum built_in_function code,
   return fndecl;
 }
 
+/* Returns a FUNCTION_DECL of type TYPE whose name is *NAME.  */
+
+static tree 
+declare_cilk_for_builtin (const char *name, tree type, 
+			  enum built_in_function code)
+{
+  tree cb, ft, fn;
+
+  cb = build_function_type_list (void_type_node,
+                                 ptr_type_node, type, type,
+                                 NULL_TREE);
+  cb = build_pointer_type (cb);
+  ft = build_function_type_list (void_type_node,
+                                 cb, ptr_type_node, type,
+                                 integer_type_node, NULL_TREE);
+  fn = install_builtin (name, ft, code, false);
+  TREE_NOTHROW (fn) = 0;
+
+  return fn;
+}
+
 /* Creates and initializes all the built-in Cilk keywords functions and three
    structures: __cilkrts_stack_frame, __cilkrts_pedigree and __cilkrts_worker.
    Detailed information about __cilkrts_stack_frame and
@@ -270,6 +291,14 @@ cilk_init_builtins (void)
   cilk_save_fp_fndecl = install_builtin ("__cilkrts_save_fp_ctrl_state", 
 					 fptr_fun, BUILT_IN_CILK_SAVE_FP,
 					 false);
+  /* __cilkrts_cilk_for_32 (...);  */
+  cilk_for_32_fndecl = declare_cilk_for_builtin ("__cilkrts_cilk_for_32", 
+						 unsigned_intSI_type_node, 
+						 BUILT_IN_CILK_FOR_32);
+  /* __cilkrts_cilk_for_64 (...);  */
+  cilk_for_64_fndecl = declare_cilk_for_builtin ("__cilkrts_cilk_for_64", 
+						 unsigned_intDI_type_node, 
+						 BUILT_IN_CILK_FOR_64);
 }
 
 /* Get the appropriate frame arguments for CALL that is of type CALL_EXPR.  */
diff --git a/gcc/cilk.h b/gcc/cilk.h
index ae96f53..1fee929 100644
--- a/gcc/cilk.h
+++ b/gcc/cilk.h
@@ -40,6 +40,9 @@ enum cilk_tree_index  {
   CILK_TI_F_POP,                      /* __cilkrts_pop_frame (...).  */
   CILK_TI_F_RETHROW,                  /* __cilkrts_rethrow (...).  */
   CILK_TI_F_SAVE_FP,                  /* __cilkrts_save_fp_ctrl_state (...).  */
+  CILK_TI_F_LOOP_32,                  /* __cilkrts_cilk_for_32 (...).  */
+  CILK_TI_F_LOOP_64,                  /* __cilkrts_cilk_for_64 (...).  */
+
   /* __cilkrts_stack_frame struct fields.  */
   CILK_TI_FRAME_FLAGS,                /* stack_frame->flags.  */
   CILK_TI_FRAME_PARENT,               /* stack_frame->parent.  */
@@ -77,6 +80,8 @@ extern GTY (()) tree cilk_trees[CILK_TI_MAX];
 #define cilk_rethrow_fndecl           cilk_trees[CILK_TI_F_RETHROW]
 #define cilk_pop_fndecl               cilk_trees[CILK_TI_F_POP]
 #define cilk_save_fp_fndecl           cilk_trees[CILK_TI_F_SAVE_FP]
+#define cilk_for_32_fndecl            cilk_trees[CILK_TI_F_LOOP_32]
+#define cilk_for_64_fndecl            cilk_trees[CILK_TI_F_LOOP_64]
 
 #define cilk_worker_type_fndecl       cilk_trees[CILK_TI_WORKER_TYPE]
 #define cilk_frame_type_decl          cilk_trees[CILK_TI_FRAME_TYPE]
diff --git a/gcc/cp/cp-cilkplus.c b/gcc/cp/cp-cilkplus.c
index f3a2aff..037f9bc 100644
--- a/gcc/cp/cp-cilkplus.c
+++ b/gcc/cp/cp-cilkplus.c
@@ -143,3 +143,150 @@ cilk_install_body_with_frame_cleanup (tree fndecl, tree orig_body, void *wd)
 			    &list);
 }
 
+/* Helper function for walk_tree, used by found_cilk_for_p.  Sets data (of type
+   bool) to true if *TP is of type CILK_FOR.  If so, then WALK_SUBTREES is
+   set to zero.  */
+
+static tree
+find_cilk_for_stmt (tree *tp, int *walk_subtrees, void *data)
+{
+  bool *found = (bool *) data;
+  if (TREE_CODE (*tp) == CILK_FOR)
+    {
+      *found = true;
+      data = (void *) found;
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Returns true if T is of type CILK_FOR or one of its subtrees is of type
+   CILK_FOR.  */
+
+static bool
+found_cilk_for_p (tree t)
+{
+  bool found = false;
+  walk_tree (&t, find_cilk_for_stmt, (void *) &found, NULL);
+  return found;
+}
+
+/* Returns all the statements till CILK_FOR statement in *STMT_LIST.  Removes
+   those statements from STMT_LIST and updates STMT_LIST accordingly.  */
+
+tree
+copy_tree_till_cilk_for (tree *stmt_list)
+{
+  gcc_assert (TREE_CODE (*stmt_list) == STATEMENT_LIST);
+  tree new_stmt_list  = alloc_stmt_list ();
+  tree_stmt_iterator tsi;
+  for (tsi = tsi_start (*stmt_list); !tsi_end_p (tsi);)
+    if (!found_cilk_for_p (tsi_stmt (tsi)))
+      {
+	append_to_statement_list (tsi_stmt (tsi), &new_stmt_list); 
+	tsi_delink (&tsi);
+      }
+    else
+      tsi_next (&tsi);
+    
+  return new_stmt_list;
+}
+
+/* Structure to hold the list of variables that are being killed in a
+   statement list.  This structure is only used in a WALK_TREE function.  */
+struct cilk_for_var_list
+{
+  vec <tree, va_gc> *list;
+};
+
+/* Helper function for WALK_TREE used in find_killed_vars function.  
+   Returns all the variables that are being killed (or set) in *TP.  
+   *DATA holds the structure to hold the variable list.  */
+
+static tree
+find_vars (tree *tp, int *walk_subtrees, void *data)
+{
+  struct cilk_for_var_list *vlist = (struct cilk_for_var_list *) data;
+
+  if (!tp || !*tp)
+    return NULL_TREE;
+
+  if (TREE_CODE (*tp) == INIT_EXPR || TREE_CODE (*tp) == MODIFY_EXPR)
+    {
+      vec_safe_push (vlist->list, TREE_OPERAND (*tp, 0));
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Returns a vector of trees holding the variables that
+   are killed (i.e. written or set) in STMT_LIST.  */
+
+static vec <tree, va_gc> *
+find_killed_vars (tree stmt_list)
+{
+  struct cilk_for_var_list vlist;
+  memset (&vlist, 0, sizeof (vlist));
+  cp_walk_tree (&stmt_list, find_vars, &vlist, NULL);
+  return vlist.list;
+}
+
+/* Inserts OMP_CLAUSE_FIRSTPRIVATE clauses into *CLAUSES for each variable
+   in *LIST.  */
+
+static void
+insert_firstpriv_clauses (vec <tree, va_gc> *list, tree *clauses)
+{
+  if (vec_safe_is_empty (list))
+    return;
+
+  tree lhs;
+  unsigned ix;
+  FOR_EACH_VEC_SAFE_ELT (list, ix, lhs)
+    {
+      tree new_clause = build_omp_clause (EXPR_LOCATION (lhs),
+					  OMP_CLAUSE_FIRSTPRIVATE);
+      OMP_CLAUSE_DECL (new_clause) = lhs;
+      OMP_CLAUSE_CHAIN (new_clause) = *clauses;
+      *clauses = new_clause;
+    }
+}
+
+/* Returns a BIND_EXPR with BIND_EXPR_VARS holding VARS and BIND_EXPR_BODY
+   containing STMT_LIST followed by CFOR_PAR_LIST.  */
+
+tree
+cilk_for_create_bind_expr (tree vars, tree stmt_list, tree cfor_par_list)
+{
+  gcc_assert (TREE_CODE (stmt_list) == STATEMENT_LIST);
+  tree_stmt_iterator tsi;
+  tree return_expr = make_node (BIND_EXPR);
+  BIND_EXPR_BODY (return_expr) = alloc_stmt_list ();
+  bool found = false; 
+  vec <tree, va_gc> *cfor_vars = find_killed_vars (stmt_list);
+
+  insert_firstpriv_clauses (cfor_vars, &OMP_PARALLEL_CLAUSES (cfor_par_list));
+
+  /* If there is a supplied list of vars then there is no reason to find them 
+     again.  */
+  if (vars != NULL_TREE)
+    found = true;
+
+  BIND_EXPR_VARS (return_expr) = vars;
+  for (tsi = tsi_start (stmt_list); !tsi_end_p (tsi); tsi_next (&tsi))
+    {
+      /* Only do the adding of BIND_EXPR_VARS the first time since they are
+	 already "chained-on."  */
+      if (!found && TREE_CODE (tsi_stmt (tsi)) == DECL_EXPR)
+	{
+	  tree var = DECL_EXPR_DECL (tsi_stmt (tsi));
+	  BIND_EXPR_VARS (return_expr) = var;
+	  found = true;
+	}
+      else
+	append_to_statement_list (tsi_stmt (tsi),
+				  &BIND_EXPR_BODY (return_expr));
+    }
+  append_to_statement_list (cfor_par_list, &BIND_EXPR_BODY (return_expr));
+  return return_expr;
+}
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 7681b27..c665384 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6206,6 +6206,8 @@ extern void vtv_build_vtable_verify_fndecl      (void);
 
 /* In cp-cilkplus.c.  */
 extern bool cpp_validate_cilk_plus_loop		(tree);
+extern tree copy_tree_till_cilk_for             (tree *);
+extern tree cilk_for_create_bind_expr           (tree, tree, tree);
 
 /* In cp/cp-array-notations.c */
 extern tree expand_array_notation_exprs         (tree);
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index f0722d6..d661d4b 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -237,8 +237,8 @@ static void cp_parser_initial_pragma
 static tree cp_literal_operator_id
   (const char *);
 
-static void cp_parser_cilk_simd
-  (cp_parser *, cp_token *);
+static tree cp_parser_cilk_simd
+  (cp_parser *, cp_token *, tree);
 static bool cp_parser_omp_declare_reduction_exprs
   (tree, cp_parser *);
 static tree cp_parser_cilk_simd_vectorlength 
@@ -9368,6 +9368,18 @@ cp_parser_statement (cp_parser* parser, tree in_statement_expr,
 	  statement = cp_parser_iteration_statement (parser, false);
 	  break;
 
+	case RID_CILK_FOR:
+	  if (!flag_cilkplus)
+	    {
+	      error_at (cp_lexer_peek_token (parser->lexer)->location,
+			"-fcilkplus must be enabled to use %<_Cilk_for%>");
+	      cp_lexer_consume_token (parser->lexer);
+	      statement = error_mark_node;
+	    }
+	  else
+	    statement = cp_parser_cilk_simd (parser, NULL, integer_zero_node);
+	  break;
+	  
 	case RID_BREAK:
 	case RID_CONTINUE:
 	case RID_RETURN:
@@ -28835,7 +28847,7 @@ cp_parser_omp_for_cond (cp_parser *parser, tree decl, enum tree_code code)
     case LE_EXPR:
       break;
     case NE_EXPR:
-      if (code == CILK_SIMD)
+      if (code == CILK_SIMD || code == CILK_FOR)
 	break;
       /* Fall through: OpenMP disallows NE_EXPR.  */
     default:
@@ -29131,7 +29143,7 @@ cp_parser_omp_for_loop_init (cp_parser *parser,
 
 static tree
 cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
-			tree *cclauses)
+			tree *cclauses, tree *cfor_block)
 {
   tree init, cond, incr, body, decl, pre_body = NULL_TREE, ret;
   tree real_decl, initv, condv, incrv, declv;
@@ -29160,11 +29172,18 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
       bool add_private_clause = false;
       location_t loc;
 
-      if (!cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
+      if (code == CILK_SIMD
+	  && !cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
 	{
 	  cp_parser_error (parser, "for statement expected");
 	  return NULL;
 	}
+      if (code == CILK_FOR
+	  && !cp_lexer_next_token_is_keyword (parser->lexer, RID_CILK_FOR))
+	{
+	  cp_parser_error (parser, "_Cilk_for statement expected");
+	  return NULL;
+	}
       loc = cp_lexer_consume_token (parser->lexer)->location;
 
       if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
@@ -29173,13 +29192,26 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
       init = decl = real_decl = NULL;
       this_pre_body = push_stmt_list ();
 
+      if (code == CILK_FOR
+	  && cp_lexer_next_token_is_keyword (parser->lexer, RID_STATIC))
+	{
+	  error_at (cp_lexer_peek_token (parser->lexer)->location,
+		    "induction variable cannot be static");
+	  cp_lexer_consume_token (parser->lexer);
+	}
       add_private_clause
 	|= cp_parser_omp_for_loop_init (parser,
-					/*parsing_openmp=*/code != CILK_SIMD,
+					/*parsing_openmp=*/
+					(code != CILK_SIMD && code != CILK_FOR),
 					this_pre_body, for_block,
 					init, decl, real_decl);
 
-      cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
+      if (!cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON)
+	  && code == CILK_FOR)
+	{
+	  cp_parser_skip_to_end_of_statement (parser);
+	  cp_parser_consume_semicolon_at_end_of_statement (parser);
+	}
       if (this_pre_body)
 	{
 	  this_pre_body = pop_stmt_list (this_pre_body);
@@ -29337,7 +29369,7 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
 
   /* Note that we saved the original contents of this flag when we entered
      the structured block, and so we don't need to re-save it here.  */
-  if (code == CILK_SIMD)
+  if (code == CILK_SIMD || code == CILK_FOR)
     parser->in_statement = IN_CILK_SIMD_FOR;
   else
     parser->in_statement = IN_OMP_FOR;
@@ -29378,7 +29410,17 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
     }
 
   while (!for_block->is_empty ())
-    add_stmt (pop_stmt_list (for_block->pop ()));
+    {
+      tree t = pop_stmt_list (for_block->pop ());
+
+      /* Remove all the statements between the head of statement list and
+	 _Cilk_for statement and store them in *cfor_block.  These statements
+	 are hoisted above the #pragma parallel.  */
+      if (code == CILK_FOR && cfor_block != NULL)
+	*cfor_block = copy_tree_till_cilk_for (&t);
+      add_stmt (t);
+
+    }
   release_tree_vector (for_block);
 
   return ret;
@@ -29434,7 +29476,7 @@ cp_parser_omp_simd (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_SIMD, clauses, cclauses);
+  ret = cp_parser_omp_for_loop (parser, OMP_SIMD, clauses, cclauses, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -29522,7 +29564,7 @@ cp_parser_omp_for (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_FOR, clauses, cclauses);
+  ret = cp_parser_omp_for_loop (parser, OMP_FOR, clauses, cclauses, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -29994,7 +30036,7 @@ cp_parser_omp_distribute (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_DISTRIBUTE, clauses, NULL);
+  ret = cp_parser_omp_for_loop (parser, OMP_DISTRIBUTE, clauses, NULL, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -31290,6 +31332,38 @@ cp_parser_initial_pragma (cp_token *first_token)
   cp_lexer_get_preprocessor_token (NULL, first_token);
 }
 
+/* Parses the grainsize pragma for the _Cilk_for statement.
+   Syntax:
+   #pragma cilk grainsize = <VALUE>.  */
+
+static void
+cp_parser_cilk_grainsize (cp_parser *parser, cp_token *pragma_tok)
+{
+  if (cp_parser_require (parser, CPP_EQ, RT_EQ))
+    {
+      tree exp = cp_parser_binary_expression (parser, false, false,
+                                              PREC_NOT_OPERATOR, NULL);
+      cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+      if (!exp || exp == error_mark_node)
+        {
+          error_at (pragma_tok->location, "invalid grainsize for _Cilk_for");
+          return;
+        }
+      cp_token *n_tok = cp_lexer_peek_token (parser->lexer);
+
+      /* Make sure the next token is _Cilk_for; it is invalid otherwise.  */
+      if (n_tok && n_tok->type == CPP_KEYWORD
+	  && n_tok->keyword == RID_CILK_FOR) 
+	cp_parser_cilk_simd (parser, NULL, exp);
+      else
+	warning_at (cp_lexer_peek_token (parser->lexer)->location, 0,
+		    "%<#pragma cilk grainsize%> is not followed by "
+		    "%<_Cilk_for%>");
+      return;
+    }
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+}
+
 /* Normal parsing of a pragma token.  Here we can (and must) use the
    regular lexer.  */
 
@@ -31469,9 +31543,30 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context)
 		    "%<#pragma simd%> must be inside a function");
 	  break;
 	}
-      cp_parser_cilk_simd (parser, pragma_tok);
+      cp_parser_cilk_simd (parser, pragma_tok, NULL_TREE);
       return true;
 
+    case PRAGMA_CILK_GRAINSIZE:
+      if (context == pragma_external)
+        {
+          error_at (pragma_tok->location,
+                    "%<#pragma cilk grainsize%> must be inside a function");
+          break;
+        }
+
+      /* Reject the pragma if Cilk Plus is not enabled.  */
+      if (flag_cilkplus)
+        {
+          cp_parser_cilk_grainsize (parser, pragma_tok);
+          return true;
+        }
+      else
+        {
+          error_at (pragma_tok->location, "-fcilkplus must be enabled to use "
+                    "%<#pragma cilk grainsize%>");
+          break;
+	}
+
     default:
       gcc_assert (id >= PRAGMA_FIRST_EXTERNAL);
       c_invoke_pragma_handler (id);
@@ -31789,31 +31884,102 @@ cp_parser_cilk_simd_all_clauses (cp_parser *parser, cp_token *pragma_token)
     return c_finish_cilk_clauses (clauses);
 }
 
-/* Main entry-point for parsing Cilk Plus <#pragma simd> for loops.  */
+/* Main entry-point for parsing Cilk Plus <#pragma simd> for and _Cilk_for
+   loops.  This function returns NULL_TREE whenever it is parsing the
+   <#pragma simd> for because the caller does not check the return value.
+   _Cilk_for's caller checks this value, so that path returns
+   error_mark_node on errors and a valid tree when things go well.  */
 
-static void
-cp_parser_cilk_simd (cp_parser *parser, cp_token *pragma_token)
+static tree
+cp_parser_cilk_simd (cp_parser *parser, cp_token *pragma_token, tree grain)
 {
-  tree clauses = cp_parser_cilk_simd_all_clauses (parser, pragma_token);
-
+  bool is_cilk_for = !pragma_token ? true : false;
+  
+  tree clauses = NULL_TREE;
+  if (!is_cilk_for)
+    clauses = cp_parser_cilk_simd_all_clauses (parser, pragma_token);
+  
   if (clauses == error_mark_node)
-    return;
+    return NULL_TREE;
   
-  if (cp_lexer_next_token_is_not_keyword (parser->lexer, RID_FOR))
+  if (!is_cilk_for
+      && cp_lexer_next_token_is_not_keyword (parser->lexer, RID_FOR))
     {
       error_at (cp_lexer_peek_token (parser->lexer)->location,
 		"for statement expected");
-      return;
+      return NULL_TREE;
+    }
+  if (is_cilk_for
+      && cp_lexer_next_token_is_not_keyword (parser->lexer, RID_CILK_FOR))
+    {
+      error_at (cp_lexer_peek_token (parser->lexer)->location,
+		"_Cilk_for statement expected");
+      return error_mark_node;
     }
 
+  tree top_block = NULL_TREE, topmost_blk = NULL_TREE;
+  if (is_cilk_for)
+    {
+      topmost_blk = push_stmt_list ();
+      top_block = begin_omp_parallel ();
+    }
+  
   tree sb = begin_omp_structured_block ();
   int save = cp_parser_begin_omp_structured_block (parser);
-  tree ret = cp_parser_omp_for_loop (parser, CILK_SIMD, clauses, NULL);
+   
+  enum tree_code code = is_cilk_for ? CILK_FOR : CILK_SIMD;
+  tree cfor_blk = NULL_TREE;
+  tree ret = cp_parser_omp_for_loop (parser, code, clauses, NULL, &cfor_blk);
   if (ret)
     cpp_validate_cilk_plus_loop (OMP_FOR_BODY (ret));
+  
+  /* For _Cilk_for statements, the grain value is stored in a SCHEDULE
+     clause.  */
+  if (is_cilk_for && ret)
+    {
+      tree l = build_omp_clause (EXPR_LOCATION (grain), OMP_CLAUSE_SCHEDULE);
+      OMP_CLAUSE_SCHEDULE_KIND (l) = OMP_CLAUSE_SCHEDULE_CILKFOR;
+      OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (l) = grain;
+      OMP_CLAUSE_CHAIN (l) = OMP_FOR_CLAUSES (ret);
+      OMP_FOR_CLAUSES (ret) = l;
+    }
   cp_parser_end_omp_structured_block (parser, save);
-  add_stmt (finish_omp_structured_block (sb));
-  return;
+
+  if (!is_cilk_for)
+    {
+      add_stmt (finish_omp_structured_block (sb));
+      return NULL_TREE;
+    }
+
+  tree sb_block = finish_omp_structured_block (sb);
+  tree vars = NULL_TREE, sb_blk_body = sb_block;
+
+  /* For iterators, cfor_blk holds the mapping from original vector
+     iterators to the integer ones that the c_finish_omp_for remaps.
+     This information must be pushed above the #pragma omp parallel so that
+     the IF_CLAUSE (that holds the loop-count) can use it to compute the
+     loop-count.  */
+  if (TREE_CODE (sb_block) == BIND_EXPR && cfor_blk != NULL_TREE)
+    {
+      vars = BIND_EXPR_VARS (sb_block);
+      sb_blk_body = BIND_EXPR_BODY (sb_block);
+    }
+
+  add_stmt (sb_blk_body);
+  tree parallel_clauses = NULL_TREE;
+  cilk_for_move_clauses_upward (&parallel_clauses, ret);
+  tree stmt = finish_omp_parallel (parallel_clauses, top_block);
+  OMP_PARALLEL_COMBINED (stmt) = 1;
+  topmost_blk = pop_stmt_list (topmost_blk);
+
+  if (cfor_blk != NULL_TREE)
+    {
+      tree bind_expr = cilk_for_create_bind_expr (vars, cfor_blk, topmost_blk);
+      add_stmt (bind_expr);
+      return bind_expr;
+    }
+  add_stmt (topmost_blk);
+  return topmost_blk;
 }
 
 /* Create an identifier for a generic parameter type (a synthesized
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 7967db8..7b60b6e 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13584,6 +13584,9 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
 				args, complain, in_decl);
       stmt = begin_omp_parallel ();
       RECUR (OMP_PARALLEL_BODY (t));
+      if (flag_cilkplus
+	  && TREE_CODE (OMP_PARALLEL_BODY (t)) == CILK_FOR)
+	cilk_for_move_clauses_upward (&tmp, stmt);
       OMP_PARALLEL_COMBINED (finish_omp_parallel (tmp, stmt))
 	= OMP_PARALLEL_COMBINED (t);
       break;
@@ -13599,6 +13602,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
     case OMP_FOR:
     case OMP_SIMD:
     case CILK_SIMD:
+    case CILK_FOR:
     case OMP_DISTRIBUTE:
       {
 	tree clauses, body, pre_body;
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 9fb4fc0..8388a6b 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -6058,6 +6058,7 @@ handle_omp_for_class_iterator (int i, location_t locus, tree declv, tree initv,
     case GE_EXPR:
     case LT_EXPR:
     case LE_EXPR:
+    case NE_EXPR:
       if (TREE_OPERAND (cond, 1) == iter)
 	cond = build2 (swap_tree_comparison (TREE_CODE (cond)),
 		       TREE_TYPE (cond), iter, TREE_OPERAND (cond, 0));
@@ -6470,12 +6471,20 @@ finish_omp_for (location_t locus, enum tree_code code, tree declv, tree initv,
   if (IS_EMPTY_STMT (pre_body))
     pre_body = NULL;
 
+  tree count = NULL_TREE;
   omp_for = c_finish_omp_for (locus, code, declv, initv, condv, incrv,
-			      body, pre_body);
+			      body, pre_body, &count);
 
   if (omp_for == NULL)
     return NULL;
 
+  if (code == CILK_FOR)
+    {
+      tree c = build_omp_clause (EXPR_LOCATION (count), OMP_CLAUSE_IF);
+      OMP_CLAUSE_IF_EXPR (c) = count;
+      clauses = chainon (clauses, c);
+    }
+
   for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INCR (omp_for)); i++)
     {
       decl = TREE_OPERAND (TREE_VEC_ELT (OMP_FOR_INIT (omp_for), i), 0);
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 2d1e1c7..f87c0cf 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "value-prof.h"
 #include "trans-mem.h"
 
+static void dump_gimple_omp_parallel (pretty_printer *, gimple, int, int,
+				      bool);
 #define INDENT(SPACE)							\
   do { int i; for (i = 0; i < SPACE; i++) pp_space (buffer); } while (0)
 
@@ -1124,6 +1126,10 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	case GF_OMP_FOR_KIND_DISTRIBUTE:
 	  kind = " distribute";
 	  break;
+	case GF_OMP_FOR_KIND_CILKFOR:
+	  gcc_assert (flag_cilkplus);
+	  kind = "";
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -1158,16 +1164,25 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	case GF_OMP_FOR_KIND_DISTRIBUTE:
 	  pp_string (buffer, "#pragma omp distribute");
 	  break;
+	case GF_OMP_FOR_KIND_CILKFOR:
+	  gcc_assert (flag_cilkplus);
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
-      dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags);
+      if (!flag_cilkplus
+	  || gimple_omp_for_kind (gs) != GF_OMP_FOR_KIND_CILKFOR) 
+	dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags);
       for (i = 0; i < gimple_omp_for_collapse (gs); i++)
 	{
 	  if (i)
 	    spc += 2;
 	  newline_and_indent (buffer, spc);
-	  pp_string (buffer, "for (");
+	  if (flag_cilkplus 
+	      && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
+	    pp_string (buffer, "_Cilk_for (");
+	  else
+	    pp_string (buffer, "for (");
 	  dump_generic_node (buffer, gimple_omp_for_index (gs, i), spc,
 			     flags, false);
 	  pp_string (buffer, " = ");
@@ -1192,6 +1207,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	    case GE_EXPR:
 	      pp_greater_equal (buffer);
 	      break;
+	    case NE_EXPR:
+	      pp_string (buffer, "!=");
+	      break;
 	    default:
 	      gcc_unreachable ();
 	    }
@@ -1210,6 +1228,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 
       if (!gimple_seq_empty_p (gimple_omp_body (gs)))
 	{
+	  if (flag_cilkplus
+	      && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR) 
+	    dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags); 
 	  newline_and_indent (buffer, spc + 2);
 	  pp_left_brace (buffer);
 	  pp_newline (buffer);
@@ -1846,7 +1867,7 @@ dump_gimple_phi (pretty_printer *buffer, gimple phi, int spc, bool comment,
 
 static void
 dump_gimple_omp_parallel (pretty_printer *buffer, gimple gs, int spc,
-                          int flags)
+                          int flags, bool is_cilk_for)
 {
   if (flags & TDF_RAW)
     {
@@ -1860,7 +1881,10 @@ dump_gimple_omp_parallel (pretty_printer *buffer, gimple gs, int spc,
   else
     {
       gimple_seq body;
-      pp_string (buffer, "#pragma omp parallel");
+      if (is_cilk_for) 
+	pp_string (buffer, "compiler-inserted clauses for cilk-for body: ");
+      else
+	pp_string (buffer, "#pragma omp parallel");
       dump_omp_clauses (buffer, gimple_omp_parallel_clauses (gs), spc, flags);
       if (gimple_omp_parallel_child_fn (gs))
 	{
@@ -2137,7 +2161,7 @@ pp_gimple_stmt_1 (pretty_printer *buffer, gimple gs, int spc, int flags)
       break;
 
     case GIMPLE_OMP_PARALLEL:
-      dump_gimple_omp_parallel (buffer, gs, spc, flags);
+      dump_gimple_omp_parallel (buffer, gs, spc, flags, false);
       break;
 
     case GIMPLE_OMP_TASK:
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 0e80d2e..194045c 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -91,13 +91,14 @@ enum gf_mask {
     GF_CALL_ALLOCA_FOR_VAR	= 1 << 5,
     GF_CALL_INTERNAL		= 1 << 6,
     GF_OMP_PARALLEL_COMBINED	= 1 << 0,
-    GF_OMP_FOR_KIND_MASK	= 3 << 0,
+    GF_OMP_FOR_KIND_MASK	= 7 << 0,
     GF_OMP_FOR_KIND_FOR		= 0 << 0,
     GF_OMP_FOR_KIND_DISTRIBUTE	= 1 << 0,
     GF_OMP_FOR_KIND_SIMD	= 2 << 0,
     GF_OMP_FOR_KIND_CILKSIMD	= 3 << 0,
-    GF_OMP_FOR_COMBINED		= 1 << 2,
-    GF_OMP_FOR_COMBINED_INTO	= 1 << 3,
+    GF_OMP_FOR_KIND_CILKFOR     = 4 << 0,
+    GF_OMP_FOR_COMBINED		= 1 << 3,
+    GF_OMP_FOR_COMBINED_INTO	= 1 << 4,
     GF_OMP_TARGET_KIND_MASK	= 3 << 0,
     GF_OMP_TARGET_KIND_REGION	= 0 << 0,
     GF_OMP_TARGET_KIND_DATA	= 1 << 0,
@@ -4563,6 +4564,16 @@ gimple_omp_for_set_pre_body (gimple gs, gimple_seq pre_body)
   omp_for_stmt->pre_body = pre_body;
 }
 
+/* Returns the induction variable of type TREE from GS that is of type 
+   GIMPLE_STATEMENT_OMP_FOR.  */
+
+static inline tree
+gimple_cilk_for_induction_var (const_gimple gs)
+{
+  const gimple_statement_omp_for *cilk_for_stmt =
+    as_a <const gimple_statement_omp_for> (gs);
+  return cilk_for_stmt->iter->index;
+}
 
 /* Return the clauses associated with OMP_PARALLEL GS.  */
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 957a82f..f34bc97 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -5856,7 +5856,8 @@ omp_check_private (struct gimplify_omp_ctx *ctx, tree decl, bool copyprivate)
 
 static void
 gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
-			   enum omp_region_type region_type)
+			   enum omp_region_type region_type,
+			   bool is_cilk_for)
 {
   struct gimplify_omp_ctx *ctx, *outer_ctx;
   tree c;
@@ -6086,8 +6087,12 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 
 	case OMP_CLAUSE_FINAL:
 	case OMP_CLAUSE_IF:
-	  OMP_CLAUSE_OPERAND (c, 0)
-	    = gimple_boolify (OMP_CLAUSE_OPERAND (c, 0));
+	  /* In _Cilk_for we insert an IF clause as a mechanism to
+	     pass in the count information.  So, there is no reason to
+	     boolify them.  */
+	  if (!is_cilk_for) 
+	    OMP_CLAUSE_OPERAND (c, 0) 
+	      = gimple_boolify (OMP_CLAUSE_OPERAND (c, 0));
 	  /* Fall through.  */
 
 	case OMP_CLAUSE_SCHEDULE:
@@ -6454,6 +6459,20 @@ gimplify_adjust_omp_clauses (tree *list_p)
   delete_omp_context (ctx);
 }
 
+static void
+omp_remove_clause (tree c, tree *list_p)
+{
+  tree ii = NULL_TREE;
+  while ((ii = *list_p) != NULL)
+    {
+      if (simple_cst_equal (ii, c) == 1)
+	*list_p = OMP_CLAUSE_CHAIN (ii);
+      else
+	list_p = &OMP_CLAUSE_CHAIN (ii);
+    }
+}
+	
+
 /* Gimplify the contents of an OMP_PARALLEL statement.  This involves
    gimplification of the body, as well as scanning the body for used
    variables.  We need to do this scan now, because variable-sized
@@ -6465,11 +6484,29 @@ gimplify_omp_parallel (tree *expr_p, gimple_seq *pre_p)
   tree expr = *expr_p;
   gimple g;
   gimple_seq body = NULL;
-
+  bool is_cilk_for = false;
+  tree c = NULL_TREE;
+  for (c = OMP_PARALLEL_CLAUSES (expr); c; c = OMP_CLAUSE_CHAIN (c))
+    if (flag_cilkplus && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SCHEDULE
+	&& OMP_CLAUSE_SCHEDULE_KIND (c) == OMP_CLAUSE_SCHEDULE_CILKFOR)
+      {
+	/* The schedule clause is kept up to this point so that it can
+	   indicate whether this #pragma omp parallel is something a
+	   _Cilk_for statement inserted.  If so, then indicate
+	   is_cilk_for is true so that the gimplify_scan_omp_clauses does
+	   not boolify the IF CLAUSE, which stores the count value.  */
+	gcc_assert (flag_cilkplus);
+	is_cilk_for = true;
+	break;
+      } 
+  
+  /* The SCHEDULE clause is not necessary anymore.  */
+  if (is_cilk_for) 
+    omp_remove_clause (c, &OMP_PARALLEL_CLAUSES (expr));
   gimplify_scan_omp_clauses (&OMP_PARALLEL_CLAUSES (expr), pre_p,
 			     OMP_PARALLEL_COMBINED (expr)
 			     ? ORT_COMBINED_PARALLEL
-			     : ORT_PARALLEL);
+			     : ORT_PARALLEL, is_cilk_for);
 
   push_gimplify_context ();
 
@@ -6505,7 +6542,7 @@ gimplify_omp_task (tree *expr_p, gimple_seq *pre_p)
   gimplify_scan_omp_clauses (&OMP_TASK_CLAUSES (expr), pre_p,
 			     find_omp_clause (OMP_TASK_CLAUSES (expr),
 					      OMP_CLAUSE_UNTIED)
-			     ? ORT_UNTIED_TASK : ORT_TASK);
+			     ? ORT_UNTIED_TASK : ORT_TASK, false);
 
   push_gimplify_context ();
 
@@ -6570,8 +6607,9 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 
   simd = TREE_CODE (for_stmt) == OMP_SIMD
     || TREE_CODE (for_stmt) == CILK_SIMD;
-  gimplify_scan_omp_clauses (&OMP_FOR_CLAUSES (for_stmt), pre_p,
-			     simd ? ORT_SIMD : ORT_WORKSHARE);
+  gimplify_scan_omp_clauses (&OMP_FOR_CLAUSES (for_stmt), pre_p,
+			     simd ? ORT_SIMD : ORT_WORKSHARE,
+			     TREE_CODE (for_stmt) == CILK_FOR);
 
   /* Handle OMP_FOR_INIT.  */
   for_pre_body = NULL;
@@ -6832,6 +6870,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
     case OMP_FOR: kind = GF_OMP_FOR_KIND_FOR; break;
     case OMP_SIMD: kind = GF_OMP_FOR_KIND_SIMD; break;
     case CILK_SIMD: kind = GF_OMP_FOR_KIND_CILKSIMD; break;
+    case CILK_FOR: kind = GF_OMP_FOR_KIND_CILKFOR; break;
     case OMP_DISTRIBUTE: kind = GF_OMP_FOR_KIND_DISTRIBUTE; break;
     default:
       gcc_unreachable ();
@@ -6902,7 +6941,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
     default:
       gcc_unreachable ();
     }
-  gimplify_scan_omp_clauses (&OMP_CLAUSES (expr), pre_p, ort);
+  gimplify_scan_omp_clauses (&OMP_CLAUSES (expr), pre_p, ort, false);
   if (ort == ORT_TARGET || ort == ORT_TARGET_DATA)
     {
       push_gimplify_context ();
@@ -6962,7 +7001,7 @@ gimplify_omp_target_update (tree *expr_p, gimple_seq *pre_p)
   gimple stmt;
 
   gimplify_scan_omp_clauses (&OMP_TARGET_UPDATE_CLAUSES (expr), pre_p,
-			     ORT_WORKSHARE);
+			     ORT_WORKSHARE, false);
   gimplify_adjust_omp_clauses (&OMP_TARGET_UPDATE_CLAUSES (expr));
   stmt = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_UPDATE,
 				  OMP_TARGET_UPDATE_CLAUSES (expr));
@@ -7904,6 +7943,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	case OMP_FOR:
 	case OMP_SIMD:
 	case CILK_SIMD:
+	case CILK_FOR:
 	case OMP_DISTRIBUTE:
 	  ret = gimplify_omp_for (expr_p, pre_p);
 	  break;
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index f99b2a6..b2e65ab
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -71,6 +71,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-prop.h"
 #include "tree-nested.h"
 #include "tree-eh.h"
+#include "cilk.h"
 
 
 /* Lowering of OpenMP parallel and workshare constructs proceeds in two
@@ -198,6 +199,13 @@ struct omp_for_data
   struct omp_for_data_loop *loops;
 };
 
+/* A structure with the necessary elements from a _Cilk_for statement.  This
+   structure is passed in through WALK_STMT_INFO->INFO.  */
+struct cilk_for_info 
+{
+  bool found;
+  tree induction_var;
+};
 
 static splay_tree all_contexts;
 static int taskreg_nesting_level;
@@ -314,6 +322,8 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
   fd->have_ordered = false;
   fd->sched_kind = OMP_CLAUSE_SCHEDULE_STATIC;
   fd->chunk_size = NULL_TREE;
+  if (gimple_omp_for_kind (fd->for_stmt) == GF_OMP_FOR_KIND_CILKFOR)
+    fd->sched_kind = OMP_CLAUSE_SCHEDULE_CILKFOR;
   collapse_iter = NULL;
   collapse_count = NULL;
 
@@ -392,7 +402,9 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
 	  break;
 	case NE_EXPR:
 	  gcc_assert (gimple_omp_for_kind (for_stmt)
-		      == GF_OMP_FOR_KIND_CILKSIMD);
+		      == GF_OMP_FOR_KIND_CILKSIMD
+		      || gimple_omp_for_kind (for_stmt)
+		      == GF_OMP_FOR_KIND_CILKFOR);
 	  break;
 	case LE_EXPR:
 	  if (POINTER_TYPE_P (TREE_TYPE (loop->n2)))
@@ -1818,27 +1830,120 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	scan_omp (&OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (c), ctx);
 }
 
-/* Create a new name for omp child function.  Returns an identifier.  */
+/* Create a new name for omp child function.  Returns an identifier.  If
+   IS_CILK_FOR is true then the suffix for the child function is
+   "_cilk_for_fn".  */
 
 static tree
-create_omp_child_function_name (bool task_copy)
+create_omp_child_function_name (bool task_copy, bool is_cilk_for)
 {
+  if (is_cilk_for)
+    return clone_function_name (current_function_decl, "_cilk_for_fn");
   return (clone_function_name (current_function_decl,
 			       task_copy ? "_omp_cpyfn" : "_omp_fn"));
 }
 
+/* Helper function for walk_gimple_seq.  *GSI_P is the gimple statement
+   iterator passed by walk_gimple_seq and WI->INFO holds the CILK_FOR_INFO
+   structure.  This function sets the values inside this structure if it
+   finds a _Cilk_for statement at *GSI_P.  HANDLED_OPS_P is unused.  */
+
+static tree
+find_cilk_for_stmt (gimple_stmt_iterator *gsi_p,
+		    bool *handled_ops_p ATTRIBUTE_UNUSED,
+		    struct walk_stmt_info *wi)
+{
+  struct cilk_for_info *cf_info = (struct cilk_for_info *) wi->info;
+  gimple stmt = gsi_stmt (*gsi_p);
+
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR
+      && (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_CILKFOR)
+      /* For nested _Cilk_for statements, just look into the
+	 outer-most one.  */
+      && cf_info->found == false)
+    {
+      cf_info->found = true;
+      cf_info->induction_var = gimple_cilk_for_induction_var (stmt);
+    }
+  return NULL_TREE;
+}
+
+/* Returns true if STMT contains a CILK_FOR statement.  If one is found,
+   *IND_VAR is populated with the induction variable of the outermost
+   _Cilk_for; otherwise *IND_VAR remains untouched.  IND_VAR can be NULL,
+   in which case the induction variable is not reported and nothing is
+   written.  */
+
+static bool
+is_cilk_for_stmt (gimple stmt, tree *ind_var)
+{
+  if (!flag_cilkplus)
+    return false;
+  if (gimple_code (stmt) == GIMPLE_OMP_PARALLEL)
+    stmt = gimple_omp_body (stmt);
+  if (gimple_code (stmt) == GIMPLE_BIND)
+    {
+      gimple_seq body = gimple_bind_body (stmt);
+      struct walk_stmt_info wi;
+      struct cilk_for_info cf_info;
+      memset (&cf_info, 0, sizeof (struct cilk_for_info));
+      memset (&wi, 0, sizeof (wi));
+      wi.info = &cf_info;
+      walk_gimple_seq (body, find_cilk_for_stmt, NULL, &wi);
+      if (cf_info.found)
+	{
+	  if (ind_var)
+	    *ind_var = cf_info.induction_var;
+	  return true;
+	}
+    }
+  return false;
+}
+
+/* Returns the type of the induction variable for the child function for
+   _Cilk_for and the types for _high and _low variables based on TYPE.  */
+
+static tree
+cilk_for_check_loop_diff_type (tree type)
+{
+  if (type == integer_type_node)
+    return type;
+  else if (TYPE_PRECISION (type) <= TYPE_PRECISION (uint32_type_node))
+    { 
+      if (TYPE_UNSIGNED (type)) 
+	return uint32_type_node;
+      else
+	return integer_type_node;
+    }
+  else
+    {
+      if (TYPE_UNSIGNED (type)) 
+	return uint64_type_node;
+      else
+	return long_long_integer_type_node;
+    }
+  gcc_unreachable ();
+}
+
 /* Build a decl for the omp child function.  It'll not contain a body
    yet, just the bare decl.  */
 
 static void
 create_omp_child_function (omp_context *ctx, bool task_copy)
 {
-  tree decl, type, name, t;
+  tree decl, type, name, t, ind_var = NULL_TREE;
 
-  name = create_omp_child_function_name (task_copy);
+  bool is_cilk_for = is_cilk_for_stmt (ctx->stmt, &ind_var);
+  tree cilk_var_type = (is_cilk_for ?
+    cilk_for_check_loop_diff_type (TREE_TYPE (ind_var)) : NULL_TREE);
+  
+  name = create_omp_child_function_name (task_copy, is_cilk_for);
   if (task_copy)
     type = build_function_type_list (void_type_node, ptr_type_node,
 				     ptr_type_node, NULL_TREE);
+  else if (is_cilk_for)
+    type = build_function_type_list (void_type_node, ptr_type_node,
+				     cilk_var_type, cilk_var_type, NULL_TREE);
   else
     type = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
 
@@ -1888,13 +1993,44 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
   DECL_CONTEXT (t) = decl;
   DECL_RESULT (decl) = t;
 
-  t = build_decl (DECL_SOURCE_LOCATION (decl),
-		  PARM_DECL, get_identifier (".omp_data_i"), ptr_type_node);
+  /* _Cilk_for's child function requires two extra parameters called
+     __low and __high that are set by the Cilk runtime when it calls this
+     function.  */
+  if (is_cilk_for)
+    {
+      t = build_decl (DECL_SOURCE_LOCATION (decl),
+		      PARM_DECL, get_identifier ("__high"), cilk_var_type);
+      DECL_ARTIFICIAL (t) = 1;
+      DECL_NAMELESS (t) = 1;
+      DECL_ARG_TYPE (t) = ptr_type_node;
+      DECL_CONTEXT (t) = current_function_decl;
+      TREE_USED (t) = 1;
+      TREE_ADDRESSABLE (t) = 1;
+      DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
+      DECL_ARGUMENTS (decl) = t;
+
+      t = build_decl (DECL_SOURCE_LOCATION (decl),
+		      PARM_DECL, get_identifier ("__low"), cilk_var_type);
+      DECL_ARTIFICIAL (t) = 1;
+      DECL_NAMELESS (t) = 1;
+      DECL_ARG_TYPE (t) = ptr_type_node;
+      DECL_CONTEXT (t) = current_function_decl;
+      TREE_USED (t) = 1;
+      TREE_ADDRESSABLE (t) = 1;
+      DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
+      DECL_ARGUMENTS (decl) = t;
+    }
+
+  tree data_name = get_identifier (".omp_data_i");
+  t = build_decl (DECL_SOURCE_LOCATION (decl), PARM_DECL, data_name,
+		  ptr_type_node);
   DECL_ARTIFICIAL (t) = 1;
   DECL_NAMELESS (t) = 1;
   DECL_ARG_TYPE (t) = ptr_type_node;
   DECL_CONTEXT (t) = current_function_decl;
   TREE_USED (t) = 1;
+  if (is_cilk_for)
+    DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
   DECL_ARGUMENTS (decl) = t;
   if (!task_copy)
     ctx->receiver_decl = t;
@@ -4313,6 +4449,44 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
 			    false, GSI_CONTINUE_LINKING);
 }
 
+/* Insert into basic block BB a call to the library function in WS_ARGS[0],
+   using the information from ENTRY_STMT and the grain in WS_ARGS[1].  */
+
+static void
+expand_cilk_for_call (basic_block bb, gimple entry_stmt,
+		      vec <tree, va_gc> *ws_args)
+{
+  tree t, t1, t2;
+  gimple_stmt_iterator gsi;
+  vec <tree, va_gc> *args;
+
+  gcc_assert (vec_safe_length (ws_args) == 2);
+  tree func_name = (*ws_args)[0];
+  tree grain = (*ws_args)[1];
+
+  tree clauses = gimple_omp_parallel_clauses (entry_stmt); 
+  tree count = find_omp_clause (clauses, OMP_CLAUSE_IF);
+  gcc_assert (count != NULL_TREE);
+  count = OMP_CLAUSE_IF_EXPR (count);
+  
+  gsi = gsi_last_bb (bb);
+  t = gimple_omp_parallel_data_arg (entry_stmt);
+  if (t == NULL)
+    t1 = null_pointer_node;
+  else
+    t1 = build_fold_addr_expr (t);
+  t2 = build_fold_addr_expr (gimple_omp_parallel_child_fn (entry_stmt));
+  
+  vec_alloc (args, 4);
+  args->quick_push (t2);
+  args->quick_push (t1);
+  args->quick_push (count);
+  args->quick_push (grain);
+  t = build_call_expr_loc_vec (UNKNOWN_LOCATION, func_name, args);
+
+  force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, false, 
+			    GSI_CONTINUE_LINKING);
+}
 
 /* Build the function call to GOMP_task to actually
    generate the task operation.  BB is the block where to insert the code.  */
@@ -4648,7 +4822,38 @@ expand_omp_taskreg (struct omp_region *region)
   entry_bb = region->entry;
   exit_bb = region->exit;
 
-  if (is_combined_parallel (region))
+  /* The way _Cilk_for is constructed in this compiler can be thought of
+     as a parallel omp_for.  But the inner workings between them are very
+     different so we need a way to differentiate between them.  Thus, we
+     added a new schedule type called OMP_CLAUSE_SCHEDULE_CILKFOR, which 
+     pretty much says that this is not a parallel omp for but a _Cilk_for
+     statement.  */
+  bool is_cilk_for =
+    (flag_cilkplus && region->inner &&
+     (region->inner->sched_kind == OMP_CLAUSE_SCHEDULE_CILKFOR));
+
+  /* Extract the __high and __low parameters from the function.  */
+  tree high_arg = NULL_TREE, low_arg = NULL_TREE;
+  if (is_cilk_for)
+    {
+      for (tree ii_arg = DECL_ARGUMENTS (child_fn); ii_arg != NULL_TREE;
+	   ii_arg = TREE_CHAIN (ii_arg))
+	{
+	  if (!strcmp (IDENTIFIER_POINTER (DECL_NAME (ii_arg)), "__high"))
+	    high_arg = ii_arg;
+	  if (!strcmp (IDENTIFIER_POINTER (DECL_NAME (ii_arg)), "__low"))
+	    low_arg = ii_arg;
+	}
+      gcc_assert (high_arg);
+      gcc_assert (low_arg);
+    }
+  
+  if (is_cilk_for) 
+    /* If it is a _Cilk_for statement, it is modelled *like* a parallel for,
+       and the inner statement contains the name of the built-in function
+       and grain.  */
+    ws_args = region->inner->ws_args;
+  else if (is_combined_parallel (region))
     ws_args = region->ws_args;
   else
     ws_args = NULL;
@@ -4755,6 +4960,49 @@ expand_omp_taskreg (struct omp_region *region)
 	    }
 	}
 
+      /* The placeholder calls to GET_NUM_THREADS and GET_THREAD_NUM are
+	 removed here and replaced by the __low and __high parameter
+	 values, respectively.  */
+      gimple high_assign = NULL, low_assign = NULL;
+      if (is_cilk_for)
+	{
+	  gimple_stmt_iterator gsi2 = gsi_start_bb (single_succ (entry_bb));
+	  while (!gsi_end_p (gsi2))
+	    {
+	      gimple stmt = gsi_stmt (gsi2);
+	
+	      if (gimple_call_builtin_p (stmt, BUILT_IN_OMP_GET_NUM_THREADS))
+		{
+		  /* There can be only one call to each of these two
+		     functions.  If there are multiple, then something went
+		     wrong somewhere.  */
+		  gcc_assert (low_assign == NULL);
+		  tree ltype = TREE_TYPE (gimple_get_lhs (stmt));
+		  tree tmp2 = create_tmp_reg (TREE_TYPE (low_arg), NULL);
+		  low_assign = gimple_build_assign 
+		    (gimple_get_lhs (stmt), fold_convert (ltype, tmp2));
+		  gsi_remove (&gsi2, true);
+		  gimple tmp_stmt = gimple_build_assign (tmp2, low_arg);
+		  gsi_insert_before (&gsi2, low_assign, GSI_NEW_STMT);
+		  gsi_insert_before (&gsi2, tmp_stmt, GSI_NEW_STMT);
+		}
+	      else if (gimple_call_builtin_p (stmt,
+					      BUILT_IN_OMP_GET_THREAD_NUM))
+		{
+		  gcc_assert (high_assign == NULL);
+		  tree htype = TREE_TYPE (gimple_get_lhs (stmt));
+		  tree tmp2 = create_tmp_reg (TREE_TYPE (high_arg), NULL);
+		  
+		  high_assign = gimple_build_assign 
+		    (gimple_get_lhs (stmt), fold_convert (htype, tmp2));
+		  gsi_remove (&gsi2, true);
+		  gimple tmp_stmt = gimple_build_assign (tmp2, high_arg);
+		  gsi_insert_before (&gsi2, high_assign, GSI_NEW_STMT);
+		  gsi_insert_before (&gsi2, tmp_stmt, GSI_NEW_STMT);
+		}
+	      gsi_next (&gsi2);
+	    }
+	}      
       /* Declare local variables needed in CHILD_CFUN.  */
       block = DECL_INITIAL (child_fn);
       BLOCK_VARS (block) = vec2chain (child_cfun->local_decls);
@@ -4862,7 +5110,9 @@ expand_omp_taskreg (struct omp_region *region)
     }
 
   /* Emit a library call to launch the children threads.  */
-  if (gimple_code (entry_stmt) == GIMPLE_OMP_PARALLEL)
+  if (is_cilk_for)
+    expand_cilk_for_call (new_bb, entry_stmt, ws_args);
+  else if (gimple_code (entry_stmt) == GIMPLE_OMP_PARALLEL)
     expand_parallel_call (region, new_bb, entry_stmt, ws_args);
   else
     expand_task_call (new_bb, entry_stmt);
@@ -6540,6 +6790,223 @@ expand_omp_for_static_chunk (struct omp_region *region,
     }
 }
 
+/* A subroutine of expand_omp_for.  Generate code for _Cilk_for loop.  
+   Given parameters: 
+   for (V = N1; V cond N2; V += STEP) BODY; 
+   
+   where COND is "<" or ">", we generate pseudocode
+    
+   for (ind_var = low; ind_var < high; ind_var++)
+   {  
+      if (COND is "<")
+	V = N1 + (ind_var * STEP)
+      else
+        V = N1 - (ind_var * STEP);
+
+      <BODY>
+    }  
+  
+    In the above pseudocode, low and high are function parameters of the
+    child function.  The function below inserts temporaries initialized by
+    calls to two OMP functions that cannot otherwise appear in the body of
+    a _Cilk_for (since OMP_FOR cannot be mixed with _Cilk_for).  These
+    calls are replaced with low and high by the function that handles
+    taskreg.  */
+
+
+static void
+expand_cilk_for (struct omp_region *region, struct omp_for_data *fd)
+{
+  bool broken_loop = region->cont == NULL;
+  tree type = cilk_for_check_loop_diff_type (TREE_TYPE (fd->loop.v));
+  basic_block entry_bb = region->entry;
+  basic_block cont_bb = region->cont;
+  
+  gcc_assert (EDGE_COUNT (entry_bb->succs) == 2);
+  gcc_assert (broken_loop
+	      || BRANCH_EDGE (entry_bb)->dest == FALLTHRU_EDGE (cont_bb)->dest);
+  basic_block l0_bb = FALLTHRU_EDGE (entry_bb)->dest;
+  basic_block l1_bb, l2_bb;
+
+  if (!broken_loop)
+    {
+      gcc_assert (BRANCH_EDGE (cont_bb)->dest == l0_bb);
+      gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
+      l1_bb = split_block (cont_bb, last_stmt (cont_bb))->dest;
+      l2_bb = BRANCH_EDGE (entry_bb)->dest;
+    }
+  else
+    {
+      BRANCH_EDGE (entry_bb)->flags &= ~EDGE_ABNORMAL;
+      l1_bb = split_edge (BRANCH_EDGE (entry_bb));
+      l2_bb = single_succ (l1_bb);
+    }
+  basic_block exit_bb = region->exit;
+  basic_block l2_dom_bb = NULL;
+
+  gimple_stmt_iterator gsi = gsi_last_bb (entry_bb);
+
+  /* The statements below, up to and including "tree high_val = ...",
+     are pseudo statements used to pass information to
+     expand_omp_taskreg.  low_val and high_val will be replaced by the
+     __low and __high parameters of the child function.
+
+     The call_exprs are placeholders.  They are mainly used to
+     identify unambiguously, for the top-level part, where low and
+     high should be put (reasoning given in the header
+     comment).  */
+
+  tree t = build_call_expr
+    (builtin_decl_explicit (BUILT_IN_OMP_GET_NUM_THREADS), 0);
+  t = fold_convert (type, t);
+  tree low_val = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+					   GSI_SAME_STMT);
+  t = build_call_expr (builtin_decl_explicit (BUILT_IN_OMP_GET_THREAD_NUM),
+		       0);
+  t = fold_convert (TREE_TYPE (fd->loop.v), t);
+  tree high_val = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+					   GSI_SAME_STMT);
+
+  tree ind_var = create_tmp_reg (type, "__cilk_ind_var");
+  gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_FOR);
+  
+  /* Not needed in SSA form right now.  */
+  gcc_assert (!gimple_in_ssa_p (cfun));
+  if (l2_dom_bb == NULL)
+    l2_dom_bb = l1_bb;
+
+  tree n1 = low_val;
+  tree n2 = high_val;
+  
+  expand_omp_build_assign (&gsi, ind_var, n1);
+
+  /* Remove the GIMPLE_OMP_FOR statement.  */
+  gsi_remove (&gsi, true);
+
+  gimple stmt;
+  if (!broken_loop)
+    {
+      /* Code to control the increment goes in the CONT_BB.  */
+      gsi = gsi_last_bb (cont_bb);
+      stmt = gsi_stmt (gsi);
+      gcc_assert (gimple_code (stmt) == GIMPLE_OMP_CONTINUE);
+      enum tree_code code = PLUS_EXPR;
+      if (POINTER_TYPE_P (type))
+	t = fold_build_pointer_plus (ind_var, build_one_cst (type)); 
+      else
+	t = fold_build2 (code, type, ind_var, build_one_cst (type));
+      expand_omp_build_assign (&gsi, ind_var, t);
+
+      /* Remove GIMPLE_OMP_CONTINUE.  */
+      gsi_remove (&gsi, true);
+    }
+
+  /* Emit the condition in L1_BB.  */
+  gsi = gsi_start_bb (l1_bb);
+
+  tree step = fold_convert (type, fd->loop.step);
+  if ((TREE_CODE (step) == INTEGER_CST && tree_int_cst_sgn (step) < 1)) 
+    step = fold_build1_loc (UNKNOWN_LOCATION, NEGATE_EXPR, type, step);
+  
+  t = build2 (MULT_EXPR, type, ind_var, step);
+  tree tmp = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp, t), GSI_NEW_STMT);
+
+  tree tmp2 = create_tmp_reg (type, NULL);
+  tree cvtd = fold_convert (type, fd->loop.n1);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp2, cvtd), GSI_NEW_STMT);
+  
+  if (fd->loop.cond_code == GE_EXPR || fd->loop.cond_code == GT_EXPR)
+    t = fold_build2 (MINUS_EXPR, type, tmp2, tmp);
+ else
+   t = fold_build2 (PLUS_EXPR, type, tmp2, tmp);
+
+  tmp = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp, t), GSI_NEW_STMT);
+
+  cvtd = fold_convert (TREE_TYPE (fd->loop.v), tmp);
+  gsi_insert_after (&gsi, gimple_build_assign (fd->loop.v, cvtd), 
+		    GSI_NEW_STMT);
+  
+  t = fold_convert (type, n2);
+  t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
+				false, GSI_CONTINUE_LINKING);
+  /* The condition is always '<' since the runtime will fill in the low
+     and high values.  */
+  t = build2 (LT_EXPR, boolean_type_node, ind_var, t);
+  stmt = gimple_build_cond_empty (t);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+  if (walk_tree (gimple_cond_lhs_ptr (stmt), expand_omp_regimplify_p,
+		 NULL, NULL)
+      || walk_tree (gimple_cond_rhs_ptr (stmt), expand_omp_regimplify_p,
+		    NULL, NULL))
+    {
+      gsi = gsi_for_stmt (stmt);
+      gimple_regimplify_operands (stmt, &gsi);
+    }
+
+  /* Remove GIMPLE_OMP_RETURN.  */
+  gsi = gsi_last_bb (exit_bb);
+  gsi_remove (&gsi, true);
+
+  /* Connect the new blocks.  */
+  remove_edge (FALLTHRU_EDGE (entry_bb));
+
+  edge e, ne;
+  if (!broken_loop)
+    {
+      remove_edge (BRANCH_EDGE (entry_bb));
+      make_edge (entry_bb, l1_bb, EDGE_FALLTHRU);
+
+      e = BRANCH_EDGE (l1_bb);
+      ne = FALLTHRU_EDGE (l1_bb);
+      e->flags = EDGE_TRUE_VALUE;
+    }
+  else
+    {
+      single_succ_edge (entry_bb)->flags = EDGE_FALLTHRU;
+
+      ne = single_succ_edge (l1_bb);
+      e = make_edge (l1_bb, l0_bb, EDGE_TRUE_VALUE);
+
+    }
+  ne->flags = EDGE_FALSE_VALUE;
+  e->probability = REG_BR_PROB_BASE * 7 / 8;
+  ne->probability = REG_BR_PROB_BASE / 8;
+
+  set_immediate_dominator (CDI_DOMINATORS, l1_bb, entry_bb);
+  set_immediate_dominator (CDI_DOMINATORS, l2_bb, l2_dom_bb);
+  set_immediate_dominator (CDI_DOMINATORS, l0_bb, l1_bb);
+
+  if (!broken_loop)
+    {
+      struct loop *loop = alloc_loop ();
+      loop->header = l1_bb;
+      loop->latch = cont_bb;
+      add_loop (loop, l1_bb->loop_father);
+      loop->safelen = INT_MAX;
+    }
+
+  /* Pick the correct library function based on the precision of the
+     induction variable type.  */
+  tree lib_fun = NULL_TREE;
+  if (TYPE_PRECISION (type) == 32)
+    lib_fun = cilk_for_32_fndecl;
+  else if (TYPE_PRECISION (type) == 64)
+    lib_fun = cilk_for_64_fndecl;
+  else
+    gcc_unreachable ();
+
+  gcc_assert (fd->sched_kind == OMP_CLAUSE_SCHEDULE_CILKFOR);
+  
+  /* WS_ARGS contains the library function flavor to call
+     (__libcilkrts_cilk_for_64 or __libcilkrts_cilk_for_32) and the
+     user-defined grain value.  If the user does not define one, then zero
+     is passed in by the parser.  */
+  vec_alloc (region->ws_args, 2);
+  region->ws_args->quick_push (lib_fun);
+  region->ws_args->quick_push (fd->chunk_size);
+}
 
 /* A subroutine of expand_omp_for.  Generate code for a simd non-worksharing
    loop.  Given parameters:
@@ -6880,6 +7347,8 @@ expand_omp_for (struct omp_region *region, gimple inner_stmt)
 
   if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_KIND_SIMD)
     expand_omp_simd (region, &fd);
+  else if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_KIND_CILKFOR)
+    expand_cilk_for (region, &fd);
   else if (fd.sched_kind == OMP_CLAUSE_SCHEDULE_STATIC
 	   && !fd.have_ordered)
     {
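[Editorial note, not part of the patch.] The header comment of expand_cilk_for describes a child function whose own loop always steps by one from low to high and recomputes the user's loop variable from that index. A minimal sketch of that mapping in plain C follows. The names loop_value_sketch and cilk_child_sketch are hypothetical, and the explicit "descending" flag stands in for the GT_EXPR/GE_EXPR check in the patch; none of this is the generated code itself.

```c
#include <assert.h>

/* Hypothetical helper: recompute the user's loop variable V from the
   unit-stride index IND_VAR.  N1 is the loop's initial value and STEP is
   the magnitude of the user's step (the patch negates a negative constant
   step before multiplying).  */
long
loop_value_sketch (long n1, long step, int descending, long ind_var)
{
  return descending ? n1 - ind_var * step : n1 + ind_var * step;
}

/* Hypothetical model of the outlined child function.  LOW and HIGH play
   the role of the __low and __high parameters that the runtime supplies;
   the loop body here is simply "array[V] = value".  */
void
cilk_child_sketch (int *array, long n1, long step, int descending,
		   long low, long high, int value)
{
  assert (low <= high);
  /* The condition is always '<' and the increment is always 1.  */
  for (long ind_var = low; ind_var < high; ind_var++)
    array[loop_value_sketch (n1, step, descending, ind_var)] = value;
}
```

For "_Cilk_for (int ii = 0; ii < 10; ii += 2)" there are five iterations, so calling the sketch once with (low, high) = (0, 3) and once with (3, 5) touches indices 0, 2, 4, 6 and 8, whichever order the two calls run in.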
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c
new file mode 100644
index 0000000..8b6112b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c
@@ -0,0 +1,87 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <stdio.h>
+#endif
+
+static void check (int *Array, int start, int end, int incr, int value)
+{
+  int ii = 0;
+  for (ii = start;  ii < end; ii = ii + incr)
+    if (Array[ii] != value)
+      __builtin_abort ();
+#if HAVE_IO
+  printf ("Passed\n");
+#endif
+}
+
+static void check_reverse (int *Array, int start, int end, int incr, int value)
+{
+  int ii = 0;
+  for (ii = start; ii >= end; ii = ii - incr)
+    if (Array[ii] != value)
+      __builtin_abort ();
+#if HAVE_IO
+  printf ("Passed\n");
+#endif
+}
+
+
+int main (void)
+{
+  int Array[10];
+  int x = 9, y = 0, z = 3;
+
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array[ii] = 1133;
+  check (Array, 0, 10, 1, 1133);
+
+  _Cilk_for (int ii = 0; ii < 10; ++ii)
+    Array[ii] = 3311;
+  check (Array, 0, 10, 1, 3311);
+
+  _Cilk_for (int ii = 9; ii > -1; ii--)
+    Array[ii] = 4433;
+  check_reverse (Array, 9, 0, 1, 4433);
+
+  _Cilk_for (int ii = 9; ii > -1; --ii)
+    Array[ii] = 9988;
+  check_reverse (Array, 9, 0, 1, 9988);
+
+  _Cilk_for (int ii = 0; ii < 10; ++ii)
+    Array[ii] = 3311;
+  check (Array, 0, 10, 1, 3311);
+
+  _Cilk_for (int ii = 0; ii < 10; ii += 2)
+    Array[ii] = 1328;
+  check (Array, 0, 10, 2, 1328);
+
+  _Cilk_for (int ii = 9; ii >= 0; ii -= 2)
+    Array[ii] = 1738;
+  check_reverse (Array, 9, 0, 2, 1738);
+
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    {
+      if (ii % 2)
+	Array[ii] = 1343;
+      else
+	Array[ii] = 3413;
+    }
+
+  check (Array, 1, 10, 2, 1343); 
+  check (Array, 0, 10, 2, 3413); 
+
+  _Cilk_for (short cc = 0; cc < 10; cc++) 
+    Array[cc] = 1343;
+  check (Array, 0, 10,  1,1343);
+
+  _Cilk_for (short cc = 9; cc >= 0; cc--)
+    Array[cc] = 1348;
+  check_reverse (Array, 9, 0, 1, 1348);
+  return 0;
+}
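[Editorial note, not part of the patch.] The "short cc" loops at the end of cilk-fors.c exercise the widening done by cilk_for_check_loop_diff_type, and the resulting precision is what expand_cilk_for uses to choose between the 32-bit and 64-bit runtime entry points. The sketch below restates that decision outside of GCC's tree types. The enum and both function names are hypothetical, and it assumes int is 32 bits wide, as on the x86 targets these tests run on.

```c
/* Hypothetical stand-ins for the tree type nodes used in the patch.  */
enum diff_kind_sketch
{
  DIFF_INT,		/* integer_type_node */
  DIFF_UINT32,		/* uint32_type_node */
  DIFF_LONG_LONG,	/* long_long_integer_type_node */
  DIFF_UINT64		/* uint64_type_node */
};

/* Map the precision and signedness of the user's induction variable to
   the type used for the child function's index, __low and __high.
   Assumption: int is 32 bits, so the patch's early return for
   integer_type_node is covered by the first branch.  */
enum diff_kind_sketch
loop_diff_kind_sketch (unsigned precision, int is_unsigned)
{
  if (precision <= 32)
    return is_unsigned ? DIFF_UINT32 : DIFF_INT;
  return is_unsigned ? DIFF_UINT64 : DIFF_LONG_LONG;
}

/* Width of the runtime entry point chosen for KIND: 32 corresponds to
   __libcilkrts_cilk_for_32 and 64 to __libcilkrts_cilk_for_64.  */
int
runtime_entry_bits_sketch (enum diff_kind_sketch kind)
{
  return (kind == DIFF_INT || kind == DIFF_UINT32) ? 32 : 64;
}
```

Under these assumptions a 16-bit signed "short" maps to DIFF_INT and therefore to the 32-bit entry point, while a 64-bit "long" maps to DIFF_LONG_LONG and the 64-bit one.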
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c
new file mode 100644
index 0000000..ed73c34
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=c99" { target c } } */
+
+
+int main (void)
+{
+  int q = 0, ii = 0, jj = 0;
+
+  _Cilk_for (int ii; ii < 10; ii++) /* { dg-error "is not initialized" "" { target c } } */
+    /* { dg-error "expected" "" { target c++ } 10 } */
+    q = 5;
+
+  _Cilk_for (; ii < 10; ii++) /* { dg-error "expected iteration declaration" } */
+    q = 2;
+
+  _Cilk_for (int ii = 0; ; ii++) /* { dg-error "missing controlling predicate" } */
+    q = 2;
+
+  _Cilk_for (int ii = 0; ii < 10, jj < 10; ii++)  /* { dg-error "expected ';' before ',' token" "" { target c } } */
+    /* { dg-error "invalid controlling predicate" "" { target c++ }  20 } */
+    q = 5;
+
+  _Cilk_for (int ii = 0; ii < 10; ) /* { dg-error "missing increment" } */
+    q = 5;
+
+
+  _Cilk_for (int ii = 0, jj = 0; ii < 10; ii++) /* { dg-error "expected" } */ 
+    q = 5;
+
+  _Cilk_for (volatile int vii = 0; vii < 10; vii++) /* { dg-error "iteration variable cannot be volatile" } */
+    q = 5;
+
+ 
+  _Cilk_for (static int sii = 0; sii < 10; sii++) /* { dg-error "static" } */
+
+    q = 5;
+
+
+  _Cilk_for (float fii = 3.47; fii < 5.23; fii++) /* { dg-error "invalid type for iteration variable" } */
+    q = 5;
+
+
+  _Cilk_for (int ii = 0; 10 > jj; ii++) /* { dg-error "invalid controlling predicate" } */
+    q = 5;
+
+  _Cilk_for (int ii = 0; ii < 10; ii >> 1) /* { dg-error "invalid increment expression" } */
+    q = 5;
+
+  _Cilk_for (int ii = 10; ii >= 0; ii--) /* This is OK!  */
+    q = 5;
+
+  _Cilk_for (int ii; ii < 10; ii++) /* { dg-error "is not initialized" "" { target c } } */ 
+    /* { dg-error "expected" "" { target c++ }  53 } */
+    q = 5;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c
new file mode 100644
index 0000000..6cb9b03
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c
@@ -0,0 +1,35 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+int grain_value = 2;
+int main (void)
+{
+  int Array1[200], Array1_Serial[200];
+
+  for (int ii = 0; ii < 200; ii++)
+    {
+      Array1_Serial[ii] = 2;
+      Array1[ii] = 1;
+    }
+
+#pragma cilk grainsize = 2
+  _Cilk_for (int ii = 0; ii < 200; ii++)
+    Array1[ii] = 2;
+
+  for (int ii = 0; ii < 200; ii++)
+    if (Array1[ii] != Array1_Serial[ii])
+      return (ii+1);
+
+#pragma cilk grainsize = grain_value
+  _Cilk_for (int ii = 0; ii < 200; ii++) 
+    Array1[ii] = 2;
+
+  for (int ii = 0; ii < 200; ii++)
+    if (Array1[ii] != Array1_Serial[ii])
+      return (ii+1);
+
+  return 0;
+}
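[Editorial note, not part of the patch.] The grain test above only checks the final array contents, so it may help to picture how a runtime could use the grain value when it calls the child function with different (low, high) pairs. The patch does not show the libcilkrts strategy; the recursive halving below is an assumption made purely for illustration, and split_range_sketch, mark_chunk_sketch and chunk_fn_sketch are hypothetical names. The sketch requires a positive grain and does not model the "zero means runtime default" case.

```c
#include <assert.h>

/* Hypothetical signature of a child function: DATA is the shared data
   block, [LOW, HIGH) is the chunk of unit-stride iterations to run.  */
typedef void (*chunk_fn_sketch) (void *data, long low, long high);

/* Split [LOW, HIGH) in halves until a piece is no larger than GRAIN, then
   run FN on it.  A parallel runtime could hand the first half to another
   worker; here everything runs serially.  */
void
split_range_sketch (chunk_fn_sketch fn, void *data, long low, long high,
		    long grain)
{
  assert (grain > 0);
  while (high - low > grain)
    {
      long mid = low + (high - low) / 2;
      split_range_sketch (fn, data, low, mid, grain);
      low = mid;
    }
  if (low < high)
    fn (data, low, high);
}

/* Example child: count how often each iteration index is visited.  */
void
mark_chunk_sketch (void *data, long low, long high)
{
  int *hits = (int *) data;
  for (long i = low; i < high; i++)
    hits[i]++;
}
```

With 200 iterations and a grain of 2, as in the test, every index is visited exactly once, and no call to the child covers more than two iterations.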
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c
new file mode 100644
index 0000000..e1e3217
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus -Wunknown-pragmas" } */
+/* { dg-additional-options "-std=c99" { target c } } */
+
+
+char Array1[26];
+
+#pragma cilk grainsize = 2 /* { dg-error "must be inside a function" } */
+
+int main(int argc, char **argv)
+{
+/* This is OK.  */
+#pragma cilk grainsize = 2
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsize 2 /* { dg-error "expected '=' before numeric constant" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsiz = 2 /* { dg-warning "ignoring #pragma cilk grainsiz" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+
+/* This is OK, it will do a type conversion to long int.  */
+#pragma cilk grainsize = 0.5 
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsize = 1 
+  while (Array1[5] != 0) /* { dg-warning "is not followed by" } */
+    {
+    /* Blah */
+    }
+
+#pragma cilk grainsize = 1 
+  int q = 0; /* { dg-warning "is not followed by" } */
+  _Cilk_for (q = 0; q < 10; q++)
+    Array1[q]  = 5;
+
+  while (Array1[5] != 0)
+    {
+    /* Blah */
+    }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c
new file mode 100644
index 0000000..8cf1b4e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c
@@ -0,0 +1,41 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+
+/* <feature> loop control variable must have integer, pointer or class type
+   </feature>
+*/
+
+#define ARRAY_SIZE 10
+int a[ARRAY_SIZE];
+
+int main(void)
+{ 
+  int ii = 0;
+
+#if 1
+  for (ii =0; ii < ARRAY_SIZE; ii++)
+    a[ii] = 5;
+#endif
+  _Cilk_for(int *aa = a; aa < a + ARRAY_SIZE; aa++) 
+    *aa = 0;
+#if 1
+  for (ii = 0; ii < ARRAY_SIZE; ii++) 
+    if (a[ii] != 0) 
+      __builtin_abort ();
+#endif
+
+  _Cilk_for (int *aa = a; aa < a + ARRAY_SIZE; aa = aa + 2)
+    *aa = 4;
+
+#if 1
+  for (ii = 0; ii < ARRAY_SIZE; ii = ii + 2) 
+    if (a[ii] != 4) 
+      __builtin_abort ();
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c
new file mode 100644
index 0000000..cffe17e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c
@@ -0,0 +1,79 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <stdio.h>
+#endif
+
+int main (void)
+{
+  int Array[10][10];
+
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 10; jj++)
+	{
+	  Array[ii][jj] = 0;
+	}
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 0; jj < 5; jj++)
+      Array[ii][jj] = 5;
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 5; jj++)
+      if (Array[ii][jj] != 5)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+
+  /* One goes up and one goes down.  */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 9; jj >= 0; jj--)
+      Array[ii][jj] = 7;
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 9; jj >= 0; jj--)
+      if (Array[ii][jj] != 7)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  /* different step sizes.  */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 0; jj < 10; jj += 2)
+      Array[ii][jj] = 9;
+  
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 10; jj += 2)
+      if (Array[ii][jj] != 9)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  /* different step sizes.  */
+  _Cilk_for (int ii = 0; ii < 10; ii += 2)
+    _Cilk_for (int jj = 5; jj < 9; jj++)
+      Array[ii][jj] = 11; 
+  
+  for (int ii = 0; ii < 10; ii += 2)
+    for (int jj = 5; jj < 9; jj++)
+      if (Array[ii][jj] != 11)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cf3.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cf3.cc
new file mode 100644
index 0000000..8d88c5f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cf3.cc
@@ -0,0 +1,96 @@
+/* { dg-options "-fcilkplus" } */
+
+typedef __PTRDIFF_TYPE__ ptrdiff_t;
+
+template <typename T>
+class I
+{
+public:
+  typedef ptrdiff_t difference_type;
+  I ();
+  ~I ();
+  I (T *);
+  I (const I &);
+  T &operator * ();
+  T *operator -> ();
+  T &operator [] (const difference_type &) const;
+  I &operator = (const I &);
+  I &operator ++ ();
+  I operator ++ (int);
+  I &operator -- ();
+  I operator -- (int);
+  I &operator += (const difference_type &);
+  I &operator -= (const difference_type &);
+  I operator + (const difference_type &) const;
+  I operator - (const difference_type &) const;
+  template <typename S> friend bool operator == (I<S> &, I<S> &);
+  template <typename S> friend bool operator == (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator < (I<S> &, I<S> &);
+  template <typename S> friend bool operator < (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator <= (I<S> &, I<S> &);
+  template <typename S> friend bool operator <= (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator > (I<S> &, I<S> &);
+  template <typename S> friend bool operator > (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator >= (I<S> &, I<S> &);
+  template <typename S> friend bool operator >= (const I<S> &, const I<S> &);
+  template <typename S> friend typename I<S>::difference_type operator - (I<S> &, I<S> &);
+  template <typename S> friend typename I<S>::difference_type operator - (const I<S> &, const I<S> &);
+  template <typename S> friend I<S> operator + (typename I<S>::difference_type , const I<S> &);
+private:
+  T *p;
+};
+template <typename T> I<T>::I () : p (0) {}
+template <typename T> I<T>::~I () {}
+template <typename T> I<T>::I (T *x) : p (x) {}
+template <typename T> I<T>::I (const I &x) : p (x.p) {}
+template <typename T> T &I<T>::operator * () { return *p; }
+template <typename T> T *I<T>::operator -> () { return p; }
+template <typename T> T &I<T>::operator [] (const difference_type &x) const { return p[x]; }
+template <typename T> I<T> &I<T>::operator = (const I &x) { p = x.p; return *this; }
+template <typename T> I<T> &I<T>::operator ++ () { ++p; return *this; }
+template <typename T> I<T> I<T>::operator ++ (int) { return I (p++); }
+template <typename T> I<T> &I<T>::operator -- () { --p; return *this; }
+template <typename T> I<T> I<T>::operator -- (int) { return I (p--); }
+template <typename T> I<T> &I<T>::operator += (const difference_type &x) { p += x; return *this; }
+template <typename T> I<T> &I<T>::operator -= (const difference_type &x) { p -= x; return *this; }
+template <typename T> I<T> I<T>::operator + (const difference_type &x) const { return I (p + x); }
+template <typename T> I<T> I<T>::operator - (const difference_type &x) const { return I (p - x); }
+template <typename T> bool operator == (I<T> &x, I<T> &y) { return x.p == y.p; }
+template <typename T> bool operator == (const I<T> &x, const I<T> &y) { return x.p == y.p; }
+template <typename T> bool operator != (I<T> &x, I<T> &y) { return !(x == y); }
+template <typename T> bool operator != (const I<T> &x, const I<T> &y) { return !(x == y); }
+template <typename T> bool operator < (I<T> &x, I<T> &y) { return x.p < y.p; }
+template <typename T> bool operator < (const I<T> &x, const I<T> &y) { return x.p < y.p; }
+template <typename T> bool operator <= (I<T> &x, I<T> &y) { return x.p <= y.p; }
+template <typename T> bool operator <= (const I<T> &x, const I<T> &y) { return x.p <= y.p; }
+template <typename T> bool operator > (I<T> &x, I<T> &y) { return x.p > y.p; }
+template <typename T> bool operator > (const I<T> &x, const I<T> &y) { return x.p > y.p; }
+template <typename T> bool operator >= (I<T> &x, I<T> &y) { return x.p >= y.p; }
+template <typename T> bool operator >= (const I<T> &x, const I<T> &y) { return x.p >= y.p; }
+template <typename T> typename I<T>::difference_type operator - (I<T> &x, I<T> &y) { return x.p - y.p; }
+template <typename T> typename I<T>::difference_type operator - (const I<T> &x, const I<T> &y) { return x.p - y.p; }
+template <typename T> I<T> operator + (typename I<T>::difference_type x, const I<T> &y) { return I<T> (x + y.p); }
+
+template <typename T>
+class J
+{
+public:
+  J(const I<T> &x, const I<T> &y) : b (x), e (y) {}
+  const I<T> &begin ();
+  const I<T> &end ();
+private:
+  I<T> b, e;
+};
+
+template <typename T> const I<T> &J<T>::begin () { return b; }
+template <typename T> const I<T> &J<T>::end () { return e; }
+
+template <typename T>
+void baz (I<T> &i);
+
+void
+foo (J<int> j)
+{
+  _Cilk_for (I<int> i = j.begin (); i < j.end (); i += 2)
+    baz (i);
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc
new file mode 100644
index 0000000..8221371
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc
@@ -0,0 +1,25 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#define SIZE 100
+#define CHECK_VALUE 5
+
+template <class T>
+int func (T start, T end)
+{
+  int Array[SIZE];
+  _Cilk_for (T ii = 0; ii < end; ii++)
+    Array[ii] = CHECK_VALUE;
+  
+  for (T ii = 0; ii < end; ii++)
+    if (Array[ii] != CHECK_VALUE)
+      __builtin_abort ();
+
+  return 0;
+}
+
+int main (void)
+{
+  return func <int> (0, 100) + func <long> (0, 100);
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc
new file mode 100644
index 0000000..2ac8c72
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc
@@ -0,0 +1,52 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#include <vector>
+#include <cstdio>
+#include <iostream>
+#include <algorithm>
+
+using namespace std;
+
+
+int main(void)
+{
+vector <int> array;
+vector <int> array_serial;
+
+#if 1
+for (int ii = -1; ii < 10; ii++)
+{   
+  array.push_back(ii);
+  array_serial.push_back (ii);
+}
+#endif
+_Cilk_for (vector<int>::iterator iter = array.begin(); iter != array.end();
+          iter++)
+{
+   if (*iter  == 6) 
+     *iter = 13;
+}
+for (vector<int>::iterator iter = array_serial.begin(); 
+     iter != array_serial.end(); iter++)
+{
+   if (*iter  == 6) 
+     *iter = 13;
+}
+sort (array.begin(), array.end());
+sort (array_serial.begin(), array_serial.end());
+
+vector <int>::iterator iter = array.begin ();
+vector <int>::iterator iter_serial = array_serial.begin ();
+
+while (iter != array.end () && iter_serial != array_serial.end ())
+{
+  if (*iter != *iter_serial)
+    __builtin_abort ();
+  iter++;
+  iter_serial++;
+}
+
+return 0;
+}   
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc
new file mode 100644
index 0000000..1cf3301
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc
@@ -0,0 +1,72 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#include <vector>
+#include <cstdio>
+#include <iostream>
+#include <algorithm>
+
+using namespace std;
+
+
+int main(void)
+{
+vector <int> array,array_serial;
+
+#if 1
+for (int ii = -1; ii < 10; ii++)
+{   
+  array.push_back(ii);
+  array_serial.push_back(ii);
+}
+#endif
+_Cilk_for (vector<int>::reverse_iterator iter4 = array.rbegin(); 
+	   iter4 != array.rend(); iter4++)
+{
+  if (*iter4 == 0x8) {
+    *iter4 = 9;
+  }
+}
+
+_Cilk_for (vector<int>::reverse_iterator iter4 = array_serial.rbegin(); 
+	   iter4 != array_serial.rend(); iter4++)
+{
+  if (*iter4 == 0x8) {
+    *iter4 = 9;
+  }
+}
+_Cilk_for (vector<int>::reverse_iterator iter2 = array.rbegin(); 
+	   iter2 != array.rend();
+          iter2 += 1) 
+{
+   if ((*iter2 == 0x4) || (*iter2 == 0x7)) {
+    *iter2 = 0x3;
+   }
+}
+for (vector<int>::reverse_iterator iter2 = array_serial.rbegin(); 
+     iter2 != array_serial.rend();
+          iter2 += 1) 
+{
+   if ((*iter2 == 0x4) || (*iter2 == 0x7)) {
+    *iter2 = 0x3;
+   }
+}
+sort (array.begin(), array.end());
+sort (array_serial.begin(), array_serial.end());
+
+vector <int>::iterator iter = array.begin ();
+vector <int>::iterator iter_serial = array_serial.begin ();
+while (iter != array.end () && iter_serial != array_serial.end ())
+{
+  if (*iter != *iter_serial)
+    __builtin_abort ();
+  iter++;
+  iter_serial++;
+}
+
+return 0;
+}   
+
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc
new file mode 100644
index 0000000..8d2e61e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc
@@ -0,0 +1,50 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#include <iostream>
+#include <cstdio>
+#include <cstdlib>
+#include <vector>
+#include <algorithm>
+#include <list>
+
+using namespace std;
+
+
+int main(int argc, char **argv)
+{
+  vector <int> number_list, number_list_serial;
+  int new_number = 0;
+  int no_elements = 0;
+  
+  if (argc != 2)
+  {
+    no_elements = 10;
+  }
+
+
+  number_list.clear();
+  number_list_serial.clear();
+  for (int ii = 0; ii < no_elements; ii++)
+  {
+    number_list.push_back(new_number);
+    number_list_serial.push_back(new_number);
+  }
+
+  _Cilk_for (int jj = 0; jj < no_elements; jj++)
+  {
+    number_list[jj] = jj + no_elements;
+  }
+  for (int jj = 0; jj < no_elements; jj++)
+  {
+    number_list_serial[jj] = jj + no_elements;
+  }
+
+  for (int jj = 0; jj < no_elements; jj++)
+    if (number_list_serial[jj] != number_list[jj])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index e548a0d..d8c14e3 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -351,6 +351,7 @@ enum omp_clause_schedule_kind {
   OMP_CLAUSE_SCHEDULE_GUIDED,
   OMP_CLAUSE_SCHEDULE_AUTO,
   OMP_CLAUSE_SCHEDULE_RUNTIME,
+  OMP_CLAUSE_SCHEDULE_CILKFOR,
   OMP_CLAUSE_SCHEDULE_LAST
 };
 
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 0595499..91efd9f 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -411,6 +411,9 @@ dump_omp_clause (pretty_printer *buffer, tree clause, int spc, int flags)
 	case OMP_CLAUSE_SCHEDULE_AUTO:
 	  pp_string (buffer, "auto");
 	  break;
+	case OMP_CLAUSE_SCHEDULE_CILKFOR:
+	  pp_string (buffer, "cilk-for grain");
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -2392,6 +2395,12 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
       pp_string (buffer, "#pragma simd");
       goto dump_omp_loop;
 
+    case CILK_FOR:
+      /* This label points one line after dumping the clauses.  
+	 For _Cilk_for the clauses are dumped after the _Cilk_for (...) 
+	 parameters are printed out.  */
+      goto dump_omp_loop_cilk_for;
+
     case OMP_DISTRIBUTE:
       pp_string (buffer, "#pragma omp distribute");
       goto dump_omp_loop;
@@ -2420,6 +2429,8 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
     dump_omp_loop:
       dump_omp_clauses (buffer, OMP_FOR_CLAUSES (node), spc, flags);
 
+    dump_omp_loop_cilk_for:
+
       if (!(flags & TDF_SLIM))
 	{
 	  int i;
@@ -2440,7 +2451,10 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
 		{
 		  spc += 2;
 		  newline_and_indent (buffer, spc);
-		  pp_string (buffer, "for (");
+		  if (TREE_CODE (node) == CILK_FOR)
+		    pp_string (buffer, "_Cilk_for (");
+		  else 
+		    pp_string (buffer, "for (");
 		  dump_generic_node (buffer,
 				     TREE_VEC_ELT (OMP_FOR_INIT (node), i),
 				     spc, flags, false);
@@ -2454,6 +2468,8 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
 				     spc, flags, false);
 		  pp_right_paren (buffer);
 		}
+	      if (TREE_CODE (node) == CILK_FOR) 
+		dump_omp_clauses (buffer, OMP_FOR_CLAUSES (node), spc, flags);
 	    }
 	  if (OMP_FOR_BODY (node))
 	    {
diff --git a/gcc/tree.def b/gcc/tree.def
index f8d6444..558d7c8 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1051,6 +1051,10 @@ DEFTREECODE (OMP_SIMD, "omp_simd", tcc_statement, 6)
    Operands like for OMP_FOR.  */
 DEFTREECODE (CILK_SIMD, "cilk_simd", tcc_statement, 6)
 
+/* Cilk Plus - _Cilk_for (..)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (CILK_FOR, "cilk_for", tcc_statement, 6)
+
 /* OpenMP - #pragma omp distribute [clause1 ... clauseN]
    Operands like for OMP_FOR.  */
 DEFTREECODE (OMP_DISTRIBUTE, "omp_distribute", tcc_statement, 6)

[-- Attachment #3: c-ChangeLogs --]
[-- Type: application/octet-stream, Size: 3875 bytes --]

gcc/ChangeLog
2014-02-10  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* cilk-common.c (declare_cilk_for_builtin): New function.
	(cilk_init_builtins): Added two new built-in functions for _Cilk_for
	support.
	* cilk.h (enum cilk_tree_index): Added two new enumerators called
	CILK_TI_F_LOOP_32 and CILK_TI_F_LOOP_64.
	(cilk_for_32_fndecl): New define.
	(cilk_for_64_fndecl): Likewise.
	* gimple-pretty-print.c (dump_gimple_omp_parallel): Added a new
	parameter.  If it is printing a _Cilk_for statement, then do not 
	print OMP's pragmas.
	(dump_gimple_omp_for): Added GF_OMP_FOR_KIND_CILKFOR.  Printed out
	_Cilk_for statements without the #pragmas.  Also, added NE_EXPR case.
	* tree-pretty-print.c (dump_generic_node): Added CILK_FOR case.
	Print "_Cilk_for" if the node is of type CILK_FOR.
	(dump_omp_clause): Added a new case called OMP_CLAUSE_SCHEDULE_CILKFOR.
	* gimple.h (enum gf_mask): Added new value: GF_OMP_FOR_KIND_CILKFOR.
	Readjusted other values to satisfy the masking rules.
	(gimple_cilk_for_induction_var): New function.
	* gimplify.c (gimplify_scan_omp_clauses): Added a new parameter called
	is_cilk_for.  If is_cilk_for is true then do not boolify the 
	IF_CLAUSE's expression.
	(gimplify_omp_parallel): Added check to see if we are gimplifying
	a _Cilk_for statement.
	(gimplify_omp_for): Added support to gimplify a _Cilk_for statement.
	(gimplify_expr): Added CILK_FOR case.
	* omp-low.c (extract_omp_for_data): Added a check for CILK_FOR and
	set the schedule kind accordingly.  Added a check for CILK_FOR trees
	wherever CILKSIMD is checked.
	(create_omp_child_function_name): Added a new parameter: is_cilk_for.
	(find_cilk_for_stmt): New function.
	(is_cilk_for_stmt): Likewise.
	(cilk_for_check_loop_diff_type): Likewise.
	(expand_cilk_for_call): Likewise.
	(expand_cilk_for): Likewise.
	(create_omp_child_function): Added support to create _Cilk_for's
	child function by adding two additional parameters.
	(expand_omp_taskreg): Extracted the high and low parameters from the
	child function and set them accordingly in the child function.
	(expand_omp_for): Added a call to expand_cilk_for.
	* tree.def (CILK_FOR): New tree.
	* tree-core.h (enum omp_clause_schedule_kind): Added a new enumerator
	field OMP_CLAUSE_SCHEDULE_CILKFOR.
	* cilk-builtins.def (BUILT_IN_CILK_FOR_32): New built-in function.
	(BUILT_IN_CILK_FOR_64): Likewise.
	
gcc/c-family/ChangeLog
2014-02-10  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-cilkplus.c (find_cilk_for): New function.
	(cilk_for_move_clauses_upward): Likewise.
	* c-common.c (c_common_reswords[]): Added a new field called _Cilk_for.
	* c-common.h (enum rid): Added new enumerator called RID_CILK_FOR.
	* c-omp.c (c_finish_omp_for): Added a new parameter called count.
	Computed the value of loop-count based on initial, condition and
	increment information.
	* c-pragma.c (init_pragma): Added cilk grainsize pragma.
	* c-pragma.h (enum pragma_kind): Added new enumerator called
	PRAGMA_CILK_GRAINSIZE.

gcc/c/ChangeLog
2014-02-10  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-parser.c (c_parser_statement_after_labels): Added RID_CILK_FOR
	case.
	(c_parser_pragma): Added PRAGMA_CILK_GRAINSIZE case.
	(c_parser_omp_for_loop): Added grain parameter.  Also, modified
	the function to parse _Cilk_for statement.
	(c_parser_cilk_grainsize): New function.
	(c_parser_cilk_simd): Added a new parameter called is_cilk_for.
	Modified the function to handle CILK_FOR.

gcc/testsuite/ChangeLog
2014-02-10  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-c++-common/cilk-plus/CK/cilk-fors.c: New testcase.
	* c-c++-common/cilk-plus/CK/nested_cilk_for.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_grain.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_errors.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c: Likewise.

[-- Attachment #4: cp-ChangeLogs --]
[-- Type: application/octet-stream, Size: 1782 bytes --]

gcc/cp/ChangeLog
2014-02-10  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* cp-cilkplus.c (copy_tree_till_cilk_for): New function.
	(find_vars): Likewise.
	(find_killed_vars): Likewise.
	(found_cilk_for_p): Likewise.
	(find_cilk_for_stmt): Likewise.
	(insert_firstpriv_clauses): Likewise.
	(cilk_for_create_bind_expr): Likewise.
	* cp-tree.h (copy_tree_till_cilk_for): New prototype.
	(cilk_for_create_bind_expr): Likewise.
	* parser.c (cp_parser_statement): Added a RID_CILK_FOR case.
	(cp_parser_omp_for_cond): Added a check for CILK_FOR tree along with
	CILK_SIMD tree.
	(cp_parser_omp_for_loop): Added a new parameter: cfor_block.  Added
	support for parsing a _Cilk_for statement.  Removed statements
	between _Cilk_for statement and the #pragma omp parallel to move
	them upward.
	(cp_parser_cilk_grainsize): New function.
	(cp_parser_pragma): Added PRAGMA_CILK_GRAINSIZE pragma.
	(cp_parser_cilk_simd): Added a new parameter called grain.  Added
	support to handle _Cilk_for statement along with #pragma simd.
	* pt.c (tsubst_expr): For _Cilk_for statement, move certain clauses
	upward to #pragma parallel statement.  Added a CILK_FOR case.
	* semantics.c (handle_omp_for_class_iterator): Added a NE_EXPR case.
	(finish_omp_for): For _Cilk_for statement, added an IF-CLAUSE.
	
gcc/testsuite/ChangeLog
2014-02-10  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-c++-common/cilk-plus/CK/cilk_for_errors.c: Made certain error
	tags C specific and inserted their C++ equivalents.
	* c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c: Likewise.
	* g++.dg/cilk-plus/CK/cilk-for-tplt.cc: New testcase.
	* g++.dg/cilk-plus/CK/cf3.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_iter.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_rev_iter.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_test.cc: Likewise.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-10 22:07                     ` Iyer, Balaji V
@ 2014-02-12 14:59                       ` Jakub Jelinek
  2014-02-12 15:14                         ` Iyer, Balaji V
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Jelinek @ 2014-02-12 14:59 UTC (permalink / raw)
  To: Iyer, Balaji V
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

On Mon, Feb 10, 2014 at 10:07:18PM +0000, Iyer, Balaji V wrote:
> I looked at both but forgot to test them with my implementation. Sorry
> about this.  I have fixed the ICE issue.  To make sure this does not
> happen further, I have added your test cf3.C into test suite (renamed to
> cf3.cc).  I hope that is OK with you.

The testcase is GPL as the original libgomp.c++/for-1.C testcase, so sure.
Perhaps it would be much better though if instead of having a compile time
testcase you'd just do what libgomp.c++/for-1.C does, just replace all the
#pragma omp parallel for in there with _Cilk_for and turn it into a runtime
testcase.

> I have attached a fixed patch and Changelogs. Is this OK?

Looks better (note, still looking just at the dumps), but not completely ok
yet.  On cf3.cc, I see in *.gimple:

        D.2883 = J<int>::begin (j);
        I<int>::I (&i, D.2883);
        D.2885 = J<int>::end (j);
        retval.0 = operator-<int> (D.2885, &i);
        D.2886 = retval.0 / 2;
        #pragma omp parallel firstprivate(i) if(D.2886) shared(D.2865) shared(j)
          {
            difference_type retval.1;
            const struct I & D.2888;
            const difference_type D.2866;
            long int D.2889;
            struct I & D.2890;

            try
              {

                _Cilk_for (D.2864 = 0; D.2864 < retval.1; D.2864 = D.2864 + 2) private(D.2864)
                  {
                    D.2889 = D.2864 - D.2865;
                    D.2866 = D.2889;
                    try
                      {
                        D.2890 = I<int>::operator+= (&i, &D.2866);

First a minor nit, there is extra newline before _Cilk_for:
          newline_and_indent (buffer, spc);
          if (flag_cilkplus
              && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
            pp_string (buffer, "_Cilk_for (");
          else
            pp_string (buffer, "for (");
I guess for _Cilk_for collapse is never > 1, right?  If that is the case,
then perhaps you should move the newline_and_indent (buffer, spc); call
into the else block.

More importantly, what is retval.1?  I'd expect you should be using retval.0
there and have it also as firstprivate(retval.0) on the parallel.
In *.omplower dump I actually see:
        retval.0 = operator-<int> (D.2885, &i);
...
                            retval.1 = operator-<int> (D.2888, &i);
i.e. operator-<int> is called twice.

	Jakub

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-12 14:59                       ` Jakub Jelinek
@ 2014-02-12 15:14                         ` Iyer, Balaji V
  2014-02-12 15:28                           ` Jakub Jelinek
  0 siblings, 1 reply; 26+ messages in thread
From: Iyer, Balaji V @ 2014-02-12 15:14 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'



> -----Original Message-----
> From: Jakub Jelinek [mailto:jakub@redhat.com]
> Sent: Wednesday, February 12, 2014 9:59 AM
> To: Iyer, Balaji V
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'rth@redhat.com'
> Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> 
> On Mon, Feb 10, 2014 at 10:07:18PM +0000, Iyer, Balaji V wrote:
> > I looked at both but forgot to test them with my implementation. Sorry
> > about this.  I have fixed the ICE issue.  To make sure this does not
> > happen further, I have added your test cf3.C into test suite (renamed
> > to cf3.cc).  I hope that is OK with you.
> 
> The testcase is GPL as the original libgomp.c++/for-1.C testcase, so sure.
> Perhaps it would be much better though if instead of having a compile time
> testcase you'd just do what libgomp.c++/for-1.C does, just replace all the
> #pragma omp parallel for in there with _Cilk_for and turn it into a runtime
> testcase.
> 
I really don't want to do that because I don't think there is a 1:1 match-up between the rules of #pragma omp for and _Cilk_for. 

> > I have attached a fixed patch and Changelogs. Is this OK?
> 
> Looks better (note, still looking just at the dumps), but not completely ok
> yet.  On cf3.cc, I see in *.gimple:
> 
>         D.2883 = J<int>::begin (j);
>         I<int>::I (&i, D.2883);
>         D.2885 = J<int>::end (j);
>         retval.0 = operator-<int> (D.2885, &i);
>         D.2886 = retval.0 / 2;
>         #pragma omp parallel firstprivate(i) if(D.2886) shared(D.2865) shared(j)
>           {
>             difference_type retval.1;
>             const struct I & D.2888;
>             const difference_type D.2866;
>             long int D.2889;
>             struct I & D.2890;
> 
>             try
>               {
> 
>                 _Cilk_for (D.2864 = 0; D.2864 < retval.1; D.2864 = D.2864 + 2) private(D.2864)
>                   {
>                     D.2889 = D.2864 - D.2865;
>                     D.2866 = D.2889;
>                     try
>                       {
>                         D.2890 = I<int>::operator+= (&i, &D.2866);
> 
> First a minor nit, there is extra newline before _Cilk_for:
>           newline_and_indent (buffer, spc);
>           if (flag_cilkplus
>               && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
>             pp_string (buffer, "_Cilk_for (");
>           else
>             pp_string (buffer, "for (");
> I guess for _Cilk_for collapse is never > 1, right?  If that is the case,
> then perhaps you should move the newline_and_indent (buffer, spc); call
> into the else block.
> 

OK. I will fix this and send you a patch.

> More importantly, what is retval.1?  I'd expect you should be using retval.0
> there and have it also as firstprivate(retval.0) on the parallel.
> In *.omplower dump I actually see:
>         retval.0 = operator-<int> (D.2885, &i);
> ...
>                             retval.1 = operator-<int> (D.2888, &i);
> i.e. operator-<int> is called twice.
> 

Yes, one is for the if-clause and one is for the condition. It really doesn't matter because we get rid of the stuff in the condition and replace it with our own for loop, something like the one shown below. So retval.1 doesn't come into the picture. It is only alive from the parser to the expand_cilk_for function.

For (i = low; i < high; i++)
{
	<_Cilk_for_body>
}
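
To make the shape of that replacement loop concrete, here is a minimal serial sketch in plain C++. It is not the lowering the patch performs: trip_count, run_chunk and cilk_for_sketch are hypothetical names, the chunking is done by an ordinary loop instead of the Cilk runtime, and a positive step with a "<" condition is assumed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: number of iterations of
//   for (ii = start; ii < end; ii += step)
// assuming step > 0.
long trip_count (long start, long end, long step)
{
  assert (step > 0);
  if (end <= start)
    return 0;
  return (end - start + step - 1) / step;
}

// Hypothetical stand-in for the child function: it always walks
// [low, high) with a step of 1 and uses the original start and step
// only to recover the user's induction variable.
template <typename Body>
void run_chunk (long low, long high, long start, long step, Body body)
{
  for (long i = low; i < high; i++)
    body (start + i * step);
}

// Serial stand-in for the runtime: splits [0, count) into chunks of at
// most `grain` iterations and hands each chunk to run_chunk.
template <typename Body>
void cilk_for_sketch (long start, long end, long step, long grain, Body body)
{
  assert (grain > 0);
  long count = trip_count (start, end, step);
  for (long low = 0; low < count; low += grain)
    {
      long high = low + grain < count ? low + grain : count;
      run_chunk (low, high, start, step, body);
    }
}
```

With start 0, end 10 and step 2, the body sees ii = 0, 2, 4, 6, 8 regardless of how the five iterations are divided into chunks.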

So, are there any other changes that you need me to make?

Thanks,

Balaji V. Iyer.


> 	Jakub

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-12 15:14                         ` Iyer, Balaji V
@ 2014-02-12 15:28                           ` Jakub Jelinek
  2014-02-12 17:05                             ` Iyer, Balaji V
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Jelinek @ 2014-02-12 15:28 UTC (permalink / raw)
  To: Iyer, Balaji V
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

On Wed, Feb 12, 2014 at 03:14:23PM +0000, Iyer, Balaji V wrote:
> > The testcase is GPL as the original libgomp.c++/for-1.C testcase, so sure.
> > Perhaps it would be much better though if instead of having a compile time
> > testcase you'd just do what libgomp.c++/for-1.C does, just replace all the
> > #pragma omp parallel for in there with _Cilk_for and turn it into a runtime
> > testcase.
> > 

> I really don't want to do that because I don't think there is a 1:1
> match-up between the rules of #pragma omp for and _Cilk_for.

But there is nothing OpenMP specific on any of the tests, all could as well
be tested by removing all the #pragma omp ... lines and just be tested as
normal C++ loops.  Is there anything that _Cilk_for wouldn't handle?

IMNSHO if you remove all the #pragma omp parallel for lines (even with any
clauses it sometimes has) and replace for with _Cilk_for on the following
line, you should have a valid Cilk+ program.

Only f11 would need to be changed from:
#pragma omp parallel
  {
#pragma omp for nowait
    for (T i = x; i <= y; i += 3)
      baz (i);
#pragma omp single
    {
      T j = y + 3;
      baz (j);
    }
  }
to say:
  _Cilk_for (T i = x; i <= y; i += 3)
    baz (i);
  {
    T j = y + 3;
    baz (j);
  }
so that it performs the same thing.
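
For reference, the serial elision of that rewritten f11 (replace _Cilk_for with a plain for) can be sketched as below. This is only an illustration: baz here is a hypothetical stand-in that records its argument, and long is used in place of the iterator type T from for-1.C.

```cpp
#include <vector>

// Hypothetical stand-in for baz: records every value it is called with.
std::vector<long> calls;

void baz (long v)
{
  calls.push_back (v);
}

// Serial elision of the rewritten f11: the loop runs first, then the
// block that used to sit under "#pragma omp single" runs exactly once.
void f11_serial (long x, long y)
{
  for (long i = x; i <= y; i += 3)
    baz (i);
  {
    long j = y + 3;
    baz (j);
  }
}
```

A parallel run may call baz in a different order inside the loop, but the set of calls should be the same as in this serial version.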

> > First a minor nit, there is extra newline before _Cilk_for:
> >           newline_and_indent (buffer, spc);
> >           if (flag_cilkplus
> >               && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
> >             pp_string (buffer, "_Cilk_for (");
> >           else
> >             pp_string (buffer, "for (");
> > I guess for _Cilk_for collapse is never > 1, right?  If that is the case,
> > then perhaps you should move the newline_and_indent (buffer, spc); call
> > into the else block.
> > 
> 
> OK. I will fix this and send you a patch.

Sure.

> > More importantly, what is retval.1?  I'd expect you should be using retval.0
> > there and have it also as firstprivate(retval.0) on the parallel.
> > In *.omplower dump I actually see:
> >         retval.0 = operator-<int> (D.2885, &i);
> > ...
> >                             retval.1 = operator-<int> (D.2888, &i);
> > i.e. operator-<int> is called twice.
> > 
> 
> Yes, one is for the if-clause and one is for the condition. It really doesn't matter because we get rid of the stuff in the condition and replace it with our own for loop, something like the one shown below. So retval.1 doesn't come into the picture. It is only alive from the parser to the expand_cilk_for function.
> 
> For (i = low; i < high; i++)
> {
> 	<_Cilk_for_body>
> }

No, it really does matter.  Just look at the *.optimized dump with -O0 -fcilkplus:

  retval.0_4 = operator-<int> (_3, &i);
in _Z3foo1JIiE function and
  _4 = .omp_data_i_2(D)->j;
  _5 = J<int>::end (_4);
  retval.1_6 = operator-<int> (_5, &i);
  retval.3_7 = retval.1_6;
in _Z3foo1JIiE._cilk_for_fn.0.  All 4 statements are dead, so you really
shouldn't emit them.  Even when optimizing, if e.g. the operator- isn't
inline, g++ won't be able to optimize it away.

	Jakub

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-12 15:28                           ` Jakub Jelinek
@ 2014-02-12 17:05                             ` Iyer, Balaji V
  2014-02-12 17:09                               ` Jakub Jelinek
  0 siblings, 1 reply; 26+ messages in thread
From: Iyer, Balaji V @ 2014-02-12 17:05 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

> > > More importantly, what is retval.1?  I'd expect you should be using
> > > retval.0 there and have it also as firstprivate(retval.0) on the parallel.
> > > In *.omplower dump I actually see:
> > >         retval.0 = operator-<int> (D.2885, &i); ...
> > >                             retval.1 = operator-<int> (D.2888, &i);
> > > i.e. operator-<int> is called twice.
> > >
> >
> > Yes, one is for the if-clause and one is for the condition. It really doesn't
> > matter because we get rid of the stuff in the condition and replace it with
> > our own for loop, something like the one shown below. So retval.1 doesn't
> > come into the picture. It is only alive from the parser to the
> > expand_cilk_for function.
> >
> > For (i = low; i < high; i++)
> > {
> > 	<_Cilk_for_body>
> > }
> 
> No, it really does matter.  Just look at the *.optimized dump with
> -O0 -fcilkplus:
> 
>   retval.0_4 = operator-<int> (_3, &i);
> in _Z3foo1JIiE function and
>   _4 = .omp_data_i_2(D)->j;
>   _5 = J<int>::end (_4);
>   retval.1_6 = operator-<int> (_5, &i);
>   retval.3_7 = retval.1_6;
> in _Z3foo1JIiE._cilk_for_fn.0.  All the 4 statements are dead, you really
> shouldn't emit them, even when optimizing, if e.g. the operator- isn't inline,
> g++ won't be able to optimize it away.
> 

I looked at the test code you sent me (cf3.cc) at -O1 and it is removing all the lines you have shown above. Yes, I would imagine -O0 to have code that can be redundant or unnecessary. Some of it could be an artifact of internal code insertion. But isn't the main job of the instruction scheduler to remove all this redundant work? Besides, it is just a function call. The compiler at -O2, -O and -O3 removes the chunk of code that you mentioned.

-Balaji V. Iyer.


> 	Jakub

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-12 17:05                             ` Iyer, Balaji V
@ 2014-02-12 17:09                               ` Jakub Jelinek
  2014-02-12 17:15                                 ` Iyer, Balaji V
  2014-02-17  6:42                                 ` Iyer, Balaji V
  0 siblings, 2 replies; 26+ messages in thread
From: Jakub Jelinek @ 2014-02-12 17:09 UTC (permalink / raw)
  To: Iyer, Balaji V
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

On Wed, Feb 12, 2014 at 05:04:38PM +0000, Iyer, Balaji V wrote:
> I looked at the test code you sent me (cf3.cc) at -O1 and it is removing
> all the lines you have shown above.  Yes, I would imagine -O0 to have code
> that can be redundant or unnecessary.  Some of it could be an artifact of
> internal code insertion.  But isn't the main job of the instruction
> scheduler to remove all this redundant work?  Besides, it is just a
> function call.  The compiler at -O2, -O and -O3 removes the chunk of code
> that you mentioned.

As I said, just change the testcase so that the operator isn't inline, and
suddenly even -O3 will not be able to remove the call.
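
A small stand-alone illustration of that point, in plain C++ with hypothetical names (this is not compiler output). The call counter stands in for whatever the optimizer cannot see through when the operator is defined out of line in another translation unit:

```cpp
#include <cassert>

// Hypothetical iterator-like type, loosely modelled on class I in cf3.cc.
struct Iter
{
  int *p;
};

// Counts calls to difference; a stand-in for any side effect that
// forces the compiler to keep an out-of-line call.
static int difference_calls = 0;

long difference (const Iter &a, const Iter &b)
{
  ++difference_calls;
  return a.p - b.p;
}

// Shape seen in the dumps: the difference is computed once for the if
// clause and then recomputed in the child, where the result is unused.
long child_recomputes (const Iter &end, const Iter &begin, long step)
{
  long retval_0 = difference (end, begin);
  long if_clause = retval_0 / step;
  long retval_1 = difference (end, begin);   // dead, but still called
  (void) retval_1;
  return if_clause;
}

// Suggested shape: compute once and hand the value to the child, much
// as firstprivate(retval.0) on the parallel would.
long child_reuses (const Iter &end, const Iter &begin, long step)
{
  long retval_0 = difference (end, begin);
  long if_clause = retval_0 / step;
  long in_child = retval_0;                  // passed in, not recomputed
  (void) in_child;
  return if_clause;
}
```

Both functions return the same value; only the number of calls to difference changes, which is exactly what cannot be cleaned up later when the callee is opaque.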

	Jakub

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-12 17:09                               ` Jakub Jelinek
@ 2014-02-12 17:15                                 ` Iyer, Balaji V
  2014-02-17  6:42                                 ` Iyer, Balaji V
  1 sibling, 0 replies; 26+ messages in thread
From: Iyer, Balaji V @ 2014-02-12 17:15 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'



> -----Original Message-----
> From: Jakub Jelinek [mailto:jakub@redhat.com]
> Sent: Wednesday, February 12, 2014 12:10 PM
> To: Iyer, Balaji V
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'rth@redhat.com'
> Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> 
> On Wed, Feb 12, 2014 at 05:04:38PM +0000, Iyer, Balaji V wrote:
> > I looked at the test code you sent me (cf3.cc) at -O1 and it is
> > removing all the lines you have shown above.  Yes, I would imagine -O0
> > to have code that can be redundant or unnecessary.  Some of it could
> > be an artifact of internal code insertion.  But isn't the main job of
> > the instruction scheduler to remove all this redundant work?
> > Besides, it is just a function call.  The compiler at -O2, -O and -O3
> > removes the chunk of code that you mentioned.
> 
> As I said, just change the testcase so that the operator isn't inline, and
> suddenly even -O3 will not be able to remove the call.

I am sorry, I do not see any operators being explicitly asked to be inline:

class I
{
public:
  typedef ptrdiff_t difference_type;
  I ();
  ~I ();
  I (T *);
  I (const I &);
  T &operator * ();
  T *operator -> ();
  T &operator [] (const difference_type &) const;
  I &operator = (const I &);
  I &operator ++ ();
  I operator ++ (int);
  I &operator -- ();
  I operator -- (int);
  I &operator += (const difference_type &);
  I &operator -= (const difference_type &);
  I operator + (const difference_type &) const;
  I operator - (const difference_type &) const;
  template <typename S> friend bool operator == (I<S> &, I<S> &);
  template <typename S> friend bool operator == (const I<S> &, const I<S> &);
  template <typename S> friend bool operator < (I<S> &, I<S> &);
  template <typename S> friend bool operator < (const I<S> &, const I<S> &);
  template <typename S> friend bool operator <= (I<S> &, I<S> &);
  template <typename S> friend bool operator <= (const I<S> &, const I<S> &);
  template <typename S> friend bool operator > (I<S> &, I<S> &);
  template <typename S> friend bool operator > (const I<S> &, const I<S> &);
  template <typename S> friend bool operator >= (I<S> &, I<S> &);
  template <typename S> friend bool operator >= (const I<S> &, const I<S> &);
  template <typename S> friend typename I<S>::difference_type operator - (I<S> &, I<S> &);
  template <typename S> friend typename I<S>::difference_type operator - (const I<S> &, const I<S> &);
  template <typename S> friend I<S> operator + (typename I<S>::difference_type , const I<S> &);
private:
  T *p;
};


> 
> 	Jakub

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-12 17:09                               ` Jakub Jelinek
  2014-02-12 17:15                                 ` Iyer, Balaji V
@ 2014-02-17  6:42                                 ` Iyer, Balaji V
  2014-02-19  4:43                                   ` Iyer, Balaji V
  1 sibling, 1 reply; 26+ messages in thread
From: Iyer, Balaji V @ 2014-02-17  6:42 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

[-- Attachment #1: Type: text/plain, Size: 1571 bytes --]

Hi Jakub,
	I still couldn't reproduce the issue you pointed out below, but I have fixed the other issues you mentioned. I have also ported the test case that you mentioned (for-1.C), but I have some questions about the changes and would like to confirm them with a colleague to make sure what I am doing is correct. Monday is a holiday here, so I won't be able to do it till Tuesday. But in the meantime I am attaching the fixed patch. Can you please look at it and let me know what else I need to change?

Thanks,

Balaji V. Iyer.

> -----Original Message-----
> From: Jakub Jelinek [mailto:jakub@redhat.com]
> Sent: Wednesday, February 12, 2014 12:10 PM
> To: Iyer, Balaji V
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'rth@redhat.com'
> Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> 
> On Wed, Feb 12, 2014 at 05:04:38PM +0000, Iyer, Balaji V wrote:
> > I looked at the test code you sent me (cf3.cc) at -O1 and it is
> > removing all the lines you have shown above.  Yes, I would imagine -O0
> > to have code that can be redundant or unnecessary.  Some of it could
> > be an artifact of internal code insertion.  But isn't the main job of
> > the instruction scheduler to remove all this redundant work?
> > Besides, it is just a function call.  The compiler at -O2, -O and -O3
> > removes the chunk of code that you mentioned.
> 
> As I said, just change the testcase so that the operator isn't inline, and
> suddenly even -O3 will not be able to remove the call.
> 
> 	Jakub

[-- Attachment #2: c-ChangeLogs --]
[-- Type: application/octet-stream, Size: 3914 bytes --]

gcc/ChangeLog
2014-02-17  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* cilk-common.c (declare_cilk_for_builtin): New function.
	(cilk_init_builtins): Added two new built-in functions for _Cilk_for
	support.
	* cilk.h (enum cilk_tree_index): Added two new enumerators called
	CILK_TI_F_LOOP_32 and CILK_TI_F_LOOP_64.
	(cilk_for_32_fndecl): New define.
	(cilk_for_64_fndecl): Likewise.
	* gimple-pretty-print.c (dump_gimple_omp_parallel): Added a new
	parameter.  If it is printing a _Cilk_for statement, then do not 
	print OMP's pragmas.
	(dump_gimple_omp_for): Added GF_OMP_FOR_KIND_CILK_FOR.  Printed out
	_Cilk_for statements without the #pragmas.  Also, added NE_EXPR case.
	* tree-pretty-print.c (dump_generic_node): Added CILK_FOR case.
	Print "_Cilk_for" if the node is of type CILK_FOR.
	(dump_omp_clause): Added a new case called OMP_CLAUSE_SCHEDULE_CILKFOR.
	* gimple.h (enum gf_mask): Added new value: GF_OMP_FOR_KIND_CILKFOR.
	Readjusted other values to satisfy the masking rules.
	(gimple_cilk_for_induction_var): New function.
	* gimplify.c (omp_remove_clause): Likewise.
	(gimplify_scan_omp_clauses): Added a new parameter called
	is_cilk_for.  If is_cilk_for is true then do not boolify the 
	IF_CLAUSE's expression.
	(gimplify_omp_parallel): Added check to see if we are gimplifying
	a _Cilk_for statement.
	(gimplify_omp_for): Added support to gimplify a _Cilk_for statement.
	(gimplify_expr): Added CILK_FOR case.
	* omp-low.c (extract_omp_for_data): Added a check for CILK_FOR and
	set the schedule kind accordingly.  Added a check for CILK_FOR trees
	wherever CILKSIMD is checked.
	(create_omp_child_function_name): Added a new parameter: is_cilk_for.
	(find_cilk_for_stmt): New function.
	(is_cilk_for_stmt): Likewise.
	(cilk_for_check_loop_diff_type): Likewise.
	(expand_cilk_for_call): Likewise.
	(expand_cilk_for): Likewise.
	(create_omp_child_function): Added support to create _Cilk_for's
	child function by adding two additional parameters.
	(expand_omp_taskreg): Extracted the high and low parameters from the
	child function and set them accordingly in the child function.
	(expand_omp_for): Added a call to expand_cilk_for.
	* tree.def (CILK_FOR): New tree.
	* tree-core.h (enum omp_clause_schedule_kind): Added a new enumerator
	field OMP_CLAUSE_SCHEDULE_CILKFOR.
	* cilk-builtins.def (BUILT_IN_CILK_FOR_32): New built-in function.
	(BUILT_IN_CILK_FOR_64): Likewise.
	
gcc/c-family/ChangeLog
2014-02-17  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-cilkplus.c (find_cilk_for): New function.
	(cilk_for_move_clauses_upward): Likewise.
	* c-common.c (c_common_reswords[]): Added a new field called _Cilk_for.
	* c-common.h (enum rid): Added new enumerator called RID_CILK_FOR.
	* c-omp.c (c_finish_omp_for): Added three new parameters: cinit, cend
	and cstep.  Stored the loop's initial value, end value and step in
	them so that the caller can compute the loop-count.
	* c-pragma.c (init_pragma): Added cilk grainsize pragma.
	* c-pragma.h (enum pragma_kind): Added new enumerator called
	PRAGMA_CILK_GRAINSIZE.

gcc/c/ChangeLog
2014-02-17  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-parser.c (c_parser_statement_after_labels): Added RID_CILK_FOR
	case.
	(c_parser_pragma): Added PRAGMA_CILK_GRAINSIZE case.
	(c_parser_omp_for_loop): Added grain parameter.  Also, modified
	the function to parse _Cilk_for statement.
	(c_parser_cilk_grainsize): New function.
	(c_parser_cilk_simd): Added two new parameters: is_cilk_for and grain.
	Modified the function to handle CILK_FOR.

gcc/testsuite/ChangeLog
2014-02-17  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-c++-common/cilk-plus/CK/cilk-fors.c: New testcase.
	* c-c++-common/cilk-plus/CK/nested_cilk_for.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_grain.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_errors.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c: Likewise.

[-- Attachment #3: cp-ChangeLogs --]
[-- Type: application/octet-stream, Size: 1831 bytes --]

gcc/cp/ChangeLog
2014-02-17  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* cp-cilkplus.c (copy_tree_till_cilk_for): New function.
	(find_vars): Likewise.
	(find_killed_vars): Likewise.
	(found_cilk_for_p): Likewise.
	(find_cilk_for_stmt): Likewise.
	(insert_firstpriv_clauses): Likewise.
	(cilk_for_create_bind_expr): Likewise.
	* cp-tree.h (copy_tree_till_cilk_for): New prototype.
	(cilk_for_create_bind_expr): Likewise.
	* parser.c (cp_parser_statement): Added a RID_CILK_FOR case.
	(cp_parser_omp_for_cond): Added a check for CILK_FOR tree along with
	CILK_SIMD tree.
	(cp_parser_omp_for_loop): Added a new parameter: cfor_block.  Added
	support for parsing a _Cilk_for statement.  Removed statements
	between _Cilk_for statement and the #pragma omp parallel to move
	them upward.
	(cp_parser_cilk_grainsize): New function.
	(cp_parser_pragma): Added PRAGMA_CILK_GRAINSIZE pragma.
	(cp_parser_cilk_simd): Added a new parameter called grain.  Added
	support to handle _Cilk_for statement along with #pragma simd.
	* pt.c (tsubst_expr): For _Cilk_for statement, move certain clauses
	upward to #pragma parallel statement.  Added a CILK_FOR case.
	Modified OMP_PARALLEL case to handle _Cilk_for.
	* semantics.c (handle_omp_for_class_iterator): Added a NE_EXPR case.
	(finish_omp_for): For a _Cilk_for statement, added an IF clause.
	
gcc/testsuite/ChangeLog
2014-02-17  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-c++-common/cilk-plus/CK/cilk_for_errors.c: Made certain error
	tags C specific and inserted their C++ equivalents.
	* c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c: Likewise.
	* g++.dg/cilk-plus/CK/cilk-for-tplt.cc: New testcase.
	* g++.dg/cilk-plus/CK/cf3.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_iter.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_rev_iter.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_test.cc: Likewise.

[-- Attachment #4: diff.txt --]
[-- Type: text/plain, Size: 98840 bytes --]

diff --git a/gcc/c-family/c-cilkplus.c b/gcc/c-family/c-cilkplus.c
index 1a16f66..1be12bd 100644
--- a/gcc/c-family/c-cilkplus.c
+++ b/gcc/c-family/c-cilkplus.c
@@ -91,3 +91,52 @@ c_finish_cilk_clauses (tree clauses)
     }
   return clauses;
 }
+
+/* Structure used to pass information into a walk_tree function and
+   find_cilk_for.  */
+struct clause_struct
+{
+  bool is_set;
+  tree clauses;
+};
+
+/* Helper function for walk_tree used in cilk_for_move_clauses_upward.
+   If *TP is a CILK_FOR statement, then set *DATA (type-casted to 
+   struct clause_struct) with its clauses.  */
+
+static tree
+find_cilk_for (tree *tp, int *walk_subtrees, void *data)
+{
+  struct clause_struct *cstruct = (struct clause_struct *) data;
+  if (*tp && TREE_CODE (*tp) == CILK_FOR && !cstruct->is_set)
+    {
+      cstruct->is_set = true;
+      cstruct->clauses = OMP_FOR_CLAUSES (*tp);
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Moves the IF-CLAUSE and SCHEDULE clause from _CILK_FOR statement in
+   STMT into *PARALLEL_CLAUSES.  */
+ 
+void
+cilk_for_move_clauses_upward (tree *parallel_clauses, tree stmt)
+{
+  struct clause_struct cstruct;
+  cstruct.is_set = false;
+  cstruct.clauses = NULL_TREE;
+  walk_tree (&stmt, find_cilk_for, (void *) &cstruct, NULL);
+
+  tree clauses = cstruct.clauses;
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SCHEDULE
+	|| OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IF)
+      {
+	if (*parallel_clauses)
+	  OMP_CLAUSE_CHAIN (*parallel_clauses) = c;
+	else
+	  *parallel_clauses = c;
+      }
+}
+
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index bfc5797..eb6e2fb 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -416,6 +416,7 @@ const struct c_common_resword c_common_reswords[] =
   { "_Complex",		RID_COMPLEX,	0 },
   { "_Cilk_spawn",      RID_CILK_SPAWN, 0 },
   { "_Cilk_sync",       RID_CILK_SYNC,  0 },
+  { "_Cilk_for",        RID_CILK_FOR,   0 },
   { "_Imaginary",	RID_IMAGINARY, D_CONLY },
   { "_Decimal32",       RID_DFLOAT32,  D_CONLY | D_EXT },
   { "_Decimal64",       RID_DFLOAT64,  D_CONLY | D_EXT },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
old mode 100644
new mode 100755
index f074ab1..509490c
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -149,7 +149,7 @@ enum rid
   RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
 
   /* Cilk Plus keywords.  */
-  RID_CILK_SPAWN, RID_CILK_SYNC,
+  RID_CILK_SPAWN, RID_CILK_SYNC, RID_CILK_FOR,
   
   /* Objective-C ("AT" reserved words - they are only keywords when
      they follow '@')  */
@@ -1203,7 +1203,7 @@ extern void c_finish_omp_flush (location_t);
 extern void c_finish_omp_taskwait (location_t);
 extern void c_finish_omp_taskyield (location_t);
 extern tree c_finish_omp_for (location_t, enum tree_code, tree, tree, tree,
-			      tree, tree, tree);
+			      tree, tree, tree, tree *, tree *, tree *);
 extern void c_omp_split_clauses (location_t, enum tree_code, omp_clause_mask,
 				 tree, tree *);
 extern tree c_omp_declare_simd_clauses_to_numbers (tree, tree);
@@ -1389,4 +1389,5 @@ extern tree make_cilk_frame (tree);
 extern tree create_cilk_function_exit (tree, bool, bool);
 extern tree cilk_install_body_pedigree_operations (tree);
 extern void cilk_outline (tree, tree *, void *);
+extern void cilk_for_move_clauses_upward (tree *, tree);
 #endif /* ! GCC_C_COMMON_H */
diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
old mode 100644
new mode 100755
index dd0a45d..0b4259c
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -386,17 +386,19 @@ c_omp_for_incr_canonicalize_ptr (location_t loc, tree decl, tree incr)
    INITV, CONDV and INCRV are vectors containing initialization
    expressions, controlling predicates and increment expressions.
    BODY is the body of the loop and PRE_BODY statements that go before
-   the loop.  */
+   the loop.  *CINIT, *CEND and *CSTEP are set to the initial value, end
+   value and step of the loop and are used solely by a _Cilk_for statement.  */
 
 tree
 c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
-		  tree initv, tree condv, tree incrv, tree body, tree pre_body)
+		  tree initv, tree condv, tree incrv, tree body,
+		  tree pre_body, tree *cinit, tree *cend, tree *cstep)
 {
   location_t elocus;
   bool fail = false;
   int i;
-
-  if (code == CILK_SIMD
+  tree orig_init = NULL_TREE, orig_end = NULL_TREE, orig_step = NULL_TREE;
+  if ((code == CILK_SIMD || code == CILK_FOR) 
       && !c_check_cilk_loop (locus, TREE_VEC_ELT (declv, 0)))
     fail = true;
 
@@ -422,6 +424,8 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	  fail = true;
 	}
 
+      if (TREE_CODE (init) == MODIFY_EXPR)
+	orig_init = TREE_OPERAND (init, 1);
       /* In the case of "for (int i = 0...)", init will be a decl.  It should
 	 have a DECL_INITIAL that we can turn into an assignment.  */
       if (init == decl)
@@ -436,6 +440,7 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      fail = true;
 	    }
 
+	  orig_init = init;
 	  init = build_modify_expr (elocus, decl, NULL_TREE, NOP_EXPR,
 	      			    /* FIXME diagnostics: This should
 				       be the location of the INIT.  */
@@ -526,9 +531,20 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 					    0))
 		    TREE_SET_CODE (cond, TREE_CODE (cond) == NE_EXPR
 					 ? LT_EXPR : GE_EXPR);
-		  else if (code != CILK_SIMD)
+		  else if (code != CILK_SIMD && code != CILK_FOR)
 		    cond_ok = false;
 		}
+	      if (flag_cilkplus && code == CILK_FOR)
+		{ 
+		  orig_end = TREE_OPERAND (cond, 1);
+		  tree add_expr = build_zero_cst (TREE_TYPE (orig_end));
+		  if (TREE_CODE (cond) == LE_EXPR)
+		    add_expr = build_one_cst (TREE_TYPE (orig_end));
+		  else if (TREE_CODE (cond) == GE_EXPR)
+		    add_expr = build_int_cst (TREE_TYPE (orig_end), -1);
+		  orig_end = fold_build2 (PLUS_EXPR, TREE_TYPE (orig_end),
+					  orig_end, add_expr);
+		}
 	    }
 
 	  if (!cond_ok)
@@ -561,6 +577,18 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      if (TREE_OPERAND (incr, 0) != decl)
 		break;
 
+	      if (TREE_CODE (incr) == POSTINCREMENT_EXPR
+		  || TREE_CODE (incr) == PREINCREMENT_EXPR)
+		orig_step = build_one_cst (TREE_TYPE (incr));
+	      else
+		orig_step = integer_minus_one_node;
+ 
+	      if (POINTER_TYPE_P (TREE_TYPE (incr)))
+		{
+		  tree unit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (incr)));
+		  orig_step = fold_build2 (MULT_EXPR, TREE_TYPE (orig_step),
+					   orig_step, unit);
+		}
 	      incr_ok = true;
 	      incr = c_omp_for_incr_canonicalize_ptr (elocus, decl, incr);
 	      break;
@@ -579,14 +607,24 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      if (TREE_CODE (TREE_OPERAND (incr, 1)) == PLUS_EXPR
 		  && (TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl
 		      || TREE_OPERAND (TREE_OPERAND (incr, 1), 1) == decl))
-		incr_ok = true;
+		{
+		  if (TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl)
+		    orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 1);
+		  else
+		    orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 0);
+		  incr_ok = true;
+		}
 	      else if ((TREE_CODE (TREE_OPERAND (incr, 1)) == MINUS_EXPR
 			|| (TREE_CODE (TREE_OPERAND (incr, 1))
 			    == POINTER_PLUS_EXPR))
 		       && TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl)
-		incr_ok = true;
+		{
+		  orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 1);
+		  incr_ok = true;
+		}
 	      else
 		{
+		  orig_step = TREE_OPERAND (incr, 1);
 		  tree t = check_omp_for_incr_expr (elocus,
 						    TREE_OPERAND (incr, 1),
 						    decl);
@@ -609,6 +647,14 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	    }
 	}
 
+      /* These variables could be NULL if an error occurred.  */
+      if (flag_cilkplus && code == CILK_FOR 
+	  && orig_end && orig_init && orig_step)
+	{
+	  *cinit = orig_init;
+	  *cend = orig_end;
+	  *cstep = orig_step;
+	}
       TREE_VEC_ELT (initv, i) = init;
       TREE_VEC_ELT (incrv, i) = incr;
     }
diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index 07d23ac..e0f3561 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -1394,6 +1394,11 @@ init_pragma (void)
 
   cpp_register_deferred_pragma (parse_in, "GCC", "ivdep", PRAGMA_IVDEP, false,
 				false);
+
+  if (flag_cilkplus && !flag_preprocess_only)
+    cpp_register_deferred_pragma (parse_in, "cilk", "grainsize",
+				  PRAGMA_CILK_GRAINSIZE, true, false);
+
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 6f1bf74..b9f09ba 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -55,6 +55,9 @@ typedef enum pragma_kind {
   /* Top level clause to handle all Cilk Plus pragma simd clauses.  */
   PRAGMA_CILK_SIMD,
 
+  /* This pragma handles setting of grainsize for a _Cilk_for.  */
+  PRAGMA_CILK_GRAINSIZE,
+
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
 
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
old mode 100644
new mode 100755
index 66625aa..ff2c224
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1248,10 +1248,11 @@ static bool c_parser_objc_diagnose_bad_element_prefix
   (c_parser *, struct c_declspecs *);
 
 /* Cilk Plus supporting routines.  */
-static void c_parser_cilk_simd (c_parser *);
+static void c_parser_cilk_simd (c_parser *, bool, tree);
 static bool c_parser_cilk_verify_simd (c_parser *, enum pragma_context);
 static tree c_parser_array_notation (location_t, c_parser *, tree, tree);
 static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, bool);
+static void c_parser_cilk_grainsize (c_parser *);
 
 /* Parse a translation unit (C90 6.7, C99 6.9).
 
@@ -4878,6 +4879,16 @@ c_parser_statement_after_labels (c_parser *parser)
 	case RID_FOR:
 	  c_parser_for_statement (parser, false);
 	  break;
+	case RID_CILK_FOR:
+	  if (!flag_cilkplus)
+	    {
+	      error_at (c_parser_peek_token (parser)->location,
+			"-fcilkplus must be enabled to use %<_Cilk_for%>");
+	      c_parser_skip_to_end_of_block_or_statement (parser);
+	    }
+	  else
+	    c_parser_cilk_simd (parser, true, integer_zero_node);
+	  break;
 	case RID_CILK_SYNC:
 	  c_parser_consume_token (parser);
 	  c_parser_skip_until_found (parser, CPP_SEMICOLON, "expected %<;%>");
@@ -9496,7 +9507,24 @@ c_parser_pragma (c_parser *parser, enum pragma_context context)
       if (!c_parser_cilk_verify_simd (parser, context))
 	return false;
       c_parser_consume_pragma (parser);
-      c_parser_cilk_simd (parser);
+      c_parser_cilk_simd (parser, false, NULL_TREE);
+      return false;
+    case PRAGMA_CILK_GRAINSIZE:
+      if (!flag_cilkplus)
+	{
+	  warning (0, "%<#pragma grainsize%> ignored because -fcilkplus is not"
+		   " enabled");
+	  c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
+	  return false;
+	}
+      if (context == pragma_external)
+	{
+	  error_at (c_parser_peek_token (parser)->location,
+		    "%<#pragma grainsize%> must be inside a function");
+	  c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
+	  return false;
+	}
+      c_parser_cilk_grainsize (parser);
       return false;
 
     default:
@@ -11591,7 +11619,7 @@ c_parser_omp_flush (c_parser *parser)
 
 static tree
 c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
-		       tree clauses, tree *cclauses)
+		       tree clauses, tree grain, tree *cclauses)
 {
   tree decl, cond, incr, save_break, save_cont, body, init, stmt, cl;
   tree declv, condv, incrv, initv, ret = NULL;
@@ -11599,6 +11627,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   int i, collapse = 1, nbraces = 0;
   location_t for_loc;
   vec<tree, va_gc> *for_block = make_tree_vector ();
+  tree count = NULL_TREE;
 
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
     if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
@@ -11611,11 +11640,18 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   condv = make_tree_vec (collapse);
   incrv = make_tree_vec (collapse);
 
-  if (!c_parser_next_token_is_keyword (parser, RID_FOR))
+  if (code != CILK_FOR
+      && !c_parser_next_token_is_keyword (parser, RID_FOR))
     {
       c_parser_error (parser, "for statement expected");
       return NULL;
     }
+  if (code == CILK_FOR
+      && !c_parser_next_token_is_keyword (parser, RID_CILK_FOR))
+    {
+      c_parser_error (parser, "_Cilk_for statement expected");
+      return NULL;
+    }
   for_loc = c_parser_peek_token (parser)->location;
   c_parser_consume_token (parser);
 
@@ -11693,7 +11729,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 	    case LE_EXPR:
 	      break;
 	    case NE_EXPR:
-	      if (code == CILK_SIMD)
+	      if (code == CILK_SIMD || code == CILK_FOR)
 		break;
 	      /* FALLTHRU.  */
 	    default:
@@ -11826,8 +11862,9 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
      an error from the initialization parsing.  */
   if (!fail)
     {
+      tree cf_init = NULL_TREE, cf_end = NULL_TREE, cf_step = NULL_TREE;
       stmt = c_finish_omp_for (loc, code, declv, initv, condv,
-			       incrv, body, NULL);
+			       incrv, body, NULL, &cf_init, &cf_end, &cf_step);
       if (stmt)
 	{
 	  if (cclauses != NULL
@@ -11867,6 +11904,28 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 		  }
 	    }
 	  OMP_FOR_CLAUSES (stmt) = clauses;
+	  /* If it is a _Cilk_for statement, then the OMP_FOR_CLAUSES location
+	     stores the user-defined grain value or an integer_zero_node 
+	     indicating that the runtime must compute a suitable grain, inside
+	     a SCHEDULE clause.  Similarly the loop-count is also stored in
+	     an IF clause.  These clauses do not make sense for _Cilk_for but
+	     they are just used to transmit information.  */
+	  if (code == CILK_FOR)
+	    {
+	      count = fold_build2 (MINUS_EXPR, TREE_TYPE (cf_end), cf_end,
+				   cf_init);
+	      count = fold_build2 (TRUNC_DIV_EXPR, TREE_TYPE (count), count,
+				   cf_step);
+	      tree l = build_omp_clause (EXPR_LOCATION (grain),
+					 OMP_CLAUSE_SCHEDULE);
+	      OMP_CLAUSE_SCHEDULE_KIND (l) = OMP_CLAUSE_SCHEDULE_CILKFOR;
+	      OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (l) = grain;
+	      OMP_CLAUSE_CHAIN (l) = OMP_FOR_CLAUSES (stmt);
+	      tree c = build_omp_clause (EXPR_LOCATION (count), OMP_CLAUSE_IF);
+	      OMP_CLAUSE_IF_EXPR (c) = count;
+	      OMP_CLAUSE_CHAIN (c) = l;
+	      OMP_FOR_CLAUSES (stmt) = c;
+	    }
 	}
       ret = stmt;
     }
@@ -11931,7 +11990,8 @@ c_parser_omp_simd (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_SIMD, clauses, cclauses);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_SIMD, clauses, NULL_TREE,
+			       cclauses);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -12011,7 +12071,8 @@ c_parser_omp_for (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_FOR, clauses, cclauses);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_FOR, clauses, NULL_TREE,
+			       cclauses);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -12494,7 +12555,8 @@ c_parser_omp_distribute (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_DISTRIBUTE, clauses, NULL);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_DISTRIBUTE, clauses, NULL_TREE,
+			       NULL);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -13771,18 +13833,84 @@ c_parser_cilk_all_clauses (c_parser *parser)
   return c_finish_cilk_clauses (clauses);
 }
 
-/* Main entry point for parsing Cilk Plus <#pragma simd> for
-   loops.  */
+/* This function helps parse the grainsize pragma for a _Cilk_for statement. 
+   Here is the correct syntax of this pragma: 
+	    #pragma cilk grainsize = <EXP> 
+ */
 
 static void
-c_parser_cilk_simd (c_parser *parser)
+c_parser_cilk_grainsize (c_parser *parser)
 {
-  tree clauses = c_parser_cilk_all_clauses (parser);
+  extern tree convert_to_integer (tree, tree);
+
+  /* consume the 'grainsize' keyword.  */
+  c_parser_consume_pragma (parser);
+
+  if (c_parser_require (parser, CPP_EQ, "expected %<=%>") != 0)
+    {
+      struct c_expr g_expr = c_parser_binary_expression (parser, NULL, NULL);
+      if (g_expr.value && TREE_CODE (g_expr.value) == C_MAYBE_CONST_EXPR)
+	{
+	  error_at (input_location, "cannot convert grain to long integer.\n");
+	  c_parser_skip_to_pragma_eol (parser);
+	}   
+      else if (g_expr.value && g_expr.value != error_mark_node)
+	{
+	  c_parser_skip_to_pragma_eol (parser);
+	  c_token *token = c_parser_peek_token (parser);
+	  if (token && token->type == CPP_KEYWORD
+	      && token->keyword == RID_CILK_FOR)
+	    {
+	      /* Remove EXCESS_PRECISION_EXPR since we are going to convert
+		 it to long int.  */
+	      if (TREE_CODE (g_expr.value) == EXCESS_PRECISION_EXPR)
+		g_expr.value = TREE_OPERAND (g_expr.value, 0);
+	      tree grain = convert_to_integer (long_integer_type_node,
+					       g_expr.value);
+	      if (grain && grain != error_mark_node) 
+		c_parser_cilk_simd (parser, true, grain);
+	    }
+	  else
+	    warning (0, "grainsize pragma is not followed by %<_Cilk_for%>");
+	}
+      else
+	c_parser_skip_to_pragma_eol (parser);
+    }
+  else
+    c_parser_skip_to_pragma_eol (parser);
+}
+
+/* Main entry point for parsing Cilk Plus <#pragma simd> for and
+   _Cilk_for loops.  If IS_CILK_FOR is true then it is a _Cilk_for loop 
+   and GRAIN is the grain value passed in through pragma or 0.  */
+
+static void
+c_parser_cilk_simd (c_parser *parser, bool is_cilk_for, tree grain)
+{
+  tree super_block = NULL_TREE;
+  tree clauses = NULL_TREE;
+  
+  if (!is_cilk_for)
+    clauses = c_parser_cilk_all_clauses (parser);
+  else
+    super_block = c_begin_omp_parallel ();
   tree block = c_begin_compound_stmt (true);
   location_t loc = c_parser_peek_token (parser)->location;
-  c_parser_omp_for_loop (loc, parser, CILK_SIMD, clauses, NULL);
+  enum tree_code code = is_cilk_for ? CILK_FOR : CILK_SIMD;
+  c_parser_omp_for_loop (loc, parser, code, clauses, grain, NULL);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
+  if (is_cilk_for)
+    {
+      /* Move all the clauses from the #pragma OMP for to #pragma omp parallel.
+	 This is because if these values are not integers and it is placed in
+	 OMP_FOR then the compiler will insert value chains for them.  */
+      tree parallel_clauses = NULL_TREE;
+      cilk_for_move_clauses_upward (&parallel_clauses, super_block);
+    /* The term super_block is not used in scheduling terms but in 
+       set-theory, i.e. set vs. super-set.  */ 
+      c_finish_omp_parallel (loc, parallel_clauses, super_block);
+    }
 }
 \f
 /* Parse a transaction attribute (GCC Extension).
diff --git a/gcc/cilk-builtins.def b/gcc/cilk-builtins.def
index 9f3240a..bf319d5 100644
--- a/gcc/cilk-builtins.def
+++ b/gcc/cilk-builtins.def
@@ -31,3 +31,5 @@ DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_SYNC, "__cilkrts_sync")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_LEAVE_FRAME, "__cilkrts_leave_frame")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_POP_FRAME, "__cilkrts_pop_frame")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_SAVE_FP, "__cilkrts_save_fp_ctrl_state")
+DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_FOR_32, "__cilkrts_cilk_for_32")
+DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_FOR_64, "__cilkrts_cilk_for_64")
diff --git a/gcc/cilk-common.c b/gcc/cilk-common.c
index a6a1aa2..d604651 100644
--- a/gcc/cilk-common.c
+++ b/gcc/cilk-common.c
@@ -105,6 +105,27 @@ install_builtin (const char *name, tree fntype, enum built_in_function code,
   return fndecl;
 }
 
+/* Returns a FUNCTION_DECL of type TYPE whose name is *NAME.  */
+
+static tree 
+declare_cilk_for_builtin (const char *name, tree type, 
+			  enum built_in_function code)
+{
+  tree cb, ft, fn;
+
+  cb = build_function_type_list (void_type_node,
+                                 ptr_type_node, type, type,
+                                 NULL_TREE);
+  cb = build_pointer_type (cb);
+  ft = build_function_type_list (void_type_node,
+                                 cb, ptr_type_node, type,
+                                 integer_type_node, NULL_TREE);
+  fn = install_builtin (name, ft, code, false);
+  TREE_NOTHROW (fn) = 0;
+
+  return fn;
+}
+
 /* Creates and initializes all the built-in Cilk keywords functions and three
    structures: __cilkrts_stack_frame, __cilkrts_pedigree and __cilkrts_worker.
    Detailed information about __cilkrts_stack_frame and
@@ -269,6 +290,14 @@ cilk_init_builtins (void)
   cilk_save_fp_fndecl = install_builtin ("__cilkrts_save_fp_ctrl_state", 
 					 fptr_fun, BUILT_IN_CILK_SAVE_FP,
 					 false);
+  /* __cilkrts_cilk_for_32 (...);  */
+  cilk_for_32_fndecl = declare_cilk_for_builtin ("__cilkrts_cilk_for_32", 
+						 unsigned_intSI_type_node, 
+						 BUILT_IN_CILK_FOR_32);
+  /* __cilkrts_cilk_for_64 (...);  */
+  cilk_for_64_fndecl = declare_cilk_for_builtin ("__cilkrts_cilk_for_64", 
+						 unsigned_intDI_type_node, 
+						 BUILT_IN_CILK_FOR_64);
 }
 
 /* Get the appropriate frame arguments for CALL that is of type CALL_EXPR.  */
diff --git a/gcc/cilk.h b/gcc/cilk.h
index ae96f53..1fee929 100644
--- a/gcc/cilk.h
+++ b/gcc/cilk.h
@@ -40,6 +40,9 @@ enum cilk_tree_index  {
   CILK_TI_F_POP,                      /* __cilkrts_pop_frame (...).  */
   CILK_TI_F_RETHROW,                  /* __cilkrts_rethrow (...).  */
   CILK_TI_F_SAVE_FP,                  /* __cilkrts_save_fp_ctrl_state (...).  */
+  CILK_TI_F_LOOP_32,                  /* __cilkrts_cilk_for_32 (...).  */
+  CILK_TI_F_LOOP_64,                  /* __cilkrts_cilk_for_64 (...).  */
+
   /* __cilkrts_stack_frame struct fields.  */
   CILK_TI_FRAME_FLAGS,                /* stack_frame->flags.  */
   CILK_TI_FRAME_PARENT,               /* stack_frame->parent.  */
@@ -77,6 +80,8 @@ extern GTY (()) tree cilk_trees[CILK_TI_MAX];
 #define cilk_rethrow_fndecl           cilk_trees[CILK_TI_F_RETHROW]
 #define cilk_pop_fndecl               cilk_trees[CILK_TI_F_POP]
 #define cilk_save_fp_fndecl           cilk_trees[CILK_TI_F_SAVE_FP]
+#define cilk_for_32_fndecl            cilk_trees[CILK_TI_F_LOOP_32]
+#define cilk_for_64_fndecl            cilk_trees[CILK_TI_F_LOOP_64]
 
 #define cilk_worker_type_fndecl       cilk_trees[CILK_TI_WORKER_TYPE]
 #define cilk_frame_type_decl          cilk_trees[CILK_TI_FRAME_TYPE]
diff --git a/gcc/cp/cp-cilkplus.c b/gcc/cp/cp-cilkplus.c
index f3a2aff..0825777 100644
--- a/gcc/cp/cp-cilkplus.c
+++ b/gcc/cp/cp-cilkplus.c
@@ -143,3 +143,163 @@ cilk_install_body_with_frame_cleanup (tree fndecl, tree orig_body, void *wd)
 			    &list);
 }
 
+/* Helper function for walk_tree, used by found_cilk_for_p.  Sets data (of type
+   bool) to true if *TP is of type CILK_FOR.  If so, then WALK_SUBTREES is
+   set to zero.  */
+
+static tree
+find_cilk_for_stmt (tree *tp, int *walk_subtrees, void *data)
+{
+  bool *found = (bool *) data;
+  if (TREE_CODE (*tp) == CILK_FOR)
+    {
+      *found = true;
+      data = (void *) found;
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Returns true if T is of type CILK_FOR or one of its subtrees is of type
+   CILK_FOR.  */
+
+bool
+found_cilk_for_p (tree t)
+{
+  bool found = false;
+  walk_tree (&t, find_cilk_for_stmt, (void *) &found, NULL);
+  return found;
+}
+
+/* Returns all the statements till CILK_FOR statement in *STMT_LIST.  Removes
+   those statements from STMT_LIST and updates STMT_LIST accordingly.  */
+
+void
+copy_tree_till_cilk_for (tree *stmt_list, tree *new_stmt_list)
+{
+  gcc_assert (TREE_CODE (*stmt_list) == STATEMENT_LIST);
+  gcc_assert (new_stmt_list != NULL);
+
+  if (*new_stmt_list == NULL_TREE)
+    *new_stmt_list = alloc_stmt_list ();
+
+  tree_stmt_iterator tsi;
+  for (tsi = tsi_start (*stmt_list); !tsi_end_p (tsi);)
+    if (!found_cilk_for_p (tsi_stmt (tsi)))
+      {
+	append_to_statement_list (tsi_stmt (tsi), new_stmt_list); 
+	tsi_delink (&tsi);
+      }
+    else if (TREE_CODE (tsi_stmt (tsi)) == STATEMENT_LIST)
+      {
+	copy_tree_till_cilk_for (tsi_stmt_ptr (tsi), new_stmt_list);
+	tsi_next (&tsi);
+      }
+    else if (TREE_CODE (tsi_stmt (tsi)) == BIND_EXPR)
+      {
+	copy_tree_till_cilk_for (&BIND_EXPR_BODY (tsi_stmt (tsi)),
+				 new_stmt_list);
+	tsi_next (&tsi);
+      }
+    else
+      tsi_next (&tsi);
+}
+
+/* Structure to hold the list of variables that are being killed in a
+   statement list.  This structure is only used in a WALK_TREE function.  */
+struct cilk_for_var_list
+{
+  vec <tree, va_gc> *list;
+};
+
+/* Helper function for WALK_TREE used in find_killed_vars function.  
+   Returns all the variables that are being killed (or set) in *TP.  
+   *DATA holds the structure to hold the variable list.  */
+
+static tree
+find_vars (tree *tp, int *walk_subtrees, void *data)
+{
+  struct cilk_for_var_list *vlist = (struct cilk_for_var_list *) data;
+
+  if (!tp || !*tp)
+    return NULL_TREE;
+
+  if (TREE_CODE (*tp) == INIT_EXPR || TREE_CODE (*tp) == MODIFY_EXPR)
+    {
+      vec_safe_push (vlist->list, TREE_OPERAND (*tp, 0));
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Returns a vector of TREES that will hold the variable that
+   is killed (i.e. written or set) in STMT_LIST.  */
+
+static vec <tree, va_gc> *
+find_killed_vars (tree stmt_list)
+{
+  struct cilk_for_var_list vlist;
+  memset (&vlist, 0, sizeof (vlist));
+  cp_walk_tree (&stmt_list, find_vars, &vlist, NULL);
+  return vlist.list;
+}
+
+/* Inserts OMP_CLAUSE_FIRSTPRIVATE clauses into *CLAUSES for each variable
+   in *LIST.  */
+
+static void
+insert_firstpriv_clauses (vec <tree, va_gc> *list, tree *clauses)
+{
+  if (vec_safe_is_empty (list))
+    return;
+
+  tree lhs;
+  unsigned ix;
+  FOR_EACH_VEC_SAFE_ELT (list, ix, lhs)
+    {
+      tree new_clause = build_omp_clause (EXPR_LOCATION (lhs),
+					  OMP_CLAUSE_FIRSTPRIVATE);
+      OMP_CLAUSE_DECL (new_clause) = lhs;
+      OMP_CLAUSE_CHAIN (new_clause) = *clauses;
+      *clauses = new_clause;
+    }
+}
+
+/* Returns a BIND_EXPR with BIND_EXPR_VARS holding VARS and BIND_EXPR_BODY
+   contains STMT_LIST and CFOR_PAR_LIST.  */
+
+tree
+cilk_for_create_bind_expr (tree vars, tree stmt_list, tree cfor_par_list)
+{
+  gcc_assert (TREE_CODE (stmt_list) == STATEMENT_LIST);
+  tree_stmt_iterator tsi;
+  tree return_expr = make_node (BIND_EXPR);
+  BIND_EXPR_BODY (return_expr) = alloc_stmt_list ();
+  bool found = false;
+  vec <tree, va_gc> *cfor_vars = find_killed_vars (stmt_list);
+
+  insert_firstpriv_clauses (cfor_vars, &OMP_PARALLEL_CLAUSES (cfor_par_list));
+
+  /* If there is a supplied list of vars then there is no reason to find them 
+     again.  */
+  if (vars != NULL_TREE)
+    found = true;
+
+  BIND_EXPR_VARS (return_expr) = vars;
+  for (tsi = tsi_start (stmt_list); !tsi_end_p (tsi); tsi_next (&tsi))
+    {
+      /* Only do the adding of BIND_EXPR_VARS the first time since they are
+	 already "chained-on."  */
+      if (!found && TREE_CODE (tsi_stmt (tsi)) == DECL_EXPR)
+	{
+	  tree var = DECL_EXPR_DECL (tsi_stmt (tsi));
+	  BIND_EXPR_VARS (return_expr) = var;
+	  found = true;
+	}
+      else
+	append_to_statement_list (tsi_stmt (tsi),
+				  &BIND_EXPR_BODY (return_expr));
+    }
+  append_to_statement_list (cfor_par_list, &BIND_EXPR_BODY (return_expr));
+  return return_expr;
+}
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 7681b27..0fde703 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6206,6 +6206,9 @@ extern void vtv_build_vtable_verify_fndecl      (void);
 
 /* In cp-cilkplus.c.  */
 extern bool cpp_validate_cilk_plus_loop		(tree);
+extern void copy_tree_till_cilk_for             (tree *, tree *);
+extern tree cilk_for_create_bind_expr           (tree, tree, tree);
+extern bool found_cilk_for_p                    (tree);
 
 /* In cp/cp-array-notations.c */
 extern tree expand_array_notation_exprs         (tree);
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index f0722d6..94b7063 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -237,8 +237,8 @@ static void cp_parser_initial_pragma
 static tree cp_literal_operator_id
   (const char *);
 
-static void cp_parser_cilk_simd
-  (cp_parser *, cp_token *);
+static tree cp_parser_cilk_simd
+  (cp_parser *, cp_token *, tree);
 static bool cp_parser_omp_declare_reduction_exprs
   (tree, cp_parser *);
 static tree cp_parser_cilk_simd_vectorlength 
@@ -9368,6 +9368,18 @@ cp_parser_statement (cp_parser* parser, tree in_statement_expr,
 	  statement = cp_parser_iteration_statement (parser, false);
 	  break;
 
+	case RID_CILK_FOR:
+	  if (!flag_cilkplus)
+	    {
+	      error_at (cp_lexer_peek_token (parser->lexer)->location,
+			"-fcilkplus must be enabled to use %<_Cilk_for%>");
+	      cp_lexer_consume_token (parser->lexer);
+	      statement = error_mark_node;
+	    }
+	  else
+	    statement = cp_parser_cilk_simd (parser, NULL, integer_zero_node);
+	  break;
+	  
 	case RID_BREAK:
 	case RID_CONTINUE:
 	case RID_RETURN:
@@ -28835,7 +28847,7 @@ cp_parser_omp_for_cond (cp_parser *parser, tree decl, enum tree_code code)
     case LE_EXPR:
       break;
     case NE_EXPR:
-      if (code == CILK_SIMD)
+      if (code == CILK_SIMD || code == CILK_FOR)
 	break;
       /* Fall through: OpenMP disallows NE_EXPR.  */
     default:
@@ -29131,7 +29143,7 @@ cp_parser_omp_for_loop_init (cp_parser *parser,
 
 static tree
 cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
-			tree *cclauses)
+			tree *cclauses, tree *cfor_block)
 {
   tree init, cond, incr, body, decl, pre_body = NULL_TREE, ret;
   tree real_decl, initv, condv, incrv, declv;
@@ -29160,11 +29172,18 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
       bool add_private_clause = false;
       location_t loc;
 
-      if (!cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
+      if (code == CILK_SIMD
+	  && !cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
 	{
 	  cp_parser_error (parser, "for statement expected");
 	  return NULL;
 	}
+      if (code == CILK_FOR
+	  && !cp_lexer_next_token_is_keyword (parser->lexer, RID_CILK_FOR))
+	{
+	  cp_parser_error (parser, "_Cilk_for statement expected");
+	  return NULL;
+	}
       loc = cp_lexer_consume_token (parser->lexer)->location;
 
       if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
@@ -29173,13 +29192,26 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
       init = decl = real_decl = NULL;
       this_pre_body = push_stmt_list ();
 
+      if (code == CILK_FOR
+	  && cp_lexer_next_token_is_keyword (parser->lexer, RID_STATIC))
+	{
+	  error_at (cp_lexer_peek_token (parser->lexer)->location,
+		    "induction variable cannot be static");
+	  cp_lexer_consume_token (parser->lexer);
+	}
       add_private_clause
 	|= cp_parser_omp_for_loop_init (parser,
-					/*parsing_openmp=*/code != CILK_SIMD,
+					/*parsing_openmp=*/
+					(code != CILK_SIMD && code != CILK_FOR),
 					this_pre_body, for_block,
 					init, decl, real_decl);
 
-      cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
+      if (!cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON)
+	  && code == CILK_FOR)
+	{
+	  cp_parser_skip_to_end_of_statement (parser);
+	  cp_parser_consume_semicolon_at_end_of_statement (parser);
+	}
       if (this_pre_body)
 	{
 	  this_pre_body = pop_stmt_list (this_pre_body);
@@ -29337,7 +29369,7 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
 
   /* Note that we saved the original contents of this flag when we entered
      the structured block, and so we don't need to re-save it here.  */
-  if (code == CILK_SIMD)
+  if (code == CILK_SIMD || code == CILK_FOR)
     parser->in_statement = IN_CILK_SIMD_FOR;
   else
     parser->in_statement = IN_OMP_FOR;
@@ -29378,7 +29410,17 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
     }
 
   while (!for_block->is_empty ())
-    add_stmt (pop_stmt_list (for_block->pop ()));
+    {
+      tree t = pop_stmt_list (for_block->pop ());
+
+      /* Remove all the statements between the head of statement list and
+	 _Cilk_for statement and store them in *cfor_block.  These statements
+	 are hoisted above the #pragma omp parallel.  */
+      if (!processing_template_decl && code == CILK_FOR && cfor_block != NULL)
+	copy_tree_till_cilk_for (&t, cfor_block);
+      add_stmt (t);
+
+    }
   release_tree_vector (for_block);
 
   return ret;
@@ -29434,7 +29476,7 @@ cp_parser_omp_simd (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_SIMD, clauses, cclauses);
+  ret = cp_parser_omp_for_loop (parser, OMP_SIMD, clauses, cclauses, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -29522,7 +29564,7 @@ cp_parser_omp_for (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_FOR, clauses, cclauses);
+  ret = cp_parser_omp_for_loop (parser, OMP_FOR, clauses, cclauses, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -29994,7 +30036,7 @@ cp_parser_omp_distribute (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_DISTRIBUTE, clauses, NULL);
+  ret = cp_parser_omp_for_loop (parser, OMP_DISTRIBUTE, clauses, NULL, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -31290,6 +31332,38 @@ cp_parser_initial_pragma (cp_token *first_token)
   cp_lexer_get_preprocessor_token (NULL, first_token);
 }
 
+/* Parses the grainsize pragma for the _Cilk_for statement.
+   Syntax:
+   #pragma cilk grainsize = <VALUE>.  */
+
+static void
+cp_parser_cilk_grainsize (cp_parser *parser, cp_token *pragma_tok)
+{
+  if (cp_parser_require (parser, CPP_EQ, RT_EQ))
+    {
+      tree exp = cp_parser_binary_expression (parser, false, false,
+                                              PREC_NOT_OPERATOR, NULL);
+      cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+      if (!exp || exp == error_mark_node)
+        {
+          error_at (pragma_tok->location, "invalid grainsize for _Cilk_for");
+          return;
+        }
+      cp_token *n_tok = cp_lexer_peek_token (parser->lexer);
+
+      /* Make sure the next token is _Cilk_for; it is invalid otherwise.  */
+      if (n_tok && n_tok->type == CPP_KEYWORD
+	  && n_tok->keyword == RID_CILK_FOR) 
+	cp_parser_cilk_simd (parser, NULL, exp);
+      else
+	warning_at (cp_lexer_peek_token (parser->lexer)->location, 0,
+		    "%<#pragma cilk grainsize%> is not followed by "
+		    "%<_Cilk_for%>");
+      return;
+    }
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+}
+
 /* Normal parsing of a pragma token.  Here we can (and must) use the
    regular lexer.  */
 
@@ -31469,9 +31543,30 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context)
 		    "%<#pragma simd%> must be inside a function");
 	  break;
 	}
-      cp_parser_cilk_simd (parser, pragma_tok);
+      cp_parser_cilk_simd (parser, pragma_tok, NULL_TREE);
       return true;
 
+    case PRAGMA_CILK_GRAINSIZE:
+      if (context == pragma_external)
+        {
+          error_at (pragma_tok->location,
+                    "%<#pragma cilk grainsize%> must be inside a function");
+          break;
+        }
+
+      /* Using the pragma without -fcilkplus is an error.  */
+      if (flag_cilkplus)
+        {
+          cp_parser_cilk_grainsize (parser, pragma_tok);
+          return true;
+        }
+      else
+        {
+          error_at (pragma_tok->location, "-fcilkplus must be enabled to use "
+                    "%<#pragma cilk grainsize%>");
+          break;
+	}
+
     default:
       gcc_assert (id >= PRAGMA_FIRST_EXTERNAL);
       c_invoke_pragma_handler (id);
@@ -31789,31 +31884,104 @@ cp_parser_cilk_simd_all_clauses (cp_parser *parser, cp_token *pragma_token)
     return c_finish_cilk_clauses (clauses);
 }
 
-/* Main entry-point for parsing Cilk Plus <#pragma simd> for loops.  */
+/* Main entry-point for parsing Cilk Plus <#pragma simd> for and _Cilk_for
+   loops.  This function returns NULL_TREE whenever it is parsing the
+   <#pragma simd> for because the caller does not check the return value.
+   _Cilk_for's caller checks this value, so for _Cilk_for this function
+   returns error_mark_node on error and the parsed statement otherwise.  */
 
-static void
-cp_parser_cilk_simd (cp_parser *parser, cp_token *pragma_token)
+static tree
+cp_parser_cilk_simd (cp_parser *parser, cp_token *pragma_token, tree grain)
 {
-  tree clauses = cp_parser_cilk_simd_all_clauses (parser, pragma_token);
-
+  bool is_cilk_for = pragma_token == NULL;
+
+  tree clauses = NULL_TREE;
+  if (!is_cilk_for)
+    clauses = cp_parser_cilk_simd_all_clauses (parser, pragma_token);
+  
   if (clauses == error_mark_node)
-    return;
+    return NULL_TREE;
   
-  if (cp_lexer_next_token_is_not_keyword (parser->lexer, RID_FOR))
+  if (!is_cilk_for
+      && cp_lexer_next_token_is_not_keyword (parser->lexer, RID_FOR))
     {
       error_at (cp_lexer_peek_token (parser->lexer)->location,
 		"for statement expected");
-      return;
+      return NULL_TREE;
+    }
+  if (is_cilk_for
+      && cp_lexer_next_token_is_not_keyword (parser->lexer, RID_CILK_FOR))
+    {
+      error_at (cp_lexer_peek_token (parser->lexer)->location,
+		"_Cilk_for statement expected");
+      return error_mark_node;
     }
 
+  tree top_block = NULL_TREE, topmost_blk = NULL_TREE;
+  if (is_cilk_for)
+    {
+      topmost_blk = push_stmt_list ();
+      top_block = begin_omp_parallel ();
+    }
+  
   tree sb = begin_omp_structured_block ();
   int save = cp_parser_begin_omp_structured_block (parser);
-  tree ret = cp_parser_omp_for_loop (parser, CILK_SIMD, clauses, NULL);
+   
+  enum tree_code code = is_cilk_for ? CILK_FOR : CILK_SIMD;
+  tree cfor_blk = NULL_TREE;
+  tree ret = cp_parser_omp_for_loop (parser, code, clauses, NULL, &cfor_blk);
   if (ret)
     cpp_validate_cilk_plus_loop (OMP_FOR_BODY (ret));
+  
+  /* For _Cilk_for statements, the grain value is stored in a SCHEDULE
+     clause.  */
+  if (is_cilk_for && ret)
+    {
+      tree l = build_omp_clause (EXPR_LOCATION (grain), OMP_CLAUSE_SCHEDULE);
+      OMP_CLAUSE_SCHEDULE_KIND (l) = OMP_CLAUSE_SCHEDULE_CILKFOR;
+      OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (l) = grain;
+      OMP_CLAUSE_CHAIN (l) = OMP_FOR_CLAUSES (ret);
+      OMP_FOR_CLAUSES (ret) = l;
+    }
   cp_parser_end_omp_structured_block (parser, save);
-  add_stmt (finish_omp_structured_block (sb));
-  return;
+
+  if (!is_cilk_for)
+    {
+      add_stmt (finish_omp_structured_block (sb));
+      return NULL_TREE;
+    }
+
+  tree sb_block = finish_omp_structured_block (sb);
+  tree vars = NULL_TREE, sb_blk_body = sb_block;
+
+  /* For iterators, cfor_blk holds the mapping from original vector
+     iterators to the integer ones that the c_finish_omp_for remaps.
+     This information must be pushed above the #pragma omp parallel so that
+     the IF clause (that holds the loop-count) can use them to compute the
+     loop-count.  */
+  if (TREE_CODE (sb_block) == BIND_EXPR && cfor_blk != NULL_TREE)
+    {
+      vars = BIND_EXPR_VARS (sb_block);
+      sb_blk_body = BIND_EXPR_BODY (sb_block);
+    }
+
+  add_stmt (sb_blk_body);
+  tree parallel_clauses = NULL_TREE;
+
+  if (!processing_template_decl)
+    cilk_for_move_clauses_upward (&parallel_clauses, ret);
+  tree stmt = finish_omp_parallel (parallel_clauses, top_block);
+  OMP_PARALLEL_COMBINED (stmt) = 1;
+  topmost_blk = pop_stmt_list (topmost_blk);
+
+  if (cfor_blk != NULL_TREE)
+    {
+      tree bind_expr = cilk_for_create_bind_expr (vars, cfor_blk, topmost_blk);
+      add_stmt (bind_expr);
+      return bind_expr;
+    }
+  add_stmt (topmost_blk);
+  return topmost_blk;
 }
 
 /* Create an identifier for a generic parameter type (a synthesized
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
old mode 100644
new mode 100755
index 7967db8..3b52897
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13580,13 +13580,51 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
       break;
 
     case OMP_PARALLEL:
-      tmp = tsubst_omp_clauses (OMP_PARALLEL_CLAUSES (t), false,
-				args, complain, in_decl);
-      stmt = begin_omp_parallel ();
-      RECUR (OMP_PARALLEL_BODY (t));
-      OMP_PARALLEL_COMBINED (finish_omp_parallel (tmp, stmt))
-	= OMP_PARALLEL_COMBINED (t);
-      break;
+      {
+	tmp = tsubst_omp_clauses (OMP_PARALLEL_CLAUSES (t), false,
+				  args, complain, in_decl);
+	
+	tree top_block = NULL_TREE, topmost_blk = NULL_TREE;
+	bool is_cilk_for = false;
+	if (flag_cilkplus && found_cilk_for_p (OMP_PARALLEL_BODY (t)))
+	  {
+	    is_cilk_for = true;
+	    topmost_blk = push_stmt_list ();
+	    top_block = begin_omp_parallel ();
+	  }
+	else
+	  stmt = begin_omp_parallel ();
+    
+	RECUR (OMP_PARALLEL_BODY (t));
+	tree cfor_blk = NULL_TREE;
+	if (is_cilk_for)
+	  {
+	    tree sb_blk_body = top_block;
+	    if (TREE_CODE (sb_blk_body) == BIND_EXPR) 
+	      sb_blk_body = BIND_EXPR_BODY (sb_blk_body);
+
+	    copy_tree_till_cilk_for (&sb_blk_body, &cfor_blk);
+	    cilk_for_move_clauses_upward (&tmp, top_block);
+	    top_block = finish_omp_parallel (tmp, sb_blk_body);
+	  }
+	else
+	  {
+	    stmt = finish_omp_parallel (tmp, stmt);
+	    OMP_PARALLEL_COMBINED (stmt) = OMP_PARALLEL_COMBINED (t);
+	  }
+	if (is_cilk_for)
+	  {
+	    OMP_PARALLEL_COMBINED (top_block) = 1;
+	    topmost_blk = pop_stmt_list (topmost_blk);
+	    if (cfor_blk != NULL_TREE) 
+	      stmt = cilk_for_create_bind_expr (NULL_TREE, cfor_blk, 
+						topmost_blk);
+	    else
+	      stmt = topmost_blk;
+	    add_stmt (stmt);
+	  }	
+      } 
+    break;
 
     case OMP_TASK:
       tmp = tsubst_omp_clauses (OMP_TASK_CLAUSES (t), false,
@@ -13599,6 +13637,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
     case OMP_FOR:
     case OMP_SIMD:
     case CILK_SIMD:
+    case CILK_FOR:
     case OMP_DISTRIBUTE:
       {
 	tree clauses, body, pre_body;
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
old mode 100644
new mode 100755
index 9fb4fc0..ec47d0e
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -6058,6 +6058,7 @@ handle_omp_for_class_iterator (int i, location_t locus, tree declv, tree initv,
     case GE_EXPR:
     case LT_EXPR:
     case LE_EXPR:
+    case NE_EXPR:
       if (TREE_OPERAND (cond, 1) == iter)
 	cond = build2 (swap_tree_comparison (TREE_CODE (cond)),
 		       TREE_TYPE (cond), iter, TREE_OPERAND (cond, 0));
@@ -6470,12 +6471,22 @@ finish_omp_for (location_t locus, enum tree_code code, tree declv, tree initv,
   if (IS_EMPTY_STMT (pre_body))
     pre_body = NULL;
 
+  tree cf_step = NULL_TREE, cf_init = NULL_TREE, cf_end = NULL_TREE;
   omp_for = c_finish_omp_for (locus, code, declv, initv, condv, incrv,
-			      body, pre_body);
-
+			      body, pre_body, &cf_init, &cf_end, &cf_step);
   if (omp_for == NULL)
     return NULL;
 
+  if (code == CILK_FOR && !processing_template_decl)
+    {
+      tree count = fold_build2 (MINUS_EXPR, TREE_TYPE (cf_end), cf_end,
+				cf_init);
+      count = fold_build2 (CEIL_DIV_EXPR, TREE_TYPE (count), count, cf_step);
+      tree c = build_omp_clause (EXPR_LOCATION (count), OMP_CLAUSE_IF);
+      OMP_CLAUSE_IF_EXPR (c) = count;
+      clauses = chainon (clauses, c);
+    }
+
   for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INCR (omp_for)); i++)
     {
       decl = TREE_OPERAND (TREE_VEC_ELT (OMP_FOR_INIT (omp_for), i), 0);
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 2d1e1c7..f87c0cf 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "value-prof.h"
 #include "trans-mem.h"
 
+static void dump_gimple_omp_parallel (pretty_printer *, gimple, int, int,
+				      bool);
 #define INDENT(SPACE)							\
   do { int i; for (i = 0; i < SPACE; i++) pp_space (buffer); } while (0)
 
@@ -1124,6 +1126,10 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	case GF_OMP_FOR_KIND_DISTRIBUTE:
 	  kind = " distribute";
 	  break;
+	case GF_OMP_FOR_KIND_CILKFOR:
+	  gcc_assert (flag_cilkplus);
+	  kind = "";
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -1158,16 +1164,25 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	case GF_OMP_FOR_KIND_DISTRIBUTE:
 	  pp_string (buffer, "#pragma omp distribute");
 	  break;
+	case GF_OMP_FOR_KIND_CILKFOR:
+	  gcc_assert (flag_cilkplus);
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
-      dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags);
+      if (!flag_cilkplus
+	  || gimple_omp_for_kind (gs) != GF_OMP_FOR_KIND_CILKFOR) 
+	dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags);
       for (i = 0; i < gimple_omp_for_collapse (gs); i++)
 	{
 	  if (i)
 	    spc += 2;
 	  newline_and_indent (buffer, spc);
-	  pp_string (buffer, "for (");
+	  if (flag_cilkplus 
+	      && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
+	    pp_string (buffer, "_Cilk_for (");
+	  else
+	    pp_string (buffer, "for (");
 	  dump_generic_node (buffer, gimple_omp_for_index (gs, i), spc,
 			     flags, false);
 	  pp_string (buffer, " = ");
@@ -1192,6 +1207,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	    case GE_EXPR:
 	      pp_greater_equal (buffer);
 	      break;
+	    case NE_EXPR:
+	      pp_string (buffer, "!=");
+	      break;
 	    default:
 	      gcc_unreachable ();
 	    }
@@ -1210,6 +1228,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 
       if (!gimple_seq_empty_p (gimple_omp_body (gs)))
 	{
+	  if (flag_cilkplus
+	      && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR) 
+	    dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags); 
 	  newline_and_indent (buffer, spc + 2);
 	  pp_left_brace (buffer);
 	  pp_newline (buffer);
@@ -1846,7 +1867,7 @@ dump_gimple_phi (pretty_printer *buffer, gimple phi, int spc, bool comment,
 
 static void
 dump_gimple_omp_parallel (pretty_printer *buffer, gimple gs, int spc,
-                          int flags)
+                          int flags, bool is_cilk_for)
 {
   if (flags & TDF_RAW)
     {
@@ -1860,7 +1881,10 @@ dump_gimple_omp_parallel (pretty_printer *buffer, gimple gs, int spc,
   else
     {
       gimple_seq body;
-      pp_string (buffer, "#pragma omp parallel");
+      if (is_cilk_for) 
+	pp_string (buffer, "compiler-inserted clauses for cilk-for body: ");
+      else
+	pp_string (buffer, "#pragma omp parallel");
       dump_omp_clauses (buffer, gimple_omp_parallel_clauses (gs), spc, flags);
       if (gimple_omp_parallel_child_fn (gs))
 	{
@@ -2137,7 +2161,7 @@ pp_gimple_stmt_1 (pretty_printer *buffer, gimple gs, int spc, int flags)
       break;
 
     case GIMPLE_OMP_PARALLEL:
-      dump_gimple_omp_parallel (buffer, gs, spc, flags);
+      dump_gimple_omp_parallel (buffer, gs, spc, flags, false);
       break;
 
     case GIMPLE_OMP_TASK:
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 0e80d2e..194045c 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -91,13 +91,14 @@ enum gf_mask {
     GF_CALL_ALLOCA_FOR_VAR	= 1 << 5,
     GF_CALL_INTERNAL		= 1 << 6,
     GF_OMP_PARALLEL_COMBINED	= 1 << 0,
-    GF_OMP_FOR_KIND_MASK	= 3 << 0,
+    GF_OMP_FOR_KIND_MASK	= 7 << 0,
     GF_OMP_FOR_KIND_FOR		= 0 << 0,
     GF_OMP_FOR_KIND_DISTRIBUTE	= 1 << 0,
     GF_OMP_FOR_KIND_SIMD	= 2 << 0,
     GF_OMP_FOR_KIND_CILKSIMD	= 3 << 0,
-    GF_OMP_FOR_COMBINED		= 1 << 2,
-    GF_OMP_FOR_COMBINED_INTO	= 1 << 3,
+    GF_OMP_FOR_KIND_CILKFOR	= 4 << 0,
+    GF_OMP_FOR_COMBINED		= 1 << 3,
+    GF_OMP_FOR_COMBINED_INTO	= 1 << 4,
     GF_OMP_TARGET_KIND_MASK	= 3 << 0,
     GF_OMP_TARGET_KIND_REGION	= 0 << 0,
     GF_OMP_TARGET_KIND_DATA	= 1 << 0,
@@ -4563,6 +4564,16 @@ gimple_omp_for_set_pre_body (gimple gs, gimple_seq pre_body)
   omp_for_stmt->pre_body = pre_body;
 }
 
+/* Returns the induction variable (a tree) of GS, which must be a
+   GIMPLE_OMP_FOR statement.  */
+
+static inline tree
+gimple_cilk_for_induction_var (const_gimple gs)
+{
+  const gimple_statement_omp_for *cilk_for_stmt
+    = as_a <const gimple_statement_omp_for> (gs);
+  return cilk_for_stmt->iter->index;
+}
 
 /* Return the clauses associated with OMP_PARALLEL GS.  */
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index ff341d4..7488563 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -5856,7 +5856,8 @@ omp_check_private (struct gimplify_omp_ctx *ctx, tree decl, bool copyprivate)
 
 static void
 gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
-			   enum omp_region_type region_type)
+			   enum omp_region_type region_type,
+			   bool is_cilk_for)
 {
   struct gimplify_omp_ctx *ctx, *outer_ctx;
   tree c;
@@ -6086,8 +6087,12 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 
 	case OMP_CLAUSE_FINAL:
 	case OMP_CLAUSE_IF:
-	  OMP_CLAUSE_OPERAND (c, 0)
-	    = gimple_boolify (OMP_CLAUSE_OPERAND (c, 0));
+	  /* In _Cilk_for we insert an IF clause as a mechanism to
+	     pass in the count information.  So, there is no reason to
+	     boolify it.  */
+	  if (!is_cilk_for)
+	    OMP_CLAUSE_OPERAND (c, 0)
+	      = gimple_boolify (OMP_CLAUSE_OPERAND (c, 0));
 	  /* Fall through.  */
 
 	case OMP_CLAUSE_SCHEDULE:
@@ -6454,6 +6459,21 @@ gimplify_adjust_omp_clauses (tree *list_p)
   delete_omp_context (ctx);
 }
 
+/* Removes the OMP clause C from a list of clauses in *LIST_P.  */
+
+static void
+omp_remove_clause (tree c, tree *list_p)
+{
+  tree ii = NULL_TREE;
+  while ((ii = *list_p) != NULL)
+    {
+      if (simple_cst_equal (ii, c) == 1)
+	*list_p = OMP_CLAUSE_CHAIN (ii);
+      else
+	list_p = &OMP_CLAUSE_CHAIN (ii);
+    }
+}
+
 /* Gimplify the contents of an OMP_PARALLEL statement.  This involves
    gimplification of the body, as well as scanning the body for used
    variables.  We need to do this scan now, because variable-sized
@@ -6465,11 +6485,29 @@ gimplify_omp_parallel (tree *expr_p, gimple_seq *pre_p)
   tree expr = *expr_p;
   gimple g;
   gimple_seq body = NULL;
-
+  bool is_cilk_for = false;
+  tree c = NULL_TREE;
+  for (c = OMP_PARALLEL_CLAUSES (expr); c; c = OMP_CLAUSE_CHAIN (c))
+    if (flag_cilkplus && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SCHEDULE
+	&& OMP_CLAUSE_SCHEDULE_KIND (c) == OMP_CLAUSE_SCHEDULE_CILKFOR)
+      {
+	/* The schedule clause is kept up to this point so that it can
+	   indicate whether this #pragma omp parallel was inserted by a
+	   _Cilk_for statement.  If so, set is_cilk_for so that
+	   gimplify_scan_omp_clauses does not boolify the IF clause,
+	   which stores the count value.  */
+	gcc_assert (flag_cilkplus);
+	is_cilk_for = true;
+	break;
+      } 
+  
+  /* The SCHEDULE clause is not necessary anymore.  */
+  if (is_cilk_for) 
+    omp_remove_clause (c, &OMP_PARALLEL_CLAUSES (expr));
   gimplify_scan_omp_clauses (&OMP_PARALLEL_CLAUSES (expr), pre_p,
 			     OMP_PARALLEL_COMBINED (expr)
 			     ? ORT_COMBINED_PARALLEL
-			     : ORT_PARALLEL);
+			     : ORT_PARALLEL, is_cilk_for);
 
   push_gimplify_context ();
 
@@ -6505,7 +6543,7 @@ gimplify_omp_task (tree *expr_p, gimple_seq *pre_p)
   gimplify_scan_omp_clauses (&OMP_TASK_CLAUSES (expr), pre_p,
 			     find_omp_clause (OMP_TASK_CLAUSES (expr),
 					      OMP_CLAUSE_UNTIED)
-			     ? ORT_UNTIED_TASK : ORT_TASK);
+			     ? ORT_UNTIED_TASK : ORT_TASK, false);
 
   push_gimplify_context ();
 
@@ -6570,8 +6608,9 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 
   simd = TREE_CODE (for_stmt) == OMP_SIMD
     || TREE_CODE (for_stmt) == CILK_SIMD;
-  gimplify_scan_omp_clauses (&OMP_FOR_CLAUSES (for_stmt), pre_p,
-			     simd ? ORT_SIMD : ORT_WORKSHARE);
+  gimplify_scan_omp_clauses (&OMP_FOR_CLAUSES (for_stmt), pre_p,
+			     simd ? ORT_SIMD : ORT_WORKSHARE,
+			     TREE_CODE (for_stmt) == CILK_FOR);
 
   /* Handle OMP_FOR_INIT.  */
   for_pre_body = NULL;
@@ -6627,7 +6666,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
       tree c = NULL_TREE;
       if (orig_for_stmt != for_stmt)
 	/* Do this only on innermost construct for combined ones.  */;
-      else if (simd)
+      else if (simd || TREE_CODE (for_stmt) == CILK_FOR)
 	{
 	  splay_tree_node n = splay_tree_lookup (gimplify_omp_ctxp->variables,
 						 (splay_tree_key)decl);
@@ -6832,6 +6871,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
     case OMP_FOR: kind = GF_OMP_FOR_KIND_FOR; break;
     case OMP_SIMD: kind = GF_OMP_FOR_KIND_SIMD; break;
     case CILK_SIMD: kind = GF_OMP_FOR_KIND_CILKSIMD; break;
+    case CILK_FOR: kind = GF_OMP_FOR_KIND_CILKFOR; break;
     case OMP_DISTRIBUTE: kind = GF_OMP_FOR_KIND_DISTRIBUTE; break;
     default:
       gcc_unreachable ();
@@ -6865,7 +6905,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
       t = TREE_VEC_ELT (OMP_FOR_INCR (for_stmt), i);
       gimple_omp_for_set_incr (gfor, i, TREE_OPERAND (t, 1));
     }
-
+  
   gimplify_seq_add_stmt (pre_p, gfor);
   if (ret != GS_ALL_DONE)
     return GS_ERROR;
@@ -6902,7 +6942,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
     default:
       gcc_unreachable ();
     }
-  gimplify_scan_omp_clauses (&OMP_CLAUSES (expr), pre_p, ort);
+  gimplify_scan_omp_clauses (&OMP_CLAUSES (expr), pre_p, ort, false);
   if (ort == ORT_TARGET || ort == ORT_TARGET_DATA)
     {
       push_gimplify_context ();
@@ -6962,7 +7002,7 @@ gimplify_omp_target_update (tree *expr_p, gimple_seq *pre_p)
   gimple stmt;
 
   gimplify_scan_omp_clauses (&OMP_TARGET_UPDATE_CLAUSES (expr), pre_p,
-			     ORT_WORKSHARE);
+			     ORT_WORKSHARE, false);
   gimplify_adjust_omp_clauses (&OMP_TARGET_UPDATE_CLAUSES (expr));
   stmt = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_UPDATE,
 				  OMP_TARGET_UPDATE_CLAUSES (expr));
@@ -7904,6 +7944,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	case OMP_FOR:
 	case OMP_SIMD:
 	case CILK_SIMD:
+	case CILK_FOR:
 	case OMP_DISTRIBUTE:
 	  ret = gimplify_omp_for (expr_p, pre_p);
 	  break;
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
old mode 100644
new mode 100755
index 91c8656..3454dc9
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -71,6 +71,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-prop.h"
 #include "tree-nested.h"
 #include "tree-eh.h"
+#include "cilk.h"
 
 
 /* Lowering of OpenMP parallel and workshare constructs proceeds in two
@@ -198,6 +199,13 @@ struct omp_for_data
   struct omp_for_data_loop *loops;
 };
 
+/* A structure with necessary elements from _Cilk_for statement.  This
+   structure is passed in through WALK_STMT_INFO->INFO.  */
+struct cilk_for_info
+{
+  bool found;
+  tree induction_var;
+};
 
 static splay_tree all_contexts;
 static int taskreg_nesting_level;
@@ -314,6 +322,8 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
   fd->have_ordered = false;
   fd->sched_kind = OMP_CLAUSE_SCHEDULE_STATIC;
   fd->chunk_size = NULL_TREE;
+  if (gimple_omp_for_kind (fd->for_stmt) == GF_OMP_FOR_KIND_CILKFOR)
+    fd->sched_kind = OMP_CLAUSE_SCHEDULE_CILKFOR;
   collapse_iter = NULL;
   collapse_count = NULL;
 
@@ -392,7 +402,9 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
 	  break;
 	case NE_EXPR:
 	  gcc_assert (gimple_omp_for_kind (for_stmt)
-		      == GF_OMP_FOR_KIND_CILKSIMD);
+		      == GF_OMP_FOR_KIND_CILKSIMD
+		      || gimple_omp_for_kind (for_stmt)
+		      == GF_OMP_FOR_KIND_CILKFOR);
 	  break;
 	case LE_EXPR:
 	  if (POINTER_TYPE_P (TREE_TYPE (loop->n2)))
@@ -1818,27 +1830,120 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	scan_omp (&OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (c), ctx);
 }
 
-/* Create a new name for omp child function.  Returns an identifier.  */
+/* Create a new name for omp child function.  Returns an identifier.  If
+   IS_CILK_FOR is true then the suffix for the child function is
+   "_cilk_for_fn".  */
 
 static tree
-create_omp_child_function_name (bool task_copy)
+create_omp_child_function_name (bool task_copy, bool is_cilk_for)
 {
+  if (is_cilk_for)
+    return clone_function_name (current_function_decl, "_cilk_for_fn");
   return (clone_function_name (current_function_decl,
 			       task_copy ? "_omp_cpyfn" : "_omp_fn"));
 }
 
+/* Helper function for walk_gimple_seq.  GSI_P is the gimple statement
+   iterator passed by walk_gimple_seq and WI->INFO holds the CILK_FOR_INFO
+   structure.  This function sets the values inside this structure if it
+   finds a _Cilk_for statement at *GSI_P.  HANDLED_OPS_P is unused.  */
+
+static tree
+find_cilk_for_stmt (gimple_stmt_iterator *gsi_p,
+		    bool *handled_ops_p ATTRIBUTE_UNUSED,
+		    struct walk_stmt_info *wi)
+{
+  struct cilk_for_info *cf_info = (struct cilk_for_info *) wi->info;
+  gimple stmt = gsi_stmt (*gsi_p);
+
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR
+      && (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_CILKFOR)
+      /* For nested _Cilk_for statements, just look into the
+	 outer-most one.  */
+      && cf_info->found == false)
+    {
+      cf_info->found = true;
+      cf_info->induction_var = gimple_cilk_for_induction_var (stmt);
+    }
+  return NULL_TREE;
+}
+
+/* Returns true if STMT contains a CILK_FOR statement.  If one is found
+   and IND_VAR is not NULL, *IND_VAR is set to the induction variable of
+   the outermost _Cilk_for.  Otherwise *IND_VAR is left untouched.  The
+   function always returns false when Cilk Plus is not enabled
+   (-fcilkplus).  */
+
+static bool
+is_cilk_for_stmt (gimple stmt, tree *ind_var)
+{
+  if (!flag_cilkplus)
+    return false;
+  if (gimple_code (stmt) == GIMPLE_OMP_PARALLEL)
+    stmt = gimple_omp_body (stmt);
+  if (gimple_code (stmt) == GIMPLE_BIND)
+    {
+      gimple_seq body = gimple_bind_body (stmt);
+      struct walk_stmt_info wi;
+      struct cilk_for_info cf_info;
+      memset (&cf_info, 0, sizeof (struct cilk_for_info));
+      memset (&wi, 0, sizeof (wi));
+      wi.info = &cf_info;
+      walk_gimple_seq (body, find_cilk_for_stmt, NULL, &wi);
+      if (cf_info.found)
+	{
+	  if (ind_var)
+	    *ind_var = cf_info.induction_var;
+	  return true;
+	}
+    }
+  return false;
+}
+
+/* Returns the type of the induction variable for the child function for
+   _Cilk_for and the types for _high and _low variables based on TYPE.  */
+
+static tree
+cilk_for_check_loop_diff_type (tree type)
+{
+  if (type == integer_type_node)
+    return type;
+  else if (TYPE_PRECISION (type) <= TYPE_PRECISION (uint32_type_node))
+    { 
+      if (TYPE_UNSIGNED (type)) 
+	return uint32_type_node;
+      else
+	return integer_type_node;
+    }
+  else
+    {
+      if (TYPE_UNSIGNED (type)) 
+	return uint64_type_node;
+      else
+	return long_long_integer_type_node;
+    }
+  gcc_unreachable ();
+}
+
 /* Build a decl for the omp child function.  It'll not contain a body
    yet, just the bare decl.  */
 
 static void
 create_omp_child_function (omp_context *ctx, bool task_copy)
 {
-  tree decl, type, name, t;
+  tree decl, type, name, t, ind_var = NULL_TREE;
 
-  name = create_omp_child_function_name (task_copy);
+  bool is_cilk_for = is_cilk_for_stmt (ctx->stmt, &ind_var);
+  tree cilk_var_type = (is_cilk_for ?
+    cilk_for_check_loop_diff_type (TREE_TYPE (ind_var)) : NULL_TREE);
+  
+  name = create_omp_child_function_name (task_copy, is_cilk_for);
   if (task_copy)
     type = build_function_type_list (void_type_node, ptr_type_node,
 				     ptr_type_node, NULL_TREE);
+  else if (is_cilk_for)
+    type = build_function_type_list (void_type_node, ptr_type_node,
+				     cilk_var_type, cilk_var_type, NULL_TREE);
   else
     type = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
 
@@ -1888,13 +1993,44 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
   DECL_CONTEXT (t) = decl;
   DECL_RESULT (decl) = t;
 
-  t = build_decl (DECL_SOURCE_LOCATION (decl),
-		  PARM_DECL, get_identifier (".omp_data_i"), ptr_type_node);
+  /* _Cilk_for's child function requires two extra parameters called
+     __low and __high that are set by the Cilk runtime when it calls this
+     function.  */
+  if (is_cilk_for)
+    {
+      t = build_decl (DECL_SOURCE_LOCATION (decl),
+		      PARM_DECL, get_identifier ("__high"), cilk_var_type);
+      DECL_ARTIFICIAL (t) = 1;
+      DECL_NAMELESS (t) = 1;
+      DECL_ARG_TYPE (t) = ptr_type_node;
+      DECL_CONTEXT (t) = current_function_decl;
+      TREE_USED (t) = 1;
+      TREE_ADDRESSABLE (t) = 1;
+      DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
+      DECL_ARGUMENTS (decl) = t;
+
+      t = build_decl (DECL_SOURCE_LOCATION (decl),
+		      PARM_DECL, get_identifier ("__low"), cilk_var_type);
+      DECL_ARTIFICIAL (t) = 1;
+      DECL_NAMELESS (t) = 1;
+      DECL_ARG_TYPE (t) = ptr_type_node;
+      DECL_CONTEXT (t) = current_function_decl;
+      TREE_USED (t) = 1;
+      TREE_ADDRESSABLE (t) = 1;
+      DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
+      DECL_ARGUMENTS (decl) = t;
+    }
+
+  tree data_name = get_identifier (".omp_data_i");
+  t = build_decl (DECL_SOURCE_LOCATION (decl), PARM_DECL, data_name,
+		  ptr_type_node);
   DECL_ARTIFICIAL (t) = 1;
   DECL_NAMELESS (t) = 1;
   DECL_ARG_TYPE (t) = ptr_type_node;
   DECL_CONTEXT (t) = current_function_decl;
   TREE_USED (t) = 1;
+  if (is_cilk_for)
+    DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
   DECL_ARGUMENTS (decl) = t;
   if (!task_copy)
     ctx->receiver_decl = t;
@@ -4313,6 +4449,44 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
 			    false, GSI_CONTINUE_LINKING);
 }
 
+/* Insert into basic block BB a call to the Cilk runtime function WS_ARGS[0],
+   built from ENTRY_STMT; WS_ARGS[1] is the grain.  */
+
+static void
+expand_cilk_for_call (basic_block bb, gimple entry_stmt,
+		      vec <tree, va_gc> *ws_args)
+{
+  tree t, t1, t2;
+  gimple_stmt_iterator gsi;
+  vec <tree, va_gc> *args;
+
+  gcc_assert (vec_safe_length (ws_args) == 2);
+  tree func_name = (*ws_args)[0];
+  tree grain = (*ws_args)[1];
+
+  tree clauses = gimple_omp_parallel_clauses (entry_stmt); 
+  tree count = find_omp_clause (clauses, OMP_CLAUSE_IF);
+  gcc_assert (count != NULL_TREE);
+  count = OMP_CLAUSE_IF_EXPR (count);
+  
+  gsi = gsi_last_bb (bb);
+  t = gimple_omp_parallel_data_arg (entry_stmt);
+  if (t == NULL)
+    t1 = null_pointer_node;
+  else
+    t1 = build_fold_addr_expr (t);
+  t2 = build_fold_addr_expr (gimple_omp_parallel_child_fn (entry_stmt));
+  
+  vec_alloc (args, 4);
+  args->quick_push (t2);
+  args->quick_push (t1);
+  args->quick_push (count);
+  args->quick_push (grain);
+  t = build_call_expr_loc_vec (UNKNOWN_LOCATION, func_name, args);
+
+  force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, false, 
+			    GSI_CONTINUE_LINKING);
+}
 
 /* Build the function call to GOMP_task to actually
    generate the task operation.  BB is the block where to insert the code.  */
@@ -4648,7 +4822,38 @@ expand_omp_taskreg (struct omp_region *region)
   entry_bb = region->entry;
   exit_bb = region->exit;
 
-  if (is_combined_parallel (region))
+  /* The way _Cilk_for is constructed in this compiler can be thought of
+     as a parallel omp_for.  But the inner workings between them are very
+     different so we need a way to differentiate between them.  Thus, we
+     added a new schedule type called OMP_CLAUSE_SCHEDULE_CILKFOR, which 
+     pretty much says that this is not a parallel omp for but a _Cilk_for
+     statement.  */
+  bool is_cilk_for =
+    (flag_cilkplus && region->inner &&
+     (region->inner->sched_kind == OMP_CLAUSE_SCHEDULE_CILKFOR));
+
+  /* Extract the __high and __low parameters from the function.  */
+  tree high_arg = NULL_TREE, low_arg = NULL_TREE;
+  if (is_cilk_for)
+    {
+      for (tree ii_arg = DECL_ARGUMENTS (child_fn); ii_arg != NULL_TREE;
+	   ii_arg = TREE_CHAIN (ii_arg))
+	{
+	  if (!strcmp (IDENTIFIER_POINTER (DECL_NAME (ii_arg)), "__high"))
+	    high_arg = ii_arg;
+	  if (!strcmp (IDENTIFIER_POINTER (DECL_NAME (ii_arg)), "__low"))
+	    low_arg = ii_arg;
+	}
+      gcc_assert (high_arg);
+      gcc_assert (low_arg);
+    }
+  
+  if (is_cilk_for) 
+    /* If it is a _Cilk_for statement, it is modelled *like* a parallel for,
+       and the inner statement contains the name of the built-in function
+       and grain.  */
+    ws_args = region->inner->ws_args;
+  else if (is_combined_parallel (region))
     ws_args = region->ws_args;
   else
     ws_args = NULL;
@@ -4755,6 +4960,49 @@ expand_omp_taskreg (struct omp_region *region)
 	    }
 	}
 
+      /* The placeholder calls to GET_NUM_THREADS and GET_THREAD_NUM
+	 emitted by expand_cilk_for are removed here and replaced by the
+	 __low and __high parameter values, respectively.  */
+      gimple high_assign = NULL, low_assign = NULL;
+      if (is_cilk_for)
+	{
+	  gimple_stmt_iterator gsi2 = gsi_start_bb (single_succ (entry_bb));
+	  while (!gsi_end_p (gsi2))
+	    {
+	      gimple stmt = gsi_stmt (gsi2);
+	
+	      if (gimple_call_builtin_p (stmt, BUILT_IN_OMP_GET_NUM_THREADS))
+		{
+		  /* There can only be one call to each of these two
+		     functions.  If there are multiple, then something went
+		     wrong somewhere.  */
+		  gcc_assert (low_assign == NULL);
+		  tree ltype = TREE_TYPE (gimple_get_lhs (stmt));
+		  tree tmp2 = create_tmp_reg (TREE_TYPE (low_arg), NULL);
+		  low_assign = gimple_build_assign 
+		    (gimple_get_lhs (stmt), fold_convert (ltype, tmp2));
+		  gsi_remove (&gsi2, true);
+		  gimple tmp_stmt = gimple_build_assign (tmp2, low_arg);
+		  gsi_insert_before (&gsi2, low_assign, GSI_NEW_STMT);
+		  gsi_insert_before (&gsi2, tmp_stmt, GSI_NEW_STMT);
+		}
+	      else if (gimple_call_builtin_p (stmt,
+					      BUILT_IN_OMP_GET_THREAD_NUM))
+		{
+		  gcc_assert (high_assign == NULL);
+		  tree htype = TREE_TYPE (gimple_get_lhs (stmt));
+		  tree tmp2 = create_tmp_reg (TREE_TYPE (high_arg), NULL);
+		  
+		  high_assign = gimple_build_assign 
+		    (gimple_get_lhs (stmt), fold_convert (htype, tmp2));
+		  gsi_remove (&gsi2, true);
+		  gimple tmp_stmt = gimple_build_assign (tmp2, high_arg);
+		  gsi_insert_before (&gsi2, high_assign, GSI_NEW_STMT);
+		  gsi_insert_before (&gsi2, tmp_stmt, GSI_NEW_STMT);
+		}
+	      gsi_next (&gsi2);
+	    }
+	}      
       /* Declare local variables needed in CHILD_CFUN.  */
       block = DECL_INITIAL (child_fn);
       BLOCK_VARS (block) = vec2chain (child_cfun->local_decls);
@@ -4862,7 +5110,9 @@ expand_omp_taskreg (struct omp_region *region)
     }
 
   /* Emit a library call to launch the children threads.  */
-  if (gimple_code (entry_stmt) == GIMPLE_OMP_PARALLEL)
+  if (is_cilk_for)
+    expand_cilk_for_call (new_bb, entry_stmt, ws_args);
+  else if (gimple_code (entry_stmt) == GIMPLE_OMP_PARALLEL)
     expand_parallel_call (region, new_bb, entry_stmt, ws_args);
   else
     expand_task_call (new_bb, entry_stmt);
@@ -6540,6 +6790,227 @@ expand_omp_for_static_chunk (struct omp_region *region,
     }
 }
 
+/* A subroutine of expand_omp_for.  Generate code for _Cilk_for loop.
+   Given parameters:
+   for (V = N1; V cond N2; V += STEP) BODY;
+
+   where COND is "<" or ">", we generate pseudocode
+
+   for (ind_var = low; ind_var < high; ind_var++)
+     {
+       if (COND is "<")
+         V = N1 + (ind_var * |STEP|);
+       else
+         V = N1 - (ind_var * |STEP|);
+
+       <BODY>
+     }
+
+    In the above pseudocode, low and high are function parameters of the
+    child function.  In the function below, we insert temporary
+    variables initialized by calls to two OMP functions that will not be
+    found in the body of _Cilk_for (since OMP_FOR cannot be mixed
+    with _Cilk_for).  These calls are replaced with low and high
+    by the function that handles taskreg (expand_omp_taskreg).  */
+
+
+static void
+expand_cilk_for (struct omp_region *region, struct omp_for_data *fd)
+{
+  bool broken_loop = region->cont == NULL;
+  tree type = cilk_for_check_loop_diff_type (TREE_TYPE (fd->loop.v));
+  basic_block entry_bb = region->entry;
+  basic_block cont_bb = region->cont;
+  
+  gcc_assert (EDGE_COUNT (entry_bb->succs) == 2);
+  gcc_assert (broken_loop
+	      || BRANCH_EDGE (entry_bb)->dest == FALLTHRU_EDGE (cont_bb)->dest);
+  basic_block l0_bb = FALLTHRU_EDGE (entry_bb)->dest;
+  basic_block l1_bb, l2_bb;
+
+  if (!broken_loop)
+    {
+      gcc_assert (BRANCH_EDGE (cont_bb)->dest == l0_bb);
+      gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
+      l1_bb = split_block (cont_bb, last_stmt (cont_bb))->dest;
+      l2_bb = BRANCH_EDGE (entry_bb)->dest;
+    }
+  else
+    {
+      BRANCH_EDGE (entry_bb)->flags &= ~EDGE_ABNORMAL;
+      l1_bb = split_edge (BRANCH_EDGE (entry_bb));
+      l2_bb = single_succ (l1_bb);
+    }
+  basic_block exit_bb = region->exit;
+  basic_block l2_dom_bb = NULL;
+
+  gimple_stmt_iterator gsi = gsi_last_bb (entry_bb);
+
+  /* Below statements until the "tree high_val = ..." are pseudo statements 
+     used to pass information to be used by expand_omp_taskreg.
+     low_val and high_val will be replaced by the __low and __high
+     parameter from the child function.
+
+     The call_exprs part is a place-holder, it is mainly used 
+     to distinctly identify to the top-level part that this is
+     where we should put low and high (reasoning given in header 
+     comment).  */
+
+  tree t = build_call_expr
+    (builtin_decl_explicit (BUILT_IN_OMP_GET_NUM_THREADS), 0);
+  t = fold_convert (type, t);
+  tree low_val = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+					   GSI_SAME_STMT);
+  t = build_call_expr (builtin_decl_explicit (BUILT_IN_OMP_GET_THREAD_NUM),
+		       0);
+  t = fold_convert (type, t);
+  tree high_val = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+					   GSI_SAME_STMT);
+
+  tree ind_var = create_tmp_reg (type, "__cilk_ind_var");
+  gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_FOR);
+  
+  /* Not needed in SSA form right now.  */
+  gcc_assert (!gimple_in_ssa_p (cfun));
+  if (l2_dom_bb == NULL)
+    l2_dom_bb = l1_bb;
+
+  tree n1 = low_val;
+  tree n2 = high_val;
+  
+  expand_omp_build_assign (&gsi, ind_var, n1);
+
+  /* Remove the GIMPLE_OMP_FOR statement.  */
+  gsi_remove (&gsi, true);
+
+  gimple stmt;
+  if (!broken_loop)
+    {
+      /* Code to control the increment goes in the CONT_BB.  */
+      gsi = gsi_last_bb (cont_bb);
+      stmt = gsi_stmt (gsi);
+      gcc_assert (gimple_code (stmt) == GIMPLE_OMP_CONTINUE);
+      enum tree_code code = PLUS_EXPR;
+      if (POINTER_TYPE_P (type))
+	t = fold_build_pointer_plus (ind_var, build_one_cst (type)); 
+      else
+	t = fold_build2 (code, type, ind_var, build_one_cst (type));
+      expand_omp_build_assign (&gsi, ind_var, t);
+
+      /* Remove GIMPLE_OMP_CONTINUE.  */
+      gsi_remove (&gsi, true);
+    }
+
+  /* Emit the condition in L1_BB.  */
+  gsi = gsi_start_bb (l1_bb);
+
+  tree step = fold_convert (type, fd->loop.step);
+  if ((TREE_CODE (step) == INTEGER_CST && tree_int_cst_sgn (step) < 1)) 
+    step = fold_build1_loc (UNKNOWN_LOCATION, NEGATE_EXPR, type, step);
+
+  tree step_var = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (step_var, 
+					       fold_convert (type, step)), 
+		    GSI_NEW_STMT);
+  t = build2 (MULT_EXPR, type, ind_var, step_var);
+  tree tmp = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp, t), GSI_NEW_STMT);
+
+  tree tmp2 = create_tmp_reg (type, NULL);
+  tree cvtd = fold_convert (type, fd->loop.n1);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp2, cvtd), GSI_NEW_STMT);
+  
+  if (fd->loop.cond_code == GE_EXPR || fd->loop.cond_code == GT_EXPR)
+    t = fold_build2 (MINUS_EXPR, type, tmp2, tmp);
+  else
+    t = fold_build2 (PLUS_EXPR, type, tmp2, tmp);
+
+  tmp = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp, t), GSI_NEW_STMT);
+
+  cvtd = fold_convert (TREE_TYPE (fd->loop.v), tmp);
+  gsi_insert_after (&gsi, gimple_build_assign (fd->loop.v, cvtd), 
+		    GSI_NEW_STMT);
+  
+  t = fold_convert (type, n2);
+  t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
+				false, GSI_CONTINUE_LINKING);
+  /* The condition is always '<' since the runtime will fill in the low
+     and high values.  */
+  t = build2 (LT_EXPR, boolean_type_node, ind_var, t);
+  stmt = gimple_build_cond_empty (t);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+  if (walk_tree (gimple_cond_lhs_ptr (stmt), expand_omp_regimplify_p,
+		 NULL, NULL)
+      || walk_tree (gimple_cond_rhs_ptr (stmt), expand_omp_regimplify_p,
+		    NULL, NULL))
+    {
+      gsi = gsi_for_stmt (stmt);
+      gimple_regimplify_operands (stmt, &gsi);
+    }
+
+  /* Remove GIMPLE_OMP_RETURN.  */
+  gsi = gsi_last_bb (exit_bb);
+  gsi_remove (&gsi, true);
+
+  /* Connect the new blocks.  */
+  remove_edge (FALLTHRU_EDGE (entry_bb));
+
+  edge e, ne;
+  if (!broken_loop)
+    {
+      remove_edge (BRANCH_EDGE (entry_bb));
+      make_edge (entry_bb, l1_bb, EDGE_FALLTHRU);
+
+      e = BRANCH_EDGE (l1_bb);
+      ne = FALLTHRU_EDGE (l1_bb);
+      e->flags = EDGE_TRUE_VALUE;
+    }
+  else
+    {
+      single_succ_edge (entry_bb)->flags = EDGE_FALLTHRU;
+
+      ne = single_succ_edge (l1_bb);
+      e = make_edge (l1_bb, l0_bb, EDGE_TRUE_VALUE);
+
+    }
+  ne->flags = EDGE_FALSE_VALUE;
+  e->probability = REG_BR_PROB_BASE * 7 / 8;
+  ne->probability = REG_BR_PROB_BASE / 8;
+
+  set_immediate_dominator (CDI_DOMINATORS, l1_bb, entry_bb);
+  set_immediate_dominator (CDI_DOMINATORS, l2_bb, l2_dom_bb);
+  set_immediate_dominator (CDI_DOMINATORS, l0_bb, l1_bb);
+
+  if (!broken_loop)
+    {
+      struct loop *loop = alloc_loop ();
+      loop->header = l1_bb;
+      loop->latch = cont_bb;
+      add_loop (loop, l1_bb->loop_father);
+      loop->safelen = INT_MAX;
+    }
+
+  /* Pick the correct library function based on the precision of the
+     induction variable type.  */
+  tree lib_fun = NULL_TREE;
+  if (TYPE_PRECISION (type) == 32)
+    lib_fun = cilk_for_32_fndecl;
+  else if (TYPE_PRECISION (type) == 64)
+    lib_fun = cilk_for_64_fndecl;
+  else
+    gcc_unreachable ();
+
+  gcc_assert (fd->sched_kind == OMP_CLAUSE_SCHEDULE_CILKFOR);
+  
+  /* WS_ARGS contains the library function flavor to call
+     (__cilkrts_cilk_for_64 or __cilkrts_cilk_for_32) and the
+     user-defined grain value.  If the user does not define one, then zero
+     is passed in by the parser.  */
+  vec_alloc (region->ws_args, 2);
+  region->ws_args->quick_push (lib_fun);
+  region->ws_args->quick_push (fd->chunk_size);
+}
 
 /* A subroutine of expand_omp_for.  Generate code for a simd non-worksharing
    loop.  Given parameters:
@@ -6880,6 +7351,8 @@ expand_omp_for (struct omp_region *region, gimple inner_stmt)
 
   if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_KIND_SIMD)
     expand_omp_simd (region, &fd);
+  else if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_KIND_CILKFOR)
+    expand_cilk_for (region, &fd);
   else if (fd.sched_kind == OMP_CLAUSE_SCHEDULE_STATIC
 	   && !fd.have_ordered)
     {
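
For readers following the header comment of expand_cilk_for, here is a
minimal serial sketch of the index mapping in plain C rather than GIMPLE.
The names loop_desc, trip_count, child_fn and run_in_two_chunks are
hypothetical and do not appear in the patch, and the trip-count formula is
an assumption about what the front end computes, not something this hunk
shows.  The point is only that the child loop always counts upward by one
over [low, high) and recomputes V from the iteration number.

```c
/* Hypothetical description of "for (V = n1; V cond n2; V += step)".
   step holds the magnitude of the stride; descending is nonzero when
   the condition is ">" (the loop counts down).  */
struct loop_desc
{
  int n1, n2, step, descending;
};

/* Number of iterations of the original loop.  Assumption: the count
   handed to the runtime is computed along these lines.  */
static int
trip_count (const struct loop_desc *d)
{
  int span = d->descending ? d->n1 - d->n2 : d->n2 - d->n1;
  return span <= 0 ? 0 : (span + d->step - 1) / d->step;
}

/* Model of the outlined child function: the caller supplies a
   half-open range [low, high) of iteration numbers, and V is
   recomputed from the iteration number on every trip.  */
static void
child_fn (const struct loop_desc *d, int *array, int value,
          int low, int high)
{
  for (int ind_var = low; ind_var < high; ind_var++)
    {
      int v = d->descending ? d->n1 - ind_var * d->step
                            : d->n1 + ind_var * d->step;
      array[v] = value;
    }
}

/* Run the loop as two arbitrary chunks, second half first, and return
   how many elements of ARRAY (of length LEN) ended up equal to VALUE.  */
static int
run_in_two_chunks (const struct loop_desc *d, int *array, int len, int value)
{
  int count = trip_count (d);
  int mid = count / 2;
  child_fn (d, array, value, mid, count);
  child_fn (d, array, value, 0, mid);

  int hits = 0;
  for (int i = 0; i < len; i++)
    hits += array[i] == value;
  return hits;
}
```

With n1 = 0, n2 = 10, step = 2 this touches indices 0, 2, 4, 6, 8 no matter
how the range is cut, which is why the child function does not need to know
the user's step as a loop increment.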
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c
new file mode 100644
index 0000000..8b6112b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c
@@ -0,0 +1,87 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <stdio.h>
+#endif
+
+static void check (int *Array, int start, int end, int incr, int value)
+{
+  int ii = 0;
+  for (ii = start;  ii < end; ii = ii + incr)
+    if (Array[ii] != value)
+      __builtin_abort ();
+#if HAVE_IO
+  printf ("Passed\n");
+#endif
+}
+
+static void check_reverse (int *Array, int start, int end, int incr, int value)
+{
+  int ii = 0;
+  for (ii = start; ii >= end; ii = ii - incr)
+    if (Array[ii] != value)
+      __builtin_abort ();
+#if HAVE_IO
+  printf ("Passed\n");
+#endif
+}
+
+
+int main (void)
+{
+  int Array[10];
+  int x = 9, y = 0, z = 3;
+
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array[ii] = 1133;
+  check (Array, 0, 10, 1, 1133);
+
+  _Cilk_for (int ii = 0; ii < 10; ++ii)
+    Array[ii] = 3311;
+  check (Array, 0, 10, 1, 3311);
+
+  _Cilk_for (int ii = 9; ii > -1; ii--)
+    Array[ii] = 4433;
+  check_reverse (Array, 9, 0, 1, 4433);
+
+  _Cilk_for (int ii = 9; ii > -1; --ii)
+    Array[ii] = 9988;
+  check_reverse (Array, 9, 0, 1, 9988);
+
+  _Cilk_for (int ii = 0; ii < 10; ++ii)
+    Array[ii] = 3311;
+  check (Array, 0, 10, 1, 3311);
+
+  _Cilk_for (int ii = 0; ii < 10; ii += 2)
+    Array[ii] = 1328;
+  check (Array, 0, 10, 2, 1328);
+
+  _Cilk_for (int ii = 9; ii >= 0; ii -= 2)
+    Array[ii] = 1738;
+  check_reverse (Array, 9, 0, 2, 1738);
+
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    {
+      if (ii % 2)
+	Array[ii] = 1343;
+      else
+	Array[ii] = 3413;
+    }
+
+  check (Array, 1, 10, 2, 1343); 
+  check (Array, 0, 10, 2, 3413); 
+
+  _Cilk_for (short cc = 0; cc < 10; cc++) 
+    Array[cc] = 1343;
+  check (Array, 0, 10,  1,1343);
+
+  _Cilk_for (short cc = 9; cc >= 0; cc--)
+    Array[cc] = 1348;
+  check_reverse (Array, 9, 0, 1, 1348);
+  return 0;
+}
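
The short-typed loops at the end of cilk-fors.c exercise the widening done
by cilk_for_check_loop_diff_type in omp-low.c.  A rough stand-alone model of
that selection follows.  The enum and both function names are hypothetical
stand-ins for GCC's type nodes, and the sketch assumes int is 32 bits wide
and long long is 64 bits wide.

```c
/* Hypothetical stand-ins for the four GCC type nodes the helper returns.  */
enum cilk_index_type
{
  CILK_INT32,   /* integer_type_node */
  CILK_UINT32,  /* uint32_type_node */
  CILK_INT64,   /* long_long_integer_type_node */
  CILK_UINT64   /* uint64_type_node */
};

/* Pick the type of __low, __high and the child's induction variable
   from the precision (in bits) and signedness of the user's variable.  */
static enum cilk_index_type
cilk_index_type_for (unsigned precision, int is_unsigned)
{
  if (precision <= 32)
    return is_unsigned ? CILK_UINT32 : CILK_INT32;
  return is_unsigned ? CILK_UINT64 : CILK_INT64;
}

/* Width of the runtime entry point chosen for that type: 32 stands for
   __cilkrts_cilk_for_32 and 64 for __cilkrts_cilk_for_64.  */
static int
cilk_runtime_width (enum cilk_index_type t)
{
  return (t == CILK_INT32 || t == CILK_UINT32) ? 32 : 64;
}
```

Under these assumptions a 16-bit signed short maps to the 32-bit signed
flavor, so the "short cc" loops go through the same runtime entry point as
the int loops.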
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c
new file mode 100644
index 0000000..ed73c34
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=c99" { target c } } */
+
+
+int main (void)
+{
+  int q = 0, ii = 0, jj = 0;
+
+  _Cilk_for (int ii; ii < 10; ii++) /* { dg-error "is not initialized" "" { target c } } */
+    /* { dg-error "expected" "" { target c++ } 10 } */
+    q = 5;
+
+  _Cilk_for (; ii < 10; ii++) /* { dg-error "expected iteration declaration" } */
+    q = 2;
+
+  _Cilk_for (int ii = 0; ; ii++) /* { dg-error "missing controlling predicate" } */
+    q = 2;
+
+  _Cilk_for (int ii = 0; ii < 10, jj < 10; ii++)  /* { dg-error "expected ';' before ',' token" "" { target c } } */
+    /* { dg-error "invalid controlling predicate" "" { target c++ }  20 } */
+    q = 5;
+
+  _Cilk_for (int ii = 0; ii < 10; ) /* { dg-error "missing increment" } */
+    q = 5;
+
+
+  _Cilk_for (int ii = 0, jj = 0; ii < 10; ii++) /* { dg-error "expected" } */ 
+    q = 5;
+
+  _Cilk_for (volatile int vii = 0; vii < 10; vii++) /* { dg-error "iteration variable cannot be volatile" } */
+    q = 5;
+
+ 
+  _Cilk_for (static int sii = 0; sii < 10; sii++) /* { dg-error "static" } */
+
+    q = 5;
+
+
+  _Cilk_for (float fii = 3.47; fii < 5.23; fii++) /* { dg-error "invalid type for iteration variable" } */
+    q = 5;
+
+
+  _Cilk_for (int ii = 0; 10 > jj; ii++) /* { dg-error "invalid controlling predicate" } */
+    q = 5;
+
+  _Cilk_for (int ii = 0; ii < 10; ii >> 1) /* { dg-error "invalid increment expression" } */
+    q = 5;
+
+  _Cilk_for (int ii = 10; ii >= 0; ii--) /* This is OK!  */
+    q = 5;
+
+  _Cilk_for (int ii; ii < 10; ii++) /* { dg-error "is not initialized" "" { target c } } */ 
+    /* { dg-error "expected" "" { target c++ }  53 } */
+    q = 5;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c
new file mode 100644
index 0000000..6cb9b03
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c
@@ -0,0 +1,35 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+int grain_value = 2;
+int main (void)
+{
+  int Array1[200], Array1_Serial[200];
+
+  for (int ii = 0; ii < 200; ii++)
+    {
+      Array1_Serial[ii] = 2;
+      Array1[ii] = 1;
+    }
+
+#pragma cilk grainsize = 2
+  _Cilk_for (int ii = 0; ii < 200; ii++)
+    Array1[ii] = 2;
+
+  for (int ii = 0; ii < 200; ii++)
+    if (Array1[ii] != Array1_Serial[ii])
+      return (ii+1);
+
+#pragma cilk grainsize = grain_value
+  _Cilk_for (int ii = 0; ii < 200; ii++) 
+    Array1[ii] = 2;
+
+  for (int ii = 0; ii < 200; ii++)
+    if (Array1[ii] != Array1_Serial[ii])
+      return (ii+1);
+
+  return 0;
+}
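
As a rough illustration of what the grain value means for the call emitted
by expand_cilk_for_call, the sketch below models a runtime entry point
serially.  The argument order (child function, data pointer, count, grain)
follows the patch.  Everything else is an assumption made for illustration
and not a description of libcilkrts: the names model_cilk_for, model_split
and fill_body, the recursive halving, and the fallback used when the grain
is zero.

```c
/* Signature of the outlined child function as built by the patch:
   data pointer first, then __low and __high.  */
typedef void (*child_fn_t) (void *data, int low, int high);

/* Hypothetical helper: split [low, high) in halves until a piece holds
   at most GRAIN iterations, then call BODY on it.  A real runtime
   could run the two halves on different workers.  */
static void
model_split (child_fn_t body, void *data, int low, int high, int grain)
{
  if (high - low <= grain)
    {
      if (low < high)
        body (data, low, high);
      return;
    }
  int mid = low + (high - low) / 2;
  model_split (body, data, low, mid, grain);
  model_split (body, data, mid, high, grain);
}

/* Serial model of the runtime call.  A grain of zero means "not
   specified"; falling back to 1 here is an arbitrary choice.  */
static void
model_cilk_for (child_fn_t body, void *data, int count, int grain)
{
  model_split (body, data, 0, count, grain > 0 ? grain : 1);
}

/* Example child: records how many times it was called and the largest
   chunk it was given, and fills its slice of the array.  */
struct stats
{
  int calls, max_chunk, array[200];
};

static void
fill_body (void *data, int low, int high)
{
  struct stats *s = (struct stats *) data;
  s->calls++;
  if (high - low > s->max_chunk)
    s->max_chunk = high - low;
  for (int ii = low; ii < high; ii++)
    s->array[ii] = 2;
}
```

Calling model_cilk_for (fill_body, &s, 200, 2) on a zero-initialized
struct stats fills all 200 elements while never handing the child more than
two iterations at a time, which is the property the grainsize pragma in the
test above is meant to control.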
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c
new file mode 100644
index 0000000..e1e3217
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus -Wunknown-pragmas" } */
+/* { dg-additional-options "-std=c99" { target c } } */
+
+
+char Array1[26];
+
+#pragma cilk grainsize = 2 /* { dg-error "must be inside a function" } */
+
+int main(int argc, char **argv)
+{
+/* This is OK.  */
+#pragma cilk grainsize = 2
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsize 2 /* { dg-error "expected '=' before numeric constant" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsiz = 2 /* { dg-warning "ignoring #pragma cilk grainsiz" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+
+/* This is OK, it will do a type conversion to long int.  */
+#pragma cilk grainsize = 0.5 
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsize = 1 
+  while (Array1[5] != 0) /* { dg-warning "is not followed by" } */
+    {
+    /* Blah */
+    }
+
+#pragma cilk grainsize = 1 
+  int q = 0; /* { dg-warning "is not followed by" } */
+  _Cilk_for (q = 0; q < 10; q++)
+    Array1[q]  = 5;
+
+  while (Array1[5] != 0)
+    {
+    /* Blah */
+    }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c
new file mode 100644
index 0000000..7a779f7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c
@@ -0,0 +1,41 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+
+/* <feature> loop control variable must have integer, pointer or class type
+   </feature>
+*/
+
+#define ARRAY_SIZE 10000
+int a[ARRAY_SIZE];
+
+int main(void)
+{ 
+  int ii = 0;
+
+#if 1
+  for (ii =0; ii < ARRAY_SIZE; ii++)
+    a[ii] = 5;
+#endif
+  _Cilk_for(int *aa = a; aa < a + ARRAY_SIZE; aa++) 
+    *aa = 0;
+#if 1
+  for (ii = 0; ii < ARRAY_SIZE; ii++) 
+    if (a[ii] != 0) 
+      __builtin_abort ();
+#endif
+
+  _Cilk_for (int *aa = a; aa < a + ARRAY_SIZE; aa = aa + 2)
+    *aa = 4;
+
+#if 1
+  for (ii = 0; ii < ARRAY_SIZE; ii = ii + 2) 
+    if (a[ii] != 4) 
+      __builtin_abort ();
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c
new file mode 100644
index 0000000..cffe17e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c
@@ -0,0 +1,79 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <stdio.h>
+#endif
+
+int main (void)
+{
+  int Array[10][10];
+
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 10; jj++)
+	{
+	  Array[ii][jj] = 0;
+	}
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 0; jj < 5; jj++)
+      Array[ii][jj] = 5;
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 5; jj++)
+      if (Array[ii][jj] != 5)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+
+  /* One goes up and one goes down.  */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 9; jj >= 0; jj--)
+      Array[ii][jj] = 7;
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 9; jj >= 0; jj--)
+      if (Array[ii][jj] != 7)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  /* different step sizes.  */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 0; jj < 10; jj += 2)
+      Array[ii][jj] = 9;
+  
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 10; jj += 2)
+      if (Array[ii][jj] != 9)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  /* different step sizes.  */
+  _Cilk_for (int ii = 0; ii < 10; ii += 2)
+    _Cilk_for (int jj = 5; jj < 9; jj++)
+      Array[ii][jj] = 11; 
+  
+  for (int ii = 0; ii < 10; ii += 2)
+    for (int jj = 5; jj < 9; jj++)
+      if (Array[ii][jj] != 11)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cf3.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cf3.cc
new file mode 100644
index 0000000..8d88c5f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cf3.cc
@@ -0,0 +1,96 @@
+/* { dg-options "-fcilkplus" } */
+
+typedef __PTRDIFF_TYPE__ ptrdiff_t;
+
+template <typename T>
+class I
+{
+public:
+  typedef ptrdiff_t difference_type;
+  I ();
+  ~I ();
+  I (T *);
+  I (const I &);
+  T &operator * ();
+  T *operator -> ();
+  T &operator [] (const difference_type &) const;
+  I &operator = (const I &);
+  I &operator ++ ();
+  I operator ++ (int);
+  I &operator -- ();
+  I operator -- (int);
+  I &operator += (const difference_type &);
+  I &operator -= (const difference_type &);
+  I operator + (const difference_type &) const;
+  I operator - (const difference_type &) const;
+  template <typename S> friend bool operator == (I<S> &, I<S> &);
+  template <typename S> friend bool operator == (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator < (I<S> &, I<S> &);
+  template <typename S> friend bool operator < (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator <= (I<S> &, I<S> &);
+  template <typename S> friend bool operator <= (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator > (I<S> &, I<S> &);
+  template <typename S> friend bool operator > (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator >= (I<S> &, I<S> &);
+  template <typename S> friend bool operator >= (const I<S> &, const I<S> &);
+  template <typename S> friend typename I<S>::difference_type operator - (I<S> &, I<S> &);
+  template <typename S> friend typename I<S>::difference_type operator - (const I<S> &, const I<S> &);
+  template <typename S> friend I<S> operator + (typename I<S>::difference_type , const I<S> &);
+private:
+  T *p;
+};
+template <typename T> I<T>::I () : p (0) {}
+template <typename T> I<T>::~I () {}
+template <typename T> I<T>::I (T *x) : p (x) {}
+template <typename T> I<T>::I (const I &x) : p (x.p) {}
+template <typename T> T &I<T>::operator * () { return *p; }
+template <typename T> T *I<T>::operator -> () { return p; }
+template <typename T> T &I<T>::operator [] (const difference_type &x) const { return p[x]; }
+template <typename T> I<T> &I<T>::operator = (const I &x) { p = x.p; return *this; }
+template <typename T> I<T> &I<T>::operator ++ () { ++p; return *this; }
+template <typename T> I<T> I<T>::operator ++ (int) { return I (p++); }
+template <typename T> I<T> &I<T>::operator -- () { --p; return *this; }
+template <typename T> I<T> I<T>::operator -- (int) { return I (p--); }
+template <typename T> I<T> &I<T>::operator += (const difference_type &x) { p += x; return *this; }
+template <typename T> I<T> &I<T>::operator -= (const difference_type &x) { p -= x; return *this; }
+template <typename T> I<T> I<T>::operator + (const difference_type &x) const { return I (p + x); }
+template <typename T> I<T> I<T>::operator - (const difference_type &x) const { return I (p - x); }
+template <typename T> bool operator == (I<T> &x, I<T> &y) { return x.p == y.p; }
+template <typename T> bool operator == (const I<T> &x, const I<T> &y) { return x.p == y.p; }
+template <typename T> bool operator != (I<T> &x, I<T> &y) { return !(x == y); }
+template <typename T> bool operator != (const I<T> &x, const I<T> &y) { return !(x == y); }
+template <typename T> bool operator < (I<T> &x, I<T> &y) { return x.p < y.p; }
+template <typename T> bool operator < (const I<T> &x, const I<T> &y) { return x.p < y.p; }
+template <typename T> bool operator <= (I<T> &x, I<T> &y) { return x.p <= y.p; }
+template <typename T> bool operator <= (const I<T> &x, const I<T> &y) { return x.p <= y.p; }
+template <typename T> bool operator > (I<T> &x, I<T> &y) { return x.p > y.p; }
+template <typename T> bool operator > (const I<T> &x, const I<T> &y) { return x.p > y.p; }
+template <typename T> bool operator >= (I<T> &x, I<T> &y) { return x.p >= y.p; }
+template <typename T> bool operator >= (const I<T> &x, const I<T> &y) { return x.p >= y.p; }
+template <typename T> typename I<T>::difference_type operator - (I<T> &x, I<T> &y) { return x.p - y.p; }
+template <typename T> typename I<T>::difference_type operator - (const I<T> &x, const I<T> &y) { return x.p - y.p; }
+template <typename T> I<T> operator + (typename I<T>::difference_type x, const I<T> &y) { return I<T> (x + y.p); }
+
+template <typename T>
+class J
+{
+public:
+  J(const I<T> &x, const I<T> &y) : b (x), e (y) {}
+  const I<T> &begin ();
+  const I<T> &end ();
+private:
+  I<T> b, e;
+};
+
+template <typename T> const I<T> &J<T>::begin () { return b; }
+template <typename T> const I<T> &J<T>::end () { return e; }
+
+template <typename T>
+void baz (I<T> &i);
+
+void
+foo (J<int> j)
+{
+  _Cilk_for (I<int> i = j.begin (); i < j.end (); i += 2)
+    baz (i);
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc
new file mode 100644
index 0000000..8221371
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc
@@ -0,0 +1,25 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#define SIZE 100
+#define CHECK_VALUE 5
+
+template <class T>
+int func (T start, T end)
+{
+  int Array[SIZE];
+  _Cilk_for (T ii = 0; ii < end; ii++)
+    Array[ii] = CHECK_VALUE;
+  
+  for (T ii = 0; ii < end; ii++)
+    if (Array[ii] != CHECK_VALUE)
+      __builtin_abort ();
+
+  return 0;
+}
+
+int main (void)
+{
+  return func <int> (0, 100) + func <long> (0, 100);
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc
new file mode 100644
index 0000000..2ac8c72
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc
@@ -0,0 +1,52 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#include <vector>
+#include <cstdio>
+#include <iostream>
+#include <algorithm>
+
+using namespace std;
+
+
+int main(void)
+{
+vector <int> array;
+vector <int> array_serial;
+
+#if 1
+for (int ii = -1; ii < 10; ii++)
+{   
+  array.push_back(ii);
+  array_serial.push_back (ii);
+}
+#endif
+_Cilk_for (vector<int>::iterator iter = array.begin(); iter != array.end();
+          iter++)
+{
+   if (*iter  == 6) 
+     *iter = 13;
+}
+for (vector<int>::iterator iter = array_serial.begin(); 
+     iter != array_serial.end(); iter++)
+{
+   if (*iter  == 6) 
+     *iter = 13;
+}
+sort (array.begin(), array.end());
+sort (array_serial.begin(), array_serial.end());
+
+vector <int>::iterator iter = array.begin ();
+vector <int>::iterator iter_serial = array_serial.begin ();
+
+while (iter != array.end () && iter_serial != array_serial.end ())
+{
+  if (*iter != *iter_serial)
+    __builtin_abort ();
+  iter++;
+  iter_serial++;
+}
+
+return 0;
+}   
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc
new file mode 100644
index 0000000..1cf3301
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc
@@ -0,0 +1,72 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#include <vector>
+#include <cstdio>
+#include <iostream>
+#include <algorithm>
+
+using namespace std;
+
+
+int main(void)
+{
+vector <int> array,array_serial;
+
+#if 1
+for (int ii = -1; ii < 10; ii++)
+{   
+  array.push_back(ii);
+  array_serial.push_back(ii);
+}
+#endif
+_Cilk_for (vector<int>::reverse_iterator iter4 = array.rbegin(); 
+	   iter4 != array.rend(); iter4++)
+{
+  if (*iter4 == 0x8) {
+    *iter4 = 9;
+  }
+}
+
+_Cilk_for (vector<int>::reverse_iterator iter4 = array_serial.rbegin(); 
+	   iter4 != array_serial.rend(); iter4++)
+{
+  if (*iter4 == 0x8) {
+    *iter4 = 9;
+  }
+}
+_Cilk_for (vector<int>::reverse_iterator iter2 = array.rbegin(); 
+	   iter2 != array.rend();
+          iter2 += 1) 
+{
+   if ((*iter2 == 0x4) || (*iter2 == 0x7)) {
+    *iter2 = 0x3;
+   }
+}
+for (vector<int>::reverse_iterator iter2 = array_serial.rbegin(); 
+     iter2 != array_serial.rend();
+          iter2 += 1) 
+{
+   if ((*iter2 == 0x4) || (*iter2 == 0x7)) {
+    *iter2 = 0x3;
+   }
+}
+sort (array.begin(), array.end());
+sort (array_serial.begin(), array_serial.end());
+
+vector <int>::iterator iter = array.begin ();
+vector <int>::iterator iter_serial = array_serial.begin ();
+while (iter != array.end () && iter_serial != array_serial.end ())
+{
+  if (*iter != *iter_serial)
+    __builtin_abort ();
+  iter++;
+  iter_serial++;
+}
+
+return 0;
+}   
+
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc
new file mode 100644
index 0000000..8d2e61e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc
@@ -0,0 +1,50 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#include <iostream>
+#include <cstdio>
+#include <cstdlib>
+#include <vector>
+#include <algorithm>
+#include <list>
+
+using namespace std;
+
+
+int main(int argc, char **argv)
+{
+  vector <int> number_list, number_list_serial;
+  int new_number = 0;
+  int no_elements = 0;
+  
+  if (argc != 2)
+  {
+    no_elements = 10;
+  }
+
+
+  number_list.clear();
+  number_list_serial.clear();
+  for (int ii = 0; ii < no_elements; ii++)
+  {
+    number_list.push_back(new_number);
+    number_list_serial.push_back(new_number);
+  }
+
+  _Cilk_for (int jj = 0; jj < no_elements; jj++)
+  {
+    number_list[jj] = jj + no_elements;
+  }
+  for (int jj = 0; jj < no_elements; jj++)
+  {
+    number_list_serial[jj] = jj + no_elements;
+  }
+
+  for (int jj = 0; jj < no_elements; jj++)
+    if (number_list_serial[jj] != number_list[jj])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index e548a0d..d8c14e3 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -351,6 +351,7 @@ enum omp_clause_schedule_kind {
   OMP_CLAUSE_SCHEDULE_GUIDED,
   OMP_CLAUSE_SCHEDULE_AUTO,
   OMP_CLAUSE_SCHEDULE_RUNTIME,
+  OMP_CLAUSE_SCHEDULE_CILKFOR,
   OMP_CLAUSE_SCHEDULE_LAST
 };
 
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 0595499..91efd9f 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -411,6 +411,9 @@ dump_omp_clause (pretty_printer *buffer, tree clause, int spc, int flags)
 	case OMP_CLAUSE_SCHEDULE_AUTO:
 	  pp_string (buffer, "auto");
 	  break;
+	case OMP_CLAUSE_SCHEDULE_CILKFOR:
+	  pp_string (buffer, "cilk-for grain");
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -2392,6 +2395,12 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
       pp_string (buffer, "#pragma simd");
       goto dump_omp_loop;
 
+    case CILK_FOR:
+      /* This label points one line after dumping the clauses.  
+	 For _Cilk_for the clauses are dumped after the _Cilk_for (...) 
+	 parameters are printed out.  */
+      goto dump_omp_loop_cilk_for;
+
     case OMP_DISTRIBUTE:
       pp_string (buffer, "#pragma omp distribute");
       goto dump_omp_loop;
@@ -2420,6 +2429,8 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
     dump_omp_loop:
       dump_omp_clauses (buffer, OMP_FOR_CLAUSES (node), spc, flags);
 
+    dump_omp_loop_cilk_for:
+
       if (!(flags & TDF_SLIM))
 	{
 	  int i;
@@ -2440,7 +2451,10 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
 		{
 		  spc += 2;
 		  newline_and_indent (buffer, spc);
-		  pp_string (buffer, "for (");
+		  if (TREE_CODE (node) == CILK_FOR)
+		    pp_string (buffer, "_Cilk_for (");
+		  else 
+		    pp_string (buffer, "for (");
 		  dump_generic_node (buffer,
 				     TREE_VEC_ELT (OMP_FOR_INIT (node), i),
 				     spc, flags, false);
@@ -2454,6 +2468,8 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
 				     spc, flags, false);
 		  pp_right_paren (buffer);
 		}
+	      if (TREE_CODE (node) == CILK_FOR) 
+		dump_omp_clauses (buffer, OMP_FOR_CLAUSES (node), spc, flags);
 	    }
 	  if (OMP_FOR_BODY (node))
 	    {
diff --git a/gcc/tree.def b/gcc/tree.def
index f8d6444..558d7c8 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1051,6 +1051,10 @@ DEFTREECODE (OMP_SIMD, "omp_simd", tcc_statement, 6)
    Operands like for OMP_FOR.  */
 DEFTREECODE (CILK_SIMD, "cilk_simd", tcc_statement, 6)
 
+/* Cilk Plus - _Cilk_for (..)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (CILK_FOR, "cilk_for", tcc_statement, 6)
+
 /* OpenMP - #pragma omp distribute [clause1 ... clauseN]
    Operands like for OMP_FOR.  */
 DEFTREECODE (OMP_DISTRIBUTE, "omp_distribute", tcc_statement, 6)


* RE: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-17  6:42                                 ` Iyer, Balaji V
@ 2014-02-19  4:43                                   ` Iyer, Balaji V
  2014-02-19 11:24                                     ` Jakub Jelinek
  0 siblings, 1 reply; 26+ messages in thread
From: Iyer, Balaji V @ 2014-02-19  4:43 UTC (permalink / raw)
  To: 'Jakub Jelinek'
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

[-- Attachment #1: Type: text/plain, Size: 2159 bytes --]

Hi Jakub,
	Please find attached the patch, now including the test case (for1.cc). The patch is otherwise unchanged; the cp-ChangeLog has been updated to reflect the new test case. Is this OK to install?

Thanks,

Balaji V. Iyer.





> -----Original Message-----
> From: Iyer, Balaji V
> Sent: Monday, February 17, 2014 1:42 AM
> To: Jakub Jelinek
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'rth@redhat.com'
> Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
> 
> Hi Jakub,
> 	I still couldn't reproduce the issue you pointed out below, but I have
> fixed the other issues you mentioned. I have also ported the test case
> that you mentioned (for1.C), but I have some questions about the changes
> and would like to confirm with a colleague that what I am doing is
> correct. Monday is a holiday here, so I won't be able to do that until Tuesday.
> In the meantime, I am attaching the fixed patch. Can you please look at it
> and let me know what else I need to change?
> 
> Thanks,
> 
> Balaji V. Iyer.
> 
> > -----Original Message-----
> > From: Jakub Jelinek [mailto:jakub@redhat.com]
> > Sent: Wednesday, February 12, 2014 12:10 PM
> > To: Iyer, Balaji V
> > Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez';
> > 'gcc-patches@gcc.gnu.org'; 'rth@redhat.com'
> > Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> >
> > On Wed, Feb 12, 2014 at 05:04:38PM +0000, Iyer, Balaji V wrote:
> > > I looked at the test code you sent me (cf3.cc) at -O1 and it is
> > > removing all the lines you have shown above.  Yes, I would imagine
> > > -O0 to have code that can be redundant or unnecessary.  Some of it
> > > could be an artifact of internal code insertion.  But isn't the
> > > main job of the instruction scheduler to remove all this redundant
> > > work?
> > > Besides, it is just a function call.  The compiler at -O, -O2 and
> > > -O3 removes the chunk of code that you mentioned.
> >
> > As I said, just change the testcase so that the operator isn't inline,
> > and suddenly even -O3 will not be able to remove the call.
> >
> > 	Jakub
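
The point about non-inline operators can be illustrated with a small sketch. It is not taken from cf3.cc; the type `Iter`, the counter `minus_calls` and the function `sum_distances` are hypothetical names, and the counter merely stands in for side effects the compiler cannot rule out when it does not see the operator's body.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a class iterator used as a loop variable.
struct Iter {
  int *p;
};

// Counts calls so that each one is observable (assumption: models an
// operator whose body the optimiser cannot look into).
static int minus_calls = 0;

// noinline models "the operator isn't inline": every call must be emitted.
__attribute__((noinline)) std::ptrdiff_t operator-(const Iter &a, const Iter &b) {
  ++minus_calls;
  return a.p - b.p;
}

// Evaluates last - first once per repetition, the way redundant generated
// code might; none of these calls can be dropped at any -O level.
std::ptrdiff_t sum_distances(Iter first, Iter last, int reps) {
  assert(reps >= 0);
  std::ptrdiff_t total = 0;
  for (int i = 0; i < reps; ++i)
    total += last - first;
  return total;
}
```

With an inline, side-effect-free operator the repeated calls could be folded into one; here every call stays, which is why redundant operator calls in generated code matter even when optimising.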

[-- Attachment #2: c-ChangeLogs --]
[-- Type: application/octet-stream, Size: 3914 bytes --]

gcc/ChangeLog
2014-02-19  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* cilk-common.c (declare_cilk_for_builtin): New function.
	(cilk_init_builtins): Added two new built-in functions for _Cilk_for
	support.
	* cilk.h (enum cilk_tree_index): Added two new enumerators called
	CILK_TI_F_LOOP_32 and CILK_TI_F_LOOP_64.
	(cilk_for_32_fndecl): New define.
	(cilk_for_64_fndecl): Likewise.
	* gimple-pretty-print.c (dump_gimple_omp_parallel): Added a new
	parameter.  If it is printing a _Cilk_for statement, then do not 
	print OMP's pragmas.
	(dump_gimple_omp_for): Added GF_OMP_FOR_KIND_CILK_FOR.  Printed out
	_Cilk_for statements without the #pragmas.  Also, added NE_EXPR case.
	* tree-pretty-print.c (dump_generic_node): Added CILK_FOR case.
	Print "_Cilk_for" if the node is of type CILK_FOR.
	(dump_omp_clause): Added a new case called OMP_CLAUSE_SCHEDULE_CILKFOR.
	* gimple.h (enum gf_mask): Added new value: GF_OMP_FOR_KIND_CILKFOR.
	Readjusted other values to satisfy the masking rules.
	(gimple_cilk_for_induction_var): New function.
	* gimplify.c (omp_remove_clause): Likewise.
	(gimplify_scan_omp_clauses): Added a new parameter called
	is_cilk_for.  If is_cilk_for is true then do not boolify the
	IF_CLAUSE's expression.
	(gimplify_omp_parallel): Added check to see if we are gimplifying
	a _Cilk_for statement.
	(gimplify_omp_for): Added support to gimplify a _Cilk_for statement.
	(gimplify_expr): Added CILK_FOR case.
	* omp-low.c (extract_omp_for_data): Added a check for CILK_FOR and
	set the schedule kind accordingly.  Added a check for CILK_FOR trees
	wherever CILKSIMD is checked.
	(create_omp_child_function_name): Added a new parameter: is_cilk_for.
	(find_cilk_for_stmt): New function.
	(is_cilk_for_stmt): Likewise.
	(cilk_for_check_loop_diff_type): Likewise.
	(expand_cilk_for_call): Likewise.
	(expand_cilk_for): Likewise.
	(create_omp_child_function): Added support to create _Cilk_for's
	child function by adding two additional parameters.
	(expand_omp_taskreg): Extracted the high and low parameters from the
	child function and set them accordingly in the child function.
	(expand_omp_for): Added a call to expand_cilk_for.
	* tree.def (CILK_FOR): New tree.
	* tree-core.h (enum omp_clause_schedule_kind): Added a new enumerator
	field OMP_CLAUSE_SCHEDULE_CILKFOR.
	* cilk-builtins.def (BUILT_IN_CILK_FOR_32): New built-in function.
	(BUILT_IN_CILK_FOR_64): Likewise.
	
gcc/c-family/ChangeLog
2014-02-19  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-cilkplus.c (find_cilk_for): New function.
	(cilk_for_move_clauses_upward): Likewise.
	* c-common.c (c_common_reswords[]): Added a new field called _Cilk_for.
	* c-common.h (enum rid): Added new enumerator called RID_CILK_FOR.
	* c-omp.c (c_finish_omp_for): Added three new parameters: cinit,
	cend and cstep.  Recorded the initial value, adjusted end value and
	step of a _Cilk_for loop so that the loop-count can be computed.
	* c-pragma.c (init_pragma): Added cilk grainsize pragma.
	* c-pragma.h (enum pragma_kind): Added new enumerator called
	PRAGMA_CILK_GRAINSIZE.

gcc/c/ChangeLog
2014-02-19  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-parser.c (c_parser_statement_after_labels): Added RID_CILK_FOR
	case.
	(c_parser_pragma): Added PRAGMA_CILK_GRAINSIZE case.
	(c_parser_omp_for_loop): Added grain parameter.  Also, modified
	the function to parse _Cilk_for statement.
	(c_parser_cilk_grainsize): New function.
	(c_parser_cilk_simd): Added a new parameter called is_cilk_for.
	Modified the function to handle CILK_FOR.

gcc/testsuite/ChangeLog
2014-02-19  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-c++-common/cilk-plus/CK/cilk-fors.c: New testcase.
	* c-c++-common/cilk-plus/CK/nested_cilk_for.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_grain.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_errors.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c: Likewise.
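
For readers following the c-omp.c and c-parser.c entries, the loop-count arithmetic can be sketched as below. This is only an illustration of the arithmetic visible in those hunks (end adjusted by +1 for <= and by -1 for >=, then a truncating division by the step); the enum `Cond` and the function `cilk_for_count_sketch` are hypothetical names, and plain `long` stands in for the tree types.

```cpp
#include <cassert>

// Hypothetical stand-in for the condition tree codes (LT_EXPR, LE_EXPR, ...).
enum class Cond { LT, LE, GT, GE, NE };

// Sketch: count = (adjusted_end - init) / step, with truncating division.
long cilk_for_count_sketch(long init, long end, long step, Cond cond) {
  assert(step != 0);            // a zero step has no meaningful count
  long adjusted_end = end;
  if (cond == Cond::LE)
    adjusted_end = end + 1;     // "i <= end" covers one more value than "i < end"
  else if (cond == Cond::GE)
    adjusted_end = end - 1;     // likewise for a downward-counting loop
  return (adjusted_end - init) / step;
}
```

Under these assumptions, `_Cilk_for (int ii = 0; ii < 10; ii = ii + 2)` gives a count of 5. Note that the sketch truncates: for `ii < 9` with a step of 2 it yields 4, so any rounding for steps that do not divide the range evenly would have to be handled separately.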

[-- Attachment #3: cp-ChangeLogs --]
[-- Type: application/octet-stream, Size: 1873 bytes --]

gcc/cp/ChangeLog
2014-02-19  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* cp-cilkplus.c (copy_tree_till_cilk_for): New function.
	(find_vars): Likewise.
	(find_killed_vars): Likewise.
	(found_cilk_for_p): Likewise.
	(find_cilk_for_stmt): Likewise.
	(insert_firstpriv_clauses): Likewise.
	(cilk_for_create_bind_expr): Likewise.
	* cp-tree.h (copy_tree_till_cilk_for): New prototype.
	(cilk_for_create_bind_expr): Likewise.
	* parser.c (cp_parser_statement): Added a RID_CILK_FOR case.
	(cp_parser_omp_for_cond): Added a check for CILK_FOR tree along with
	CILK_SIMD tree.
	(cp_parser_omp_for_loop): Added a new parameter: cfor_block.  Added
	support for parsing a _Cilk_for statement.  Removed statements
	between _Cilk_for statement and the #pragma omp parallel to move
	them upward.
	(cp_parser_cilk_grainsize): New function.
	(cp_parser_pragma): Added PRAGMA_CILK_GRAINSIZE pragma.
	(cp_parser_cilk_simd): Added a new parameter called grain.  Added
	support to handle _Cilk_for statement along with #pragma simd.
	* pt.c (tsubst_expr): For _Cilk_for statement, move certain clauses
	upward to #pragma parallel statement.  Added a CILK_FOR case.
	Modified OMP_PARALLEL case to handle _Cilk_for.
	* semantics.c (handle_omp_for_class_iterator): Added a NE_EXPR case.
	(finish_omp_for): Added an IF clause for _Cilk_for statements.
	
gcc/testsuite/ChangeLog
2014-02-19  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-c++-common/cilk-plus/CK/cilk_for_errors.c: Made certain error
	tags C specific and inserted their C++ equivalents.
	* c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c: Likewise.
	* g++.dg/cilk-plus/CK/cilk-for-tplt.cc: New testcase.
	* g++.dg/cilk-plus/CK/cf3.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_iter.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_rev_iter.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_test.cc: Likewise.
	* g++.dg/cilk-plus/CK/for1.cc: Likewise.
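
The iterator test cases (stl_iter.cc, stl_rev_iter.cc) use a `!=` condition on a class iterator. One way to picture how such a loop can be run as a counted loop is the serial sketch below: the trip count is computed once from the iterators, and each index in a chunk is mapped back to an iterator position. This is an assumption-laden emulation, not the code the patch generates; `run_chunk`, `counted_iter_loop` and `demo` are hypothetical names, and the chunks run one after another here instead of in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Process indices [low, high); each index k maps to the iterator first + k * step.
template <typename It, typename Body>
void run_chunk(It first, std::ptrdiff_t step, std::ptrdiff_t low,
               std::ptrdiff_t high, Body body) {
  for (std::ptrdiff_t k = low; k < high; ++k) {
    It it = first + k * step;
    body(it);
  }
}

// Turn "for (it = first; it != last; it += step)" into chunks of at most
// `grain` iterations.  Requires random-access iterators.
template <typename It, typename Body>
void counted_iter_loop(It first, It last, std::ptrdiff_t step,
                       std::ptrdiff_t grain, Body body) {
  assert(step > 0 && grain > 0);
  std::ptrdiff_t count = (last - first) / step;
  for (std::ptrdiff_t low = 0; low < count; low += grain)
    run_chunk(first, step, low, std::min(low + grain, count), body);
}

// Same data and body as stl_iter.cc: replace the element 6 with 13.
std::vector<int> demo() {
  std::vector<int> v;
  for (int ii = -1; ii < 10; ii++)
    v.push_back(ii);
  counted_iter_loop(v.begin(), v.end(), 1, 4,
                    [](std::vector<int>::iterator it) {
                      if (*it == 6)
                        *it = 13;
                    });
  return v;
}
```

Because the position is recomputed from the index, the same helper also accepts reverse iterators such as `v.rbegin()` and `v.rend()`.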

[-- Attachment #4: diff.txt --]
[-- Type: text/plain, Size: 109973 bytes --]

diff --git a/gcc/c-family/c-cilkplus.c b/gcc/c-family/c-cilkplus.c
index 1a16f66..1be12bd 100644
--- a/gcc/c-family/c-cilkplus.c
+++ b/gcc/c-family/c-cilkplus.c
@@ -91,3 +91,52 @@ c_finish_cilk_clauses (tree clauses)
     }
   return clauses;
 }
+
+/* Structure used to pass information into a walk_tree function and
+   find_cilk_for.  */
+struct clause_struct
+{
+  bool is_set;
+  tree clauses;
+};
+
+/* Helper function for walk_tree used in cilk_for_move_clauses_upward.
+   If *TP is a CILK_FOR statement, then set *DATA (type-casted to 
+   struct clause_struct) with its clauses.  */
+
+static tree
+find_cilk_for (tree *tp, int *walk_subtrees, void *data)
+{
+  struct clause_struct *cstruct = (struct clause_struct *) data;
+  if (*tp && TREE_CODE (*tp) == CILK_FOR && !cstruct->is_set)
+    {
+      cstruct->is_set = true;
+      cstruct->clauses = OMP_FOR_CLAUSES (*tp);
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Moves the IF-CLAUSE and SCHEDULE clause from _CILK_FOR statement in
+   STMT into *PARALLEL_CLAUSES.  */
+ 
+void
+cilk_for_move_clauses_upward (tree *parallel_clauses, tree stmt)
+{
+  struct clause_struct cstruct;
+  cstruct.is_set = false;
+  cstruct.clauses = NULL_TREE;
+  walk_tree (&stmt, find_cilk_for, (void *) &cstruct, NULL);
+
+  tree clauses = cstruct.clauses;
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SCHEDULE
+	|| OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IF)
+      {
+	if (*parallel_clauses)
+	  OMP_CLAUSE_CHAIN (*parallel_clauses) = c;
+	else
+	  *parallel_clauses = c;
+      }
+}
+
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index bfc5797..eb6e2fb 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -416,6 +416,7 @@ const struct c_common_resword c_common_reswords[] =
   { "_Complex",		RID_COMPLEX,	0 },
   { "_Cilk_spawn",      RID_CILK_SPAWN, 0 },
   { "_Cilk_sync",       RID_CILK_SYNC,  0 },
+  { "_Cilk_for",        RID_CILK_FOR,   0 },
   { "_Imaginary",	RID_IMAGINARY, D_CONLY },
   { "_Decimal32",       RID_DFLOAT32,  D_CONLY | D_EXT },
   { "_Decimal64",       RID_DFLOAT64,  D_CONLY | D_EXT },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index f074ab1..509490c
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -149,7 +149,7 @@ enum rid
   RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
 
   /* Cilk Plus keywords.  */
-  RID_CILK_SPAWN, RID_CILK_SYNC,
+  RID_CILK_SPAWN, RID_CILK_SYNC, RID_CILK_FOR,
   
   /* Objective-C ("AT" reserved words - they are only keywords when
      they follow '@')  */
@@ -1203,7 +1203,7 @@ extern void c_finish_omp_flush (location_t);
 extern void c_finish_omp_taskwait (location_t);
 extern void c_finish_omp_taskyield (location_t);
 extern tree c_finish_omp_for (location_t, enum tree_code, tree, tree, tree,
-			      tree, tree, tree);
+			      tree, tree, tree, tree *, tree *, tree *);
 extern void c_omp_split_clauses (location_t, enum tree_code, omp_clause_mask,
 				 tree, tree *);
 extern tree c_omp_declare_simd_clauses_to_numbers (tree, tree);
@@ -1389,4 +1389,5 @@ extern tree make_cilk_frame (tree);
 extern tree create_cilk_function_exit (tree, bool, bool);
 extern tree cilk_install_body_pedigree_operations (tree);
 extern void cilk_outline (tree, tree *, void *);
+extern void cilk_for_move_clauses_upward (tree *, tree);
 #endif /* ! GCC_C_COMMON_H */
diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index dd0a45d..0b4259c
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -386,17 +386,19 @@ c_omp_for_incr_canonicalize_ptr (location_t loc, tree decl, tree incr)
    INITV, CONDV and INCRV are vectors containing initialization
    expressions, controlling predicates and increment expressions.
    BODY is the body of the loop and PRE_BODY statements that go before
-   the loop.  */
+   the loop.  *CINIT, *CEND and *CSTEP receive the initial value, adjusted
+   end value and step; they are used solely by a _Cilk_for statement.  */
 
 tree
 c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
-		  tree initv, tree condv, tree incrv, tree body, tree pre_body)
+		  tree initv, tree condv, tree incrv, tree body,
+		  tree pre_body, tree *cinit, tree *cend, tree *cstep)
 {
   location_t elocus;
   bool fail = false;
   int i;
-
-  if (code == CILK_SIMD
+  tree orig_init = NULL_TREE, orig_end = NULL_TREE, orig_step = NULL_TREE;
+  if ((code == CILK_SIMD || code == CILK_FOR) 
       && !c_check_cilk_loop (locus, TREE_VEC_ELT (declv, 0)))
     fail = true;
 
@@ -422,6 +424,8 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	  fail = true;
 	}
 
+      if (TREE_CODE (init) == MODIFY_EXPR)
+	orig_init = TREE_OPERAND (init, 1);
       /* In the case of "for (int i = 0...)", init will be a decl.  It should
 	 have a DECL_INITIAL that we can turn into an assignment.  */
       if (init == decl)
@@ -436,6 +440,7 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      fail = true;
 	    }
 
+	  orig_init = init;
 	  init = build_modify_expr (elocus, decl, NULL_TREE, NOP_EXPR,
 	      			    /* FIXME diagnostics: This should
 				       be the location of the INIT.  */
@@ -526,9 +531,20 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 					    0))
 		    TREE_SET_CODE (cond, TREE_CODE (cond) == NE_EXPR
 					 ? LT_EXPR : GE_EXPR);
-		  else if (code != CILK_SIMD)
+		  else if (code != CILK_SIMD && code != CILK_FOR)
 		    cond_ok = false;
 		}
+	      if (flag_cilkplus && code == CILK_FOR)
+		{ 
+		  orig_end = TREE_OPERAND (cond, 1);
+		  tree add_expr = build_zero_cst (TREE_TYPE (orig_end));
+		  if (TREE_CODE (cond) == LE_EXPR)
+		    add_expr = build_one_cst (TREE_TYPE (orig_end));
+		  else if (TREE_CODE (cond) == GE_EXPR)
+		    add_expr = build_int_cst (TREE_TYPE (orig_end), -1);
+		  orig_end = fold_build2 (PLUS_EXPR, TREE_TYPE (orig_end),
+					  orig_end, add_expr);
+		}
 	    }
 
 	  if (!cond_ok)
@@ -561,6 +577,18 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      if (TREE_OPERAND (incr, 0) != decl)
 		break;
 
+	      if (TREE_CODE (incr) == POSTINCREMENT_EXPR
+		  || TREE_CODE (incr) == PREINCREMENT_EXPR)
+		orig_step = build_one_cst (TREE_TYPE (incr));
+	      else
+		orig_step = integer_minus_one_node;
+ 
+	      if (POINTER_TYPE_P (TREE_TYPE (incr)))
+		{
+		  tree unit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (incr)));
+		  orig_step = fold_build2 (MULT_EXPR, TREE_TYPE (orig_step),
+					   orig_step, unit);
+		}
 	      incr_ok = true;
 	      incr = c_omp_for_incr_canonicalize_ptr (elocus, decl, incr);
 	      break;
@@ -579,14 +607,24 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      if (TREE_CODE (TREE_OPERAND (incr, 1)) == PLUS_EXPR
 		  && (TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl
 		      || TREE_OPERAND (TREE_OPERAND (incr, 1), 1) == decl))
-		incr_ok = true;
+		{
+		  if (TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl)
+		    orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 1);
+		  else
+		    orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 0);
+		  incr_ok = true;
+		}
 	      else if ((TREE_CODE (TREE_OPERAND (incr, 1)) == MINUS_EXPR
 			|| (TREE_CODE (TREE_OPERAND (incr, 1))
 			    == POINTER_PLUS_EXPR))
 		       && TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl)
-		incr_ok = true;
+		{
+		  orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 1);
+		  incr_ok = true;
+		}
 	      else
 		{
+		  orig_step = TREE_OPERAND (incr, 1);
 		  tree t = check_omp_for_incr_expr (elocus,
 						    TREE_OPERAND (incr, 1),
 						    decl);
@@ -609,6 +647,14 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	    }
 	}
 
+      /* These variables could be NULL if an error occurred.  */
+      if (flag_cilkplus && code == CILK_FOR 
+	  && orig_end && orig_init && orig_step)
+	{
+	  *cinit = orig_init;
+	  *cend = orig_end;
+	  *cstep = orig_step;
+	}
       TREE_VEC_ELT (initv, i) = init;
       TREE_VEC_ELT (incrv, i) = incr;
     }
diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index 07d23ac..e0f3561 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -1394,6 +1394,11 @@ init_pragma (void)
 
   cpp_register_deferred_pragma (parse_in, "GCC", "ivdep", PRAGMA_IVDEP, false,
 				false);
+
+  if (flag_cilkplus && !flag_preprocess_only)
+    cpp_register_deferred_pragma (parse_in, "cilk", "grainsize",
+				  PRAGMA_CILK_GRAINSIZE, true, false);
+
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 6f1bf74..b9f09ba 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -55,6 +55,9 @@ typedef enum pragma_kind {
   /* Top level clause to handle all Cilk Plus pragma simd clauses.  */
   PRAGMA_CILK_SIMD,
 
+  /* This pragma handles setting of grainsize for a _Cilk_for.  */
+  PRAGMA_CILK_GRAINSIZE,
+
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
 
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 66625aa..ff2c224
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1248,10 +1248,11 @@ static bool c_parser_objc_diagnose_bad_element_prefix
   (c_parser *, struct c_declspecs *);
 
 /* Cilk Plus supporting routines.  */
-static void c_parser_cilk_simd (c_parser *);
+static void c_parser_cilk_simd (c_parser *, bool, tree);
 static bool c_parser_cilk_verify_simd (c_parser *, enum pragma_context);
 static tree c_parser_array_notation (location_t, c_parser *, tree, tree);
 static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, bool);
+static void c_parser_cilk_grainsize (c_parser *);
 
 /* Parse a translation unit (C90 6.7, C99 6.9).
 
@@ -4878,6 +4879,16 @@ c_parser_statement_after_labels (c_parser *parser)
 	case RID_FOR:
 	  c_parser_for_statement (parser, false);
 	  break;
+	case RID_CILK_FOR:
+	  if (!flag_cilkplus)
+	    {
+	      error_at (c_parser_peek_token (parser)->location,
+			"-fcilkplus must be enabled to use %<_Cilk_for%>");
+	      c_parser_skip_to_end_of_block_or_statement (parser);
+	    }
+	  else
+	    c_parser_cilk_simd (parser, true, integer_zero_node);
+	  break;
 	case RID_CILK_SYNC:
 	  c_parser_consume_token (parser);
 	  c_parser_skip_until_found (parser, CPP_SEMICOLON, "expected %<;%>");
@@ -9496,7 +9507,24 @@ c_parser_pragma (c_parser *parser, enum pragma_context context)
       if (!c_parser_cilk_verify_simd (parser, context))
 	return false;
       c_parser_consume_pragma (parser);
-      c_parser_cilk_simd (parser);
+      c_parser_cilk_simd (parser, false, NULL_TREE);
+      return false;
+    case PRAGMA_CILK_GRAINSIZE:
+      if (!flag_cilkplus)
+	{
+	  warning (0, "%<#pragma grainsize%> ignored because -fcilkplus is not"
+		   " enabled");
+	  c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
+	  return false;
+	}
+      if (context == pragma_external)
+	{
+	  error_at (c_parser_peek_token (parser)->location,
+		    "%<#pragma grainsize%> must be inside a function");
+	  c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
+	  return false;
+	}
+      c_parser_cilk_grainsize (parser);
       return false;
 
     default:
@@ -11591,7 +11619,7 @@ c_parser_omp_flush (c_parser *parser)
 
 static tree
 c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
-		       tree clauses, tree *cclauses)
+		       tree clauses, tree grain, tree *cclauses)
 {
   tree decl, cond, incr, save_break, save_cont, body, init, stmt, cl;
   tree declv, condv, incrv, initv, ret = NULL;
@@ -11599,6 +11627,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   int i, collapse = 1, nbraces = 0;
   location_t for_loc;
   vec<tree, va_gc> *for_block = make_tree_vector ();
+  tree count = NULL_TREE;
 
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
     if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
@@ -11611,11 +11640,18 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   condv = make_tree_vec (collapse);
   incrv = make_tree_vec (collapse);
 
-  if (!c_parser_next_token_is_keyword (parser, RID_FOR))
+  if (code != CILK_FOR
+      && !c_parser_next_token_is_keyword (parser, RID_FOR))
     {
       c_parser_error (parser, "for statement expected");
       return NULL;
     }
+  if (code == CILK_FOR
+      && !c_parser_next_token_is_keyword (parser, RID_CILK_FOR))
+    {
+      c_parser_error (parser, "_Cilk_for statement expected");
+      return NULL;
+    }
   for_loc = c_parser_peek_token (parser)->location;
   c_parser_consume_token (parser);
 
@@ -11693,7 +11729,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 	    case LE_EXPR:
 	      break;
 	    case NE_EXPR:
-	      if (code == CILK_SIMD)
+	      if (code == CILK_SIMD || code == CILK_FOR)
 		break;
 	      /* FALLTHRU.  */
 	    default:
@@ -11826,8 +11862,9 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
      an error from the initialization parsing.  */
   if (!fail)
     {
+      tree cf_init = NULL_TREE, cf_end = NULL_TREE, cf_step = NULL_TREE;
       stmt = c_finish_omp_for (loc, code, declv, initv, condv,
-			       incrv, body, NULL);
+			       incrv, body, NULL, &cf_init, &cf_end, &cf_step);
       if (stmt)
 	{
 	  if (cclauses != NULL
@@ -11867,6 +11904,28 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 		  }
 	    }
 	  OMP_FOR_CLAUSES (stmt) = clauses;
+	  /* If it is a _Cilk_for statement, then the OMP_FOR_CLAUSES location
+	     stores the user-defined grain value or an integer_zero_node 
+	     indicating that the runtime must compute a suitable grain, inside
+	     a SCHEDULE clause.  Similarly the loop-count is also stored in
+	     an IF clause.  These clauses do not make sense for _Cilk_for;
+	     they are used only to transmit information.  */
+	  if (code == CILK_FOR)
+	    {
+	      count = fold_build2 (MINUS_EXPR, TREE_TYPE (cf_end), cf_end,
+				   cf_init);
+	      count = fold_build2 (TRUNC_DIV_EXPR, TREE_TYPE (count), count,
+				   cf_step);
+	      tree l = build_omp_clause (EXPR_LOCATION (grain),
+					 OMP_CLAUSE_SCHEDULE);
+	      OMP_CLAUSE_SCHEDULE_KIND (l) = OMP_CLAUSE_SCHEDULE_CILKFOR;
+	      OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (l) = grain;
+	      OMP_CLAUSE_CHAIN (l) = OMP_FOR_CLAUSES (stmt);
+	      tree c = build_omp_clause (EXPR_LOCATION (count), OMP_CLAUSE_IF);
+	      OMP_CLAUSE_IF_EXPR (c) = count;
+	      OMP_CLAUSE_CHAIN (c) = l;
+	      OMP_FOR_CLAUSES (stmt) = c;
+	    }
 	}
       ret = stmt;
     }
@@ -11931,7 +11990,8 @@ c_parser_omp_simd (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_SIMD, clauses, cclauses);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_SIMD, clauses, NULL_TREE,
+			       cclauses);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -12011,7 +12071,8 @@ c_parser_omp_for (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_FOR, clauses, cclauses);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_FOR, clauses, NULL_TREE,
+			       cclauses);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -12494,7 +12555,8 @@ c_parser_omp_distribute (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_DISTRIBUTE, clauses, NULL);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_DISTRIBUTE, clauses, NULL_TREE,
+			       NULL);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -13771,18 +13833,84 @@ c_parser_cilk_all_clauses (c_parser *parser)
   return c_finish_cilk_clauses (clauses);
 }
 
-/* Main entry point for parsing Cilk Plus <#pragma simd> for
-   loops.  */
+/* Parse the grainsize pragma for a _Cilk_for statement.  The syntax of
+   this pragma is:
+	    #pragma cilk grainsize = <EXP>
+ */
 
 static void
-c_parser_cilk_simd (c_parser *parser)
+c_parser_cilk_grainsize (c_parser *parser)
 {
-  tree clauses = c_parser_cilk_all_clauses (parser);
+  extern tree convert_to_integer (tree, tree);
+
+  /* Consume the 'grainsize' keyword.  */
+  c_parser_consume_pragma (parser);
+
+  if (c_parser_require (parser, CPP_EQ, "expected %<=%>") != 0)
+    {
+      struct c_expr g_expr = c_parser_binary_expression (parser, NULL, NULL);
+      if (g_expr.value && TREE_CODE (g_expr.value) == C_MAYBE_CONST_EXPR)
+	{
+	  error_at (input_location, "cannot convert grain to long integer");
+	  c_parser_skip_to_pragma_eol (parser);
+	}   
+      else if (g_expr.value && g_expr.value != error_mark_node)
+	{
+	  c_parser_skip_to_pragma_eol (parser);
+	  c_token *token = c_parser_peek_token (parser);
+	  if (token && token->type == CPP_KEYWORD
+	      && token->keyword == RID_CILK_FOR)
+	    {
+	      /* Remove EXCESS_PRECISION_EXPR since we are going to convert
+		 it to long int.  */
+	      if (TREE_CODE (g_expr.value) == EXCESS_PRECISION_EXPR)
+		g_expr.value = TREE_OPERAND (g_expr.value, 0);
+	      tree grain = convert_to_integer (long_integer_type_node,
+					       g_expr.value);
+	      if (grain && grain != error_mark_node) 
+		c_parser_cilk_simd (parser, true, grain);
+	    }
+	  else
+	    warning (0, "grainsize pragma is not followed by %<_Cilk_for%>");
+	}
+      else
+	c_parser_skip_to_pragma_eol (parser);
+    }
+  else
+    c_parser_skip_to_pragma_eol (parser);
+}
+
+/* Main entry point for parsing Cilk Plus <#pragma simd> for loops and
+   _Cilk_for loops.  If IS_CILK_FOR is true then it is a _Cilk_for loop
+   and GRAIN is the grain value passed in through the pragma, or 0.  */
+
+static void
+c_parser_cilk_simd (c_parser *parser, bool is_cilk_for, tree grain)
+{
+  tree super_block = NULL_TREE;
+  tree clauses = NULL_TREE;
+  
+  if (!is_cilk_for)
+    clauses = c_parser_cilk_all_clauses (parser);
+  else
+    super_block = c_begin_omp_parallel ();
   tree block = c_begin_compound_stmt (true);
   location_t loc = c_parser_peek_token (parser)->location;
-  c_parser_omp_for_loop (loc, parser, CILK_SIMD, clauses, NULL);
+  enum tree_code code = is_cilk_for ? CILK_FOR : CILK_SIMD;
+  c_parser_omp_for_loop (loc, parser, code, clauses, grain, NULL);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
+  if (is_cilk_for)
+    {
+      /* Move all the clauses from the #pragma omp for to #pragma omp parallel.
+	 This is because if these values are not integers and they are placed
+	 in OMP_FOR then the compiler will insert value chains for them.  */
+      tree parallel_clauses = NULL_TREE;
+      cilk_for_move_clauses_upward (&parallel_clauses, super_block);
+      /* The term super_block is used here in the set-theory sense
+	 (set vs. super-set), not in the scheduling sense.  */
+      c_finish_omp_parallel (loc, parallel_clauses, super_block);
+    }
 }
 \f
 /* Parse a transaction attribute (GCC Extension).
diff --git a/gcc/cilk-builtins.def b/gcc/cilk-builtins.def
index 9f3240a..bf319d5 100644
--- a/gcc/cilk-builtins.def
+++ b/gcc/cilk-builtins.def
@@ -31,3 +31,5 @@ DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_SYNC, "__cilkrts_sync")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_LEAVE_FRAME, "__cilkrts_leave_frame")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_POP_FRAME, "__cilkrts_pop_frame")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_SAVE_FP, "__cilkrts_save_fp_ctrl_state")
+DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_FOR_32, "__cilkrts_cilk_for_32")
+DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_FOR_64, "__cilkrts_cilk_for_64")
diff --git a/gcc/cilk-common.c b/gcc/cilk-common.c
index a6a1aa2..d604651 100644
--- a/gcc/cilk-common.c
+++ b/gcc/cilk-common.c
@@ -105,6 +105,27 @@ install_builtin (const char *name, tree fntype, enum built_in_function code,
   return fndecl;
 }
 
+/* Returns a FUNCTION_DECL of type TYPE whose name is *NAME.  */
+
+static tree 
+declare_cilk_for_builtin (const char *name, tree type, 
+			  enum built_in_function code)
+{
+  tree cb, ft, fn;
+
+  cb = build_function_type_list (void_type_node,
+                                 ptr_type_node, type, type,
+                                 NULL_TREE);
+  cb = build_pointer_type (cb);
+  ft = build_function_type_list (void_type_node,
+                                 cb, ptr_type_node, type,
+                                 integer_type_node, NULL_TREE);
+  fn = install_builtin (name, ft, code, false);
+  TREE_NOTHROW (fn) = 0;
+
+  return fn;
+}
+
 /* Creates and initializes all the built-in Cilk keywords functions and three
    structures: __cilkrts_stack_frame, __cilkrts_pedigree and __cilkrts_worker.
    Detailed information about __cilkrts_stack_frame and
@@ -269,6 +290,14 @@ cilk_init_builtins (void)
   cilk_save_fp_fndecl = install_builtin ("__cilkrts_save_fp_ctrl_state", 
 					 fptr_fun, BUILT_IN_CILK_SAVE_FP,
 					 false);
+  /* __cilkrts_cilk_for_32 (...);  */
+  cilk_for_32_fndecl = declare_cilk_for_builtin ("__cilkrts_cilk_for_32", 
+						 unsigned_intSI_type_node, 
+						 BUILT_IN_CILK_FOR_32);
+  /* __cilkrts_cilk_for_64 (...);  */
+  cilk_for_64_fndecl = declare_cilk_for_builtin ("__cilkrts_cilk_for_64", 
+						 unsigned_intDI_type_node, 
+						 BUILT_IN_CILK_FOR_64);
 }
 
 /* Get the appropriate frame arguments for CALL that is of type CALL_EXPR.  */
diff --git a/gcc/cilk.h b/gcc/cilk.h
index ae96f53..1fee929 100644
--- a/gcc/cilk.h
+++ b/gcc/cilk.h
@@ -40,6 +40,9 @@ enum cilk_tree_index  {
   CILK_TI_F_POP,                      /* __cilkrts_pop_frame (...).  */
   CILK_TI_F_RETHROW,                  /* __cilkrts_rethrow (...).  */
   CILK_TI_F_SAVE_FP,                  /* __cilkrts_save_fp_ctrl_state (...).  */
+  CILK_TI_F_LOOP_32,                  /* __cilkrts_cilk_for_32 (...).  */
+  CILK_TI_F_LOOP_64,                  /* __cilkrts_cilk_for_64 (...).  */
+
   /* __cilkrts_stack_frame struct fields.  */
   CILK_TI_FRAME_FLAGS,                /* stack_frame->flags.  */
   CILK_TI_FRAME_PARENT,               /* stack_frame->parent.  */
@@ -77,6 +80,8 @@ extern GTY (()) tree cilk_trees[CILK_TI_MAX];
 #define cilk_rethrow_fndecl           cilk_trees[CILK_TI_F_RETHROW]
 #define cilk_pop_fndecl               cilk_trees[CILK_TI_F_POP]
 #define cilk_save_fp_fndecl           cilk_trees[CILK_TI_F_SAVE_FP]
+#define cilk_for_32_fndecl            cilk_trees[CILK_TI_F_LOOP_32]
+#define cilk_for_64_fndecl            cilk_trees[CILK_TI_F_LOOP_64]
 
 #define cilk_worker_type_fndecl       cilk_trees[CILK_TI_WORKER_TYPE]
 #define cilk_frame_type_decl          cilk_trees[CILK_TI_FRAME_TYPE]
diff --git a/gcc/cp/cp-cilkplus.c b/gcc/cp/cp-cilkplus.c
index f3a2aff..0825777 100644
--- a/gcc/cp/cp-cilkplus.c
+++ b/gcc/cp/cp-cilkplus.c
@@ -143,3 +143,163 @@ cilk_install_body_with_frame_cleanup (tree fndecl, tree orig_body, void *wd)
 			    &list);
 }
 
+/* Helper function for walk_tree, used by found_cilk_for_p.  Sets *DATA (of
+   type bool) to true if *TP is of type CILK_FOR.  If so, then *WALK_SUBTREES
+   is set to zero.  */
+
+static tree
+find_cilk_for_stmt (tree *tp, int *walk_subtrees, void *data)
+{
+  bool *found = (bool *) data;
+  if (TREE_CODE (*tp) == CILK_FOR)
+    {
+      *found = true;
+      data = (void *) found;
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Returns true if T is of type CILK_FOR or one of its subtrees is of type
+   CILK_FOR.  */
+
+bool
+found_cilk_for_p (tree t)
+{
+  bool found = false;
+  walk_tree (&t, find_cilk_for_stmt, (void *) &found, NULL);
+  return found;
+}
+
+/* Appends to *NEW_STMT_LIST all the statements in *STMT_LIST that do not
+   contain a CILK_FOR statement, and removes them from *STMT_LIST.  */
+
+void
+copy_tree_till_cilk_for (tree *stmt_list, tree *new_stmt_list)
+{
+  gcc_assert (TREE_CODE (*stmt_list) == STATEMENT_LIST);
+  gcc_assert (new_stmt_list != NULL);
+
+  if (*new_stmt_list == NULL_TREE)
+    *new_stmt_list = alloc_stmt_list ();
+
+  tree_stmt_iterator tsi;
+  for (tsi = tsi_start (*stmt_list); !tsi_end_p (tsi);)
+    if (!found_cilk_for_p (tsi_stmt (tsi)))
+      {
+	append_to_statement_list (tsi_stmt (tsi), new_stmt_list); 
+	tsi_delink (&tsi);
+      }
+    else if (TREE_CODE (tsi_stmt (tsi)) == STATEMENT_LIST)
+      {
+	copy_tree_till_cilk_for (tsi_stmt_ptr (tsi), new_stmt_list);
+	tsi_next (&tsi);
+      }
+    else if (TREE_CODE (tsi_stmt (tsi)) == BIND_EXPR)
+      {
+	copy_tree_till_cilk_for (&BIND_EXPR_BODY (tsi_stmt (tsi)),
+				 new_stmt_list);
+	tsi_next (&tsi);
+      }
+    else
+      tsi_next (&tsi);
+}
+
+/* Structure to hold the list of variables that are being killed in a
+   statement list.  This structure is only used in a WALK_TREE function.  */
+struct cilk_for_var_list
+{
+  vec <tree, va_gc> *list;
+};
+
+/* Helper function for WALK_TREE used in find_killed_vars function.
+   Records each variable that is being killed (or set) in *TP.
+   DATA points to the structure that holds the variable list.  */
+
+static tree
+find_vars (tree *tp, int *walk_subtrees, void *data)
+{
+  struct cilk_for_var_list *vlist = (struct cilk_for_var_list *) data;
+
+  if (!tp || !*tp)
+    return NULL_TREE;
+
+  if (TREE_CODE (*tp) == INIT_EXPR || TREE_CODE (*tp) == MODIFY_EXPR)
+    {
+      vec_safe_push (vlist->list, TREE_OPERAND (*tp, 0));
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Returns a vector of trees holding the variables that are
+   killed (i.e. written or set) in STMT_LIST.  */
+
+static vec <tree, va_gc> *
+find_killed_vars (tree stmt_list)
+{
+  struct cilk_for_var_list vlist;
+  memset (&vlist, 0, sizeof (vlist));
+  cp_walk_tree (&stmt_list, find_vars, &vlist, NULL);
+  return vlist.list;
+}
+
+/* Inserts OMP_CLAUSE_FIRSTPRIVATE clauses into *CLAUSES for each variable
+   in LIST.  */
+
+static void
+insert_firstpriv_clauses (vec <tree, va_gc> *list, tree *clauses)
+{
+  if (vec_safe_is_empty (list))
+    return;
+
+  tree lhs;
+  unsigned ix;
+  FOR_EACH_VEC_SAFE_ELT (list, ix, lhs)
+    {
+      tree new_clause = build_omp_clause (EXPR_LOCATION (lhs),
+					  OMP_CLAUSE_FIRSTPRIVATE);
+      OMP_CLAUSE_DECL (new_clause) = lhs;
+      OMP_CLAUSE_CHAIN (new_clause) = *clauses;
+      *clauses = new_clause;
+    }
+}
+
+/* Returns a BIND_EXPR with BIND_EXPR_VARS holding VARS and BIND_EXPR_BODY
+   containing STMT_LIST and CFOR_PAR_LIST.  */
+
+tree
+cilk_for_create_bind_expr (tree vars, tree stmt_list, tree cfor_par_list)
+{
+  gcc_assert (TREE_CODE (stmt_list) == STATEMENT_LIST);
+  tree_stmt_iterator tsi;
+  tree return_expr = make_node (BIND_EXPR);
+  BIND_EXPR_BODY (return_expr) = alloc_stmt_list ();
+  bool found = false; 
+  vec <tree, va_gc> *cfor_vars = find_killed_vars (stmt_list);
+
+  insert_firstpriv_clauses (cfor_vars, &OMP_PARALLEL_CLAUSES (cfor_par_list));
+
+  /* If there is a supplied list of vars then there is no reason to find them 
+     again.  */
+  if (vars != NULL_TREE)
+    found = true;
+
+  BIND_EXPR_VARS (return_expr) = vars;
+  for (tsi = tsi_start (stmt_list); !tsi_end_p (tsi); tsi_next (&tsi))
+    {
+      /* Only do the adding of BIND_EXPR_VARS the first time since they are
+	 already "chained-on."  */
+      if (!found && TREE_CODE (tsi_stmt (tsi)) == DECL_EXPR)
+	{
+	  tree var = DECL_EXPR_DECL (tsi_stmt (tsi));
+	  BIND_EXPR_VARS (return_expr) = var;
+	  found = true;
+	}
+      else
+	append_to_statement_list (tsi_stmt (tsi),
+				  &BIND_EXPR_BODY (return_expr));
+    }
+  append_to_statement_list (cfor_par_list, &BIND_EXPR_BODY (return_expr));
+  return return_expr;
+}
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 7681b27..0fde703 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6206,6 +6206,9 @@ extern void vtv_build_vtable_verify_fndecl      (void);
 
 /* In cp-cilkplus.c.  */
 extern bool cpp_validate_cilk_plus_loop		(tree);
+extern void copy_tree_till_cilk_for             (tree *, tree *);
+extern tree cilk_for_create_bind_expr           (tree, tree, tree);
+extern bool found_cilk_for_p                    (tree);
 
 /* In cp/cp-array-notations.c */
 extern tree expand_array_notation_exprs         (tree);
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 9818213..1530139 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -237,8 +237,8 @@ static void cp_parser_initial_pragma
 static tree cp_literal_operator_id
   (const char *);
 
-static void cp_parser_cilk_simd
-  (cp_parser *, cp_token *);
+static tree cp_parser_cilk_simd
+  (cp_parser *, cp_token *, tree);
 static bool cp_parser_omp_declare_reduction_exprs
   (tree, cp_parser *);
 static tree cp_parser_cilk_simd_vectorlength 
@@ -9368,6 +9368,18 @@ cp_parser_statement (cp_parser* parser, tree in_statement_expr,
 	  statement = cp_parser_iteration_statement (parser, false);
 	  break;
 
+	case RID_CILK_FOR:
+	  if (!flag_cilkplus)
+	    {
+	      error_at (cp_lexer_peek_token (parser->lexer)->location,
+			"-fcilkplus must be enabled to use %<_Cilk_for%>");
+	      cp_lexer_consume_token (parser->lexer);
+	      statement = error_mark_node;
+	    }
+	  else
+	    statement = cp_parser_cilk_simd (parser, NULL, integer_zero_node);
+	  break;
+	  
 	case RID_BREAK:
 	case RID_CONTINUE:
 	case RID_RETURN:
@@ -28833,7 +28845,7 @@ cp_parser_omp_for_cond (cp_parser *parser, tree decl, enum tree_code code)
     case LE_EXPR:
       break;
     case NE_EXPR:
-      if (code == CILK_SIMD)
+      if (code == CILK_SIMD || code == CILK_FOR)
 	break;
       /* Fall through: OpenMP disallows NE_EXPR.  */
     default:
@@ -29129,7 +29141,7 @@ cp_parser_omp_for_loop_init (cp_parser *parser,
 
 static tree
 cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
-			tree *cclauses)
+			tree *cclauses, tree *cfor_block)
 {
   tree init, cond, incr, body, decl, pre_body = NULL_TREE, ret;
   tree real_decl, initv, condv, incrv, declv;
@@ -29158,11 +29170,18 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
       bool add_private_clause = false;
       location_t loc;
 
-      if (!cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
+      if (code == CILK_SIMD
+	  && !cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
 	{
 	  cp_parser_error (parser, "for statement expected");
 	  return NULL;
 	}
+      if (code == CILK_FOR
+	  && !cp_lexer_next_token_is_keyword (parser->lexer, RID_CILK_FOR))
+	{
+	  cp_parser_error (parser, "_Cilk_for statement expected");
+	  return NULL;
+	}
       loc = cp_lexer_consume_token (parser->lexer)->location;
 
       if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
@@ -29171,13 +29190,26 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
       init = decl = real_decl = NULL;
       this_pre_body = push_stmt_list ();
 
+      if (code == CILK_FOR
+	  && cp_lexer_next_token_is_keyword (parser->lexer, RID_STATIC))
+	{
+	  error_at (cp_lexer_peek_token (parser->lexer)->location,
+		    "induction variable cannot be static");
+	  cp_lexer_consume_token (parser->lexer);
+	}
       add_private_clause
 	|= cp_parser_omp_for_loop_init (parser,
-					/*parsing_openmp=*/code != CILK_SIMD,
+					/*parsing_openmp=*/
+					(code != CILK_SIMD && code != CILK_FOR),
 					this_pre_body, for_block,
 					init, decl, real_decl);
 
-      cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
+      if (!cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON)
+	  && code == CILK_FOR)
+	{
+	  cp_parser_skip_to_end_of_statement (parser);
+	  cp_parser_consume_semicolon_at_end_of_statement (parser);
+	}
       if (this_pre_body)
 	{
 	  this_pre_body = pop_stmt_list (this_pre_body);
@@ -29335,7 +29367,7 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
 
   /* Note that we saved the original contents of this flag when we entered
      the structured block, and so we don't need to re-save it here.  */
-  if (code == CILK_SIMD)
+  if (code == CILK_SIMD || code == CILK_FOR)
     parser->in_statement = IN_CILK_SIMD_FOR;
   else
     parser->in_statement = IN_OMP_FOR;
@@ -29376,7 +29408,17 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
     }
 
   while (!for_block->is_empty ())
-    add_stmt (pop_stmt_list (for_block->pop ()));
+    {
+      tree t = pop_stmt_list (for_block->pop ());
+
+      /* Remove all the statements between the head of statement list and
+	 _Cilk_for statement and store them in *cfor_block.  These statements
+	 are hoisted above the #pragma parallel.  */
+      if (!processing_template_decl && code == CILK_FOR && cfor_block != NULL)
+	copy_tree_till_cilk_for (&t, cfor_block);
+      add_stmt (t);
+
+    }
   release_tree_vector (for_block);
 
   return ret;
@@ -29432,7 +29474,7 @@ cp_parser_omp_simd (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_SIMD, clauses, cclauses);
+  ret = cp_parser_omp_for_loop (parser, OMP_SIMD, clauses, cclauses, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -29520,7 +29562,7 @@ cp_parser_omp_for (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_FOR, clauses, cclauses);
+  ret = cp_parser_omp_for_loop (parser, OMP_FOR, clauses, cclauses, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -29992,7 +30034,7 @@ cp_parser_omp_distribute (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_DISTRIBUTE, clauses, NULL);
+  ret = cp_parser_omp_for_loop (parser, OMP_DISTRIBUTE, clauses, NULL, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -31288,6 +31330,38 @@ cp_parser_initial_pragma (cp_token *first_token)
   cp_lexer_get_preprocessor_token (NULL, first_token);
 }
 
+/* Parses the grainsize pragma for the _Cilk_for statement.
+   Syntax:
+   #pragma cilk grainsize = <VALUE>.  */
+
+static void
+cp_parser_cilk_grainsize (cp_parser *parser, cp_token *pragma_tok)
+{
+  if (cp_parser_require (parser, CPP_EQ, RT_EQ))
+    {
+      tree exp = cp_parser_binary_expression (parser, false, false,
+                                              PREC_NOT_OPERATOR, NULL);
+      cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+      if (!exp || exp == error_mark_node)
+        {
+          error_at (pragma_tok->location, "invalid grainsize for _Cilk_for");
+          return;
+        }
+      cp_token *n_tok = cp_lexer_peek_token (parser->lexer);
+
+      /* Make sure the next token is _Cilk_for; it is invalid otherwise.  */
+      if (n_tok && n_tok->type == CPP_KEYWORD
+	  && n_tok->keyword == RID_CILK_FOR) 
+	cp_parser_cilk_simd (parser, NULL, exp);
+      else
+	warning_at (cp_lexer_peek_token (parser->lexer)->location, 0,
+		    "%<#pragma cilk grainsize%> is not followed by "
+		    "%<_Cilk_for%>");
+      return;
+    }
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+}
+
 /* Normal parsing of a pragma token.  Here we can (and must) use the
    regular lexer.  */
 
@@ -31467,9 +31541,30 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context)
 		    "%<#pragma simd%> must be inside a function");
 	  break;
 	}
-      cp_parser_cilk_simd (parser, pragma_tok);
+      cp_parser_cilk_simd (parser, pragma_tok, NULL_TREE);
       return true;
 
+    case PRAGMA_CILK_GRAINSIZE:
+      if (context == pragma_external)
+        {
+          error_at (pragma_tok->location,
+                    "%<#pragma cilk grainsize%> must be inside a function");
+          break;
+        }
+
+      /* Reject the pragma if Cilk Plus is not enabled.  */
+      if (flag_cilkplus)
+        {
+          cp_parser_cilk_grainsize (parser, pragma_tok);
+          return true;
+        }
+      else
+        {
+          error_at (pragma_tok->location, "-fcilkplus must be enabled to use "
+                    "%<#pragma cilk grainsize%>");
+          break;
+	}
+
     default:
       gcc_assert (id >= PRAGMA_FIRST_EXTERNAL);
       c_invoke_pragma_handler (id);
@@ -31787,31 +31882,104 @@ cp_parser_cilk_simd_all_clauses (cp_parser *parser, cp_token *pragma_token)
     return c_finish_cilk_clauses (clauses);
 }
 
-/* Main entry-point for parsing Cilk Plus <#pragma simd> for loops.  */
+/* Main entry-point for parsing Cilk Plus <#pragma simd> for loops and
+   _Cilk_for loops.  This function returns NULL_TREE whenever it parses a
+   <#pragma simd> for loop, because that caller ignores the return value.
+   The caller of _Cilk_for checks this value, so the function returns
+   error_mark_node on error and a valid tree otherwise.  */
 
-static void
-cp_parser_cilk_simd (cp_parser *parser, cp_token *pragma_token)
+static tree
+cp_parser_cilk_simd (cp_parser *parser, cp_token *pragma_token, tree grain)
 {
-  tree clauses = cp_parser_cilk_simd_all_clauses (parser, pragma_token);
-
+  bool is_cilk_for = pragma_token == NULL;
+  
+  tree clauses = NULL_TREE;
+  if (!is_cilk_for)
+    clauses = cp_parser_cilk_simd_all_clauses (parser, pragma_token);
+  
   if (clauses == error_mark_node)
-    return;
+    return NULL_TREE;
   
-  if (cp_lexer_next_token_is_not_keyword (parser->lexer, RID_FOR))
+  if (!is_cilk_for
+      && cp_lexer_next_token_is_not_keyword (parser->lexer, RID_FOR))
     {
       error_at (cp_lexer_peek_token (parser->lexer)->location,
 		"for statement expected");
-      return;
+      return NULL_TREE;
+    }
+  if (is_cilk_for
+      && cp_lexer_next_token_is_not_keyword (parser->lexer, RID_CILK_FOR))
+    {
+      error_at (cp_lexer_peek_token (parser->lexer)->location,
+		"_Cilk_for statement expected");
+      return error_mark_node;
     }
 
+  tree top_block = NULL_TREE, topmost_blk = NULL_TREE;
+  if (is_cilk_for)
+    {
+      topmost_blk = push_stmt_list ();
+      top_block = begin_omp_parallel ();
+    }
+  
   tree sb = begin_omp_structured_block ();
   int save = cp_parser_begin_omp_structured_block (parser);
-  tree ret = cp_parser_omp_for_loop (parser, CILK_SIMD, clauses, NULL);
+   
+  enum tree_code code = is_cilk_for ? CILK_FOR : CILK_SIMD;
+  tree cfor_blk = NULL_TREE;
+  tree ret = cp_parser_omp_for_loop (parser, code, clauses, NULL, &cfor_blk);
   if (ret)
     cpp_validate_cilk_plus_loop (OMP_FOR_BODY (ret));
+  
+  /* For _Cilk_for statements, the grain value is stored in a SCHEDULE
+     clause.  */
+  if (is_cilk_for && ret)
+    {
+      tree l = build_omp_clause (EXPR_LOCATION (grain), OMP_CLAUSE_SCHEDULE);
+      OMP_CLAUSE_SCHEDULE_KIND (l) = OMP_CLAUSE_SCHEDULE_CILKFOR;
+      OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (l) = grain;
+      OMP_CLAUSE_CHAIN (l) = OMP_FOR_CLAUSES (ret);
+      OMP_FOR_CLAUSES (ret) = l;
+    }
   cp_parser_end_omp_structured_block (parser, save);
-  add_stmt (finish_omp_structured_block (sb));
-  return;
+
+  if (!is_cilk_for)
+    {
+      add_stmt (finish_omp_structured_block (sb));
+      return NULL_TREE;
+    }
+
+  tree sb_block = finish_omp_structured_block (sb);
+  tree vars = NULL_TREE, sb_blk_body = sb_block;
+
+  /* For iterators, cfor_blk holds the mapping from the original vector
+     iterators to the integer ones that c_finish_omp_for remaps.  This
+     information must be pushed above the #pragma omp parallel so that
+     the IF clause (which holds the loop count) can use it to compute the
+     loop count.  */
+  if (TREE_CODE (sb_block) == BIND_EXPR && cfor_blk != NULL_TREE)
+    {
+      vars = BIND_EXPR_VARS (sb_block);
+      sb_blk_body = BIND_EXPR_BODY (sb_block);
+    }
+
+  add_stmt (sb_blk_body);
+  tree parallel_clauses = NULL_TREE;
+
+  if (!processing_template_decl)
+    cilk_for_move_clauses_upward (&parallel_clauses, ret);
+  tree stmt = finish_omp_parallel (parallel_clauses, top_block);
+  OMP_PARALLEL_COMBINED (stmt) = 1;
+  topmost_blk = pop_stmt_list (topmost_blk);
+
+  if (cfor_blk != NULL_TREE)
+    {
+      tree bind_expr = cilk_for_create_bind_expr (vars, cfor_blk, topmost_blk);
+      add_stmt (bind_expr);
+      return bind_expr;
+    }
+  add_stmt (topmost_blk);
+  return topmost_blk;
 }
 
 /* Create an identifier for a generic parameter type (a synthesized
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 7967db8..3b52897
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13580,13 +13580,51 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
       break;
 
     case OMP_PARALLEL:
-      tmp = tsubst_omp_clauses (OMP_PARALLEL_CLAUSES (t), false,
-				args, complain, in_decl);
-      stmt = begin_omp_parallel ();
-      RECUR (OMP_PARALLEL_BODY (t));
-      OMP_PARALLEL_COMBINED (finish_omp_parallel (tmp, stmt))
-	= OMP_PARALLEL_COMBINED (t);
-      break;
+      {
+	tmp = tsubst_omp_clauses (OMP_PARALLEL_CLAUSES (t), false,
+				  args, complain, in_decl);
+	
+	tree top_block = NULL_TREE, topmost_blk = NULL_TREE;
+	bool is_cilk_for = false;
+	if (flag_cilkplus && found_cilk_for_p (OMP_PARALLEL_BODY (t)))
+	  {
+	    is_cilk_for = true;
+	    topmost_blk = push_stmt_list ();
+	    top_block = begin_omp_parallel ();
+	  }
+	else
+	  stmt = begin_omp_parallel ();
+    
+	RECUR (OMP_PARALLEL_BODY (t));
+	tree cfor_blk = NULL_TREE;
+	if (is_cilk_for)
+	  {
+	    tree sb_blk_body = top_block;
+	    if (TREE_CODE (sb_blk_body) == BIND_EXPR) 
+	      sb_blk_body = BIND_EXPR_BODY (sb_blk_body);
+
+	    copy_tree_till_cilk_for (&sb_blk_body, &cfor_blk);
+	    cilk_for_move_clauses_upward (&tmp, top_block);
+	    top_block = finish_omp_parallel (tmp, sb_blk_body);
+	  }
+	else
+	  {
+	    stmt = finish_omp_parallel (tmp, stmt);
+	    OMP_PARALLEL_COMBINED (stmt) = OMP_PARALLEL_COMBINED (t);
+	  }
+	if (is_cilk_for)
+	  {
+	    OMP_PARALLEL_COMBINED (top_block) = 1;
+	    topmost_blk = pop_stmt_list (topmost_blk);
+	    if (cfor_blk != NULL_TREE) 
+	      stmt = cilk_for_create_bind_expr (NULL_TREE, cfor_blk, 
+						topmost_blk);
+	    else
+	      stmt = topmost_blk;
+	    add_stmt (stmt);
+	  }	
+      } 
+    break;
 
     case OMP_TASK:
       tmp = tsubst_omp_clauses (OMP_TASK_CLAUSES (t), false,
@@ -13599,6 +13637,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
     case OMP_FOR:
     case OMP_SIMD:
     case CILK_SIMD:
+    case CILK_FOR:
     case OMP_DISTRIBUTE:
       {
 	tree clauses, body, pre_body;
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index eb1c44e..9861c5c
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -6058,6 +6058,7 @@ handle_omp_for_class_iterator (int i, location_t locus, tree declv, tree initv,
     case GE_EXPR:
     case LT_EXPR:
     case LE_EXPR:
+    case NE_EXPR:
       if (TREE_OPERAND (cond, 1) == iter)
 	cond = build2 (swap_tree_comparison (TREE_CODE (cond)),
 		       TREE_TYPE (cond), iter, TREE_OPERAND (cond, 0));
@@ -6470,12 +6471,22 @@ finish_omp_for (location_t locus, enum tree_code code, tree declv, tree initv,
   if (IS_EMPTY_STMT (pre_body))
     pre_body = NULL;
 
+  tree cf_step = NULL_TREE, cf_init = NULL_TREE, cf_end = NULL_TREE;
   omp_for = c_finish_omp_for (locus, code, declv, initv, condv, incrv,
-			      body, pre_body);
-
+			      body, pre_body, &cf_init, &cf_end, &cf_step);
   if (omp_for == NULL)
     return NULL;
 
+  if (code == CILK_FOR && !processing_template_decl)
+    {
+      tree count = fold_build2 (MINUS_EXPR, TREE_TYPE (cf_end), cf_end,
+				cf_init);
+      count = fold_build2 (CEIL_DIV_EXPR, TREE_TYPE (count), count, cf_step);
+      tree c = build_omp_clause (EXPR_LOCATION (count), OMP_CLAUSE_IF);
+      OMP_CLAUSE_IF_EXPR (c) = count;
+      clauses = chainon (clauses, c);
+    }
+
   for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INCR (omp_for)); i++)
     {
       decl = TREE_OPERAND (TREE_VEC_ELT (OMP_FOR_INIT (omp_for), i), 0);
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 2d1e1c7..f87c0cf 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "value-prof.h"
 #include "trans-mem.h"
 
+static void dump_gimple_omp_parallel (pretty_printer *, gimple, int, int,
+				      bool);
 #define INDENT(SPACE)							\
   do { int i; for (i = 0; i < SPACE; i++) pp_space (buffer); } while (0)
 
@@ -1124,6 +1126,10 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	case GF_OMP_FOR_KIND_DISTRIBUTE:
 	  kind = " distribute";
 	  break;
+	case GF_OMP_FOR_KIND_CILKFOR:
+	  gcc_assert (flag_cilkplus);
+	  kind = "";
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -1158,16 +1164,25 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	case GF_OMP_FOR_KIND_DISTRIBUTE:
 	  pp_string (buffer, "#pragma omp distribute");
 	  break;
+	case GF_OMP_FOR_KIND_CILKFOR:
+	  gcc_assert (flag_cilkplus);
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
-      dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags);
+      if (!flag_cilkplus
+	  || gimple_omp_for_kind (gs) != GF_OMP_FOR_KIND_CILKFOR) 
+	dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags);
       for (i = 0; i < gimple_omp_for_collapse (gs); i++)
 	{
 	  if (i)
 	    spc += 2;
 	  newline_and_indent (buffer, spc);
-	  pp_string (buffer, "for (");
+	  if (flag_cilkplus 
+	      && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
+	    pp_string (buffer, "_Cilk_for (");
+	  else
+	    pp_string (buffer, "for (");
 	  dump_generic_node (buffer, gimple_omp_for_index (gs, i), spc,
 			     flags, false);
 	  pp_string (buffer, " = ");
@@ -1192,6 +1207,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	    case GE_EXPR:
 	      pp_greater_equal (buffer);
 	      break;
+	    case NE_EXPR:
+	      pp_string (buffer, "!=");
+	      break;
 	    default:
 	      gcc_unreachable ();
 	    }
@@ -1210,6 +1228,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 
       if (!gimple_seq_empty_p (gimple_omp_body (gs)))
 	{
+	  if (flag_cilkplus
+	      && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR) 
+	    dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags); 
 	  newline_and_indent (buffer, spc + 2);
 	  pp_left_brace (buffer);
 	  pp_newline (buffer);
@@ -1846,7 +1867,7 @@ dump_gimple_phi (pretty_printer *buffer, gimple phi, int spc, bool comment,
 
 static void
 dump_gimple_omp_parallel (pretty_printer *buffer, gimple gs, int spc,
-                          int flags)
+                          int flags, bool is_cilk_for)
 {
   if (flags & TDF_RAW)
     {
@@ -1860,7 +1881,10 @@ dump_gimple_omp_parallel (pretty_printer *buffer, gimple gs, int spc,
   else
     {
       gimple_seq body;
-      pp_string (buffer, "#pragma omp parallel");
+      if (is_cilk_for) 
+	pp_string (buffer, "compiler-inserted clauses for cilk-for body: ");
+      else
+	pp_string (buffer, "#pragma omp parallel");
       dump_omp_clauses (buffer, gimple_omp_parallel_clauses (gs), spc, flags);
       if (gimple_omp_parallel_child_fn (gs))
 	{
@@ -2137,7 +2161,7 @@ pp_gimple_stmt_1 (pretty_printer *buffer, gimple gs, int spc, int flags)
       break;
 
     case GIMPLE_OMP_PARALLEL:
-      dump_gimple_omp_parallel (buffer, gs, spc, flags);
+      dump_gimple_omp_parallel (buffer, gs, spc, flags, false);
       break;
 
     case GIMPLE_OMP_TASK:
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 0e80d2e..194045c 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -91,13 +91,14 @@ enum gf_mask {
     GF_CALL_ALLOCA_FOR_VAR	= 1 << 5,
     GF_CALL_INTERNAL		= 1 << 6,
     GF_OMP_PARALLEL_COMBINED	= 1 << 0,
-    GF_OMP_FOR_KIND_MASK	= 3 << 0,
+    GF_OMP_FOR_KIND_MASK	= 7 << 0,
     GF_OMP_FOR_KIND_FOR		= 0 << 0,
     GF_OMP_FOR_KIND_DISTRIBUTE	= 1 << 0,
     GF_OMP_FOR_KIND_SIMD	= 2 << 0,
     GF_OMP_FOR_KIND_CILKSIMD	= 3 << 0,
-    GF_OMP_FOR_COMBINED		= 1 << 2,
-    GF_OMP_FOR_COMBINED_INTO	= 1 << 3,
+    GF_OMP_FOR_KIND_CILKFOR     = 4 << 0,
+    GF_OMP_FOR_COMBINED		= 1 << 3,
+    GF_OMP_FOR_COMBINED_INTO	= 1 << 4,
     GF_OMP_TARGET_KIND_MASK	= 3 << 0,
     GF_OMP_TARGET_KIND_REGION	= 0 << 0,
     GF_OMP_TARGET_KIND_DATA	= 1 << 0,
@@ -4563,6 +4564,16 @@ gimple_omp_for_set_pre_body (gimple gs, gimple_seq pre_body)
   omp_for_stmt->pre_body = pre_body;
 }
 
+/* Returns the induction variable of type TREE from GS that is of type 
+   GIMPLE_STATEMENT_OMP_FOR.  */
+
+static inline tree
+gimple_cilk_for_induction_var (const_gimple gs)
+{
+  const gimple_statement_omp_for *cilk_for_stmt =
+    as_a <const gimple_statement_omp_for> (gs);
+  return cilk_for_stmt->iter->index;
+}
 
 /* Return the clauses associated with OMP_PARALLEL GS.  */
 
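As a reading aid for the gf_mask change above, here is a minimal sketch of why the kind mask has to grow from 3 to 7 once a fifth loop kind is added. The numeric values are copied from the patched enum; the SK_/sk_ names are hypothetical and exist only for this illustration, they are not part of the patch.

```c
#include <assert.h>

/* Values mirror the GIMPLE_OMP_FOR entries of the patched enum gf_mask.
   The SK_ prefix marks these as illustration-only names.  */
enum sk_gf_mask
{
  SK_FOR_KIND_MASK       = 7 << 0,
  SK_FOR_KIND_FOR        = 0 << 0,
  SK_FOR_KIND_DISTRIBUTE = 1 << 0,
  SK_FOR_KIND_SIMD       = 2 << 0,
  SK_FOR_KIND_CILKSIMD   = 3 << 0,
  SK_FOR_KIND_CILKFOR    = 4 << 0,
  SK_FOR_COMBINED        = 1 << 3,
  SK_FOR_COMBINED_INTO   = 1 << 4
};

/* Extract the loop kind with the widened three-bit mask.  */
static unsigned
sk_for_kind (unsigned subcode)
{
  return subcode & SK_FOR_KIND_MASK;
}

/* Extract the loop kind with the old two-bit mask (3 << 0).  */
static unsigned
sk_for_kind_old_mask (unsigned subcode)
{
  return subcode & (3 << 0);
}

/* Nonzero if the COMBINED flag is set in SUBCODE.  */
static int
sk_for_combined_p (unsigned subcode)
{
  return (subcode & SK_FOR_COMBINED) != 0;
}
```

With the old two-bit mask, kind 4 would read back as kind 0, and the old COMBINED bit (1 << 2) would overlap the new kind value, which is why both flags move up by one bit.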
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index ff341d4..7488563 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -5856,7 +5856,8 @@ omp_check_private (struct gimplify_omp_ctx *ctx, tree decl, bool copyprivate)
 
 static void
 gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
-			   enum omp_region_type region_type)
+			   enum omp_region_type region_type,
+			   bool is_cilk_for)
 {
   struct gimplify_omp_ctx *ctx, *outer_ctx;
   tree c;
@@ -6086,8 +6087,12 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 
 	case OMP_CLAUSE_FINAL:
 	case OMP_CLAUSE_IF:
-	  OMP_CLAUSE_OPERAND (c, 0)
-	    = gimple_boolify (OMP_CLAUSE_OPERAND (c, 0));
+	  /* In _Cilk_for we insert an IF clause as a mechanism to
+	     pass in the count information.  So, there is no reason to
+	     boolify them.  */
+	  if (!is_cilk_for) 
+	    OMP_CLAUSE_OPERAND (c, 0) 
+	      = gimple_boolify (OMP_CLAUSE_OPERAND (c, 0));
 	  /* Fall through.  */
 
 	case OMP_CLAUSE_SCHEDULE:
@@ -6454,6 +6459,21 @@ gimplify_adjust_omp_clauses (tree *list_p)
   delete_omp_context (ctx);
 }
 
+/* Removes the OMP clause C from a list of clauses in *LIST_P.  */
+
+static void
+omp_remove_clause (tree c, tree *list_p)
+{
+  tree ii = NULL_TREE;
+  while ((ii = *list_p) != NULL)
+    {
+      if (simple_cst_equal (ii, c) == 1)
+	*list_p = OMP_CLAUSE_CHAIN (ii);
+      else
+	list_p = &OMP_CLAUSE_CHAIN (ii);
+    }
+}
+
 /* Gimplify the contents of an OMP_PARALLEL statement.  This involves
    gimplification of the body, as well as scanning the body for used
    variables.  We need to do this scan now, because variable-sized
@@ -6465,11 +6485,29 @@ gimplify_omp_parallel (tree *expr_p, gimple_seq *pre_p)
   tree expr = *expr_p;
   gimple g;
   gimple_seq body = NULL;
-
+  bool is_cilk_for = false;
+  tree c = NULL_TREE;
+  for (c = OMP_PARALLEL_CLAUSES (expr); c; c = OMP_CLAUSE_CHAIN (c))
+    if (flag_cilkplus && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SCHEDULE
+	&& OMP_CLAUSE_SCHEDULE_KIND (c) == OMP_CLAUSE_SCHEDULE_CILKFOR)
+      {
+	/* The schedule clause is kept up to this point so that it can
+	   indicate whether this #pragma omp parallel is something a 
+	   _Cilk_for statement inserted.  If so, then indicate
+	   is_cilk_for is true so that the gimplify_scan_omp_clauses does 
+	   not boolify the IF CLAUSE, which stores the count value.  */
+	gcc_assert (flag_cilkplus);
+	is_cilk_for = true;
+	break;
+      } 
+  
+  /* The SCHEDULE clause is not necessary anymore.  */
+  if (is_cilk_for) 
+    omp_remove_clause (c, &OMP_PARALLEL_CLAUSES (expr));
   gimplify_scan_omp_clauses (&OMP_PARALLEL_CLAUSES (expr), pre_p,
 			     OMP_PARALLEL_COMBINED (expr)
 			     ? ORT_COMBINED_PARALLEL
-			     : ORT_PARALLEL);
+			     : ORT_PARALLEL, is_cilk_for);
 
   push_gimplify_context ();
 
@@ -6505,7 +6543,7 @@ gimplify_omp_task (tree *expr_p, gimple_seq *pre_p)
   gimplify_scan_omp_clauses (&OMP_TASK_CLAUSES (expr), pre_p,
 			     find_omp_clause (OMP_TASK_CLAUSES (expr),
 					      OMP_CLAUSE_UNTIED)
-			     ? ORT_UNTIED_TASK : ORT_TASK);
+			     ? ORT_UNTIED_TASK : ORT_TASK, false);
 
   push_gimplify_context ();
 
@@ -6570,8 +6608,9 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 
   simd = TREE_CODE (for_stmt) == OMP_SIMD
     || TREE_CODE (for_stmt) == CILK_SIMD;
-  gimplify_scan_omp_clauses (&OMP_FOR_CLAUSES (for_stmt), pre_p,
-			     simd ? ORT_SIMD : ORT_WORKSHARE);
+  gimplify_scan_omp_clauses (&OMP_FOR_CLAUSES (for_stmt), pre_p,
+			     simd ? ORT_SIMD : ORT_WORKSHARE,
+			     TREE_CODE (for_stmt) == CILK_FOR);
 
   /* Handle OMP_FOR_INIT.  */
   for_pre_body = NULL;
@@ -6627,7 +6666,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
       tree c = NULL_TREE;
       if (orig_for_stmt != for_stmt)
 	/* Do this only on innermost construct for combined ones.  */;
-      else if (simd)
+      else if (simd || TREE_CODE (for_stmt) == CILK_FOR)
 	{
 	  splay_tree_node n = splay_tree_lookup (gimplify_omp_ctxp->variables,
 						 (splay_tree_key)decl);
@@ -6832,6 +6871,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
     case OMP_FOR: kind = GF_OMP_FOR_KIND_FOR; break;
     case OMP_SIMD: kind = GF_OMP_FOR_KIND_SIMD; break;
     case CILK_SIMD: kind = GF_OMP_FOR_KIND_CILKSIMD; break;
+    case CILK_FOR: kind = GF_OMP_FOR_KIND_CILKFOR; break;
     case OMP_DISTRIBUTE: kind = GF_OMP_FOR_KIND_DISTRIBUTE; break;
     default:
       gcc_unreachable ();
@@ -6865,7 +6905,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
       t = TREE_VEC_ELT (OMP_FOR_INCR (for_stmt), i);
       gimple_omp_for_set_incr (gfor, i, TREE_OPERAND (t, 1));
     }
-
+  
   gimplify_seq_add_stmt (pre_p, gfor);
   if (ret != GS_ALL_DONE)
     return GS_ERROR;
@@ -6902,7 +6942,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
     default:
       gcc_unreachable ();
     }
-  gimplify_scan_omp_clauses (&OMP_CLAUSES (expr), pre_p, ort);
+  gimplify_scan_omp_clauses (&OMP_CLAUSES (expr), pre_p, ort, false);
   if (ort == ORT_TARGET || ort == ORT_TARGET_DATA)
     {
       push_gimplify_context ();
@@ -6962,7 +7002,7 @@ gimplify_omp_target_update (tree *expr_p, gimple_seq *pre_p)
   gimple stmt;
 
   gimplify_scan_omp_clauses (&OMP_TARGET_UPDATE_CLAUSES (expr), pre_p,
-			     ORT_WORKSHARE);
+			     ORT_WORKSHARE, false);
   gimplify_adjust_omp_clauses (&OMP_TARGET_UPDATE_CLAUSES (expr));
   stmt = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_UPDATE,
 				  OMP_TARGET_UPDATE_CLAUSES (expr));
@@ -7904,6 +7944,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	case OMP_FOR:
 	case OMP_SIMD:
 	case CILK_SIMD:
+	case CILK_FOR:
 	case OMP_DISTRIBUTE:
 	  ret = gimplify_omp_for (expr_p, pre_p);
 	  break;
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 91c8656..3454dc9
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -71,6 +71,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-prop.h"
 #include "tree-nested.h"
 #include "tree-eh.h"
+#include "cilk.h"
 
 
 /* Lowering of OpenMP parallel and workshare constructs proceeds in two
@@ -198,6 +199,13 @@ struct omp_for_data
   struct omp_for_data_loop *loops;
 };
 
+/* A structure with necessary elements from a _Cilk_for statement.  A
+   pointer to this structure is passed in WALK_STMT_INFO->INFO.  */
+struct cilk_for_info 
+{
+  bool found;
+  tree induction_var;
+};
 
 static splay_tree all_contexts;
 static int taskreg_nesting_level;
@@ -314,6 +322,8 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
   fd->have_ordered = false;
   fd->sched_kind = OMP_CLAUSE_SCHEDULE_STATIC;
   fd->chunk_size = NULL_TREE;
+  if (gimple_omp_for_kind (fd->for_stmt) ==  GF_OMP_FOR_KIND_CILKFOR)
+    fd->sched_kind = OMP_CLAUSE_SCHEDULE_CILKFOR;
   collapse_iter = NULL;
   collapse_count = NULL;
 
@@ -392,7 +402,9 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
 	  break;
 	case NE_EXPR:
 	  gcc_assert (gimple_omp_for_kind (for_stmt)
-		      == GF_OMP_FOR_KIND_CILKSIMD);
+		      == GF_OMP_FOR_KIND_CILKSIMD
+		      || gimple_omp_for_kind (for_stmt)
+		      == GF_OMP_FOR_KIND_CILKFOR);
 	  break;
 	case LE_EXPR:
 	  if (POINTER_TYPE_P (TREE_TYPE (loop->n2)))
@@ -1818,27 +1830,120 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	scan_omp (&OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (c), ctx);
 }
 
-/* Create a new name for omp child function.  Returns an identifier.  */
+/* Create a new name for omp child function.  Returns an identifier.  If 
+   IS_CILK_FOR is true then the suffix for the child function is 
+   "_cilk_for_fn."  */
 
 static tree
-create_omp_child_function_name (bool task_copy)
+create_omp_child_function_name (bool task_copy, bool is_cilk_for)
 {
+  if (is_cilk_for)
+    return clone_function_name (current_function_decl, "_cilk_for_fn");
   return (clone_function_name (current_function_decl,
 			       task_copy ? "_omp_cpyfn" : "_omp_fn"));
 }
 
+/* Helper for walk_gimple_seq.  *GSI_P is the gimple statement iterator
+   passed by walk_gimple_seq and WI->INFO holds the CILK_FOR_INFO
+   structure.  This function sets the values inside this structure if it
+   finds a _Cilk_for statement at *GSI_P.  HANDLED_OPS_P is unused.  */
+
+static tree
+find_cilk_for_stmt (gimple_stmt_iterator *gsi_p,
+		    bool *handled_ops_p ATTRIBUTE_UNUSED,
+		    struct walk_stmt_info *wi)
+{
+  struct cilk_for_info *cf_info = (struct cilk_for_info *) wi->info;
+  gimple stmt = gsi_stmt (*gsi_p);
+
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR
+      && (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_CILKFOR)
+      /* For nested _Cilk_for statements, just look into the
+	 outer-most one.  */
+      && cf_info->found == false)
+    {
+      cf_info->found = true;
+      cf_info->induction_var = gimple_cilk_for_induction_var (stmt);
+    }
+  return NULL_TREE;
+}
+
+/* Returns true if STMT contains a CILK_FOR statement.  If found then
+   populate *IND_VAR with the induction variable of the outermost
+   _Cilk_for.  Otherwise *IND_VAR remains untouched.  IND_VAR can be
+   NULL and if so then nothing is stored through it even when a
+   _Cilk_for is found.  */
+
+static bool
+is_cilk_for_stmt (gimple stmt, tree *ind_var)
+{
+  if (!flag_cilkplus)
+    return false;
+  if (gimple_code (stmt) == GIMPLE_OMP_PARALLEL)
+    stmt = gimple_omp_body (stmt);
+  if (gimple_code (stmt) == GIMPLE_BIND)
+    {
+      gimple_seq body = gimple_bind_body (stmt);
+      struct walk_stmt_info wi;
+      struct cilk_for_info cf_info;
+      memset (&cf_info, 0, sizeof (struct cilk_for_info));
+      memset (&wi, 0, sizeof (wi));
+      wi.info = &cf_info;
+      walk_gimple_seq (body, find_cilk_for_stmt, NULL, &wi);
+      if (cf_info.found)
+	{
+	  if (ind_var)
+	    *ind_var = cf_info.induction_var;
+	  return true;
+	}
+    }
+  return false;
+}
+
+/* Returns the type of the induction variable for the child function for
+   _Cilk_for and the types for __high and __low variables based on TYPE.  */
+
+static tree
+cilk_for_check_loop_diff_type (tree type)
+{
+  if (type == integer_type_node)
+    return type;
+  else if (TYPE_PRECISION (type) <= TYPE_PRECISION (uint32_type_node))
+    { 
+      if (TYPE_UNSIGNED (type)) 
+	return uint32_type_node;
+      else
+	return integer_type_node;
+    }
+  else
+    {
+      if (TYPE_UNSIGNED (type)) 
+	return uint64_type_node;
+      else
+	return long_long_integer_type_node;
+    }
+  gcc_unreachable ();
+}
+
 /* Build a decl for the omp child function.  It'll not contain a body
    yet, just the bare decl.  */
 
 static void
 create_omp_child_function (omp_context *ctx, bool task_copy)
 {
-  tree decl, type, name, t;
+  tree decl, type, name, t, ind_var = NULL_TREE;
 
-  name = create_omp_child_function_name (task_copy);
+  bool is_cilk_for = is_cilk_for_stmt (ctx->stmt, &ind_var);
+  tree cilk_var_type = (is_cilk_for ?
+    cilk_for_check_loop_diff_type (TREE_TYPE (ind_var)) : NULL_TREE);
+  
+  name = create_omp_child_function_name (task_copy, is_cilk_for);
   if (task_copy)
     type = build_function_type_list (void_type_node, ptr_type_node,
 				     ptr_type_node, NULL_TREE);
+  else if (is_cilk_for)
+    type = build_function_type_list (void_type_node, ptr_type_node,
+				     cilk_var_type, cilk_var_type, NULL_TREE);
   else
     type = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
 
@@ -1888,13 +1993,44 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
   DECL_CONTEXT (t) = decl;
   DECL_RESULT (decl) = t;
 
-  t = build_decl (DECL_SOURCE_LOCATION (decl),
-		  PARM_DECL, get_identifier (".omp_data_i"), ptr_type_node);
+  /* _Cilk_for's child function requires two extra parameters called
+     __low and __high that are set by the Cilk runtime when it calls this
+     function.  */
+  if (is_cilk_for)
+    {
+      t = build_decl (DECL_SOURCE_LOCATION (decl),
+		      PARM_DECL, get_identifier ("__high"), cilk_var_type);
+      DECL_ARTIFICIAL (t) = 1;
+      DECL_NAMELESS (t) = 1;
+      DECL_ARG_TYPE (t) = ptr_type_node;
+      DECL_CONTEXT (t) = current_function_decl;
+      TREE_USED (t) = 1;
+      TREE_ADDRESSABLE (t) = 1;
+      DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
+      DECL_ARGUMENTS (decl) = t;
+
+      t = build_decl (DECL_SOURCE_LOCATION (decl),
+		      PARM_DECL, get_identifier ("__low"), cilk_var_type);
+      DECL_ARTIFICIAL (t) = 1;
+      DECL_NAMELESS (t) = 1;
+      DECL_ARG_TYPE (t) = ptr_type_node;
+      DECL_CONTEXT (t) = current_function_decl;
+      TREE_USED (t) = 1;
+      TREE_ADDRESSABLE (t) = 1;
+      DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
+      DECL_ARGUMENTS (decl) = t;
+    }
+
+  tree data_name = get_identifier (".omp_data_i");
+  t = build_decl (DECL_SOURCE_LOCATION (decl), PARM_DECL, data_name,
+		  ptr_type_node);
   DECL_ARTIFICIAL (t) = 1;
   DECL_NAMELESS (t) = 1;
   DECL_ARG_TYPE (t) = ptr_type_node;
   DECL_CONTEXT (t) = current_function_decl;
   TREE_USED (t) = 1;
+  if (is_cilk_for)
+    DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
   DECL_ARGUMENTS (decl) = t;
   if (!task_copy)
     ctx->receiver_decl = t;
@@ -4313,6 +4449,44 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
 			    false, GSI_CONTINUE_LINKING);
 }
 
+/* Insert into basic_block BB a call to the function (*WS_ARGS)[0], passing
+   ENTRY_STMT's child function, data and count plus grain (*WS_ARGS)[1].  */
+
+static void
+expand_cilk_for_call (basic_block bb, gimple entry_stmt,
+		      vec <tree, va_gc> *ws_args)
+{
+  tree t, t1, t2;
+  gimple_stmt_iterator gsi;
+  vec <tree, va_gc> *args;
+
+  gcc_assert (vec_safe_length (ws_args) == 2);
+  tree func_name = (*ws_args)[0];
+  tree grain = (*ws_args)[1];
+
+  tree clauses = gimple_omp_parallel_clauses (entry_stmt); 
+  tree count = find_omp_clause (clauses, OMP_CLAUSE_IF);
+  gcc_assert (count != NULL_TREE);
+  count = OMP_CLAUSE_IF_EXPR (count);
+  
+  gsi = gsi_last_bb (bb);
+  t = gimple_omp_parallel_data_arg (entry_stmt);
+  if (t == NULL)
+    t1 = null_pointer_node;
+  else
+    t1 = build_fold_addr_expr (t);
+  t2 = build_fold_addr_expr (gimple_omp_parallel_child_fn (entry_stmt));
+  
+  vec_alloc (args, 4);
+  args->quick_push (t2);
+  args->quick_push (t1);
+  args->quick_push (count);
+  args->quick_push (grain);
+  t = build_call_expr_loc_vec (UNKNOWN_LOCATION, func_name, args);
+
+  force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, false, 
+			    GSI_CONTINUE_LINKING);
+}
 
 /* Build the function call to GOMP_task to actually
    generate the task operation.  BB is the block where to insert the code.  */
@@ -4648,7 +4822,38 @@ expand_omp_taskreg (struct omp_region *region)
   entry_bb = region->entry;
   exit_bb = region->exit;
 
-  if (is_combined_parallel (region))
+  /* The way _Cilk_for is constructed in this compiler can be thought of
+     as a parallel omp_for.  But the inner workings between them are very
+     different so we need a way to differentiate between them.  Thus, we
+     added a new schedule type called OMP_CLAUSE_SCHEDULE_CILKFOR, which 
+     pretty much says that this is not a parallel omp for but a _Cilk_for
+     statement.  */
+  bool is_cilk_for =
+    (flag_cilkplus && region->inner &&
+     (region->inner->sched_kind == OMP_CLAUSE_SCHEDULE_CILKFOR));
+
+  /* Extract the __high and __low parameter from the function.  */
+  tree high_arg = NULL_TREE, low_arg = NULL_TREE;
+  if (is_cilk_for)
+    {
+      for (tree ii_arg = DECL_ARGUMENTS (child_fn); ii_arg != NULL_TREE;
+	   ii_arg = TREE_CHAIN (ii_arg))
+	{
+	  if (!strcmp (IDENTIFIER_POINTER (DECL_NAME (ii_arg)), "__high"))
+	    high_arg = ii_arg;
+	  if (!strcmp (IDENTIFIER_POINTER (DECL_NAME (ii_arg)), "__low"))
+	    low_arg = ii_arg;
+	}
+      gcc_assert (high_arg);
+      gcc_assert (low_arg);
+    }
+  
+  if (is_cilk_for) 
+    /* If it is a _Cilk_for statement, it is modelled *like* a parallel for,
+       and the inner statement contains the name of the built-in function
+       and grain.  */
+    ws_args = region->inner->ws_args;
+  else if (is_combined_parallel (region))
     ws_args = region->ws_args;
   else
     ws_args = NULL;
@@ -4755,6 +4960,49 @@ expand_omp_taskreg (struct omp_region *region)
 	    }
 	}
 
+      /* In here the calls to the GET_NUM_THREADS and GET_THREAD_NUM are
+	 removed.  Further, they will be replaced by __low and __high
+	 parameter values.  */
+      gimple high_assign = NULL, low_assign = NULL;
+      if (is_cilk_for)
+	{
+	  gimple_stmt_iterator gsi2 = gsi_start_bb (single_succ (entry_bb));
+	  while (!gsi_end_p (gsi2))
+	    {
+	      gimple stmt = gsi_stmt (gsi2);
+	
+	      if (gimple_call_builtin_p (stmt, BUILT_IN_OMP_GET_NUM_THREADS))
+		{
+		  /* There can only be one call to each of these two
+		     functions.  If there are multiple, then something
+		     went wrong somewhere.  */
+		  gcc_assert (low_assign == NULL);
+		  tree ltype = TREE_TYPE (gimple_get_lhs (stmt));
+		  tree tmp2 = create_tmp_reg (TREE_TYPE (low_arg), NULL);
+		  low_assign = gimple_build_assign 
+		    (gimple_get_lhs (stmt), fold_convert (ltype, tmp2));
+		  gsi_remove (&gsi2, true);
+		  gimple tmp_stmt = gimple_build_assign (tmp2, low_arg);
+		  gsi_insert_before (&gsi2, low_assign, GSI_NEW_STMT);
+		  gsi_insert_before (&gsi2, tmp_stmt, GSI_NEW_STMT);
+		}
+	      else if (gimple_call_builtin_p (stmt,
+					      BUILT_IN_OMP_GET_THREAD_NUM))
+		{
+		  gcc_assert (high_assign == NULL);
+		  tree htype = TREE_TYPE (gimple_get_lhs (stmt));
+		  tree tmp2 = create_tmp_reg (TREE_TYPE (high_arg), NULL);
+		  
+		  high_assign = gimple_build_assign 
+		    (gimple_get_lhs (stmt), fold_convert (htype, tmp2));
+		  gsi_remove (&gsi2, true);
+		  gimple tmp_stmt = gimple_build_assign (tmp2, high_arg);
+		  gsi_insert_before (&gsi2, high_assign, GSI_NEW_STMT);
+		  gsi_insert_before (&gsi2, tmp_stmt, GSI_NEW_STMT);
+		}
+	      gsi_next (&gsi2);
+	    }
+	}      
       /* Declare local variables needed in CHILD_CFUN.  */
       block = DECL_INITIAL (child_fn);
       BLOCK_VARS (block) = vec2chain (child_cfun->local_decls);
@@ -4862,7 +5110,9 @@ expand_omp_taskreg (struct omp_region *region)
     }
 
   /* Emit a library call to launch the children threads.  */
-  if (gimple_code (entry_stmt) == GIMPLE_OMP_PARALLEL)
+  if (is_cilk_for)
+    expand_cilk_for_call (new_bb, entry_stmt, ws_args);
+  else if (gimple_code (entry_stmt) == GIMPLE_OMP_PARALLEL)
     expand_parallel_call (region, new_bb, entry_stmt, ws_args);
   else
     expand_task_call (new_bb, entry_stmt);
@@ -6540,6 +6790,227 @@ expand_omp_for_static_chunk (struct omp_region *region,
     }
 }
 
+/* A subroutine of expand_omp_for.  Generate code for a _Cilk_for loop.
+   Given parameters:
+   for (V = N1; V cond N2; V += STEP) BODY;
+
+   where COND is "<" or ">", we generate pseudocode
+
+   for (ind_var = low; ind_var < high; ind_var++)
+   {
+      if (COND is ">")
+        V = N1 - (ind_var * STEP);
+      else
+        V = N1 + (ind_var * STEP);
+
+      <BODY>
+   }
+
+    In the above pseudocode, a negative constant STEP is negated first, and
+    low and high are parameters of the child function.  The function below
+    emits placeholder calls to two OMP builtins that cannot otherwise appear
+    in the body of a _Cilk_for (since OMP_FOR cannot be mixed with
+    _Cilk_for).  The function that handles taskreg (expand_omp_taskreg)
+    replaces these calls with low and high.  */
+
+
+static void
+expand_cilk_for (struct omp_region *region, struct omp_for_data *fd)
+{
+  bool broken_loop = region->cont == NULL;
+  tree type = cilk_for_check_loop_diff_type (TREE_TYPE (fd->loop.v));
+  basic_block entry_bb = region->entry;
+  basic_block cont_bb = region->cont;
+  
+  gcc_assert (EDGE_COUNT (entry_bb->succs) == 2);
+  gcc_assert (broken_loop
+	      || BRANCH_EDGE (entry_bb)->dest == FALLTHRU_EDGE (cont_bb)->dest);
+  basic_block l0_bb = FALLTHRU_EDGE (entry_bb)->dest;
+  basic_block l1_bb, l2_bb;
+
+  if (!broken_loop)
+    {
+      gcc_assert (BRANCH_EDGE (cont_bb)->dest == l0_bb);
+      gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
+      l1_bb = split_block (cont_bb, last_stmt (cont_bb))->dest;
+      l2_bb = BRANCH_EDGE (entry_bb)->dest;
+    }
+  else
+    {
+      BRANCH_EDGE (entry_bb)->flags &= ~EDGE_ABNORMAL;
+      l1_bb = split_edge (BRANCH_EDGE (entry_bb));
+      l2_bb = single_succ (l1_bb);
+    }
+  basic_block exit_bb = region->exit;
+  basic_block l2_dom_bb = NULL;
+
+  gimple_stmt_iterator gsi = gsi_last_bb (entry_bb);
+
+  /* Below statements until the "tree high_val = ..." are pseudo statements 
+     used to pass information to be used by expand_omp_taskreg.
+     low_val and high_val will be replaced by the __low and __high
+     parameter from the child function.
+
+     The call_exprs part is a place-holder, it is mainly used 
+     to distinctly identify to the top-level part that this is
+     where we should put low and high (reasoning given in header 
+     comment).  */
+
+  tree t = build_call_expr
+    (builtin_decl_explicit (BUILT_IN_OMP_GET_NUM_THREADS), 0);
+  t = fold_convert (type, t);
+  tree low_val = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+					   GSI_SAME_STMT);
+  t = build_call_expr (builtin_decl_explicit (BUILT_IN_OMP_GET_THREAD_NUM),
+		       0);
+  t = fold_convert (type, t);
+  tree high_val = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+					   GSI_SAME_STMT);
+
+  tree ind_var = create_tmp_reg (type, "__cilk_ind_var");
+  gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_FOR);
+  
+  /* Not needed in SSA form right now.  */
+  gcc_assert (!gimple_in_ssa_p (cfun));
+  if (l2_dom_bb == NULL)
+    l2_dom_bb = l1_bb;
+
+  tree n1 = low_val;
+  tree n2 = high_val;
+  
+  expand_omp_build_assign (&gsi, ind_var, n1);
+
+  /* Remove the GIMPLE_OMP_FOR statement.  */
+  gsi_remove (&gsi, true);
+
+  gimple stmt;
+  if (!broken_loop)
+    {
+      /* Code to control the increment goes in the CONT_BB.  */
+      gsi = gsi_last_bb (cont_bb);
+      stmt = gsi_stmt (gsi);
+      gcc_assert (gimple_code (stmt) == GIMPLE_OMP_CONTINUE);
+      enum tree_code code = PLUS_EXPR;
+      if (POINTER_TYPE_P (type))
+	t = fold_build_pointer_plus (ind_var, build_one_cst (type)); 
+      else
+	t = fold_build2 (code, type, ind_var, build_one_cst (type));
+      expand_omp_build_assign (&gsi, ind_var, t);
+
+      /* Remove GIMPLE_OMP_CONTINUE.  */
+      gsi_remove (&gsi, true);
+    }
+
+  /* Emit the condition in L1_BB.  */
+  gsi = gsi_start_bb (l1_bb);
+
+  tree step = fold_convert (type, fd->loop.step);
+  if ((TREE_CODE (step) == INTEGER_CST && tree_int_cst_sgn (step) < 1)) 
+    step = fold_build1_loc (UNKNOWN_LOCATION, NEGATE_EXPR, type, step);
+
+  tree step_var = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (step_var, 
+					       fold_convert (type, step)), 
+		    GSI_NEW_STMT);
+  t = build2 (MULT_EXPR, type, ind_var, step_var);
+  tree tmp = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp, t), GSI_NEW_STMT);
+
+  tree tmp2 = create_tmp_reg (type, NULL);
+  tree cvtd = fold_convert (type, fd->loop.n1);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp2, cvtd), GSI_NEW_STMT);
+  
+  if (fd->loop.cond_code == GE_EXPR || fd->loop.cond_code == GT_EXPR)
+    t = fold_build2 (MINUS_EXPR, type, tmp2, tmp);
+  else
+    t = fold_build2 (PLUS_EXPR, type, tmp2, tmp);
+
+  tmp = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp, t), GSI_NEW_STMT);
+
+  cvtd = fold_convert (TREE_TYPE (fd->loop.v), tmp);
+  gsi_insert_after (&gsi, gimple_build_assign (fd->loop.v, cvtd), 
+		    GSI_NEW_STMT);
+  
+  t = fold_convert (type, n2);
+  t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
+				false, GSI_CONTINUE_LINKING);
+  /* The condition is always '<' since the runtime will fill in the low
+     and high values.  */
+  t = build2 (LT_EXPR, boolean_type_node, ind_var, t);
+  stmt = gimple_build_cond_empty (t);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+  if (walk_tree (gimple_cond_lhs_ptr (stmt), expand_omp_regimplify_p,
+		 NULL, NULL)
+      || walk_tree (gimple_cond_rhs_ptr (stmt), expand_omp_regimplify_p,
+		    NULL, NULL))
+    {
+      gsi = gsi_for_stmt (stmt);
+      gimple_regimplify_operands (stmt, &gsi);
+    }
+
+  /* Remove GIMPLE_OMP_RETURN.  */
+  gsi = gsi_last_bb (exit_bb);
+  gsi_remove (&gsi, true);
+
+  /* Connect the new blocks.  */
+  remove_edge (FALLTHRU_EDGE (entry_bb));
+
+  edge e, ne;
+  if (!broken_loop)
+    {
+      remove_edge (BRANCH_EDGE (entry_bb));
+      make_edge (entry_bb, l1_bb, EDGE_FALLTHRU);
+
+      e = BRANCH_EDGE (l1_bb);
+      ne = FALLTHRU_EDGE (l1_bb);
+      e->flags = EDGE_TRUE_VALUE;
+    }
+  else
+    {
+      single_succ_edge (entry_bb)->flags = EDGE_FALLTHRU;
+
+      ne = single_succ_edge (l1_bb);
+      e = make_edge (l1_bb, l0_bb, EDGE_TRUE_VALUE);
+
+    }
+  ne->flags = EDGE_FALSE_VALUE;
+  e->probability = REG_BR_PROB_BASE * 7 / 8;
+  ne->probability = REG_BR_PROB_BASE / 8;
+
+  set_immediate_dominator (CDI_DOMINATORS, l1_bb, entry_bb);
+  set_immediate_dominator (CDI_DOMINATORS, l2_bb, l2_dom_bb);
+  set_immediate_dominator (CDI_DOMINATORS, l0_bb, l1_bb);
+
+  if (!broken_loop)
+    {
+      struct loop *loop = alloc_loop ();
+      loop->header = l1_bb;
+      loop->latch = cont_bb;
+      add_loop (loop, l1_bb->loop_father);
+      loop->safelen = INT_MAX;
+    }
+
+  /* Pick the correct library function based on the precision of the
+     induction variable type.  */
+  tree lib_fun = NULL_TREE;
+  if (TYPE_PRECISION (type) == 32)
+    lib_fun = cilk_for_32_fndecl;
+  else if (TYPE_PRECISION (type) == 64)
+    lib_fun = cilk_for_64_fndecl;
+  else
+    gcc_unreachable ();
+
+  gcc_assert (fd->sched_kind == OMP_CLAUSE_SCHEDULE_CILKFOR);
+  
+  /* WS_ARGS contains the library function flavor to call
+     (__libcilkrts_cilk_for_64 or __libcilkrts_cilk_for_32) and the
+     user-defined grain value.  If the user does not define one, then zero
+     is passed in by the parser.  */
+  vec_alloc (region->ws_args, 2);
+  region->ws_args->quick_push (lib_fun);
+  region->ws_args->quick_push (fd->chunk_size);
+}
 
 /* A subroutine of expand_omp_for.  Generate code for a simd non-worksharing
    loop.  Given parameters:
@@ -6880,6 +7351,8 @@ expand_omp_for (struct omp_region *region, gimple inner_stmt)
 
   if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_KIND_SIMD)
     expand_omp_simd (region, &fd);
+  else if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_KIND_CILKFOR)
+    expand_cilk_for (region, &fd);
   else if (fd.sched_kind == OMP_CLAUSE_SCHEDULE_STATIC
 	   && !fd.have_ordered)
     {
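For readers following the omp-low.c changes, the index mapping that expand_cilk_for generates can be sketched in plain C as below. This is only an illustration under stated assumptions: sk_user_index and sk_child are hypothetical names, the runtime is imagined to call the child with some [low, high) slice of the normalized iteration space, and the step is reduced to its magnitude the way the patch negates a negative constant step.

```c
#include <assert.h>

/* Hypothetical helper: value of the user's loop variable V for the
   normalized iteration IND_VAR.  DESCENDING is nonzero when the user's
   condition is ">" or ">=".  */
static long
sk_user_index (long n1, long step, int descending, long ind_var)
{
  long magnitude = step < 0 ? -step : step;
  return descending ? n1 - ind_var * magnitude
		    : n1 + ind_var * magnitude;
}

/* Hypothetical stand-in for the outlined child function: it always
   walks [LOW, HIGH) with a step of 1 and "<" as the condition, and
   recomputes the user's index on every iteration.  */
static void
sk_child (int *array, long n1, long step, int descending,
	  long low, long high, int value)
{
  for (long ind_var = low; ind_var < high; ind_var++)
    array[sk_user_index (n1, step, descending, ind_var)] = value;
}
```

For "_Cilk_for (int ii = 0; ii < 10; ii += 2)" the loop count is 5, so any split of [0, 5) into slices covers elements 0, 2, 4, 6 and 8 exactly once.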
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c
new file mode 100644
index 0000000..8b6112b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c
@@ -0,0 +1,87 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <stdio.h>
+#endif
+
+static void check (int *Array, int start, int end, int incr, int value)
+{
+  int ii = 0;
+  for (ii = start;  ii < end; ii = ii + incr)
+    if (Array[ii] != value)
+      __builtin_abort ();
+#if HAVE_IO
+  printf ("Passed\n");
+#endif
+}
+
+static void check_reverse (int *Array, int start, int end, int incr, int value)
+{
+  int ii = 0;
+  for (ii = start; ii >= end; ii = ii - incr)
+    if (Array[ii] != value)
+      __builtin_abort ();
+#if HAVE_IO
+  printf ("Passed\n");
+#endif
+}
+
+
+int main (void)
+{
+  int Array[10];
+  int x = 9, y = 0, z = 3;
+
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array[ii] = 1133;
+  check (Array, 0, 10, 1, 1133);
+
+  _Cilk_for (int ii = 0; ii < 10; ++ii)
+    Array[ii] = 3311;
+  check (Array, 0, 10, 1, 3311);
+
+  _Cilk_for (int ii = 9; ii > -1; ii--)
+    Array[ii] = 4433;
+  check_reverse (Array, 9, 0, 1, 4433);
+
+  _Cilk_for (int ii = 9; ii > -1; --ii)
+    Array[ii] = 9988;
+  check_reverse (Array, 9, 0, 1, 9988);
+
+  _Cilk_for (int ii = 0; ii < 10; ++ii)
+    Array[ii] = 3311;
+  check (Array, 0, 10, 1, 3311);
+
+  _Cilk_for (int ii = 0; ii < 10; ii += 2)
+    Array[ii] = 1328;
+  check (Array, 0, 10, 2, 1328);
+
+  _Cilk_for (int ii = 9; ii >= 0; ii -= 2)
+    Array[ii] = 1738;
+  check_reverse (Array, 9, 0, 2, 1738);
+
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    {
+      if (ii % 2)
+	Array[ii] = 1343;
+      else
+	Array[ii] = 3413;
+    }
+
+  check (Array, 1, 10, 2, 1343); 
+  check (Array, 0, 10, 2, 3413); 
+
+  _Cilk_for (short cc = 0; cc < 10; cc++) 
+    Array[cc] = 1343;
+  check (Array, 0, 10, 1, 1343);
+
+  _Cilk_for (short cc = 9; cc >= 0; cc--)
+    Array[cc] = 1348;
+  check_reverse (Array, 9, 0, 1, 1348);
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c
new file mode 100644
index 0000000..ed73c34
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=c99" { target c } } */
+
+
+int main (void)
+{
+  int q = 0, ii = 0, jj = 0;
+
+  _Cilk_for (int ii; ii < 10; ii++) /* { dg-error "is not initialized" "" { target c } } */
+    /* { dg-error "expected" "" { target c++ } 10 } */
+    q = 5;
+
+  _Cilk_for (; ii < 10; ii++) /* { dg-error "expected iteration declaration" } */
+    q = 2;
+
+  _Cilk_for (int ii = 0; ; ii++) /* { dg-error "missing controlling predicate" } */
+    q = 2;
+
+  _Cilk_for (int ii = 0; ii < 10, jj < 10; ii++)  /* { dg-error "expected ';' before ',' token" "" { target c } } */
+    /* { dg-error "invalid controlling predicate" "" { target c++ }  20 } */
+    q = 5;
+
+  _Cilk_for (int ii = 0; ii < 10; ) /* { dg-error "missing increment" } */
+    q = 5;
+
+
+  _Cilk_for (int ii = 0, jj = 0; ii < 10; ii++) /* { dg-error "expected" } */ 
+    q = 5;
+
+  _Cilk_for (volatile int vii = 0; vii < 10; vii++) /* { dg-error "iteration variable cannot be volatile" } */
+    q = 5;
+
+ 
+  _Cilk_for (static int sii = 0; sii < 10; sii++) /* { dg-error "static" } */
+
+    q = 5;
+
+
+  _Cilk_for (float fii = 3.47; fii < 5.23; fii++) /* { dg-error "invalid type for iteration variable" } */
+    q = 5;
+
+
+  _Cilk_for (int ii = 0; 10 > jj; ii++) /* { dg-error "invalid controlling predicate" } */
+    q = 5;
+
+  _Cilk_for (int ii = 0; ii < 10; ii >> 1) /* { dg-error "invalid increment expression" } */
+    q = 5;
+
+  _Cilk_for (int ii = 10; ii >= 0; ii--) /* This is OK!  */
+    q = 5;
+
+  _Cilk_for (int ii; ii < 10; ii++) /* { dg-error "is not initialized" "" { target c } } */ 
+    /* { dg-error "expected" "" { target c++ }  53 } */
+    q = 5;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c
new file mode 100644
index 0000000..6cb9b03
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c
@@ -0,0 +1,35 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+int grain_value = 2;
+int main (void)
+{
+  int Array1[200], Array1_Serial[200];
+
+  for (int ii = 0; ii < 200; ii++)
+    {
+      Array1_Serial[ii] = 2;
+      Array1[ii] = 1;
+    }
+
+#pragma cilk grainsize = 2
+  _Cilk_for (int ii = 0; ii < 200; ii++)
+    Array1[ii] = 2;
+
+  for (int ii = 0; ii < 200; ii++)
+    if (Array1[ii] != Array1_Serial[ii])
+      return (ii+1);
+
+#pragma cilk grainsize = grain_value
+  _Cilk_for (int ii = 0; ii < 200; ii++) 
+    Array1[ii] = 2;
+
+  for (int ii = 0; ii < 200; ii++)
+    if (Array1[ii] != Array1_Serial[ii])
+      return (ii+1);
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c
new file mode 100644
index 0000000..e1e3217
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus -Wunknown-pragmas" } */
+/* { dg-additional-options "-std=c99" { target c } } */
+
+
+char Array1[26];
+
+#pragma cilk grainsize = 2 /* { dg-error "must be inside a function" } */
+
+int main(int argc, char **argv)
+{
+/* This is OK.  */
+#pragma cilk grainsize = 2
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsize 2 /* { dg-error "expected '=' before numeric constant" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsiz = 2 /* { dg-warning "ignoring #pragma cilk grainsiz" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+
+/* This is OK, it will do a type conversion to long int.  */
+#pragma cilk grainsize = 0.5 
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsize = 1 
+  while (Array1[5] != 0) /* { dg-warning "is not followed by" } */
+    {
+    /* Blah */
+    }
+
+#pragma cilk grainsize = 1 
+  int q = 0; /* { dg-warning "is not followed by" } */
+  _Cilk_for (q = 0; q < 10; q++)
+    Array1[q]  = 5;
+
+  while (Array1[5] != 0)
+    {
+    /* Blah */
+    }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c
new file mode 100644
index 0000000..7a779f7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c
@@ -0,0 +1,41 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+
+/* <feature> loop control variable must have integer, pointer or class type
+   </feature>
+*/
+
+#define ARRAY_SIZE 10000
+int a[ARRAY_SIZE];
+
+int main(void)
+{ 
+  int ii = 0;
+
+#if 1
+  for (ii =0; ii < ARRAY_SIZE; ii++)
+    a[ii] = 5;
+#endif
+  _Cilk_for(int *aa = a; aa < a + ARRAY_SIZE; aa++) 
+    *aa = 0;
+#if 1
+  for (ii = 0; ii < ARRAY_SIZE; ii++) 
+    if (a[ii] != 0) 
+      __builtin_abort ();
+#endif
+
+  _Cilk_for (int *aa = a; aa < a + ARRAY_SIZE; aa = aa + 2)
+    *aa = 4;
+
+#if 1
+  for (ii = 0; ii < ARRAY_SIZE; ii = ii + 2) 
+    if (a[ii] != 4) 
+      __builtin_abort ();
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c
new file mode 100644
index 0000000..cffe17e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c
@@ -0,0 +1,79 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <stdio.h>
+#endif
+
+int main (void)
+{
+  int Array[10][10];
+
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 10; jj++)
+	{
+	  Array[ii][jj] = 0;
+	}
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 0; jj < 5; jj++)
+      Array[ii][jj] = 5;
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 5; jj++)
+      if (Array[ii][jj] != 5)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+
+  /* One goes up and one goes down.  */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 9; jj >= 0; jj--)
+      Array[ii][jj] = 7;
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 9; jj >= 0; jj--)
+      if (Array[ii][jj] != 7)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  /* different step sizes.  */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 0; jj < 10; jj += 2)
+      Array[ii][jj] = 9;
+  
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 10; jj += 2)
+      if (Array[ii][jj] != 9)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  /* different step sizes.  */
+  _Cilk_for (int ii = 0; ii < 10; ii += 2)
+    _Cilk_for (int jj = 5; jj < 9; jj++)
+      Array[ii][jj] = 11; 
+  
+  for (int ii = 0; ii < 10; ii += 2)
+    for (int jj = 5; jj < 9; jj++)
+      if (Array[ii][jj] != 11)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cf3.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cf3.cc
new file mode 100644
index 0000000..8d88c5f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cf3.cc
@@ -0,0 +1,96 @@
+/* { dg-options "-fcilkplus" } */
+
+typedef __PTRDIFF_TYPE__ ptrdiff_t;
+
+template <typename T>
+class I
+{
+public:
+  typedef ptrdiff_t difference_type;
+  I ();
+  ~I ();
+  I (T *);
+  I (const I &);
+  T &operator * ();
+  T *operator -> ();
+  T &operator [] (const difference_type &) const;
+  I &operator = (const I &);
+  I &operator ++ ();
+  I operator ++ (int);
+  I &operator -- ();
+  I operator -- (int);
+  I &operator += (const difference_type &);
+  I &operator -= (const difference_type &);
+  I operator + (const difference_type &) const;
+  I operator - (const difference_type &) const;
+  template <typename S> friend bool operator == (I<S> &, I<S> &);
+  template <typename S> friend bool operator == (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator < (I<S> &, I<S> &);
+  template <typename S> friend bool operator < (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator <= (I<S> &, I<S> &);
+  template <typename S> friend bool operator <= (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator > (I<S> &, I<S> &);
+  template <typename S> friend bool operator > (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator >= (I<S> &, I<S> &);
+  template <typename S> friend bool operator >= (const I<S> &, const I<S> &);
+  template <typename S> friend typename I<S>::difference_type operator - (I<S> &, I<S> &);
+  template <typename S> friend typename I<S>::difference_type operator - (const I<S> &, const I<S> &);
+  template <typename S> friend I<S> operator + (typename I<S>::difference_type , const I<S> &);
+private:
+  T *p;
+};
+template <typename T> I<T>::I () : p (0) {}
+template <typename T> I<T>::~I () {}
+template <typename T> I<T>::I (T *x) : p (x) {}
+template <typename T> I<T>::I (const I &x) : p (x.p) {}
+template <typename T> T &I<T>::operator * () { return *p; }
+template <typename T> T *I<T>::operator -> () { return p; }
+template <typename T> T &I<T>::operator [] (const difference_type &x) const { return p[x]; }
+template <typename T> I<T> &I<T>::operator = (const I &x) { p = x.p; return *this; }
+template <typename T> I<T> &I<T>::operator ++ () { ++p; return *this; }
+template <typename T> I<T> I<T>::operator ++ (int) { return I (p++); }
+template <typename T> I<T> &I<T>::operator -- () { --p; return *this; }
+template <typename T> I<T> I<T>::operator -- (int) { return I (p--); }
+template <typename T> I<T> &I<T>::operator += (const difference_type &x) { p += x; return *this; }
+template <typename T> I<T> &I<T>::operator -= (const difference_type &x) { p -= x; return *this; }
+template <typename T> I<T> I<T>::operator + (const difference_type &x) const { return I (p + x); }
+template <typename T> I<T> I<T>::operator - (const difference_type &x) const { return I (p - x); }
+template <typename T> bool operator == (I<T> &x, I<T> &y) { return x.p == y.p; }
+template <typename T> bool operator == (const I<T> &x, const I<T> &y) { return x.p == y.p; }
+template <typename T> bool operator != (I<T> &x, I<T> &y) { return !(x == y); }
+template <typename T> bool operator != (const I<T> &x, const I<T> &y) { return !(x == y); }
+template <typename T> bool operator < (I<T> &x, I<T> &y) { return x.p < y.p; }
+template <typename T> bool operator < (const I<T> &x, const I<T> &y) { return x.p < y.p; }
+template <typename T> bool operator <= (I<T> &x, I<T> &y) { return x.p <= y.p; }
+template <typename T> bool operator <= (const I<T> &x, const I<T> &y) { return x.p <= y.p; }
+template <typename T> bool operator > (I<T> &x, I<T> &y) { return x.p > y.p; }
+template <typename T> bool operator > (const I<T> &x, const I<T> &y) { return x.p > y.p; }
+template <typename T> bool operator >= (I<T> &x, I<T> &y) { return x.p >= y.p; }
+template <typename T> bool operator >= (const I<T> &x, const I<T> &y) { return x.p >= y.p; }
+template <typename T> typename I<T>::difference_type operator - (I<T> &x, I<T> &y) { return x.p - y.p; }
+template <typename T> typename I<T>::difference_type operator - (const I<T> &x, const I<T> &y) { return x.p - y.p; }
+template <typename T> I<T> operator + (typename I<T>::difference_type x, const I<T> &y) { return I<T> (x + y.p); }
+
+template <typename T>
+class J
+{
+public:
+  J(const I<T> &x, const I<T> &y) : b (x), e (y) {}
+  const I<T> &begin ();
+  const I<T> &end ();
+private:
+  I<T> b, e;
+};
+
+template <typename T> const I<T> &J<T>::begin () { return b; }
+template <typename T> const I<T> &J<T>::end () { return e; }
+
+template <typename T>
+void baz (I<T> &i);
+
+void
+foo (J<int> j)
+{
+  _Cilk_for (I<int> i = j.begin (); i < j.end (); i += 2)
+    baz (i);
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc
new file mode 100644
index 0000000..8221371
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc
@@ -0,0 +1,25 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#define SIZE 100
+#define CHECK_VALUE 5
+
+template <class T>
+int func (T start, T end)
+{
+  int Array[SIZE];
+  _Cilk_for (T ii = 0; ii < end; ii++)
+    Array[ii] = CHECK_VALUE;
+  
+  for (T ii = 0; ii < end; ii++)
+    if (Array[ii] != CHECK_VALUE)
+      __builtin_abort ();
+
+  return 0;
+}
+
+int main (void)
+{
+  return func <int> (0, 100) + func <long> (0, 100);
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/for1.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/for1.cc
new file mode 100644
index 0000000..78b8cf1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/for1.cc
@@ -0,0 +1,378 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <cstdio>
+#endif
+
+typedef __PTRDIFF_TYPE__ ptrdiff_t;
+extern "C" void abort ();
+
+template <typename T>
+class I
+{
+public:
+  typedef ptrdiff_t difference_type;
+  I ();
+  ~I ();
+  I (T *);
+  I (const I &);
+  T &operator * ();
+  T *operator -> ();
+  T &operator [] (const difference_type &) const;
+  I &operator = (const I &);
+  I &operator ++ ();
+  I operator ++ (int);
+  I &operator -- ();
+  I operator -- (int);
+  I &operator += (const difference_type &);
+  I &operator -= (const difference_type &);
+  I operator + (const difference_type &) const;
+  I operator - (const difference_type &) const;
+  template <typename S> friend bool operator == (I<S> &, I<S> &);
+  template <typename S> friend bool operator == (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator < (I<S> &, I<S> &);
+  template <typename S> friend bool operator < (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator <= (I<S> &, I<S> &);
+  template <typename S> friend bool operator <= (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator > (I<S> &, I<S> &);
+  template <typename S> friend bool operator > (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator >= (I<S> &, I<S> &);
+  template <typename S> friend bool operator >= (const I<S> &, const I<S> &);
+  template <typename S> friend typename I<S>::difference_type operator - (I<S> &, I<S> &);
+  template <typename S> friend typename I<S>::difference_type operator - (const I<S> &, const I<S> &);
+  template <typename S> friend I<S> operator + (typename I<S>::difference_type , const I<S> &);
+private:
+  T *p;
+};
+template <typename T> I<T>::I () : p (0) {}
+template <typename T> I<T>::~I () {}
+template <typename T> I<T>::I (T *x) : p (x) {}
+template <typename T> I<T>::I (const I &x) : p (x.p) {}
+template <typename T> T &I<T>::operator * () { return *p; }
+template <typename T> T *I<T>::operator -> () { return p; }
+template <typename T> T &I<T>::operator [] (const difference_type &x) const { return p[x]; }
+template <typename T> I<T> &I<T>::operator = (const I &x) { p = x.p; return *this; }
+template <typename T> I<T> &I<T>::operator ++ () { ++p; return *this; }
+template <typename T> I<T> I<T>::operator ++ (int) { return I (p++); }
+template <typename T> I<T> &I<T>::operator -- () { --p; return *this; }
+template <typename T> I<T> I<T>::operator -- (int) { return I (p--); }
+template <typename T> I<T> &I<T>::operator += (const difference_type &x) { p += x; return *this; }
+template <typename T> I<T> &I<T>::operator -= (const difference_type &x) { p -= x; return *this; }
+template <typename T> I<T> I<T>::operator + (const difference_type &x) const { return I (p + x); }
+template <typename T> I<T> I<T>::operator - (const difference_type &x) const { return I (p - x); }
+template <typename T> bool operator == (I<T> &x, I<T> &y) { return x.p == y.p; }
+template <typename T> bool operator == (const I<T> &x, const I<T> &y) { return x.p == y.p; }
+template <typename T> bool operator != (I<T> &x, I<T> &y) { return !(x == y); }
+template <typename T> bool operator != (const I<T> &x, const I<T> &y) { return !(x == y); }
+template <typename T> bool operator < (I<T> &x, I<T> &y) { return x.p < y.p; }
+template <typename T> bool operator < (const I<T> &x, const I<T> &y) { return x.p < y.p; }
+template <typename T> bool operator <= (I<T> &x, I<T> &y) { return x.p <= y.p; }
+template <typename T> bool operator <= (const I<T> &x, const I<T> &y) { return x.p <= y.p; }
+template <typename T> bool operator > (I<T> &x, I<T> &y) { return x.p > y.p; }
+template <typename T> bool operator > (const I<T> &x, const I<T> &y) { return x.p > y.p; }
+template <typename T> bool operator >= (I<T> &x, I<T> &y) { return x.p >= y.p; }
+template <typename T> bool operator >= (const I<T> &x, const I<T> &y) { return x.p >= y.p; }
+template <typename T> typename I<T>::difference_type operator - (I<T> &x, I<T> &y) { return x.p - y.p; }
+template <typename T> typename I<T>::difference_type operator - (const I<T> &x, const I<T> &y) { return x.p - y.p; }
+template <typename T> I<T> operator + (typename I<T>::difference_type x, const I<T> &y) { return I<T> (x + y.p); }
+
+template <typename T>
+class J
+{
+public:
+  J(const I<T> &x, const I<T> &y) : b (x), e (y) {}
+  const I<T> &begin ();
+  const I<T> &end ();
+private:
+  I<T> b, e;
+};
+
+template <typename T> const I<T> &J<T>::begin () { return b; }
+template <typename T> const I<T> &J<T>::end () { return e; }
+
+int results[2000];
+
+template <typename T>
+void
+baz (I<T> &i)
+{
+  if (*i < 0 || *i >= 2000)
+    {
+#if HAVE_IO
+      printf ("*i(%d) is < 0 or >= 2000\n", *i);
+      fflush (stdout);
+#endif
+     __builtin_abort ();
+    }
+  else 
+    results[*i]++;
+}
+
+void
+f1 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x; i <= y; i += 6)
+    { 
+      baz (i);
+    }
+
+#if HAVE_IO
+  printf("===== Starting F1 =========\n");
+  for (I<int> i = x; i <= y; i+= 6) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]); 
+    fflush (stdout);
+  }
+#endif
+}
+
+void
+f2 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x; i < y - 1; i += 2) 
+    baz (i);
+
+#if HAVE_IO
+  printf("===== Starting F2 =========\n");
+  for (int ii = 0; ii < 1998; ii += 2) {
+    printf("Result[%4d] = %2d\n", ii, results[ii]);
+    fflush (stdout);
+  }
+#endif
+}
+
+template <typename T>
+void
+f3 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x; i <= y; i += 1)
+    baz (i);
+#if HAVE_IO
+  printf("===== Starting F3 =========\n");
+  for (int ii = 20; ii < 1987; ii += 1) {
+    printf("Result[%4d] = %2d\n", ii, results[ii]);
+    fflush (stdout);
+  }
+
+#endif
+}
+
+template <typename T>
+void
+f4 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x + (2000 - 64); i > y + 10; --i)
+    baz (i);
+#if HAVE_IO
+  printf("===== Starting F4 =========\n");
+  for (I<int> i = x + (2000 - 64); i > y + 10; --i) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+}
+void
+f5 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x + 2000 - 64; i > y + 10; i -= 10)
+    baz (i);
+#if HAVE_IO
+  for (I<int> i = x + 2000 - 64; i > y + 10; i -= 10) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+}
+
+template <int N>
+void
+f6 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x + 2000 - 64; i > y + 10; i -= 10)
+    {
+      I<int> j = i + N;
+      baz (j);
+    }
+#if HAVE_IO
+  for (I<int> i = x + 2000 - 64; i > y + 10; i = i - 12 + 2)
+    {
+      I<int> j = i + N;
+      printf("Result[%4d] = %2d\n", *j, results[*j]);
+      fflush (stdout);
+    }
+#endif
+}
+template <int N>
+void
+f7 (I<int> ii, const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I <int> i = x - 10; i <= y + 10; i += N)
+    baz (i);
+#if HAVE_IO
+  for (I<int> i = x - 10; i <= y + 10; i += N) 
+    {
+      printf("Result[%4d] = %2d\n", *i, results[*i]);
+      fflush (stdout);
+    }
+#endif
+}
+
+template <int N>
+void
+f8 (J<int> j)
+{
+  _Cilk_for (I<int> i = j.begin (); i <= j.end () + N; i += 2)
+    baz (i);
+#if HAVE_IO
+  for (I<int> i = j.begin (); i <= j.end () + N; i += 2) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+
+}
+
+template <typename T, int N>
+void
+f9 (const I<T> &x, const I<T> &y)
+{
+  _Cilk_for (I<T> i = x; i <= y; i += N)
+    baz (i);
+#if HAVE_IO
+  for (I<T> i = x; i <= y;  i  = i + N)
+    { 
+      printf("Result[%4d] = %2d\n", *i, results[*i]);
+      fflush (stdout);
+    }
+#endif
+}
+
+template <typename T, int N>
+void
+f10 (const I<T> &x, const I<T> &y)
+{
+  _Cilk_for (I<T> i = x; i > y; i += N)
+    baz (i);
+#if HAVE_IO
+  for (I<T> i = x; i > y;  i  = i + N) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+}
+
+template <typename T>
+void
+f11 (const T &x, const T &y)
+{
+    _Cilk_for (T i = x; i <= y; i += 3)
+      baz (i);
+
+#if HAVE_IO
+  for (T i = x; i <= y;  i  += 3) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+      T j = y + 3;
+      baz (j);
+
+}
+
+template <typename T>
+void
+f12 (const T &x, const T &y)
+{
+  _Cilk_for (T i = x; i > y; --i)
+    baz (i);
+#if HAVE_IO
+  for (T i = x; i > y;  --i) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+}
+template <int N>
+struct K
+{
+  template <typename T>
+  static void
+  f13 (const T &x, const T &y)
+  {
+    _Cilk_for (T i = x; i <= y + N; i += N)
+      baz (i);
+#if HAVE_IO
+  for (T i = x; i < y+N;  i += N) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+  }
+};
+
+#define check(expr) \
+  for (int i = 0; i < 2000; i++)			\
+    if (expr)						\
+      {							\
+	if (results[i] != 1)				\
+	  __builtin_abort ();				\
+	results[i] = 0;					\
+      }							\
+    else if (results[i])				\
+      abort ()
+
+int
+main ()
+{
+  int a[2000];
+  long b[2000];
+  for (int i = 0; i < 2000; i++)
+    {
+      a[i] = i;
+      b[i] = i;
+    }
+  f1 (&a[10], &a[1990]);
+  check (i >= 10 && i <= 1990 && (i - 10) % 6 == 0);
+  f2 (&a[0], &a[1999]);
+  check (i < 1998 && (i & 1) == 0);
+  f3<int> (&a[20], &a[1837]);
+  check (i >= 20 && i <= 1837);
+  f4<int> (&a[0], &a[30]);
+  check (i > 40 && i <= 2000 - 64);
+
+  /* The f5 and f6 calls below are invalid since they would wrap around.
+     If this can be caught at compile time (i.e. the values are constant),
+     the compiler will emit errors.  */
+  //f5 (&a[0], &a[100]);
+  //check (i >= 116 && i <= 2000 - 64 && (i - 116) % 10 == 0);
+  //f6<-10> (&a[10], &a[110]);
+  //check (i >= 116 && i <= 2000 - 64 && (i - 116) % 10 == 0);
+
+  f7<6> (I<int> (), &a[12], &a[1800]);
+  check (i >= 2 && i <= 1808 && (i - 2) % 6 == 0);
+
+  f8<121> (J<int> (&a[14], &a[1803]));
+  check (i >= 14 && i <= 1924 && (i & 1) == 0);
+  f9<int, 7> (&a[33], &a[1967]);
+  check (i >= 33 && i <= 1967 && (i - 33) % 7 == 0);
+  f10<int, -7> (&a[1939], &a[17]);
+  check (i >= 21 && i <= 1939 && (i - 21) % 7 == 0);
+  f11<I<int> > (&a[16], &a[1981]);
+  check (i >= 16 && i <= 1984 && (i - 16) % 3 == 0);
+  f12<I<int> > (&a[1761], &a[37]);
+  check (i > 37 && i <= 1761);
+  K<5>::f13<I<int> > (&a[1], &a[1935]);
+  check (i >= 1 && i <= 1936 && (i - 1) % 5 == 0);
+  f9<long, 7> (&b[33], &b[1967]);
+  check (i >= 33 && i <= 1967 && (i - 33) % 7 == 0);
+  f10<long, -7> (&b[1939], &b[17]);
+  check (i >= 21 && i <= 1939 && (i - 21) % 7 == 0);
+  f11<I<long> > (&b[16], &b[1981]);
+  check (i >= 16 && i <= 1984 && (i - 16) % 3 == 0);
+  f12<I<long> > (&b[1761], &b[37]);
+  check (i > 37 && i <= 1761);
+  K<5>::f13<I<long> > (&b[1], &b[1935]);
+  check (i >= 1 && i <= 1936 && (i - 1) % 5 == 0);
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc
new file mode 100644
index 0000000..2ac8c72
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc
@@ -0,0 +1,52 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#include <vector>
+#include <cstdio>
+#include <iostream>
+#include <algorithm>
+
+using namespace std;
+
+
+int main(void)
+{
+vector <int> array;
+vector <int> array_serial;
+
+#if 1
+for (int ii = -1; ii < 10; ii++)
+{   
+  array.push_back(ii);
+  array_serial.push_back (ii);
+}
+#endif
+_Cilk_for (vector<int>::iterator iter = array.begin(); iter != array.end();
+          iter++)
+{
+   if (*iter  == 6) 
+     *iter = 13;
+}
+for (vector<int>::iterator iter = array_serial.begin(); 
+     iter != array_serial.end(); iter++)
+{
+   if (*iter  == 6) 
+     *iter = 13;
+}
+sort (array.begin(), array.end());
+sort (array_serial.begin(), array_serial.end());
+
+vector <int>::iterator iter = array.begin ();
+vector <int>::iterator iter_serial = array_serial.begin ();
+
+while (iter != array.end () && iter_serial != array_serial.end ())
+{
+  if (*iter != *iter_serial)
+    __builtin_abort ();
+  iter++;
+  iter_serial++;
+}
+
+return 0;
+}   
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc
new file mode 100644
index 0000000..1cf3301
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc
@@ -0,0 +1,72 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#include <vector>
+#include <cstdio>
+#include <iostream>
+#include <algorithm>
+
+using namespace std;
+
+
+int main(void)
+{
+vector <int> array,array_serial;
+
+#if 1
+for (int ii = -1; ii < 10; ii++)
+{   
+  array.push_back(ii);
+  array_serial.push_back(ii);
+}
+#endif
+_Cilk_for (vector<int>::reverse_iterator iter4 = array.rbegin(); 
+	   iter4 != array.rend(); iter4++)
+{
+  if (*iter4 == 0x8) {
+    *iter4 = 9;
+  }
+}
+
+for (vector<int>::reverse_iterator iter4 = array_serial.rbegin();
+	   iter4 != array_serial.rend(); iter4++)
+{
+  if (*iter4 == 0x8) {
+    *iter4 = 9;
+  }
+}
+_Cilk_for (vector<int>::reverse_iterator iter2 = array.rbegin(); 
+	   iter2 != array.rend();
+          iter2 += 1) 
+{
+   if ((*iter2 == 0x4) || (*iter2 == 0x7)) {
+    *iter2 = 0x3;
+   }
+}
+for (vector<int>::reverse_iterator iter2 = array_serial.rbegin(); 
+     iter2 != array_serial.rend();
+          iter2 += 1) 
+{
+   if ((*iter2 == 0x4) || (*iter2 == 0x7)) {
+    *iter2 = 0x3;
+   }
+}
+sort (array.begin(), array.end());
+sort (array_serial.begin(), array_serial.end());
+
+vector <int>::iterator iter = array.begin ();
+vector <int>::iterator iter_serial = array_serial.begin ();
+while (iter != array.end () && iter_serial != array_serial.end ())
+{
+  if (*iter != *iter_serial)
+    __builtin_abort ();
+  iter++;
+  iter_serial++;
+}
+
+return 0;
+}   
+
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc
new file mode 100644
index 0000000..8d2e61e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc
@@ -0,0 +1,50 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#include <iostream>
+#include <cstdio>
+#include <cstdlib>
+#include <vector>
+#include <algorithm>
+#include <list>
+
+using namespace std;
+
+
+int main(int argc, char **argv)
+{
+  vector <int> number_list, number_list_serial;
+  int new_number = 0;
+  int no_elements = 0;
+  
+  if (argc != 2)
+  {
+    no_elements = 10;
+  }
+
+
+  number_list.clear();
+  number_list_serial.clear();
+  for (int ii = 0; ii < no_elements; ii++)
+  {
+    number_list.push_back(new_number);
+    number_list_serial.push_back(new_number);
+  }
+
+  _Cilk_for (int jj = 0; jj < no_elements; jj++)
+  {
+    number_list[jj] = jj + no_elements;
+  }
+  for (int jj = 0; jj < no_elements; jj++)
+  {
+    number_list_serial[jj] = jj + no_elements;
+  }
+
+  for (int jj = 0; jj < no_elements; jj++)
+    if (number_list_serial[jj] != number_list[jj])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index e548a0d..d8c14e3 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -351,6 +351,7 @@ enum omp_clause_schedule_kind {
   OMP_CLAUSE_SCHEDULE_GUIDED,
   OMP_CLAUSE_SCHEDULE_AUTO,
   OMP_CLAUSE_SCHEDULE_RUNTIME,
+  OMP_CLAUSE_SCHEDULE_CILKFOR,
   OMP_CLAUSE_SCHEDULE_LAST
 };
 
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 0595499..91efd9f 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -411,6 +411,9 @@ dump_omp_clause (pretty_printer *buffer, tree clause, int spc, int flags)
 	case OMP_CLAUSE_SCHEDULE_AUTO:
 	  pp_string (buffer, "auto");
 	  break;
+	case OMP_CLAUSE_SCHEDULE_CILKFOR:
+	  pp_string (buffer, "cilk-for grain");
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -2392,6 +2395,12 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
       pp_string (buffer, "#pragma simd");
       goto dump_omp_loop;
 
+    case CILK_FOR:
+      /* This label points one line after dumping the clauses.  
+	 For _Cilk_for the clauses are dumped after the _Cilk_for (...) 
+	 parameters are printed out.  */
+      goto dump_omp_loop_cilk_for;
+
     case OMP_DISTRIBUTE:
       pp_string (buffer, "#pragma omp distribute");
       goto dump_omp_loop;
@@ -2420,6 +2429,8 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
     dump_omp_loop:
       dump_omp_clauses (buffer, OMP_FOR_CLAUSES (node), spc, flags);
 
+    dump_omp_loop_cilk_for:
+
       if (!(flags & TDF_SLIM))
 	{
 	  int i;
@@ -2440,7 +2451,10 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
 		{
 		  spc += 2;
 		  newline_and_indent (buffer, spc);
-		  pp_string (buffer, "for (");
+		  if (TREE_CODE (node) == CILK_FOR)
+		    pp_string (buffer, "_Cilk_for (");
+		  else 
+		    pp_string (buffer, "for (");
 		  dump_generic_node (buffer,
 				     TREE_VEC_ELT (OMP_FOR_INIT (node), i),
 				     spc, flags, false);
@@ -2454,6 +2468,8 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
 				     spc, flags, false);
 		  pp_right_paren (buffer);
 		}
+	      if (TREE_CODE (node) == CILK_FOR) 
+		dump_omp_clauses (buffer, OMP_FOR_CLAUSES (node), spc, flags);
 	    }
 	  if (OMP_FOR_BODY (node))
 	    {
diff --git a/gcc/tree.def b/gcc/tree.def
index f8d6444..558d7c8 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1051,6 +1051,10 @@ DEFTREECODE (OMP_SIMD, "omp_simd", tcc_statement, 6)
    Operands like for OMP_FOR.  */
 DEFTREECODE (CILK_SIMD, "cilk_simd", tcc_statement, 6)
 
+/* Cilk Plus - _Cilk_for (..)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (CILK_FOR, "cilk_for", tcc_statement, 6)
+
 /* OpenMP - #pragma omp distribute [clause1 ... clauseN]
    Operands like for OMP_FOR.  */
 DEFTREECODE (OMP_DISTRIBUTE, "omp_distribute", tcc_statement, 6)


* Re: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-19  4:43                                   ` Iyer, Balaji V
@ 2014-02-19 11:24                                     ` Jakub Jelinek
  2014-02-21  4:38                                       ` Iyer, Balaji V
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Jelinek @ 2014-02-19 11:24 UTC (permalink / raw)
  To: Iyer, Balaji V
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

On Wed, Feb 19, 2014 at 04:43:06AM +0000, Iyer, Balaji V wrote:
> Attached, please find a patch with the test case attached (for1.cc). The
> patch is the same but the cp-changelog has been modified to reflect the
> new test-case.  Is this OK to install?

1) have you tested the patch at all?  I see
FAIL: g++.dg/gomp/for-1.C -std=c++98  (test for errors, line 27)
FAIL: g++.dg/gomp/for-1.C -std=c++98 (test for excess errors)
FAIL: g++.dg/gomp/for-1.C -std=c++11  (test for errors, line 27)
FAIL: g++.dg/gomp/for-1.C -std=c++11 (test for excess errors)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (internal compiler error)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 30)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 37)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 40)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for excess errors)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (internal compiler error)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 30)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 37)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 40)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for excess errors)
regressions caused by the patch, that is of course unacceptable.

2) try this updated cf3.cc, e.g. with -O2 -fcilkplus if you can't find out
why calling something multiple times is a bad idea, actually the latest patch
is even worse than the older one, you now create 3 calls to the end method
and 3 calls to operator-.  There should be just one call to that, before the
#pragma omp parallel obviously, anything that doesn't do that is just bad.
I don't see a point in having if clause on the _Cilk_for, just keep it on
the #pragma omp parallel only, at ompexp time you can easily find it there,
there is no point to check it again in the parallel body of the function.

typedef __PTRDIFF_TYPE__ ptrdiff_t;

template <typename T>
class I
{
public:
  typedef ptrdiff_t difference_type;
  I ();
  ~I ();
  I (T *);
  I (const I &);
  T &operator * ();
  T *operator -> ();
  T &operator [] (const difference_type &) const;
  I &operator = (const I &);
  I &operator ++ ();
  I operator ++ (int);
  I &operator -- ();
  I operator -- (int);
  I &operator += (const difference_type &);
  I &operator -= (const difference_type &);
  I operator + (const difference_type &) const;
  I operator - (const difference_type &) const;
  template <typename S> friend bool operator == (I<S> &, I<S> &);
  template <typename S> friend bool operator == (const I<S> &, const I<S> &);
  template <typename S> friend bool operator < (I<S> &, I<S> &);
  template <typename S> friend bool operator < (const I<S> &, const I<S> &);
  template <typename S> friend bool operator <= (I<S> &, I<S> &);
  template <typename S> friend bool operator <= (const I<S> &, const I<S> &);
  template <typename S> friend bool operator > (I<S> &, I<S> &);
  template <typename S> friend bool operator > (const I<S> &, const I<S> &);
  template <typename S> friend bool operator >= (I<S> &, I<S> &);
  template <typename S> friend bool operator >= (const I<S> &, const I<S> &);
  template <typename S> friend typename I<S>::difference_type operator - (I<S> &, I<S> &);
  template <typename S> friend typename I<S>::difference_type operator - (const I<S> &, const I<S> &);
  template <typename S> friend I<S> operator + (typename I<S>::difference_type , const I<S> &);
private:
  T *p;
};
template <typename T> I<T>::I () : p (0) {}
template <typename T> I<T>::~I () {}
template <typename T> I<T>::I (T *x) : p (x) {}
template <typename T> I<T>::I (const I &x) : p (x.p) {}
template <typename T> T &I<T>::operator * () { return *p; }
template <typename T> T *I<T>::operator -> () { return p; }
template <typename T> T &I<T>::operator [] (const difference_type &x) const { return p[x]; }
template <typename T> I<T> &I<T>::operator = (const I &x) { p = x.p; return *this; }
template <typename T> I<T> &I<T>::operator ++ () { ++p; return *this; }
template <typename T> I<T> I<T>::operator ++ (int) { return I (p++); }
template <typename T> I<T> &I<T>::operator -- () { --p; return *this; }
template <typename T> I<T> I<T>::operator -- (int) { return I (p--); }
template <typename T> I<T> &I<T>::operator += (const difference_type &x) { p += x; return *this; }
template <typename T> I<T> &I<T>::operator -= (const difference_type &x) { p -= x; return *this; }
template <typename T> I<T> I<T>::operator + (const difference_type &x) const { return I (p + x); }
template <typename T> __attribute__((noinline)) I<T> I<T>::operator - (const difference_type &x) const { __asm (""); return I (p - x); }
template <typename T> bool operator == (I<T> &x, I<T> &y) { return x.p == y.p; }
template <typename T> bool operator == (const I<T> &x, const I<T> &y) { return x.p == y.p; }
template <typename T> bool operator != (I<T> &x, I<T> &y) { return !(x == y); }
template <typename T> bool operator != (const I<T> &x, const I<T> &y) { return !(x == y); }
template <typename T> bool operator < (I<T> &x, I<T> &y) { return x.p < y.p; }
template <typename T> bool operator < (const I<T> &x, const I<T> &y) { return x.p < y.p; }
template <typename T> bool operator <= (I<T> &x, I<T> &y) { return x.p <= y.p; }
template <typename T> bool operator <= (const I<T> &x, const I<T> &y) { return x.p <= y.p; }
template <typename T> bool operator > (I<T> &x, I<T> &y) { return x.p > y.p; }
template <typename T> bool operator > (const I<T> &x, const I<T> &y) { return x.p > y.p; }
template <typename T> bool operator >= (I<T> &x, I<T> &y) { return x.p >= y.p; }
template <typename T> bool operator >= (const I<T> &x, const I<T> &y) { return x.p >= y.p; }
template <typename T> __attribute__((noinline)) typename I<T>::difference_type operator - (I<T> &x, I<T> &y) { __asm (""); return x.p - y.p; }
template <typename T> __attribute__((noinline)) typename I<T>::difference_type operator - (const I<T> &x, const I<T> &y) { __asm (""); return x.p - y.p; }
template <typename T> I<T> operator + (typename I<T>::difference_type x, const I<T> &y) { return I<T> (x + y.p); }

template <typename T>
class J
{
public:
  J(const I<T> &x, const I<T> &y) : b (x), e (y) {}
  const I<T> &begin ();
  const I<T> &end ();
private:
  I<T> b, e;
};

template <typename T> const I<T> &J<T>::begin () { return b; }
template <typename T> __attribute__((noinline)) const I<T> &J<T>::end () { __asm (""); return e; }

template <typename T>
void baz (I<T> &i);

void
foo (J<int> j)
{
  _Cilk_for (I<int> i = j.begin (); i < j.end (); i += 2)
    baz (i);
}
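
For illustration only (this is neither compiler output nor the lowering the patch generates), point 2 can be sketched in plain C++. The names Range, iter_diff, body and run are hypothetical stand-ins, the ceiling division for the trip count is an assumption of this sketch, and the chunks run serially where a runtime would hand them to workers. The counters show that end () and the iterator subtraction are each evaluated exactly once, before any chunk of the body runs:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for J<int>::end () and operator- from the test
// case; the counters only record how often each one is evaluated.
static int end_calls = 0;
static int minus_calls = 0;

struct Range
{
  int *b, *e;
  int *begin () const { return b; }
  int *end () const { ++end_calls; return e; }
};

static std::ptrdiff_t
iter_diff (int *hi, int *lo)
{
  ++minus_calls;
  return hi - lo;
}

// Stand-in for the outlined loop body: handles iteration numbers
// [low, high) with a step of 1 and maps each one back to the user's
// iterator as begin + k * step.
static void
body (int *begin, std::ptrdiff_t step, std::ptrdiff_t low,
      std::ptrdiff_t high)
{
  for (std::ptrdiff_t k = low; k < high; ++k)
    *(begin + k * step) = 5;
}

// Sketch of:
//   _Cilk_for (int *i = r.begin (); i < r.end (); i += step) *i = 5;
// Returns the trip count.  Assumes step > 0 and grain > 0.
static std::ptrdiff_t
run (const Range &r, std::ptrdiff_t step, std::ptrdiff_t grain)
{
  assert (step > 0 && grain > 0);
  int *begin = r.begin ();
  int *end = r.end ();                           // evaluated once
  std::ptrdiff_t span = iter_diff (end, begin);  // evaluated once
  std::ptrdiff_t count = span > 0 ? (span + step - 1) / step : 0;

  // Everything below reuses begin, step and count only.
  for (std::ptrdiff_t low = 0; low < count; low += grain)
    {
      std::ptrdiff_t high = low + grain < count ? low + grain : count;
      body (begin, step, low, high);
    }
  return count;
}
```

Calling run on a ten-element array with step 2 walks elements 0, 2, 4, 6 and 8, and leaves both counters at 1 regardless of the grain chosen.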

	Jakub

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-19 11:24                                     ` Jakub Jelinek
@ 2014-02-21  4:38                                       ` Iyer, Balaji V
  2014-02-24 23:16                                         ` Iyer, Balaji V
  0 siblings, 1 reply; 26+ messages in thread
From: Iyer, Balaji V @ 2014-02-21  4:38 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

[-- Attachment #1: Type: text/plain, Size: 3797 bytes --]

Hi Jakub,
	I have attached the fixed patch and have answered your questions below.

> -----Original Message-----
> From: Jakub Jelinek [mailto:jakub@redhat.com]
> Sent: Wednesday, February 19, 2014 6:24 AM
> To: Iyer, Balaji V
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'rth@redhat.com'
> Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> 
> On Wed, Feb 19, 2014 at 04:43:06AM +0000, Iyer, Balaji V wrote:
> > Attached, please find a patch with the test case attached (for1.cc).
> > The patch is the same but the cp-changelog has been modified to
> > reflect the new test-case.  Is this OK to install?
> 
> 1) have you tested the patch at all?  I see
> FAIL: g++.dg/gomp/for-1.C -std=c++98  (test for errors, line 27)
> FAIL: g++.dg/gomp/for-1.C -std=c++98 (test for excess errors)
> FAIL: g++.dg/gomp/for-1.C -std=c++11  (test for errors, line 27)
> FAIL: g++.dg/gomp/for-1.C -std=c++11 (test for excess errors)
> FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (internal compiler error)
> FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 30)
> FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 37)
> FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 40)
> FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for excess errors)
> FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (internal compiler error)
> FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 30)
> FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 37)
> FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 40)
> FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for excess errors)
> regressions caused by the patch, that is of course unacceptable.
> 

Fixed. I apologize for them. I have confirmed that it is OK now. 


> 2) try this updated cf3.cc, e.g. with -O2 -fcilkplus if you can't find out why
> calling something multiple times is a bad idea, actually the latest patch is even
> worse than the older one, you now create 3 calls to the end method and 3
> calls to operator-.  There should be just one call to that, before the #pragma
> omp parallel obviously, anything that doesn't do that is just bad.
> I don't see a point in having if clause on the _Cilk_for, just keep it on the
> #pragma omp parallel only, at ompexp time you can easily find it there, there
> is no point to check it again in the parallel body of the function.
> 

I have removed the if-clause from the _Cilk_for and now it is just in #pragma omp parallel.

I have removed the 3rd operator-, but I have not been able to remove the 2nd; I am still looking into it. The first operator- is for the if-clause, which I need in order to calculate the loop count for the __cilkrts_cilk_for_64 function. The second one is not necessary, because the end value does not matter for _Cilk_for: it will be replaced with the low and high values. I tried several things, such as stopping gimplification of the cond value or replacing it with a constant, but those cause other problems elsewhere.

The problem is that I am not able to find a way to automate this. I can't assume the cond's end value is the same as count, because that is only true if we have an iterator and the handle_omp_for_iterator function modifies the cond value correctly. I need to use the count value (retval.0) as retval.1, but the count value is computed later than handle_omp_for_iterator runs (since that function does not have any knowledge of the #pragma omp parallel). The patch gives the correct answers for the benchmarks, and the 2nd operator- is removed when optimization is turned on and operator- is inlinable.

Can you provide me some advice about how to do it?

Thanks,

Balaji V. Iyer.



[-- Attachment #2: diff.txt --]
[-- Type: text/plain, Size: 110190 bytes --]

diff --git a/gcc/c-family/c-cilkplus.c b/gcc/c-family/c-cilkplus.c
index 1a16f66..1be12bd 100644
--- a/gcc/c-family/c-cilkplus.c
+++ b/gcc/c-family/c-cilkplus.c
@@ -91,3 +91,52 @@ c_finish_cilk_clauses (tree clauses)
     }
   return clauses;
 }
+
+/* Structure used to pass information into a walk_tree function and
+   find_cilk_for.  */
+struct clause_struct
+{
+  bool is_set;
+  tree clauses;
+};
+
+/* Helper function for walk_tree used in cilk_for_move_clauses_upward.
+   If *TP is a CILK_FOR statement, then set *DATA (type-casted to 
+   struct clause_struct) with its clauses.  */
+
+static tree
+find_cilk_for (tree *tp, int *walk_subtrees, void *data)
+{
+  struct clause_struct *cstruct = (struct clause_struct *) data;
+  if (*tp && TREE_CODE (*tp) == CILK_FOR && !cstruct->is_set)
+    {
+      cstruct->is_set = true;
+      cstruct->clauses = OMP_FOR_CLAUSES (*tp);
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Moves the IF-CLAUSE and SCHEDULE clause from _CILK_FOR statement in
+   STMT into *PARALLEL_CLAUSES.  */
+ 
+void
+cilk_for_move_clauses_upward (tree *parallel_clauses, tree stmt)
+{
+  struct clause_struct cstruct;
+  cstruct.is_set = false;
+  cstruct.clauses = NULL_TREE;
+  walk_tree (&stmt, find_cilk_for, (void *) &cstruct, NULL);
+
+  tree clauses = cstruct.clauses;
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SCHEDULE
+	|| OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IF)
+      {
+	if (*parallel_clauses)
+	  OMP_CLAUSE_CHAIN (*parallel_clauses) = c;
+	else
+	  *parallel_clauses = c;
+      }
+}
+
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index e23a9df..d451072 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -416,6 +416,7 @@ const struct c_common_resword c_common_reswords[] =
   { "_Complex",		RID_COMPLEX,	0 },
   { "_Cilk_spawn",      RID_CILK_SPAWN, 0 },
   { "_Cilk_sync",       RID_CILK_SYNC,  0 },
+  { "_Cilk_for",        RID_CILK_FOR,   0 },
   { "_Imaginary",	RID_IMAGINARY, D_CONLY },
   { "_Decimal32",       RID_DFLOAT32,  D_CONLY | D_EXT },
   { "_Decimal64",       RID_DFLOAT64,  D_CONLY | D_EXT },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index f074ab1..509490c
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -149,7 +149,7 @@ enum rid
   RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
 
   /* Cilk Plus keywords.  */
-  RID_CILK_SPAWN, RID_CILK_SYNC,
+  RID_CILK_SPAWN, RID_CILK_SYNC, RID_CILK_FOR,
   
   /* Objective-C ("AT" reserved words - they are only keywords when
      they follow '@')  */
@@ -1203,7 +1203,7 @@ extern void c_finish_omp_flush (location_t);
 extern void c_finish_omp_taskwait (location_t);
 extern void c_finish_omp_taskyield (location_t);
 extern tree c_finish_omp_for (location_t, enum tree_code, tree, tree, tree,
-			      tree, tree, tree);
+			      tree, tree, tree, tree *, tree *, tree *);
 extern void c_omp_split_clauses (location_t, enum tree_code, omp_clause_mask,
 				 tree, tree *);
 extern tree c_omp_declare_simd_clauses_to_numbers (tree, tree);
@@ -1389,4 +1389,5 @@ extern tree make_cilk_frame (tree);
 extern tree create_cilk_function_exit (tree, bool, bool);
 extern tree cilk_install_body_pedigree_operations (tree);
 extern void cilk_outline (tree, tree *, void *);
+extern void cilk_for_move_clauses_upward (tree *, tree);
 #endif /* ! GCC_C_COMMON_H */
diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index dd0a45d..e10dcf4
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -386,17 +386,19 @@ c_omp_for_incr_canonicalize_ptr (location_t loc, tree decl, tree incr)
    INITV, CONDV and INCRV are vectors containing initialization
    expressions, controlling predicates and increment expressions.
    BODY is the body of the loop and PRE_BODY statements that go before
-   the loop.  */
+   the loop.  *CINIT, *CEND and *CSTEP receive the loop's original init, end
+   and step values and are set solely for a _Cilk_for statement.  */
 
 tree
 c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
-		  tree initv, tree condv, tree incrv, tree body, tree pre_body)
+		  tree initv, tree condv, tree incrv, tree body,
+		  tree pre_body, tree *cinit, tree *cend, tree *cstep)
 {
   location_t elocus;
   bool fail = false;
   int i;
-
-  if (code == CILK_SIMD
+  tree orig_init = NULL_TREE, orig_end = NULL_TREE, orig_step = NULL_TREE;
+  if ((code == CILK_SIMD || code == CILK_FOR) 
       && !c_check_cilk_loop (locus, TREE_VEC_ELT (declv, 0)))
     fail = true;
 
@@ -422,6 +424,8 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	  fail = true;
 	}
 
+      if (TREE_CODE (init) == MODIFY_EXPR)
+	orig_init = TREE_OPERAND (init, 1);
       /* In the case of "for (int i = 0...)", init will be a decl.  It should
 	 have a DECL_INITIAL that we can turn into an assignment.  */
       if (init == decl)
@@ -436,6 +440,7 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      fail = true;
 	    }
 
+	  orig_init = init;
 	  init = build_modify_expr (elocus, decl, NULL_TREE, NOP_EXPR,
 	      			    /* FIXME diagnostics: This should
 				       be the location of the INIT.  */
@@ -526,9 +531,20 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 					    0))
 		    TREE_SET_CODE (cond, TREE_CODE (cond) == NE_EXPR
 					 ? LT_EXPR : GE_EXPR);
-		  else if (code != CILK_SIMD)
+		  else if (code != CILK_SIMD && code != CILK_FOR)
 		    cond_ok = false;
 		}
+	      if (flag_cilkplus && code == CILK_FOR)
+		{ 
+		  orig_end = TREE_OPERAND (cond, 1);
+		  tree add_expr = build_zero_cst (TREE_TYPE (orig_end));
+		  if (TREE_CODE (cond) == LE_EXPR)
+		    add_expr = build_one_cst (TREE_TYPE (orig_end));
+		  else if (TREE_CODE (cond) == GE_EXPR)
+		    add_expr = build_int_cst (TREE_TYPE (orig_end), -1);
+		  orig_end = fold_build2 (PLUS_EXPR, TREE_TYPE (orig_end),
+					  orig_end, add_expr);
+		}
 	    }
 
 	  if (!cond_ok)
@@ -561,6 +577,19 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      if (TREE_OPERAND (incr, 0) != decl)
 		break;
 
+	      if (TREE_CODE (incr) == POSTINCREMENT_EXPR
+		  || TREE_CODE (incr) == PREINCREMENT_EXPR)
+		orig_step = build_one_cst (TREE_TYPE (incr));
+	      else
+		orig_step = integer_minus_one_node;
+ 
+	      if (POINTER_TYPE_P (TREE_TYPE (incr)))
+		{
+		  tree unit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (incr)));
+		  if (code == CILK_FOR)
+		    orig_step = fold_build2 (MULT_EXPR, TREE_TYPE (orig_step),
+					     orig_step, unit);
+		}
 	      incr_ok = true;
 	      incr = c_omp_for_incr_canonicalize_ptr (elocus, decl, incr);
 	      break;
@@ -579,14 +608,24 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      if (TREE_CODE (TREE_OPERAND (incr, 1)) == PLUS_EXPR
 		  && (TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl
 		      || TREE_OPERAND (TREE_OPERAND (incr, 1), 1) == decl))
-		incr_ok = true;
+		{
+		  if (TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl)
+		    orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 1);
+		  else
+		    orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 0);
+		  incr_ok = true;
+		}
 	      else if ((TREE_CODE (TREE_OPERAND (incr, 1)) == MINUS_EXPR
 			|| (TREE_CODE (TREE_OPERAND (incr, 1))
 			    == POINTER_PLUS_EXPR))
 		       && TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl)
-		incr_ok = true;
+		{
+		  orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 1);
+		  incr_ok = true;
+		}
 	      else
 		{
+		  orig_step = TREE_OPERAND (incr, 1);
 		  tree t = check_omp_for_incr_expr (elocus,
 						    TREE_OPERAND (incr, 1),
 						    decl);
@@ -609,6 +648,14 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	    }
 	}
 
+      /* These variables could be NULL if an error occurred.  */
+      if (flag_cilkplus && code == CILK_FOR 
+	  && orig_end && orig_init && orig_step)
+	{
+	  *cinit = orig_init;
+	  *cend = orig_end;
+	  *cstep = orig_step;
+	}
       TREE_VEC_ELT (initv, i) = init;
       TREE_VEC_ELT (incrv, i) = incr;
     }
diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index 91fffdb..3155fea 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -1395,6 +1395,11 @@ init_pragma (void)
   if (!flag_preprocess_only)
     cpp_register_deferred_pragma (parse_in, "GCC", "ivdep", PRAGMA_IVDEP, false,
 				  false);
+
+  if (flag_cilkplus && !flag_preprocess_only)
+    cpp_register_deferred_pragma (parse_in, "cilk", "grainsize",
+				  PRAGMA_CILK_GRAINSIZE, true, false);
+
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 6f1bf74..b9f09ba 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -55,6 +55,9 @@ typedef enum pragma_kind {
   /* Top level clause to handle all Cilk Plus pragma simd clauses.  */
   PRAGMA_CILK_SIMD,
 
+  /* This pragma handles setting of grainsize for a _Cilk_for.  */
+  PRAGMA_CILK_GRAINSIZE,
+
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
 
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index d0d35c5..9830ead
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1248,10 +1248,11 @@ static bool c_parser_objc_diagnose_bad_element_prefix
   (c_parser *, struct c_declspecs *);
 
 /* Cilk Plus supporting routines.  */
-static void c_parser_cilk_simd (c_parser *);
+static void c_parser_cilk_simd (c_parser *, bool, tree);
 static bool c_parser_cilk_verify_simd (c_parser *, enum pragma_context);
 static tree c_parser_array_notation (location_t, c_parser *, tree, tree);
 static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, bool);
+static void c_parser_cilk_grainsize (c_parser *);
 
 /* Parse a translation unit (C90 6.7, C99 6.9).
 
@@ -4878,6 +4879,16 @@ c_parser_statement_after_labels (c_parser *parser)
 	case RID_FOR:
 	  c_parser_for_statement (parser, false);
 	  break;
+	case RID_CILK_FOR:
+	  if (!flag_cilkplus)
+	    {
+	      error_at (c_parser_peek_token (parser)->location,
+			"-fcilkplus must be enabled to use %<_Cilk_for%>");
+	      c_parser_skip_to_end_of_block_or_statement (parser);
+	    }
+	  else
+	    c_parser_cilk_simd (parser, true, integer_zero_node);
+	  break;
 	case RID_CILK_SYNC:
 	  c_parser_consume_token (parser);
 	  c_parser_skip_until_found (parser, CPP_SEMICOLON, "expected %<;%>");
@@ -9496,7 +9507,24 @@ c_parser_pragma (c_parser *parser, enum pragma_context context)
       if (!c_parser_cilk_verify_simd (parser, context))
 	return false;
       c_parser_consume_pragma (parser);
-      c_parser_cilk_simd (parser);
+      c_parser_cilk_simd (parser, false, NULL_TREE);
+      return false;
+    case PRAGMA_CILK_GRAINSIZE:
+      if (!flag_cilkplus)
+	{
+	  warning (0, "%<#pragma grainsize%> ignored because -fcilkplus is not"
+		   " enabled");
+	  c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
+	  return false;
+	}
+      if (context == pragma_external)
+	{
+	  error_at (c_parser_peek_token (parser)->location,
+		    "%<#pragma grainsize%> must be inside a function");
+	  c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
+	  return false;
+	}
+      c_parser_cilk_grainsize (parser);
       return false;
 
     default:
@@ -11591,7 +11619,7 @@ c_parser_omp_flush (c_parser *parser)
 
 static tree
 c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
-		       tree clauses, tree *cclauses)
+		       tree clauses, tree grain, tree *cclauses)
 {
   tree decl, cond, incr, save_break, save_cont, body, init, stmt, cl;
   tree declv, condv, incrv, initv, ret = NULL;
@@ -11599,6 +11627,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   int i, collapse = 1, nbraces = 0;
   location_t for_loc;
   vec<tree, va_gc> *for_block = make_tree_vector ();
+  tree count = NULL_TREE;
 
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
     if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
@@ -11611,11 +11640,18 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   condv = make_tree_vec (collapse);
   incrv = make_tree_vec (collapse);
 
-  if (!c_parser_next_token_is_keyword (parser, RID_FOR))
+  if (code != CILK_FOR
+      && !c_parser_next_token_is_keyword (parser, RID_FOR))
     {
       c_parser_error (parser, "for statement expected");
       return NULL;
     }
+  if (code == CILK_FOR
+      && !c_parser_next_token_is_keyword (parser, RID_CILK_FOR))
+    {
+      c_parser_error (parser, "_Cilk_for statement expected");
+      return NULL;
+    }
   for_loc = c_parser_peek_token (parser)->location;
   c_parser_consume_token (parser);
 
@@ -11693,7 +11729,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 	    case LE_EXPR:
 	      break;
 	    case NE_EXPR:
-	      if (code == CILK_SIMD)
+	      if (code == CILK_SIMD || code == CILK_FOR)
 		break;
 	      /* FALLTHRU.  */
 	    default:
@@ -11826,8 +11862,9 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
      an error from the initialization parsing.  */
   if (!fail)
     {
+      tree cf_init = NULL_TREE, cf_end = NULL_TREE, cf_step = NULL_TREE;
       stmt = c_finish_omp_for (loc, code, declv, initv, condv,
-			       incrv, body, NULL);
+			       incrv, body, NULL, &cf_init, &cf_end, &cf_step);
       if (stmt)
 	{
 	  if (cclauses != NULL
@@ -11867,6 +11904,28 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 		  }
 	    }
 	  OMP_FOR_CLAUSES (stmt) = clauses;
+	  /* If it is a _Cilk_for statement, then the OMP_FOR_CLAUSES location
+	     stores the user-defined grain value or an integer_zero_node 
+	     indicating that the runtime must compute a suitable grain, inside
+	     a SCHEDULE clause.  Similarly the loop-count is also stored in
+	     an IF clause.  These clauses do not make sense for _Cilk_for but
+	     are just used to transmit information.  */
+	  if (code == CILK_FOR)
+	    {
+	      count = fold_build2 (MINUS_EXPR, TREE_TYPE (cf_end), cf_end,
+				   cf_init);
+	      count = fold_build2 (TRUNC_DIV_EXPR, TREE_TYPE (count), count,
+				   cf_step);
+	      tree l = build_omp_clause (EXPR_LOCATION (grain),
+					 OMP_CLAUSE_SCHEDULE);
+	      OMP_CLAUSE_SCHEDULE_KIND (l) = OMP_CLAUSE_SCHEDULE_CILKFOR;
+	      OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (l) = grain;
+	      OMP_CLAUSE_CHAIN (l) = OMP_FOR_CLAUSES (stmt);
+	      tree c = build_omp_clause (EXPR_LOCATION (count), OMP_CLAUSE_IF);
+	      OMP_CLAUSE_IF_EXPR (c) = count;
+	      OMP_CLAUSE_CHAIN (c) = l;
+	      OMP_FOR_CLAUSES (stmt) = c;
+	    }
 	}
       ret = stmt;
     }
@@ -11931,7 +11990,8 @@ c_parser_omp_simd (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_SIMD, clauses, cclauses);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_SIMD, clauses, NULL_TREE,
+			       cclauses);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -12011,7 +12071,8 @@ c_parser_omp_for (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_FOR, clauses, cclauses);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_FOR, clauses, NULL_TREE,
+			       cclauses);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -12494,7 +12555,8 @@ c_parser_omp_distribute (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_DISTRIBUTE, clauses, NULL);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_DISTRIBUTE, clauses, NULL_TREE,
+			       NULL);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -13771,18 +13833,84 @@ c_parser_cilk_all_clauses (c_parser *parser)
   return c_finish_cilk_clauses (clauses);
 }
 
-/* Main entry point for parsing Cilk Plus <#pragma simd> for
-   loops.  */
+/* This function helps parse the grainsize pragma for a _Cilk_for statement. 
+   Here is the correct syntax of this pragma: 
+	    #pragma cilk grainsize = <EXP> 
+ */
 
 static void
-c_parser_cilk_simd (c_parser *parser)
+c_parser_cilk_grainsize (c_parser *parser)
 {
-  tree clauses = c_parser_cilk_all_clauses (parser);
+  extern tree convert_to_integer (tree, tree);
+
+  /* consume the 'grainsize' keyword.  */
+  c_parser_consume_pragma (parser);
+
+  if (c_parser_require (parser, CPP_EQ, "expected %<=%>") != 0)
+    {
+      struct c_expr g_expr = c_parser_binary_expression (parser, NULL, NULL);
+      if (g_expr.value && TREE_CODE (g_expr.value) == C_MAYBE_CONST_EXPR)
+	{
+	  error_at (input_location, "cannot convert grain to long integer");
+	  c_parser_skip_to_pragma_eol (parser);
+	}   
+      else if (g_expr.value && g_expr.value != error_mark_node)
+	{
+	  c_parser_skip_to_pragma_eol (parser);
+	  c_token *token = c_parser_peek_token (parser);
+	  if (token && token->type == CPP_KEYWORD
+	      && token->keyword == RID_CILK_FOR)
+	    {
+	      /* Remove EXCESS_PRECISION_EXPR since we are going to convert
+		 it to long int.  */
+	      if (TREE_CODE (g_expr.value) == EXCESS_PRECISION_EXPR)
+		g_expr.value = TREE_OPERAND (g_expr.value, 0);
+	      tree grain = convert_to_integer (long_integer_type_node,
+					       g_expr.value);
+	      if (grain && grain != error_mark_node) 
+		c_parser_cilk_simd (parser, true, grain);
+	    }
+	  else
+	    warning (0, "grainsize pragma is not followed by %<_Cilk_for%>");
+	}
+      else
+	c_parser_skip_to_pragma_eol (parser);
+    }
+  else
+    c_parser_skip_to_pragma_eol (parser);
+}
+
+/* Main entry point for parsing Cilk Plus <#pragma simd> for and
+   _Cilk_for loops.  If IS_CILK_FOR is true then it is a _Cilk_for loop 
+   and GRAIN is the grain value passed in through pragma or 0.  */
+
+static void
+c_parser_cilk_simd (c_parser *parser, bool is_cilk_for, tree grain)
+{
+  tree super_block = NULL_TREE;
+  tree clauses = NULL_TREE;
+  
+  if (!is_cilk_for)
+    clauses = c_parser_cilk_all_clauses (parser);
+  else
+    super_block = c_begin_omp_parallel ();
   tree block = c_begin_compound_stmt (true);
   location_t loc = c_parser_peek_token (parser)->location;
-  c_parser_omp_for_loop (loc, parser, CILK_SIMD, clauses, NULL);
+  enum tree_code code = is_cilk_for ? CILK_FOR : CILK_SIMD;
+  c_parser_omp_for_loop (loc, parser, code, clauses, grain, NULL);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
+  if (is_cilk_for)
+    {
+      /* Move all the clauses from the #pragma omp for to #pragma omp parallel.
+	 This is because if these values are not integers and are placed in
+	 OMP_FOR then the compiler will insert value chains for them.  */
+      tree parallel_clauses = NULL_TREE;
+      cilk_for_move_clauses_upward (&parallel_clauses, super_block);
+      /* The term super_block is not used in scheduling terms but in
+	 set-theory, i.e. set vs. super-set.  */
+      c_finish_omp_parallel (loc, parallel_clauses, super_block);
+    }
 }
 \f
 /* Parse a transaction attribute (GCC Extension).
diff --git a/gcc/cilk-builtins.def b/gcc/cilk-builtins.def
index 9f3240a..bf319d5 100644
--- a/gcc/cilk-builtins.def
+++ b/gcc/cilk-builtins.def
@@ -31,3 +31,5 @@ DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_SYNC, "__cilkrts_sync")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_LEAVE_FRAME, "__cilkrts_leave_frame")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_POP_FRAME, "__cilkrts_pop_frame")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_SAVE_FP, "__cilkrts_save_fp_ctrl_state")
+DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_FOR_32, "__cilkrts_cilk_for_32")
+DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_FOR_64, "__cilkrts_cilk_for_64")
diff --git a/gcc/cilk-common.c b/gcc/cilk-common.c
index a6a1aa2..d604651 100644
--- a/gcc/cilk-common.c
+++ b/gcc/cilk-common.c
@@ -105,6 +105,27 @@ install_builtin (const char *name, tree fntype, enum built_in_function code,
   return fndecl;
 }
 
+/* Returns a FUNCTION_DECL of type TYPE whose name is *NAME.  */
+
+static tree 
+declare_cilk_for_builtin (const char *name, tree type, 
+			  enum built_in_function code)
+{
+  tree cb, ft, fn;
+
+  cb = build_function_type_list (void_type_node,
+                                 ptr_type_node, type, type,
+                                 NULL_TREE);
+  cb = build_pointer_type (cb);
+  ft = build_function_type_list (void_type_node,
+                                 cb, ptr_type_node, type,
+                                 integer_type_node, NULL_TREE);
+  fn = install_builtin (name, ft, code, false);
+  TREE_NOTHROW (fn) = 0;
+
+  return fn;
+}
+
 /* Creates and initializes all the built-in Cilk keywords functions and three
    structures: __cilkrts_stack_frame, __cilkrts_pedigree and __cilkrts_worker.
    Detailed information about __cilkrts_stack_frame and
@@ -269,6 +290,14 @@ cilk_init_builtins (void)
   cilk_save_fp_fndecl = install_builtin ("__cilkrts_save_fp_ctrl_state", 
 					 fptr_fun, BUILT_IN_CILK_SAVE_FP,
 					 false);
+  /* __cilkrts_cilk_for_32 (...);  */
+  cilk_for_32_fndecl = declare_cilk_for_builtin ("__cilkrts_cilk_for_32", 
+						 unsigned_intSI_type_node, 
+						 BUILT_IN_CILK_FOR_32);
+  /* __cilkrts_cilk_for_64 (...);  */
+  cilk_for_64_fndecl = declare_cilk_for_builtin ("__cilkrts_cilk_for_64", 
+						 unsigned_intDI_type_node, 
+						 BUILT_IN_CILK_FOR_64);
 }
 
 /* Get the appropriate frame arguments for CALL that is of type CALL_EXPR.  */
diff --git a/gcc/cilk.h b/gcc/cilk.h
index ae96f53..1fee929 100644
--- a/gcc/cilk.h
+++ b/gcc/cilk.h
@@ -40,6 +40,9 @@ enum cilk_tree_index  {
   CILK_TI_F_POP,                      /* __cilkrts_pop_frame (...).  */
   CILK_TI_F_RETHROW,                  /* __cilkrts_rethrow (...).  */
   CILK_TI_F_SAVE_FP,                  /* __cilkrts_save_fp_ctrl_state (...).  */
+  CILK_TI_F_LOOP_32,                  /* __cilkrts_cilk_for_32 (...).  */
+  CILK_TI_F_LOOP_64,                  /* __cilkrts_cilk_for_64 (...).  */
+
   /* __cilkrts_stack_frame struct fields.  */
   CILK_TI_FRAME_FLAGS,                /* stack_frame->flags.  */
   CILK_TI_FRAME_PARENT,               /* stack_frame->parent.  */
@@ -77,6 +80,8 @@ extern GTY (()) tree cilk_trees[CILK_TI_MAX];
 #define cilk_rethrow_fndecl           cilk_trees[CILK_TI_F_RETHROW]
 #define cilk_pop_fndecl               cilk_trees[CILK_TI_F_POP]
 #define cilk_save_fp_fndecl           cilk_trees[CILK_TI_F_SAVE_FP]
+#define cilk_for_32_fndecl            cilk_trees[CILK_TI_F_LOOP_32]
+#define cilk_for_64_fndecl            cilk_trees[CILK_TI_F_LOOP_64]
 
 #define cilk_worker_type_fndecl       cilk_trees[CILK_TI_WORKER_TYPE]
 #define cilk_frame_type_decl          cilk_trees[CILK_TI_FRAME_TYPE]
diff --git a/gcc/cp/cp-cilkplus.c b/gcc/cp/cp-cilkplus.c
index f3a2aff..0825777 100644
--- a/gcc/cp/cp-cilkplus.c
+++ b/gcc/cp/cp-cilkplus.c
@@ -143,3 +143,163 @@ cilk_install_body_with_frame_cleanup (tree fndecl, tree orig_body, void *wd)
 			    &list);
 }
 
+/* Helper function for walk_tree, used by found_cilk_for_p.  Sets *DATA (of type
+   bool) to true if *TP is of type CILK_FOR.  If so, then WALK_SUBTREES is
+   set to zero.  */
+
+static tree
+find_cilk_for_stmt (tree *tp, int *walk_subtrees, void *data)
+{
+  bool *found = (bool *) data;
+  if (TREE_CODE (*tp) == CILK_FOR)
+    {
+      *found = true;
+      data = (void *) found;
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Returns true if T is of type CILK_FOR or one of its subtrees is of type
+   CILK_FOR.  */
+
+bool
+found_cilk_for_p (tree t)
+{
+  bool found = false;
+  walk_tree (&t, find_cilk_for_stmt, (void *) &found, NULL);
+  return found;
+}
+
+/* Moves all the statements up to the CILK_FOR statement in *STMT_LIST into
+   *NEW_STMT_LIST.  Removes those statements from *STMT_LIST.  */
+
+void
+copy_tree_till_cilk_for (tree *stmt_list, tree *new_stmt_list)
+{
+  gcc_assert (TREE_CODE (*stmt_list) == STATEMENT_LIST);
+  gcc_assert (new_stmt_list != NULL);
+
+  if (*new_stmt_list == NULL_TREE)
+    *new_stmt_list = alloc_stmt_list ();
+
+  tree_stmt_iterator tsi;
+  for (tsi = tsi_start (*stmt_list); !tsi_end_p (tsi);)
+    if (!found_cilk_for_p (tsi_stmt (tsi)))
+      {
+	append_to_statement_list (tsi_stmt (tsi), new_stmt_list); 
+	tsi_delink (&tsi);
+      }
+    else if (TREE_CODE (tsi_stmt (tsi)) == STATEMENT_LIST)
+      {
+	copy_tree_till_cilk_for (tsi_stmt_ptr (tsi), new_stmt_list);
+	tsi_next (&tsi);
+      }
+    else if (TREE_CODE (tsi_stmt (tsi)) == BIND_EXPR)
+      {
+	copy_tree_till_cilk_for (&BIND_EXPR_BODY (tsi_stmt (tsi)),
+				 new_stmt_list);
+	tsi_next (&tsi);
+      }
+    else
+      tsi_next (&tsi);
+}
+
+/* Structure to hold the list of variables that are being killed in a
+   statement list.  This structure is only used in a WALK_TREE function.  */
+struct cilk_for_var_list
+{
+  vec <tree, va_gc> *list;
+};
+
+/* Helper function for walk_tree, used by find_killed_vars.  Collects all
+   the variables that are being killed (or set) in *TP.  DATA points to the
+   structure that holds the variable list.  */
+
+static tree
+find_vars (tree *tp, int *walk_subtrees, void *data)
+{
+  struct cilk_for_var_list *vlist = (struct cilk_for_var_list *) data;
+
+  if (!tp || !*tp)
+    return NULL_TREE;
+
+  if (TREE_CODE (*tp) == INIT_EXPR || TREE_CODE (*tp) == MODIFY_EXPR)
+    {
+      vec_safe_push (vlist->list, TREE_OPERAND (*tp, 0));
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Returns a vector of trees holding the variables that are
+   killed (i.e. written or set) in STMT_LIST.  */
+
+static vec <tree, va_gc> *
+find_killed_vars (tree stmt_list)
+{
+  struct cilk_for_var_list vlist;
+  memset (&vlist, 0, sizeof (vlist));
+  cp_walk_tree (&stmt_list, find_vars, &vlist, NULL);
+  return vlist.list;
+}
+
+/* Inserts an OMP_CLAUSE_FIRSTPRIVATE clause into *CLAUSES for each variable
+   in LIST.  */
+
+static void
+insert_firstpriv_clauses (vec <tree, va_gc> *list, tree *clauses)
+{
+  if (vec_safe_is_empty (list))
+    return;
+
+  tree lhs;
+  unsigned ix;
+  FOR_EACH_VEC_SAFE_ELT (list, ix, lhs)
+    {
+      tree new_clause = build_omp_clause (EXPR_LOCATION (lhs),
+					  OMP_CLAUSE_FIRSTPRIVATE);
+      OMP_CLAUSE_DECL (new_clause) = lhs;
+      OMP_CLAUSE_CHAIN (new_clause) = *clauses;
+      *clauses = new_clause;
+    }
+}
+
+/* Returns a BIND_EXPR whose BIND_EXPR_VARS holds VARS and whose
+   BIND_EXPR_BODY contains STMT_LIST followed by CFOR_PAR_LIST.  */
+
+tree
+cilk_for_create_bind_expr (tree vars, tree stmt_list, tree cfor_par_list)
+{
+  gcc_assert (TREE_CODE (stmt_list) == STATEMENT_LIST);
+  tree_stmt_iterator tsi;
+  tree return_expr = make_node (BIND_EXPR);
+  BIND_EXPR_BODY (return_expr) = alloc_stmt_list ();
+  bool found = false; 
+  vec <tree, va_gc> *cfor_vars = find_killed_vars (stmt_list);
+
+  insert_firstpriv_clauses (cfor_vars, &OMP_PARALLEL_CLAUSES (cfor_par_list));
+
+  /* If there is a supplied list of vars then there is no reason to find them 
+     again.  */
+  if (vars != NULL_TREE)
+    found = true;
+
+  BIND_EXPR_VARS (return_expr) = vars;
+  for (tsi = tsi_start (stmt_list); !tsi_end_p (tsi); tsi_next (&tsi))
+    {
+      /* Only do the adding of BIND_EXPR_VARS the first time since they are
+	 already "chained-on."  */
+      if (!found && TREE_CODE (tsi_stmt (tsi)) == DECL_EXPR)
+	{
+	  tree var = DECL_EXPR_DECL (tsi_stmt (tsi));
+	  BIND_EXPR_VARS (return_expr) = var;
+	  found = true;
+	}
+      else
+	append_to_statement_list (tsi_stmt (tsi),
+				  &BIND_EXPR_BODY (return_expr));
+    }
+  append_to_statement_list (cfor_par_list, &BIND_EXPR_BODY (return_expr));
+  return return_expr;
+}
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 7681b27..0fde703 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6206,6 +6206,9 @@ extern void vtv_build_vtable_verify_fndecl      (void);
 
 /* In cp-cilkplus.c.  */
 extern bool cpp_validate_cilk_plus_loop		(tree);
+extern void copy_tree_till_cilk_for             (tree *, tree *);
+extern tree cilk_for_create_bind_expr           (tree, tree, tree);
+extern bool found_cilk_for_p                    (tree);
 
 /* In cp/cp-array-notations.c */
 extern tree expand_array_notation_exprs         (tree);
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4673f78..8bf97aa 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -237,8 +237,8 @@ static void cp_parser_initial_pragma
 static tree cp_literal_operator_id
   (const char *);
 
-static void cp_parser_cilk_simd
-  (cp_parser *, cp_token *);
+static tree cp_parser_cilk_simd
+  (cp_parser *, cp_token *, tree);
 static bool cp_parser_omp_declare_reduction_exprs
   (tree, cp_parser *);
 static tree cp_parser_cilk_simd_vectorlength 
@@ -9368,6 +9368,18 @@ cp_parser_statement (cp_parser* parser, tree in_statement_expr,
 	  statement = cp_parser_iteration_statement (parser, false);
 	  break;
 
+	case RID_CILK_FOR:
+	  if (!flag_cilkplus)
+	    {
+	      error_at (cp_lexer_peek_token (parser->lexer)->location,
+			"-fcilkplus must be enabled to use %<_Cilk_for%>");
+	      cp_lexer_consume_token (parser->lexer);
+	      statement = error_mark_node;
+	    }
+	  else
+	    statement = cp_parser_cilk_simd (parser, NULL, integer_zero_node);
+	  break;
+	  
 	case RID_BREAK:
 	case RID_CONTINUE:
 	case RID_RETURN:
@@ -28836,7 +28848,7 @@ cp_parser_omp_for_cond (cp_parser *parser, tree decl, enum tree_code code)
     case LE_EXPR:
       break;
     case NE_EXPR:
-      if (code == CILK_SIMD)
+      if (code == CILK_SIMD || code == CILK_FOR)
 	break;
       /* Fall through: OpenMP disallows NE_EXPR.  */
     default:
@@ -29132,7 +29144,7 @@ cp_parser_omp_for_loop_init (cp_parser *parser,
 
 static tree
 cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
-			tree *cclauses)
+			tree *cclauses, tree *cfor_block)
 {
   tree init, cond, incr, body, decl, pre_body = NULL_TREE, ret;
   tree real_decl, initv, condv, incrv, declv;
@@ -29161,11 +29173,18 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
       bool add_private_clause = false;
       location_t loc;
 
-      if (!cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
+      if (code != CILK_FOR
+	  && !cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
 	{
 	  cp_parser_error (parser, "for statement expected");
 	  return NULL;
 	}
+      if (code == CILK_FOR
+	  && !cp_lexer_next_token_is_keyword (parser->lexer, RID_CILK_FOR))
+	{
+	  cp_parser_error (parser, "_Cilk_for statement expected");
+	  return NULL;
+	}
       loc = cp_lexer_consume_token (parser->lexer)->location;
 
       if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
@@ -29174,13 +29193,26 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
       init = decl = real_decl = NULL;
       this_pre_body = push_stmt_list ();
 
+      if (code == CILK_FOR
+	  && cp_lexer_next_token_is_keyword (parser->lexer, RID_STATIC))
+	{
+	  error_at (cp_lexer_peek_token (parser->lexer)->location,
+		    "induction variable cannot be static");
+	  cp_lexer_consume_token (parser->lexer);
+	}
       add_private_clause
 	|= cp_parser_omp_for_loop_init (parser,
-					/*parsing_openmp=*/code != CILK_SIMD,
+					/*parsing_openmp=*/
+					(code != CILK_SIMD && code != CILK_FOR),
 					this_pre_body, for_block,
 					init, decl, real_decl);
 
-      cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
+      if (!cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON)
+	  && code == CILK_FOR)
+	{
+	  cp_parser_skip_to_end_of_statement (parser);
+	  cp_parser_consume_semicolon_at_end_of_statement (parser);
+	}
       if (this_pre_body)
 	{
 	  this_pre_body = pop_stmt_list (this_pre_body);
@@ -29338,7 +29370,7 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
 
   /* Note that we saved the original contents of this flag when we entered
      the structured block, and so we don't need to re-save it here.  */
-  if (code == CILK_SIMD)
+  if (code == CILK_SIMD || code == CILK_FOR)
     parser->in_statement = IN_CILK_SIMD_FOR;
   else
     parser->in_statement = IN_OMP_FOR;
@@ -29379,7 +29411,17 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
     }
 
   while (!for_block->is_empty ())
-    add_stmt (pop_stmt_list (for_block->pop ()));
+    {
+      tree t = pop_stmt_list (for_block->pop ());
+
+      /* Remove all the statements between the head of statement list and
+	 _Cilk_for statement and store them in *cfor_block.  These statements
+	 are hoisted above the #pragma omp parallel.  */
+      if (!processing_template_decl && code == CILK_FOR && cfor_block != NULL)
+	copy_tree_till_cilk_for (&t, cfor_block);
+      add_stmt (t);
+
+    }
   release_tree_vector (for_block);
 
   return ret;
@@ -29435,7 +29477,7 @@ cp_parser_omp_simd (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_SIMD, clauses, cclauses);
+  ret = cp_parser_omp_for_loop (parser, OMP_SIMD, clauses, cclauses, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -29523,7 +29565,7 @@ cp_parser_omp_for (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_FOR, clauses, cclauses);
+  ret = cp_parser_omp_for_loop (parser, OMP_FOR, clauses, cclauses, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -29995,7 +30037,7 @@ cp_parser_omp_distribute (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_DISTRIBUTE, clauses, NULL);
+  ret = cp_parser_omp_for_loop (parser, OMP_DISTRIBUTE, clauses, NULL, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -31291,6 +31333,38 @@ cp_parser_initial_pragma (cp_token *first_token)
   cp_lexer_get_preprocessor_token (NULL, first_token);
 }
 
+/* Parses the grainsize pragma for the _Cilk_for statement.
+   Syntax:
+   #pragma cilk grainsize = <VALUE>.  */
+
+static void
+cp_parser_cilk_grainsize (cp_parser *parser, cp_token *pragma_tok)
+{
+  if (cp_parser_require (parser, CPP_EQ, RT_EQ))
+    {
+      tree exp = cp_parser_binary_expression (parser, false, false,
+                                              PREC_NOT_OPERATOR, NULL);
+      cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+      if (!exp || exp == error_mark_node)
+        {
+          error_at (pragma_tok->location, "invalid grainsize for _Cilk_for");
+          return;
+        }
+      cp_token *n_tok = cp_lexer_peek_token (parser->lexer);
+
+      /* Make sure the next token is _Cilk_for; it is invalid otherwise.  */
+      if (n_tok && n_tok->type == CPP_KEYWORD
+	  && n_tok->keyword == RID_CILK_FOR) 
+	cp_parser_cilk_simd (parser, NULL, exp);
+      else
+	warning_at (cp_lexer_peek_token (parser->lexer)->location, 0,
+		    "%<#pragma cilk grainsize%> is not followed by "
+		    "%<_Cilk_for%>");
+      return;
+    }
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+}
+
 /* Normal parsing of a pragma token.  Here we can (and must) use the
    regular lexer.  */
 
@@ -31470,9 +31544,30 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context)
 		    "%<#pragma simd%> must be inside a function");
 	  break;
 	}
-      cp_parser_cilk_simd (parser, pragma_tok);
+      cp_parser_cilk_simd (parser, pragma_tok, NULL_TREE);
       return true;
 
+    case PRAGMA_CILK_GRAINSIZE:
+      if (context == pragma_external)
+        {
+          error_at (pragma_tok->location,
+                    "%<#pragma cilk grainsize%> must be inside a function");
+          break;
+        }
+
+      /* Reject the pragma if Cilk Plus is not enabled.  */
+      if (flag_cilkplus)
+        {
+          cp_parser_cilk_grainsize (parser, pragma_tok);
+          return true;
+        }
+      else
+        {
+          error_at (pragma_tok->location, "-fcilkplus must be enabled to use "
+                    "%<#pragma cilk grainsize%>");
+          break;
+        }
+
     default:
       gcc_assert (id >= PRAGMA_FIRST_EXTERNAL);
       c_invoke_pragma_handler (id);
@@ -31790,31 +31885,104 @@ cp_parser_cilk_simd_all_clauses (cp_parser *parser, cp_token *pragma_token)
     return c_finish_cilk_clauses (clauses);
 }
 
-/* Main entry-point for parsing Cilk Plus <#pragma simd> for loops.  */
+/* Main entry-point for parsing Cilk Plus <#pragma simd> for and _Cilk_for
+   loops.  This function returns NULL_TREE whenever it is parsing the
+   <#pragma simd> for because the caller does not check the return value.
+   _Cilk_for's caller checks this value, so this function returns
+   error_mark_node when errors happen and a valid value when things go well.  */
 
-static void
-cp_parser_cilk_simd (cp_parser *parser, cp_token *pragma_token)
+static tree
+cp_parser_cilk_simd (cp_parser *parser, cp_token *pragma_token, tree grain)
 {
-  tree clauses = cp_parser_cilk_simd_all_clauses (parser, pragma_token);
-
+  bool is_cilk_for = !pragma_token;
+  
+  tree clauses = NULL_TREE;
+  if (!is_cilk_for)
+    clauses = cp_parser_cilk_simd_all_clauses (parser, pragma_token);
+  
   if (clauses == error_mark_node)
-    return;
+    return NULL_TREE;
   
-  if (cp_lexer_next_token_is_not_keyword (parser->lexer, RID_FOR))
+  if (!is_cilk_for
+      && cp_lexer_next_token_is_not_keyword (parser->lexer, RID_FOR))
     {
       error_at (cp_lexer_peek_token (parser->lexer)->location,
 		"for statement expected");
-      return;
+      return NULL_TREE;
+    }
+  if (is_cilk_for
+      && cp_lexer_next_token_is_not_keyword (parser->lexer, RID_CILK_FOR))
+    {
+      error_at (cp_lexer_peek_token (parser->lexer)->location,
+		"_Cilk_for statement expected");
+      return error_mark_node;
     }
 
+  tree top_block = NULL_TREE, topmost_blk = NULL_TREE;
+  if (is_cilk_for)
+    {
+      topmost_blk = push_stmt_list ();
+      top_block = begin_omp_parallel ();
+    }
+  
   tree sb = begin_omp_structured_block ();
   int save = cp_parser_begin_omp_structured_block (parser);
-  tree ret = cp_parser_omp_for_loop (parser, CILK_SIMD, clauses, NULL);
+   
+  enum tree_code code = is_cilk_for ? CILK_FOR : CILK_SIMD;
+  tree cfor_blk = NULL_TREE;
+  tree ret = cp_parser_omp_for_loop (parser, code, clauses, NULL, &cfor_blk);
   if (ret)
     cpp_validate_cilk_plus_loop (OMP_FOR_BODY (ret));
+  
+  /* For _Cilk_for statements, the grain value is stored in a SCHEDULE
+     clause.  */
+  if (is_cilk_for && ret)
+    {
+      tree l = build_omp_clause (EXPR_LOCATION (grain), OMP_CLAUSE_SCHEDULE);
+      OMP_CLAUSE_SCHEDULE_KIND (l) = OMP_CLAUSE_SCHEDULE_CILKFOR;
+      OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (l) = grain;
+      OMP_CLAUSE_CHAIN (l) = OMP_FOR_CLAUSES (ret);
+      OMP_FOR_CLAUSES (ret) = l;
+    }
   cp_parser_end_omp_structured_block (parser, save);
-  add_stmt (finish_omp_structured_block (sb));
-  return;
+
+  if (!is_cilk_for)
+    {
+      add_stmt (finish_omp_structured_block (sb));
+      return NULL_TREE;
+    }
+
+  tree sb_block = finish_omp_structured_block (sb);
+  tree vars = NULL_TREE, sb_blk_body = sb_block;
+
+  /* For iterators, cfor_blk holds the mapping from original vector
+     iterators to the integer ones that the c_finish_omp_for remaps.
+     This information must be pushed above the #pragma omp parallel so that
+     the IF clause (that holds the loop-count) can use it to compute the
+     loop-count.  */
+  if (TREE_CODE (sb_block) == BIND_EXPR && cfor_blk != NULL_TREE)
+    {
+      vars = BIND_EXPR_VARS (sb_block);
+      sb_blk_body = BIND_EXPR_BODY (sb_block);
+    }
+
+  add_stmt (sb_blk_body);
+  tree parallel_clauses = NULL_TREE;
+
+  if (!processing_template_decl)
+    cilk_for_move_clauses_upward (&parallel_clauses, ret);
+  tree stmt = finish_omp_parallel (parallel_clauses, top_block);
+  OMP_PARALLEL_COMBINED (stmt) = 1;
+  topmost_blk = pop_stmt_list (topmost_blk);
+
+  if (cfor_blk != NULL_TREE)
+    {
+      tree bind_expr = cilk_for_create_bind_expr (vars, cfor_blk, topmost_blk);
+      add_stmt (bind_expr);
+      return bind_expr;
+    }
+  add_stmt (topmost_blk);
+  return topmost_blk;
 }
 
 /* Create an identifier for a generic parameter type (a synthesized
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 6477fce..5c92fe5
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13580,13 +13580,51 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
       break;
 
     case OMP_PARALLEL:
-      tmp = tsubst_omp_clauses (OMP_PARALLEL_CLAUSES (t), false,
-				args, complain, in_decl);
-      stmt = begin_omp_parallel ();
-      RECUR (OMP_PARALLEL_BODY (t));
-      OMP_PARALLEL_COMBINED (finish_omp_parallel (tmp, stmt))
-	= OMP_PARALLEL_COMBINED (t);
-      break;
+      {
+	tmp = tsubst_omp_clauses (OMP_PARALLEL_CLAUSES (t), false,
+				  args, complain, in_decl);
+	
+	tree top_block = NULL_TREE, topmost_blk = NULL_TREE;
+	bool is_cilk_for = false;
+	if (flag_cilkplus && found_cilk_for_p (OMP_PARALLEL_BODY (t)))
+	  {
+	    is_cilk_for = true;
+	    topmost_blk = push_stmt_list ();
+	    top_block = begin_omp_parallel ();
+	  }
+	else
+	  stmt = begin_omp_parallel ();
+    
+	RECUR (OMP_PARALLEL_BODY (t));
+	tree cfor_blk = NULL_TREE;
+	if (is_cilk_for)
+	  {
+	    tree sb_blk_body = top_block;
+	    if (TREE_CODE (sb_blk_body) == BIND_EXPR) 
+	      sb_blk_body = BIND_EXPR_BODY (sb_blk_body);
+
+	    copy_tree_till_cilk_for (&sb_blk_body, &cfor_blk);
+	    cilk_for_move_clauses_upward (&tmp, top_block);
+	    top_block = finish_omp_parallel (tmp, sb_blk_body);
+	  }
+	else
+	  {
+	    stmt = finish_omp_parallel (tmp, stmt);
+	    OMP_PARALLEL_COMBINED (stmt) = OMP_PARALLEL_COMBINED (t);
+	  }
+	if (is_cilk_for)
+	  {
+	    OMP_PARALLEL_COMBINED (top_block) = 1;
+	    topmost_blk = pop_stmt_list (topmost_blk);
+	    if (cfor_blk != NULL_TREE) 
+	      stmt = cilk_for_create_bind_expr (NULL_TREE, cfor_blk, 
+						topmost_blk);
+	    else
+	      stmt = topmost_blk;
+	    add_stmt (stmt);
+	  }	
+      } 
+    break;
 
     case OMP_TASK:
       tmp = tsubst_omp_clauses (OMP_TASK_CLAUSES (t), false,
@@ -13599,6 +13637,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
     case OMP_FOR:
     case OMP_SIMD:
     case CILK_SIMD:
+    case CILK_FOR:
     case OMP_DISTRIBUTE:
       {
 	tree clauses, body, pre_body;
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 6f32496..4c0f8d4
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -6059,6 +6059,7 @@ handle_omp_for_class_iterator (int i, location_t locus, tree declv, tree initv,
     case GE_EXPR:
     case LT_EXPR:
     case LE_EXPR:
+    case NE_EXPR:
       if (TREE_OPERAND (cond, 1) == iter)
 	cond = build2 (swap_tree_comparison (TREE_CODE (cond)),
 		       TREE_TYPE (cond), iter, TREE_OPERAND (cond, 0));
@@ -6471,12 +6472,22 @@ finish_omp_for (location_t locus, enum tree_code code, tree declv, tree initv,
   if (IS_EMPTY_STMT (pre_body))
     pre_body = NULL;
 
+  tree cf_step = NULL_TREE, cf_init = NULL_TREE, cf_end = NULL_TREE;
   omp_for = c_finish_omp_for (locus, code, declv, initv, condv, incrv,
-			      body, pre_body);
-
+			      body, pre_body, &cf_init, &cf_end, &cf_step);
   if (omp_for == NULL)
     return NULL;
 
+  if (code == CILK_FOR && !processing_template_decl)
+    {
+      tree count = fold_build2 (MINUS_EXPR, TREE_TYPE (cf_end), cf_end,
+				cf_init);
+      count = fold_build2 (CEIL_DIV_EXPR, TREE_TYPE (count), count, cf_step);
+      tree c = build_omp_clause (EXPR_LOCATION (count), OMP_CLAUSE_IF);
+      OMP_CLAUSE_IF_EXPR (c) = count;
+      clauses = chainon (clauses, c);
+    }
+
   for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INCR (omp_for)); i++)
     {
       decl = TREE_OPERAND (TREE_VEC_ELT (OMP_FOR_INIT (omp_for), i), 0);
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 2d1e1c7..f87c0cf 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "value-prof.h"
 #include "trans-mem.h"
 
+static void dump_gimple_omp_parallel (pretty_printer *, gimple, int, int,
+				      bool);
 #define INDENT(SPACE)							\
   do { int i; for (i = 0; i < SPACE; i++) pp_space (buffer); } while (0)
 
@@ -1124,6 +1126,10 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	case GF_OMP_FOR_KIND_DISTRIBUTE:
 	  kind = " distribute";
 	  break;
+	case GF_OMP_FOR_KIND_CILKFOR:
+	  gcc_assert (flag_cilkplus);
+	  kind = "";
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -1158,16 +1164,25 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	case GF_OMP_FOR_KIND_DISTRIBUTE:
 	  pp_string (buffer, "#pragma omp distribute");
 	  break;
+	case GF_OMP_FOR_KIND_CILKFOR:
+	  gcc_assert (flag_cilkplus);
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
-      dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags);
+      if (!flag_cilkplus
+	  || gimple_omp_for_kind (gs) != GF_OMP_FOR_KIND_CILKFOR) 
+	dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags);
       for (i = 0; i < gimple_omp_for_collapse (gs); i++)
 	{
 	  if (i)
 	    spc += 2;
 	  newline_and_indent (buffer, spc);
-	  pp_string (buffer, "for (");
+	  if (flag_cilkplus 
+	      && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
+	    pp_string (buffer, "_Cilk_for (");
+	  else
+	    pp_string (buffer, "for (");
 	  dump_generic_node (buffer, gimple_omp_for_index (gs, i), spc,
 			     flags, false);
 	  pp_string (buffer, " = ");
@@ -1192,6 +1207,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	    case GE_EXPR:
 	      pp_greater_equal (buffer);
 	      break;
+	    case NE_EXPR:
+	      pp_string (buffer, "!=");
+	      break;
 	    default:
 	      gcc_unreachable ();
 	    }
@@ -1210,6 +1228,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 
       if (!gimple_seq_empty_p (gimple_omp_body (gs)))
 	{
+	  if (flag_cilkplus
+	      && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR) 
+	    dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags); 
 	  newline_and_indent (buffer, spc + 2);
 	  pp_left_brace (buffer);
 	  pp_newline (buffer);
@@ -1846,7 +1867,7 @@ dump_gimple_phi (pretty_printer *buffer, gimple phi, int spc, bool comment,
 
 static void
 dump_gimple_omp_parallel (pretty_printer *buffer, gimple gs, int spc,
-                          int flags)
+                          int flags, bool is_cilk_for)
 {
   if (flags & TDF_RAW)
     {
@@ -1860,7 +1881,10 @@ dump_gimple_omp_parallel (pretty_printer *buffer, gimple gs, int spc,
   else
     {
       gimple_seq body;
-      pp_string (buffer, "#pragma omp parallel");
+      if (is_cilk_for) 
+	pp_string (buffer, "compiler-inserted clauses for cilk-for body: ");
+      else
+	pp_string (buffer, "#pragma omp parallel");
       dump_omp_clauses (buffer, gimple_omp_parallel_clauses (gs), spc, flags);
       if (gimple_omp_parallel_child_fn (gs))
 	{
@@ -2137,7 +2161,7 @@ pp_gimple_stmt_1 (pretty_printer *buffer, gimple gs, int spc, int flags)
       break;
 
     case GIMPLE_OMP_PARALLEL:
-      dump_gimple_omp_parallel (buffer, gs, spc, flags);
+      dump_gimple_omp_parallel (buffer, gs, spc, flags, false);
       break;
 
     case GIMPLE_OMP_TASK:
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 0e80d2e..194045c 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -91,13 +91,14 @@ enum gf_mask {
     GF_CALL_ALLOCA_FOR_VAR	= 1 << 5,
     GF_CALL_INTERNAL		= 1 << 6,
     GF_OMP_PARALLEL_COMBINED	= 1 << 0,
-    GF_OMP_FOR_KIND_MASK	= 3 << 0,
+    GF_OMP_FOR_KIND_MASK	= 7 << 0,
     GF_OMP_FOR_KIND_FOR		= 0 << 0,
     GF_OMP_FOR_KIND_DISTRIBUTE	= 1 << 0,
     GF_OMP_FOR_KIND_SIMD	= 2 << 0,
     GF_OMP_FOR_KIND_CILKSIMD	= 3 << 0,
-    GF_OMP_FOR_COMBINED		= 1 << 2,
-    GF_OMP_FOR_COMBINED_INTO	= 1 << 3,
+    GF_OMP_FOR_KIND_CILKFOR     = 4 << 0,
+    GF_OMP_FOR_COMBINED		= 1 << 3,
+    GF_OMP_FOR_COMBINED_INTO	= 1 << 4,
     GF_OMP_TARGET_KIND_MASK	= 3 << 0,
     GF_OMP_TARGET_KIND_REGION	= 0 << 0,
     GF_OMP_TARGET_KIND_DATA	= 1 << 0,
@@ -4563,6 +4564,16 @@ gimple_omp_for_set_pre_body (gimple gs, gimple_seq pre_body)
   omp_for_stmt->pre_body = pre_body;
 }
 
+/* Returns the induction variable of type TREE from GS that is of type 
+   GIMPLE_STATEMENT_OMP_FOR.  */
+
+static inline tree
+gimple_cilk_for_induction_var (const_gimple gs)
+{
+  const gimple_statement_omp_for *cilk_for_stmt =
+    as_a <const gimple_statement_omp_for> (gs);
+  return cilk_for_stmt->iter->index;
+}
 
 /* Return the clauses associated with OMP_PARALLEL GS.  */
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index ff341d4..1531bd5
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -5856,7 +5856,8 @@ omp_check_private (struct gimplify_omp_ctx *ctx, tree decl, bool copyprivate)
 
 static void
 gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
-			   enum omp_region_type region_type)
+			   enum omp_region_type region_type,
+			   bool is_cilk_for)
 {
   struct gimplify_omp_ctx *ctx, *outer_ctx;
   tree c;
@@ -6086,8 +6087,20 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 
 	case OMP_CLAUSE_FINAL:
 	case OMP_CLAUSE_IF:
-	  OMP_CLAUSE_OPERAND (c, 0)
-	    = gimple_boolify (OMP_CLAUSE_OPERAND (c, 0));
+	  /* It is not necessary to gimplify the IF clause for _Cilk_for
+	     since it is something that was artificially inserted.  */
+	  if (is_cilk_for && region_type == ORT_WORKSHARE)
+	    {
+	      remove = true;
+	      break;
+	    }
+
+	  /* In _Cilk_for we insert an IF clause as a mechanism to
+	     pass in the count information.  So, there is no reason to
+	     boolify it.  */
+	  if (!is_cilk_for) 
+	    OMP_CLAUSE_OPERAND (c, 0) 
+	      = gimple_boolify (OMP_CLAUSE_OPERAND (c, 0));
 	  /* Fall through.  */
 
 	case OMP_CLAUSE_SCHEDULE:
@@ -6096,8 +6109,13 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	case OMP_CLAUSE_THREAD_LIMIT:
 	case OMP_CLAUSE_DIST_SCHEDULE:
 	case OMP_CLAUSE_DEVICE:
-	  if (gimplify_expr (&OMP_CLAUSE_OPERAND (c, 0), pre_p, NULL,
-			     is_gimple_val, fb_rvalue) == GS_ERROR)
+	  /* Remove the SCHEDULE clause from #pragma omp parallel of a
+	     _Cilk_for statement.  */
+	  if (is_cilk_for && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SCHEDULE 
+	      && region_type == ORT_COMBINED_PARALLEL)
+	    remove = true;
+	  else if (gimplify_expr (&OMP_CLAUSE_OPERAND (c, 0), pre_p, NULL, 
+				  is_gimple_val, fb_rvalue) == GS_ERROR)
 	    remove = true;
 	  break;
 
@@ -6465,11 +6483,25 @@ gimplify_omp_parallel (tree *expr_p, gimple_seq *pre_p)
   tree expr = *expr_p;
   gimple g;
   gimple_seq body = NULL;
-
+  bool is_cilk_for = false;
+  tree c = NULL_TREE;
+  for (c = OMP_PARALLEL_CLAUSES (expr); c; c = OMP_CLAUSE_CHAIN (c))
+    if (flag_cilkplus && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SCHEDULE
+	&& OMP_CLAUSE_SCHEDULE_KIND (c) == OMP_CLAUSE_SCHEDULE_CILKFOR)
+      {
+	/* The schedule clause is kept up to this point so that it can
+	   indicate whether this #pragma omp parallel is something a 
+	   _Cilk_for statement inserted.  If so, then indicate
+	   is_cilk_for is true so that the gimplify_scan_omp_clauses does 
+	   not boolify the IF CLAUSE, which stores the count value.  */
+	gcc_assert (flag_cilkplus);
+	is_cilk_for = true;
+	break;
+      } 
   gimplify_scan_omp_clauses (&OMP_PARALLEL_CLAUSES (expr), pre_p,
 			     OMP_PARALLEL_COMBINED (expr)
 			     ? ORT_COMBINED_PARALLEL
-			     : ORT_PARALLEL);
+			     : ORT_PARALLEL, is_cilk_for);
 
   push_gimplify_context ();
 
@@ -6505,7 +6537,7 @@ gimplify_omp_task (tree *expr_p, gimple_seq *pre_p)
   gimplify_scan_omp_clauses (&OMP_TASK_CLAUSES (expr), pre_p,
 			     find_omp_clause (OMP_TASK_CLAUSES (expr),
 					      OMP_CLAUSE_UNTIED)
-			     ? ORT_UNTIED_TASK : ORT_TASK);
+			     ? ORT_UNTIED_TASK : ORT_TASK, false);
 
   push_gimplify_context ();
 
@@ -6567,11 +6599,13 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
   bitmap has_decl_expr = NULL;
 
   orig_for_stmt = for_stmt = *expr_p;
+  bool is_cilk_for = flag_cilkplus && TREE_CODE (for_stmt) == CILK_FOR;
 
   simd = TREE_CODE (for_stmt) == OMP_SIMD
     || TREE_CODE (for_stmt) == CILK_SIMD;
+  
   gimplify_scan_omp_clauses (&OMP_FOR_CLAUSES (for_stmt), pre_p,
-			     simd ? ORT_SIMD : ORT_WORKSHARE);
+			     simd ? ORT_SIMD : ORT_WORKSHARE, is_cilk_for);
 
   /* Handle OMP_FOR_INIT.  */
   for_pre_body = NULL;
@@ -6627,7 +6661,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
       tree c = NULL_TREE;
       if (orig_for_stmt != for_stmt)
 	/* Do this only on innermost construct for combined ones.  */;
-      else if (simd)
+      else if (simd || is_cilk_for)
 	{
 	  splay_tree_node n = splay_tree_lookup (gimplify_omp_ctxp->variables,
 						 (splay_tree_key)decl);
@@ -6832,6 +6866,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
     case OMP_FOR: kind = GF_OMP_FOR_KIND_FOR; break;
     case OMP_SIMD: kind = GF_OMP_FOR_KIND_SIMD; break;
     case CILK_SIMD: kind = GF_OMP_FOR_KIND_CILKSIMD; break;
+    case CILK_FOR: kind = GF_OMP_FOR_KIND_CILKFOR; break;
     case OMP_DISTRIBUTE: kind = GF_OMP_FOR_KIND_DISTRIBUTE; break;
     default:
       gcc_unreachable ();
@@ -6865,7 +6900,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
       t = TREE_VEC_ELT (OMP_FOR_INCR (for_stmt), i);
       gimple_omp_for_set_incr (gfor, i, TREE_OPERAND (t, 1));
     }
-
+  
   gimplify_seq_add_stmt (pre_p, gfor);
   if (ret != GS_ALL_DONE)
     return GS_ERROR;
@@ -6902,7 +6937,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
     default:
       gcc_unreachable ();
     }
-  gimplify_scan_omp_clauses (&OMP_CLAUSES (expr), pre_p, ort);
+  gimplify_scan_omp_clauses (&OMP_CLAUSES (expr), pre_p, ort, false);
   if (ort == ORT_TARGET || ort == ORT_TARGET_DATA)
     {
       push_gimplify_context ();
@@ -6962,7 +6997,7 @@ gimplify_omp_target_update (tree *expr_p, gimple_seq *pre_p)
   gimple stmt;
 
   gimplify_scan_omp_clauses (&OMP_TARGET_UPDATE_CLAUSES (expr), pre_p,
-			     ORT_WORKSHARE);
+			     ORT_WORKSHARE, false);
   gimplify_adjust_omp_clauses (&OMP_TARGET_UPDATE_CLAUSES (expr));
   stmt = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_UPDATE,
 				  OMP_TARGET_UPDATE_CLAUSES (expr));
@@ -7904,6 +7939,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	case OMP_FOR:
 	case OMP_SIMD:
 	case CILK_SIMD:
+	case CILK_FOR:
 	case OMP_DISTRIBUTE:
 	  ret = gimplify_omp_for (expr_p, pre_p);
 	  break;
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 91c8656..3454dc9
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -71,6 +71,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-prop.h"
 #include "tree-nested.h"
 #include "tree-eh.h"
+#include "cilk.h"
 
 
 /* Lowering of OpenMP parallel and workshare constructs proceeds in two
@@ -198,6 +199,13 @@ struct omp_for_data
   struct omp_for_data_loop *loops;
 };
 
+/* A structure with the necessary elements from a _Cilk_for statement.  A
+   pointer to this structure is passed in WALK_STMT_INFO->INFO.  */
+struct cilk_for_info 
+{
+  bool found;
+  tree induction_var;
+};
 
 static splay_tree all_contexts;
 static int taskreg_nesting_level;
@@ -314,6 +322,8 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
   fd->have_ordered = false;
   fd->sched_kind = OMP_CLAUSE_SCHEDULE_STATIC;
   fd->chunk_size = NULL_TREE;
+  if (gimple_omp_for_kind (fd->for_stmt) ==  GF_OMP_FOR_KIND_CILKFOR)
+    fd->sched_kind = OMP_CLAUSE_SCHEDULE_CILKFOR;
   collapse_iter = NULL;
   collapse_count = NULL;
 
@@ -392,7 +402,9 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
 	  break;
 	case NE_EXPR:
 	  gcc_assert (gimple_omp_for_kind (for_stmt)
-		      == GF_OMP_FOR_KIND_CILKSIMD);
+		      == GF_OMP_FOR_KIND_CILKSIMD
+		      || gimple_omp_for_kind (for_stmt)
+		      == GF_OMP_FOR_KIND_CILKFOR);
 	  break;
 	case LE_EXPR:
 	  if (POINTER_TYPE_P (TREE_TYPE (loop->n2)))
@@ -1818,27 +1830,120 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	scan_omp (&OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (c), ctx);
 }
 
-/* Create a new name for omp child function.  Returns an identifier.  */
+/* Create a new name for omp child function.  Returns an identifier.  If 
+   IS_CILK_FOR is true then the suffix for the child function is 
+   "_cilk_for_fn."  */
 
 static tree
-create_omp_child_function_name (bool task_copy)
+create_omp_child_function_name (bool task_copy, bool is_cilk_for)
 {
+  if (is_cilk_for)
+    return clone_function_name (current_function_decl, "_cilk_for_fn");
   return (clone_function_name (current_function_decl,
 			       task_copy ? "_omp_cpyfn" : "_omp_fn"));
 }
 
+/* Helper function for walk_gimple_seq.  *GSI_P is the gimple statement
+   iterator passed by walk_gimple_seq and *WI->INFO holds the CILK_FOR_INFO
+   structure.  This function sets the values inside this structure if it
+   finds a _Cilk_for statement in *GSI_P.  HANDLED_OPS_P is unused.  */
+
+static tree
+find_cilk_for_stmt (gimple_stmt_iterator *gsi_p,
+		    bool *handled_ops_p ATTRIBUTE_UNUSED,
+		    struct walk_stmt_info *wi)
+{
+  struct cilk_for_info *cf_info = (struct cilk_for_info *) wi->info;
+  gimple stmt = gsi_stmt (*gsi_p);
+
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR
+      && (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_CILKFOR)
+      /* For nested _Cilk_for statements, just look into the
+	 outer-most one.  */
+      && cf_info->found == false)
+    {
+      cf_info->found = true;
+      cf_info->induction_var = gimple_cilk_for_induction_var (stmt);
+    }
+  return NULL_TREE;
+}
+
+/* Returns true if STMT contains a CILK_FOR statement.  If one is found,
+   *IND_VAR is populated with the induction variable of the outermost
+   _Cilk_for; otherwise *IND_VAR remains untouched.
+   IND_VAR can be NULL, in which case nothing is stored and only the
+   return value is meaningful.  */
+
+static bool
+is_cilk_for_stmt (gimple stmt, tree *ind_var)
+{
+  if (!flag_cilkplus)
+    return false;
+  if (gimple_code (stmt) == GIMPLE_OMP_PARALLEL)
+    stmt = gimple_omp_body (stmt);
+  if (gimple_code (stmt) == GIMPLE_BIND)
+    {
+      gimple_seq body = gimple_bind_body (stmt);
+      struct walk_stmt_info wi;
+      struct cilk_for_info cf_info;
+      memset (&cf_info, 0, sizeof (struct cilk_for_info));
+      memset (&wi, 0, sizeof (wi));
+      wi.info = &cf_info;
+      walk_gimple_seq (body, find_cilk_for_stmt, NULL, &wi);
+      if (cf_info.found)
+	{
+	  if (ind_var)
+	    *ind_var = cf_info.induction_var;
+	  return true;
+	}
+    }
+  return false;
+}
+
+/* Returns the type of the induction variable for the child function for
+   _Cilk_for and the types for _high and _low variables based on TYPE.  */
+
+static tree
+cilk_for_check_loop_diff_type (tree type)
+{
+  if (type == integer_type_node)
+    return type;
+  else if (TYPE_PRECISION (type) <= TYPE_PRECISION (uint32_type_node))
+    { 
+      if (TYPE_UNSIGNED (type)) 
+	return uint32_type_node;
+      else
+	return integer_type_node;
+    }
+  else
+    {
+      if (TYPE_UNSIGNED (type)) 
+	return uint64_type_node;
+      else
+	return long_long_integer_type_node;
+    }
+  gcc_unreachable ();
+}
+
 /* Build a decl for the omp child function.  It'll not contain a body
    yet, just the bare decl.  */
 
 static void
 create_omp_child_function (omp_context *ctx, bool task_copy)
 {
-  tree decl, type, name, t;
+  tree decl, type, name, t, ind_var = NULL_TREE;
 
-  name = create_omp_child_function_name (task_copy);
+  bool is_cilk_for = is_cilk_for_stmt (ctx->stmt, &ind_var);
+  tree cilk_var_type = (is_cilk_for ?
+    cilk_for_check_loop_diff_type (TREE_TYPE (ind_var)) : NULL_TREE);
+  
+  name = create_omp_child_function_name (task_copy, is_cilk_for);
   if (task_copy)
     type = build_function_type_list (void_type_node, ptr_type_node,
 				     ptr_type_node, NULL_TREE);
+  else if (is_cilk_for)
+    type = build_function_type_list (void_type_node, ptr_type_node,
+				     cilk_var_type, cilk_var_type, NULL_TREE);
   else
     type = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
 
@@ -1888,13 +1993,44 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
   DECL_CONTEXT (t) = decl;
   DECL_RESULT (decl) = t;
 
-  t = build_decl (DECL_SOURCE_LOCATION (decl),
-		  PARM_DECL, get_identifier (".omp_data_i"), ptr_type_node);
+  /* _Cilk_for's child function requires two extra parameters called
+     __low and __high that are set by the Cilk runtime when it calls this
+     function.  */
+  if (is_cilk_for)
+    {
+      t = build_decl (DECL_SOURCE_LOCATION (decl),
+		      PARM_DECL, get_identifier ("__high"), cilk_var_type);
+      DECL_ARTIFICIAL (t) = 1;
+      DECL_NAMELESS (t) = 1;
+      DECL_ARG_TYPE (t) = ptr_type_node;
+      DECL_CONTEXT (t) = current_function_decl;
+      TREE_USED (t) = 1;
+      TREE_ADDRESSABLE (t) = 1;
+      DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
+      DECL_ARGUMENTS (decl) = t;
+
+      t = build_decl (DECL_SOURCE_LOCATION (decl),
+		      PARM_DECL, get_identifier ("__low"), cilk_var_type);
+      DECL_ARTIFICIAL (t) = 1;
+      DECL_NAMELESS (t) = 1;
+      DECL_ARG_TYPE (t) = ptr_type_node;
+      DECL_CONTEXT (t) = current_function_decl;
+      TREE_USED (t) = 1;
+      TREE_ADDRESSABLE (t) = 1;
+      DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
+      DECL_ARGUMENTS (decl) = t;
+    }
+
+  tree data_name = get_identifier (".omp_data_i");
+  t = build_decl (DECL_SOURCE_LOCATION (decl), PARM_DECL, data_name,
+		  ptr_type_node);
   DECL_ARTIFICIAL (t) = 1;
   DECL_NAMELESS (t) = 1;
   DECL_ARG_TYPE (t) = ptr_type_node;
   DECL_CONTEXT (t) = current_function_decl;
   TREE_USED (t) = 1;
+  if (is_cilk_for)
+    DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
   DECL_ARGUMENTS (decl) = t;
   if (!task_copy)
     ctx->receiver_decl = t;
@@ -4313,6 +4449,44 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
 			    false, GSI_CONTINUE_LINKING);
 }
 
+/* Insert a function call whose name is FUNC_NAME with the information from
+   ENTRY_STMT into the basic_block BB.  */
+
+static void
+expand_cilk_for_call (basic_block bb, gimple entry_stmt,
+		      vec <tree, va_gc> *ws_args)
+{
+  tree t, t1, t2;
+  gimple_stmt_iterator gsi;
+  vec <tree, va_gc> *args;
+
+  gcc_assert (vec_safe_length (ws_args) == 2);
+  tree func_name = (*ws_args)[0];
+  tree grain = (*ws_args)[1];
+
+  tree clauses = gimple_omp_parallel_clauses (entry_stmt); 
+  tree count = find_omp_clause (clauses, OMP_CLAUSE_IF);
+  gcc_assert (count != NULL_TREE);
+  count = OMP_CLAUSE_IF_EXPR (count);
+  
+  gsi = gsi_last_bb (bb);
+  t = gimple_omp_parallel_data_arg (entry_stmt);
+  if (t == NULL)
+    t1 = null_pointer_node;
+  else
+    t1 = build_fold_addr_expr (t);
+  t2 = build_fold_addr_expr (gimple_omp_parallel_child_fn (entry_stmt));
+  
+  vec_alloc (args, 4);
+  args->quick_push (t2);
+  args->quick_push (t1);
+  args->quick_push (count);
+  args->quick_push (grain);
+  t = build_call_expr_loc_vec (UNKNOWN_LOCATION, func_name, args);
+
+  force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, false, 
+			    GSI_CONTINUE_LINKING);
+}
 
 /* Build the function call to GOMP_task to actually
    generate the task operation.  BB is the block where to insert the code.  */
@@ -4648,7 +4822,38 @@ expand_omp_taskreg (struct omp_region *region)
   entry_bb = region->entry;
   exit_bb = region->exit;
 
-  if (is_combined_parallel (region))
+  /* The way _Cilk_for is constructed in this compiler can be thought of
+     as a parallel omp_for.  But the inner workings between them are very
+     different so we need a way to differentiate between them.  Thus, we
+     added a new schedule type called OMP_CLAUSE_SCHEDULE_CILKFOR, which 
+     pretty much says that this is not a parallel omp for but a _Cilk_for
+     statement.  */
+  bool is_cilk_for =
+    (flag_cilkplus && region->inner &&
+     (region->inner->sched_kind == OMP_CLAUSE_SCHEDULE_CILKFOR));
+
+  /* Extract the __high and __low parameters from the function.  */
+  tree high_arg = NULL_TREE, low_arg = NULL_TREE;
+  if (is_cilk_for)
+    {
+      for (tree ii_arg = DECL_ARGUMENTS (child_fn); ii_arg != NULL_TREE;
+	   ii_arg = TREE_CHAIN (ii_arg))
+	{
+	  if (!strcmp (IDENTIFIER_POINTER (DECL_NAME (ii_arg)), "__high"))
+	    high_arg = ii_arg;
+	  if (!strcmp (IDENTIFIER_POINTER (DECL_NAME (ii_arg)), "__low"))
+	    low_arg = ii_arg;
+	}
+      gcc_assert (high_arg);
+      gcc_assert (low_arg);
+    }
+  
+  if (is_cilk_for) 
+    /* If it is a _Cilk_for statement, it is modelled *like* a parallel for,
+       and the inner statement contains the name of the built-in function
+       and grain.  */
+    ws_args = region->inner->ws_args;
+  else if (is_combined_parallel (region))
     ws_args = region->ws_args;
   else
     ws_args = NULL;
@@ -4755,6 +4960,49 @@ expand_omp_taskreg (struct omp_region *region)
 	    }
 	}
 
+      /* The placeholder calls to GET_NUM_THREADS and GET_THREAD_NUM are
+	 removed here and replaced by the __low and __high parameter
+	 values, respectively.  */
+      gimple high_assign = NULL, low_assign = NULL;
+      if (is_cilk_for)
+	{
+	  gimple_stmt_iterator gsi2 = gsi_start_bb (single_succ (entry_bb));
+	  while (!gsi_end_p (gsi2))
+	    {
+	      gimple stmt = gsi_stmt (gsi2);
+	
+	      if (gimple_call_builtin_p (stmt, BUILT_IN_OMP_GET_NUM_THREADS))
+		{
+		  /* There can only be one call to each of these two
+		     functions.  If there are multiple, then something went
+		     wrong somewhere.  */
+		  gcc_assert (low_assign == NULL);
+		  tree ltype = TREE_TYPE (gimple_get_lhs (stmt));
+		  tree tmp2 = create_tmp_reg (TREE_TYPE (low_arg), NULL);
+		  low_assign = gimple_build_assign 
+		    (gimple_get_lhs (stmt), fold_convert (ltype, tmp2));
+		  gsi_remove (&gsi2, true);
+		  gimple tmp_stmt = gimple_build_assign (tmp2, low_arg);
+		  gsi_insert_before (&gsi2, low_assign, GSI_NEW_STMT);
+		  gsi_insert_before (&gsi2, tmp_stmt, GSI_NEW_STMT);
+		}
+	      else if (gimple_call_builtin_p (stmt,
+					      BUILT_IN_OMP_GET_THREAD_NUM))
+		{
+		  gcc_assert (high_assign == NULL);
+		  tree htype = TREE_TYPE (gimple_get_lhs (stmt));
+		  tree tmp2 = create_tmp_reg (TREE_TYPE (high_arg), NULL);
+		  
+		  high_assign = gimple_build_assign 
+		    (gimple_get_lhs (stmt), fold_convert (htype, tmp2));
+		  gsi_remove (&gsi2, true);
+		  gimple tmp_stmt = gimple_build_assign (tmp2, high_arg);
+		  gsi_insert_before (&gsi2, high_assign, GSI_NEW_STMT);
+		  gsi_insert_before (&gsi2, tmp_stmt, GSI_NEW_STMT);
+		}
+	      gsi_next (&gsi2);
+	    }
+	}      
       /* Declare local variables needed in CHILD_CFUN.  */
       block = DECL_INITIAL (child_fn);
       BLOCK_VARS (block) = vec2chain (child_cfun->local_decls);
@@ -4862,7 +5110,9 @@ expand_omp_taskreg (struct omp_region *region)
     }
 
   /* Emit a library call to launch the children threads.  */
-  if (gimple_code (entry_stmt) == GIMPLE_OMP_PARALLEL)
+  if (is_cilk_for)
+    expand_cilk_for_call (new_bb, entry_stmt, ws_args);
+  else if (gimple_code (entry_stmt) == GIMPLE_OMP_PARALLEL)
     expand_parallel_call (region, new_bb, entry_stmt, ws_args);
   else
     expand_task_call (new_bb, entry_stmt);
@@ -6540,6 +6790,227 @@ expand_omp_for_static_chunk (struct omp_region *region,
     }
 }
 
+/* A subroutine of expand_omp_for.  Generate code for _Cilk_for loop.  
+   Given parameters: 
+   for (V = N1; V cond N2; V += STEP) BODY; 
+   
+   where COND is "<" or ">", we generate pseudocode
+    
+   for (ind_var = low; ind_var < high; ind_var++)
+   {  
+      if (n1 < n2)
+	V = n1 + (ind_var * STEP);
+      else
+	V = n1 - (ind_var * STEP);
+
+      <BODY>
+    }  
+  
+    In the above pseudocode, low and high are function parameters of the
+    child function.  In the function below, we insert temporary variables
+    initialized by calls to two OMP functions that cannot otherwise be
+    found in the body of _Cilk_for (since OMP_FOR cannot be mixed
+    with _Cilk_for).  These calls are replaced with low and high
+    by the function that handles taskreg.  */
+
+
+static void
+expand_cilk_for (struct omp_region *region, struct omp_for_data *fd)
+{
+  bool broken_loop = region->cont == NULL;
+  tree type = cilk_for_check_loop_diff_type (TREE_TYPE (fd->loop.v));
+  basic_block entry_bb = region->entry;
+  basic_block cont_bb = region->cont;
+  
+  gcc_assert (EDGE_COUNT (entry_bb->succs) == 2);
+  gcc_assert (broken_loop
+	      || BRANCH_EDGE (entry_bb)->dest == FALLTHRU_EDGE (cont_bb)->dest);
+  basic_block l0_bb = FALLTHRU_EDGE (entry_bb)->dest;
+  basic_block l1_bb, l2_bb;
+
+  if (!broken_loop)
+    {
+      gcc_assert (BRANCH_EDGE (cont_bb)->dest == l0_bb);
+      gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
+      l1_bb = split_block (cont_bb, last_stmt (cont_bb))->dest;
+      l2_bb = BRANCH_EDGE (entry_bb)->dest;
+    }
+  else
+    {
+      BRANCH_EDGE (entry_bb)->flags &= ~EDGE_ABNORMAL;
+      l1_bb = split_edge (BRANCH_EDGE (entry_bb));
+      l2_bb = single_succ (l1_bb);
+    }
+  basic_block exit_bb = region->exit;
+  basic_block l2_dom_bb = NULL;
+
+  gimple_stmt_iterator gsi = gsi_last_bb (entry_bb);
+
+  /* Below statements until the "tree high_val = ..." are pseudo statements 
+     used to pass information to be used by expand_omp_taskreg.
+     low_val and high_val will be replaced by the __low and __high
+     parameter from the child function.
+
+     The call_exprs part is a place-holder, it is mainly used 
+     to distinctly identify to the top-level part that this is
+     where we should put low and high (reasoning given in header 
+     comment).  */
+
+  tree t = build_call_expr
+    (builtin_decl_explicit (BUILT_IN_OMP_GET_NUM_THREADS), 0);
+  t = fold_convert (type, t);
+  tree low_val = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+					   GSI_SAME_STMT);
+  t = build_call_expr (builtin_decl_explicit (BUILT_IN_OMP_GET_THREAD_NUM),
+		       0);
+  t = fold_convert (type, t);
+  tree high_val = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+					   GSI_SAME_STMT);
+
+  tree ind_var = create_tmp_reg (type, "__cilk_ind_var");
+  gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_FOR);
+  
+  /* Not needed in SSA form right now.  */
+  gcc_assert (!gimple_in_ssa_p (cfun));
+  if (l2_dom_bb == NULL)
+    l2_dom_bb = l1_bb;
+
+  tree n1 = low_val;
+  tree n2 = high_val;
+  
+  expand_omp_build_assign (&gsi, ind_var, n1);
+
+  /* Remove the GIMPLE_OMP_FOR statement.  */
+  gsi_remove (&gsi, true);
+
+  gimple stmt;
+  if (!broken_loop)
+    {
+      /* Code to control the increment goes in the CONT_BB.  */
+      gsi = gsi_last_bb (cont_bb);
+      stmt = gsi_stmt (gsi);
+      gcc_assert (gimple_code (stmt) == GIMPLE_OMP_CONTINUE);
+      enum tree_code code = PLUS_EXPR;
+      if (POINTER_TYPE_P (type))
+	t = fold_build_pointer_plus (ind_var, build_one_cst (type)); 
+      else
+	t = fold_build2 (code, type, ind_var, build_one_cst (type));
+      expand_omp_build_assign (&gsi, ind_var, t);
+
+      /* Remove GIMPLE_OMP_CONTINUE.  */
+      gsi_remove (&gsi, true);
+    }
+
+  /* Emit the condition in L1_BB.  */
+  gsi = gsi_start_bb (l1_bb);
+
+  tree step = fold_convert (type, fd->loop.step);
+  if ((TREE_CODE (step) == INTEGER_CST && tree_int_cst_sgn (step) < 1)) 
+    step = fold_build1_loc (UNKNOWN_LOCATION, NEGATE_EXPR, type, step);
+
+  tree step_var = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (step_var, 
+					       fold_convert (type, step)), 
+		    GSI_NEW_STMT);
+  t = build2 (MULT_EXPR, type, ind_var, step_var);
+  tree tmp = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp, t), GSI_NEW_STMT);
+
+  tree tmp2 = create_tmp_reg (type, NULL);
+  tree cvtd = fold_convert (type, fd->loop.n1);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp2, cvtd), GSI_NEW_STMT);
+  
+  if (fd->loop.cond_code == GE_EXPR || fd->loop.cond_code == GT_EXPR)
+    t = fold_build2 (MINUS_EXPR, type, tmp2, tmp);
+  else
+    t = fold_build2 (PLUS_EXPR, type, tmp2, tmp);
+
+  tmp = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp, t), GSI_NEW_STMT);
+
+  cvtd = fold_convert (TREE_TYPE (fd->loop.v), tmp);
+  gsi_insert_after (&gsi, gimple_build_assign (fd->loop.v, cvtd), 
+		    GSI_NEW_STMT);
+  
+  t = fold_convert (type, n2);
+  t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
+				false, GSI_CONTINUE_LINKING);
+  /* The condition is always '<' since the runtime will fill in the low
+     and high values.  */
+  t = build2 (LT_EXPR, boolean_type_node, ind_var, t);
+  stmt = gimple_build_cond_empty (t);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+  if (walk_tree (gimple_cond_lhs_ptr (stmt), expand_omp_regimplify_p,
+		 NULL, NULL)
+      || walk_tree (gimple_cond_rhs_ptr (stmt), expand_omp_regimplify_p,
+		    NULL, NULL))
+    {
+      gsi = gsi_for_stmt (stmt);
+      gimple_regimplify_operands (stmt, &gsi);
+    }
+
+  /* Remove GIMPLE_OMP_RETURN.  */
+  gsi = gsi_last_bb (exit_bb);
+  gsi_remove (&gsi, true);
+
+  /* Connect the new blocks.  */
+  remove_edge (FALLTHRU_EDGE (entry_bb));
+
+  edge e, ne;
+  if (!broken_loop)
+    {
+      remove_edge (BRANCH_EDGE (entry_bb));
+      make_edge (entry_bb, l1_bb, EDGE_FALLTHRU);
+
+      e = BRANCH_EDGE (l1_bb);
+      ne = FALLTHRU_EDGE (l1_bb);
+      e->flags = EDGE_TRUE_VALUE;
+    }
+  else
+    {
+      single_succ_edge (entry_bb)->flags = EDGE_FALLTHRU;
+
+      ne = single_succ_edge (l1_bb);
+      e = make_edge (l1_bb, l0_bb, EDGE_TRUE_VALUE);
+
+    }
+  ne->flags = EDGE_FALSE_VALUE;
+  e->probability = REG_BR_PROB_BASE * 7 / 8;
+  ne->probability = REG_BR_PROB_BASE / 8;
+
+  set_immediate_dominator (CDI_DOMINATORS, l1_bb, entry_bb);
+  set_immediate_dominator (CDI_DOMINATORS, l2_bb, l2_dom_bb);
+  set_immediate_dominator (CDI_DOMINATORS, l0_bb, l1_bb);
+
+  if (!broken_loop)
+    {
+      struct loop *loop = alloc_loop ();
+      loop->header = l1_bb;
+      loop->latch = cont_bb;
+      add_loop (loop, l1_bb->loop_father);
+      loop->safelen = INT_MAX;
+    }
+
+  /* Pick the correct library function based on the precision of the
+     induction variable type.  */
+  tree lib_fun = NULL_TREE;
+  if (TYPE_PRECISION (type) == 32)
+    lib_fun = cilk_for_32_fndecl;
+  else if (TYPE_PRECISION (type) == 64)
+    lib_fun = cilk_for_64_fndecl;
+  else
+    gcc_unreachable ();
+
+  gcc_assert (fd->sched_kind == OMP_CLAUSE_SCHEDULE_CILKFOR);
+  
+  /* WS_ARGS contains the library function flavor to call
+     (__libcilkrts_cilk_for_64 or __libcilkrts_cilk_for_32), and the
+     user-defined grain value.  If the user does not define one, then zero
+     is passed in by the parser.  */
+  vec_alloc (region->ws_args, 2);
+  region->ws_args->quick_push (lib_fun);
+  region->ws_args->quick_push (fd->chunk_size);
+}
 
 /* A subroutine of expand_omp_for.  Generate code for a simd non-worksharing
    loop.  Given parameters:
@@ -6880,6 +7351,8 @@ expand_omp_for (struct omp_region *region, gimple inner_stmt)
 
   if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_KIND_SIMD)
     expand_omp_simd (region, &fd);
+  else if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_KIND_CILKFOR)
+    expand_cilk_for (region, &fd);
   else if (fd.sched_kind == OMP_CLAUSE_SCHEDULE_STATIC
 	   && !fd.have_ordered)
     {
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c
new file mode 100644
index 0000000..8b6112b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c
@@ -0,0 +1,87 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <stdio.h>
+#endif
+
+static void check (int *Array, int start, int end, int incr, int value)
+{
+  int ii = 0;
+  for (ii = start;  ii < end; ii = ii + incr)
+    if (Array[ii] != value)
+      __builtin_abort ();
+#if HAVE_IO
+  printf ("Passed\n");
+#endif
+}
+
+static void check_reverse (int *Array, int start, int end, int incr, int value)
+{
+  int ii = 0;
+  for (ii = start; ii >= end; ii = ii - incr)
+    if (Array[ii] != value)
+      __builtin_abort ();
+#if HAVE_IO
+  printf ("Passed\n");
+#endif
+}
+
+
+int main (void)
+{
+  int Array[10];
+  int x = 9, y = 0, z = 3;
+
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array[ii] = 1133;
+  check (Array, 0, 10, 1, 1133);
+
+  _Cilk_for (int ii = 0; ii < 10; ++ii)
+    Array[ii] = 3311;
+  check (Array, 0, 10, 1, 3311);
+
+  _Cilk_for (int ii = 9; ii > -1; ii--)
+    Array[ii] = 4433;
+  check_reverse (Array, 9, 0, 1, 4433);
+
+  _Cilk_for (int ii = 9; ii > -1; --ii)
+    Array[ii] = 9988;
+  check_reverse (Array, 9, 0, 1, 9988);
+
+  _Cilk_for (int ii = 0; ii < 10; ++ii)
+    Array[ii] = 3311;
+  check (Array, 0, 10, 1, 3311);
+
+  _Cilk_for (int ii = 0; ii < 10; ii += 2)
+    Array[ii] = 1328;
+  check (Array, 0, 10, 2, 1328);
+
+  _Cilk_for (int ii = 9; ii >= 0; ii -= 2)
+    Array[ii] = 1738;
+  check_reverse (Array, 9, 0, 2, 1738);
+
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    {
+      if (ii % 2)
+	Array[ii] = 1343;
+      else
+	Array[ii] = 3413;
+    }
+
+  check (Array, 1, 10, 2, 1343); 
+  check (Array, 0, 10, 2, 3413); 
+
+  _Cilk_for (short cc = 0; cc < 10; cc++) 
+    Array[cc] = 1343;
+  check (Array, 0, 10,  1,1343);
+
+  _Cilk_for (short cc = 9; cc >= 0; cc--)
+    Array[cc] = 1348;
+  check_reverse (Array, 9, 0, 1, 1348);
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c
new file mode 100644
index 0000000..ed73c34
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=c99" { target c } } */
+
+
+int main (void)
+{
+  int q = 0, ii = 0, jj = 0;
+
+  _Cilk_for (int ii; ii < 10; ii++) /* { dg-error "is not initialized" "" { target c } } */
+    /* { dg-error "expected" "" { target c++ } 10 } */
+    q = 5;
+
+  _Cilk_for (; ii < 10; ii++) /* { dg-error "expected iteration declaration" } */
+    q = 2;
+
+  _Cilk_for (int ii = 0; ; ii++) /* { dg-error "missing controlling predicate" } */
+    q = 2;
+
+  _Cilk_for (int ii = 0; ii < 10, jj < 10; ii++)  /* { dg-error "expected ';' before ',' token" "" { target c } } */
+    /* { dg-error "invalid controlling predicate" "" { target c++ }  20 } */
+    q = 5;
+
+  _Cilk_for (int ii = 0; ii < 10; ) /* { dg-error "missing increment" } */
+    q = 5;
+
+
+  _Cilk_for (int ii = 0, jj = 0; ii < 10; ii++) /* { dg-error "expected" } */ 
+    q = 5;
+
+  _Cilk_for (volatile int vii = 0; vii < 10; vii++) /* { dg-error "iteration variable cannot be volatile" } */
+    q = 5;
+
+ 
+  _Cilk_for (static int sii = 0; sii < 10; sii++) /* { dg-error "static" } */
+
+    q = 5;
+
+
+  _Cilk_for (float fii = 3.47; fii < 5.23; fii++) /* { dg-error "invalid type for iteration variable" } */
+    q = 5;
+
+
+  _Cilk_for (int ii = 0; 10 > jj; ii++) /* { dg-error "invalid controlling predicate" } */
+    q = 5;
+
+  _Cilk_for (int ii = 0; ii < 10; ii >> 1) /* { dg-error "invalid increment expression" } */
+    q = 5;
+
+  _Cilk_for (int ii = 10; ii >= 0; ii--) /* This is OK!  */
+    q = 5;
+
+  _Cilk_for (int ii; ii < 10; ii++) /* { dg-error "is not initialized" "" { target c } } */ 
+    /* { dg-error "expected" "" { target c++ }  53 } */
+    q = 5;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c
new file mode 100644
index 0000000..6cb9b03
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c
@@ -0,0 +1,35 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+int grain_value = 2;
+int main (void)
+{
+  int Array1[200], Array1_Serial[200];
+
+  for (int ii = 0; ii < 200; ii++)
+    {
+      Array1_Serial[ii] = 2;
+      Array1[ii] = 1;
+    }
+
+#pragma cilk grainsize = 2
+  _Cilk_for (int ii = 0; ii < 200; ii++)
+    Array1[ii] = 2;
+
+  for (int ii = 0; ii < 200; ii++)
+    if (Array1[ii] != Array1_Serial[ii])
+      return (ii+1);
+
+#pragma cilk grainsize = grain_value
+  _Cilk_for (int ii = 0; ii < 200; ii++) 
+    Array1[ii] = 2;
+
+  for (int ii = 0; ii < 200; ii++)
+    if (Array1[ii] != Array1_Serial[ii])
+      return (ii+1);
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c
new file mode 100644
index 0000000..e1e3217
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus -Wunknown-pragmas" } */
+/* { dg-additional-options "-std=c99" { target c } } */
+
+
+char Array1[26];
+
+#pragma cilk grainsize = 2 /* { dg-error "must be inside a function" } */
+
+int main(int argc, char **argv)
+{
+/* This is OK.  */
+#pragma cilk grainsize = 2
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsize 2 /* { dg-error "expected '=' before numeric constant" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsiz = 2 /* { dg-warning "ignoring #pragma cilk grainsiz" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+
+/* This is OK, it will do a type conversion to long int.  */
+#pragma cilk grainsize = 0.5 
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsize = 1 
+  while (Array1[5] != 0) /* { dg-warning "is not followed by" } */
+    {
+    /* Blah */
+    }
+
+#pragma cilk grainsize = 1 
+  int q = 0; /* { dg-warning "is not followed by" } */
+  _Cilk_for (q = 0; q < 10; q++)
+    Array1[q]  = 5;
+
+  while (Array1[5] != 0)
+    {
+    /* Blah */
+    }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c
new file mode 100644
index 0000000..7a779f7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c
@@ -0,0 +1,41 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+
+/* <feature> loop control variable must have integer, pointer or class type
+   </feature>
+*/
+
+#define ARRAY_SIZE 10000
+int a[ARRAY_SIZE];
+
+int main(void)
+{ 
+  int ii = 0;
+
+#if 1
+  for (ii =0; ii < ARRAY_SIZE; ii++)
+    a[ii] = 5;
+#endif
+  _Cilk_for(int *aa = a; aa < a + ARRAY_SIZE; aa++) 
+    *aa = 0;
+#if 1
+  for (ii = 0; ii < ARRAY_SIZE; ii++) 
+    if (a[ii] != 0) 
+      __builtin_abort ();
+#endif
+
+  _Cilk_for (int *aa = a; aa < a + ARRAY_SIZE; aa = aa + 2)
+    *aa = 4;
+
+#if 1
+  for (ii = 0; ii < ARRAY_SIZE; ii = ii + 2) 
+    if (a[ii] != 4) 
+      __builtin_abort ();
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c
new file mode 100644
index 0000000..cffe17e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c
@@ -0,0 +1,79 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <stdio.h>
+#endif
+
+int main (void)
+{
+  int Array[10][10];
+
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 10; jj++)
+	{
+	  Array[ii][jj] = 0;
+	}
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 0; jj < 5; jj++)
+      Array[ii][jj] = 5;
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 5; jj++)
+      if (Array[ii][jj] != 5)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+
+  /* One goes up and one goes down.  */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 9; jj >= 0; jj--)
+      Array[ii][jj] = 7;
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 9; jj >= 0; jj--)
+      if (Array[ii][jj] != 7)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  /* different step sizes.  */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 0; jj < 10; jj += 2)
+      Array[ii][jj] = 9;
+  
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 10; jj += 2)
+      if (Array[ii][jj] != 9)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  /* different step sizes.  */
+  _Cilk_for (int ii = 0; ii < 10; ii += 2)
+    _Cilk_for (int jj = 5; jj < 9; jj++)
+      Array[ii][jj] = 11; 
+  
+  for (int ii = 0; ii < 10; ii += 2)
+    for (int jj = 5; jj < 9; jj++)
+      if (Array[ii][jj] != 11)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cf3.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cf3.cc
new file mode 100644
index 0000000..8d88c5f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cf3.cc
@@ -0,0 +1,96 @@
+/* { dg-options "-fcilkplus" } */
+
+typedef __PTRDIFF_TYPE__ ptrdiff_t;
+
+template <typename T>
+class I
+{
+public:
+  typedef ptrdiff_t difference_type;
+  I ();
+  ~I ();
+  I (T *);
+  I (const I &);
+  T &operator * ();
+  T *operator -> ();
+  T &operator [] (const difference_type &) const;
+  I &operator = (const I &);
+  I &operator ++ ();
+  I operator ++ (int);
+  I &operator -- ();
+  I operator -- (int);
+  I &operator += (const difference_type &);
+  I &operator -= (const difference_type &);
+  I operator + (const difference_type &) const;
+  I operator - (const difference_type &) const;
+  template <typename S> friend bool operator == (I<S> &, I<S> &);
+  template <typename S> friend bool operator == (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator < (I<S> &, I<S> &);
+  template <typename S> friend bool operator < (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator <= (I<S> &, I<S> &);
+  template <typename S> friend bool operator <= (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator > (I<S> &, I<S> &);
+  template <typename S> friend bool operator > (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator >= (I<S> &, I<S> &);
+  template <typename S> friend bool operator >= (const I<S> &, const I<S> &);
+  template <typename S> friend typename I<S>::difference_type operator - (I<S> &, I<S> &);
+  template <typename S> friend typename I<S>::difference_type operator - (const I<S> &, const I<S> &);
+  template <typename S> friend I<S> operator + (typename I<S>::difference_type , const I<S> &);
+private:
+  T *p;
+};
+template <typename T> I<T>::I () : p (0) {}
+template <typename T> I<T>::~I () {}
+template <typename T> I<T>::I (T *x) : p (x) {}
+template <typename T> I<T>::I (const I &x) : p (x.p) {}
+template <typename T> T &I<T>::operator * () { return *p; }
+template <typename T> T *I<T>::operator -> () { return p; }
+template <typename T> T &I<T>::operator [] (const difference_type &x) const { return p[x]; }
+template <typename T> I<T> &I<T>::operator = (const I &x) { p = x.p; return *this; }
+template <typename T> I<T> &I<T>::operator ++ () { ++p; return *this; }
+template <typename T> I<T> I<T>::operator ++ (int) { return I (p++); }
+template <typename T> I<T> &I<T>::operator -- () { --p; return *this; }
+template <typename T> I<T> I<T>::operator -- (int) { return I (p--); }
+template <typename T> I<T> &I<T>::operator += (const difference_type &x) { p += x; return *this; }
+template <typename T> I<T> &I<T>::operator -= (const difference_type &x) { p -= x; return *this; }
+template <typename T> I<T> I<T>::operator + (const difference_type &x) const { return I (p + x); }
+template <typename T> I<T> I<T>::operator - (const difference_type &x) const { return I (p - x); }
+template <typename T> bool operator == (I<T> &x, I<T> &y) { return x.p == y.p; }
+template <typename T> bool operator == (const I<T> &x, const I<T> &y) { return x.p == y.p; }
+template <typename T> bool operator != (I<T> &x, I<T> &y) { return !(x == y); }
+template <typename T> bool operator != (const I<T> &x, const I<T> &y) { return !(x == y); }
+template <typename T> bool operator < (I<T> &x, I<T> &y) { return x.p < y.p; }
+template <typename T> bool operator < (const I<T> &x, const I<T> &y) { return x.p < y.p; }
+template <typename T> bool operator <= (I<T> &x, I<T> &y) { return x.p <= y.p; }
+template <typename T> bool operator <= (const I<T> &x, const I<T> &y) { return x.p <= y.p; }
+template <typename T> bool operator > (I<T> &x, I<T> &y) { return x.p > y.p; }
+template <typename T> bool operator > (const I<T> &x, const I<T> &y) { return x.p > y.p; }
+template <typename T> bool operator >= (I<T> &x, I<T> &y) { return x.p >= y.p; }
+template <typename T> bool operator >= (const I<T> &x, const I<T> &y) { return x.p >= y.p; }
+template <typename T> typename I<T>::difference_type operator - (I<T> &x, I<T> &y) { return x.p - y.p; }
+template <typename T> typename I<T>::difference_type operator - (const I<T> &x, const I<T> &y) { return x.p - y.p; }
+template <typename T> I<T> operator + (typename I<T>::difference_type x, const I<T> &y) { return I<T> (x + y.p); }
+
+template <typename T>
+class J
+{
+public:
+  J(const I<T> &x, const I<T> &y) : b (x), e (y) {}
+  const I<T> &begin ();
+  const I<T> &end ();
+private:
+  I<T> b, e;
+};
+
+template <typename T> const I<T> &J<T>::begin () { return b; }
+template <typename T> const I<T> &J<T>::end () { return e; }
+
+template <typename T>
+void baz (I<T> &i);
+
+void
+foo (J<int> j)
+{
+  _Cilk_for (I<int> i = j.begin (); i < j.end (); i += 2)
+    baz (i);
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc
new file mode 100644
index 0000000..8221371
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc
@@ -0,0 +1,25 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#define SIZE 100
+#define CHECK_VALUE 5
+
+template <class T>
+int func (T start, T end)
+{
+  int Array[SIZE];
+  _Cilk_for (T ii = 0; ii < end; ii++)
+    Array[ii] = CHECK_VALUE;
+  
+  for (T ii = 0; ii < end; ii++)
+    if (Array[ii] != CHECK_VALUE)
+      __builtin_abort ();
+
+  return 0;
+}
+
+int main (void)
+{
+  return func <int> (0, 100) + func <long> (0, 100);
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/for1.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/for1.cc
new file mode 100644
index 0000000..78b8cf1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/for1.cc
@@ -0,0 +1,378 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <cstdio>
+#endif
+
+typedef __PTRDIFF_TYPE__ ptrdiff_t;
+extern "C" void abort ();
+
+template <typename T>
+class I
+{
+public:
+  typedef ptrdiff_t difference_type;
+  I ();
+  ~I ();
+  I (T *);
+  I (const I &);
+  T &operator * ();
+  T *operator -> ();
+  T &operator [] (const difference_type &) const;
+  I &operator = (const I &);
+  I &operator ++ ();
+  I operator ++ (int);
+  I &operator -- ();
+  I operator -- (int);
+  I &operator += (const difference_type &);
+  I &operator -= (const difference_type &);
+  I operator + (const difference_type &) const;
+  I operator - (const difference_type &) const;
+  template <typename S> friend bool operator == (I<S> &, I<S> &);
+  template <typename S> friend bool operator == (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator < (I<S> &, I<S> &);
+  template <typename S> friend bool operator < (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator <= (I<S> &, I<S> &);
+  template <typename S> friend bool operator <= (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator > (I<S> &, I<S> &);
+  template <typename S> friend bool operator > (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator >= (I<S> &, I<S> &);
+  template <typename S> friend bool operator >= (const I<S> &, const I<S> &);
+  template <typename S> friend typename I<S>::difference_type operator - (I<S> &, I<S> &);
+  template <typename S> friend typename I<S>::difference_type operator - (const I<S> &, const I<S> &);
+  template <typename S> friend I<S> operator + (typename I<S>::difference_type , const I<S> &);
+private:
+  T *p;
+};
+template <typename T> I<T>::I () : p (0) {}
+template <typename T> I<T>::~I () {}
+template <typename T> I<T>::I (T *x) : p (x) {}
+template <typename T> I<T>::I (const I &x) : p (x.p) {}
+template <typename T> T &I<T>::operator * () { return *p; }
+template <typename T> T *I<T>::operator -> () { return p; }
+template <typename T> T &I<T>::operator [] (const difference_type &x) const { return p[x]; }
+template <typename T> I<T> &I<T>::operator = (const I &x) { p = x.p; return *this; }
+template <typename T> I<T> &I<T>::operator ++ () { ++p; return *this; }
+template <typename T> I<T> I<T>::operator ++ (int) { return I (p++); }
+template <typename T> I<T> &I<T>::operator -- () { --p; return *this; }
+template <typename T> I<T> I<T>::operator -- (int) { return I (p--); }
+template <typename T> I<T> &I<T>::operator += (const difference_type &x) { p += x; return *this; }
+template <typename T> I<T> &I<T>::operator -= (const difference_type &x) { p -= x; return *this; }
+template <typename T> I<T> I<T>::operator + (const difference_type &x) const { return I (p + x); }
+template <typename T> I<T> I<T>::operator - (const difference_type &x) const { return I (p - x); }
+template <typename T> bool operator == (I<T> &x, I<T> &y) { return x.p == y.p; }
+template <typename T> bool operator == (const I<T> &x, const I<T> &y) { return x.p == y.p; }
+template <typename T> bool operator != (I<T> &x, I<T> &y) { return !(x == y); }
+template <typename T> bool operator != (const I<T> &x, const I<T> &y) { return !(x == y); }
+template <typename T> bool operator < (I<T> &x, I<T> &y) { return x.p < y.p; }
+template <typename T> bool operator < (const I<T> &x, const I<T> &y) { return x.p < y.p; }
+template <typename T> bool operator <= (I<T> &x, I<T> &y) { return x.p <= y.p; }
+template <typename T> bool operator <= (const I<T> &x, const I<T> &y) { return x.p <= y.p; }
+template <typename T> bool operator > (I<T> &x, I<T> &y) { return x.p > y.p; }
+template <typename T> bool operator > (const I<T> &x, const I<T> &y) { return x.p > y.p; }
+template <typename T> bool operator >= (I<T> &x, I<T> &y) { return x.p >= y.p; }
+template <typename T> bool operator >= (const I<T> &x, const I<T> &y) { return x.p >= y.p; }
+template <typename T> typename I<T>::difference_type operator - (I<T> &x, I<T> &y) { return x.p - y.p; }
+template <typename T> typename I<T>::difference_type operator - (const I<T> &x, const I<T> &y) { return x.p - y.p; }
+template <typename T> I<T> operator + (typename I<T>::difference_type x, const I<T> &y) { return I<T> (x + y.p); }
+
+template <typename T>
+class J
+{
+public:
+  J(const I<T> &x, const I<T> &y) : b (x), e (y) {}
+  const I<T> &begin ();
+  const I<T> &end ();
+private:
+  I<T> b, e;
+};
+
+template <typename T> const I<T> &J<T>::begin () { return b; }
+template <typename T> const I<T> &J<T>::end () { return e; }
+
+int results[2000];
+
+template <typename T>
+void
+baz (I<T> &i)
+{
+  if (*i < 0 || *i >= 2000)
+    {
+#if HAVE_IO
+      printf ("*i(%d) is < 0 or >= 2000\n", *i);
+      fflush (stdout);
+#endif
+     __builtin_abort ();
+    }
+  else 
+    results[*i]++;
+}
+
+void
+f1 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x; i <= y; i += 6)
+    { 
+      baz (i);
+    }
+
+#if HAVE_IO
+  printf("===== Starting F1 =========\n");
+  for (I<int> i = x; i <= y; i+= 6) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]); 
+    fflush (stdout);
+  }
+#endif
+}
+
+void
+f2 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x; i < y - 1; i += 2) 
+    baz (i);
+
+#if HAVE_IO
+  printf("===== Starting F2 =========\n");
+  for (int ii = 0; ii < 1998; ii += 2) {
+    printf("Result[%4d] = %2d\n", ii, results[ii]);
+    fflush (stdout);
+  }
+#endif
+}
+
+template <typename T>
+void
+f3 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x; i <= y; i += 1)
+    baz (i);
+#if HAVE_IO
+  printf("===== Starting F3 =========\n");
+  for (int ii = 20; ii < 1987; ii += 1) {
+    printf("Result[%4d] = %2d\n", ii, results[ii]);
+    fflush (stdout);
+  }
+
+#endif
+}
+
+template <typename T>
+void
+f4 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x + (2000 - 64); i > y + 10; --i)
+    baz (i);
+#if HAVE_IO
+  printf("===== Starting F4 =========\n");
+  for (I<int> i = x + (2000 - 64); i > y + 10; --i) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+}
+void
+f5 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x + 2000 - 64; i > y + 10; i -= 10)
+    baz (i);
+#if HAVE_IO
+  for (I<int> i = x + 2000 - 64; i > y + 10; i -= 10) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+}
+
+template <int N>
+void
+f6 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x + 2000 - 64; i > y + 10; i -= 10)
+    {
+      I<int> j = i + N;
+      baz (j);
+    }
+#if HAVE_IO
+  for (I<int> i = x + 2000 - 64; i > y + 10; i = i - 12 + 2)
+    {
+      I<int> j = i + N;
+      printf("Result[%4d] = %2d\n", *j, results[*j]);
+      fflush (stdout);
+    }
+#endif
+}
+template <int N>
+void
+f7 (I<int> ii, const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I <int> i = x - 10; i <= y + 10; i += N)
+    baz (i);
+#if HAVE_IO
+  for (I<int> i = x - 10; i <= y + 10; i += N) 
+    {
+      printf("Result[%4d] = %2d\n", *i, results[*i]);
+      fflush (stdout);
+    }
+#endif
+}
+
+template <int N>
+void
+f8 (J<int> j)
+{
+  _Cilk_for (I<int> i = j.begin (); i <= j.end () + N; i += 2)
+    baz (i);
+#if HAVE_IO
+  for (I<int> i = j.begin (); i <= j.end () + N; i += 2) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+
+}
+
+template <typename T, int N>
+void
+f9 (const I<T> &x, const I<T> &y)
+{
+  _Cilk_for (I<T> i = x; i <= y; i += N)
+    baz (i);
+#if HAVE_IO
+  for (I<T> i = x; i <= y;  i  = i + N)
+    { 
+      printf("Result[%4d] = %2d\n", *i, results[*i]);
+      fflush (stdout);
+    }
+#endif
+}
+
+template <typename T, int N>
+void
+f10 (const I<T> &x, const I<T> &y)
+{
+  _Cilk_for (I<T> i = x; i > y; i += N)
+    baz (i);
+#if HAVE_IO
+  for (I<T> i = x; i > y;  i  = i + N) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+}
+
+template <typename T>
+void
+f11 (const T &x, const T &y)
+{
+    _Cilk_for (T i = x; i <= y; i += 3)
+      baz (i);
+
+#if HAVE_IO
+  for (T i = x; i <= y;  i  += 3) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+      T j = y + 3;
+      baz (j);
+
+}
+
+template <typename T>
+void
+f12 (const T &x, const T &y)
+{
+  _Cilk_for (T i = x; i > y; --i)
+    baz (i);
+#if HAVE_IO
+  for (T i = x; i > y;  --i) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+}
+template <int N>
+struct K
+{
+  template <typename T>
+  static void
+  f13 (const T &x, const T &y)
+  {
+    _Cilk_for (T i = x; i <= y + N; i += N)
+      baz (i);
+#if HAVE_IO
+  for (T i = x; i < y+N;  i += N) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+  }
+};
+
+#define check(expr) \
+  for (int i = 0; i < 2000; i++)			\
+    if (expr)						\
+      {							\
+	if (results[i] != 1)				\
+	  __builtin_abort ();				\
+	results[i] = 0;					\
+      }							\
+    else if (results[i])				\
+      abort ()
+
+int
+main ()
+{
+  int a[2000];
+  long b[2000];
+  for (int i = 0; i < 2000; i++)
+    {
+      a[i] = i;
+      b[i] = i;
+    }
+  f1 (&a[10], &a[1990]);
+  check (i >= 10 && i <= 1990 && (i - 10) % 6 == 0);
+  f2 (&a[0], &a[1999]);
+  check (i < 1998 && (i & 1) == 0);
+  f3<int> (&a[20], &a[1837]);
+  check (i >= 20 && i <= 1837);
+  f4<int> (&a[0], &a[30]);
+  check (i > 40 && i <= 2000 - 64);
+
+  /* The f5 and f6 calls below are invalid since they will wrap around.
+     If this can be caught at compile time (i.e. the values are constant),
+     then the compiler will emit errors.  */
+  //f5 (&a[0], &a[100]);
+  //check (i >= 116 && i <= 2000 - 64 && (i - 116) % 10 == 0);
+  //f6<-10> (&a[10], &a[110]);
+  //check (i >= 116 && i <= 2000 - 64 && (i - 116) % 10 == 0);
+
+  f7<6> (I<int> (), &a[12], &a[1800]);
+  check (i >= 2 && i <= 1808 && (i - 2) % 6 == 0);
+
+  f8<121> (J<int> (&a[14], &a[1803]));
+  check (i >= 14 && i <= 1924 && (i & 1) == 0);
+  f9<int, 7> (&a[33], &a[1967]);
+  check (i >= 33 && i <= 1967 && (i - 33) % 7 == 0);
+  f10<int, -7> (&a[1939], &a[17]);
+  check (i >= 21 && i <= 1939 && (i - 21) % 7 == 0);
+  f11<I<int> > (&a[16], &a[1981]);
+  check (i >= 16 && i <= 1984 && (i - 16) % 3 == 0);
+  f12<I<int> > (&a[1761], &a[37]);
+  check (i > 37 && i <= 1761);
+  K<5>::f13<I<int> > (&a[1], &a[1935]);
+  check (i >= 1 && i <= 1936 && (i - 1) % 5 == 0);
+  f9<long, 7> (&b[33], &b[1967]);
+  check (i >= 33 && i <= 1967 && (i - 33) % 7 == 0);
+  f10<long, -7> (&b[1939], &b[17]);
+  check (i >= 21 && i <= 1939 && (i - 21) % 7 == 0);
+  f11<I<long> > (&b[16], &b[1981]);
+  check (i >= 16 && i <= 1984 && (i - 16) % 3 == 0);
+  f12<I<long> > (&b[1761], &b[37]);
+  check (i > 37 && i <= 1761);
+  K<5>::f13<I<long> > (&b[1], &b[1935]);
+  check (i >= 1 && i <= 1936 && (i - 1) % 5 == 0);
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc
new file mode 100644
index 0000000..2ac8c72
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc
@@ -0,0 +1,52 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#include <vector>
+#include <cstdio>
+#include <iostream>
+#include <algorithm>
+
+using namespace std;
+
+
+int main(void)
+{
+vector <int> array;
+vector <int> array_serial;
+
+#if 1
+for (int ii = -1; ii < 10; ii++)
+{   
+  array.push_back(ii);
+  array_serial.push_back (ii);
+}
+#endif
+_Cilk_for (vector<int>::iterator iter = array.begin(); iter != array.end();
+          iter++)
+{
+   if (*iter  == 6) 
+     *iter = 13;
+}
+for (vector<int>::iterator iter = array_serial.begin(); 
+     iter != array_serial.end(); iter++)
+{
+   if (*iter  == 6) 
+     *iter = 13;
+}
+sort (array.begin(), array.end());
+sort (array_serial.begin(), array_serial.end());
+
+vector <int>::iterator iter = array.begin ();
+vector <int>::iterator iter_serial = array_serial.begin ();
+
+while (iter != array.end () && iter_serial != array_serial.end ())
+{
+  if (*iter != *iter_serial)
+    __builtin_abort ();
+  iter++;
+  iter_serial++;
+}
+
+return 0;
+}   
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc
new file mode 100644
index 0000000..1cf3301
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc
@@ -0,0 +1,72 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#include <vector>
+#include <cstdio>
+#include <iostream>
+#include <algorithm>
+
+using namespace std;
+
+
+int main(void)
+{
+vector <int> array,array_serial;
+
+#if 1
+for (int ii = -1; ii < 10; ii++)
+{   
+  array.push_back(ii);
+  array_serial.push_back(ii);
+}
+#endif
+_Cilk_for (vector<int>::reverse_iterator iter4 = array.rbegin(); 
+	   iter4 != array.rend(); iter4++)
+{
+  if (*iter4 == 0x8) {
+    *iter4 = 9;
+  }
+}
+
+_Cilk_for (vector<int>::reverse_iterator iter4 = array_serial.rbegin(); 
+	   iter4 != array_serial.rend(); iter4++)
+{
+  if (*iter4 == 0x8) {
+    *iter4 = 9;
+  }
+}
+_Cilk_for (vector<int>::reverse_iterator iter2 = array.rbegin(); 
+	   iter2 != array.rend();
+          iter2 += 1) 
+{
+   if ((*iter2 == 0x4) || (*iter2 == 0x7)) {
+    *iter2 = 0x3;
+   }
+}
+for (vector<int>::reverse_iterator iter2 = array_serial.rbegin(); 
+     iter2 != array_serial.rend();
+          iter2 += 1) 
+{
+   if ((*iter2 == 0x4) || (*iter2 == 0x7)) {
+    *iter2 = 0x3;
+   }
+}
+sort (array.begin(), array.end());
+sort (array_serial.begin(), array_serial.end());
+
+vector <int>::iterator iter = array.begin ();
+vector <int>::iterator iter_serial = array_serial.begin ();
+while (iter != array.end () && iter_serial != array_serial.end ())
+{
+  if (*iter != *iter_serial)
+    __builtin_abort ();
+  iter++;
+  iter_serial++;
+}
+
+return 0;
+}   
+
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc
new file mode 100644
index 0000000..8d2e61e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc
@@ -0,0 +1,50 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#include <iostream>
+#include <cstdio>
+#include <cstdlib>
+#include <vector>
+#include <algorithm>
+#include <list>
+
+using namespace std;
+
+
+int main(int argc, char **argv)
+{
+  vector <int> number_list, number_list_serial;
+  int new_number = 0;
+  int no_elements = 0;
+  
+  if (argc != 2)
+  {
+    no_elements = 10;
+  }
+
+
+  number_list.clear();
+  number_list_serial.clear();
+  for (int ii = 0; ii < no_elements; ii++)
+  {
+    number_list.push_back(new_number);
+    number_list_serial.push_back(new_number);
+  }
+
+  _Cilk_for (int jj = 0; jj < no_elements; jj++)
+  {
+    number_list[jj] = jj + no_elements;
+  }
+  for (int jj = 0; jj < no_elements; jj++)
+  {
+    number_list_serial[jj] = jj + no_elements;
+  }
+
+  for (int jj = 0; jj < no_elements; jj++)
+    if (number_list_serial[jj] != number_list[jj])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index e548a0d..d8c14e3 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -351,6 +351,7 @@ enum omp_clause_schedule_kind {
   OMP_CLAUSE_SCHEDULE_GUIDED,
   OMP_CLAUSE_SCHEDULE_AUTO,
   OMP_CLAUSE_SCHEDULE_RUNTIME,
+  OMP_CLAUSE_SCHEDULE_CILKFOR,
   OMP_CLAUSE_SCHEDULE_LAST
 };
 
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 0595499..91efd9f 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -411,6 +411,9 @@ dump_omp_clause (pretty_printer *buffer, tree clause, int spc, int flags)
 	case OMP_CLAUSE_SCHEDULE_AUTO:
 	  pp_string (buffer, "auto");
 	  break;
+	case OMP_CLAUSE_SCHEDULE_CILKFOR:
+	  pp_string (buffer, "cilk-for grain");
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -2392,6 +2395,12 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
       pp_string (buffer, "#pragma simd");
       goto dump_omp_loop;
 
+    case CILK_FOR:
+      /* This label points one line after dumping the clauses.  
+	 For _Cilk_for the clauses are dumped after the _Cilk_for (...) 
+	 parameters are printed out.  */
+      goto dump_omp_loop_cilk_for;
+
     case OMP_DISTRIBUTE:
       pp_string (buffer, "#pragma omp distribute");
       goto dump_omp_loop;
@@ -2420,6 +2429,8 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
     dump_omp_loop:
       dump_omp_clauses (buffer, OMP_FOR_CLAUSES (node), spc, flags);
 
+    dump_omp_loop_cilk_for:
+
       if (!(flags & TDF_SLIM))
 	{
 	  int i;
@@ -2440,7 +2451,10 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
 		{
 		  spc += 2;
 		  newline_and_indent (buffer, spc);
-		  pp_string (buffer, "for (");
+		  if (TREE_CODE (node) == CILK_FOR)
+		    pp_string (buffer, "_Cilk_for (");
+		  else 
+		    pp_string (buffer, "for (");
 		  dump_generic_node (buffer,
 				     TREE_VEC_ELT (OMP_FOR_INIT (node), i),
 				     spc, flags, false);
@@ -2454,6 +2468,8 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
 				     spc, flags, false);
 		  pp_right_paren (buffer);
 		}
+	      if (TREE_CODE (node) == CILK_FOR) 
+		dump_omp_clauses (buffer, OMP_FOR_CLAUSES (node), spc, flags);
 	    }
 	  if (OMP_FOR_BODY (node))
 	    {
diff --git a/gcc/tree.def b/gcc/tree.def
index f8d6444..558d7c8 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1051,6 +1051,10 @@ DEFTREECODE (OMP_SIMD, "omp_simd", tcc_statement, 6)
    Operands like for OMP_FOR.  */
 DEFTREECODE (CILK_SIMD, "cilk_simd", tcc_statement, 6)
 
+/* Cilk Plus - _Cilk_for (..)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (CILK_FOR, "cilk_for", tcc_statement, 6)
+
 /* OpenMP - #pragma omp distribute [clause1 ... clauseN]
    Operands like for OMP_FOR.  */
 DEFTREECODE (OMP_DISTRIBUTE, "omp_distribute", tcc_statement, 6)

[-- Attachment #3: c-ChangeLogs --]
[-- Type: application/octet-stream, Size: 3986 bytes --]

gcc/ChangeLog
2014-02-20  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* cilk-common.c (declare_cilk_for_builtin): New function.
	(cilk_init_builtins): Added two new built-in functions for _Cilk_for
	support.
	* cilk.h (enum cilk_tree_index): Added two new enumerators called
	CILK_TI_F_LOOP_32 and CILK_TI_F_LOOP_64.
	(cilk_for_32_fndecl): New define.
	(cilk_for_64_fndecl): Likewise.
	* gimple-pretty-print.c (dump_gimple_omp_parallel): Added a new
	parameter.  If it is printing a _Cilk_for statement, then do not
	print OMP's pragmas.
	(dump_gimple_omp_for): Added GF_OMP_FOR_KIND_CILK_FOR.  Printed out
	_Cilk_for statements without the #pragmas.  Also, added NE_EXPR case.
	* tree-pretty-print.c (dump_generic_node): Added CILK_FOR case.
	Print "_Cilk_for" if the node is of type CILK_FOR.
	(dump_omp_clause): Added a new case called OMP_CLAUSE_SCHEDULE_CILKFOR.
	* gimple.h (enum gf_mask): Added new value: GF_OMP_FOR_KIND_CILKFOR.
	Readjusted other values to satisfy the masking rules.
	(gimple_cilk_for_induction_var): New function.
	* gimplify.c (gimplify_scan_omp_clauses): Added a new parameter called
	is_cilk_for.  If is_cilk_for is true then do not boolify the
	IF_CLAUSE's expression.  Also, remove the IF clause from _Cilk_for and
	schedule clause from the #pragma omp parallel inserted by _Cilk_for.
	(gimplify_omp_parallel): Added check to see if we are gimplifying
	a _Cilk_for statement.
	(gimplify_omp_for): Added support to gimplify a _Cilk_for statement.
	(gimplify_expr): Added CILK_FOR case.
	* omp-low.c (extract_omp_for_data): Added a check for CILK_FOR and
	set the schedule kind accordingly.  Added a check for CILK_FOR trees
	wherever CILKSIMD is checked.
	(create_omp_child_function_name): Added a new parameter: is_cilk_for.
	(find_cilk_for_stmt): New function.
	(is_cilk_for_stmt): Likewise.
	(cilk_for_check_loop_diff_type): Likewise.
	(expand_cilk_for_call): Likewise.
	(expand_cilk_for): Likewise.
	(create_omp_child_function): Added support to create _Cilk_for's
	child function by adding two additional parameters.
	(expand_omp_taskreg): Extracted the high and low parameters from the
	child function and set them accordingly in the child function.
	(expand_omp_for): Added a call to expand_cilk_for.
	* tree.def (CILK_FOR): New tree.
	* tree-core.h (enum omp_clause_schedule_kind): Added a new enumerator
	field OMP_CLAUSE_SCHEDULE_CILKFOR.
	* cilk-builtins.def (BUILT_IN_CILK_FOR_32): New built-in function.
	(BUILT_IN_CILK_FOR_64): Likewise.
	
gcc/c-family/ChangeLog
2014-02-20  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-cilkplus.c (find_cilk_for): New function.
	(cilk_for_move_clauses_upward): Likewise.
	* c-common.c (c_common_reswords[]): Added a new field called _Cilk_for.
	* c-common.h (enum rid): Added new enumerator called RID_CILK_FOR.
	* c-omp.c (c_finish_omp_for): Added a new parameter called count.
	Computed the value of loop-count based on initial, condition and
	increment information.
	* c-pragma.c (init_pragma): Added cilk grainsize pragma.
	* c-pragma.h (enum pragma_kind): Added new enumerator called
	PRAGMA_CILK_GRAINSIZE.

gcc/c/ChangeLog
2014-02-20  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-parser.c (c_parser_statement_after_labels): Added RID_CILK_FOR
	case.
	(c_parser_pragma): Added PRAGMA_CILK_GRAINSIZE case.
	(c_parser_omp_for_loop): Added grain parameter.  Also, modified
	the function to parse _Cilk_for statement.
	(c_parser_cilk_grainsize): New function.
	(c_parser_cilk_simd): Added a new parameter called is_cilk_for.
	Modified the function to handle CILK_FOR.

gcc/testsuite/ChangeLog
2014-02-20  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-c++-common/cilk-plus/CK/cilk-fors.c: New testcase.
	* c-c++-common/cilk-plus/CK/nested_cilk_for.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_grain.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_errors.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c: Likewise.

[-- Attachment #4: cp-ChangeLogs --]
[-- Type: application/octet-stream, Size: 1873 bytes --]

gcc/cp/ChangeLog
2014-02-20  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* cp-cilkplus.c (copy_tree_till_cilk_for): New function.
	(find_vars): Likewise.
	(find_killed_vars): Likewise.
	(found_cilk_for_p): Likewise.
	(find_cilk_for_stmt): Likewise.
	(insert_firstpriv_clauses): Likewise.
	(cilk_for_create_bind_expr): Likewise.
	* cp-tree.h (copy_tree_till_cilk_for): New prototype.
	(cilk_for_create_bind_expr): Likewise.
	* parser.c (cp_parser_statement): Added a RID_CILK_FOR case.
	(cp_parser_omp_for_cond): Added a check for CILK_FOR tree along with
	CILK_SIMD tree.
	(cp_parser_omp_for_loop): Added a new parameter: cfor_block.  Added
	support for parsing a _Cilk_for statement.  Removed statements
	between _Cilk_for statement and the #pragma omp parallel to move
	them upward.
	(cp_parser_cilk_grainsize): New function.
	(cp_parser_pragma): Added PRAGMA_CILK_GRAINSIZE pragma.
	(cp_parser_cilk_simd): Added a new parameter called grain.  Added
	support to handle _Cilk_for statement along with #pragma simd.
	* pt.c (tsubst_expr): For _Cilk_for statement, move certain clauses
	upward to #pragma parallel statement.  Added a CILK_FOR case.
	Modified OMP_PARALLEL case to handle _Cilk_for.
	* semantics.c (handle_omp_for_class_iterator): Added a NE_EXPR case.
	(finish_omp_for): For a _Cilk_for statement, added an IF clause.
	
gcc/testsuite/ChangeLog
2014-02-20  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-c++-common/cilk-plus/CK/cilk_for_errors.c: Made certain error
	tags C specific and inserted their C++ equivalents.
	* c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c: Likewise.
	* g++.dg/cilk-plus/CK/cilk-for-tplt.cc: New testcase.
	* g++.dg/cilk-plus/CK/cf3.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_iter.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_rev_iter.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_test.cc: Likewise.
	* g++.dg/cilk-plus/CK/for1.cc: Likewise.
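
The c-family ChangeLog entry for c_finish_omp_for says the loop count is
computed from the initial value, the condition and the increment.  A minimal
sketch of that arithmetic for plain integer induction variables follows.  It
is not the code from the patch: trip_count and Cmp are hypothetical names,
and for a != condition the distance is assumed to be a multiple of the step.

```cpp
#include <cassert>

// Hypothetical helper, not part of the patch: the number of iterations of
//   for (i = init; i OP limit; i += step)
// for the comparison operators a _Cilk_for condition may use.
enum class Cmp { LT, LE, GT, GE, NE };

long trip_count (long init, long limit, long step, Cmp op)
{
  assert (step != 0);

  // A loop whose step moves away from its limit would wrap around.  The
  // comment in for1.cc says the compiler reports an error when it can see
  // this at compile time; this sketch simply reports zero iterations.
  bool upward = (op == Cmp::LT || op == Cmp::LE);
  bool downward = (op == Cmp::GT || op == Cmp::GE);
  if ((upward && step < 0) || (downward && step > 0))
    return 0;

  // Distance to cover, measured in the direction of travel.
  long span = 0;
  switch (op)
    {
    case Cmp::LT: span = limit - init;     break;
    case Cmp::LE: span = limit - init + 1; break;
    case Cmp::GT: span = init - limit;     break;
    case Cmp::GE: span = init - limit + 1; break;
    case Cmp::NE: span = step > 0 ? limit - init : init - limit; break;
    }
  if (span <= 0)
    return 0;

  // Round up: a partial final stride still costs one iteration.
  long stride = step > 0 ? step : -step;
  return (span + stride - 1) / stride;
}
```

For instance, the f1 call in for1.cc runs i from 10 to 1990 inclusive in
steps of 6, which corresponds to trip_count (10, 1990, 6, Cmp::LE).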

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PING] [PATCH] _Cilk_for for C and C++
  2014-02-21  4:38                                       ` Iyer, Balaji V
@ 2014-02-24 23:16                                         ` Iyer, Balaji V
       [not found]                                           ` <BF230D13CA30DD48930C31D4099330003A4D2123@FMSMSX101.amr.corp.intel.com>
  0 siblings, 1 reply; 26+ messages in thread
From: Iyer, Balaji V @ 2014-02-24 23:16 UTC (permalink / raw)
  To: 'Jakub Jelinek'
  Cc: 'Jason Merrill', 'Jeff Law',
	'Aldy Hernandez', 'gcc-patches@gcc.gnu.org',
	'rth@redhat.com'

[-- Attachment #1: Type: text/plain, Size: 4401 bytes --]

Hi Jakub,
	Please see my responses below.

> -----Original Message-----
> From: Iyer, Balaji V
> Sent: Thursday, February 20, 2014 11:38 PM
> To: Jakub Jelinek
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'rth@redhat.com'
> Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
> 
> Hi Jakub,
> 	I have attached the fixed patch and have answered your questions
> below.
> 
> > -----Original Message-----
> > From: Jakub Jelinek [mailto:jakub@redhat.com]
> > Sent: Wednesday, February 19, 2014 6:24 AM
> > To: Iyer, Balaji V
> > Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez';
> > 'gcc-patches@gcc.gnu.org'; 'rth@redhat.com'
> > Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> >
> > On Wed, Feb 19, 2014 at 04:43:06AM +0000, Iyer, Balaji V wrote:
> > > Attached, please find a patch with the test case attached (for1.cc).
> > > The patch is the same but the cp-changelog has been modified to
> > > reflect the new test-case.  Is this OK to install?
> >
> > 1) have you tested the patch at all?  I see
> > FAIL: g++.dg/gomp/for-1.C -std=c++98  (test for errors, line 27)
> > FAIL: g++.dg/gomp/for-1.C -std=c++98 (test for excess errors)
> > FAIL: g++.dg/gomp/for-1.C -std=c++11  (test for errors, line 27)
> > FAIL: g++.dg/gomp/for-1.C -std=c++11 (test for excess errors)
> > FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (internal compiler error)
> > FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 30)
> > FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 37)
> > FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 40)
> > FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for excess errors)
> > FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (internal compiler error)
> > FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 30)
> > FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 37)
> > FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 40)
> > FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for excess errors)
> > regressions caused by the patch, that is of course unacceptable.
> >
> 
> Fixed. I apologize for them. I have confirmed that it is OK now.
> 
> 
> > 2) try this updated cf3.cc, e.g. with -O2 -fcilkplus if you can't find
> > out why calling something multiple times is a bad idea, actually the
> > latest patch is even worse than the older one, you now create 3 calls
> > to the end method and 3 calls to operator-.  There should be just one
> > call to that, before the #pragma omp parallel obviously, anything that
> > doesn't do that is just bad.
> > I don't see a point in having if clause on the _Cilk_for, just keep it
> > on the #pragma omp parallel only, at ompexp time you can easily find
> > it there, there is no point to check it again in the parallel body of the
> > function.
> >
> 
> I have removed the if-clause from the _Cilk_for and now it is just in #pragma
> omp parallel.
> 
> I have removed the 3rd operator-, but I am not able to remove the 2nd. I am
> looking into it, but I am not able to do it. The thing is, first operator- was for
> the if-clause, which I need to calculate the loop-count for the
> __cilkrts_cilk_for_64 function. The second one is not necessary because the
> end-value does not matter for _Cilk_for since they will be replaced with the
> low and high values.  I tried several things such as stopping gimplification of
> the cond value, or replacing it with a constant, etc and those are causing
> other problems elsewhere.
> 
> 	 The thing is, I am not able to find a way to automate this. I can't
> assume the cond's  end-value is same as count, because this is only true if we
> have an iterator and the handle_omp_for_iterator function modifies the
> cond value correctly. I need to use the count value (retval.0) as retval.1 but
> count-value is computed at a later time than handle_omp_for_iterator (since
> it does not have any knowledge about the #pragma omp parallel). It is giving
> the correct answers for the benchmarks and is removing the 2nd operator-
> when optimization is turned on for the inlinable operator-.
> 
> Can you provide me some advice about how to do it?

I have fixed the issue. Now the 2nd operator- does not occur. Attached, please find a fixed patch. Is this OK for trunk?
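
For anyone following the thread, here is a minimal sketch in plain C of the loop-count arithmetic that the attached patch performs for _Cilk_for. The end value is adjusted by +1 for "<=" and by -1 for ">=" (as in the c_finish_omp_for hunk), and the count is (end - init) / step with truncating division (as in the c_parser_omp_for_loop hunk). The names cilk_for_count and enum cmp_kind are hypothetical and exist only for this illustration; they are not part of the patch or of GCC.

```c
#include <assert.h>

/* Hypothetical stand-ins for the comparison used in the loop
   condition; these are not GCC tree codes.  */
enum cmp_kind { CMP_LT, CMP_LE, CMP_GT, CMP_GE, CMP_NE };

/* Illustrative only: mirrors the arithmetic in the patch.
   END is raised by one for "<=" and lowered by one for ">=", then
   the count is (END - INIT) / STEP using C's truncating division.  */
long
cilk_for_count (long init, long end, long step, enum cmp_kind cmp)
{
  assert (step != 0);	/* A zero step has no meaningful count.  */
  if (cmp == CMP_LE)
    end += 1;
  else if (cmp == CMP_GE)
    end -= 1;
  return (end - init) / step;
}
```

With this sketch, the earlier example "_Cilk_for (int ii = 0; ii < 10; ii = ii + 2)" corresponds to cilk_for_count (0, 10, 2, CMP_LT), which gives 5. Because the division truncates, a range that is not an exact multiple of the step is rounded toward zero in this sketch.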

Thanks,

Balaji V. Iyer.

[-- Attachment #2: diff.txt --]
[-- Type: text/plain, Size: 110435 bytes --]

diff --git a/gcc/c-family/c-cilkplus.c b/gcc/c-family/c-cilkplus.c
index 1a16f66..1be12bd 100644
--- a/gcc/c-family/c-cilkplus.c
+++ b/gcc/c-family/c-cilkplus.c
@@ -91,3 +91,52 @@ c_finish_cilk_clauses (tree clauses)
     }
   return clauses;
 }
+
+/* Structure used to pass information into a walk_tree function and
+   find_cilk_for.  */
+struct clause_struct
+{
+  bool is_set;
+  tree clauses;
+};
+
+/* Helper function for walk_tree used in cilk_for_move_clauses_upward.
+   If *TP is a CILK_FOR statement, then set *DATA (type-casted to 
+   struct clause_struct) with its clauses.  */
+
+static tree
+find_cilk_for (tree *tp, int *walk_subtrees, void *data)
+{
+  struct clause_struct *cstruct = (struct clause_struct *) data;
+  if (*tp && TREE_CODE (*tp) == CILK_FOR && !cstruct->is_set)
+    {
+      cstruct->is_set = true;
+      cstruct->clauses = OMP_FOR_CLAUSES (*tp);
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Moves the IF-CLAUSE and SCHEDULE clause from _CILK_FOR statement in
+   STMT into *PARALLEL_CLAUSES.  */
+ 
+void
+cilk_for_move_clauses_upward (tree *parallel_clauses, tree stmt)
+{
+  struct clause_struct cstruct;
+  cstruct.is_set = false;
+  cstruct.clauses = NULL_TREE;
+  walk_tree (&stmt, find_cilk_for, (void *) &cstruct, NULL);
+
+  tree clauses = cstruct.clauses;
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SCHEDULE
+	|| OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IF)
+      {
+	if (*parallel_clauses)
+	  OMP_CLAUSE_CHAIN (*parallel_clauses) = c;
+	else
+	  *parallel_clauses = c;
+      }
+}
+
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index e23a9df..d451072 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -416,6 +416,7 @@ const struct c_common_resword c_common_reswords[] =
   { "_Complex",		RID_COMPLEX,	0 },
   { "_Cilk_spawn",      RID_CILK_SPAWN, 0 },
   { "_Cilk_sync",       RID_CILK_SYNC,  0 },
+  { "_Cilk_for",        RID_CILK_FOR,   0 },
   { "_Imaginary",	RID_IMAGINARY, D_CONLY },
   { "_Decimal32",       RID_DFLOAT32,  D_CONLY | D_EXT },
   { "_Decimal64",       RID_DFLOAT64,  D_CONLY | D_EXT },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index f074ab1..509490c 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -149,7 +149,7 @@ enum rid
   RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
 
   /* Cilk Plus keywords.  */
-  RID_CILK_SPAWN, RID_CILK_SYNC,
+  RID_CILK_SPAWN, RID_CILK_SYNC, RID_CILK_FOR,
   
   /* Objective-C ("AT" reserved words - they are only keywords when
      they follow '@')  */
@@ -1203,7 +1203,7 @@ extern void c_finish_omp_flush (location_t);
 extern void c_finish_omp_taskwait (location_t);
 extern void c_finish_omp_taskyield (location_t);
 extern tree c_finish_omp_for (location_t, enum tree_code, tree, tree, tree,
-			      tree, tree, tree);
+			      tree, tree, tree, tree *, tree *, tree *);
 extern void c_omp_split_clauses (location_t, enum tree_code, omp_clause_mask,
 				 tree, tree *);
 extern tree c_omp_declare_simd_clauses_to_numbers (tree, tree);
@@ -1389,4 +1389,5 @@ extern tree make_cilk_frame (tree);
 extern tree create_cilk_function_exit (tree, bool, bool);
 extern tree cilk_install_body_pedigree_operations (tree);
 extern void cilk_outline (tree, tree *, void *);
+extern void cilk_for_move_clauses_upward (tree *, tree);
 #endif /* ! GCC_C_COMMON_H */
diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index dd0a45d..e10dcf4 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -386,17 +386,19 @@ c_omp_for_incr_canonicalize_ptr (location_t loc, tree decl, tree incr)
    INITV, CONDV and INCRV are vectors containing initialization
    expressions, controlling predicates and increment expressions.
    BODY is the body of the loop and PRE_BODY statements that go before
-   the loop.  */
+   the loop.  *CINIT, *CEND and *CSTEP receive the loop's initial value,
+   end value and step; they are set only for a _Cilk_for statement.  */
 
 tree
 c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
-		  tree initv, tree condv, tree incrv, tree body, tree pre_body)
+		  tree initv, tree condv, tree incrv, tree body,
+		  tree pre_body, tree *cinit, tree *cend, tree *cstep)
 {
   location_t elocus;
   bool fail = false;
   int i;
-
-  if (code == CILK_SIMD
+  tree orig_init = NULL_TREE, orig_end = NULL_TREE, orig_step = NULL_TREE;
+  if ((code == CILK_SIMD || code == CILK_FOR) 
       && !c_check_cilk_loop (locus, TREE_VEC_ELT (declv, 0)))
     fail = true;
 
@@ -422,6 +424,8 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	  fail = true;
 	}
 
+      if (TREE_CODE (init) == MODIFY_EXPR)
+	orig_init = TREE_OPERAND (init, 1);
       /* In the case of "for (int i = 0...)", init will be a decl.  It should
 	 have a DECL_INITIAL that we can turn into an assignment.  */
       if (init == decl)
@@ -436,6 +440,7 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      fail = true;
 	    }
 
+	  orig_init = init;
 	  init = build_modify_expr (elocus, decl, NULL_TREE, NOP_EXPR,
 	      			    /* FIXME diagnostics: This should
 				       be the location of the INIT.  */
@@ -526,9 +531,20 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 					    0))
 		    TREE_SET_CODE (cond, TREE_CODE (cond) == NE_EXPR
 					 ? LT_EXPR : GE_EXPR);
-		  else if (code != CILK_SIMD)
+		  else if (code != CILK_SIMD && code != CILK_FOR)
 		    cond_ok = false;
 		}
+	      if (flag_cilkplus && code == CILK_FOR)
+		{ 
+		  orig_end = TREE_OPERAND (cond, 1);
+		  tree add_expr = build_zero_cst (TREE_TYPE (orig_end));
+		  if (TREE_CODE (cond) == LE_EXPR)
+		    add_expr = build_one_cst (TREE_TYPE (orig_end));
+		  else if (TREE_CODE (cond) == GE_EXPR)
+		    add_expr = build_int_cst (TREE_TYPE (orig_end), -1);
+		  orig_end = fold_build2 (PLUS_EXPR, TREE_TYPE (orig_end),
+					  orig_end, add_expr);
+		}
 	    }
 
 	  if (!cond_ok)
@@ -561,6 +577,19 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      if (TREE_OPERAND (incr, 0) != decl)
 		break;
 
+	      if (TREE_CODE (incr) == POSTINCREMENT_EXPR
+		  || TREE_CODE (incr) == PREINCREMENT_EXPR)
+		orig_step = build_one_cst (TREE_TYPE (incr));
+	      else
+		orig_step = integer_minus_one_node;
+ 
+	      if (POINTER_TYPE_P (TREE_TYPE (incr)))
+		{
+		  tree unit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (incr)));
+		  if (code == CILK_FOR)
+		    orig_step = fold_build2 (MULT_EXPR, TREE_TYPE (orig_step),
+					     orig_step, unit);
+		}
 	      incr_ok = true;
 	      incr = c_omp_for_incr_canonicalize_ptr (elocus, decl, incr);
 	      break;
@@ -579,14 +608,24 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	      if (TREE_CODE (TREE_OPERAND (incr, 1)) == PLUS_EXPR
 		  && (TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl
 		      || TREE_OPERAND (TREE_OPERAND (incr, 1), 1) == decl))
-		incr_ok = true;
+		{
+		  if (TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl)
+		    orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 1);
+		  else
+		    orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 0);
+		  incr_ok = true;
+		}
 	      else if ((TREE_CODE (TREE_OPERAND (incr, 1)) == MINUS_EXPR
 			|| (TREE_CODE (TREE_OPERAND (incr, 1))
 			    == POINTER_PLUS_EXPR))
 		       && TREE_OPERAND (TREE_OPERAND (incr, 1), 0) == decl)
-		incr_ok = true;
+		{
+		  orig_step = TREE_OPERAND (TREE_OPERAND (incr, 1), 1);
+		  incr_ok = true;
+		}
 	      else
 		{
+		  orig_step = TREE_OPERAND (incr, 1);
 		  tree t = check_omp_for_incr_expr (elocus,
 						    TREE_OPERAND (incr, 1),
 						    decl);
@@ -609,6 +648,14 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 	    }
 	}
 
+      /* These variables could be NULL if an error occurred.  */
+      if (flag_cilkplus && code == CILK_FOR 
+	  && orig_end && orig_init && orig_step)
+	{
+	  *cinit = orig_init;
+	  *cend = orig_end;
+	  *cstep = orig_step;
+	}
       TREE_VEC_ELT (initv, i) = init;
       TREE_VEC_ELT (incrv, i) = incr;
     }
diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index 91fffdb..3155fea 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -1395,6 +1395,11 @@ init_pragma (void)
   if (!flag_preprocess_only)
     cpp_register_deferred_pragma (parse_in, "GCC", "ivdep", PRAGMA_IVDEP, false,
 				  false);
+
+  if (flag_cilkplus && !flag_preprocess_only)
+    cpp_register_deferred_pragma (parse_in, "cilk", "grainsize",
+				  PRAGMA_CILK_GRAINSIZE, true, false);
+
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 6f1bf74..b9f09ba 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -55,6 +55,9 @@ typedef enum pragma_kind {
   /* Top level clause to handle all Cilk Plus pragma simd clauses.  */
   PRAGMA_CILK_SIMD,
 
+  /* This pragma handles setting of grainsize for a _Cilk_for.  */
+  PRAGMA_CILK_GRAINSIZE,
+
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
 
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index d0d35c5..9830ead 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1248,10 +1248,11 @@ static bool c_parser_objc_diagnose_bad_element_prefix
   (c_parser *, struct c_declspecs *);
 
 /* Cilk Plus supporting routines.  */
-static void c_parser_cilk_simd (c_parser *);
+static void c_parser_cilk_simd (c_parser *, bool, tree);
 static bool c_parser_cilk_verify_simd (c_parser *, enum pragma_context);
 static tree c_parser_array_notation (location_t, c_parser *, tree, tree);
 static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, bool);
+static void c_parser_cilk_grainsize (c_parser *);
 
 /* Parse a translation unit (C90 6.7, C99 6.9).
 
@@ -4878,6 +4879,16 @@ c_parser_statement_after_labels (c_parser *parser)
 	case RID_FOR:
 	  c_parser_for_statement (parser, false);
 	  break;
+	case RID_CILK_FOR:
+	  if (!flag_cilkplus)
+	    {
+	      error_at (c_parser_peek_token (parser)->location,
+			"-fcilkplus must be enabled to use %<_Cilk_for%>");
+	      c_parser_skip_to_end_of_block_or_statement (parser);
+	    }
+	  else
+	    c_parser_cilk_simd (parser, true, integer_zero_node);
+	  break;
 	case RID_CILK_SYNC:
 	  c_parser_consume_token (parser);
 	  c_parser_skip_until_found (parser, CPP_SEMICOLON, "expected %<;%>");
@@ -9496,7 +9507,24 @@ c_parser_pragma (c_parser *parser, enum pragma_context context)
       if (!c_parser_cilk_verify_simd (parser, context))
 	return false;
       c_parser_consume_pragma (parser);
-      c_parser_cilk_simd (parser);
+      c_parser_cilk_simd (parser, false, NULL_TREE);
+      return false;
+    case PRAGMA_CILK_GRAINSIZE:
+      if (!flag_cilkplus)
+	{
+	  warning (0, "%<#pragma grainsize%> ignored because -fcilkplus is not"
+		   " enabled");
+	  c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
+	  return false;
+	}
+      if (context == pragma_external)
+	{
+	  error_at (c_parser_peek_token (parser)->location,
+		    "%<#pragma grainsize%> must be inside a function");
+	  c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
+	  return false;
+	}
+      c_parser_cilk_grainsize (parser);
       return false;
 
     default:
@@ -11591,7 +11619,7 @@ c_parser_omp_flush (c_parser *parser)
 
 static tree
 c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
-		       tree clauses, tree *cclauses)
+		       tree clauses, tree grain, tree *cclauses)
 {
   tree decl, cond, incr, save_break, save_cont, body, init, stmt, cl;
   tree declv, condv, incrv, initv, ret = NULL;
@@ -11599,6 +11627,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   int i, collapse = 1, nbraces = 0;
   location_t for_loc;
   vec<tree, va_gc> *for_block = make_tree_vector ();
+  tree count = NULL_TREE;
 
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
     if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
@@ -11611,11 +11640,18 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   condv = make_tree_vec (collapse);
   incrv = make_tree_vec (collapse);
 
-  if (!c_parser_next_token_is_keyword (parser, RID_FOR))
+  if (code != CILK_FOR
+      && !c_parser_next_token_is_keyword (parser, RID_FOR))
     {
       c_parser_error (parser, "for statement expected");
       return NULL;
     }
+  if (code == CILK_FOR
+      && !c_parser_next_token_is_keyword (parser, RID_CILK_FOR))
+    {
+      c_parser_error (parser, "_Cilk_for statement expected");
+      return NULL;
+    }
   for_loc = c_parser_peek_token (parser)->location;
   c_parser_consume_token (parser);
 
@@ -11693,7 +11729,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 	    case LE_EXPR:
 	      break;
 	    case NE_EXPR:
-	      if (code == CILK_SIMD)
+	      if (code == CILK_SIMD || code == CILK_FOR)
 		break;
 	      /* FALLTHRU.  */
 	    default:
@@ -11826,8 +11862,9 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
      an error from the initialization parsing.  */
   if (!fail)
     {
+      tree cf_init = NULL_TREE, cf_end = NULL_TREE, cf_step = NULL_TREE;
       stmt = c_finish_omp_for (loc, code, declv, initv, condv,
-			       incrv, body, NULL);
+			       incrv, body, NULL, &cf_init, &cf_end, &cf_step);
       if (stmt)
 	{
 	  if (cclauses != NULL
@@ -11867,6 +11904,28 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 		  }
 	    }
 	  OMP_FOR_CLAUSES (stmt) = clauses;
+	  /* If it is a _Cilk_for statement, then the OMP_FOR_CLAUSES location
+	     stores the user-defined grain value or an integer_zero_node 
+	     indicating that the runtime must compute a suitable grain, inside
+	     a SCHEDULE clause.  Similarly the loop-count is also stored in
+	     an IF clause.  These clauses do not make sense for _Cilk_for but
+	     are used only to transmit information.  */
+	  if (code == CILK_FOR)
+	    {
+	      count = fold_build2 (MINUS_EXPR, TREE_TYPE (cf_end), cf_end,
+				   cf_init);
+	      count = fold_build2 (TRUNC_DIV_EXPR, TREE_TYPE (count), count,
+				   cf_step);
+	      tree l = build_omp_clause (EXPR_LOCATION (grain),
+					 OMP_CLAUSE_SCHEDULE);
+	      OMP_CLAUSE_SCHEDULE_KIND (l) = OMP_CLAUSE_SCHEDULE_CILKFOR;
+	      OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (l) = grain;
+	      OMP_CLAUSE_CHAIN (l) = OMP_FOR_CLAUSES (stmt);
+	      tree c = build_omp_clause (EXPR_LOCATION (count), OMP_CLAUSE_IF);
+	      OMP_CLAUSE_IF_EXPR (c) = count;
+	      OMP_CLAUSE_CHAIN (c) = l;
+	      OMP_FOR_CLAUSES (stmt) = c;
+	    }
 	}
       ret = stmt;
     }
@@ -11931,7 +11990,8 @@ c_parser_omp_simd (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_SIMD, clauses, cclauses);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_SIMD, clauses, NULL_TREE,
+			       cclauses);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -12011,7 +12071,8 @@ c_parser_omp_for (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_FOR, clauses, cclauses);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_FOR, clauses, NULL_TREE,
+			       cclauses);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -12494,7 +12555,8 @@ c_parser_omp_distribute (location_t loc, c_parser *parser,
     }
 
   block = c_begin_compound_stmt (true);
-  ret = c_parser_omp_for_loop (loc, parser, OMP_DISTRIBUTE, clauses, NULL);
+  ret = c_parser_omp_for_loop (loc, parser, OMP_DISTRIBUTE, clauses, NULL_TREE,
+			       NULL);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
@@ -13771,18 +13833,84 @@ c_parser_cilk_all_clauses (c_parser *parser)
   return c_finish_cilk_clauses (clauses);
 }
 
-/* Main entry point for parsing Cilk Plus <#pragma simd> for
-   loops.  */
+/* This function helps parse the grainsize pragma for a _Cilk_for statement. 
+   Here is the correct syntax of this pragma: 
+	    #pragma cilk grainsize = <EXP> 
+ */
 
 static void
-c_parser_cilk_simd (c_parser *parser)
+c_parser_cilk_grainsize (c_parser *parser)
 {
-  tree clauses = c_parser_cilk_all_clauses (parser);
+  extern tree convert_to_integer (tree, tree);
+
+  /* consume the 'grainsize' keyword.  */
+  c_parser_consume_pragma (parser);
+
+  if (c_parser_require (parser, CPP_EQ, "expected %<=%>") != 0)
+    {
+      struct c_expr g_expr = c_parser_binary_expression (parser, NULL, NULL);
+      if (g_expr.value && TREE_CODE (g_expr.value) == C_MAYBE_CONST_EXPR)
+	{
+	  error_at (input_location, "cannot convert grain to long integer.\n");
+	  c_parser_skip_to_pragma_eol (parser);
+	}   
+      else if (g_expr.value && g_expr.value != error_mark_node)
+	{
+	  c_parser_skip_to_pragma_eol (parser);
+	  c_token *token = c_parser_peek_token (parser);
+	  if (token && token->type == CPP_KEYWORD
+	      && token->keyword == RID_CILK_FOR)
+	    {
+	      /* Remove EXCESS_PRECISION_EXPR since we are going to convert
+		 it to long int.  */
+	      if (TREE_CODE (g_expr.value) == EXCESS_PRECISION_EXPR)
+		g_expr.value = TREE_OPERAND (g_expr.value, 0);
+	      tree grain = convert_to_integer (long_integer_type_node,
+					       g_expr.value);
+	      if (grain && grain != error_mark_node) 
+		c_parser_cilk_simd (parser, true, grain);
+	    }
+	  else
+	    warning (0, "grainsize pragma is not followed by %<_Cilk_for%>");
+	}
+      else
+	c_parser_skip_to_pragma_eol (parser);
+    }
+  else
+    c_parser_skip_to_pragma_eol (parser);
+}
+
+/* Main entry point for parsing Cilk Plus <#pragma simd> for and
+   _Cilk_for loops.  If IS_CILK_FOR is true then it is a _Cilk_for loop 
+   and GRAIN is the grain value passed in through pragma or 0.  */
+
+static void
+c_parser_cilk_simd (c_parser *parser, bool is_cilk_for, tree grain)
+{
+  tree super_block = NULL_TREE;
+  tree clauses = NULL_TREE;
+  
+  if (!is_cilk_for)
+    clauses = c_parser_cilk_all_clauses (parser);
+  else
+    super_block = c_begin_omp_parallel ();
   tree block = c_begin_compound_stmt (true);
   location_t loc = c_parser_peek_token (parser)->location;
-  c_parser_omp_for_loop (loc, parser, CILK_SIMD, clauses, NULL);
+  enum tree_code code = is_cilk_for ? CILK_FOR : CILK_SIMD;
+  c_parser_omp_for_loop (loc, parser, code, clauses, grain, NULL);
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
+  if (is_cilk_for)
+    {
+      /* Move all the clauses from the #pragma OMP for to #pragma omp parallel.
+	 This is because if these values are not integers and it is placed in
+	 OMP_FOR then the compiler will insert value chains for them.  */
+      tree parallel_clauses = NULL_TREE;
+      cilk_for_move_clauses_upward (&parallel_clauses, super_block);
+    /* The term super_block is not used in scheduling terms but in 
+       set-theory, i.e. set vs. super-set.  */ 
+      c_finish_omp_parallel (loc, parallel_clauses, super_block);
+    }
 }
 \f
 /* Parse a transaction attribute (GCC Extension).
diff --git a/gcc/cilk-builtins.def b/gcc/cilk-builtins.def
index 9f3240a..bf319d5 100644
--- a/gcc/cilk-builtins.def
+++ b/gcc/cilk-builtins.def
@@ -31,3 +31,5 @@ DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_SYNC, "__cilkrts_sync")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_LEAVE_FRAME, "__cilkrts_leave_frame")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_POP_FRAME, "__cilkrts_pop_frame")
 DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_SAVE_FP, "__cilkrts_save_fp_ctrl_state")
+DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_FOR_32, "__cilkrts_cilk_for_32")
+DEF_CILK_BUILTIN_STUB (BUILT_IN_CILK_FOR_64, "__cilkrts_cilk_for_64")
diff --git a/gcc/cilk-common.c b/gcc/cilk-common.c
index a6a1aa2..d604651 100644
--- a/gcc/cilk-common.c
+++ b/gcc/cilk-common.c
@@ -105,6 +105,27 @@ install_builtin (const char *name, tree fntype, enum built_in_function code,
   return fndecl;
 }
 
+/* Returns a FUNCTION_DECL of type TYPE whose name is *NAME.  */
+
+static tree 
+declare_cilk_for_builtin (const char *name, tree type, 
+			  enum built_in_function code)
+{
+  tree cb, ft, fn;
+
+  cb = build_function_type_list (void_type_node,
+                                 ptr_type_node, type, type,
+                                 NULL_TREE);
+  cb = build_pointer_type (cb);
+  ft = build_function_type_list (void_type_node,
+                                 cb, ptr_type_node, type,
+                                 integer_type_node, NULL_TREE);
+  fn = install_builtin (name, ft, code, false);
+  TREE_NOTHROW (fn) = 0;
+
+  return fn;
+}
+
 /* Creates and initializes all the built-in Cilk keywords functions and three
    structures: __cilkrts_stack_frame, __cilkrts_pedigree and __cilkrts_worker.
    Detailed information about __cilkrts_stack_frame and
@@ -269,6 +290,14 @@ cilk_init_builtins (void)
   cilk_save_fp_fndecl = install_builtin ("__cilkrts_save_fp_ctrl_state", 
 					 fptr_fun, BUILT_IN_CILK_SAVE_FP,
 					 false);
+  /* __cilkrts_cilk_for_32 (...);  */
+  cilk_for_32_fndecl = declare_cilk_for_builtin ("__cilkrts_cilk_for_32", 
+						 unsigned_intSI_type_node, 
+						 BUILT_IN_CILK_FOR_32);
+  /* __cilkrts_cilk_for_64 (...);  */
+  cilk_for_64_fndecl = declare_cilk_for_builtin ("__cilkrts_cilk_for_64", 
+						 unsigned_intDI_type_node, 
+						 BUILT_IN_CILK_FOR_64);
 }
 
 /* Get the appropriate frame arguments for CALL that is of type CALL_EXPR.  */
diff --git a/gcc/cilk.h b/gcc/cilk.h
index ae96f53..1fee929 100644
--- a/gcc/cilk.h
+++ b/gcc/cilk.h
@@ -40,6 +40,9 @@ enum cilk_tree_index  {
   CILK_TI_F_POP,                      /* __cilkrts_pop_frame (...).  */
   CILK_TI_F_RETHROW,                  /* __cilkrts_rethrow (...).  */
   CILK_TI_F_SAVE_FP,                  /* __cilkrts_save_fp_ctrl_state (...).  */
+  CILK_TI_F_LOOP_32,                  /* __cilkrts_cilk_for_32 (...).  */
+  CILK_TI_F_LOOP_64,                  /* __cilkrts_cilk_for_64 (...).  */
+
   /* __cilkrts_stack_frame struct fields.  */
   CILK_TI_FRAME_FLAGS,                /* stack_frame->flags.  */
   CILK_TI_FRAME_PARENT,               /* stack_frame->parent.  */
@@ -77,6 +80,8 @@ extern GTY (()) tree cilk_trees[CILK_TI_MAX];
 #define cilk_rethrow_fndecl           cilk_trees[CILK_TI_F_RETHROW]
 #define cilk_pop_fndecl               cilk_trees[CILK_TI_F_POP]
 #define cilk_save_fp_fndecl           cilk_trees[CILK_TI_F_SAVE_FP]
+#define cilk_for_32_fndecl            cilk_trees[CILK_TI_F_LOOP_32]
+#define cilk_for_64_fndecl            cilk_trees[CILK_TI_F_LOOP_64]
 
 #define cilk_worker_type_fndecl       cilk_trees[CILK_TI_WORKER_TYPE]
 #define cilk_frame_type_decl          cilk_trees[CILK_TI_FRAME_TYPE]
diff --git a/gcc/cp/cp-cilkplus.c b/gcc/cp/cp-cilkplus.c
index f3a2aff..0825777 100644
--- a/gcc/cp/cp-cilkplus.c
+++ b/gcc/cp/cp-cilkplus.c
@@ -143,3 +143,163 @@ cilk_install_body_with_frame_cleanup (tree fndecl, tree orig_body, void *wd)
 			    &list);
 }
 
+/* Helper function for walk_tree, used by found_cilk_for_p.  Sets data (of type
+   bool) to true if *TP is of type CILK_FOR.  If so, then WALK_SUBTREES is 
+   set to zero.  */
+
+static tree
+find_cilk_for_stmt (tree *tp, int *walk_subtrees, void *data)
+{
+  bool *found = (bool *) data;
+  if (TREE_CODE (*tp) == CILK_FOR)
+    {
+      *found = true;
+      data = (void *) found;
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Returns true if T is of type CILK_FOR or one of its subtrees is of type
+   CILK_FOR.  */
+
+bool
+found_cilk_for_p (tree t)
+{
+  bool found = false;
+  walk_tree (&t, find_cilk_for_stmt, (void *) &found, NULL);
+  return found;
+}
+
+/* Returns all the statements till CILK_FOR statement in *STMT_LIST.  Removes
+   those statements from STMT_LIST and updates STMT_LIST accordingly.  */
+
+void
+copy_tree_till_cilk_for (tree *stmt_list, tree *new_stmt_list)
+{
+  gcc_assert (TREE_CODE (*stmt_list) == STATEMENT_LIST);
+  gcc_assert (new_stmt_list != NULL);
+
+  if (*new_stmt_list == NULL_TREE)
+    *new_stmt_list = alloc_stmt_list ();
+
+  tree_stmt_iterator tsi;
+  for (tsi = tsi_start (*stmt_list); !tsi_end_p (tsi);)
+    if (!found_cilk_for_p (tsi_stmt (tsi)))
+      {
+	append_to_statement_list (tsi_stmt (tsi), new_stmt_list); 
+	tsi_delink (&tsi);
+      }
+    else if (TREE_CODE (tsi_stmt (tsi)) == STATEMENT_LIST)
+      {
+	copy_tree_till_cilk_for (tsi_stmt_ptr (tsi), new_stmt_list);
+	tsi_next (&tsi);
+      }
+    else if (TREE_CODE (tsi_stmt (tsi)) == BIND_EXPR)
+      {
+	copy_tree_till_cilk_for (&BIND_EXPR_BODY (tsi_stmt (tsi)),
+				 new_stmt_list);
+	tsi_next (&tsi);
+      }
+    else
+      tsi_next (&tsi);
+}
+
+/* Structure to hold the list of variables that are being killed in a
+   statement list.  This structure is only used in a WALK_TREE function.  */
+struct cilk_for_var_list
+{
+  vec <tree, va_gc> *list;
+};
+
+/* Helper function for WALK_TREE used in find_killed_vars function.  
+   Returns all the variables that are being killed (or set) in *TP.  
+   *DATA holds the structure to hold the variable list.  */
+
+static tree
+find_vars (tree *tp, int *walk_subtrees, void *data)
+{
+  struct cilk_for_var_list *vlist = (struct cilk_for_var_list *) data;
+
+  if (!tp || !*tp)
+    return NULL_TREE;
+
+  if (TREE_CODE (*tp) == INIT_EXPR || TREE_CODE (*tp) == MODIFY_EXPR)
+    {
+      vec_safe_push (vlist->list, TREE_OPERAND (*tp, 0));
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Returns a vector of TREES that will hold the variable that
+   is killed (i.e. written or set) in STMT_LIST.  */
+
+static vec <tree, va_gc> *
+find_killed_vars (tree stmt_list)
+{
+  struct cilk_for_var_list vlist;
+  memset (&vlist, 0, sizeof (vlist));
+  cp_walk_tree (&stmt_list, find_vars, &vlist, NULL);
+  return vlist.list;
+}
+
+/* Inserts OMP_CLAUSE_FIRSTPRIVATE clauses into *CLAUSES for each variable
+   in *LIST.  */
+
+static void
+insert_firstpriv_clauses (vec <tree, va_gc> *list, tree *clauses)
+{
+  if (vec_safe_is_empty (list))
+    return;
+
+  tree lhs;
+  unsigned ix;
+  FOR_EACH_VEC_SAFE_ELT (list, ix, lhs)
+    {
+      tree new_clause = build_omp_clause (EXPR_LOCATION (lhs),
+					  OMP_CLAUSE_FIRSTPRIVATE);
+      OMP_CLAUSE_DECL (new_clause) = lhs;
+      OMP_CLAUSE_CHAIN (new_clause) = *clauses;
+      *clauses = new_clause;
+    }
+}
+
+/* Returns a BIND_EXPR with BIND_EXPR_VARS holding VARS and BIND_EXPR_BODY
+   contains STMT_LIST and CFOR_PAR_LIST.  */
+
+tree
+cilk_for_create_bind_expr (tree vars, tree stmt_list, tree cfor_par_list)
+{
+  gcc_assert (TREE_CODE (stmt_list) == STATEMENT_LIST);
+  tree_stmt_iterator tsi;
+  tree return_expr = make_node (BIND_EXPR);
+  BIND_EXPR_BODY (return_expr) = alloc_stmt_list ();
+  bool found = false; 
+  vec <tree, va_gc> *cfor_vars = find_killed_vars (stmt_list);
+
+  insert_firstpriv_clauses (cfor_vars, &OMP_PARALLEL_CLAUSES (cfor_par_list));
+
+  /* If there is a supplied list of vars then there is no reason to find them 
+     again.  */
+  if (vars != NULL_TREE)
+    found = true;
+
+  BIND_EXPR_VARS (return_expr) = vars;
+  for (tsi = tsi_start (stmt_list); !tsi_end_p (tsi); tsi_next (&tsi))
+    {
+      /* Only do the adding of BIND_EXPR_VARS the first time since they are
+	 already "chained-on."  */
+      if (!found && TREE_CODE (tsi_stmt (tsi)) == DECL_EXPR)
+	{
+	  tree var = DECL_EXPR_DECL (tsi_stmt (tsi));
+	  BIND_EXPR_VARS (return_expr) = var;
+	  found = true;
+	}
+      else
+	append_to_statement_list (tsi_stmt (tsi),
+				  &BIND_EXPR_BODY (return_expr));
+    }
+  append_to_statement_list (cfor_par_list, &BIND_EXPR_BODY (return_expr));
+  return return_expr;
+}
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 7681b27..0fde703 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6206,6 +6206,9 @@ extern void vtv_build_vtable_verify_fndecl      (void);
 
 /* In cp-cilkplus.c.  */
 extern bool cpp_validate_cilk_plus_loop		(tree);
+extern void copy_tree_till_cilk_for             (tree *, tree *);
+extern tree cilk_for_create_bind_expr           (tree, tree, tree);
+extern bool found_cilk_for_p                    (tree);
 
 /* In cp/cp-array-notations.c */
 extern tree expand_array_notation_exprs         (tree);
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4673f78..576346a 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -237,8 +237,8 @@ static void cp_parser_initial_pragma
 static tree cp_literal_operator_id
   (const char *);
 
-static void cp_parser_cilk_simd
-  (cp_parser *, cp_token *);
+static tree cp_parser_cilk_simd
+  (cp_parser *, cp_token *, tree);
 static bool cp_parser_omp_declare_reduction_exprs
   (tree, cp_parser *);
 static tree cp_parser_cilk_simd_vectorlength 
@@ -9368,6 +9368,18 @@ cp_parser_statement (cp_parser* parser, tree in_statement_expr,
 	  statement = cp_parser_iteration_statement (parser, false);
 	  break;
 
+	case RID_CILK_FOR:
+	  if (!flag_cilkplus)
+	    {
+	      error_at (cp_lexer_peek_token (parser->lexer)->location,
+			"-fcilkplus must be enabled to use %<_Cilk_for%>");
+	      cp_lexer_consume_token (parser->lexer);
+	      statement = error_mark_node;
+	    }
+	  else
+	    statement = cp_parser_cilk_simd (parser, NULL, integer_zero_node);
+	  break;
+	  
 	case RID_BREAK:
 	case RID_CONTINUE:
 	case RID_RETURN:
@@ -28836,7 +28848,7 @@ cp_parser_omp_for_cond (cp_parser *parser, tree decl, enum tree_code code)
     case LE_EXPR:
       break;
     case NE_EXPR:
-      if (code == CILK_SIMD)
+      if (code == CILK_SIMD || code == CILK_FOR)
 	break;
       /* Fall through: OpenMP disallows NE_EXPR.  */
     default:
@@ -29132,7 +29144,7 @@ cp_parser_omp_for_loop_init (cp_parser *parser,
 
 static tree
 cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
-			tree *cclauses)
+			tree *cclauses, tree *cfor_block)
 {
   tree init, cond, incr, body, decl, pre_body = NULL_TREE, ret;
   tree real_decl, initv, condv, incrv, declv;
@@ -29161,11 +29173,18 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
       bool add_private_clause = false;
       location_t loc;
 
-      if (!cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
+      if (code != CILK_FOR
+	  && !cp_lexer_next_token_is_keyword (parser->lexer, RID_FOR))
 	{
 	  cp_parser_error (parser, "for statement expected");
 	  return NULL;
 	}
+      if (code == CILK_FOR
+	  && !cp_lexer_next_token_is_keyword (parser->lexer, RID_CILK_FOR))
+	{
+	  cp_parser_error (parser, "_Cilk_for statement expected");
+	  return NULL;
+	}
       loc = cp_lexer_consume_token (parser->lexer)->location;
 
       if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
@@ -29174,13 +29193,26 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
       init = decl = real_decl = NULL;
       this_pre_body = push_stmt_list ();
 
+      if (code == CILK_FOR
+	  && cp_lexer_next_token_is_keyword (parser->lexer, RID_STATIC))
+	{
+	  error_at (cp_lexer_peek_token (parser->lexer)->location,
+		    "induction variable cannot be static");
+	  cp_lexer_consume_token (parser->lexer);
+	}
       add_private_clause
 	|= cp_parser_omp_for_loop_init (parser,
-					/*parsing_openmp=*/code != CILK_SIMD,
+					/*parsing_openmp=*/
+					(code != CILK_SIMD && code != CILK_FOR),
 					this_pre_body, for_block,
 					init, decl, real_decl);
 
-      cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
+      if (!cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON)
+	  && code == CILK_FOR)
+	{
+	  cp_parser_skip_to_end_of_statement (parser);
+	  cp_parser_consume_semicolon_at_end_of_statement (parser);
+	}
       if (this_pre_body)
 	{
 	  this_pre_body = pop_stmt_list (this_pre_body);
@@ -29338,7 +29370,7 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
 
   /* Note that we saved the original contents of this flag when we entered
      the structured block, and so we don't need to re-save it here.  */
-  if (code == CILK_SIMD)
+  if (code == CILK_SIMD || code == CILK_FOR)
     parser->in_statement = IN_CILK_SIMD_FOR;
   else
     parser->in_statement = IN_OMP_FOR;
@@ -29379,7 +29411,16 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
     }
 
   while (!for_block->is_empty ())
-    add_stmt (pop_stmt_list (for_block->pop ()));
+    {
+      tree t = pop_stmt_list (for_block->pop ());
+
+      /* Remove all the statements between the head of statement list and
+	 _Cilk_for statement and store them in *cfor_block.  These statements
+	 are hoisted above the #pragma parallel.  */
+      if (!processing_template_decl && code == CILK_FOR && cfor_block != NULL)
+	copy_tree_till_cilk_for (&t, cfor_block);
+      add_stmt (t);
+    }
   release_tree_vector (for_block);
 
   return ret;
@@ -29435,7 +29476,7 @@ cp_parser_omp_simd (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_SIMD, clauses, cclauses);
+  ret = cp_parser_omp_for_loop (parser, OMP_SIMD, clauses, cclauses, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -29523,7 +29564,7 @@ cp_parser_omp_for (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_FOR, clauses, cclauses);
+  ret = cp_parser_omp_for_loop (parser, OMP_FOR, clauses, cclauses, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -29995,7 +30036,7 @@ cp_parser_omp_distribute (cp_parser *parser, cp_token *pragma_tok,
   sb = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
 
-  ret = cp_parser_omp_for_loop (parser, OMP_DISTRIBUTE, clauses, NULL);
+  ret = cp_parser_omp_for_loop (parser, OMP_DISTRIBUTE, clauses, NULL, NULL);
 
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (sb));
@@ -31291,6 +31332,38 @@ cp_parser_initial_pragma (cp_token *first_token)
   cp_lexer_get_preprocessor_token (NULL, first_token);
 }
 
+/* Parses the grainsize pragma for the _Cilk_for statement.
+   Syntax:
+   #pragma cilk grainsize = <VALUE>.  */
+
+static void
+cp_parser_cilk_grainsize (cp_parser *parser, cp_token *pragma_tok)
+{
+  if (cp_parser_require (parser, CPP_EQ, RT_EQ))
+    {
+      tree exp = cp_parser_binary_expression (parser, false, false,
+                                              PREC_NOT_OPERATOR, NULL);
+      cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+      if (!exp || exp == error_mark_node)
+        {
+          error_at (pragma_tok->location, "invalid grainsize for _Cilk_for");
+          return;
+        }
+      cp_token *n_tok = cp_lexer_peek_token (parser->lexer);
+
+      /* Make sure the next token is _Cilk_for; it is invalid otherwise.  */
+      if (n_tok && n_tok->type == CPP_KEYWORD
+	  && n_tok->keyword == RID_CILK_FOR) 
+	cp_parser_cilk_simd (parser, NULL, exp);
+      else
+	warning_at (cp_lexer_peek_token (parser->lexer)->location, 0,
+		    "%<#pragma cilk grainsize%> is not followed by "
+		    "%<_Cilk_for%>");
+      return;
+    }
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+}
+
 /* Normal parsing of a pragma token.  Here we can (and must) use the
    regular lexer.  */
 
@@ -31470,9 +31543,30 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context)
 		    "%<#pragma simd%> must be inside a function");
 	  break;
 	}
-      cp_parser_cilk_simd (parser, pragma_tok);
+      cp_parser_cilk_simd (parser, pragma_tok, NULL_TREE);
       return true;
 
+    case PRAGMA_CILK_GRAINSIZE:
+      if (context == pragma_external)
+        {
+          error_at (pragma_tok->location,
+                    "%<#pragma cilk grainsize%> must be inside a function");
+          break;
+        }
+
+      /* Reject the pragma with an error if Cilk Plus is not enabled.  */
+      if (flag_cilkplus)
+        {
+          cp_parser_cilk_grainsize (parser, pragma_tok);
+          return true;
+        }
+      else
+        {
+          error_at (pragma_tok->location, "-fcilkplus must be enabled to use "
+                    "%<#pragma cilk grainsize%>");
+          break;
+	}
+
     default:
       gcc_assert (id >= PRAGMA_FIRST_EXTERNAL);
       c_invoke_pragma_handler (id);
@@ -31790,31 +31884,104 @@ cp_parser_cilk_simd_all_clauses (cp_parser *parser, cp_token *pragma_token)
     return c_finish_cilk_clauses (clauses);
 }
 
-/* Main entry-point for parsing Cilk Plus <#pragma simd> for loops.  */
+/* Main entry-point for parsing Cilk Plus <#pragma simd> for and _Cilk_for
+   loops.  This function returns NULL_TREE whenever it is parsing the
+   <#pragma simd> for because the caller does not check the return value.
+   _Cilk_for's caller checks this value, so for _Cilk_for this function
+   returns error_mark_node on error and a valid tree otherwise.  */
 
-static void
-cp_parser_cilk_simd (cp_parser *parser, cp_token *pragma_token)
+static tree
+cp_parser_cilk_simd (cp_parser *parser, cp_token *pragma_token, tree grain)
 {
-  tree clauses = cp_parser_cilk_simd_all_clauses (parser, pragma_token);
-
+  bool is_cilk_for = pragma_token == NULL;
+  
+  tree clauses = NULL_TREE;
+  if (!is_cilk_for)
+    clauses = cp_parser_cilk_simd_all_clauses (parser, pragma_token);
+  
   if (clauses == error_mark_node)
-    return;
+    return NULL_TREE;
   
-  if (cp_lexer_next_token_is_not_keyword (parser->lexer, RID_FOR))
+  if (!is_cilk_for
+      && cp_lexer_next_token_is_not_keyword (parser->lexer, RID_FOR))
     {
       error_at (cp_lexer_peek_token (parser->lexer)->location,
 		"for statement expected");
-      return;
+      return NULL_TREE;
+    }
+  if (is_cilk_for
+      && cp_lexer_next_token_is_not_keyword (parser->lexer, RID_CILK_FOR))
+    {
+      error_at (cp_lexer_peek_token (parser->lexer)->location,
+		"_Cilk_for statement expected");
+      return error_mark_node;
     }
 
+  tree top_block = NULL_TREE, topmost_blk = NULL_TREE;
+  if (is_cilk_for)
+    {
+      topmost_blk = push_stmt_list ();
+      top_block = begin_omp_parallel ();
+    }
+  
   tree sb = begin_omp_structured_block ();
   int save = cp_parser_begin_omp_structured_block (parser);
-  tree ret = cp_parser_omp_for_loop (parser, CILK_SIMD, clauses, NULL);
+   
+  enum tree_code code = is_cilk_for ? CILK_FOR : CILK_SIMD;
+  tree cfor_blk = NULL_TREE;
+  tree ret = cp_parser_omp_for_loop (parser, code, clauses, NULL, &cfor_blk);
   if (ret)
     cpp_validate_cilk_plus_loop (OMP_FOR_BODY (ret));
+  
+  /* For _Cilk_for statements, the grain value is stored in a SCHEDULE
+     clause.  */
+  if (is_cilk_for && ret)
+    {
+      tree l = build_omp_clause (EXPR_LOCATION (grain), OMP_CLAUSE_SCHEDULE);
+      OMP_CLAUSE_SCHEDULE_KIND (l) = OMP_CLAUSE_SCHEDULE_CILKFOR;
+      OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (l) = grain;
+      OMP_CLAUSE_CHAIN (l) = OMP_FOR_CLAUSES (ret);
+      OMP_FOR_CLAUSES (ret) = l;
+    }
   cp_parser_end_omp_structured_block (parser, save);
-  add_stmt (finish_omp_structured_block (sb));
-  return;
+
+  if (!is_cilk_for)
+    {
+      add_stmt (finish_omp_structured_block (sb));
+      return NULL_TREE;
+    }
+
+  tree sb_block = finish_omp_structured_block (sb);
+  tree vars = NULL_TREE, sb_blk_body = sb_block;
+
+  /* For iterators, cfor_blk holds the mapping from original vector
+     iterators to the integer ones that the c_finish_omp_for remaps.
+     This information must be pushed above the #pragma omp parallel so
+     that the IF_CLAUSE (that holds the loop-count) can use it to compute
+     the loop-count.  */
+  if (TREE_CODE (sb_block) == BIND_EXPR && cfor_blk != NULL_TREE)
+    {
+      vars = BIND_EXPR_VARS (sb_block);
+      sb_blk_body = BIND_EXPR_BODY (sb_block);
+    }
+
+  add_stmt (sb_blk_body);
+  tree parallel_clauses = NULL_TREE;
+
+  if (!processing_template_decl)
+    cilk_for_move_clauses_upward (&parallel_clauses, ret);
+  tree stmt = finish_omp_parallel (parallel_clauses, top_block);
+  OMP_PARALLEL_COMBINED (stmt) = 1;
+  topmost_blk = pop_stmt_list (topmost_blk);
+
+  if (cfor_blk != NULL_TREE)
+    {
+      tree bind_expr = cilk_for_create_bind_expr (vars, cfor_blk, topmost_blk);
+      add_stmt (bind_expr);
+      return bind_expr;
+    }
+  add_stmt (topmost_blk);
+  return topmost_blk;
 }
 
 /* Create an identifier for a generic parameter type (a synthesized
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 6477fce..5c92fe5 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13580,13 +13580,51 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
       break;
 
     case OMP_PARALLEL:
-      tmp = tsubst_omp_clauses (OMP_PARALLEL_CLAUSES (t), false,
-				args, complain, in_decl);
-      stmt = begin_omp_parallel ();
-      RECUR (OMP_PARALLEL_BODY (t));
-      OMP_PARALLEL_COMBINED (finish_omp_parallel (tmp, stmt))
-	= OMP_PARALLEL_COMBINED (t);
-      break;
+      {
+	tmp = tsubst_omp_clauses (OMP_PARALLEL_CLAUSES (t), false,
+				  args, complain, in_decl);
+	
+	tree top_block = NULL_TREE, topmost_blk = NULL_TREE;
+	bool is_cilk_for = false;
+	if (flag_cilkplus && found_cilk_for_p (OMP_PARALLEL_BODY (t)))
+	  {
+	    is_cilk_for = true;
+	    topmost_blk = push_stmt_list ();
+	    top_block = begin_omp_parallel ();
+	  }
+	else
+	  stmt = begin_omp_parallel ();
+    
+	RECUR (OMP_PARALLEL_BODY (t));
+	tree cfor_blk = NULL_TREE;
+	if (is_cilk_for)
+	  {
+	    tree sb_blk_body = top_block;
+	    if (TREE_CODE (sb_blk_body) == BIND_EXPR) 
+	      sb_blk_body = BIND_EXPR_BODY (sb_blk_body);
+
+	    copy_tree_till_cilk_for (&sb_blk_body, &cfor_blk);
+	    cilk_for_move_clauses_upward (&tmp, top_block);
+	    top_block = finish_omp_parallel (tmp, sb_blk_body);
+	  }
+	else
+	  {
+	    stmt = finish_omp_parallel (tmp, stmt);
+	    OMP_PARALLEL_COMBINED (stmt) = OMP_PARALLEL_COMBINED (t);
+	  }
+	if (is_cilk_for)
+	  {
+	    OMP_PARALLEL_COMBINED (top_block) = 1;
+	    topmost_blk = pop_stmt_list (topmost_blk);
+	    if (cfor_blk != NULL_TREE) 
+	      stmt = cilk_for_create_bind_expr (NULL_TREE, cfor_blk, 
+						topmost_blk);
+	    else
+	      stmt = topmost_blk;
+	    add_stmt (stmt);
+	  }	
+      } 
+    break;
 
     case OMP_TASK:
       tmp = tsubst_omp_clauses (OMP_TASK_CLAUSES (t), false,
@@ -13599,6 +13637,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
     case OMP_FOR:
     case OMP_SIMD:
     case CILK_SIMD:
+    case CILK_FOR:
     case OMP_DISTRIBUTE:
       {
 	tree clauses, body, pre_body;
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 6f32496..df0f736 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -6059,6 +6059,7 @@ handle_omp_for_class_iterator (int i, location_t locus, tree declv, tree initv,
     case GE_EXPR:
     case LT_EXPR:
     case LE_EXPR:
+    case NE_EXPR:
       if (TREE_OPERAND (cond, 1) == iter)
 	cond = build2 (swap_tree_comparison (TREE_CODE (cond)),
 		       TREE_TYPE (cond), iter, TREE_OPERAND (cond, 0));
@@ -6471,12 +6472,33 @@ finish_omp_for (location_t locus, enum tree_code code, tree declv, tree initv,
   if (IS_EMPTY_STMT (pre_body))
     pre_body = NULL;
 
+  tree cf_step = NULL_TREE, cf_init = NULL_TREE, cf_end = NULL_TREE;
   omp_for = c_finish_omp_for (locus, code, declv, initv, condv, incrv,
-			      body, pre_body);
+			      body, pre_body, &cf_init, &cf_end, &cf_step);
 
   if (omp_for == NULL)
     return NULL;
 
+  if (code == CILK_FOR && !processing_template_decl)
+    {
+      tree count = fold_build2 (MINUS_EXPR, TREE_TYPE (cf_end), cf_end,
+				cf_init);
+      count = fold_build2 (CEIL_DIV_EXPR, TREE_TYPE (count), count, cf_step);
+      tree c = build_omp_clause (EXPR_LOCATION (count), OMP_CLAUSE_IF);
+      OMP_CLAUSE_IF_EXPR (c) = count;
+      clauses = chainon (clauses, c);
+      tree cfor_cond = TREE_VEC_ELT (condv, 0);
+
+      /* As of this point, the end-expression of _Cilk_for cond is not
+	 necessary.  If it is not an integer constant then the compiler will
+	 gimplify the value and add extra computations that generate extra
+	 code.  If it is replaced by a constant zero, this will not happen.  */
+      if (TREE_CODE (TREE_OPERAND (cfor_cond, 1)) != INTEGER_CST)
+	TREE_OPERAND (cfor_cond, 1)
+	  = build_zero_cst (TREE_TYPE (TREE_OPERAND (cfor_cond, 1)));
+      TREE_VEC_ELT (condv, 0) = cfor_cond;
+    }
+
   for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INCR (omp_for)); i++)
     {
       decl = TREE_OPERAND (TREE_VEC_ELT (OMP_FOR_INIT (omp_for), i), 0);
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 2d1e1c7..f87c0cf 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "value-prof.h"
 #include "trans-mem.h"
 
+static void dump_gimple_omp_parallel (pretty_printer *, gimple, int, int,
+				      bool);
 #define INDENT(SPACE)							\
   do { int i; for (i = 0; i < SPACE; i++) pp_space (buffer); } while (0)
 
@@ -1124,6 +1126,10 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	case GF_OMP_FOR_KIND_DISTRIBUTE:
 	  kind = " distribute";
 	  break;
+	case GF_OMP_FOR_KIND_CILKFOR:
+	  gcc_assert (flag_cilkplus);
+	  kind = "";
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -1158,16 +1164,25 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	case GF_OMP_FOR_KIND_DISTRIBUTE:
 	  pp_string (buffer, "#pragma omp distribute");
 	  break;
+	case GF_OMP_FOR_KIND_CILKFOR:
+	  gcc_assert (flag_cilkplus);
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
-      dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags);
+      if (!flag_cilkplus
+	  || gimple_omp_for_kind (gs) != GF_OMP_FOR_KIND_CILKFOR) 
+	dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags);
       for (i = 0; i < gimple_omp_for_collapse (gs); i++)
 	{
 	  if (i)
 	    spc += 2;
 	  newline_and_indent (buffer, spc);
-	  pp_string (buffer, "for (");
+	  if (flag_cilkplus 
+	      && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
+	    pp_string (buffer, "_Cilk_for (");
+	  else
+	    pp_string (buffer, "for (");
 	  dump_generic_node (buffer, gimple_omp_for_index (gs, i), spc,
 			     flags, false);
 	  pp_string (buffer, " = ");
@@ -1192,6 +1207,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 	    case GE_EXPR:
 	      pp_greater_equal (buffer);
 	      break;
+	    case NE_EXPR:
+	      pp_string (buffer, "!=");
+	      break;
 	    default:
 	      gcc_unreachable ();
 	    }
@@ -1210,6 +1228,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
 
       if (!gimple_seq_empty_p (gimple_omp_body (gs)))
 	{
+	  if (flag_cilkplus
+	      && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR) 
+	    dump_omp_clauses (buffer, gimple_omp_for_clauses (gs), spc, flags); 
 	  newline_and_indent (buffer, spc + 2);
 	  pp_left_brace (buffer);
 	  pp_newline (buffer);
@@ -1846,7 +1867,7 @@ dump_gimple_phi (pretty_printer *buffer, gimple phi, int spc, bool comment,
 
 static void
 dump_gimple_omp_parallel (pretty_printer *buffer, gimple gs, int spc,
-                          int flags)
+                          int flags, bool is_cilk_for)
 {
   if (flags & TDF_RAW)
     {
@@ -1860,7 +1881,10 @@ dump_gimple_omp_parallel (pretty_printer *buffer, gimple gs, int spc,
   else
     {
       gimple_seq body;
-      pp_string (buffer, "#pragma omp parallel");
+      if (is_cilk_for) 
+	pp_string (buffer, "compiler-inserted clauses for cilk-for body: ");
+      else
+	pp_string (buffer, "#pragma omp parallel");
       dump_omp_clauses (buffer, gimple_omp_parallel_clauses (gs), spc, flags);
       if (gimple_omp_parallel_child_fn (gs))
 	{
@@ -2137,7 +2161,7 @@ pp_gimple_stmt_1 (pretty_printer *buffer, gimple gs, int spc, int flags)
       break;
 
     case GIMPLE_OMP_PARALLEL:
-      dump_gimple_omp_parallel (buffer, gs, spc, flags);
+      dump_gimple_omp_parallel (buffer, gs, spc, flags, false);
       break;
 
     case GIMPLE_OMP_TASK:
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 0e80d2e..194045c 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -91,13 +91,14 @@ enum gf_mask {
     GF_CALL_ALLOCA_FOR_VAR	= 1 << 5,
     GF_CALL_INTERNAL		= 1 << 6,
     GF_OMP_PARALLEL_COMBINED	= 1 << 0,
-    GF_OMP_FOR_KIND_MASK	= 3 << 0,
+    GF_OMP_FOR_KIND_MASK	= 7 << 0,
     GF_OMP_FOR_KIND_FOR		= 0 << 0,
     GF_OMP_FOR_KIND_DISTRIBUTE	= 1 << 0,
     GF_OMP_FOR_KIND_SIMD	= 2 << 0,
     GF_OMP_FOR_KIND_CILKSIMD	= 3 << 0,
-    GF_OMP_FOR_COMBINED		= 1 << 2,
-    GF_OMP_FOR_COMBINED_INTO	= 1 << 3,
+    GF_OMP_FOR_KIND_CILKFOR     = 4 << 0,
+    GF_OMP_FOR_COMBINED		= 1 << 3,
+    GF_OMP_FOR_COMBINED_INTO	= 1 << 4,
     GF_OMP_TARGET_KIND_MASK	= 3 << 0,
     GF_OMP_TARGET_KIND_REGION	= 0 << 0,
     GF_OMP_TARGET_KIND_DATA	= 1 << 0,
@@ -4563,6 +4564,16 @@ gimple_omp_for_set_pre_body (gimple gs, gimple_seq pre_body)
   omp_for_stmt->pre_body = pre_body;
 }
 
+/* Returns the induction variable of type TREE from GS that is of type 
+   GIMPLE_STATEMENT_OMP_FOR.  */
+
+static inline tree
+gimple_cilk_for_induction_var (const_gimple gs)
+{
+  const gimple_statement_omp_for *cilk_for_stmt
+    = as_a <const gimple_statement_omp_for> (gs);
+  return cilk_for_stmt->iter->index;
+}
 
 /* Return the clauses associated with OMP_PARALLEL GS.  */
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index ff341d4..a55e316 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -5856,7 +5856,8 @@ omp_check_private (struct gimplify_omp_ctx *ctx, tree decl, bool copyprivate)
 
 static void
 gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
-			   enum omp_region_type region_type)
+			   enum omp_region_type region_type,
+			   bool is_cilk_for)
 {
   struct gimplify_omp_ctx *ctx, *outer_ctx;
   tree c;
@@ -6086,8 +6087,20 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 
 	case OMP_CLAUSE_FINAL:
 	case OMP_CLAUSE_IF:
-	  OMP_CLAUSE_OPERAND (c, 0)
-	    = gimple_boolify (OMP_CLAUSE_OPERAND (c, 0));
+	  /* It is not necessary to gimplify the IF clause for _Cilk_for
+	     since it is something that was artificially inserted.  */
+	  if (is_cilk_for && region_type == ORT_WORKSHARE)
+	    {
+	      remove = true;
+	      break;
+	    }
+
+	  /* In _Cilk_for we insert an IF clause as a mechanism to
+	     pass in the count information.  So, there is no reason to
+	     boolify it.  */
+	  if (!is_cilk_for) 
+	    OMP_CLAUSE_OPERAND (c, 0) 
+	      = gimple_boolify (OMP_CLAUSE_OPERAND (c, 0));
 	  /* Fall through.  */
 
 	case OMP_CLAUSE_SCHEDULE:
@@ -6096,8 +6109,13 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	case OMP_CLAUSE_THREAD_LIMIT:
 	case OMP_CLAUSE_DIST_SCHEDULE:
 	case OMP_CLAUSE_DEVICE:
-	  if (gimplify_expr (&OMP_CLAUSE_OPERAND (c, 0), pre_p, NULL,
-			     is_gimple_val, fb_rvalue) == GS_ERROR)
+	  /* Remove the SCHEDULE clause from #pragma omp parallel of a
+	     _Cilk_for statement.  */
+	  if (is_cilk_for && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SCHEDULE 
+	      && region_type == ORT_COMBINED_PARALLEL)
+	    remove = true;
+	  else if (gimplify_expr (&OMP_CLAUSE_OPERAND (c, 0), pre_p, NULL, 
+				  is_gimple_val, fb_rvalue) == GS_ERROR)
 	    remove = true;
 	  break;
 
@@ -6466,10 +6484,24 @@ gimplify_omp_parallel (tree *expr_p, gimple_seq *pre_p)
   gimple g;
   gimple_seq body = NULL;
 
+  bool is_cilk_for = false;
+  for (tree c = OMP_PARALLEL_CLAUSES (expr); c; c = OMP_CLAUSE_CHAIN (c))
+    if (flag_cilkplus && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SCHEDULE
+	&& OMP_CLAUSE_SCHEDULE_KIND (c) == OMP_CLAUSE_SCHEDULE_CILKFOR)
+      {
+	/* The schedule clause is kept up to this point so that it can
+	   indicate whether this #pragma omp parallel was inserted for a
+	   _Cilk_for statement.  If so, then set is_cilk_for to true so
+	   that gimplify_scan_omp_clauses does not boolify the IF clause,
+	   which stores the count value.  */
+	gcc_assert (flag_cilkplus);
+	is_cilk_for = true;
+	break;
+      } 
   gimplify_scan_omp_clauses (&OMP_PARALLEL_CLAUSES (expr), pre_p,
 			     OMP_PARALLEL_COMBINED (expr)
 			     ? ORT_COMBINED_PARALLEL
-			     : ORT_PARALLEL);
+			     : ORT_PARALLEL, is_cilk_for);
 
   push_gimplify_context ();
 
@@ -6505,7 +6537,7 @@ gimplify_omp_task (tree *expr_p, gimple_seq *pre_p)
   gimplify_scan_omp_clauses (&OMP_TASK_CLAUSES (expr), pre_p,
 			     find_omp_clause (OMP_TASK_CLAUSES (expr),
 					      OMP_CLAUSE_UNTIED)
-			     ? ORT_UNTIED_TASK : ORT_TASK);
+			     ? ORT_UNTIED_TASK : ORT_TASK, false);
 
   push_gimplify_context ();
 
@@ -6567,11 +6599,12 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
   bitmap has_decl_expr = NULL;
 
   orig_for_stmt = for_stmt = *expr_p;
+  bool is_cilk_for = flag_cilkplus && TREE_CODE (for_stmt) == CILK_FOR;
 
   simd = TREE_CODE (for_stmt) == OMP_SIMD
     || TREE_CODE (for_stmt) == CILK_SIMD;
   gimplify_scan_omp_clauses (&OMP_FOR_CLAUSES (for_stmt), pre_p,
-			     simd ? ORT_SIMD : ORT_WORKSHARE);
+			     simd ? ORT_SIMD : ORT_WORKSHARE, is_cilk_for);
 
   /* Handle OMP_FOR_INIT.  */
   for_pre_body = NULL;
@@ -6627,7 +6660,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
       tree c = NULL_TREE;
       if (orig_for_stmt != for_stmt)
 	/* Do this only on innermost construct for combined ones.  */;
-      else if (simd)
+      else if (simd || is_cilk_for)
 	{
 	  splay_tree_node n = splay_tree_lookup (gimplify_omp_ctxp->variables,
 						 (splay_tree_key)decl);
@@ -6832,6 +6865,7 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
     case OMP_FOR: kind = GF_OMP_FOR_KIND_FOR; break;
     case OMP_SIMD: kind = GF_OMP_FOR_KIND_SIMD; break;
     case CILK_SIMD: kind = GF_OMP_FOR_KIND_CILKSIMD; break;
+    case CILK_FOR: kind = GF_OMP_FOR_KIND_CILKFOR; break;
     case OMP_DISTRIBUTE: kind = GF_OMP_FOR_KIND_DISTRIBUTE; break;
     default:
       gcc_unreachable ();
@@ -6902,7 +6936,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
     default:
       gcc_unreachable ();
     }
-  gimplify_scan_omp_clauses (&OMP_CLAUSES (expr), pre_p, ort);
+  gimplify_scan_omp_clauses (&OMP_CLAUSES (expr), pre_p, ort, false);
   if (ort == ORT_TARGET || ort == ORT_TARGET_DATA)
     {
       push_gimplify_context ();
@@ -6962,7 +6996,7 @@ gimplify_omp_target_update (tree *expr_p, gimple_seq *pre_p)
   gimple stmt;
 
   gimplify_scan_omp_clauses (&OMP_TARGET_UPDATE_CLAUSES (expr), pre_p,
-			     ORT_WORKSHARE);
+			     ORT_WORKSHARE, false);
   gimplify_adjust_omp_clauses (&OMP_TARGET_UPDATE_CLAUSES (expr));
   stmt = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_UPDATE,
 				  OMP_TARGET_UPDATE_CLAUSES (expr));
@@ -7904,6 +7938,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	case OMP_FOR:
 	case OMP_SIMD:
 	case CILK_SIMD:
+	case CILK_FOR:
 	case OMP_DISTRIBUTE:
 	  ret = gimplify_omp_for (expr_p, pre_p);
 	  break;
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 91c8656..3454dc9 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -71,6 +71,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-prop.h"
 #include "tree-nested.h"
 #include "tree-eh.h"
+#include "cilk.h"
 
 
 /* Lowering of OpenMP parallel and workshare constructs proceeds in two
@@ -198,6 +199,13 @@ struct omp_for_data
   struct omp_for_data_loop *loops;
 };
 
+/* A structure with necessary elements from _Cilk_for statement.  A pointer
+   to this structure is passed in WALK_STMT_INFO->INFO.  */
+struct cilk_for_info 
+{
+  bool found;
+  tree induction_var;
+};
 
 static splay_tree all_contexts;
 static int taskreg_nesting_level;
@@ -314,6 +322,8 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
   fd->have_ordered = false;
   fd->sched_kind = OMP_CLAUSE_SCHEDULE_STATIC;
   fd->chunk_size = NULL_TREE;
+  if (gimple_omp_for_kind (fd->for_stmt) == GF_OMP_FOR_KIND_CILKFOR)
+    fd->sched_kind = OMP_CLAUSE_SCHEDULE_CILKFOR;
   collapse_iter = NULL;
   collapse_count = NULL;
 
@@ -392,7 +402,9 @@ extract_omp_for_data (gimple for_stmt, struct omp_for_data *fd,
 	  break;
 	case NE_EXPR:
 	  gcc_assert (gimple_omp_for_kind (for_stmt)
-		      == GF_OMP_FOR_KIND_CILKSIMD);
+		      == GF_OMP_FOR_KIND_CILKSIMD
+		      || gimple_omp_for_kind (for_stmt)
+		      == GF_OMP_FOR_KIND_CILKFOR);
 	  break;
 	case LE_EXPR:
 	  if (POINTER_TYPE_P (TREE_TYPE (loop->n2)))
@@ -1818,27 +1830,120 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	scan_omp (&OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (c), ctx);
 }
 
-/* Create a new name for omp child function.  Returns an identifier.  */
+/* Create a new name for omp child function.  Returns an identifier.  If 
+   IS_CILK_FOR is true then the suffix for the child function is 
+   "_cilk_for_fn."  */
 
 static tree
-create_omp_child_function_name (bool task_copy)
+create_omp_child_function_name (bool task_copy, bool is_cilk_for)
 {
+  if (is_cilk_for)
+    return clone_function_name (current_function_decl, "_cilk_for_fn");
   return (clone_function_name (current_function_decl,
 			       task_copy ? "_omp_cpyfn" : "_omp_fn"));
 }
 
+/* Helper function for walk_gimple_seq.  *GSI_P is the gimple statement
+   iterator passed by walk_gimple_seq and WI->INFO points to the CILK_FOR_INFO
+   structure.  This function sets the values inside this structure if it
+   finds a _Cilk_for statement at *GSI_P.  HANDLED_OPS_P is unused.  */
+
+static tree
+find_cilk_for_stmt (gimple_stmt_iterator *gsi_p,
+		    bool *handled_ops_p ATTRIBUTE_UNUSED,
+		    struct walk_stmt_info *wi)
+{
+  struct cilk_for_info *cf_info = (struct cilk_for_info *) wi->info;
+  gimple stmt = gsi_stmt (*gsi_p);
+
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR
+      && (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_CILKFOR)
+      /* For nested _Cilk_for statements, just look into the
+	 outer-most one.  */
+      && !cf_info->found)
+    {
+      cf_info->found = true;
+      cf_info->induction_var = gimple_cilk_for_induction_var (stmt);
+    }
+  return NULL_TREE;
+}
+
+/* Returns true if STMT contains a CILK_FOR statement.  If found then
+   populate *IND_VAR with the induction variable of that statement
+   (the outermost one when _Cilk_for statements are nested).
+   Otherwise *IND_VAR remains untouched.  IND_VAR can be NULL, in
+   which case nothing is stored through it.  */
+
+static bool
+is_cilk_for_stmt (gimple stmt, tree *ind_var)
+{
+  if (!flag_cilkplus)
+    return false;
+  if (gimple_code (stmt) == GIMPLE_OMP_PARALLEL)
+    stmt = gimple_omp_body (stmt);
+  if (gimple_code (stmt) == GIMPLE_BIND)
+    {
+      gimple_seq body = gimple_bind_body (stmt);
+      struct walk_stmt_info wi;
+      struct cilk_for_info cf_info;
+      memset (&cf_info, 0, sizeof (struct cilk_for_info));
+      memset (&wi, 0, sizeof (wi));
+      wi.info = &cf_info;
+      walk_gimple_seq (body, find_cilk_for_stmt, NULL, &wi);
+      if (cf_info.found)
+	{
+	  if (ind_var)
+	    *ind_var = cf_info.induction_var;
+	  return true;
+	}
+    }
+  return false;
+}
+
+/* Returns the type of the induction variable for the child function for
+   _Cilk_for and the types for _high and _low variables based on TYPE.  */
+
+static tree
+cilk_for_check_loop_diff_type (tree type)
+{
+  if (type == integer_type_node)
+    return type;
+  else if (TYPE_PRECISION (type) <= TYPE_PRECISION (uint32_type_node))
+    { 
+      if (TYPE_UNSIGNED (type)) 
+	return uint32_type_node;
+      else
+	return integer_type_node;
+    }
+  else
+    {
+      if (TYPE_UNSIGNED (type)) 
+	return uint64_type_node;
+      else
+	return long_long_integer_type_node;
+    }
+  gcc_unreachable ();
+}
+
 /* Build a decl for the omp child function.  It'll not contain a body
    yet, just the bare decl.  */
 
 static void
 create_omp_child_function (omp_context *ctx, bool task_copy)
 {
-  tree decl, type, name, t;
+  tree decl, type, name, t, ind_var = NULL_TREE;
 
-  name = create_omp_child_function_name (task_copy);
+  bool is_cilk_for = is_cilk_for_stmt (ctx->stmt, &ind_var);
+  tree cilk_var_type = (is_cilk_for ?
+    cilk_for_check_loop_diff_type (TREE_TYPE (ind_var)) : NULL_TREE);
+  
+  name = create_omp_child_function_name (task_copy, is_cilk_for);
   if (task_copy)
     type = build_function_type_list (void_type_node, ptr_type_node,
 				     ptr_type_node, NULL_TREE);
+  else if (is_cilk_for)
+    type = build_function_type_list (void_type_node, ptr_type_node,
+				     cilk_var_type, cilk_var_type, NULL_TREE);
   else
     type = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
 
@@ -1888,13 +1993,44 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
   DECL_CONTEXT (t) = decl;
   DECL_RESULT (decl) = t;
 
-  t = build_decl (DECL_SOURCE_LOCATION (decl),
-		  PARM_DECL, get_identifier (".omp_data_i"), ptr_type_node);
+  /* _Cilk_for's child function requires two extra parameters called
+     __low and __high that are set by the Cilk runtime when it calls this
+     function.  */
+  if (is_cilk_for)
+    {
+      t = build_decl (DECL_SOURCE_LOCATION (decl),
+		      PARM_DECL, get_identifier ("__high"), cilk_var_type);
+      DECL_ARTIFICIAL (t) = 1;
+      DECL_NAMELESS (t) = 1;
+      DECL_ARG_TYPE (t) = ptr_type_node;
+      DECL_CONTEXT (t) = current_function_decl;
+      TREE_USED (t) = 1;
+      TREE_ADDRESSABLE (t) = 1;
+      DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
+      DECL_ARGUMENTS (decl) = t;
+
+      t = build_decl (DECL_SOURCE_LOCATION (decl),
+		      PARM_DECL, get_identifier ("__low"), cilk_var_type);
+      DECL_ARTIFICIAL (t) = 1;
+      DECL_NAMELESS (t) = 1;
+      DECL_ARG_TYPE (t) = ptr_type_node;
+      DECL_CONTEXT (t) = current_function_decl;
+      TREE_USED (t) = 1;
+      TREE_ADDRESSABLE (t) = 1;
+      DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
+      DECL_ARGUMENTS (decl) = t;
+    }
+
+  tree data_name = get_identifier (".omp_data_i");
+  t = build_decl (DECL_SOURCE_LOCATION (decl), PARM_DECL, data_name,
+		  ptr_type_node);
   DECL_ARTIFICIAL (t) = 1;
   DECL_NAMELESS (t) = 1;
   DECL_ARG_TYPE (t) = ptr_type_node;
   DECL_CONTEXT (t) = current_function_decl;
   TREE_USED (t) = 1;
+  if (is_cilk_for)
+    DECL_CHAIN (t) = DECL_ARGUMENTS (decl);
   DECL_ARGUMENTS (decl) = t;
   if (!task_copy)
     ctx->receiver_decl = t;
@@ -4313,6 +4449,44 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
 			    false, GSI_CONTINUE_LINKING);
 }
 
+/* Insert a call to the function named by WS_ARGS[0] (grain is WS_ARGS[1]),
+   with the information from ENTRY_STMT, into the basic_block BB.  */
+
+static void
+expand_cilk_for_call (basic_block bb, gimple entry_stmt,
+		      vec <tree, va_gc> *ws_args)
+{
+  tree t, t1, t2;
+  gimple_stmt_iterator gsi;
+  vec <tree, va_gc> *args;
+
+  gcc_assert (vec_safe_length (ws_args) == 2);
+  tree func_name = (*ws_args)[0];
+  tree grain = (*ws_args)[1];
+
+  tree clauses = gimple_omp_parallel_clauses (entry_stmt); 
+  tree count = find_omp_clause (clauses, OMP_CLAUSE_IF);
+  gcc_assert (count != NULL_TREE);
+  count = OMP_CLAUSE_IF_EXPR (count);
+  
+  gsi = gsi_last_bb (bb);
+  t = gimple_omp_parallel_data_arg (entry_stmt);
+  if (t == NULL)
+    t1 = null_pointer_node;
+  else
+    t1 = build_fold_addr_expr (t);
+  t2 = build_fold_addr_expr (gimple_omp_parallel_child_fn (entry_stmt));
+  
+  vec_alloc (args, 4);
+  args->quick_push (t2);
+  args->quick_push (t1);
+  args->quick_push (count);
+  args->quick_push (grain);
+  t = build_call_expr_loc_vec (UNKNOWN_LOCATION, func_name, args);
+
+  force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, false, 
+			    GSI_CONTINUE_LINKING);
+}
 
 /* Build the function call to GOMP_task to actually
    generate the task operation.  BB is the block where to insert the code.  */
@@ -4648,7 +4822,38 @@ expand_omp_taskreg (struct omp_region *region)
   entry_bb = region->entry;
   exit_bb = region->exit;
 
-  if (is_combined_parallel (region))
+  /* The way _Cilk_for is constructed in this compiler can be thought of
+     as a parallel omp_for.  But the inner workings between them are very
+     different so we need a way to differentiate between them.  Thus, we
+     added a new schedule type called OMP_CLAUSE_SCHEDULE_CILKFOR, which 
+     pretty much says that this is not a parallel omp for but a _Cilk_for
+     statement.  */
+  bool is_cilk_for =
+    (flag_cilkplus && region->inner &&
+     (region->inner->sched_kind == OMP_CLAUSE_SCHEDULE_CILKFOR));
+
+  /* Extract the __high and __low parameters from the function.  */
+  tree high_arg = NULL_TREE, low_arg = NULL_TREE;
+  if (is_cilk_for)
+    {
+      for (tree ii_arg = DECL_ARGUMENTS (child_fn); ii_arg != NULL_TREE;
+	   ii_arg = TREE_CHAIN (ii_arg))
+	{
+	  if (!strcmp (IDENTIFIER_POINTER (DECL_NAME (ii_arg)), "__high"))
+	    high_arg = ii_arg;
+	  if (!strcmp (IDENTIFIER_POINTER (DECL_NAME (ii_arg)), "__low"))
+	    low_arg = ii_arg;
+	}
+      gcc_assert (high_arg);
+      gcc_assert (low_arg);
+    }
+  
+  if (is_cilk_for) 
+    /* If it is a _Cilk_for statement, it is modelled *like* a parallel for,
+       and the inner statement contains the name of the built-in function
+       and grain.  */
+    ws_args = region->inner->ws_args;
+  else if (is_combined_parallel (region))
     ws_args = region->ws_args;
   else
     ws_args = NULL;
@@ -4755,6 +4960,49 @@ expand_omp_taskreg (struct omp_region *region)
 	    }
 	}
 
+      /* Remove the placeholder calls to GET_NUM_THREADS and GET_THREAD_NUM
+	 inserted by expand_cilk_for and replace them with the __low and
+	 __high parameter values, respectively.  */
+      gimple high_assign = NULL, low_assign = NULL;
+      if (is_cilk_for)
+	{
+	  gimple_stmt_iterator gsi2 = gsi_start_bb (single_succ (entry_bb));
+	  while (!gsi_end_p (gsi2))
+	    {
+	      gimple stmt = gsi_stmt (gsi2);
+	
+	      if (gimple_call_builtin_p (stmt, BUILT_IN_OMP_GET_NUM_THREADS))
+		{
+		  /* There can only be one call to each of these two functions.
+		     If there are multiple, then something went wrong
+		     somewhere.  */
+		  gcc_assert (low_assign == NULL);
+		  tree ltype = TREE_TYPE (gimple_get_lhs (stmt));
+		  tree tmp2 = create_tmp_reg (TREE_TYPE (low_arg), NULL);
+		  low_assign = gimple_build_assign 
+		    (gimple_get_lhs (stmt), fold_convert (ltype, tmp2));
+		  gsi_remove (&gsi2, true);
+		  gimple tmp_stmt = gimple_build_assign (tmp2, low_arg);
+		  gsi_insert_before (&gsi2, low_assign, GSI_NEW_STMT);
+		  gsi_insert_before (&gsi2, tmp_stmt, GSI_NEW_STMT);
+		}
+	      else if (gimple_call_builtin_p (stmt,
+					      BUILT_IN_OMP_GET_THREAD_NUM))
+		{
+		  gcc_assert (high_assign == NULL);
+		  tree htype = TREE_TYPE (gimple_get_lhs (stmt));
+		  tree tmp2 = create_tmp_reg (TREE_TYPE (high_arg), NULL);
+		  
+		  high_assign = gimple_build_assign 
+		    (gimple_get_lhs (stmt), fold_convert (htype, tmp2));
+		  gsi_remove (&gsi2, true);
+		  gimple tmp_stmt = gimple_build_assign (tmp2, high_arg);
+		  gsi_insert_before (&gsi2, high_assign, GSI_NEW_STMT);
+		  gsi_insert_before (&gsi2, tmp_stmt, GSI_NEW_STMT);
+		}
+	      gsi_next (&gsi2);
+	    }
+	}      
       /* Declare local variables needed in CHILD_CFUN.  */
       block = DECL_INITIAL (child_fn);
       BLOCK_VARS (block) = vec2chain (child_cfun->local_decls);
@@ -4862,7 +5110,9 @@ expand_omp_taskreg (struct omp_region *region)
     }
 
   /* Emit a library call to launch the children threads.  */
-  if (gimple_code (entry_stmt) == GIMPLE_OMP_PARALLEL)
+  if (is_cilk_for)
+    expand_cilk_for_call (new_bb, entry_stmt, ws_args);
+  else if (gimple_code (entry_stmt) == GIMPLE_OMP_PARALLEL)
     expand_parallel_call (region, new_bb, entry_stmt, ws_args);
   else
     expand_task_call (new_bb, entry_stmt);
@@ -6540,6 +6790,227 @@ expand_omp_for_static_chunk (struct omp_region *region,
     }
 }
 
+/* A subroutine of expand_omp_for.  Generate code for _Cilk_for loop.  
+   Given parameters: 
+   for (V = N1; V cond N2; V += STEP) BODY; 
+   
+   where COND is "<" or ">", we generate pseudocode
+    
+   for (ind_var = low; ind_var < high; ind_var++)
+   {  
+      if (COND is "<")
+	V = n1 + (ind_var * STEP);
+      else
+	V = n1 - (ind_var * STEP);
+
+      <BODY>
+    }  
+  
+    In the above pseudocode, low and high are function parameters of the
+    child function.  In the function below, we insert temporary variables
+    initialized by calls to two OMP functions that will not otherwise be
+    found in the body of _Cilk_for (since OMP_FOR cannot be mixed
+    with _Cilk_for).  These calls are replaced with low and high
+    by the function that handles taskreg.  */
+
+
+static void
+expand_cilk_for (struct omp_region *region, struct omp_for_data *fd)
+{
+  bool broken_loop = region->cont == NULL;
+  tree type = cilk_for_check_loop_diff_type (TREE_TYPE (fd->loop.v));
+  basic_block entry_bb = region->entry;
+  basic_block cont_bb = region->cont;
+  
+  gcc_assert (EDGE_COUNT (entry_bb->succs) == 2);
+  gcc_assert (broken_loop
+	      || BRANCH_EDGE (entry_bb)->dest == FALLTHRU_EDGE (cont_bb)->dest);
+  basic_block l0_bb = FALLTHRU_EDGE (entry_bb)->dest;
+  basic_block l1_bb, l2_bb;
+
+  if (!broken_loop)
+    {
+      gcc_assert (BRANCH_EDGE (cont_bb)->dest == l0_bb);
+      gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
+      l1_bb = split_block (cont_bb, last_stmt (cont_bb))->dest;
+      l2_bb = BRANCH_EDGE (entry_bb)->dest;
+    }
+  else
+    {
+      BRANCH_EDGE (entry_bb)->flags &= ~EDGE_ABNORMAL;
+      l1_bb = split_edge (BRANCH_EDGE (entry_bb));
+      l2_bb = single_succ (l1_bb);
+    }
+  basic_block exit_bb = region->exit;
+  basic_block l2_dom_bb = NULL;
+
+  gimple_stmt_iterator gsi = gsi_last_bb (entry_bb);
+
+  /* The statements below, up to "tree high_val = ...", are pseudo
+     statements used to pass information to expand_omp_taskreg.
+     low_val and high_val will be replaced by the __low and __high
+     parameters of the child function.
+
+     The call_exprs are placeholders; they are mainly used
+     to identify for the top-level part exactly where
+     we should put low and high (reasoning given in header
+     comment).  */
+
+  tree t = build_call_expr
+    (builtin_decl_explicit (BUILT_IN_OMP_GET_NUM_THREADS), 0);
+  t = fold_convert (type, t);
+  tree low_val = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+					   GSI_SAME_STMT);
+  t = build_call_expr (builtin_decl_explicit (BUILT_IN_OMP_GET_THREAD_NUM),
+		       0);
+  t = fold_convert (type, t);
+  tree high_val = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true,
+					   GSI_SAME_STMT);
+
+  tree ind_var = create_tmp_reg (type, "__cilk_ind_var");
+  gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_FOR);
+  
+  /* Not needed in SSA form right now.  */
+  gcc_assert (!gimple_in_ssa_p (cfun));
+  if (l2_dom_bb == NULL)
+    l2_dom_bb = l1_bb;
+
+  tree n1 = low_val;
+  tree n2 = high_val;
+  
+  expand_omp_build_assign (&gsi, ind_var, n1);
+
+  /* Remove the GIMPLE_OMP_FOR statement.  */
+  gsi_remove (&gsi, true);
+
+  gimple stmt;
+  if (!broken_loop)
+    {
+      /* Code to control the increment goes in the CONT_BB.  */
+      gsi = gsi_last_bb (cont_bb);
+      stmt = gsi_stmt (gsi);
+      gcc_assert (gimple_code (stmt) == GIMPLE_OMP_CONTINUE);
+      enum tree_code code = PLUS_EXPR;
+      if (POINTER_TYPE_P (type))
+	t = fold_build_pointer_plus (ind_var, build_one_cst (type)); 
+      else
+	t = fold_build2 (code, type, ind_var, build_one_cst (type));
+      expand_omp_build_assign (&gsi, ind_var, t);
+
+      /* Remove GIMPLE_OMP_CONTINUE.  */
+      gsi_remove (&gsi, true);
+    }
+
+  /* Emit the condition in L1_BB.  */
+  gsi = gsi_start_bb (l1_bb);
+
+  tree step = fold_convert (type, fd->loop.step);
+  if ((TREE_CODE (step) == INTEGER_CST && tree_int_cst_sgn (step) < 1)) 
+    step = fold_build1_loc (UNKNOWN_LOCATION, NEGATE_EXPR, type, step);
+
+  tree step_var = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (step_var, 
+					       fold_convert (type, step)), 
+		    GSI_NEW_STMT);
+  t = build2 (MULT_EXPR, type, ind_var, step_var);
+  tree tmp = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp, t), GSI_NEW_STMT);
+
+  tree tmp2 = create_tmp_reg (type, NULL);
+  tree cvtd = fold_convert (type, fd->loop.n1);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp2, cvtd), GSI_NEW_STMT);
+  
+  if (fd->loop.cond_code == GE_EXPR || fd->loop.cond_code == GT_EXPR)
+    t = fold_build2 (MINUS_EXPR, type, tmp2, tmp);
+  else
+    t = fold_build2 (PLUS_EXPR, type, tmp2, tmp);
+
+  tmp = create_tmp_reg (type, NULL);
+  gsi_insert_after (&gsi, gimple_build_assign (tmp, t), GSI_NEW_STMT);
+
+  cvtd = fold_convert (TREE_TYPE (fd->loop.v), tmp);
+  gsi_insert_after (&gsi, gimple_build_assign (fd->loop.v, cvtd), 
+		    GSI_NEW_STMT);
+  
+  t = fold_convert (type, n2);
+  t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
+				false, GSI_CONTINUE_LINKING);
+  /* The condition is always '<' since the runtime will fill in the low
+     and high values.  */
+  t = build2 (LT_EXPR, boolean_type_node, ind_var, t);
+  stmt = gimple_build_cond_empty (t);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+  if (walk_tree (gimple_cond_lhs_ptr (stmt), expand_omp_regimplify_p,
+		 NULL, NULL)
+      || walk_tree (gimple_cond_rhs_ptr (stmt), expand_omp_regimplify_p,
+		    NULL, NULL))
+    {
+      gsi = gsi_for_stmt (stmt);
+      gimple_regimplify_operands (stmt, &gsi);
+    }
+
+  /* Remove GIMPLE_OMP_RETURN.  */
+  gsi = gsi_last_bb (exit_bb);
+  gsi_remove (&gsi, true);
+
+  /* Connect the new blocks.  */
+  remove_edge (FALLTHRU_EDGE (entry_bb));
+
+  edge e, ne;
+  if (!broken_loop)
+    {
+      remove_edge (BRANCH_EDGE (entry_bb));
+      make_edge (entry_bb, l1_bb, EDGE_FALLTHRU);
+
+      e = BRANCH_EDGE (l1_bb);
+      ne = FALLTHRU_EDGE (l1_bb);
+      e->flags = EDGE_TRUE_VALUE;
+    }
+  else
+    {
+      single_succ_edge (entry_bb)->flags = EDGE_FALLTHRU;
+
+      ne = single_succ_edge (l1_bb);
+      e = make_edge (l1_bb, l0_bb, EDGE_TRUE_VALUE);
+
+    }
+  ne->flags = EDGE_FALSE_VALUE;
+  e->probability = REG_BR_PROB_BASE * 7 / 8;
+  ne->probability = REG_BR_PROB_BASE / 8;
+
+  set_immediate_dominator (CDI_DOMINATORS, l1_bb, entry_bb);
+  set_immediate_dominator (CDI_DOMINATORS, l2_bb, l2_dom_bb);
+  set_immediate_dominator (CDI_DOMINATORS, l0_bb, l1_bb);
+
+  if (!broken_loop)
+    {
+      struct loop *loop = alloc_loop ();
+      loop->header = l1_bb;
+      loop->latch = cont_bb;
+      add_loop (loop, l1_bb->loop_father);
+      loop->safelen = INT_MAX;
+    }
+
+  /* Pick the correct library function based on the precision of the
+     induction variable type.  */
+  tree lib_fun = NULL_TREE;
+  if (TYPE_PRECISION (type) == 32)
+    lib_fun = cilk_for_32_fndecl;
+  else if (TYPE_PRECISION (type) == 64)
+    lib_fun = cilk_for_64_fndecl;
+  else
+    gcc_unreachable ();
+
+  gcc_assert (fd->sched_kind == OMP_CLAUSE_SCHEDULE_CILKFOR);
+  
+  /* WS_ARGS contains the library function flavor to call
+     (__libcilkrts_cilk_for_64 or __libcilkrts_cilk_for_32), and the
+     user-defined grain value.   If the user does not define one, then zero
+     is passed in by the parser.  */
+  vec_alloc (region->ws_args, 2);
+  region->ws_args->quick_push (lib_fun);
+  region->ws_args->quick_push (fd->chunk_size);
+}
 
 /* A subroutine of expand_omp_for.  Generate code for a simd non-worksharing
    loop.  Given parameters:
@@ -6880,6 +7351,8 @@ expand_omp_for (struct omp_region *region, gimple inner_stmt)
 
   if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_KIND_SIMD)
     expand_omp_simd (region, &fd);
+  else if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_KIND_CILKFOR)
+    expand_cilk_for (region, &fd);
   else if (fd.sched_kind == OMP_CLAUSE_SCHEDULE_STATIC
 	   && !fd.have_ordered)
     {
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c
new file mode 100644
index 0000000..8b6112b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk-fors.c
@@ -0,0 +1,87 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <stdio.h>
+#endif
+
+static void check (int *Array, int start, int end, int incr, int value)
+{
+  int ii = 0;
+  for (ii = start;  ii < end; ii = ii + incr)
+    if (Array[ii] != value)
+      __builtin_abort ();
+#if HAVE_IO
+  printf ("Passed\n");
+#endif
+}
+
+static void check_reverse (int *Array, int start, int end, int incr, int value)
+{
+  int ii = 0;
+  for (ii = start; ii >= end; ii = ii - incr)
+    if (Array[ii] != value)
+      __builtin_abort ();
+#if HAVE_IO
+  printf ("Passed\n");
+#endif
+}
+
+
+int main (void)
+{
+  int Array[10];
+  int x = 9, y = 0, z = 3;
+
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array[ii] = 1133;
+  check (Array, 0, 10, 1, 1133);
+
+  _Cilk_for (int ii = 0; ii < 10; ++ii)
+    Array[ii] = 3311;
+  check (Array, 0, 10, 1, 3311);
+
+  _Cilk_for (int ii = 9; ii > -1; ii--)
+    Array[ii] = 4433;
+  check_reverse (Array, 9, 0, 1, 4433);
+
+  _Cilk_for (int ii = 9; ii > -1; --ii)
+    Array[ii] = 9988;
+  check_reverse (Array, 9, 0, 1, 9988);
+
+  _Cilk_for (int ii = 0; ii < 10; ++ii)
+    Array[ii] = 3311;
+  check (Array, 0, 10, 1, 3311);
+
+  _Cilk_for (int ii = 0; ii < 10; ii += 2)
+    Array[ii] = 1328;
+  check (Array, 0, 10, 2, 1328);
+
+  _Cilk_for (int ii = 9; ii >= 0; ii -= 2)
+    Array[ii] = 1738;
+  check_reverse (Array, 9, 0, 2, 1738);
+
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    {
+      if (ii % 2)
+	Array[ii] = 1343;
+      else
+	Array[ii] = 3413;
+    }
+
+  check (Array, 1, 10, 2, 1343); 
+  check (Array, 0, 10, 2, 3413); 
+
+  _Cilk_for (short cc = 0; cc < 10; cc++) 
+    Array[cc] = 1343;
+  check (Array, 0, 10,  1,1343);
+
+  _Cilk_for (short cc = 9; cc >= 0; cc--)
+    Array[cc] = 1348;
+  check_reverse (Array, 9, 0, 1, 1348);
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c
new file mode 100644
index 0000000..ed73c34
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_errors.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=c99" { target c } } */
+
+
+int main (void)
+{
+  int q = 0, ii = 0, jj = 0;
+
+  _Cilk_for (int ii; ii < 10; ii++) /* { dg-error "is not initialized" "" { target c } } */
+    /* { dg-error "expected" "" { target c++ } 10 } */
+    q = 5;
+
+  _Cilk_for (; ii < 10; ii++) /* { dg-error "expected iteration declaration" } */
+    q = 2;
+
+  _Cilk_for (int ii = 0; ; ii++) /* { dg-error "missing controlling predicate" } */
+    q = 2;
+
+  _Cilk_for (int ii = 0; ii < 10, jj < 10; ii++)  /* { dg-error "expected ';' before ',' token" "" { target c } } */
+    /* { dg-error "invalid controlling predicate" "" { target c++ }  20 } */
+    q = 5;
+
+  _Cilk_for (int ii = 0; ii < 10; ) /* { dg-error "missing increment" } */
+    q = 5;
+
+
+  _Cilk_for (int ii = 0, jj = 0; ii < 10; ii++) /* { dg-error "expected" } */ 
+    q = 5;
+
+  _Cilk_for (volatile int vii = 0; vii < 10; vii++) /* { dg-error "iteration variable cannot be volatile" } */
+    q = 5;
+
+ 
+  _Cilk_for (static int sii = 0; sii < 10; sii++) /* { dg-error "static" } */
+
+    q = 5;
+
+
+  _Cilk_for (float fii = 3.47; fii < 5.23; fii++) /* { dg-error "invalid type for iteration variable" } */
+    q = 5;
+
+
+  _Cilk_for (int ii = 0; 10 > jj; ii++) /* { dg-error "invalid controlling predicate" } */
+    q = 5;
+
+  _Cilk_for (int ii = 0; ii < 10; ii >> 1) /* { dg-error "invalid increment expression" } */
+    q = 5;
+
+  _Cilk_for (int ii = 10; ii >= 0; ii--) /* This is OK!  */
+    q = 5;
+
+  _Cilk_for (int ii; ii < 10; ii++) /* { dg-error "is not initialized" "" { target c } } */ 
+    /* { dg-error "expected" "" { target c++ }  53 } */
+    q = 5;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c
new file mode 100644
index 0000000..6cb9b03
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain.c
@@ -0,0 +1,35 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+int grain_value = 2;
+int main (void)
+{
+  int Array1[200], Array1_Serial[200];
+
+  for (int ii = 0; ii < 200; ii++)
+    {
+      Array1_Serial[ii] = 2;
+      Array1[ii] = 1;
+    }
+
+#pragma cilk grainsize = 2
+  _Cilk_for (int ii = 0; ii < 200; ii++)
+    Array1[ii] = 2;
+
+  for (int ii = 0; ii < 200; ii++)
+    if (Array1[ii] != Array1_Serial[ii])
+      return (ii+1);
+
+#pragma cilk grainsize = grain_value
+  _Cilk_for (int ii = 0; ii < 200; ii++) 
+    Array1[ii] = 2;
+
+  for (int ii = 0; ii < 200; ii++)
+    if (Array1[ii] != Array1_Serial[ii])
+      return (ii+1);
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c
new file mode 100644
index 0000000..e1e3217
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus -Wunknown-pragmas" } */
+/* { dg-additional-options "-std=c99" { target c } } */
+
+
+char Array1[26];
+
+#pragma cilk grainsize = 2 /* { dg-error "must be inside a function" } */
+
+int main(int argc, char **argv)
+{
+/* This is OK.  */
+#pragma cilk grainsize = 2
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsize 2 /* { dg-error "expected '=' before numeric constant" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsiz = 2 /* { dg-warning "ignoring #pragma cilk grainsiz" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+
+/* This is OK; it will do a type conversion to long int.  */
+#pragma cilk grainsize = 0.5 
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array1[ii] = 0;
+
+#pragma cilk grainsize = 1 
+  while (Array1[5] != 0) /* { dg-warning "is not followed by" } */
+    {
+    /* Blah */
+    }
+
+#pragma cilk grainsize = 1 
+  int q = 0; /* { dg-warning "is not followed by" } */
+  _Cilk_for (q = 0; q < 10; q++)
+    Array1[q]  = 5;
+
+  while (Array1[5] != 0)
+    {
+    /* Blah */
+    }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c
new file mode 100644
index 0000000..7a779f7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c
@@ -0,0 +1,41 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+
+/* <feature> loop control variable must have integer, pointer or class type
+   </feature>
+*/
+
+#define ARRAY_SIZE 10000
+int a[ARRAY_SIZE];
+
+int main(void)
+{ 
+  int ii = 0;
+
+#if 1
+  for (ii =0; ii < ARRAY_SIZE; ii++)
+    a[ii] = 5;
+#endif
+  _Cilk_for(int *aa = a; aa < a + ARRAY_SIZE; aa++) 
+    *aa = 0;
+#if 1
+  for (ii = 0; ii < ARRAY_SIZE; ii++) 
+    if (a[ii] != 0) 
+      __builtin_abort ();
+#endif
+
+  _Cilk_for (int *aa = a; aa < a + ARRAY_SIZE; aa = aa + 2)
+    *aa = 4;
+
+#if 1
+  for (ii = 0; ii < ARRAY_SIZE; ii = ii + 2) 
+    if (a[ii] != 4) 
+      __builtin_abort ();
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c b/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c
new file mode 100644
index 0000000..cffe17e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/nested_cilk_for.c
@@ -0,0 +1,79 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-std=gnu99"  { target c } } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <stdio.h>
+#endif
+
+int main (void)
+{
+  int Array[10][10];
+
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 10; jj++)
+	{
+	  Array[ii][jj] = 0;
+	}
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 0; jj < 5; jj++)
+      Array[ii][jj] = 5;
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 5; jj++)
+      if (Array[ii][jj] != 5)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+
+  /* One goes up and one goes down.  */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 9; jj >= 0; jj--)
+      Array[ii][jj] = 7;
+
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 9; jj >= 0; jj--)
+      if (Array[ii][jj] != 7)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  /* Different step sizes.  */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    _Cilk_for (int jj = 0; jj < 10; jj += 2)
+      Array[ii][jj] = 9;
+  
+  for (int ii = 0; ii < 10; ii++)
+    for (int jj = 0; jj < 10; jj += 2)
+      if (Array[ii][jj] != 9)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  /* Different step sizes.  */
+  _Cilk_for (int ii = 0; ii < 10; ii += 2)
+    _Cilk_for (int jj = 5; jj < 9; jj++)
+      Array[ii][jj] = 11; 
+  
+  for (int ii = 0; ii < 10; ii += 2)
+    for (int jj = 5; jj < 9; jj++)
+      if (Array[ii][jj] != 11)
+#if HAVE_IO
+	printf("Array[%d][%d] = %d\n", ii, jj, Array[ii][jj]);
+#else
+	__builtin_abort ();
+#endif
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cf3.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cf3.cc
new file mode 100644
index 0000000..8d88c5f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cf3.cc
@@ -0,0 +1,96 @@
+/* { dg-options "-fcilkplus" } */
+
+typedef __PTRDIFF_TYPE__ ptrdiff_t;
+
+template <typename T>
+class I
+{
+public:
+  typedef ptrdiff_t difference_type;
+  I ();
+  ~I ();
+  I (T *);
+  I (const I &);
+  T &operator * ();
+  T *operator -> ();
+  T &operator [] (const difference_type &) const;
+  I &operator = (const I &);
+  I &operator ++ ();
+  I operator ++ (int);
+  I &operator -- ();
+  I operator -- (int);
+  I &operator += (const difference_type &);
+  I &operator -= (const difference_type &);
+  I operator + (const difference_type &) const;
+  I operator - (const difference_type &) const;
+  template <typename S> friend bool operator == (I<S> &, I<S> &);
+  template <typename S> friend bool operator == (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator < (I<S> &, I<S> &);
+  template <typename S> friend bool operator < (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator <= (I<S> &, I<S> &);
+  template <typename S> friend bool operator <= (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator > (I<S> &, I<S> &);
+  template <typename S> friend bool operator > (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator >= (I<S> &, I<S> &);
+  template <typename S> friend bool operator >= (const I<S> &, const I<S> &);
+  template <typename S> friend typename I<S>::difference_type operator - (I<S> &, I<S> &);
+  template <typename S> friend typename I<S>::difference_type operator - (const I<S> &, const I<S> &);
+  template <typename S> friend I<S> operator + (typename I<S>::difference_type , const I<S> &);
+private:
+  T *p;
+};
+template <typename T> I<T>::I () : p (0) {}
+template <typename T> I<T>::~I () {}
+template <typename T> I<T>::I (T *x) : p (x) {}
+template <typename T> I<T>::I (const I &x) : p (x.p) {}
+template <typename T> T &I<T>::operator * () { return *p; }
+template <typename T> T *I<T>::operator -> () { return p; }
+template <typename T> T &I<T>::operator [] (const difference_type &x) const { return p[x]; }
+template <typename T> I<T> &I<T>::operator = (const I &x) { p = x.p; return *this; }
+template <typename T> I<T> &I<T>::operator ++ () { ++p; return *this; }
+template <typename T> I<T> I<T>::operator ++ (int) { return I (p++); }
+template <typename T> I<T> &I<T>::operator -- () { --p; return *this; }
+template <typename T> I<T> I<T>::operator -- (int) { return I (p--); }
+template <typename T> I<T> &I<T>::operator += (const difference_type &x) { p += x; return *this; }
+template <typename T> I<T> &I<T>::operator -= (const difference_type &x) { p -= x; return *this; }
+template <typename T> I<T> I<T>::operator + (const difference_type &x) const { return I (p + x); }
+template <typename T> I<T> I<T>::operator - (const difference_type &x) const { return I (p - x); }
+template <typename T> bool operator == (I<T> &x, I<T> &y) { return x.p == y.p; }
+template <typename T> bool operator == (const I<T> &x, const I<T> &y) { return x.p == y.p; }
+template <typename T> bool operator != (I<T> &x, I<T> &y) { return !(x == y); }
+template <typename T> bool operator != (const I<T> &x, const I<T> &y) { return !(x == y); }
+template <typename T> bool operator < (I<T> &x, I<T> &y) { return x.p < y.p; }
+template <typename T> bool operator < (const I<T> &x, const I<T> &y) { return x.p < y.p; }
+template <typename T> bool operator <= (I<T> &x, I<T> &y) { return x.p <= y.p; }
+template <typename T> bool operator <= (const I<T> &x, const I<T> &y) { return x.p <= y.p; }
+template <typename T> bool operator > (I<T> &x, I<T> &y) { return x.p > y.p; }
+template <typename T> bool operator > (const I<T> &x, const I<T> &y) { return x.p > y.p; }
+template <typename T> bool operator >= (I<T> &x, I<T> &y) { return x.p >= y.p; }
+template <typename T> bool operator >= (const I<T> &x, const I<T> &y) { return x.p >= y.p; }
+template <typename T> typename I<T>::difference_type operator - (I<T> &x, I<T> &y) { return x.p - y.p; }
+template <typename T> typename I<T>::difference_type operator - (const I<T> &x, const I<T> &y) { return x.p - y.p; }
+template <typename T> I<T> operator + (typename I<T>::difference_type x, const I<T> &y) { return I<T> (x + y.p); }
+
+template <typename T>
+class J
+{
+public:
+  J(const I<T> &x, const I<T> &y) : b (x), e (y) {}
+  const I<T> &begin ();
+  const I<T> &end ();
+private:
+  I<T> b, e;
+};
+
+template <typename T> const I<T> &J<T>::begin () { return b; }
+template <typename T> const I<T> &J<T>::end () { return e; }
+
+template <typename T>
+void baz (I<T> &i);
+
+void
+foo (J<int> j)
+{
+  _Cilk_for (I<int> i = j.begin (); i < j.end (); i += 2)
+    baz (i);
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc
new file mode 100644
index 0000000..8221371
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc
@@ -0,0 +1,25 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#define SIZE 100
+#define CHECK_VALUE 5
+
+template <class T>
+int func (T start, T end)
+{
+  int Array[SIZE];
+  _Cilk_for (T ii = 0; ii < end; ii++)
+    Array[ii] = CHECK_VALUE;
+  
+  for (T ii = 0; ii < end; ii++)
+    if (Array[ii] != CHECK_VALUE)
+      __builtin_abort ();
+
+  return 0;
+}
+
+int main (void)
+{
+  return func <int> (0, 100) + func <long> (0, 100);
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/for1.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/for1.cc
new file mode 100644
index 0000000..78b8cf1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/for1.cc
@@ -0,0 +1,378 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#if HAVE_IO
+#include <cstdio>
+#endif
+
+typedef __PTRDIFF_TYPE__ ptrdiff_t;
+extern "C" void abort ();
+
+template <typename T>
+class I
+{
+public:
+  typedef ptrdiff_t difference_type;
+  I ();
+  ~I ();
+  I (T *);
+  I (const I &);
+  T &operator * ();
+  T *operator -> ();
+  T &operator [] (const difference_type &) const;
+  I &operator = (const I &);
+  I &operator ++ ();
+  I operator ++ (int);
+  I &operator -- ();
+  I operator -- (int);
+  I &operator += (const difference_type &);
+  I &operator -= (const difference_type &);
+  I operator + (const difference_type &) const;
+  I operator - (const difference_type &) const;
+  template <typename S> friend bool operator == (I<S> &, I<S> &);
+  template <typename S> friend bool operator == (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator < (I<S> &, I<S> &);
+  template <typename S> friend bool operator < (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator <= (I<S> &, I<S> &);
+  template <typename S> friend bool operator <= (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator > (I<S> &, I<S> &);
+  template <typename S> friend bool operator > (const I<S> &, const I<S> &);
+  template <typename S> friend bool operator >= (I<S> &, I<S> &);
+  template <typename S> friend bool operator >= (const I<S> &, const I<S> &);
+  template <typename S> friend typename I<S>::difference_type operator - (I<S> &, I<S> &);
+  template <typename S> friend typename I<S>::difference_type operator - (const I<S> &, const I<S> &);
+  template <typename S> friend I<S> operator + (typename I<S>::difference_type , const I<S> &);
+private:
+  T *p;
+};
+template <typename T> I<T>::I () : p (0) {}
+template <typename T> I<T>::~I () {}
+template <typename T> I<T>::I (T *x) : p (x) {}
+template <typename T> I<T>::I (const I &x) : p (x.p) {}
+template <typename T> T &I<T>::operator * () { return *p; }
+template <typename T> T *I<T>::operator -> () { return p; }
+template <typename T> T &I<T>::operator [] (const difference_type &x) const { return p[x]; }
+template <typename T> I<T> &I<T>::operator = (const I &x) { p = x.p; return *this; }
+template <typename T> I<T> &I<T>::operator ++ () { ++p; return *this; }
+template <typename T> I<T> I<T>::operator ++ (int) { return I (p++); }
+template <typename T> I<T> &I<T>::operator -- () { --p; return *this; }
+template <typename T> I<T> I<T>::operator -- (int) { return I (p--); }
+template <typename T> I<T> &I<T>::operator += (const difference_type &x) { p += x; return *this; }
+template <typename T> I<T> &I<T>::operator -= (const difference_type &x) { p -= x; return *this; }
+template <typename T> I<T> I<T>::operator + (const difference_type &x) const { return I (p + x); }
+template <typename T> I<T> I<T>::operator - (const difference_type &x) const { return I (p - x); }
+template <typename T> bool operator == (I<T> &x, I<T> &y) { return x.p == y.p; }
+template <typename T> bool operator == (const I<T> &x, const I<T> &y) { return x.p == y.p; }
+template <typename T> bool operator != (I<T> &x, I<T> &y) { return !(x == y); }
+template <typename T> bool operator != (const I<T> &x, const I<T> &y) { return !(x == y); }
+template <typename T> bool operator < (I<T> &x, I<T> &y) { return x.p < y.p; }
+template <typename T> bool operator < (const I<T> &x, const I<T> &y) { return x.p < y.p; }
+template <typename T> bool operator <= (I<T> &x, I<T> &y) { return x.p <= y.p; }
+template <typename T> bool operator <= (const I<T> &x, const I<T> &y) { return x.p <= y.p; }
+template <typename T> bool operator > (I<T> &x, I<T> &y) { return x.p > y.p; }
+template <typename T> bool operator > (const I<T> &x, const I<T> &y) { return x.p > y.p; }
+template <typename T> bool operator >= (I<T> &x, I<T> &y) { return x.p >= y.p; }
+template <typename T> bool operator >= (const I<T> &x, const I<T> &y) { return x.p >= y.p; }
+template <typename T> typename I<T>::difference_type operator - (I<T> &x, I<T> &y) { return x.p - y.p; }
+template <typename T> typename I<T>::difference_type operator - (const I<T> &x, const I<T> &y) { return x.p - y.p; }
+template <typename T> I<T> operator + (typename I<T>::difference_type x, const I<T> &y) { return I<T> (x + y.p); }
+
+template <typename T>
+class J
+{
+public:
+  J(const I<T> &x, const I<T> &y) : b (x), e (y) {}
+  const I<T> &begin ();
+  const I<T> &end ();
+private:
+  I<T> b, e;
+};
+
+template <typename T> const I<T> &J<T>::begin () { return b; }
+template <typename T> const I<T> &J<T>::end () { return e; }
+
+int results[2000];
+
+template <typename T>
+void
+baz (I<T> &i)
+{
+  if (*i < 0 || *i >= 2000)
+    {
+#if HAVE_IO
+      printf ("*i(%d) is < 0 or >= 2000\n", *i);
+      fflush (stdout);
+#endif
+     __builtin_abort ();
+    }
+  else 
+    results[*i]++;
+}
+
+void
+f1 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x; i <= y; i += 6)
+    { 
+      baz (i);
+    }
+
+#if HAVE_IO
+  printf("===== Starting F1 =========\n");
+  for (I<int> i = x; i <= y; i+= 6) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]); 
+    fflush (stdout);
+  }
+#endif
+}
+
+void
+f2 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x; i < y - 1; i += 2) 
+    baz (i);
+
+#if HAVE_IO
+  printf("===== Starting F2 =========\n");
+  for (int ii = 0; ii < 1998; ii += 2) {
+    printf("Result[%4d] = %2d\n", ii, results[ii]);
+    fflush (stdout);
+  }
+#endif
+}
+
+template <typename T>
+void
+f3 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x; i <= y; i += 1)
+    baz (i);
+#if HAVE_IO
+  printf("===== Starting F3 =========\n");
+  for (int ii = 20; ii < 1987; ii += 1) {
+    printf("Result[%4d] = %2d\n", ii, results[ii]);
+    fflush (stdout);
+  }
+
+#endif
+}
+
+template <typename T>
+void
+f4 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x + (2000 - 64); i > y + 10; --i)
+    baz (i);
+#if HAVE_IO
+  printf("===== Starting F4 =========\n");
+  for (I<int> i = x + (2000 - 64); i > y + 10; --i) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+}
+void
+f5 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x + 2000 - 64; i > y + 10; i -= 10)
+    baz (i);
+#if HAVE_IO
+  for (I<int> i = x + 2000 - 64; i > y + 10; i -= 10) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+}
+
+template <int N>
+void
+f6 (const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I<int> i = x + 2000 - 64; i > y + 10; i -= 10)
+    {
+      I<int> j = i + N;
+      baz (j);
+    }
+#if HAVE_IO
+  for (I<int> i = x + 2000 - 64; i > y + 10; i = i - 12 + 2)
+    {
+      I<int> j = i + N;
+      printf("Result[%4d] = %2d\n", *j, results[*j]);
+      fflush (stdout);
+    }
+#endif
+}
+template <int N>
+void
+f7 (I<int> ii, const I<int> &x, const I<int> &y)
+{
+  _Cilk_for (I <int> i = x - 10; i <= y + 10; i += N)
+    baz (i);
+#if HAVE_IO
+  for (I<int> i = x - 10; i <= y + 10; i += N) 
+    {
+      printf("Result[%4d] = %2d\n", *i, results[*i]);
+      fflush (stdout);
+    }
+#endif
+}
+
+template <int N>
+void
+f8 (J<int> j)
+{
+  _Cilk_for (I<int> i = j.begin (); i <= j.end () + N; i += 2)
+    baz (i);
+#if HAVE_IO
+  for (I<int> i = j.begin (); i <= j.end () + N; i += 2) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+
+}
+
+template <typename T, int N>
+void
+f9 (const I<T> &x, const I<T> &y)
+{
+  _Cilk_for (I<T> i = x; i <= y; i += N)
+    baz (i);
+#if HAVE_IO
+  for (I<T> i = x; i <= y;  i  = i + N)
+    { 
+      printf("Result[%4d] = %2d\n", *i, results[*i]);
+      fflush (stdout);
+    }
+#endif
+}
+
+template <typename T, int N>
+void
+f10 (const I<T> &x, const I<T> &y)
+{
+  _Cilk_for (I<T> i = x; i > y; i += N)
+    baz (i);
+#if HAVE_IO
+  for (I<T> i = x; i > y;  i  = i + N) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+}
+
+template <typename T>
+void
+f11 (const T &x, const T &y)
+{
+    _Cilk_for (T i = x; i <= y; i += 3)
+      baz (i);
+
+#if HAVE_IO
+  for (T i = x; i <= y;  i  += 3) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+      T j = y + 3;
+      baz (j);
+
+}
+
+template <typename T>
+void
+f12 (const T &x, const T &y)
+{
+  _Cilk_for (T i = x; i > y; --i)
+    baz (i);
+#if HAVE_IO
+  for (T i = x; i > y;  --i) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+}
+template <int N>
+struct K
+{
+  template <typename T>
+  static void
+  f13 (const T &x, const T &y)
+  {
+    _Cilk_for (T i = x; i <= y + N; i += N)
+      baz (i);
+#if HAVE_IO
+  for (T i = x; i < y+N;  i += N) {
+    printf("Result[%4d] = %2d\n", *i, results[*i]);
+    fflush (stdout);
+  }
+#endif
+  }
+};
+
+#define check(expr) \
+  for (int i = 0; i < 2000; i++)			\
+    if (expr)						\
+      {							\
+	if (results[i] != 1)				\
+	  __builtin_abort ();				\
+	results[i] = 0;					\
+      }							\
+    else if (results[i])				\
+      abort ()
+
+int
+main ()
+{
+  int a[2000];
+  long b[2000];
+  for (int i = 0; i < 2000; i++)
+    {
+      a[i] = i;
+      b[i] = i;
+    }
+  f1 (&a[10], &a[1990]);
+  check (i >= 10 && i <= 1990 && (i - 10) % 6 == 0);
+  f2 (&a[0], &a[1999]);
+  check (i < 1998 && (i & 1) == 0);
+  f3<int> (&a[20], &a[1837]);
+  check (i >= 20 && i <= 1837);
+  f4<int> (&a[0], &a[30]);
+  check (i > 40 && i <= 2000 - 64);
+
+  /* The f5 and f6 calls below are invalid since they would wrap around.
+     If this can be caught at compile time (i.e. the values are constant),
+     then the compiler will emit errors.  */
+  //f5 (&a[0], &a[100]);
+  //check (i >= 116 && i <= 2000 - 64 && (i - 116) % 10 == 0);
+  //f6<-10> (&a[10], &a[110]);
+  //check (i >= 116 && i <= 2000 - 64 && (i - 116) % 10 == 0);
+
+  f7<6> (I<int> (), &a[12], &a[1800]);
+  check (i >= 2 && i <= 1808 && (i - 2) % 6 == 0);
+
+  f8<121> (J<int> (&a[14], &a[1803]));
+  check (i >= 14 && i <= 1924 && (i & 1) == 0);
+  f9<int, 7> (&a[33], &a[1967]);
+  check (i >= 33 && i <= 1967 && (i - 33) % 7 == 0);
+  f10<int, -7> (&a[1939], &a[17]);
+  check (i >= 21 && i <= 1939 && (i - 21) % 7 == 0);
+  f11<I<int> > (&a[16], &a[1981]);
+  check (i >= 16 && i <= 1984 && (i - 16) % 3 == 0);
+  f12<I<int> > (&a[1761], &a[37]);
+  check (i > 37 && i <= 1761);
+  K<5>::f13<I<int> > (&a[1], &a[1935]);
+  check (i >= 1 && i <= 1936 && (i - 1) % 5 == 0);
+  f9<long, 7> (&b[33], &b[1967]);
+  check (i >= 33 && i <= 1967 && (i - 33) % 7 == 0);
+  f10<long, -7> (&b[1939], &b[17]);
+  check (i >= 21 && i <= 1939 && (i - 21) % 7 == 0);
+  f11<I<long> > (&b[16], &b[1981]);
+  check (i >= 16 && i <= 1984 && (i - 16) % 3 == 0);
+  f12<I<long> > (&b[1761], &b[37]);
+  check (i > 37 && i <= 1761);
+  K<5>::f13<I<long> > (&b[1], &b[1935]);
+  check (i >= 1 && i <= 1936 && (i - 1) % 5 == 0);
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc
new file mode 100644
index 0000000..2ac8c72
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc
@@ -0,0 +1,52 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#include <vector>
+#include <cstdio>
+#include <iostream>
+#include <algorithm>
+
+using namespace std;
+
+
+int main(void)
+{
+vector <int> array;
+vector <int> array_serial;
+
+#if 1
+for (int ii = -1; ii < 10; ii++)
+{   
+  array.push_back(ii);
+  array_serial.push_back (ii);
+}
+#endif
+_Cilk_for (vector<int>::iterator iter = array.begin(); iter != array.end();
+          iter++)
+{
+   if (*iter  == 6) 
+     *iter = 13;
+}
+for (vector<int>::iterator iter = array_serial.begin(); 
+     iter != array_serial.end(); iter++)
+{
+   if (*iter  == 6) 
+     *iter = 13;
+}
+sort (array.begin(), array.end());
+sort (array_serial.begin(), array_serial.end());
+
+vector <int>::iterator iter = array.begin ();
+vector <int>::iterator iter_serial = array_serial.begin ();
+
+while (iter != array.end () && iter_serial != array_serial.end ())
+{
+  if (*iter != *iter_serial)
+    __builtin_abort ();
+  iter++;
+  iter_serial++;
+}
+
+return 0;
+}   
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc
new file mode 100644
index 0000000..1cf3301
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_rev_iter.cc
@@ -0,0 +1,72 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#include <vector>
+#include <cstdio>
+#include <iostream>
+#include <algorithm>
+
+using namespace std;
+
+
+int main(void)
+{
+vector <int> array,array_serial;
+
+#if 1
+for (int ii = -1; ii < 10; ii++)
+{   
+  array.push_back(ii);
+  array_serial.push_back(ii);
+}
+#endif
+_Cilk_for (vector<int>::reverse_iterator iter4 = array.rbegin(); 
+	   iter4 != array.rend(); iter4++)
+{
+  if (*iter4 == 0x8) {
+    *iter4 = 9;
+  }
+}
+
+for (vector<int>::reverse_iterator iter4 = array_serial.rbegin(); 
+	   iter4 != array_serial.rend(); iter4++)
+{
+  if (*iter4 == 0x8) {
+    *iter4 = 9;
+  }
+}
+_Cilk_for (vector<int>::reverse_iterator iter2 = array.rbegin(); 
+	   iter2 != array.rend();
+          iter2 += 1) 
+{
+   if ((*iter2 == 0x4) || (*iter2 == 0x7)) {
+    *iter2 = 0x3;
+   }
+}
+for (vector<int>::reverse_iterator iter2 = array_serial.rbegin(); 
+     iter2 != array_serial.rend();
+          iter2 += 1) 
+{
+   if ((*iter2 == 0x4) || (*iter2 == 0x7)) {
+    *iter2 = 0x3;
+   }
+}
+sort (array.begin(), array.end());
+sort (array_serial.begin(), array_serial.end());
+
+vector <int>::iterator iter = array.begin ();
+vector <int>::iterator iter_serial = array_serial.begin ();
+while (iter != array.end () && iter_serial != array_serial.end ())
+{
+  if (*iter != *iter_serial)
+    __builtin_abort ();
+  iter++;
+  iter_serial++;
+}
+
+return 0;
+}   
+
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc
new file mode 100644
index 0000000..8d2e61e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc
@@ -0,0 +1,50 @@
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#include <iostream>
+#include <cstdio>
+#include <cstdlib>
+#include <vector>
+#include <algorithm>
+#include <list>
+
+using namespace std;
+
+
+int main(int argc, char **argv)
+{
+  vector <int> number_list, number_list_serial;
+  int new_number = 0;
+  int no_elements = 0;
+  
+  if (argc != 2)
+  {
+    no_elements = 10;
+  }
+
+
+  number_list.clear();
+  number_list_serial.clear();
+  for (int ii = 0; ii < no_elements; ii++)
+  {
+    number_list.push_back(new_number);
+    number_list_serial.push_back(new_number);
+  }
+
+  _Cilk_for (int jj = 0; jj < no_elements; jj++)
+  {
+    number_list[jj] = jj + no_elements;
+  }
+  for (int jj = 0; jj < no_elements; jj++)
+  {
+    number_list_serial[jj] = jj + no_elements;
+  }
+
+  for (int jj = 0; jj < no_elements; jj++)
+    if (number_list_serial[jj] != number_list[jj])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index e548a0d..d8c14e3 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -351,6 +351,7 @@ enum omp_clause_schedule_kind {
   OMP_CLAUSE_SCHEDULE_GUIDED,
   OMP_CLAUSE_SCHEDULE_AUTO,
   OMP_CLAUSE_SCHEDULE_RUNTIME,
+  OMP_CLAUSE_SCHEDULE_CILKFOR,
   OMP_CLAUSE_SCHEDULE_LAST
 };
 
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 0595499..91efd9f 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -411,6 +411,9 @@ dump_omp_clause (pretty_printer *buffer, tree clause, int spc, int flags)
 	case OMP_CLAUSE_SCHEDULE_AUTO:
 	  pp_string (buffer, "auto");
 	  break;
+	case OMP_CLAUSE_SCHEDULE_CILKFOR:
+	  pp_string (buffer, "cilk-for grain");
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -2392,6 +2395,12 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
       pp_string (buffer, "#pragma simd");
       goto dump_omp_loop;
 
+    case CILK_FOR:
+      /* This label points one line after dumping the clauses.  
+	 For _Cilk_for the clauses are dumped after the _Cilk_for (...) 
+	 parameters are printed out.  */
+      goto dump_omp_loop_cilk_for;
+
     case OMP_DISTRIBUTE:
       pp_string (buffer, "#pragma omp distribute");
       goto dump_omp_loop;
@@ -2420,6 +2429,8 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
     dump_omp_loop:
       dump_omp_clauses (buffer, OMP_FOR_CLAUSES (node), spc, flags);
 
+    dump_omp_loop_cilk_for:
+
       if (!(flags & TDF_SLIM))
 	{
 	  int i;
@@ -2440,7 +2451,10 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
 		{
 		  spc += 2;
 		  newline_and_indent (buffer, spc);
-		  pp_string (buffer, "for (");
+		  if (TREE_CODE (node) == CILK_FOR)
+		    pp_string (buffer, "_Cilk_for (");
+		  else 
+		    pp_string (buffer, "for (");
 		  dump_generic_node (buffer,
 				     TREE_VEC_ELT (OMP_FOR_INIT (node), i),
 				     spc, flags, false);
@@ -2454,6 +2468,8 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
 				     spc, flags, false);
 		  pp_right_paren (buffer);
 		}
+	      if (TREE_CODE (node) == CILK_FOR) 
+		dump_omp_clauses (buffer, OMP_FOR_CLAUSES (node), spc, flags);
 	    }
 	  if (OMP_FOR_BODY (node))
 	    {
diff --git a/gcc/tree.def b/gcc/tree.def
index f8d6444..558d7c8 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1051,6 +1051,10 @@ DEFTREECODE (OMP_SIMD, "omp_simd", tcc_statement, 6)
    Operands like for OMP_FOR.  */
 DEFTREECODE (CILK_SIMD, "cilk_simd", tcc_statement, 6)
 
+/* Cilk Plus - _Cilk_for (..)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (CILK_FOR, "cilk_for", tcc_statement, 6)
+
 /* OpenMP - #pragma omp distribute [clause1 ... clauseN]
    Operands like for OMP_FOR.  */
 DEFTREECODE (OMP_DISTRIBUTE, "omp_distribute", tcc_statement, 6)
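
A note on how the for1.cc test in the patch above verifies its loops: every
loop body increments results[*i], and the check macro then requires that
exactly the expected indices were hit once and that no other index was hit.
The sketch below illustrates that idiom only. It assumes a plain serial for
loop as a stand-in for _Cilk_for (so it builds without -fcilkplus), and the
names visited_exactly_once and run_f1_case are hypothetical; they do not
appear in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not in the patch): true when every index selected by
// `expected` was visited exactly once and no other index was visited.
// Counters are reset while checking, as the `check` macro in for1.cc does,
// so the same array can be reused by the next test case.
template <typename Pred>
bool visited_exactly_once(std::vector<int> &hits, Pred expected)
{
  bool ok = true;
  for (std::size_t i = 0; i < hits.size(); ++i)
    {
      const int want = expected(static_cast<int>(i)) ? 1 : 0;
      if (hits[i] != want)
        ok = false;
      hits[i] = 0;
    }
  return ok;
}

// Models the f1 case: start at a[10], stop at a[1990], stride 6.
bool run_f1_case()
{
  std::vector<int> a(2000), hits(2000, 0);
  for (int i = 0; i < 2000; ++i)
    a[i] = i;

  // Assumption: serial stand-in for
  //   _Cilk_for (I<int> i = x; i <= y; i += 6) baz (i);
  for (const int *p = &a[10]; p <= &a[1990]; p += 6)
    ++hits[*p];

  return visited_exactly_once(hits, [](int i) {
    return i >= 10 && i <= 1990 && (i - 10) % 6 == 0;
  });
}
```

Because the predicate names the full set of expected indices, the same check
catches both skipped iterations and iterations that ran twice, which is what
a parallel loop with a wrong trip count would produce.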

[-- Attachment #3: c-ChangeLogs --]
[-- Type: application/octet-stream, Size: 3986 bytes --]

gcc/ChangeLog
2014-02-24  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* cilk-common.c (declare_cilk_for_builtin): New function.
	(cilk_init_builtins): Added two new built-in functions for _Cilk_for
	support.
	* cilk.h (enum cilk_tree_index): Added two new enumerators called
	CILK_TI_F_LOOP_32 and CILK_TI_F_LOOP_64.
	(cilk_for_32_fndecl): New define.
	(cilk_for_64_fndecl): Likewise.
	* gimple-pretty-print.c (dump_gimple_omp_parallel): Added a new
	parameter.  If it is printing a _Cilk_for statement, then do not 
	print OMP's pragmas.
	(dump_gimple_omp_for): Added GF_OMP_FOR_KIND_CILK_FOR.  Printed out
	_Cilk_for statements without the #pragmas.  Also, added NE_EXPR case.
	* tree-pretty-print.c (dump_generic_node): Added CILK_FOR case.
	Print "_Cilk_for" if the node is of type CILK_FOR.
	(dump_omp_clause): Added a new case called OMP_CLAUSE_SCHEDULE_CILKFOR.
	* gimple.h (enum gf_mask): Added new value: GF_OMP_FOR_KIND_CILKFOR.
	Readjusted other values to satisfy the masking rules.
	(gimple_cilk_for_induction_var): New function.
	* gimplify.c (gimplify_scan_omp_clauses): Added a new parameter called
	is_cilk_for.  If is_cilk_for is true then do not boolify the
	IF_CLAUSE's expression.  Also, remove the IF clause from _Cilk_for and
	schedule clause from the #pragma omp parallel inserted by _Cilk_for.
	(gimplify_omp_parallel): Added check to see if we are gimplifying
	a _Cilk_for statement.
	(gimplify_omp_for): Added support to gimplify a _Cilk_for statement.
	(gimplify_expr): Added CILK_FOR case.
	* omp-low.c (extract_omp_for_data): Added a check for CILK_FOR and
	set the schedule kind accordingly.  Added a check for CILK_FOR trees
	wherever CILKSIMD is checked.
	(create_omp_child_function_name): Added a new parameter: is_cilk_for.
	(find_cilk_for_stmt): New function.
	(is_cilk_for_stmt): Likewise.
	(cilk_for_check_loop_diff_type): Likewise.
	(expand_cilk_for_call): Likewise.
	(expand_cilk_for): Likewise.
	(create_omp_child_function): Added support to create _Cilk_for's
	child function by adding two additional parameters.
	(expand_omp_taskreg): Extracted the high and low parameters from the
	child function and set them accordingly in the child function.
	(expand_omp_for): Added a call to expand_cilk_for.
	* tree.def (CILK_FOR): New tree.
	* tree-core.h (enum omp_clause_schedule_kind): Added a new enumerator
	field OMP_CLAUSE_SCHEDULE_CILKFOR.
	* cilk-builtins.def (BUILT_IN_CILK_FOR_32): New built-in function.
	(BUILT_IN_CILK_FOR_64): Likewise.
	
gcc/c-family/ChangeLog
2014-02-24  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-cilkplus.c (find_cilk_for): New function.
	(cilk_for_move_clauses_upward): Likewise.
	* c-common.c (c_common_reswords[]): Added a new field called _Cilk_for.
	* c-common.h (enum rid): Added new enumerator called RID_CILK_FOR.
	* c-omp.c (c_finish_omp_for): Added a new parameter called count.
	Computed the value of loop-count based on initial, condition and
	increment information.
	* c-pragma.c (init_pragma): Added cilk grainsize pragma.
	* c-pragma.h (enum pragma_kind): Added new enumerator called
	PRAGMA_CILK_GRAINSIZE.

gcc/c/ChangeLog
2014-02-24  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-parser.c (c_parser_statement_after_labels): Added RID_CILK_FOR
	case.
	(c_parser_pragma): Added PRAGMA_CILK_GRAINSIZE case.
	(c_parser_omp_for_loop): Added grain parameter.  Also, modified
	the function to parse _Cilk_for statement.
	(c_parser_cilk_grainsize): New function.
	(c_parser_cilk_simd): Added a new parameter called is_cilk_for.
	Modified the function to handle CILK_FOR.

gcc/testsuite/ChangeLog
2014-02-24  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-c++-common/cilk-plus/CK/cilk-fors.c: New testcase.
	* c-c++-common/cilk-plus/CK/nested_cilk_for.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_grain.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_errors.c: Likewise.
	* c-c++-common/cilk-plus/CK/cilk_for_ptr_iter.c: Likewise.

[-- Attachment #4: cp-ChangeLogs --]
[-- Type: application/octet-stream, Size: 1873 bytes --]

gcc/cp/ChangeLog
2014-02-24  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* cp-cilkplus.c (copy_tree_till_cilk_for): New function.
	(find_vars): Likewise.
	(find_killed_vars): Likewise.
	(found_cilk_for_p): Likewise.
	(find_cilk_for_stmt): Likewise.
	(insert_firstpriv_clauses): Likewise.
	(cilk_for_create_bind_expr): Likewise.
	* cp-tree.h (copy_tree_till_cilk_for): New prototype.
	(cilk_for_create_bind_expr): Likewise.
	* parser.c (cp_parser_statement): Added a RID_CILK_FOR case.
	(cp_parser_omp_for_cond): Added a check for CILK_FOR tree along with
	CILK_SIMD tree.
	(cp_parser_omp_for_loop): Added a new parameter: cfor_block.  Added
	support for parsing a _Cilk_for statement.  Removed statements
	between _Cilk_for statement and the #pragma omp parallel to move
	them upward.
	(cp_parser_cilk_grainsize): New function.
	(cp_parser_pragma): Added PRAGMA_CILK_GRAINSIZE pragma.
	(cp_parser_cilk_simd): Added a new parameter called grain.  Added
	support to handle _Cilk_for statement along with #pragma simd.
	* pt.c (tsubst_expr): For _Cilk_for statement, move certain clauses
	upward to #pragma parallel statement.  Added a CILK_FOR case.
	Modified OMP_PARALLEL case to handle _Cilk_for.
	* semantics.c (handle_omp_for_class_iterator): Added a NE_EXPR case.
	(finish_omp_for): Added an IF clause for _Cilk_for statements.
	
gcc/testsuite/ChangeLog
2014-02-24  Balaji V. Iyer  <balaji.v.iyer@intel.com>

	* c-c++-common/cilk-plus/CK/cilk_for_errors.c: Made certain error
	tags C specific and inserted their C++ equivalents.
	* c-c++-common/cilk-plus/CK/cilk_for_grain_errors.c: Likewise.
	* g++.dg/cilk-plus/CK/cilk-for-tplt.cc: New testcase.
	* g++.dg/cilk-plus/CK/cf3.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_iter.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_rev_iter.cc: Likewise.
	* g++.dg/cilk-plus/CK/stl_test.cc: Likewise.
	* g++.dg/cilk-plus/CK/for1.cc: Likewise.


* FW: [PING] [PATCH] _Cilk_for for C and C++
       [not found]                                             ` <20140306115443.GC22862@tucnak.redhat.com>
@ 2014-03-20 21:19                                               ` Iyer, Balaji V
  0 siblings, 0 replies; 26+ messages in thread
From: Iyer, Balaji V @ 2014-03-20 21:19 UTC (permalink / raw)
  To: gcc-patches

I mis-spelled the "org" as "og" and thus the email got bounced. So, here it is again.

Thanks,
Balaji V. Iyer.

> -----Original Message-----
> From: Iyer, Balaji V
> Sent: Thursday, March 20, 2014 4:34 PM
> To: 'Jakub Jelinek'
> Cc: gcc-patches@gcc.gnu.og
> Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
> 
> 
> 
> > -----Original Message-----
> > From: Jakub Jelinek [mailto:jakub@redhat.com]
> > Sent: Thursday, March 6, 2014 6:55 AM
> > To: Iyer, Balaji V
> > Cc: gcc-patches@gcc.gnu.og
> > Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> >
> > On Tue, Mar 04, 2014 at 03:35:32PM +0000, Iyer, Balaji V wrote:
> > > 	Did you get a chance to look at this?
> >
> > Please look at what you are emitting for cf3.cc, e.g. in *.gimple:
> >
> >         D.2883 = J<int>::begin (j);
> >         I<int>::I (&i, D.2883);
> >         D.2885 = J<int>::end (j);
> >         retval.0 = operator-<int> (D.2885, &i);
> >         D.2886 = retval.0 /[cl] 2;
> >         #pragma omp parallel firstprivate(i) if(D.2886) shared(D.2865)
> > lastprivate(D.2864)
> >           {
> >             const difference_type D.2866;
> >             long int D.2887;
> >             struct I & D.2888;
> >
> >             try
> >               {
> >
> >                 _Cilk_for (D.2864 = 0; D.2864 < 0; D.2864 = D.2864 +
> > 2) linear(D.2864:2) schedule(cilk-for grain,0)
> >
> > That is just plain wrong, you aren't iterating zero times, but
> > D.2886 times, so there should be D.2864 < retval.0.
> > In *.original that is:
> >     <<cleanup_point <<< Unknown tree: expr_stmt
> >   (void) (i = TARGET_EXPR <D.2854, <<< Unknown tree: aggr_init_expr
> >   5
> >   __comp_ctor
> >   D.2854
> >   0B
> >   (const struct I &) (const struct I *) J<int>::begin ((struct J *) j) >>>>)
> >>>>>;
> >     #pragma omp parallel firstprivate(i) schedule(cilk-for grain,0)
> > if(<<cleanup_point operator-<int> ((const struct I &) (const struct I
> > *) J<int>::end ((struct J *) j), (const struct I &) (const struct I *) &i)>> /[cl] 2)
> >       {
> >         {
> >           try                      <======================== HERE ========================
> >             {
> >                             difference_type D.2864;
> >                             difference_type D.2865;
> >
> >                 {
> >                   <<cleanup_point <<< Unknown tree: expr_stmt (void)
> > (D.2865 = 0)
> > >>>>>
> >                   _Cilk_for (D.2864 = 0; D.2864 < 0; D.2864 = D.2864 +
> > 2) schedule(cilk-for grain,0) if(<<cleanup_point operator-<int>
> > ((const struct I &) (const struct I *) J<int>::end ((struct J *) j),
> > (const struct I &) (const struct I *) &i)>> /[cl] 2)
> >
> > As I've said several times before, don't put the operator- expression
> > into if clause, instead evaluate it into a scalar temporary first
> > (get_temp_regvar), before the #pragma omp parallel, and use the result
> > of get_temp_regvar in the if clause (temp /[cl] 2) and in firstprivate
> > clause on omp parallel and as the high bound of _Cilk_for.
> 
> Hi Jakub,
> 	First off, I am sorry for the delay in responding. I had a couple of
> family issues to deal with that tied me up for the past week.
> 	Yes, I agree that it looks weird in the intermediate code. I tried to
> move the operator-(...) value into a variable (in the function
> handle_omp_for_class_iterator), and it gave me an ICE during gimplification.
> The issue was scoping: the if clause sits on the #pragma omp parallel,
> but when I move the operator- into a variable, that variable ends up inside
> the try block (marked by <==== HERE =====). Also, this only happens when we
> deal with iterators and overloaded operators. In all the other cases the
> intermediate code looks OK. This only shows up in the gimple and original
> dumps; all the intermediate dumps after that (>= .012) look OK. Since this
> is a specific case, can you accept this as a workaround for now?
> 
> Thanks,
> 
> Balaji V. Iyer.
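
For reference, the shape Jakub asks for in the quoted review can be modelled
in plain C++: call operator- once, before the parallel region, keep the result
in a scalar temporary, and use that temporary both for the if-clause value
(temp /[cl] 2, where /[cl] is ceiling division in GCC dumps) and as the upper
bound of the loop. This is only a serial sketch of the idea under those
assumptions; it is not what the patch emits, and hoisted_loop and ceil_div
are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ceiling division for a positive step; non-positive distances give zero
// iterations.  This mirrors the "/[cl]" operator shown in the dump.
static std::ptrdiff_t ceil_div(std::ptrdiff_t n, std::ptrdiff_t step)
{
  return n <= 0 ? 0 : (n + step - 1) / step;
}

// Serial model of:
//   _Cilk_for (It i = begin; i < end; i += step) visit (i);
// Returns the iteration count, i.e. the value the if clause would test.
template <typename It, typename Visit>
std::ptrdiff_t hoisted_loop(It begin, It end, std::ptrdiff_t step, Visit visit)
{
  // The only call to operator-, evaluated before the (conceptual) parallel
  // region and kept in a scalar temporary.
  const std::ptrdiff_t diff = end - begin;

  // Value for the if clause: diff /[cl] step.
  const std::ptrdiff_t count = ceil_div(diff, step);

  // The loop runs over offsets with `diff` as its upper bound, matching
  // "D.2864 < retval.0" in the review instead of "D.2864 < 0".
  for (std::ptrdiff_t off = 0; off < diff; off += step)
    visit(begin + off);

  return count;
}
```

With a five-element std::vector and a step of 2, the loop visits offsets
0, 2 and 4, and the returned count is 3, which is the same number the
ceiling division gives without walking the range.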

Thread overview: 26+ messages
2014-01-27 20:41 [PING] [PATCH] _Cilk_for for C and C++ Iyer, Balaji V
2014-01-27 20:53 ` Jakub Jelinek
2014-01-27 21:36   ` Iyer, Balaji V
2014-01-28 16:55     ` Iyer, Balaji V
2014-01-29 11:31       ` Jakub Jelinek
2014-01-29 15:54         ` Iyer, Balaji V
2014-01-31 15:39           ` Iyer, Balaji V
2014-02-05  5:27         ` Iyer, Balaji V
2014-02-07 14:02           ` Jakub Jelinek
2014-02-07 14:33             ` Iyer, Balaji V
2014-02-07 14:53               ` Jakub Jelinek
2014-02-07 22:14                 ` Iyer, Balaji V
2014-02-10 17:57                   ` Jakub Jelinek
2014-02-10 22:07                     ` Iyer, Balaji V
2014-02-12 14:59                       ` Jakub Jelinek
2014-02-12 15:14                         ` Iyer, Balaji V
2014-02-12 15:28                           ` Jakub Jelinek
2014-02-12 17:05                             ` Iyer, Balaji V
2014-02-12 17:09                               ` Jakub Jelinek
2014-02-12 17:15                                 ` Iyer, Balaji V
2014-02-17  6:42                                 ` Iyer, Balaji V
2014-02-19  4:43                                   ` Iyer, Balaji V
2014-02-19 11:24                                     ` Jakub Jelinek
2014-02-21  4:38                                       ` Iyer, Balaji V
2014-02-24 23:16                                         ` Iyer, Balaji V
     [not found]                                           ` <BF230D13CA30DD48930C31D4099330003A4D2123@FMSMSX101.amr.corp.intel.com>
     [not found]                                             ` <20140306115443.GC22862@tucnak.redhat.com>
2014-03-20 21:19                                               ` FW: " Iyer, Balaji V
