clEnqueueNDRangeKernel with work dimension=2

I am writing code to add two matrices of size 1024*1024 each.
So my work dimension has to be 2 and the global work size should be 1024*1024. I want to set the size of each work-group to 64*64. How do I achieve that?
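
As far as I understand, with work_dim = 2 the sizes are passed as arrays with one entry per dimension rather than as a single product. Here is a minimal sketch of what I think those arrays look like. The helper `sizes_are_usable` and the names `queue` and `kernel` are my own placeholders, not part of OpenCL. I am also assuming OpenCL 1.x rules, where each global size must be a multiple of the matching local size, and that the device limit (CL_DEVICE_MAX_WORK_GROUP_SIZE) may well be smaller than the 4096 work-items a 64*64 group needs.

```c
#include <stddef.h>

/* Hypothetical helper (not an OpenCL function): returns 1 when every global
   size is a multiple of the matching local size and the number of work-items
   per group does not exceed max_group_size, otherwise 0.
   max_group_size stands in for the value the device reports as
   CL_DEVICE_MAX_WORK_GROUP_SIZE. */
int sizes_are_usable(const size_t *global, const size_t *local,
                     unsigned work_dim, size_t max_group_size)
{
    size_t items_per_group = 1;
    for (unsigned d = 0; d < work_dim; ++d) {
        if (local[d] == 0 || global[d] % local[d] != 0)
            return 0;
        items_per_group *= local[d];
    }
    return items_per_group <= max_group_size;
}

/* One entry per dimension, not a single product. */
const size_t global_work_size[2] = { 1024, 1024 };
const size_t local_work_size[2]  = { 64, 64 };   /* 64 * 64 = 4096 items per group */

/* The arrays would then be passed like this (queue and kernel are
   placeholder names for objects created elsewhere):
   clEnqueueNDRangeKernel(queue, kernel, 2, NULL,
                          global_work_size, local_work_size, 0, NULL, NULL); */
```

If the device reported a limit of 1024, for example, `sizes_are_usable(global_work_size, local_work_size, 2, 1024)` would return 0, and a smaller group such as 16*16 or 32*32 would be needed instead.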

So my call should be something like:

clEnqueueNDRangeKernel(cl_command_queue command_queue, cl_kernel kernel, cl_uint work_dim,
                       const size_t *global_work_offset, const size_t *global_work_size,
                       const size_t *local_work_size, cl_uint num_events_in_wait_list,
                       const cl_event *event_wait_list, cl_event *event)

where work_dim=2, global_work_size=1024*1024 and local_work_size=64*64.
How do I obtain the index of each individual matrix element in my kernel code?
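
To make the question concrete, here is a sketch of the indexing I have in mind. The kernel in the comment is only my guess at how the element would be addressed with get_global_id. The C function below it is a plain CPU emulation of the same 2D index space, assuming a zero global offset, a square n*n matrix stored row by row, and n being a multiple of the group side. The names `mat_add` and `mat_add_emulated` are mine.

```c
#include <stddef.h>

/* The kernel I have in mind (OpenCL C, shown as a comment only):
 *
 *   __kernel void mat_add(__global const float *a, __global const float *b,
 *                         __global float *c, const int width)
 *   {
 *       int col = get_global_id(0);
 *       int row = get_global_id(1);
 *       int idx = row * width + col;
 *       c[idx] = a[idx] + b[idx];
 *   }
 */

/* Plain-C emulation of the 2D NDRange: the outer two loops walk the
   work-groups, the inner two loops walk the work-items inside a group.
   With a zero offset, global id = group id * local size + local id. */
void mat_add_emulated(const float *a, const float *b, float *c,
                      size_t n, size_t local)
{
    for (size_t group_y = 0; group_y < n / local; ++group_y)
        for (size_t group_x = 0; group_x < n / local; ++group_x)
            for (size_t local_y = 0; local_y < local; ++local_y)
                for (size_t local_x = 0; local_x < local; ++local_x) {
                    size_t col = group_x * local + local_x; /* get_global_id(0) */
                    size_t row = group_y * local + local_y; /* get_global_id(1) */
                    size_t idx = row * n + col;
                    c[idx] = a[idx] + b[idx];
                }
}
```

For the matrices in my case this would correspond to `mat_add_emulated(a, b, c, 1024, 64)`, with every element of the result written exactly once.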

