Name _______________________________________ LL61 Professor McNally

Writing Assignment: Research Paper

Paper Due: Monday, August 24, 2020

For this writing assignment, you will combine the writing skills you have been practicing with the research you have completed on your forms to write a 5-paragraph paper in APA style. Your thesis will state the three reasons (or arguments) that support your paper. You will need to quote from four sources in your essay. The format will be as follows:

I. Introduction

1. Use the background information from your forms for your introduction, revising as needed.

2. Include at least one quotation in APA style.

3. End with your three-part thesis statement from Form 3.

II. Body Paragraph 1 (Use Form 3!) Increase in cost of living

1. Topic sentence about first thesis point (no source information)

2. Explain your view with specific supporting details and evidence.

3. Include one quotation and one paraphrase cited in APA style.

4. Closing sentence in your own words (no source information)

III. Body Paragraph 2 (repeat Body Paragraph 1 for second thesis point)

IV. Body Paragraph 3 (repeat Body Paragraph 1 for third thesis point)

V. Conclusion

1. Rephrase your thesis.

2. Include at least one quotation in APA style.

3. Explain your solution or plan of action for the thesis.

APA Style: Your essay must be written in APA style and must be five paragraphs long. To pass, the essay must include the following:

1. An APA-style cover page

2. APA-style in-text citations for all quotations and paraphrases

3. An APA-style References page (revise Form 3 and add any extra sources)

If you have any questions, let me know!

LAB 1: OpenMP (in C++)

 

Implementation Due: 08 / 21 / 2020 - 11:55 PM

 

Parallel Prewitt Filter Edge Detection algorithm.

Implement a Prewitt filter edge detection algorithm in C++ using OpenMP threads. The algorithm takes a grayscale image as input and outputs an image showing the edge outlines.

You will be provided with a skeleton program to give you a head start on the solution. The solution must work as follows:

1.     Read the command-line arguments to get the names of the input and output images (.pgm files).

2.     Read the image file into a 2-D array.

3.     Generate the two 3x3 Prewitt masks, one for each dimension.

4.     Implement the Prewitt filter algorithm for an entire image, exposing parallelism.

a.     Using both static and dynamic scheduling

5.     Generate the output image.

A sample input (left) and its corresponding output (right) are shown below:

[Figure: sample input image (left) and its edge-detected output image (right)]

The chunk size for both parts below is passed as a command-line argument.

 

PGM images are 2-D matrices in which each element represents a pixel with a value from 0 to 255. When the file is opened as text, the first line gives the format, ‘P2’. The second line gives the size of the image in pixels (e.g., 250x360), and the third line gives the shade range, i.e., the maximum gray value (255 in our test cases).
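Reading and writing this ASCII (P2) format can be sketched as follows, assuming the header consists of the magic number, then width and height (the file stores the width first, separated from the height by whitespace), then the maximum gray value, followed by the pixel values in row-major order. The `GrayImage` struct and the `readPGM`/`writePGM` names are hypothetical, not the skeleton's API; the reader also skips `#` comment lines, which some image tools insert into the header.

```cpp
#include <istream>
#include <ostream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for a P2 image; pixels[row][col] holds values 0..maxVal.
struct GrayImage {
    int width = 0;
    int height = 0;
    int maxVal = 255;
    std::vector<std::vector<int>> pixels;
};

// Returns the next whitespace-separated token, skipping '#' comments to end of line.
static std::string nextToken(std::istream& in) {
    std::string tok;
    while (in >> tok) {
        if (tok[0] == '#') {
            std::string rest;
            std::getline(in, rest);  // discard the remainder of the comment line
            continue;
        }
        return tok;
    }
    throw std::runtime_error("unexpected end of PGM data");
}

GrayImage readPGM(std::istream& in) {
    if (nextToken(in) != "P2") throw std::runtime_error("expected an ASCII (P2) PGM file");
    GrayImage img;
    img.width = std::stoi(nextToken(in));   // the header stores width first...
    img.height = std::stoi(nextToken(in));  // ...then height
    img.maxVal = std::stoi(nextToken(in));
    img.pixels.assign(img.height, std::vector<int>(img.width, 0));
    for (int r = 0; r < img.height; ++r)
        for (int c = 0; c < img.width; ++c)
            img.pixels[r][c] = std::stoi(nextToken(in));
    return img;
}

void writePGM(std::ostream& out, const GrayImage& img) {
    out << "P2\n" << img.width << ' ' << img.height << '\n' << img.maxVal << '\n';
    for (int r = 0; r < img.height; ++r)
        for (int c = 0; c < img.width; ++c)
            out << img.pixels[r][c] << (c + 1 < img.width ? ' ' : '\n');
}
```

With files, the same functions take a `std::ifstream` opened on the input name and a `std::ofstream` opened on the output name from the command line.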

 

 

 

Prewitt Filter Algorithm:

The algorithm applies the 3x3 masks to each pixel's neighborhood so that pixels whose neighbors have similar values are mapped to values around 0 (black), and pixels whose neighbors differ significantly in value are mapped to values around 255 (white).

 

3x3 Mask for X-direction: [ +1   0 -1; +1 0 -1; +1 0 -1 ]

3x3 Mask for Y-direction: [ +1 +1 +1;  0 0  0; -1 -1 -1 ]

For each pixel p(X,Y) in the image (except the image boundaries):

            Multiply the pixels surrounding p(X,Y) by the corresponding mask values (I,J) and add the products together to compute one gradient per dimension:

                        for -1 <= I <= 1 and -1 <= J <= 1:

                        grad_X += Image(X+I, Y+J) * maskX(I,J)

                        grad_Y += Image(X+I, Y+J) * maskY(I,J)

            grad = sqrt( (grad_X * grad_X) + (grad_Y * grad_Y) )

/* NOTE: sqrt() is a C++ standard library function declared in <cmath> */

            If grad < 0 then grad = 0. If grad > 255 then grad = 255.

            Write grad into the output image
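The per-pixel steps above can be written as a small function. This is a sketch assuming the image is stored as `std::vector<std::vector<int>>` indexed `image[x][y]` (row, column), matching the serial code below; the name `prewittPixel` is hypothetical, and the masks are indexed `[I+1][J+1]` so that offsets -1..1 map to array indices 0..2.

```cpp
#include <cmath>
#include <vector>

// Prewitt masks indexed [I+1][J+1] for offsets I, J in {-1, 0, 1}.
static const int kMaskX[3][3] = {{+1, 0, -1}, {+1, 0, -1}, {+1, 0, -1}};
static const int kMaskY[3][3] = {{+1, +1, +1}, {0, 0, 0}, {-1, -1, -1}};

// Returns the clamped Prewitt gradient magnitude at pixel (x, y).
// x is the row index and y the column index; boundary pixels return 0.
int prewittPixel(const std::vector<std::vector<int>>& image, int x, int y) {
    const int height = static_cast<int>(image.size());
    const int width = height > 0 ? static_cast<int>(image[0].size()) : 0;
    if (x <= 0 || y <= 0 || x >= height - 1 || y >= width - 1) return 0;

    int gradX = 0, gradY = 0;
    for (int i = -1; i <= 1; ++i) {
        for (int j = -1; j <= 1; ++j) {
            gradX += image[x + i][y + j] * kMaskX[i + 1][j + 1];
            gradY += image[x + i][y + j] * kMaskY[i + 1][j + 1];
        }
    }
    int grad = static_cast<int>(std::sqrt(static_cast<double>(gradX * gradX + gradY * gradY)));
    if (grad < 0) grad = 0;      // cannot happen after sqrt; mirrors the pseudocode
    if (grad > 255) grad = 255;  // clamp to the 8-bit gray range
    return grad;
}
```

For example, a 3x3 neighborhood whose columns are 10, 10, 20 gives grad_X = 3 * (10 - 20) = -30 and grad_Y = 0, so the output pixel is 30; a column step from 0 to 255 gives 765, which is clamped to 255.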

 

Serial Algorithm Code:

/* 3x3 Prewitt mask for X Dimension. */
maskX[0][0] = +1; maskX[0][1] = 0; maskX[0][2] = -1;
maskX[1][0] = +1; maskX[1][1] = 0; maskX[1][2] = -1;
maskX[2][0] = +1; maskX[2][1] = 0; maskX[2][2] = -1;

/* 3x3 Prewitt mask for Y Dimension. */
maskY[0][0] = +1; maskY[0][1] = +1; maskY[0][2] = +1;
maskY[1][0] =  0; maskY[1][1] =  0; maskY[1][2] =  0;
maskY[2][0] = -1; maskY[2][1] = -1; maskY[2][2] = -1;

for( int x = 0; x < height; ++x ){
    for( int y = 0; y < width; ++y ){
        int grad_x = 0;
        int grad_y = 0;
        int grad = 0;

        /* For handling image boundaries: border pixels are set to 0 */
        if( x == 0 || x == (height-1) || y == 0 || y == (width-1) )
            grad = 0;
        else{
            /* Gradient calculation in X Dimension */
            for( int i = -1; i <= 1; i++ ){
                for( int j = -1; j <= 1; j++ ){
                    grad_x += (inputImage[x+i][y+j] * maskX[i+1][j+1]);
                }
            }
            /* Gradient calculation in Y Dimension */
            for( int i = -1; i <= 1; i++ ){
                for( int j = -1; j <= 1; j++ ){
                    grad_y += (inputImage[x+i][y+j] * maskY[i+1][j+1]);
                }
            }
            /* Gradient magnitude (sqrt() is declared in <cmath>) */
            grad = (int) sqrt( (grad_x * grad_x) + (grad_y * grad_y) );
        }
        /* Clamp the result to the valid pixel range [0, 255] */
        if( grad < 0 )   grad = 0;
        if( grad > 255 ) grad = 255;
        outputImage[x][y] = grad;
    }
}

 

 

Part A1:

Use static loop scheduling with an OpenMP parallel for to perform the edge detection. For each chunk, record which thread processed it and the row at which the chunk starts (Example: Task 1 -> Processing Chunk starting at Row 50). Store the thread IDs and starting rows in an array and print it after the output has been written to the PGM file.
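One way to record the chunk information without doing I/O inside the parallel region is sketched below, assuming the image is stored as `std::vector<std::vector<int>>` and that `chunk >= 1`. Chunks are blocks of `chunk` consecutive rows, so chunk k starts at row `k * chunk`; the thread that executes a row with `x % chunk == 0` is the owner of that chunk, and each slot of the array is written by exactly one thread. The names `prewittStatic` and `printChunks` are hypothetical, and the `#ifdef _OPENMP` fallback only lets the sketch build without `-fopenmp`.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num() { return 0; }  // serial fallback when built without -fopenmp
#endif

using Image = std::vector<std::vector<int>>;

// Runs the Prewitt filter with schedule(static, chunk). Returns, for each chunk k
// (starting at row k*chunk), the ID of the thread that processed it. Assumes chunk >= 1.
std::vector<int> prewittStatic(const Image& in, Image& out, int chunk) {
    const int height = static_cast<int>(in.size());
    const int width = height > 0 ? static_cast<int>(in[0].size()) : 0;
    const int maskX[3][3] = {{+1, 0, -1}, {+1, 0, -1}, {+1, 0, -1}};
    const int maskY[3][3] = {{+1, +1, +1}, {0, 0, 0}, {-1, -1, -1}};
    out.assign(height, std::vector<int>(width, 0));
    std::vector<int> chunkOwner((height + chunk - 1) / chunk, -1);

    #pragma omp parallel for schedule(static, chunk)
    for (int x = 0; x < height; ++x) {
        if (x % chunk == 0) chunkOwner[x / chunk] = omp_get_thread_num();  // one writer per chunk
        for (int y = 0; y < width; ++y) {
            if (x == 0 || y == 0 || x == height - 1 || y == width - 1) continue;  // border stays 0
            int gx = 0, gy = 0;
            for (int i = -1; i <= 1; ++i) {
                for (int j = -1; j <= 1; ++j) {
                    gx += in[x + i][y + j] * maskX[i + 1][j + 1];
                    gy += in[x + i][y + j] * maskY[i + 1][j + 1];
                }
            }
            const int g = static_cast<int>(std::sqrt(static_cast<double>(gx * gx + gy * gy)));
            out[x][y] = g > 255 ? 255 : g;  // clamp; g is never negative after sqrt
        }
    }
    return chunkOwner;
}

// Call after the output PGM has been written; no I/O happens inside the parallel region.
void printChunks(const std::vector<int>& chunkOwner, int chunk) {
    for (std::size_t k = 0; k < chunkOwner.size(); ++k)
        std::printf("Task %d -> Processing Chunk starting at Row %zu\n", chunkOwner[k], k * chunk);
}
```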

 

Part A2:

Use dynamic loop scheduling with an OpenMP parallel for to perform the edge detection. OpenMP uses the chunk size to define the number of consecutive iterations that a thread takes and executes at a time. All chunks are the same size, except for the last chunk, which may be smaller. Again, print which thread processed each chunk and the row at which that chunk starts.
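A sketch of the dynamic-scheduling variant under the same assumptions (image as `std::vector<std::vector<int>>`, `chunk >= 1`; the names `prewittDynamic`, `prewittRow`, and `ChunkInfo` are hypothetical). Because every chunk except the last contains exactly `chunk` consecutive iterations, chunk starts still fall on multiples of `chunk`; what changes from run to run is which thread picks up each chunk. The thread ID and starting row are stored together so they can be printed after the output file is written.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num() { return 0; }  // serial fallback when built without -fopenmp
#endif

using Image = std::vector<std::vector<int>>;

struct ChunkInfo { int thread; int startRow; };

// Applies the Prewitt filter to interior row x of `in`; border pixels of `out` stay 0.
static void prewittRow(const Image& in, Image& out, int x) {
    static const int maskX[3][3] = {{+1, 0, -1}, {+1, 0, -1}, {+1, 0, -1}};
    static const int maskY[3][3] = {{+1, +1, +1}, {0, 0, 0}, {-1, -1, -1}};
    const int height = static_cast<int>(in.size());
    const int width = static_cast<int>(in[x].size());
    if (x == 0 || x == height - 1) return;
    for (int y = 1; y < width - 1; ++y) {
        int gx = 0, gy = 0;
        for (int i = -1; i <= 1; ++i) {
            for (int j = -1; j <= 1; ++j) {
                gx += in[x + i][y + j] * maskX[i + 1][j + 1];
                gy += in[x + i][y + j] * maskY[i + 1][j + 1];
            }
        }
        const int g = static_cast<int>(std::sqrt(static_cast<double>(gx * gx + gy * gy)));
        out[x][y] = g > 255 ? 255 : g;
    }
}

// schedule(dynamic, chunk): an idle thread takes the next block of `chunk` rows.
// Records who processed each chunk and where it starts. Assumes chunk >= 1.
std::vector<ChunkInfo> prewittDynamic(const Image& in, Image& out, int chunk) {
    const int height = static_cast<int>(in.size());
    const std::size_t width = height > 0 ? in[0].size() : 0;
    out.assign(height, std::vector<int>(width, 0));
    std::vector<ChunkInfo> chunks((height + chunk - 1) / chunk, ChunkInfo{-1, -1});

    #pragma omp parallel for schedule(dynamic, chunk)
    for (int x = 0; x < height; ++x) {
        if (x % chunk == 0) chunks[x / chunk] = ChunkInfo{omp_get_thread_num(), x};
        prewittRow(in, out, x);
    }
    return chunks;
}

// Call after the output PGM has been written.
void printChunks(const std::vector<ChunkInfo>& chunks) {
    for (const ChunkInfo& c : chunks)
        std::printf("Task %d -> Processing Chunk starting at Row %d\n", c.thread, c.startRow);
}
```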

 

Part B (Analysis):

Determine the performance of your implementation of the Prewitt filter algorithm on an Openlab system with at least 4 cores. Be careful when timing your code: time only the image processing, not reading or writing the image files.

Use the omp_get_wtime() function to time your implementation; add #include <omp.h> to the source file and the -fopenmp flag when compiling. Print the timing before the program ends.
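A minimal timing sketch, assuming the processing step can be wrapped in a callable; `timeSeconds` is a hypothetical helper. Only the filter loop goes inside the timed region, so reading and writing the PGM files are excluded. The `std::chrono` fallback exists only so the sketch also compiles without `-fopenmp`; with `-fopenmp`, `omp_get_wtime()` from `<omp.h>` is used.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
#include <chrono>
// Fallback so the sketch also builds without -fopenmp; omp_get_wtime() is used otherwise.
static double omp_get_wtime() {
    using namespace std::chrono;
    return duration<double>(steady_clock::now().time_since_epoch()).count();
}
#endif

// Returns the wall-clock seconds spent in work(). Only the processing step should be
// passed in; reading and writing the PGM files stay outside the timed region.
template <typename Work>
double timeSeconds(Work&& work) {
    const double start = omp_get_wtime();
    work();
    return omp_get_wtime() - start;
}
```

For example, `double elapsed = timeSeconds([&] { /* run the Prewitt loop here */ });` followed by `std::printf("Processing time: %f s\n", elapsed);` just before the program exits.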

 

Report the performance for all 5 test cases with 2, 4, and 8 threads, using chunk sizes of 24 and 65.

 

A work-sharing construct:

 

#pragma omp parallel for schedule(type[, chunk]) private(list) shared(list)

 

            Use schedule type static for Part A1 and dynamic for Part A2. Each variable in the private list gets its own copy in every thread; each variable in the shared list is a single copy accessed by all threads.
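A small illustration of how the clauses behave, using a hypothetical per-row task (`rowSums`) rather than the Prewitt filter itself: `rowSum` is listed as private, so each thread gets its own uninitialized copy that must be set inside the loop, while `sums` is listed as shared and every iteration writes a distinct element of it. Variables not listed, such as `image`, are shared by default, and the loop index `x` is private automatically.

```cpp
#include <vector>

// Sums each row of the image. rowSum is private: every thread gets its own
// uninitialized copy. sums is shared: one vector that all threads write into.
// image is not listed, so it is shared by default.
std::vector<long> rowSums(const std::vector<std::vector<int>>& image, int chunk) {
    const int height = static_cast<int>(image.size());
    std::vector<long> sums(image.size(), 0L);
    long rowSum = 0;
    #pragma omp parallel for schedule(static, chunk) private(rowSum) shared(sums)
    for (int x = 0; x < height; ++x) {
        rowSum = 0;                        // private copies are not initialized, so reset here
        for (int v : image[x]) rowSum += v;
        sums[x] = rowSum;                  // each iteration writes a distinct element: no race
    }
    return sums;
}
```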

 

Part C (Parallel Quicksort Algorithm using OpenMP tasks):

Implement the Quicksort algorithm to sort an array of 10K elements of type ‘int’.

The array must be initialized using a random number generator.

 

Use OpenMP tasks to parallelize the Quicksort algorithm.

 

If allocating the array statically causes a segmentation fault, allocate it dynamically instead.

 

There will be no code skeleton provided for this part.

HINT: You may need OpenMP (OMP) directives such as omp task and omp taskwait.

HINT: You may also need OMP clauses such as shared and firstprivate.
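A sketch of one common task-based design, assuming a Lomuto partition and a hypothetical cutoff of 1000 elements below which the recursion runs without creating tasks, to limit task overhead. One thread starts the recursion inside `omp single`, the other threads in the team execute the generated tasks, and `omp taskwait` makes each call wait for its two subtasks. The array lives in a `std::vector`, which allocates on the heap; the seed and value range are arbitrary, and all function names are hypothetical.

```cpp
#include <random>
#include <utility>
#include <vector>

static const int kTaskCutoff = 1000;  // hypothetical: smaller ranges are sorted without tasks

// Lomuto partition around the last element; returns the pivot's final index.
static int lomutoPartition(int* a, int lo, int hi) {
    const int pivot = a[hi];
    int i = lo;
    for (int j = lo; j < hi; ++j)
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    std::swap(a[i], a[hi]);
    return i;
}

// Sorts a[lo..hi] (inclusive); large halves become OpenMP tasks.
static void quicksortTasks(int* a, int lo, int hi) {
    if (lo >= hi) return;
    int p = lomutoPartition(a, lo, hi);
    if (hi - lo < kTaskCutoff) {  // small range: plain recursion, no task overhead
        quicksortTasks(a, lo, p - 1);
        quicksortTasks(a, p + 1, hi);
        return;
    }
    #pragma omp task firstprivate(a, lo, p)
    quicksortTasks(a, lo, p - 1);
    #pragma omp task firstprivate(a, hi, p)
    quicksortTasks(a, p + 1, hi);
    #pragma omp taskwait  // both halves must finish before this call returns
}

// One thread starts the recursion; the rest of the team picks up tasks.
void parallelQuicksort(int* a, int n) {
    #pragma omp parallel
    {
        #pragma omp single
        quicksortTasks(a, 0, n - 1);
    }
}

// Fills a heap-allocated array of n ints with pseudo-random values.
std::vector<int> randomArray(int n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::vector<int> v(n);
    for (int& x : v) x = dist(gen);
    return v;
}
```

Usage: `std::vector<int> data = randomArray(10000, 42u); parallelQuicksort(data.data(), static_cast<int>(data.size()));`.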

 

NOTE:

·      Skeletons and the dataset for the lab are available on Canvas.

Compiling and Running OpenMP program:

To compile, you must use gcc/8.2.0, which is available from the module system.

 

$ module load gcc/8.2.0

 

To compile, use the -std=c++11 and -fopenmp flags like so:

 

$ g++ -std=c++11 -fopenmp Implementation.cpp -o Implementation

 

It is highly recommended to use the -Wall compilation flag to detect errors early, like so:

 

$ g++ -Wall -std=c++11 -fopenmp Implementation.cpp -o Implementation

 

To check for data races in your code, use the following flags (more about them in the discussion):

 

$ g++ -Wall -std=c++11 -fopenmp Implementation.cpp -o Implementation -fsanitize=thread -fPIE -pie -g

 

If ThreadSanitizer detects a possible data race in the code, it prints a warning when the program runs.
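As an illustration of the kind of race these flags target: updating one shared counter from every iteration of a parallel loop is a data race. The sketch below shows a race-free version using OpenMP's `reduction` clause, which the assignment does not mention and which is shown only as one possible fix; the bright-pixel counting task and the name `countBright` are hypothetical.

```cpp
#include <vector>

// Counts pixels brighter than `threshold` in parallel without a data race.
long countBright(const std::vector<int>& pixels, int threshold) {
    long count = 0;
    const long n = static_cast<long>(pixels.size());
    // Without reduction(+ : count), all threads would update the single shared
    // `count`, which is the kind of conflicting access -fsanitize=thread reports.
    #pragma omp parallel for reduction(+ : count)
    for (long i = 0; i < n; ++i) {
        if (pixels[i] > threshold) count += 1;  // each thread updates its own private copy
    }
    return count;  // the private copies are summed into count when the loop ends
}
```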

 

To set thread affinity so that each thread runs on a different core or processor, use the following environment variables:

 

More about this in the discussion.

 

To set number of threads:

 

-   OMP_NUM_THREADS

 

To bind threads to specific CPUs when compiling with GCC:

 

-   GOMP_CPU_AFFINITY

The variable should contain a space-separated or comma-separated list of CPUs. This list may contain different kinds of entries: either single CPU numbers in any order, a range of CPUs (M-N) or a range with some stride (M-N:S). CPU numbers start at zero. For example, GOMP_CPU_AFFINITY="0 3 1-2 4-15:2" will bind the initial thread to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12, and 14 respectively and then start assigning back from the beginning of the list.  GOMP_CPU_AFFINITY=0 binds all threads to CPU 0.

Examples:

 

## Bind threads 0,2 to cores 0,1 and threads 1,3 to cores 8,9.

$ export OMP_NUM_THREADS=4

$ export GOMP_CPU_AFFINITY=0,8,1,9

 

## Bind threads 0,1,2,3 to cores 8,9,10,11 respectively.

$ export OMP_NUM_THREADS=4

$ export GOMP_CPU_AFFINITY=8-15
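To check that a binding took effect, a small program can report which CPU each thread is running on. This sketch assumes Linux with glibc, where `sched_getcpu()` is available (g++ defines `_GNU_SOURCE` by default, which exposes it); `cpuPerThread` and `printCpus` are hypothetical names. Run it after exporting `OMP_NUM_THREADS` and `GOMP_CPU_AFFINITY` as in the examples above.

```cpp
#include <cstddef>
#include <cstdio>
#include <sched.h>  // sched_getcpu(): glibc/Linux only (assumption)
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num() { return 0; }   // serial fallback without -fopenmp
static int omp_get_max_threads() { return 1; }
#endif

// Returns, for each OpenMP thread, the CPU it was running on when sampled
// (-1 for slots of threads that did not take part in the region).
std::vector<int> cpuPerThread() {
    std::vector<int> cpus(omp_get_max_threads(), -1);
    #pragma omp parallel
    {
        cpus[omp_get_thread_num()] = sched_getcpu();  // each thread writes its own slot
    }
    return cpus;
}

// Printed after the parallel region, not inside it.
void printCpus(const std::vector<int>& cpus) {
    for (std::size_t t = 0; t < cpus.size(); ++t)
        std::printf("Thread %zu runs on CPU %d\n", t, cpus[t]);
}
```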

 

 

Useful OpenMP Functions:

-       Use the OpenMP function omp_get_num_threads() to get the number of threads in use, but call it inside a parallel region; outside a parallel region it returns 1.

-       To get the identifier of a running thread, use the OpenMP function omp_get_thread_num(). The function returns an integer. You can assign the returned integer to a variable var_thread_id and later use the variable to control the execution flow of your program, e.g., if ( var_thread_id == 0 ){} else {}.

-       You may need to create variables private to an OpenMP thread (e.g., var_thread_id). First declare the var_thread_id variable, then list it in a private clause: #pragma omp parallel for private(var_thread_id). This way, each thread has its own copy, and you do not need to worry about threads overwriting var_thread_id.

-       Do not use I/O statements inside the parallel region.
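The first two points can be checked with a short sketch (hypothetical `teamSizes` and `TeamInfo`): `omp_get_num_threads()` returns 1 when called outside a parallel region and the team size when called inside one, and a variable declared inside the region, such as `var_thread_id` here, is automatically private to each thread. The results are stored and can be printed after the region rather than inside it.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_num_threads() { return 1; }  // serial fallback without -fopenmp
static int omp_get_thread_num() { return 0; }
#endif

struct TeamInfo { int outside; int inside; };

// Records the thread count seen outside and inside a parallel region.
TeamInfo teamSizes() {
    TeamInfo info{omp_get_num_threads(), 0};  // outside any parallel region: 1
    #pragma omp parallel
    {
        int var_thread_id = omp_get_thread_num();  // declared inside the region, so private
        if (var_thread_id == 0) info.inside = omp_get_num_threads();  // only thread 0 writes
    }
    return info;  // print here, after the parallel region
}
```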

 

 

Submit via Canvas in 3 parts. YOU MUST USE the file names below:

1.     Part A1 and A2 in the same ImplementationA.cpp file.

2.     Part B as a PDF file named Analysis.pdf

3.     Part C in ImplementationC.cpp file.

 

 

Point Breakdown:

Part A Implementation: 40 pts, Part B Analysis: 10 pts.

Part C Implementation: 50 pts.

 

USEFUL LINKS:

http://en.wikipedia.org/wiki/OpenMP  (very easy and useful)

https://computing.llnl.gov/tutorials/openMP/  (much more complex but complete)

http://openmp.org/mp-documents/omp-hands-on-SC08.pdf  (very good tutorial)

http://openmp.org/mp-documents/OpenMP4.0.0.Examples.pdf  (very detailed examples)

 

NOTE: The TA cannot solve the exercises for you, but he can answer your specific questions!
