Duplicate code

Duplicate code is a computer programming term for a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity. Duplicate code is generally considered undesirable for a number of reasons.[1] A minimum size is usually required before a sequence of code is considered duplicate rather than coincidentally similar. Sequences of duplicate code are sometimes known as code clones or just clones; the automated process of finding duplications in source code is called clone detection.

The following are some of the ways in which two code sequences can be duplicates of each other:

How duplicates are created

There are a number of reasons why duplicate code may be created, including:

https://www.researchgate.net/publication/305992168_Clones_Clustering_Using_K-Means?ev=prf_pub

Problems associated with duplicate code

Inappropriate code duplication generally makes editing more difficult because it adds unnecessary complexity and length. This may lead to increased maintenance costs, more human error, forgotten or overlooked pieces of code, and greater file size, and it may indicate a sloppy design. Small differences between clones can indicate that a fault was fixed in one copy but not in the others, which has led to the hypothesis that such inconsistent clones are related to faults. This, however, is still debated in the scientific community. Further factors, such as the developers' awareness of clones, probably play a role in this relationship.[4] Appropriate code duplication may occur for many reasons, including facilitating the development of a device driver for a device that is similar to some existing device.[5]
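
The inconsistent-clone risk described above can be illustrated with a small sketch. Both function names and the scenario are hypothetical: one copy received an empty-input guard as a fault fix, while its pasted clone was overlooked.

```c
#include <stddef.h>

/* Original copy: a later fault fix added the empty-input guard. */
int average_readings(const int *values, size_t count) {
    if (count == 0)              /* fix: avoid division by zero */
        return 0;
    int sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += values[i];
    return sum / (int)count;
}

/* Pasted clone (hypothetical): identical except that the guard was never
   propagated, so calling it with count == 0 still divides by zero. */
int average_scores(const int *values, size_t count) {
    int sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += values[i];
    return sum / (int)count;
}
```

The two functions behave identically for non-empty input, which is why a missed fix of this kind can go unnoticed until the edge case occurs.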

Detecting duplicate code

A number of different algorithms have been proposed to detect duplicate code. For example:
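
As a deliberately naive illustration, and not one of the published algorithms, exact textual clones can be found by comparing runs of consecutive lines. The sketch below assumes the source has already been split into an array of line strings; the function name find_clone and the min_lines threshold are hypothetical, chosen to mirror the minimum-size requirement mentioned earlier.

```c
#include <stddef.h>
#include <string.h>

/* Searches lines[0..n-1] for two non-overlapping runs of identical lines.
   Returns the length of the first run pair found that is at least
   min_lines long and stores the two start indices in *first and *second.
   Returns 0 if no such pair exists. Quadratic in n: a sketch only. */
size_t find_clone(const char *const lines[], size_t n, size_t min_lines,
                  size_t *first, size_t *second)
{
    if (min_lines == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            size_t len = 0;
            /* Extend the match while the lines agree and the first run
               does not grow into the start of the second. */
            while (j + len < n && i + len < j &&
                   strcmp(lines[i + len], lines[j + len]) == 0)
                len++;
            if (len >= min_lines) {
                *first = i;
                *second = j;
                return len;
            }
        }
    }
    return 0;
}
```

Because the comparison is exact, renaming a single variable is enough to hide a clone from this sketch; it only shows the basic idea of a size threshold applied to matching sequences.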

Example of functionally duplicate code

Consider the following code snippet for calculating the average of an array of integers:

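extern int array_a[];
extern int array_b[];

/* Sum and average the first four elements of array_a. */
int sum_a = 0;

for (int i = 0; i < 4; i++)
   sum_a += array_a[i];

int average_a = sum_a / 4;

/* The same logic repeated for array_b: only the names differ. */
int sum_b = 0;

for (int i = 0; i < 4; i++)
   sum_b += array_b[i];

int average_b = sum_b / 4;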

The two loops can be rewritten as the single function:

int calc_average_of_four(int* array) {
   int sum = 0;
   for (int i = 0; i < 4; i++)
       sum += array[i];

   return sum / 4;
}

Using the above function will give source code that has no loop duplication:

extern int array_a[];
extern int array_b[];

int average_a = calc_average_of_four(array_a);
int average_b = calc_average_of_four(array_b);

Note that in this trivial case, the compiler may choose to inline both calls to the function, so that the resulting machine code is identical for the duplicated and non-duplicated examples above. If the function is not inlined, the additional overhead of the function calls will probably make the code take longer to run (on the order of 10 processor instructions for most high-performance languages). This additional overhead could theoretically be a problem.
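
The helper in this example is fixed to four elements, so arrays of other sizes would still invite copy-and-paste variants. A minimal sketch of a more general version follows, assuming a hypothetical name calc_average and a caller-supplied element count; static inline is only a hint, and the compiler remains free to inline or not.

```c
#include <stddef.h>

/* Hypothetical generalisation of calc_average_of_four: the element count
   is passed in instead of being hard-coded. Returns 0 for an empty array
   to avoid dividing by zero. Uses integer division, as in the original. */
static inline int calc_average(const int *array, size_t count) {
    if (count == 0)
        return 0;
    long sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += array[i];
    return (int)(sum / (long)count);
}
```

With such a helper, the two call sites would become calc_average(array_a, 4) and calc_average(array_b, 4).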

References

  1. Spinellis, Diomidis. "The Bad Code Spotter's Guide". InformIT.com. Retrieved 2008-06-06.
  2. Juergens, Elmar; Deissenboeck, Florian; Hummel, Benjamin. "Code similarities beyond copy & paste".
  3. Wagner, Stefan; Abdulkhaleq, Asim; Bogicevic, Ivan; Ostberg, Jan-Peter; Ramadani, Jasmin. "How are functionally similar code clones syntactically different? An empirical study and a benchmark". PeerJ Computer Science 2:e49. doi:10.7717/peerj-cs.49
  4. Wagner, Stefan; Abdulkhaleq, Asim; Kaya, Kamer; Paar, Alexander (2016). "On the relationship of inconsistent software clones and faults: an empirical study". Proc. 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2016).
  5. Kapser, C.; Godfrey, M. W. (Oct. 2006). "'Cloning Considered Harmful' Considered Harmful". 13th Working Conference on Reverse Engineering (WCRE), pp. 19–28.
  6. Baker, Brenda S. (1992). "A Program for Identifying Duplicated Code". Computing Science and Statistics, 24:49–57.
  7. Baxter, Ira D.; et al. "Clone Detection Using Abstract Syntax Trees".
  8. Rieger, Matthias; Ducasse, Stephane. "Visual Detection of Duplicated Code".
  9. Yuan, Y.; Guo, Y. (Dec. 2011). "CMCD: Count Matrix Based Code Clone Detection". 2011 18th Asia-Pacific Software Engineering Conference. IEEE, pp. 250–257.
  10. Chen, X.; Wang, A. Y.; Tempero, E. D. (2014). "A Replication and Reproduction of Code Clone Detection Studies". ACSC, pp. 105–114.


This article is issued from Wikipedia - version of the 9/11/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.