[Java] Java array copy performance: why do my results differ from what I read online?

Z时代
2024-01-10
Category: Tech sharing

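Today I looked into array copying. Everything I found online ranks the copy methods like this (fastest first):

System.arraycopy() > Arrays.copyOf > clone > for

My own experiment did not reproduce this ranking, and I would like to know why.
Code: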

    import java.util.Arrays;

    public class Speed {
        // Each method times one copy (in nanoseconds), including allocation of the target array.
        public static void testSystemArrayCopy(int[] original) {
            long startTime = System.nanoTime();
            int[] target = new int[original.length];
            System.arraycopy(original, 0, target, 0, target.length);
            long endTime = System.nanoTime();
            System.out.println("System.arraycopy time: " + (endTime - startTime));
        }

        public static void testArraysCopyOf(int[] original) {
            long startTime = System.nanoTime();
            int[] target = Arrays.copyOf(original, original.length);
            long endTime = System.nanoTime();
            System.out.println("Arrays.copyOf time: " + (endTime - startTime));
        }

        public static void testClone(int[] original) {
            long startTime = System.nanoTime();
            int[] target = original.clone();
            long endTime = System.nanoTime();
            System.out.println("clone time: " + (endTime - startTime));
        }

        public static void testFor(int[] original) {
            long startTime = System.nanoTime();
            int[] target = new int[original.length];
            for (int i = 0; i < original.length; i++) {
                target[i] = original[i];
            }
            long endTime = System.nanoTime();
            System.out.println("for loop time: " + (endTime - startTime));
        }

        public static void main(String[] args) {
            // Change the size of the original array for each run
            int[] original = new int[10000000];
            for (int i = 0; i < original.length; i++) {
                original[i] = i;
            }
            System.out.println("Original array size: " + original.length);
            testSystemArrayCopy(original);
            testArraysCopyOf(original);
            testClone(original);
            testFor(original);
        }
    }

Results (times in nanoseconds; the last line of each group ranks the methods, fastest first):

Original array size: 100
System.arraycopy time: 8400
Arrays.copyOf time: 52100
clone time: 6000
for loop time: 3100
for > clone > System.arraycopy > Arrays.copyOf
======================
Original array size: 10000
System.arraycopy time: 42200
Arrays.copyOf time: 123900
clone time: 33700
for loop time: 249200
clone > System.arraycopy > Arrays.copyOf > for
=======================
Original array size: 1000000
System.arraycopy time: 3874700
Arrays.copyOf time: 3174500
clone time: 2867300
for loop time: 6705100
clone > Arrays.copyOf > System.arraycopy > for
=======================
Original array size: 100000000
System.arraycopy time: 242847700
Arrays.copyOf time: 242949100
clone time: 394861500
for loop time: 136803300
for > System.arraycopy ≈ Arrays.copyOf > clone

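The experiment shows:

When the array is small (100 elements), the for loop is the fastest and clearly beats the other methods.

When the array has between 10,000 and 1,000,000 elements, clone is the fastest, System.arraycopy is not far behind, and the for loop performs poorly.

When the array is very large (100 million elements), the for loop is the fastest and clone is the slowest.

My question:
Why do my conclusions differ from what I read online, and what causes the difference?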

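Answers

Apart from the algorithm itself, benchmark results depend heavily on the environment: the CPU, the JVM version, and the JVM memory parameters. I believe that if you run the same test with the parameters below, you will get different results.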

$ java -server -Xms10M -Xmx116M -XX:MetaspaceSize=10M -XX:MaxMetaspaceSize=10M Speed
Original array size: 10000000
System.arraycopy time: 21240621
Arrays.copyOf time: 28499728
clone time: 24212209
for loop time: 29592803

$ java -server -Xms100M -Xmx116M -XX:MetaspaceSize=10M -XX:MaxMetaspaceSize=10M Speed
Original array size: 10000000
System.arraycopy time: 21235830
Arrays.copyOf time: 17554636
clone time: 14565962
for loop time: 19275945

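The only difference between the two runs is the initial heap size, -Xms10M versus -Xms100M, yet the results are completely different.
Discussing performance without specifying the environment is meaningless. Fix the environment parameters and isolate the tests: do not measure all four methods in one program, because the execution order also affects the results.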
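The advice above (isolate each method, and do not trust a single timed call) can be sketched in plain Java. This is only an illustration under stated assumptions, not a replacement for a proper benchmark harness: the class name `CopyTimingSketch`, the helper `medianNanos`, the warm-up and round counts, and the convention that the first command-line argument selects the one method measured in this JVM run are all hypothetical choices made for this example.

```java
import java.util.Arrays;
import java.util.function.Supplier;

public class CopyTimingSketch {
    // Accumulates values read from each copy so the JIT cannot treat the copies as dead code.
    static long sink;

    // Runs `task` `warmup` times untimed, then returns the median of `rounds` timed runs (ns).
    static long medianNanos(Supplier<int[]> task, int warmup, int rounds) {
        for (int i = 0; i < warmup; i++) {
            sink += task.get().length;
        }
        long[] samples = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            int[] copy = task.get();
            samples[i] = System.nanoTime() - start;
            sink += copy.length == 0 ? 0 : copy[copy.length - 1];
        }
        Arrays.sort(samples);
        return samples[rounds / 2];
    }

    public static void main(String[] args) {
        // Hypothetical convention: the first argument picks the single method measured in this run.
        String method = args.length > 0 ? args[0] : "clone";
        int[] original = new int[1_000_000];
        for (int i = 0; i < original.length; i++) {
            original[i] = i;
        }

        Supplier<int[]> task;
        switch (method) {
            case "arraycopy":
                task = () -> {
                    int[] target = new int[original.length];
                    System.arraycopy(original, 0, target, 0, original.length);
                    return target;
                };
                break;
            case "copyOf":
                task = () -> Arrays.copyOf(original, original.length);
                break;
            case "for":
                task = () -> {
                    int[] target = new int[original.length];
                    for (int i = 0; i < original.length; i++) {
                        target[i] = original[i];
                    }
                    return target;
                };
                break;
            default:
                // Any other argument falls back to clone.
                task = () -> original.clone();
        }

        System.out.println(method + " median ns: " + medianNanos(task, 200, 101));
        System.out.println("sink: " + sink);
    }
}
```

Running it once per method, each time in a fresh JVM with the same flags, keeps one method's JIT and GC state from leaking into the next measurement. The numbers it prints will still vary by machine and JVM version.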

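A test like this is best done with JMH.
So I wrote a benchmark that measures the throughput (higher is better) of the four copy methods on byte[] arrays of length 0, 1, 2, 4, 8, 16, 64, 256, 1024, and 1024 * 1024, with 15 seconds of warm-up, 10 seconds of measurement, and two rounds. The results look roughly like this (within each group, rows are sorted from fastest to slowest):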

Benchmark                  Mode  Cnt       Score      Error  Units
array0SystemCopy          thrpt   20  314675.886 ±  936.036  ops/ms
array0ArraysCopy          thrpt   20  312555.083 ± 2927.650  ops/ms
array0ForLoop             thrpt   20  312199.273 ± 4183.104  ops/ms
array0Clone               thrpt   20  214478.961 ±  421.628  ops/ms
array1ForLoop             thrpt   20  258703.707 ±  940.685  ops/ms
array1SystemCopy          thrpt   20  194942.177 ±  382.662  ops/ms
array1Clone               thrpt   20  194937.877 ±  344.930  ops/ms
array1ArraysCopy          thrpt   20  194607.121 ±  949.278  ops/ms
array2ForLoop             thrpt   20  240467.096 ± 1485.911  ops/ms
array2SystemCopy          thrpt   20  195098.045 ±  520.939  ops/ms
array2Clone               thrpt   20  194804.366 ±  831.149  ops/ms
array2ArraysCopy          thrpt   20  194747.743 ±  442.115  ops/ms
array4ForLoop             thrpt   20  212128.126 ±  842.421  ops/ms
array4Clone               thrpt   20  194893.477 ±  374.829  ops/ms
array4SystemCopy          thrpt   20  194547.102 ± 1301.595  ops/ms
array4ArraysCopy          thrpt   20  194532.113 ±  751.609  ops/ms
array8SystemCopy          thrpt   20  194789.870 ±  580.952  ops/ms
array8ArraysCopy          thrpt   20  194756.009 ±  390.458  ops/ms
array8Clone               thrpt   20  194300.107 ± 1013.889  ops/ms
array8ForLoop             thrpt   20  171056.354 ±  640.645  ops/ms
array16ArraysCopy         thrpt   20  187372.689 ±  296.296  ops/ms
array16SystemCopy         thrpt   20  187274.660 ±  444.482  ops/ms
array16Clone              thrpt   20  186272.644 ± 1910.138  ops/ms
array16ForLoop            thrpt   20  117366.002 ± 2753.701  ops/ms
array64Clone              thrpt   20  131972.165 ±  294.271  ops/ms
array64ArraysCopy         thrpt   20  131970.417 ±  312.744  ops/ms
array64SystemCopy         thrpt   20  131631.054 ±  693.137  ops/ms
array64ForLoop            thrpt   20   73172.782 ±  285.508  ops/ms
array256Clone             thrpt   20   49578.672 ±   92.802  ops/ms
array256SystemCopy        thrpt   20   49537.817 ±  514.318  ops/ms
array256ArraysCopy        thrpt   20   49477.653 ±  377.154  ops/ms
array256ForLoop           thrpt   20   24637.253 ±  225.039  ops/ms
array1024Clone            thrpt   20   13042.147 ±   82.129  ops/ms
array1024ArraysCopy       thrpt   20   13036.492 ±   99.095  ops/ms
array1024SystemCopy       thrpt   20   13015.679 ±   87.283  ops/ms
array1024ForLoop          thrpt   20    6902.488 ±   22.211  ops/ms
array10241024Clone        thrpt   20      13.685 ±    0.064  ops/ms
array10241024SystemCopy   thrpt   20      13.664 ±    0.066  ops/ms
array10241024ArraysCopy   thrpt   20      13.616 ±    0.098  ops/ms
array10241024ForLoop      thrpt   20       6.875 ±    0.103  ops/ms

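The conclusion: for very small arrays the for loop is slightly faster, but as soon as the array gets a little larger the for loop becomes the slowest. Among the other three methods there is no significant difference, except that clone is somewhat slower for the empty array.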
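To put the ops/ms scores into more familiar units, the throughput for the 1024 * 1024 byte case can be converted to bytes per second. The sketch below assumes that one benchmark operation copies the whole array exactly once and counts only the bytes written; the class and method names are made up for this illustration.

```java
public class ThroughputSketch {
    // Converts a score in ops/ms, for copies of `bytesPerOp` bytes each, into GB/s (10^9 bytes).
    static double gigabytesPerSecond(double opsPerMs, long bytesPerOp) {
        double bytesPerMs = opsPerMs * bytesPerOp;
        return bytesPerMs * 1_000 / 1e9;
    }

    public static void main(String[] args) {
        long oneMiB = 1024L * 1024L;
        // Scores taken from the array10241024Clone and array10241024ForLoop rows.
        System.out.printf("clone:    %.2f GB/s%n", gigabytesPerSecond(13.685, oneMiB));
        System.out.printf("for loop: %.2f GB/s%n", gigabytesPerSecond(6.875, oneMiB));
    }
}
```

Under that assumption, clone moves roughly 14.3 GB/s and the for loop roughly 7.2 GB/s, which matches the factor of about two between them in the larger-array rows.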

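Arrays.copyOf calls System.arraycopy internally, so in theory it is a tiny bit slower. As for clone, its implementation is not visible in the Java source, so my blind guess is that it is implemented in a similar way.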
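The relationship between Arrays.copyOf and System.arraycopy can be sketched as follows. The helper `copyOfSketch` is a hypothetical stand-in written for this post: it allocates the new array and delegates to System.arraycopy, which is how the int[] overload is commonly described, though the exact JDK source may differ between versions.

```java
import java.util.Arrays;

public class CopyOfSketch {
    // Hypothetical re-implementation: allocate, then copy as many elements as fit.
    static int[] copyOfSketch(int[] original, int newLength) {
        int[] copy = new int[newLength];
        System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength));
        return copy;
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3};
        // Longer target: extra slots are zero-filled. Shorter target: the copy is truncated.
        System.out.println(Arrays.equals(copyOfSketch(src, 5), Arrays.copyOf(src, 5))); // prints true
        System.out.println(Arrays.equals(copyOfSketch(src, 2), Arrays.copyOf(src, 2))); // prints true
    }
}
```

Seen this way, the only extra work in Arrays.copyOf is the length comparison and the allocation, which the other methods in the test also have to pay for.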

I find this question quite interesting, so I wrote a test of my own.

package test;
import java.util.Arrays;
public class Main {
    // Each method returns the copy so the caller can use it.
    public static int[] testSystemArrayCopy(int[] original) {
        int[] target = new int[original.length];
        System.arraycopy(original, 0, target, 0, target.length);
        return target;
    }
    public static int[] testArraysCopyOf(int[] original) {
        int[] target = Arrays.copyOf(original, original.length);
        return target;
    }
    public static int[] testClone(int[] original) {
        int[] target = original.clone();
        return target;
    }
    public static int[] testFor(int[] original) {
        int[] target = new int[original.length];
        for (int i = 0; i < original.length; i++) {
            target[i] = original[i];
        }
        return target;
    }
    public static void main(String args[]) {
        final int LEN = 10_000;
        final int TIMES = 100_000;
        int[] original = new int[LEN];
        for (int i = 0; i < original.length; i++) {
            original[i] = i;
        }
        System.out.println("Size of arrays: " + LEN);
//        // heat up
//        testArraysCopyOf(original);
//        testSystemArrayCopy(original);
//        testClone(original);
//        testFor(original);
        long startTime, endTime;
        long totalSize = 0L;
        startTime = System.nanoTime();
        for (int i = 0; i < TIMES; i++) {
            int[] target = testFor(original);
            totalSize += target.length;
            target = null;
        }
        endTime = System.nanoTime();
        System.out.println("for loop: " + (endTime - startTime) / 1_000_000 + "ms");
//        startTime = System.nanoTime();
//        for (int i = 0; i < TIMES; i++) {
//            int[] target = testSystemArrayCopy(original);
//            totalSize += target.length;
//            target = null;
//        }
//        endTime = System.nanoTime();
//        System.out.println("System.arrayCopy(): " + (endTime - startTime) / 1_000_000 + "ms");
//        startTime = System.nanoTime();
//        for (int i = 0; i < TIMES; i++) {
//            int[] target = testArraysCopyOf(original);
//            totalSize += target.length;
//            target = null;
//        }
//        endTime = System.nanoTime();
//        System.out.println("Arrays.copyOf(): " + (endTime - startTime) / 1_000_000 + "ms");
//        startTime = System.nanoTime();
//        for (int i = 0; i < TIMES; i++) {
//            int[] target = testClone(original);
//            totalSize += target.length;
//            target = null;
//        }
//        endTime = System.nanoTime();
//        System.out.println("clone(): " + (endTime - startTime) / 1_000_000 + "ms");
        System.out.println("total size: " + totalSize);
    }
}

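The test was run on my personal computer. The operating system is Windows 10 and the Java version is 1.8.

Each run executes only one method. I removed the warm-up step because each run now executes the method many times, so warm-up should have very little effect on the result. Adding it back is fine too.

Now for the JVM parameters. I ran the test with the following: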
```
-Xms2g -Xmx2g -Xmn1g
```
The initial and maximum heap sizes are both 2 GB, which minimizes the overhead of the JVM growing the heap.
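Before comparing timings it can help to confirm that the heap flags actually took effect. The snippet below is a small sketch using the standard Runtime API; the class name `HeapSettingsSketch` and the helper `toMiB` are invented for this example, and the reported maximum may come out slightly below the -Xmx value depending on the garbage collector.

```java
public class HeapSettingsSketch {
    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (MiB):     " + toMiB(rt.maxMemory()));
        System.out.println("current heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("free heap (MiB):    " + toMiB(rt.freeMemory()));
    }
}
```

With -Xms and -Xmx set to the same value, the current and maximum heap sizes should be close to each other from the start, so heap growth does not show up in the measurements.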

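Each test case (that is, each array size) was run for two rounds: once with the methods in forward order, then once in reverse order.

Here are the collated results: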

Size of arrays: 10,000 (100k times)
    # 1st
    for loop: 592ms
    System.arrayCopy(): 636ms
    Arrays.copyOf(): 633ms
    clone(): 681ms
    # 2nd
    for loop: 587ms
    System.arrayCopy(): 622ms
    Arrays.copyOf(): 628ms
    clone(): 674ms
Size of arrays: 100,000 (10k times)
    # 1st
    for loop: 828ms
    System.arrayCopy(): 849ms
    Arrays.copyOf(): 801ms
    clone(): 855ms
    # 2nd
    for loop: 821ms
    System.arrayCopy(): 825ms
    Arrays.copyOf(): 804ms
    clone(): 874ms
Size of arrays: 1,000,000 (10k times)
    # 1st
    for loop: 7896ms
    System.arrayCopy(): 9454ms
    Arrays.copyOf(): 10362ms
    clone(): 6934ms
    # 2nd
    for loop: 6789ms
    System.arrayCopy(): 9318ms
    Arrays.copyOf(): 9521ms
    clone(): 6909ms
Size of arrays: 10,000,000 (1k times)
    # 1st
    for loop: 7697ms
    System.arrayCopy(): 9236ms
    Arrays.copyOf(): 9242ms
    clone(): 8396ms
    # 2nd
    for loop: 7926ms
    System.arrayCopy(): 9303ms
    Arrays.copyOf(): 9203ms
    clone(): 8500ms
Size of arrays: 100,000,000 (100 times)
    # 1st
    for loop: 7780ms
    System.arrayCopy(): 8817ms
    Arrays.copyOf(): 8886ms
    clone(): 8808ms
    # 2nd
    for loop: 7772ms
    System.arrayCopy(): 8818ms
    Arrays.copyOf(): 9204ms
    clone(): 9088ms