Java数组,查找重复项

我有一个数组,正在寻找重复项。

duplicates = false;

for(j = 0; j < zipcodeList.length; j++){

for(k = 0; k < zipcodeList.length; k++){

if (zipcodeList[k] == zipcodeList[j]){

duplicates = true;

}

}

}

但是,当没有重复项时,此代码不起作用。为什么?

回答:

duplicates=false;

for (j=0;j<zipcodeList.length;j++)

for (k=j+1;k<zipcodeList.length;k++)

if (k!=j && zipcodeList[k] == zipcodeList[j])

duplicates=true;

编辑后切换.equals()回原来的位置,==因为我在你正在使用的地方阅读int过,最初的问题尚不清楚。还要设置k=j+1,以将执行时间减半,但仍为O(n 2)。

这是一种基于哈希的方法。你需要为自动装箱付款,但是它是O(n)而不是O(n 2)。一个进取的人会去寻找一个基于int的原始哈希集(Apache或Google Collections有这样的东西,methinks。)

boolean duplicates(final int[] zipcodelist)

{

Set<Integer> lump = new HashSet<Integer>();

for (int i : zipcodelist)

{

if (lump.contains(i)) return true;

lump.add(i);

}

return false;

}

请参阅HuyLe的答案以了解或多或少的O(n)解决方案,我认为这需要几个附加步骤:

static boolean duplicates(final int[] zipcodelist)

{

final int MAXZIP = 99999;

boolean[] bitmap = new boolean[MAXZIP+1];

java.util.Arrays.fill(bitmap, false);

for (int item : zipcodeList)

if (!bitmap[item]) bitmap[item] = true;

else return true;

}

return false;

}

static boolean duplicates(final int[] zipcodelist)

{

final int MAXZIP = 99999;

boolean[] bitmap = new boolean[MAXZIP+1]; // Java guarantees init to false

for (int item : zipcodeList)

if (!(bitmap[item] ^= true)) return true;

return false;

}

好吧,所以我运行了一个基准测试,到处都比较麻烦,但这是代码:

import java.util.BitSet;

class Yuk

{

static boolean duplicatesZero(final int[] zipcodelist)

{

boolean duplicates=false;

for (int j=0;j<zipcodelist.length;j++)

for (int k=j+1;k<zipcodelist.length;k++)

if (k!=j && zipcodelist[k] == zipcodelist[j])

duplicates=true;

return duplicates;

}

static boolean duplicatesOne(final int[] zipcodelist)

{

final int MAXZIP = 99999;

boolean[] bitmap = new boolean[MAXZIP + 1];

java.util.Arrays.fill(bitmap, false);

for (int item : zipcodelist) {

if (!(bitmap[item] ^= true))

return true;

}

return false;

}

static boolean duplicatesTwo(final int[] zipcodelist)

{

final int MAXZIP = 99999;

BitSet b = new BitSet(MAXZIP + 1);

b.set(0, MAXZIP, false);

for (int item : zipcodelist) {

if (!b.get(item)) {

b.set(item, true);

} else

return true;

}

return false;

}

enum ApproachT { NSQUARED, HASHSET, BITSET};

/**

* @param args

*/

public static void main(String[] args)

{

ApproachT approach = ApproachT.BITSET;

final int REPS = 100;

final int MAXZIP = 99999;

int[] sizes = new int[] { 10, 1000, 10000, 100000, 1000000 };

long[][] times = new long[sizes.length][REPS];

boolean tossme = false;

for (int sizei = 0; sizei < sizes.length; sizei++) {

System.err.println("Trial for zipcodelist size= "+sizes[sizei]);

for (int rep = 0; rep < REPS; rep++) {

int[] zipcodelist = new int[sizes[sizei]];

for (int i = 0; i < zipcodelist.length; i++) {

zipcodelist[i] = (int) (Math.random() * (MAXZIP + 1));

}

long begin = System.currentTimeMillis();

switch (approach) {

case NSQUARED :

tossme ^= (duplicatesZero(zipcodelist));

break;

case HASHSET :

tossme ^= (duplicatesOne(zipcodelist));

break;

case BITSET :

tossme ^= (duplicatesTwo(zipcodelist));

break;

}

long end = System.currentTimeMillis();

times[sizei][rep] = end - begin;

}

long avg = 0;

for (int rep = 0; rep < REPS; rep++) {

avg += times[sizei][rep];

}

System.err.println("Size=" + sizes[sizei] + ", avg time = "

+ avg / (double)REPS + "ms");

}

}

}

With NSQUARED:

Trial for size= 10

Size=10, avg time = 0.0ms

Trial for size= 1000

Size=1000, avg time = 0.0ms

Trial for size= 10000

Size=10000, avg time = 100.0ms

Trial for size= 100000

Size=100000, avg time = 9923.3ms

With HashSet

Trial for zipcodelist size= 10

Size=10, avg time = 0.16ms

Trial for zipcodelist size= 1000

Size=1000, avg time = 0.15ms

Trial for zipcodelist size= 10000

Size=10000, avg time = 0.0ms

Trial for zipcodelist size= 100000

Size=100000, avg time = 0.16ms

Trial for zipcodelist size= 1000000

Size=1000000, avg time = 0.0ms

With BitSet

Trial for zipcodelist size= 10

Size=10, avg time = 0.0ms

Trial for zipcodelist size= 1000

Size=1000, avg time = 0.0ms

Trial for zipcodelist size= 10000

Size=10000, avg time = 0.0ms

Trial for zipcodelist size= 100000

Size=100000, avg time = 0.0ms

Trial for zipcodelist size= 1000000

Size=1000000, avg time = 0.0ms

但是只有一个头发.... 15ms的误差在内currentTimeMillis(),我的基准测试中有一些空白。请注意,对于任何超过100000的列表,你都可以简单地返回,true因为会有重复项。实际上,如果列表是随机的,则可以为更短的列表返回true WHP。道德是什么?在极限情况下,最有效的实现是:

 return true;

而且你不会经常犯错。

以上是 Java数组,查找重复项 的全部内容, 来源链接: utcz.com/qa/406940.html

回到顶部