
Detailed examples of several ideas for deduplication of JavaScript arrays

小云云 (Original) · 2018-02-08 16:14:00

For data deduplication in practice, utility libraries such as lodash already provide mature, complete implementations that can safely be used in production. That doesn't stop us, though, from exploring how deduplication can be achieved with several different ideas, as an exercise in expanding our thinking. This article walks through several approaches to deduplicating JavaScript arrays.

The first is the conventional approach: a double loop with element-by-element comparison.

function doubleLoopUniq(arr) {
  let result = [];
  for (let i = 0, len = arr.length, isExist; i < len; i++) {
    // Flag indicating whether the current element already exists in result.
    isExist = false;
    for (let j = 0, rLen = result.length; j < rLen; j++) {
      if (result[j] === arr[i]) {
        // Compare each element of result against the current element of the original array.
        isExist = true;
        break;
      }
    }
    // If the element was not found, push it into result.
    !isExist && result.push(arr[i]);
  }
  return result;
}
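As a quick sanity check, every function in this article is expected to behave like this on a small input:

doubleLoopUniq([1, 2, 2, 3, 1]); // => [1, 2, 3]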

Use JavaScript's built-in indexOf to remove duplicates

function indexOfUniq(arr) {
  let result = [];
  for (let i = 0, len = arr.length; i < len; i++) {
    // indexOf replaces the inner loop, though the scan it performs is still linear.
    if (result.indexOf(arr[i]) === -1) result.push(arr[i]);
  }
  return result;
}

Sort first, then compare adjacent elements to remove duplicates

function sortUniq(arr) {
  let result = [], last;
  // Spread into a copy here so the original array is not mutated.
  // Note: the default sort() compares elements as strings.
  [ ...arr ].sort().forEach(item => {
    if (item != last) {
      result.push(item);
      last = item;
    }
  });
  return result;
}

Deduplication through a hash table

function hashUniq(arr) {
  // Use each element as an object key; duplicate keys collapse automatically.
  let hashTable = arr.reduce((result, curr) => {
    result[curr] = true;
    return result;
  }, {});
  // Object keys are always strings, so convert them back to numbers.
  // Number() also handles floats; this version still only works for arrays of numbers.
  return Object.keys(hashTable).map(item => Number(item));
}

ES6 Set: deduplication in one line of code

function toSetUniq(arr) {
  // A Set stores only unique values; Array.from converts it back to an array.
  return Array.from(new Set(arr));
}
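Equivalently, spread syntax gives the same one-liner:

const toSetUniqSpread = arr => [...new Set(arr)];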

splice deduplication (mutates the array in place, so it has side effects)

function inPlaceUniq(arr) {
  let idx = 0;
  while (idx < arr.length) {
    let compare = idx + 1;
    while (compare < arr.length) {
      if (arr[idx] == arr[compare]) {
        // Remove the duplicate; do not advance `compare`, since the
        // next element has shifted into the current position.
        arr.splice(compare, 1);
        continue;
      }
      ++compare;
    }
    ++idx;
  }
  return arr;
}
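Since this version mutates its argument, the side effect is visible on the caller's array:

let nums = [1, 1, 2, 3, 3];
inPlaceUniq(nums);
console.log(nums); // [1, 2, 3] — the original array itself was modified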

Finally, let's run a simple benchmark under Node.js to see which one is more efficient~

let data = [];
for (var i = 0; i < 100000; i++) {
  data.push(Math.random());
}

// A simple decorator for performance testing.
function performanceTest(fn, descript) {
  return function () {
    // Start the clock when the wrapped function is invoked, not when it is decorated.
    var a = new Date().getTime();
    fn.apply(this, [].slice.call(arguments, 0));
    console.log(descript, new Date().getTime() - a);
  };
}

performanceTest(hashUniq, "hashTable")(data)
performanceTest(sortUniq, "sortUniq")(data)
performanceTest(toSetUniq, "toSetUniq")(data)
performanceTest(indexOfUniq, "indexOfUniq")(data)
performanceTest(doubleLoopUniq, "doubleLoopUniq")(data)
performanceTest(inPlaceUniq, "inPlaceUniq")(data)

The results are as follows

hashTable 168ms
sortUniq 332ms
toSetUniq 80ms
indexOfUniq 4280ms
doubleLoopUniq 13303ms
inPlaceUniq 9977ms

Extended thinking: if the elements of the array are objects, how do we remove duplicates?

Since objects are reference types, a deep-equality comparison (deepEqual) seems unavoidable. That idea would solve the problem, but it is inevitably not efficient enough.
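For reference, a minimal sketch of that naive idea, assuming lodash's isEqual as the deep comparison:

const isEqual = require('lodash/isEqual'); // deep comparison from lodash

function deepEqualUniq(arr) {
  let result = [];
  arr.forEach(item => {
    // O(n^2) overall: every element is deep-compared against everything kept so far.
    if (!result.some(kept => isEqual(kept, item))) result.push(item);
  });
  return result;
}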

The benchmark above also shows that deduplication via new Set and via a hash table is the most efficient, so it makes sense to build on one of those two approaches; I'll go with the hash table. To avoid the cost of deep comparison, I try converting the reference types into primitives with JSON.stringify.

function collectionUniq(collection) {
  let hashTable = {};
  collection.forEach(item => {
    // Serialize each object; identical serializations collapse to one key.
    hashTable[JSON.stringify(item)] = true;
  });
  return Object.keys(hashTable).map(item => JSON.parse(item));
}

But here comes the problem: object property order is not guaranteed to match, so JSON.stringify can produce different strings for objects with the same contents. With data like the following, this approach falls apart.

let collection = [ { a: 1, b: 2, c: 3 }, { b: 2, c: 3, a: 1 } ]
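With this input, collectionUniq keeps both objects, because JSON.stringify serializes them to different strings. One way around this (a sketch of my own, not from the original article) is to serialize with the keys in a deterministic order:

// Hypothetical helper: stringify with keys sorted, so equivalent objects
// always serialize to the same string.
function stableStringify(obj) {
  if (obj === null || typeof obj !== 'object') return JSON.stringify(obj);
  if (Array.isArray(obj)) return '[' + obj.map(stableStringify).join(',') + ']';
  return '{' + Object.keys(obj).sort()
    .map(key => JSON.stringify(key) + ':' + stableStringify(obj[key]))
    .join(',') + '}';
}

stableStringify({ a: 1, b: 2, c: 3 }) === stableStringify({ b: 2, c: 3, a: 1 }); // true

With stableStringify substituted for JSON.stringify in collectionUniq, both objects above collapse to the same key.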

Another idea is toHash: after a basic deduplication of the array as above, to ensure accuracy, traverse each JSON string, obtain the Unicode code of every character via charCodeAt(), and add them up into a single total. Finally, compare the totals pairwise: entries with equal totals are treated as duplicates, achieving the deduplication effect.

function toHash(obj) {
  let power = 1;
  let res = 0;
  const string = JSON.stringify(obj, null, 2);
  for (let i = 0, l = string.length; i < l; i++) {
    switch (string[i]) {
      case '{':
        // Entering a nested level: double the weight.
        power *= 2;
        break;
      case '}':
        // Leaving a nested level: halve the weight.
        power /= 2;
        break;
      case ' ':
      case '\n':
      case '\r':
      case '\t':
        // Ignore formatting whitespace.
        break;
      default:
        res += string[i].charCodeAt(0) * power;
    }
  }
  return res;
}
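The article leaves the wiring to the reader; as a sketch (the function name and structure are my own choices), toHash could drive the final deduplication like this:

function toHashUniq(collection) {
  let seen = {};
  let result = [];
  collection.forEach(item => {
    const hash = toHash(item);
    // Items whose hashes collide are treated as duplicates — hence the
    // emphasis on keeping the collision probability vanishingly small.
    if (!seen[hash]) {
      seen[hash] = true;
      result.push(item);
    }
  });
  return result;
}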

This is just a basic implementation idea, and there is plenty of room for improvement. For example, to reduce the possibility of hash collisions, the weights of certain special characters could be increased or decreased.

The key point is to ensure that the probability of a collision is smaller than that of winning the jackpot.


