meena moon

Detecting and Removing Redundancy in a File

Mar 8 2020 1:08 PM
I have a very large dataset (integer data) in a file.
I would like to find duplicate data (int values) and remove them from the file quickly.
What would be a good algorithm for this?
I'm reading about the MinHash algorithm. Is it a good fit for this purpose, or is there another way?
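For context: MinHash estimates the Jaccard similarity between sets and is typically used to find near-duplicate documents approximately. It is not designed to remove exact duplicate integers. For exact duplicates, the usual options are a hash set, when the distinct values fit in memory, or sort-and-merge, when they do not. Below is a minimal sketch of both approaches. It assumes the file is text with one integer per line, which the question does not specify. The function names `dedup_in_memory` and `dedup_external` and the `chunk_size` parameter are hypothetical, chosen only for illustration.

```python
import heapq
import os
import tempfile


def dedup_in_memory(src_path: str, dst_path: str) -> None:
    """Keep only the first occurrence of each integer, preserving order.

    Assumption: one integer per line, and the set of distinct values fits in RAM.
    """
    seen = set()
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            value = int(line)
            if value not in seen:
                seen.add(value)
                dst.write(f"{value}\n")


def dedup_external(src_path: str, dst_path: str, chunk_size: int = 1_000_000) -> None:
    """Sort-based (external-memory) dedup for files too large for one in-memory set.

    Reads up to chunk_size integers at a time, writes each chunk sorted and
    deduplicated to a temporary "run" file, then k-way merges the runs and
    drops repeats. Output is ascending; the original order is NOT preserved.
    """
    run_paths = []
    try:
        with open(src_path) as src:
            while True:
                chunk = []
                for line in src:  # resumes where the previous chunk stopped
                    line = line.strip()
                    if line:
                        chunk.append(int(line))
                    if len(chunk) >= chunk_size:
                        break
                if not chunk:
                    break  # input exhausted
                fd, path = tempfile.mkstemp(suffix=".run")
                with os.fdopen(fd, "w") as run:
                    run.writelines(f"{v}\n" for v in sorted(set(chunk)))
                run_paths.append(path)

        runs = [open(p) for p in run_paths]
        try:
            # heapq.merge lazily merges the already-sorted runs.
            merged = heapq.merge(*(map(int, r) for r in runs))
            with open(dst_path, "w") as dst:
                previous = None
                for value in merged:
                    if value != previous:  # equal values are adjacent after merging
                        dst.write(f"{value}\n")
                        previous = value
        finally:
            for r in runs:
                r.close()
    finally:
        for p in run_paths:
            os.remove(p)
```

The hash-set version makes one pass and is usually the fastest when the distinct values fit in memory. Python sets of ints carry significant per-element overhead, though. The sort-and-merge version bounds memory by `chunk_size` at the cost of extra disk I/O, and it returns the values in sorted order.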