De-duplicate an archive


To work more efficiently, remove the exact duplicates from an archive.

De-duplication is the process of finding duplicate items in a data source. Items are considered to be duplicates if their text content, excluding whitespace, is the same. (Their attributes are not compared.)
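As an illustration only, the comparison rule above can be sketched in Python. This is not the product's implementation: the function names, the dictionary layout of an item, and the choice to strip every whitespace character before hashing are assumptions made for the example.

```python
import hashlib


def content_key(text: str) -> str:
    """Return a hash of the text with all whitespace removed (assumed normalization)."""
    stripped = "".join(text.split())
    return hashlib.sha256(stripped.encode("utf-8")).hexdigest()


def is_exact_duplicate(a: dict, b: dict) -> bool:
    """Compare two items by text content only; other attributes are ignored."""
    return content_key(a["text"]) == content_key(b["text"])


# Hypothetical items: same text apart from whitespace, different attributes.
first = {"text": "Hello  world\n", "author": "alice"}
second = {"text": "Hello world", "author": "bob"}
# Differs by a comma, so it is not an exact duplicate.
third = {"text": "Hello, world", "author": "alice"}
```

Under these assumptions, `first` and `second` count as exact duplicates even though their `author` attributes differ, while `third` does not match either of them.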

After the process is done, you can decide what to do with the exact duplicates.

  1. Open the archive.
  2. In the Exact Duplicates section, click Generate exact duplicates.
  3. In the confirmation dialog, click OK.

    Note: The processing time depends on the number of items in the archive.

  4. Optional: After the items have been processed, you can do the following from the Archive Details page:
    • To view the unique items, click View deduplicated files.
    • To create a bucket from the unique items, click Create bucket from non-exacts.
    • To create a dataset from the unique items, click Create dataset from non-exacts.
    • To permanently delete the duplicate clusters (the groups of identical items that the process found) so you can de-duplicate the archive again, click Delete exact duplicates.
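
To make the relationship between clusters and unique items concrete, here is a minimal Python sketch. It assumes that a cluster is a group of items with identical whitespace-free text and that the unique set keeps the first item of each cluster. Both points, along with every name in the code, are assumptions for illustration and do not describe how the product works internally.

```python
from collections import defaultdict


def cluster_exact_duplicates(items):
    """Group items whose text is identical once whitespace is removed."""
    clusters = defaultdict(list)
    for item in items:
        key = "".join(item["text"].split())  # assumed normalization
        clusters[key].append(item)
    return list(clusters.values())


def unique_items(clusters):
    """Keep the first item of every cluster (assumed choice of representative)."""
    return [cluster[0] for cluster in clusters]


# Hypothetical archive contents.
items = [
    {"id": 1, "text": "Quarterly report"},
    {"id": 2, "text": "Quarterly  report\n"},
    {"id": 3, "text": "Meeting notes"},
]
clusters = cluster_exact_duplicates(items)
kept = unique_items(clusters)
```

In this sketch, items 1 and 2 fall into one cluster and item 3 forms a cluster of its own, so `kept` holds items 1 and 3. Discarding `clusters` and running the function again corresponds loosely to deleting the exact duplicates and de-duplicating the archive a second time.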
