Merging

Instances of persons representing the same historical person need to be merged. Merging is the process of:

  1. Identifying the duplicate persons.
  2. Selecting which person is to be deleted and forwarded to the other. The person being forwarded is referred to as the "duplicate person," or "duplicate." The resulting person receiving the forward is referred to as the "surviving person," or "survivor."
  3. Selecting which resources owned by the duplicate should be copied onto the survivor during the merge. Examples of these resources include conclusions, source references, discussion references, note references, and relationships.
  4. Selecting which resources owned by the survivor are to be deleted as a result of the merge. This may be necessary for a successful merge because of the Family Tree conclusion constraints as described in the Facts guide. For example, if the birth of the duplicate person is to be preserved, the birth of the surviving person has to be deleted, since a person may have only one birth conclusion. Similarly, if a couple relationship between the duplicate and a spouse is to be preserved, then the survivor's couple relationship to the same spouse has to be deleted, since a person can have only one couple relationship to the same spouse ID.
  5. Applying the merge.

Identifying Duplicates

The Person Matches resource is a useful tool for identifying duplicate persons in the system. The results of a person match request will contain a link to the Merge Person resource that can be used to perform a merge between the person and possible duplicates.

Selecting the Survivor

When two persons are identified as possible duplicates, merge constraints may exist that restrict which of the two (either, only one, or neither) can be selected as the survivor. To determine whether any constraints apply, perform an OPTIONS or GET request to the Merge Person resource and examine the response headers as follows:

  • If the person for the current merge resource can be selected as the survivor, the Allow header specifies that a POST may be applied to the merge resource. Otherwise, the POST operation will not be specified as allowed.
  • If the other person can be selected as the survivor, a link is provided as a Link header that resolves to the merge resource for the other person. Otherwise, the Link header is not present.
  • If the two persons cannot be merged in any order, a Warning header is supplied that explains the reason the merge is not allowed.
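
The header checks above can be sketched as a small helper. This is a minimal illustration, not part of the API: the function and class names are hypothetical, the headers are assumed to arrive as a plain name-to-value mapping, and the Link header is returned unparsed because its exact format is not described here.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class MergeOptions:
    """Summary of the merge constraints read from the response headers."""
    current_can_survive: bool          # True when POST is listed in Allow
    swapped_merge_link: Optional[str]  # raw Link header value, if present
    warning: Optional[str]             # raw Warning header value, if present


def read_merge_options(headers: Mapping[str, str]) -> MergeOptions:
    """Interpret the Allow, Link, and Warning headers of a merge resource."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    allowed = {m.strip().upper() for m in h.get("allow", "").split(",") if m.strip()}
    return MergeOptions(
        current_can_survive="POST" in allowed,
        swapped_merge_link=h.get("link"),
        warning=h.get("warning"),
    )
```

A client would typically call this on the headers of the OPTIONS or GET response and offer the user only the survivor choices that the result permits.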

Selecting Surviving and Non-Surviving Resources

When a merge is possible, a GET on the merge provides an analysis of the potential merge. The analysis provides a list of all resources for both the duplicate and survivor person that are subject to being copied or deleted, respectively. It also points out conflicts based on the Family Tree conclusion constraints as described in the Facts guide.

The merge is applied by performing a POST with the merge object that includes references to the resources to be deleted and the resources to be copied. Resources are copied only from the duplicate and deleted only from the survivor. If a conflicting resource is selected to be copied, then the associated resource from the survivor must also be selected for deletion. See the Merge Person example request for an example of merging.
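
The conflict rule can be checked on the client before the POST is sent. The sketch below is illustrative only: it assumes resources are identified by plain strings and that conflicts from the analysis are available as (survivor resource, duplicate resource) pairs. The function name and this representation are hypothetical and do not reflect the actual merge object format.

```python
from typing import Iterable, List, Set, Tuple


def missing_deletions(
    to_copy: Iterable[str],
    to_delete: Iterable[str],
    conflicts: Iterable[Tuple[str, str]],
) -> List[str]:
    """Return survivor resources that still need to be selected for deletion.

    Each conflict is a (survivor_resource, duplicate_resource) pair. If the
    duplicate's side of a conflict is being copied, the survivor's side must
    be deleted; any such survivor resource not yet selected is returned.
    """
    copy_set: Set[str] = set(to_copy)
    delete_set: Set[str] = set(to_delete)
    return [
        survivor_res
        for survivor_res, duplicate_res in conflicts
        if duplicate_res in copy_set and survivor_res not in delete_set
    ]


# Hypothetical identifiers: the duplicate's birth is kept, so the survivor's
# birth has to be deleted as well.
conflicts = [("survivor-birth", "duplicate-birth")]
print(missing_deletions(["duplicate-birth"], [], conflicts))
```

An empty result means the selection satisfies the rule; otherwise the listed survivor resources should be added to the deletions before the merge is applied.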

A merge should also include the reason the user merged these two persons.

Some guidelines may be helpful in making decisions on which resources to keep and which to delete. If two duplicates represent the same historical person:

  • All attached sources should generally be copied from the duplicate and none deleted from the survivor, unless a source on the duplicate and a source on the survivor represent the same underlying source (e.g., they have the same URL).
  • All non-conflicting relationships should generally be copied from the duplicate and none deleted from the survivor. If there are relationships to be deleted, they should be deleted before or after the merge as a separate step with its own reason statement. It is permitted to drop relationships during the merge, but unfortunately users often do this inadvertently, losing relationships that should have been kept.
  • For "conflicting" relationships (those with the same relative person IDs for the same relationship type), one of the two should generally be kept. Often they are identical and it doesn't matter which is kept. Sometimes one will have more information (e.g., one couple relationship may have a more complete marriage date).
  • Occasionally, each of the two "conflicting" relationships will have information that would be nice to keep. For example, one might have a complete marriage date and the other a more complete marriage place. In such cases, a separate conclusion change will have to be made before or after the merge is performed. The merge process itself can't update individual conclusions or relationships other than copying or deleting them as explained above.
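
The first and third guidelines can be sketched in code. This is a rough illustration under stated assumptions: sources and relationships are modeled as plain dictionaries with hypothetical keys ("id", "url", "type", "relative_id"), and each relationship is assumed to have a single relative. None of these names come from the API.

```python
from typing import Dict, List, Tuple


def sources_to_copy(
    duplicate_sources: List[Dict[str, str]],
    survivor_sources: List[Dict[str, str]],
) -> List[str]:
    """IDs of the duplicate's sources whose URL is not already on the survivor."""
    survivor_urls = {s["url"] for s in survivor_sources if s.get("url")}
    return [
        s["id"]
        for s in duplicate_sources
        # Sources without a URL cannot be matched, so they are always copied.
        if not s.get("url") or s["url"] not in survivor_urls
    ]


def conflicting_relationships(
    duplicate_rels: List[Dict[str, str]],
    survivor_rels: List[Dict[str, str]],
) -> List[Tuple[str, str]]:
    """(survivor_id, duplicate_id) pairs sharing relationship type and relative."""
    by_key = {(r["type"], r["relative_id"]): r["id"] for r in survivor_rels}
    return [
        (by_key[(r["type"], r["relative_id"])], r["id"])
        for r in duplicate_rels
        if (r["type"], r["relative_id"]) in by_key
    ]
```

Matching sources by URL is only the heuristic mentioned in the guideline; a user should still review the result, and for each conflicting pair decide which relationship carries the more complete information.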

Results of Merging

When a relationship referencing the duplicate is selected to be preserved, the resulting merge operation will create a new relationship that references the survivor. The original relationship to the duplicate person will be deleted.

When the merge is complete, the duplicate person is forwarded to the survivor, which means that requests for the duplicate will return a 301 Moved Permanently and contain a link in the Location header that points to the survivor.
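
A client that stores person IDs may want to detect this forward and update its reference. The helper below is a minimal sketch with a hypothetical name; it assumes the status code and headers of the response have already been obtained with redirect following disabled, since many HTTP clients follow such redirects automatically.

```python
from typing import Mapping, Optional


def forwarded_location(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the survivor's URL if the response is a 301 forward, else None."""
    if status != 301:
        return None
    # Header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "location":
            return value
    return None
```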

The change history for the survivor includes details of the merge, including all resources that were deleted or added, a reason for the merge, and the ID of the user that performed the merge.