Handle obvious duplicates

handle_obvious_dups(
  CitDat,
  fieldsToHandle = NULL,
  nameDupCategories = NA_character_,
  nameDupGroups = NA_character_,
  nameDupKeywords = NA_character_
)

Arguments

CitDat	A tibble returned by `find_obvious_dups`.
fieldsToHandle	A character vector of the column/field names that should be handled. Do not include "Categories", "Groups" or "Keywords" here; those columns are controlled by the nameDup* arguments below.
nameDupCategories	Name that "Categories" of obvious duplicates should be set to. See details below.
nameDupGroups	Name that "Groups" of obvious duplicates should be set to. See details below.
nameDupKeywords	Name that "Keywords" of obvious duplicates should be set to. See details below.

Value

A tibble in which, for each set of obvious duplicates, the information from all duplicates has been merged into the dup_01 entry.

Details

Currently, this only works for files that were generated while Citavi's language was set to "English", so that column names are "Short Title" etc.
nameDupCategories, nameDupGroups and nameDupKeywords are all NA_character_ by default, in which case the respective column (i.e. Categories, Groups or Keywords) is left untouched. If a character string is provided for one of them, that column is handled: whenever obvious duplicates are present, all unique entries are collapsed into dup_01, while dup_02, dup_03 etc. are set to the provided character string.
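The collapsing behaviour described above can be sketched in base R on a toy data frame. This is only an illustration under stated assumptions, not the package's implementation: the helper `collapse_dups()` is hypothetical, duplicates are assumed to be grouped by `clean_title_id`, and the "; " separator used to join unique entries is an assumption as well.

```r
# Toy data: two records sharing the same clean_title_id (ct_02), i.e. obvious
# duplicates, plus one unrelated record. Column names mirror the example output.
toy <- data.frame(
  clean_title_id = c("ct_01", "ct_02", "ct_02"),
  obv_dup_id     = c("dup_01", "dup_01", "dup_02"),
  Keywords       = c("yield", "heritability", "breeding"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (NOT part of the package): collapse one column within
# each group of obvious duplicates.
collapse_dups <- function(dat, field, name_dup) {
  for (id in unique(dat$clean_title_id)) {
    rows <- which(dat$clean_title_id == id)
    if (length(rows) < 2) next                     # no duplicates in this group
    first <- rows[dat$obv_dup_id[rows] == "dup_01"]
    rest  <- setdiff(rows, first)                  # dup_02, dup_03, ...
    values <- unique(dat[[field]][rows])
    values <- values[!is.na(values) & values != ""]
    # Assumed separator "; ": all unique entries end up in dup_01 ...
    dat[[field]][first] <- paste(values, collapse = "; ")
    # ... while the remaining duplicates get the provided string.
    dat[[field]][rest] <- name_dup
  }
  dat
}

result <- collapse_dups(toy, "Keywords", "obvious duplicate")
# result$Keywords now holds, row by row:
#   "yield", "heritability; breeding", "obvious duplicate"
```

In this sketch the unrelated record (ct_01) is left unchanged, the dup_01 row of ct_02 receives both keywords, and the dup_02 row is overwritten with the provided name, which makes such rows easy to filter out later.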

Examples

example_path <- example_file("3dupsin5refs/3dupsin5refs.ctv6")
CitDat <- read_Citavi_ctv6(example_path) %>%
   find_obvious_dups()

# before
CitDat %>%
   dplyr::select("clean_title", "clean_title_id", "obv_dup_id", "DOI", "PubMedID")
#> # A tibble: 5 x 5
#>   clean_title                    clean_title_id obv_dup_id DOI          PubMedID
#>   <chr>                          <chr>          <chr>      <chr>        <chr>   
#> 1 more_larger_simpler_how_compa~ ct_04          dup_01     "10.2135/cr~ ""      
#> 2 heritability_in_plant_breedin~ ct_02          dup_01     ""           "312488~
#> 3 hritability_in_plant_breeding~ ct_03          dup_01     ""           ""      
#> 4 heritability_in_plant_breedin~ ct_02          dup_02     "10.1534/ge~ ""      
#> 5 estimating_broad_sense_herita~ ct_01          dup_01     "10.2135/cr~ ""      

# after
CitDat %>%
   handle_obvious_dups(fieldsToHandle = c("DOI", "PubMedID")) %>%
   dplyr::select("clean_title", "clean_title_id", "obv_dup_id", "DOI", "PubMedID")
#> # A tibble: 5 x 5
#>   clean_title                    clean_title_id obv_dup_id DOI          PubMedID
#>   <chr>                          <chr>          <chr>      <chr>        <chr>   
#> 1 more_larger_simpler_how_compa~ ct_04          dup_01     "10.2135/cr~ ""      
#> 2 heritability_in_plant_breedin~ ct_02          dup_01     ""           "312488~
#> 3 hritability_in_plant_breeding~ ct_03          dup_01     ""           ""      
#> 4 heritability_in_plant_breedin~ ct_02          dup_02     "10.1534/ge~ ""      
#> 5 estimating_broad_sense_herita~ ct_01          dup_01     "10.2135/cr~ ""