Handle obvious duplicates

handle_obvious_dups(
  CitDat,
  fieldsToHandle = NULL,
  nameDupCategories = NA_character_,
  nameDupGroups = NA_character_,
  nameDupKeywords = NA_character_
)

Arguments

CitDat

A tibble returned by find_obvious_dups.

fieldsToHandle

A character vector of all column/field names that should be handled. This does not include "Categories", "Groups" and "Keywords", which are controlled separately via nameDupCategories, nameDupGroups and nameDupKeywords.

nameDupCategories

Name that "Categories" of obvious duplicates should be set to. See details below.

nameDupGroups

Name that "Groups" of obvious duplicates should be set to. See details below.

nameDupKeywords

Name that "Keywords" of obvious duplicates should be set to. See details below.

Value

A tibble in which, for each set of obvious duplicates, the information from all duplicates is brought together in the dup_01 entry.

Details

[Maturing]
Currently, this only works for files that were generated while Citavi was set to "English", so that column names are "Short Title" etc.
nameDupCategories, nameDupGroups and nameDupKeywords are all NA_character_ by default. If a character string is provided for one of them, the respective column (i.e. Categories, Groups or Keywords) is handled: whenever obvious duplicates are present, all unique entries of that column are collapsed into dup_01, while dup_02, dup_03 etc. are set to the provided character string.
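
The collapsing behaviour described above can be illustrated with a small base R sketch on toy data. This is not the package's implementation: collapse_dups() is a hypothetical helper, the toy data frame is invented for illustration, and the "; " separator between entries is an assumption.

```r
# Toy data: the two ct_02 rows are obvious duplicates of each other.
toy <- data.frame(
  clean_title_id = c("ct_01", "ct_02", "ct_02"),
  obv_dup_id     = c("dup_01", "dup_01", "dup_02"),
  Keywords       = c("yield", "heritability; breeding", "breeding; genetics"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of CitaviR): collapse one column per duplicate set.
# `sep` is an assumed separator between the single entries of a field.
collapse_dups <- function(dat, column, dup_name, sep = "; ") {
  for (id in unique(dat$clean_title_id)) {
    rows <- which(dat$clean_title_id == id)
    if (length(rows) < 2) next  # no duplicates in this set

    # Split all entries of the set into single items and keep each item once.
    items <- unique(unlist(strsplit(dat[[column]][rows], sep, fixed = TRUE)))
    items <- items[!is.na(items) & nzchar(items)]

    # dup_01 receives the collapsed entries, all other rows the provided name.
    first <- rows[dat$obv_dup_id[rows] == "dup_01"]
    dat[[column]][first] <- paste(items, collapse = sep)
    dat[[column]][setdiff(rows, first)] <- dup_name
  }
  dat
}

res <- collapse_dups(toy, "Keywords", dup_name = "obvious duplicate")
res$Keywords
```

In this sketch, the dup_01 row of ct_02 ends up with "heritability; breeding; genetics", the dup_02 row is set to "obvious duplicate", and the ct_01 row is left unchanged because it has no duplicates.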

Examples

example_path <- example_file("3dupsin5refs/3dupsin5refs.ctv6")

CitDat <- read_Citavi_ctv6(example_path) %>%
  find_obvious_dups()

# before
CitDat %>%
  dplyr::select("clean_title", "clean_title_id", "obv_dup_id", "DOI", "PubMedID")
#> # A tibble: 5 x 5
#>   clean_title                    clean_title_id obv_dup_id DOI          PubMedID
#>   <chr>                          <chr>          <chr>      <chr>        <chr>
#> 1 more_larger_simpler_how_compa~ ct_04          dup_01     "10.2135/cr~ ""
#> 2 heritability_in_plant_breedin~ ct_02          dup_01     ""           "312488~
#> 3 hritability_in_plant_breeding~ ct_03          dup_01     ""           ""
#> 4 heritability_in_plant_breedin~ ct_02          dup_02     "10.1534/ge~ ""
#> 5 estimating_broad_sense_herita~ ct_01          dup_01     "10.2135/cr~ ""

# after
CitDat %>%
  handle_obvious_dups(fieldsToHandle = c("DOI", "PubMedID")) %>%
  dplyr::select("clean_title", "clean_title_id", "obv_dup_id", "DOI", "PubMedID")
#> # A tibble: 5 x 5
#>   clean_title                    clean_title_id obv_dup_id DOI          PubMedID
#>   <chr>                          <chr>          <chr>      <chr>        <chr>
#> 1 more_larger_simpler_how_compa~ ct_04          dup_01     "10.2135/cr~ ""
#> 2 heritability_in_plant_breedin~ ct_02          dup_01     ""           "312488~
#> 3 hritability_in_plant_breeding~ ct_03          dup_01     ""           ""
#> 4 heritability_in_plant_breedin~ ct_02          dup_02     "10.1534/ge~ ""
#> 5 estimating_broad_sense_herita~ ct_01          dup_01     "10.2135/cr~ ""