Handle obvious duplicates
handle_obvious_dups( CitDat, fieldsToHandle = NULL, nameDupCategories = NA_character_, nameDupGroups = NA_character_, nameDupKeywords = NA_character_ )
Argument | Description |
---|---|
CitDat | A tibble returned by |
fieldsToHandle | A character vector with the names of all columns (fields) that should be handled. Note that this does not include "Categories", "Groups" and "Keywords"; these are controlled by the three arguments below. |
nameDupCategories | Character string to which the "Categories" of obvious duplicates are set. See details below. |
nameDupGroups | Character string to which the "Groups" of obvious duplicates are set. See details below. |
nameDupKeywords | Character string to which the "Keywords" of obvious duplicates are set. See details below. |
A tibble in which, for every set of obvious duplicates, the information from all copies has been brought together in the respective `dup_01` entry.
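To make "brought together in `dup_01`" concrete, here is a minimal base-R sketch of how a single field listed in `fieldsToHandle` could be merged. It is not CitaviR's implementation: `merge_field_into_dup01()` is a hypothetical helper, and it assumes that obvious duplicates share a `clean_title_id` (as rows 2 and 4 share `ct_02` in the example), that empty strings mark missing values, that distinct values are joined with `"; "`, and that the other copies (`dup_02`, ...) are left unchanged.

```r
# Hypothetical helper, not part of CitaviR. Assumes obvious duplicates share a
# clean_title_id and that the first copy carries obv_dup_id == "dup_01".
merge_field_into_dup01 <- function(df, field, sep = "; ") {
  groups <- split(seq_len(nrow(df)), df$clean_title_id)
  for (rows in groups) {
    if (length(rows) < 2) next  # a single record has no obvious duplicates
    vals <- df[[field]][rows]
    vals <- unique(vals[!is.na(vals) & vals != ""])  # drop blanks and repeats
    first <- rows[df$obv_dup_id[rows] == "dup_01"]
    df[[field]][first] <- paste(vals, collapse = sep)  # dup_02, ... unchanged
  }
  df
}

# Made-up toy data with the same shape as the example output on this page
refs <- data.frame(
  clean_title_id = c("ct_04", "ct_02", "ct_03", "ct_02", "ct_01"),
  obv_dup_id = c("dup_01", "dup_01", "dup_01", "dup_02", "dup_01"),
  DOI = c("doi-A", "", "", "doi-B", "doi-C"),
  stringsAsFactors = FALSE
)
merged <- merge_field_into_dup01(refs, "DOI")
merged$DOI
#> [1] "doi-A" "doi-B" ""      "doi-B" "doi-C"
```

Grouping by `clean_title_id` is only an inference from the example; the package may identify duplicate sets differently.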
Currently, this only works for files that were generated while Citavi was set to English, so that column names are in English (e.g. "Short Title").
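A guard like the following could catch non-English exports before any handling is attempted. `check_english_columns()` is a hypothetical helper, not part of CitaviR, and only "Short Title" is taken from the text above; further English column names would have to be supplied by the caller.

```r
# Hypothetical guard, not part of CitaviR: stops early if expected English
# Citavi column names are missing (only "Short Title" is checked by default).
check_english_columns <- function(df, required = c("Short Title")) {
  missing <- setdiff(required, names(df))
  if (length(missing) > 0) {
    stop("Missing English Citavi column(s): ", paste(missing, collapse = ", "),
         ". Was the file created with Citavi set to English?", call. = FALSE)
  }
  invisible(TRUE)
}

english <- setNames(data.frame("x"), "Short Title")
check_english_columns(english)  # passes silently
```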
`nameDupCategories`, `nameDupGroups` and `nameDupKeywords` are all `NA_character_` by default. If a character string is provided for one of them, the respective column (i.e. `Categories`, `Groups` or `Keywords`) is handled. This means that whenever obvious duplicates are present, all unique entries are collapsed into `dup_01`, while `dup_02`, `dup_03` etc. are set to the provided character string.
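The rule for the three label columns can be sketched in base R as well. `collapse_dup_labels()` is hypothetical and not CitaviR's own code; it assumes that obvious duplicates share a `clean_title_id`, that individual entries inside one cell are separated by `";"`, and that the collapsed entries are rejoined with `"; "`. With the default `NA_character_`, the column is returned untouched, mirroring the default described above.

```r
# Hypothetical helper, not part of CitaviR. Assumes obvious duplicates share a
# clean_title_id and that entries inside one cell are separated by ";".
collapse_dup_labels <- function(df, field, dup_name = NA_character_, sep = "; ") {
  if (is.na(dup_name)) return(df)  # NA (the default) leaves the column as is
  groups <- split(seq_len(nrow(df)), df$clean_title_id)
  for (rows in groups) {
    if (length(rows) < 2) next  # no obvious duplicates in this group
    entries <- trimws(unlist(strsplit(df[[field]][rows], ";", fixed = TRUE)))
    entries <- unique(entries[!is.na(entries) & entries != ""])
    is_first <- df$obv_dup_id[rows] == "dup_01"
    df[[field]][rows[is_first]] <- paste(entries, collapse = sep)  # collect in dup_01
    df[[field]][rows[!is_first]] <- dup_name  # dup_02, dup_03, ... get the label
  }
  df
}

# Made-up toy data: rows 2 and 4 are obvious duplicates (ct_02)
refs <- data.frame(
  clean_title_id = c("ct_04", "ct_02", "ct_03", "ct_02", "ct_01"),
  obv_dup_id = c("dup_01", "dup_01", "dup_01", "dup_02", "dup_01"),
  Keywords = c("wheat", "heritability; maize", "", "maize; GxE", "barley"),
  stringsAsFactors = FALSE
)
handled <- collapse_dup_labels(refs, "Keywords", dup_name = "DUPLICATE")
# row 2 (dup_01) now holds "heritability; maize; GxE", row 4 (dup_02) "DUPLICATE"
handled$Keywords
```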
```r
library(CitaviR)
library(dplyr) # provides the %>% pipe

example_path <- example_file("3dupsin5refs/3dupsin5refs.ctv6")
CitDat <- read_Citavi_ctv6(example_path) %>%
  find_obvious_dups()

# before
CitDat %>%
  dplyr::select("clean_title", "clean_title_id", "obv_dup_id", "DOI", "PubMedID")
#> # A tibble: 5 x 5
#>   clean_title                    clean_title_id obv_dup_id DOI          PubMedID
#>   <chr>                          <chr>          <chr>      <chr>        <chr>
#> 1 more_larger_simpler_how_compa~ ct_04          dup_01     "10.2135/cr~ ""
#> 2 heritability_in_plant_breedin~ ct_02          dup_01     ""           "312488~
#> 3 hritability_in_plant_breeding~ ct_03          dup_01     ""           ""
#> 4 heritability_in_plant_breedin~ ct_02          dup_02     "10.1534/ge~ ""
#> 5 estimating_broad_sense_herita~ ct_01          dup_01     "10.2135/cr~ ""

# after
CitDat %>%
  handle_obvious_dups(fieldsToHandle = c("DOI", "PubMedID")) %>%
  dplyr::select("clean_title", "clean_title_id", "obv_dup_id", "DOI", "PubMedID")
#> # A tibble: 5 x 5
#>   clean_title                    clean_title_id obv_dup_id DOI          PubMedID
#>   <chr>                          <chr>          <chr>      <chr>        <chr>
#> 1 more_larger_simpler_how_compa~ ct_04          dup_01     "10.2135/cr~ ""
#> 2 heritability_in_plant_breedin~ ct_02          dup_01     ""           "312488~
#> 3 hritability_in_plant_breeding~ ct_03          dup_01     ""           ""
#> 4 heritability_in_plant_breedin~ ct_02          dup_02     "10.1534/ge~ ""
#> 5 estimating_broad_sense_herita~ ct_01          dup_01     "10.2135/cr~ ""
```