Source: R/find_obvious_dups.R, find_obvious_dups.Rd
Identify obvious duplicates based on title and year
find_obvious_dups(CitDat, dupInfoAfterID = TRUE, preferDupsWithPDF = TRUE)
Argument | Description
---|---
CitDat | A dataframe/tibble returned by
dupInfoAfterID | If TRUE (default), the newly created columns
preferDupsWithPDF | If TRUE (default), obvious duplicates are sorted by their info in columns
A tibble containing four additional columns: clean_title, clean_title_id, has_obv_dup and obv_dup_id.
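To make the four new columns more concrete, here is a minimal base-R sketch of the underlying idea. It is not the package's implementation: the helper names clean_title_sketch() and flag_obvious_dups_sketch(), the plain lowercase title and year columns, and the cleaning rule (lowercase, collapse runs of non-alphanumeric characters into one underscore) are all hypothetical, chosen only to mirror the pattern visible in the example output.

```r
# Hypothetical sketch, NOT the package's implementation: it only mirrors
# the kind of columns find_obvious_dups() returns.

# Normalise a title: lowercase, collapse runs of non-alphanumeric
# characters into one underscore, and trim underscores at both ends.
clean_title_sketch <- function(title) {
  x <- tolower(title)
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x)
}

# Flag rows sharing the same cleaned title AND year as obvious duplicates.
# Assumes (hypothetically) plain `title` and `year` columns.
flag_obvious_dups_sketch <- function(df) {
  df$clean_title <- clean_title_sketch(df$title)

  # One ID per distinct cleaned title, numbered in sorted order
  ids <- sort(unique(df$clean_title))
  df$clean_title_id <- sprintf("ct_%02d", match(df$clean_title, ids))

  # Group key: cleaned title plus year
  key <- paste(df$clean_title, df$year)
  row_idx <- seq_along(key)
  # TRUE if more than one row shares the key
  df$has_obv_dup <- ave(row_idx, key, FUN = length) > 1
  # Running number of each row within its group
  df$obv_dup_id <- sprintf("dup_%02d", ave(row_idx, key, FUN = seq_along))
  df
}

# Usage with made-up toy data
refs <- data.frame(
  title = c("More, larger, simpler?", "Heritability in plant breeding",
            "Hritability in plant breeding", "Heritability in plant breeding!",
            "Estimating heritability"),
  year = c(2020, 2019, 2019, 2019, 2021),
  stringsAsFactors = FALSE
)
res <- flag_obvious_dups_sketch(refs)
res[, c("clean_title_id", "has_obv_dup", "obv_dup_id")]
```

With such a simple rule, a typo like "hritability" produces a different clean title, so that row is not flagged as an obvious duplicate; the rule may also turn accented letters into underscores.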
Currently, this only works for files that were generated while Citavi's language was set to English, so that the column names are in English (e.g. "Short Title").
```r
library(CitaviR)  # example_file(), read_Citavi_ctv6(), find_obvious_dups()
library(dplyr)    # %>% and select()

example_path <- example_file("3dupsin5refs/3dupsin5refs.ctv6")

read_Citavi_ctv6(example_path) %>%
  find_obvious_dups() %>%
  dplyr::select(clean_title:obv_dup_id)
#> # A tibble: 5 x 4
#>   clean_title                              clean_title_id has_obv_dup obv_dup_id
#>   <chr>                                    <chr>          <lgl>       <chr>
#> 1 more_larger_simpler_how_comparable_are_~ ct_04          FALSE       dup_01
#> 2 heritability_in_plant_breeding_on_a_gen~ ct_02          TRUE        dup_01
#> 3 hritability_in_plant_breeding_on_a_geno~ ct_03          FALSE       dup_01
#> 4 heritability_in_plant_breeding_on_a_gen~ ct_02          TRUE        dup_02
#> 5 estimating_broad_sense_heritability_wit~ ct_01          FALSE       dup_01
```