Step 1: Citavi to R

The following screenshot shows the Citavi project that is available in CitaviR as the 3dupsin5refs.ctv6.

Here, we can read in the information of interest via read_Citavi_ctv6().

library(tidyverse)
library(CitaviR)

example_path <- example_file("3dupsin5refs/3dupsin5refs.ctv6") # in real life: replace with your path
CitDat <- read_Citavi_ctv6(path = example_path,
                           CitDBTableName = "Reference")

CitDat %>% select(Title, Year, Abstract, DOI) 
#> # A tibble: 5 x 4
#>   Title                        Year  Abstract                       DOI         
#>   <chr>                        <chr> <chr>                          <chr>       
#> 1 More, Larger, Simpler: How ~ 2018  Traditionally, cultivar evalu~ "10.2135/cr~
#> 2 Heritability in Plant Breed~ 2019  In plant breeding, heritabili~ ""          
#> 3 Hritability in Plant Breedi~ 2019  In plant breeding, heritabili~ ""          
#> 4 Heritability in plant breed~ 2019  In plant breeding, heritabili~ "10.1534/ge~
#> 5 Estimating broad-sense heri~ 2019  Broad-sense heritability is d~ "10.2135/cr~

If, for whatever reason, you wish to do the import from Citavi not via SQL, but with Excel files (exported from Citavi), then CitaviR offers an alternative approach TO DO LINK

Step 2: Process data in R

At this point there are many things one may wish to do with the data. In this example we will make use of the CitaviR functions to identify and handle obvious duplicates.

Find obvious duplicates

CitDat <- CitDat %>% 
  find_obvious_dups()

One way of identifying obvious duplicates is via CitaviR::find_obvious_dups(). In short, it first creates a clean_title by combining each reference’s Title and Year into a simplified string. This simplification is based on janitor::make_clean_names() and e.g. converts to all-lowercase, and removes special characters and unnecessary spaces. If two references have the same clean_title, they are identified as obvious duplicates. In this example, two references were indeed identified as obvious duplicates:

CitDat %>%
 select(Title, clean_title:obv_dup_id)
#> # A tibble: 5 x 5
#>   Title             clean_title            clean_title_id has_obv_dup obv_dup_id
#>   <chr>             <chr>                  <chr>          <lgl>       <chr>     
#> 1 More, Larger, Si~ more_larger_simpler_h~ ct_04          FALSE       dup_01    
#> 2 Heritability in ~ heritability_in_plant~ ct_02          TRUE        dup_01    
#> 3 Hritability in P~ hritability_in_plant_~ ct_03          FALSE       dup_01    
#> 4 Heritability in ~ heritability_in_plant~ ct_02          TRUE        dup_02    
#> 5 Estimating broad~ estimating_broad_sens~ ct_01          FALSE       dup_01

Note how a single typo (“Hritability”) prevents ct_03 from being detected as an obvious duplicate for ct_02. For cases like this, one may use CitaviR::find_potential_dups(), but we will not go so far in this introduction (TO DO LINK TO VIGNETTE).

Handle obvious duplicates

At this point we have already gained information and could continue with steps 4 and 5. However, sometimes duplicates hold different information as it is the case here for ct_02 and the columns PubMedID, DOI and Categories:

CitDat %>% 
  filter(clean_title_id == "ct_02") %>% 
  select(clean_title_id, obv_dup_id, DOI, PubMedID)
#> # A tibble: 2 x 4
#>   clean_title_id obv_dup_id DOI                           PubMedID  
#>   <chr>          <chr>      <chr>                         <chr>     
#> 1 ct_02          dup_01     ""                            "31248886"
#> 2 ct_02          dup_02     "10.1534/genetics.119.302134" ""

In such a scenario it would be best to gather all information into the one non-duplicate (=dup_01) that will be kept and of interest later on. Here, CitaviR::handle_obvious_dups() comes in handy:

CitDat <- CitDat %>% 
  handle_obvious_dups(fieldsToHandle = c("DOI", "PubMedID"))

As can be seen, the columns listed in fieldsToHandle = are filled up (i.e. tidyr::fill(all_of(fieldsToHandle), .direction = "up")). By providing a string for nameDupCategories =, unique entries in the Categories column are collapsed into the respective entry for dup_01, while entries for all other obvious duplicates are replaced by the provided string.

CitDat %>% 
  filter(clean_title_id == "ct_02") %>% 
  select(clean_title_id, obv_dup_id, DOI, PubMedID)
#> # A tibble: 2 x 4
#>   clean_title_id obv_dup_id DOI                           PubMedID  
#>   <chr>          <chr>      <chr>                         <chr>     
#> 1 ct_02          dup_01     ""                            "31248886"
#> 2 ct_02          dup_02     "10.1534/genetics.119.302134" ""

Therefore, we could now get rid of all obvious duplicates (obv_dup_id =! dup_01) without losing any information.

Step 3: R to Citavi

Finally, we want to implement the gained information into the Citavi project. For this, we can make use of update_Citavi_ctv6(). Say we would like to overwrite the old DOI and PubMed information and additionally store the clean_title_id and obv_dup_id in Custom field 1 and Custom field 2, respectively. You should probably close your Citavi project before running this:

CitDat %>%
  update_Citavi_ctv6(
    path = example_path,
    CitDBTableName = "Reference",
    CitDatVarToCitDBTableVar = c(
        "DOI"            = "DOI",
        "PubMedID"       = "PubMedID",
        "clean_title_id" = "CustomField1",
        "obv_dup_id"     = "CustomField2"),
    quiet = FALSE
  )
#> Sending SQL queries:
#> UPDATE Reference SET DOI = '10.2135/cropsci2017.09.0555' WHERE StaticIDs = '["3840ac9c-6193-4b10-9a29-99e3702fcdca"]' 
#> UPDATE Reference SET DOI = '' WHERE StaticIDs = '["49eef40c-3927-4392-992d-2788c222af86"]' 
#> UPDATE Reference SET DOI = '' WHERE StaticIDs = '["41c262fc-19bb-4dd9-9d48-dccfe5d64aa2"]' 
#> UPDATE Reference SET DOI = '10.1534/genetics.119.302134' WHERE StaticIDs = '["a4e2c5f7-5144-4299-8e69-423d8d862bc9"]' 
#> UPDATE Reference SET DOI = '10.2135/cropsci2018.06.0376' WHERE StaticIDs = '["4cc778b0-6078-4bf2-9cae-f949698cb353"]' 
#> UPDATE Reference SET PubMedID = '' WHERE StaticIDs = '["3840ac9c-6193-4b10-9a29-99e3702fcdca"]' 
#> UPDATE Reference SET PubMedID = '31248886' WHERE StaticIDs = '["49eef40c-3927-4392-992d-2788c222af86"]' 
#> UPDATE Reference SET PubMedID = '' WHERE StaticIDs = '["41c262fc-19bb-4dd9-9d48-dccfe5d64aa2"]' 
#> UPDATE Reference SET PubMedID = '' WHERE StaticIDs = '["a4e2c5f7-5144-4299-8e69-423d8d862bc9"]' 
#> UPDATE Reference SET PubMedID = '' WHERE StaticIDs = '["4cc778b0-6078-4bf2-9cae-f949698cb353"]' 
#> UPDATE Reference SET CustomField1 = 'ct_04' WHERE StaticIDs = '["3840ac9c-6193-4b10-9a29-99e3702fcdca"]' 
#> UPDATE Reference SET CustomField1 = 'ct_02' WHERE StaticIDs = '["49eef40c-3927-4392-992d-2788c222af86"]' 
#> UPDATE Reference SET CustomField1 = 'ct_03' WHERE StaticIDs = '["41c262fc-19bb-4dd9-9d48-dccfe5d64aa2"]' 
#> UPDATE Reference SET CustomField1 = 'ct_02' WHERE StaticIDs = '["a4e2c5f7-5144-4299-8e69-423d8d862bc9"]' 
#> UPDATE Reference SET CustomField1 = 'ct_01' WHERE StaticIDs = '["4cc778b0-6078-4bf2-9cae-f949698cb353"]' 
#> UPDATE Reference SET CustomField2 = 'dup_01' WHERE StaticIDs = '["3840ac9c-6193-4b10-9a29-99e3702fcdca"]' 
#> UPDATE Reference SET CustomField2 = 'dup_01' WHERE StaticIDs = '["49eef40c-3927-4392-992d-2788c222af86"]' 
#> UPDATE Reference SET CustomField2 = 'dup_01' WHERE StaticIDs = '["41c262fc-19bb-4dd9-9d48-dccfe5d64aa2"]' 
#> UPDATE Reference SET CustomField2 = 'dup_02' WHERE StaticIDs = '["a4e2c5f7-5144-4299-8e69-423d8d862bc9"]' 
#> UPDATE Reference SET CustomField2 = 'dup_01' WHERE StaticIDs = '["4cc778b0-6078-4bf2-9cae-f949698cb353"]'

If you now open the Citavi project, it should have changed as expected.

If, for whatever reason, you wish to do the export to Citavi not via SQL, but with Excel files, then CitaviR offers an alternative approach TO DO LINK

The basic CitaviR workflow

Step 1: Citavi to R

Step 2: Process data in R

Find obvious duplicates

Handle obvious duplicates

Step 3: R to Citavi