If, for whatever reason, you wish to do the import from and/or export to Citavi not via SQL, but with Excel files, then CitaviR offers an alternative approach with

However, especially the export to Citavi via Excel files takes more effort. In general, the work flow from start to finish is structured in five steps.

Step CitaviR Effort: 1st time setup Effort: regular use
Step 1: Citavi to xlsx 😏 😎
Step 2: xlsx to R 😎 😎
Step 3: Process data in R 😏 😎
Step 4: R to xlsx 😎 😎
Step 5: xlsx to Citavi 💥 😏

Effort levels: low effort 😎; acceptable effort 😏; can be cumbersome 💥

Step 1: Citavi to xlsx

In your Citavi project open the table view.

Make sure all relevant columns are selected (e.g. via Citavi’s customizable selection presets) and export to an Excel file.

Step 2: xlsx to R

The Excel file exported above is available in CitaviR as the example dataset 3dupsin5refs.xlsx. Furthermore, read_Citavi_xlsx() offers an import function based on readxl::read_excel() with some functionality specifically for xls/xlsx files created with Citavi via export to Excel.

library(tidyverse)
library(CitaviR)

my_path <- example_file("3dupsin5refs.xlsx") # in real life: replace with path to your xlsx file
CitDat  <- read_Citavi_xlsx(path = my_path)

CitDat
#> # A tibble: 5 x 19
#>   ID     Title  Year  `Short title` Author Categories Groups Abstract `DOI name`
#>   <chr>  <chr>  <chr> <chr>         <chr>  <chr>      <chr>  <chr>    <chr>     
#> 1 7e04d~ Estim~ 2019  Schmidt, Har~ Schmi~ 1 catA     Googl~ Broad-s~ 10.2135/c~
#> 2 2481c~ Herit~ 2019  Schmidt, Har~ Schmi~ 1 catA     Googl~ In plan~ 10.1534/g~
#> 3 db3ab~ Herit~ 2019  Schmidt, Har~ Schmi~ 2 catB     PubMed In plan~ <NA>      
#> 4 ba575~ Hrita~ 2019  Schmidt, Har~ Schmi~ 1 catA     TypoDB In plan~ 10.1534/g~
#> 5 fa402~ More,~ 2018  Schmidt, Möh~ Schmi~ 2 catB     Googl~ Traditi~ 10.2135/c~
#> # ... with 10 more variables: PubMed ID <chr>, Online address <chr>,
#> #   Periodical <chr>, Volume <chr>, Number <chr>, Page range <chr>,
#> #   Locations <chr>, has_attachment <lgl>, red_flag <lgl>, blue_circle <lgl>

Step 3: Process data in R

At this point there are many things one may wish to do with the data. In this example we will make use of the CitaviR functions to identify and handle obvious duplicates.

Find obvious duplicates

CitDat <- CitDat %>% 
  find_obvious_dups()

One way of identifying obvious duplicates is via CitaviR::find_obvious_dups(). In short, it first creates a clean_title by combining each reference’s Title and Year into a simplified string. This simplification is based on janitor::make_clean_names() and e.g. converts to all-lowercase, and removes special characters and unnecessary spaces. If two references have the same clean_title, they are identified as obvious duplicates. In this example, two references were indeed identified as obvious duplicates:

CitDat %>%
 select(Title, clean_title:obv_dup_id)
#> # A tibble: 5 x 5
#>   Title             clean_title            clean_title_id has_obv_dup obv_dup_id
#>   <chr>             <chr>                  <chr>          <lgl>       <chr>     
#> 1 Estimating broad~ estimating_broad_sens~ ct_01          FALSE       dup_01    
#> 2 Heritability in ~ heritability_in_plant~ ct_02          TRUE        dup_01    
#> 3 Heritability in ~ heritability_in_plant~ ct_02          TRUE        dup_02    
#> 4 Hritability in P~ hritability_in_plant_~ ct_03          FALSE       dup_01    
#> 5 More, Larger, Si~ more_larger_simpler_h~ ct_04          FALSE       dup_01

Note how a single typo (“Hritability”) prevents ct_03 from being detected as an obvious duplicate for ct_02. For cases like this, one may use CitaviR::find_potential_dups(), but we will not go so far in this introduction.

Handle obvious duplicates

At this point we have already gained information and could continue with steps 4 and 5. However, sometimes duplicates hold different information as it is the case here for ct_02 and the columns PubMed ID, DOI name and Categories:

CitDat %>% 
  filter(clean_title_id == "ct_02") %>% 
  select(clean_title_id, obv_dup_id, `DOI name`, `PubMed ID`, Categories)
#> # A tibble: 2 x 5
#>   clean_title_id obv_dup_id `DOI name`                  `PubMed ID` Categories
#>   <chr>          <chr>      <chr>                       <chr>       <chr>     
#> 1 ct_02          dup_01     10.1534/genetics.119.302134 <NA>        1 catA    
#> 2 ct_02          dup_02     <NA>                        31248886    2 catB

In such a scenario it would be best to gather all information into the one non-duplicate (=dup_01) that will be kept and of interest later on. Here, CitaviR::handle_obvious_dups() comes in handy:

CitDat <- CitDat %>% 
  handle_obvious_dups(fieldsToHandle = c("DOI name", "PubMed ID"), 
                      nameDupCategories = "3 duplicate")

As can be seen, the columns listed in fieldsToHandle = are filled up (i.e. tidyr::fill(all_of(fieldsToHandle), .direction = "up")). By providing a string for nameDupCategories =, unique entries in the Categories column are collapsed into the respective entry for dup_01, while entries for all other obvious duplicates are replaced by the provided string.

CitDat %>% 
  filter(clean_title_id == "ct_02") %>% 
  select(clean_title_id, obv_dup_id, `DOI name`, `PubMed ID`, Categories)
#> # A tibble: 2 x 5
#>   clean_title_id obv_dup_id `DOI name`                 `PubMed ID` Categories   
#>   <chr>          <chr>      <chr>                      <chr>       <chr>        
#> 1 ct_02          dup_01     10.1534/genetics.119.3021~ 31248886    1 catA; 2 ca~
#> 2 ct_02          dup_02     <NA>                       31248886    3 duplicate

Therefore, we could now get rid of all obvious duplicates (obv_dup_id =! dup_01) without losing any information.

Step 4: R to xlsx

To export this table to Excel, write_Citavi_xlsx() offers an export function based on openxlsx::write.xlsx() with some extra functionality. For example, when supplying the same path we used for read_Citavi_xlsx(path = ...) in Step 2, the xlsx file will be created in the same folder it was imported from with a slightly altered name:

write_Citavi_xlsx(CitDat, read_path = my_path) # works in real life, but not for example data

Step 5: xlsx to Citavi

We now make use of the custom Citavi macros to import the information from excel to Citavi. In this case, we can make use of CIMR 002, which is one of the macro templates provided in the repository CitaviRMacros. As can be seen in this gif, we here store the obv_dup_ids into Custom field 1, while it is made sure to import the PubMed ID information handled in step 3. In order to also import the handled categories, one would additionaly use CIMR 010 [TO DO].

[TO DO Screenshot with “Click for gif”]

Afterwards, we could delete all titles in the Citavi project that are not dup_01 and get rid of all obvious duplicates without losing information.

1st time setup


Unfortunately, importing xlsx into Citavi is not as trivial as exporting xlsx from it. In order to make this work and reproduce e.g. what can be seen in the gif mentioned in step 5 you must first

Afterwards, you should be able to run the original Excel-import macro CIM007 Import arbitrary data from Microsoft Excel into custom fields of existing references by short title provided by Citavi, as well as all Excel-import macros in the repository CitaviRMacros.

Note that it is this very macro CIM007 that makes all of this possible. Without it, CitaviR would not nearly be as useful since - according to my knowledge - there is currently no other way to import Excel data into Citavi. All Import-macros (“CIMR” prefix) in the repository CitaviRMacros are basically adpated versions of CIM007.