If, for whatever reason, you wish to do the import from and/or export to Citavi not via SQL but via Excel files, then CitaviR offers an alternative approach with `read_Citavi_xlsx()` instead of `read_Citavi_ctv6()` and `write_Citavi_xlsx()` instead of `update_Citavi_ctv6()`. However, especially the export to Citavi via Excel files takes more effort. In general, the workflow from start to finish is structured in five steps.
Step | CitaviR | Effort: 1st time setup | Effort: regular use |
---|---|---|---|
Step 1: Citavi to xlsx | | 😏 | 😎 |
Step 2: xlsx to R | ✅ | 😎 | 😎 |
Step 3: Process data in R | ✅ | 😏 | 😎 |
Step 4: R to xlsx | ✅ | 😎 | 😎 |
Step 5: xlsx to Citavi | | 💥 | 😏 |

Effort levels: low effort 😎; acceptable effort 😏; can be cumbersome 💥
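In code, the R side of this workflow (Steps 2 to 4) boils down to something like the following sketch; the file path is a placeholder and the function arguments mirror the examples further below:

```r
library(tidyverse)
library(CitaviR)

# Step 2: import the Excel file that was exported from Citavi (placeholder path)
CitDat <- read_Citavi_xlsx(path = "my_citavi_export.xlsx")

# Step 3: process the data in R, e.g. identify and handle obvious duplicates
CitDat <- CitDat %>%
  find_obvious_dups() %>%
  handle_obvious_dups(fieldsToHandle = c("DOI name", "PubMed ID"),
                      nameDupCategories = "3 duplicate")

# Step 4: write the processed table back to xlsx (created next to the imported file)
write_Citavi_xlsx(CitDat, read_path = "my_citavi_export.xlsx")
```

Steps 1 and 5 happen in Citavi itself; all five steps are described in more detail below.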
Step 1 takes place in Citavi itself: open the table view in your Citavi project, make sure all relevant columns are selected (e.g. via Citavi’s customizable selection presets) and export to an Excel file.
The Excel file exported above is available in CitaviR as the example dataset `3dupsin5refs.xlsx`. For Step 2, `read_Citavi_xlsx()` offers an import function based on `readxl::read_excel()` with some functionality specifically for xls/xlsx files created with Citavi via its export to Excel.
```r
library(tidyverse)
library(CitaviR)

my_path <- example_file("3dupsin5refs.xlsx") # in real life: replace with path to your xlsx file
CitDat <- read_Citavi_xlsx(path = my_path)
CitDat
#> # A tibble: 5 x 19
#>   ID     Title  Year  `Short title` Author Categories Groups Abstract `DOI name`
#>   <chr>  <chr>  <chr> <chr>         <chr>  <chr>      <chr>  <chr>    <chr>
#> 1 7e04d~ Estim~ 2019  Schmidt, Har~ Schmi~ 1 catA     Googl~ Broad-s~ 10.2135/c~
#> 2 2481c~ Herit~ 2019  Schmidt, Har~ Schmi~ 1 catA     Googl~ In plan~ 10.1534/g~
#> 3 db3ab~ Herit~ 2019  Schmidt, Har~ Schmi~ 2 catB     PubMed In plan~ <NA>
#> 4 ba575~ Hrita~ 2019  Schmidt, Har~ Schmi~ 1 catA     TypoDB In plan~ 10.1534/g~
#> 5 fa402~ More,~ 2018  Schmidt, Möh~ Schmi~ 2 catB     Googl~ Traditi~ 10.2135/c~
#> # ... with 10 more variables: PubMed ID <chr>, Online address <chr>,
#> #   Periodical <chr>, Volume <chr>, Number <chr>, Page range <chr>,
#> #   Locations <chr>, has_attachment <lgl>, red_flag <lgl>, blue_circle <lgl>
```
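For comparison, a plain `readxl` import of the same file would be a single call; `read_Citavi_xlsx()` is built on top of this and adds its handling specific to Citavi's Excel exports (shown only as an illustration, not the package's internal code):

```r
# Plain readxl import for comparison with read_Citavi_xlsx()
CitDat_raw <- readxl::read_excel(my_path)
```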
This brings us to Step 3, processing the data in R. At this point there are many things one may wish to do with the data. In this example we will make use of the CitaviR functions to identify and handle obvious duplicates.
```r
CitDat <- CitDat %>%
  find_obvious_dups()
```
One way of identifying obvious duplicates is via `CitaviR::find_obvious_dups()`. In short, it first creates a `clean_title` by combining each reference’s `Title` and `Year` into a simplified string. This simplification is based on `janitor::make_clean_names()` and e.g. converts everything to lowercase and removes special characters and unnecessary spaces. If two references have the same `clean_title`, they are identified as obvious duplicates. In this example, two references were indeed identified as obvious duplicates:
```r
CitDat %>%
  select(Title, clean_title:obv_dup_id)
#> # A tibble: 5 x 5
#>   Title             clean_title            clean_title_id has_obv_dup obv_dup_id
#>   <chr>             <chr>                  <chr>          <lgl>       <chr>
#> 1 Estimating broad~ estimating_broad_sens~ ct_01          FALSE       dup_01
#> 2 Heritability in ~ heritability_in_plant~ ct_02          TRUE        dup_01
#> 3 Heritability in ~ heritability_in_plant~ ct_02          TRUE        dup_02
#> 4 Hritability in P~ hritability_in_plant_~ ct_03          FALSE       dup_01
#> 5 More, Larger, Si~ more_larger_simpler_h~ ct_04          FALSE       dup_01
```
Note how a single typo (“Hritability”) prevents `ct_03` from being detected as an obvious duplicate of `ct_02`. For cases like this, one may use `CitaviR::find_potential_dups()`, but we will not go that far in this introduction.
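To get a feel for the kind of simplification `clean_title` is based on, you can call `janitor::make_clean_names()` directly on a toy string (the title below is made up and not one of the example references):

```r
# Toy illustration of the string cleaning behind clean_title;
# this should return something like "heritability_in_plant_breeding_2019"
janitor::make_clean_names("Heritability in Plant Breeding (2019)")
```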
At this point we have already gained information and could continue with Steps 4 and 5. However, sometimes duplicates hold different information, as is the case here for `ct_02` and the columns `PubMed ID`, `DOI name` and `Categories`:
```r
CitDat %>%
  filter(clean_title_id == "ct_02") %>%
  select(clean_title_id, obv_dup_id, `DOI name`, `PubMed ID`, Categories)
#> # A tibble: 2 x 5
#>   clean_title_id obv_dup_id `DOI name`                   `PubMed ID` Categories
#>   <chr>          <chr>      <chr>                        <chr>       <chr>
#> 1 ct_02          dup_01     10.1534/genetics.119.302134  <NA>        1 catA
#> 2 ct_02          dup_02     <NA>                         31248886    2 catB
```
In such a scenario it would be best to gather all information into the one non-duplicate (`dup_01`) that will be kept and remain of interest later on. Here, `CitaviR::handle_obvious_dups()` comes in handy:
```r
CitDat <- CitDat %>%
  handle_obvious_dups(fieldsToHandle = c("DOI name", "PubMed ID"),
                      nameDupCategories = "3 duplicate")
```
As can be seen below, the columns listed in `fieldsToHandle =` are filled up (i.e. via `tidyr::fill(all_of(fieldsToHandle), .direction = "up")`). By providing a string for `nameDupCategories =`, unique entries in the `Categories` column are collapsed into the respective entry for `dup_01`, while the entries for all other obvious duplicates are replaced by the provided string.
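Under the hood, the fill-up roughly corresponds to the following sketch (an illustration only, assuming the fill is applied within each `clean_title` group; the package's actual implementation may differ):

```r
# Rough sketch of the fill-up step (illustration only, assuming the fill is
# applied within each clean_title group; not the package's actual code):
CitDat %>%
  group_by(clean_title) %>%
  fill(all_of(c("DOI name", "PubMed ID")), .direction = "up") %>%
  ungroup()
```

Checking the result of `handle_obvious_dups()` for `ct_02`: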
```r
CitDat %>%
  filter(clean_title_id == "ct_02") %>%
  select(clean_title_id, obv_dup_id, `DOI name`, `PubMed ID`, Categories)
#> # A tibble: 2 x 5
#>   clean_title_id obv_dup_id `DOI name`                 `PubMed ID` Categories
#>   <chr>          <chr>      <chr>                      <chr>       <chr>
#> 1 ct_02          dup_01     10.1534/genetics.119.3021~ 31248886    1 catA; 2 ca~
#> 2 ct_02          dup_02     <NA>                       31248886    3 duplicate
```
Therefore, we could now get rid of all obvious duplicates (`obv_dup_id != dup_01`) without losing any information.
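If you only needed the de-duplicated table on the R side (rather than writing the duplicate information back to Citavi as described in Steps 4 and 5 below), this would be a simple filter:

```r
# Keep only the non-duplicates, i.e. one row per clean_title
CitDat_unique <- CitDat %>%
  filter(obv_dup_id == "dup_01")
```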
To export this table to Excel (Step 4), `write_Citavi_xlsx()` offers an export function based on `openxlsx::write.xlsx()` with some extra functionality. For example, when supplying the same path we used for `read_Citavi_xlsx(path = ...)` in Step 2, the xlsx file will be created in the same folder it was imported from, with a slightly altered name:
```r
write_Citavi_xlsx(CitDat, read_path = my_path) # works in real life, but not for example data
```
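If you prefer to control the output file yourself, you can also fall back to the underlying `openxlsx` writer directly (the file name below is just a placeholder):

```r
# Plain openxlsx export as a fallback; write_Citavi_xlsx() builds on this
# and adds its extra functionality (placeholder file name).
openxlsx::write.xlsx(CitDat, file = "CitDat_processed.xlsx")
```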
In Step 5, we make use of custom Citavi macros to import the information from Excel into Citavi. In this case, we can use CIMR 002, which is one of the macro templates provided in the repository CitaviRMacros. As can be seen in this gif ⏯, we here store the `obv_dup_id`s in Custom field 1, while making sure that the `PubMed ID` information handled in Step 3 is imported as well. In order to also import the handled categories, one would additionally use CIMR 010 [TO DO].
[TO DO Screenshot with “Click for gif”]
Afterwards, we could delete all titles in the Citavi project that are not `dup_01` and get rid of all obvious duplicates without losing information.
Regarding the 1st time setup: unfortunately, importing xlsx into Citavi is not as trivial as exporting xlsx from it. In order to make this work and e.g. reproduce what can be seen in the gif mentioned in Step 5, you must first
- enable Citavi macros
- install an OLE DB provider. Citavi suggests the Microsoft Access Database Engine 2016 Redistributable Kit, as described here (in German).
Afterwards, you should be able to run the original Excel-import macro CIM007 “Import arbitrary data from Microsoft Excel into custom fields of existing references by short title” provided by Citavi, as well as all Excel-import macros in the repository CitaviRMacros.
Note that it is this very macro CIM007 that makes all of this possible. Without it, CitaviR would not be nearly as useful since, to my knowledge, there is currently no other way to import Excel data into Citavi. All import macros (“CIMR” prefix) in the repository CitaviRMacros are basically adapted versions of CIM007.