In general, the work flow from start to finish is structured in three steps.
.ctv6
file.ctv6
fileThe following screenshot shows the Citavi project that is available in CitaviR
as the 3dupsin5refs.ctv6
.
Here, we can read in the information of interest via read_Citavi_ctv6()
.
library(tidyverse)
library(CitaviR)
example_path <- example_file("3dupsin5refs/3dupsin5refs.ctv6") # in real life: replace with your path
CitDat <- read_Citavi_ctv6(path = example_path,
CitDBTableName = "Reference")
CitDat %>% select(Title, Year, Abstract, DOI)
#> # A tibble: 5 x 4
#> Title Year Abstract DOI
#> <chr> <chr> <chr> <chr>
#> 1 More, Larger, Simpler: How ~ 2018 Traditionally, cultivar evalu~ "10.2135/cr~
#> 2 Heritability in Plant Breed~ 2019 In plant breeding, heritabili~ ""
#> 3 Hritability in Plant Breedi~ 2019 In plant breeding, heritabili~ ""
#> 4 Heritability in plant breed~ 2019 In plant breeding, heritabili~ "10.1534/ge~
#> 5 Estimating broad-sense heri~ 2019 Broad-sense heritability is d~ "10.2135/cr~
If, for whatever reason, you wish to do the import from Citavi not via SQL, but with Excel files (exported from Citavi), then
CitaviR
offers an alternative approach TO DO LINK
At this point there are many things one may wish to do with the data. In this example we will make use of the CitaviR
functions to identify and handle obvious duplicates.
CitDat <- CitDat %>%
find_obvious_dups()
One way of identifying obvious duplicates is via CitaviR::find_obvious_dups()
. In short, it first creates a clean_title
by combining each reference’s Title
and Year
into a simplified string. This simplification is based on janitor::make_clean_names()
and e.g. converts to all-lowercase, and removes special characters and unnecessary spaces. If two references have the same clean_title
, they are identified as obvious duplicates. In this example, two references were indeed identified as obvious duplicates:
CitDat %>%
select(Title, clean_title:obv_dup_id)
#> # A tibble: 5 x 5
#> Title clean_title clean_title_id has_obv_dup obv_dup_id
#> <chr> <chr> <chr> <lgl> <chr>
#> 1 More, Larger, Si~ more_larger_simpler_h~ ct_04 FALSE dup_01
#> 2 Heritability in ~ heritability_in_plant~ ct_02 TRUE dup_01
#> 3 Hritability in P~ hritability_in_plant_~ ct_03 FALSE dup_01
#> 4 Heritability in ~ heritability_in_plant~ ct_02 TRUE dup_02
#> 5 Estimating broad~ estimating_broad_sens~ ct_01 FALSE dup_01
Note how a single typo (“Hritability”) prevents ct_03
from being detected as an obvious duplicate for ct_02
. For cases like this, one may use CitaviR::find_potential_dups()
, but we will not go so far in this introduction (TO DO LINK TO VIGNETTE).
At this point we have already gained information and could continue with steps 4 and 5. However, sometimes duplicates hold different information as it is the case here for ct_02
and the columns PubMedID
, DOI
and Categories
:
CitDat %>%
filter(clean_title_id == "ct_02") %>%
select(clean_title_id, obv_dup_id, DOI, PubMedID)
#> # A tibble: 2 x 4
#> clean_title_id obv_dup_id DOI PubMedID
#> <chr> <chr> <chr> <chr>
#> 1 ct_02 dup_01 "" "31248886"
#> 2 ct_02 dup_02 "10.1534/genetics.119.302134" ""
In such a scenario it would be best to gather all information into the one non-duplicate (=dup_01
) that will be kept and of interest later on. Here, CitaviR::handle_obvious_dups()
comes in handy:
CitDat <- CitDat %>%
handle_obvious_dups(fieldsToHandle = c("DOI", "PubMedID"))
As can be seen, the columns listed in fieldsToHandle =
are filled up (i.e. tidyr::fill(all_of(fieldsToHandle), .direction = "up")
). By providing a string for nameDupCategories =
, unique entries in the Categories
column are collapsed into the respective entry for dup_01
, while entries for all other obvious duplicates are replaced by the provided string.
CitDat %>%
filter(clean_title_id == "ct_02") %>%
select(clean_title_id, obv_dup_id, DOI, PubMedID)
#> # A tibble: 2 x 4
#> clean_title_id obv_dup_id DOI PubMedID
#> <chr> <chr> <chr> <chr>
#> 1 ct_02 dup_01 "" "31248886"
#> 2 ct_02 dup_02 "10.1534/genetics.119.302134" ""
Therefore, we could now get rid of all obvious duplicates (obv_dup_id =! dup_01
) without losing any information.
Finally, we want to implement the gained information into the Citavi project. For this, we can make use of update_Citavi_ctv6()
. Say we would like to overwrite the old DOI and PubMed information and additionally store the clean_title_id
and obv_dup_id
in Custom field 1 and Custom field 2, respectively. You should probably close your Citavi project before running this:
CitDat %>%
update_Citavi_ctv6(
path = example_path,
CitDBTableName = "Reference",
CitDatVarToCitDBTableVar = c(
"DOI" = "DOI",
"PubMedID" = "PubMedID",
"clean_title_id" = "CustomField1",
"obv_dup_id" = "CustomField2"),
quiet = FALSE
)
#> Sending SQL queries:
#> UPDATE Reference SET DOI = '10.2135/cropsci2017.09.0555' WHERE StaticIDs = '["3840ac9c-6193-4b10-9a29-99e3702fcdca"]'
#> UPDATE Reference SET DOI = '' WHERE StaticIDs = '["49eef40c-3927-4392-992d-2788c222af86"]'
#> UPDATE Reference SET DOI = '' WHERE StaticIDs = '["41c262fc-19bb-4dd9-9d48-dccfe5d64aa2"]'
#> UPDATE Reference SET DOI = '10.1534/genetics.119.302134' WHERE StaticIDs = '["a4e2c5f7-5144-4299-8e69-423d8d862bc9"]'
#> UPDATE Reference SET DOI = '10.2135/cropsci2018.06.0376' WHERE StaticIDs = '["4cc778b0-6078-4bf2-9cae-f949698cb353"]'
#> UPDATE Reference SET PubMedID = '' WHERE StaticIDs = '["3840ac9c-6193-4b10-9a29-99e3702fcdca"]'
#> UPDATE Reference SET PubMedID = '31248886' WHERE StaticIDs = '["49eef40c-3927-4392-992d-2788c222af86"]'
#> UPDATE Reference SET PubMedID = '' WHERE StaticIDs = '["41c262fc-19bb-4dd9-9d48-dccfe5d64aa2"]'
#> UPDATE Reference SET PubMedID = '' WHERE StaticIDs = '["a4e2c5f7-5144-4299-8e69-423d8d862bc9"]'
#> UPDATE Reference SET PubMedID = '' WHERE StaticIDs = '["4cc778b0-6078-4bf2-9cae-f949698cb353"]'
#> UPDATE Reference SET CustomField1 = 'ct_04' WHERE StaticIDs = '["3840ac9c-6193-4b10-9a29-99e3702fcdca"]'
#> UPDATE Reference SET CustomField1 = 'ct_02' WHERE StaticIDs = '["49eef40c-3927-4392-992d-2788c222af86"]'
#> UPDATE Reference SET CustomField1 = 'ct_03' WHERE StaticIDs = '["41c262fc-19bb-4dd9-9d48-dccfe5d64aa2"]'
#> UPDATE Reference SET CustomField1 = 'ct_02' WHERE StaticIDs = '["a4e2c5f7-5144-4299-8e69-423d8d862bc9"]'
#> UPDATE Reference SET CustomField1 = 'ct_01' WHERE StaticIDs = '["4cc778b0-6078-4bf2-9cae-f949698cb353"]'
#> UPDATE Reference SET CustomField2 = 'dup_01' WHERE StaticIDs = '["3840ac9c-6193-4b10-9a29-99e3702fcdca"]'
#> UPDATE Reference SET CustomField2 = 'dup_01' WHERE StaticIDs = '["49eef40c-3927-4392-992d-2788c222af86"]'
#> UPDATE Reference SET CustomField2 = 'dup_01' WHERE StaticIDs = '["41c262fc-19bb-4dd9-9d48-dccfe5d64aa2"]'
#> UPDATE Reference SET CustomField2 = 'dup_02' WHERE StaticIDs = '["a4e2c5f7-5144-4299-8e69-423d8d862bc9"]'
#> UPDATE Reference SET CustomField2 = 'dup_01' WHERE StaticIDs = '["4cc778b0-6078-4bf2-9cae-f949698cb353"]'
If you now open the Citavi project, it should have changed as expected.
If, for whatever reason, you wish to do the export to Citavi not via SQL, but with Excel files, then
CitaviR
offers an alternative approach TO DO LINK