Detect the language that the abstract or other fields are written in

detect_language(
  CitDat,
  fieldsToDetectIn = c("Abstract"),
  wantedLanguage = c("english")
)

Arguments

CitDat

A data frame or tibble, for example one returned by read_Citavi_xlsx().

fieldsToDetectIn

Character vector with names of fields whose text language should be detected. Default is c("Abstract"). When multiple fields are given (e.g. c("Abstract", "Title")), they are combined into a single string whose language is then detected.
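The field combination described above can be sketched in base R. This is only an illustration, not the package's actual implementation: the toy data, the single-space separator and the handling of missing values are all assumptions.

```r
# Hypothetical citation data; the column names mirror the fields discussed above.
CitDat <- data.frame(
  Title = c("Diabetes prevalence in Europe", "Diabetespraevalenz in Deutschland"),
  Abstract = c("We estimate the prevalence of diabetes.", NA),
  stringsAsFactors = FALSE
)

fieldsToDetectIn <- c("Abstract", "Title")

# Replace missing values with empty strings so they do not turn into the text "NA".
fields <- lapply(CitDat[fieldsToDetectIn], function(x) ifelse(is.na(x), "", x))

# Join the selected fields row by row (assumption: separated by a single space).
combined <- trimws(do.call(paste, c(unname(fields), sep = " ")))
combined
```

Each element of combined is then the single string whose language would be detected for that row.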

wantedLanguage

Character vector with names of languages that are desired. Default is c("english"). If not set to NULL, a new column det_lang_wanted is created, which is TRUE if the detected language in det_lang is a wanted language.
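The relationship between det_lang and det_lang_wanted can be pictured as a simple membership test. The following is a minimal sketch under that assumption, using made-up det_lang values; how the real function treats rows with a missing det_lang is not documented here, so the FALSE shown for NA is a property of this sketch only.

```r
# Hypothetical detection results standing in for the det_lang column.
CitDat <- data.frame(
  det_lang = c("english", "german", NA, "french"),
  stringsAsFactors = FALSE
)

wantedLanguage <- c("english", "french")

# Only create the flag column when wanted languages were supplied.
if (!is.null(wantedLanguage)) {
  # %in% returns FALSE (not NA) for missing values.
  CitDat$det_lang_wanted <- CitDat$det_lang %in% wantedLanguage
}

CitDat
```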

Value

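A tibble with at least one additional column, det_lang, holding the detected language. If wantedLanguage is not NULL, the column det_lang_wanted is added as well.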

Details

[Experimental]
The language itself is determined by textcat::textcat().
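To get a feel for what the underlying detector returns, textcat::textcat() can be called directly on a character vector. The sketch below assumes the textcat package is installed and uses two made-up sentences; it relies on textcat's default language profiles, and no particular result is claimed, since n-gram based detection can misclassify short texts.

```r
library(textcat)

# Hypothetical abstracts, one per element.
texts <- c(
  "This study estimates the prevalence of diabetes among adults in rural areas.",
  "Diese Studie untersucht, wie viele Erwachsene in Deutschland an Diabetes erkrankt sind."
)

# textcat() returns one language name per input string.
det_lang <- textcat(texts)
det_lang
```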

Examples

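if (FALSE) {
  library(dplyr)  # provides %>%, slice() and select()

  # Pick a few records from the diabetesprevalence dataset in CitaviR
  CitDat <- CitaviR::diabetesprevalence %>%
    dplyr::slice(1952:1955, 4390:4393)

  # Detect the abstract language and keep only the relevant columns
  CitDat %>%
    detect_language() %>%
    dplyr::select(Abstract, det_lang, det_lang_wanted)
}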