This R Markdown file accompanies the talk The Potential of Interoperability in Qualitative Research by Szilvia Zörgő. The script below contains the main content of the talk, as well as pointers to tools you can interact with and additional resources.
| Resource | URL |
|---|---|
| ROCK Website | https://rock.science |
| Talk repository | https://gitlab.com/szilvia/interoperability_in_qual |
| Rendered version of script | https://szilvia.gitlab.io/interoperability_in_qual |
| Posit Cloud project | https://posit.cloud/content/10502356 |
If you want to enjoy the full potential of this script, create an account on Posit Cloud. To use this script locally, please see any of the ROCK workshops on our website. You do not need to work locally to follow the talk; Posit Cloud has all you need.
The slides are available in the talk repository here.
We may find open datasets in online repositories, but they are usually isolated, with poor metadata (e.g., files, creators, DOI, licensing, date/version). Just because a dataset is findable and accessible does not mean it is interoperable or reusable.
In general, interoperability is a system's or component's ability to work with or use (part of) another system. For Open Science, it is a meaningfully interlinked network of (meta)data that enriches contextual knowledge about the data. Means to achieve this goal primarily include increasing machine-readability and relying on FLOSS for data processing.
There is an ongoing, fundamental debate on whether qualitative data can be shared, aggregated, or reused in a meaningful way. Beyond these important questions, there is an overreliance on proprietary software (especially for analysis), and congruently, a lack of open-source infrastructure and tools.
Possible via:
- File naming conventions (for more, see here)
- Qualified references (for more, see here)
- Controlled vocabularies (for more, see here)
- And standards (for more, see here)
For more on making qualitative data FAIR, see here.
Human- and machine-readable, open standard for qualitative data.

We are currently finishing up the manuscript containing a full description of the standard; a sneak peek pre-publication here.

For a more user-friendly version with instructions on using the R package {rock} for beginners, see The ROCK Book.

Or check out this blog post.

This figure does not contain all conventions within the ROCK standard, but shows what a source may look like (both the plain text and the rendered HTML version).
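As a stand-in for that figure, here is a minimal, invented sketch of what a coded ROCK source can look like. The utterance identifiers and code labels below are made up for illustration; see the ROCK documentation for the authoritative conventions:

```
[[uid=7d3n85tx]] I found the new protocol easy to follow. [[codeA]]
[[uid=7d3n85ty]] The training, however, took a long time. [[codeB]] [[codeC]]
```

Each line holds one utterance; the `[[uid=...]]` prefix is the unique utterance identifier, and codes are appended in double square brackets.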
For more information on the package, see http://rock.science.
During the talk, I give you a walk-through of the major functionality
from a conceptual perspective, then run the script below in Posit Cloud.
Follow along the conceptual bit, then let’s run the script together!
iROCK: an interface for the ROCK to code and segment data
Shiny Apps:
Diamond for coding and segmenting data (instructions here)
Feldspar for creating a qualitative data table
Crystal for using the Qualitative Network Approach
For more Shiny apps, see http://rock.science.
Part of the script below is uploaded to the Posit Cloud project for this talk. I will be running these commands and then generating an HTML version of the analysis script.
### Optionally install the cutting-edge version of the {rock} package
install.packages(
pkgs = 'https://codeberg.org/R-packages/rock/archive/dev.tar.gz',
type = 'source',
repos = NULL
);
## Installing package into 'C:/Users/szilv/AppData/Local/R/win-library/4.4'
## (as 'lib' is unspecified)
knitr::opts_chunk$set(
echo = TRUE,
eval = TRUE,
comment = ""
);
basePath <- here::here();
dataPath <- file.path(basePath, "data");
dataPath_coded <- file.path(dataPath, "040---coded-sources");
scriptsPath <- file.path(basePath, "scripts");
resultsPath <- file.path(basePath, "results");
Three plain text files containing data (i.e., "sources") have been placed into the "010---raw-sources" subdirectory located within the data directory. There are also some attributes of the mock data providers listed in the file called "case-attributes".
The cleaning command places each of the sentences in your data on a new line. The {rock} package enables you to code data line-by-line, and recognizes newline characters as indicators of this lowest level of segmentation. The chunk below will write the cleaned sources found in "010---raw-sources" into the subdirectory "020---cleaned-sources".
rock::clean_sources(
input = file.path(dataPath, "010---raw-sources"),
output = file.path(dataPath, "020---cleaned-sources"),
preventOverwriting = FALSE
);
You may choose to add a unique identifier to each line of data (i.e., "utterances"). This is helpful, for example, if you want to merge different versions of the coded sources into a source that contains all codes applied by multiple researchers. The chunk below will write the sources with uids into the subdirectory "030---sources-with-uids".
rock::prepend_ids_to_sources(
input = file.path(dataPath, "020---cleaned-sources"),
output = file.path(dataPath, "030---sources-with-uids"),
preventOverwriting = FALSE
);
This command will assemble all your coded sources and attributes into an R object that can be employed to run analyses and other commands below. (Note, coded sources and attributes have been pre-added for your convenience.)
dat <-
rock::parse_sources(
dataPath,
regex = "_coded|attributes"
);
This command allows you to collect and inspect coded fragments for certain codes. You can use the command below by changing the code labels "CodeA" and "CodeB" to the codes you'd like to inspect. You can modify the amount of context around the coded utterance by changing "2" to any other number.
rock::inspect_coded_sources(
path = here::here("data", "040---coded-sources"),
fragments_args = list(
codes = "CodeA|CodeB",
context = 2
)
);
(Output: coded fragments retrieved from 001_Source_cleaned_withUIDs_coded.rock, 002_Source_cleaned_withUIDs_coded.rock, and 003_Source_cleaned_withUIDs_coded.rock.)
With this command, {rock} creates a code tree, which can be flat or hierarchical depending on the employed codes. In this talk, we use a flat code structure.
rock::show_fullyMergedCodeTrees(dat)
Inspect attributes of data providers (cases) or other specified aspects of data provision.
rock::show_attribute_table(dat)
| cid | sex | age | edu | Group |
|---|---|---|---|---|
| 1 | f | 40s | MA | Expert |
| 2 | f | 30s | BA | Novice |
| 3 | m | 20s | MA | Novice |
This command shows a bar chart of code frequencies within the various sources in which they were applied. The command also produces a legend at the bottom of the visual to help identify the sources by color.
rock::code_freq_hist(
dat
);
Code co-occurrences can be visualized with a heatmap. This representation will use colors to indicate the code co-occurrence frequencies. Co-occurrences are defined as two or more codes occurring on the same line of data (utterance).
rock::create_cooccurrence_matrix(
dat,
plotHeatmap = TRUE);
CodeA CodeB CodeC CodeD
CodeA 6 1 2 1
CodeB 1 8 1 3
CodeC 2 1 4 0
CodeD 1 3 0 11
This command creates a tabularized version of your dataset, which, for example, can be employed to further process your data with software such as Epistemic Network Analysis (https://www.epistemicnetwork.org), or "merely" to represent your coded data in a single file. In this dataset, rows are constituted by utterances and columns by variables. The file will be an Excel workbook called "mergedSourceDf.xlsx" located in the results subdirectory.
rock::export_mergedSourceDf_to_xlsx(
dat,
file.path(resultsPath,
"mergedSourceDf.xlsx")
)
Warning in export_mergedSourceDf_to_file(x = x, file = file, exportArgs =
exportArgs, : The file you specified to export to
(D:/Sync/ROCK/workshops+talks/FAIR_Coffees_UM/interoperability_in_qual/results/mergedSourceDf.xlsx)
already exists, and `preventOverwriting` is set to `TRUE`, so I'm not writing
to disk. To override this, pass `preventOverwriting=FALSE`.
If multiple coders are applying different codes or coding schemes to the same dataset, or if a single coder is applying different codes in different rounds of coding, then merging coded sources may be useful. Merging means that you combine different coded versions of the same source into a “master” source that contains all applied codes. Merging is made possible via unique utterance identifiers (uids).
Some pre-coded versions of the data have been added to the subdirectory "041---coded-sources-for-merging". A good practice is to create a "slug" for each coded version of the sources, for example, "_coder1" and "_coder2", which you will see for the mock data. You need to choose a version of the coded source to be the foundation upon which the other versions are merged (indicated by "primarySourcesRegex" in the code below). For example, the command below says that all versions of each source should be "collapsed" onto the version with the slug "_coder1". The command below will write the merged sources into the same directory as where it found them, resulting in a merged version for each source that you placed into that directory.
rock::merge_sources(input = here::here("data",
"041---coded-sources-for-merging"),
output = "same",
primarySourcesPath = here::here("data",
"041---coded-sources-for-merging"),
primarySourcesRegex = "_coder1\\.rock");
With the knitting function, we generate a rendered version of the analysis script and all outputs we viewed. Go to the top of the interface where it says “knit” and click.
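If you prefer the console over the Knit button, the same rendering can be triggered programmatically with the {rmarkdown} package; a minimal sketch, assuming the script is saved as "interoperability_in_qual.Rmd" in the project root (the filename is an assumption, adjust it to match your project):

```r
# Render the R Markdown analysis script to HTML;
# equivalent to clicking "Knit" in the RStudio/Posit interface.
rmarkdown::render(
  input = "interoperability_in_qual.Rmd",
  output_format = "html_document"
);
```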
There are currently three ongoing projects in which we are working towards making qualitative data more interoperable and qualitative research more transparent.
(PI: Anna Harris, Maastricht University)
The ERC-funded grant examines creative repurposing in hospitals on an international scale. Part of the grant involves promoting Open Science principles by developing novel methods/infrastructure (e.g., a rich metadata template for qualitative data) and setting an example in open ethnographic data.
More information on this project here.
(PI: Szilvia Zörgő, Maastricht University)
Based on existing infrastructure (Psychological Construct Repository, Psycore), we aim to develop a set of interlinked repositories designed to support specific methodological approaches.
More information on this project here.
(PI: GJ Peters, Open University)
The NWO grant enables us to develop a comprehensive interface for the
ROCK encompassing most of the functionality of the R package.
More information on this project here.
If you want to know more, or start a project adhering to the ROCK
standard, and you cannot find the answers you’re looking for in the
disclosed resources, feel free to get in touch!
ROCK official: info@rock.science
Szilvia Zörgő: s.zorgo@maastrichtuniversity.nl
GJ Peters: gjalt-jorn.peters@ou.nl
For more on ROCK terminology, see here.
The Reproducible Open Coding Kit (ROCK) standard is licensed under
CC0 1.0 Universal.
The {rock} R package is licensed under a GNU General Public License; for
more see: https://rock.science.
ROCK citation:
Gjalt-Jorn Ygram Peters and Szilvia Zörgő (2025). rock: Reproducible
Open Coding Kit. R package version 0.9.7. https://rock.opens.science
For more on ROCK materials licensing and citation, please see: https://rock.opens.science/authors.html#citation.