Citing the packages, modules and software you used for your analysis is important, both for reproducibility and for acknowledging the work and time that people spent creating tools for others (sometimes at the expense of their own research). Statistical routines are often implemented in different ways by different packages, which can explain slight discrepancies in results. Saying “I did this using this function from that package, version 1.2.3” protects you by making clear what you found and how you found it.
- That’s great, but how do I actually cite them?
- I used about 100 packages, should I cite them all?
What should I cite?
Ideally, you should indeed cite all the packages that you used. However, this is not very practical. Therefore, I would recommend the following:
- Cite the main / important packages in the manuscript
This should be done for the packages that were central to your specific analysis (i.e., that got you the results that you reported) rather than for data manipulation tools (even though these are just as important).
For example:
Statistics were done using R 3.5.0 (R Core Team, 2018), the rstanarm (v2.13.1; Gabry & Goodrich, 2016) and the psycho (v0.3.4; Makowski, 2018) packages. The full reproducible code is available in Supplementary Materials.
- Present everything in Supplementary Materials
Then, in Supplementary Materials, show the packages and functions you used. In R, you can also list (usually at the end) every package used and its version with the sessionInfo() function.
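As a minimal sketch of what that final chunk of a supplementary script could look like, the snippet below prints the session details and pulls out the version numbers you would quote in the manuscript. The package "stats" is only a stand-in (an assumption, chosen because it ships with R); replace it with the packages you actually used.

```r
# Capture the session details: R version, platform, and loaded packages
info <- sessionInfo()
print(info)

# R version as a string, handy for the in-text citation
r_version <- as.character(getRversion())

# Version of a single package ("stats" is a stand-in for your own package)
pkg_version <- as.character(packageVersion("stats"))

# Versions of all attached non-base packages (empty if none are attached)
versions <- vapply(info$otherPkgs, function(pkg) pkg$Version, character(1))
print(versions)
```

Only packages attached with library() show up in `versions`; packages called through `pkg::fun()` are listed by sessionInfo() as loaded via a namespace instead.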
How should I cite it?
Finding the right citation information is sometimes complicated. In R, this process is made quite easy: you simply run citation("packagename"). For instance, citation("dplyr") returns:
To cite ‘dplyr’ in publications use:
Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2018). dplyr: A Grammar of Data Manipulation. R package version
0.7.6. https://CRAN.R-project.org/package=dplyr
A BibTeX entry for LaTeX users is
@Manual{,
title = {dplyr: A Grammar of Data Manipulation},
author = {Hadley Wickham and Romain François and Lionel Henry and Kirill Müller},
year = {2018},
note = {R package version 0.7.6},
url = {https://CRAN.R-project.org/package=dplyr},
}
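If you need entries for several packages at once, one possible approach is to loop over their names and collect the BibTeX output in a single file. This is only a sketch: the package names and the file name "packages.bib" are placeholders (assumptions), and base packages are used here so that the code runs on any installation.

```r
# Packages to cite (placeholders; replace with the ones you used)
pkgs <- c("base", "stats")

# citation() returns a bibentry object; toBibtex() turns it into BibTeX lines.
# An empty string is appended after each entry to separate them in the file.
bib <- unlist(lapply(pkgs, function(p) {
  c(as.character(toBibtex(citation(p))), "")
}))

# Write all entries to one .bib file (hypothetical file name)
bib_file <- "packages.bib"
writeLines(bib, bib_file)
```

Entries generated this way may come without a citation key, as in the dplyr example above, so add one by hand before using the file with LaTeX.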
For other languages, such as Python or Julia, it might be a little trickier, but a quick search on Google (or GitHub) should provide you with all the necessary information (version, authors, date). It’s better to have a slightly incomplete citation than no citation at all.