CollaboratoR: an R workflow that uses Google Sheets and GitHub to reduce errors in collaborative data entry
This paper presents CollaboratoR, an open-source R package and workflow designed to make collaborative data entry more consistent and transparent. Contributors enter data in shared Google Sheets, and automated checks then catch common errors. Changes are tracked on GitHub so that every update is recorded, and the final files are standardized for analysis.
The researchers built CollaboratoR to sit between ad-hoc spreadsheets and more complex data-extraction systems. Data are entered in Google Sheets, validated against predefined rules, and converted to comma-separated values (CSV) files. Changes are tracked with git on GitHub, which provides version control (a record of who changed what and when). After a human verifies the records, the data are re-validated to make sure the corrections did not introduce new problems.
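The validate, correct, re-validate, and export cycle described above can be illustrated with a minimal sketch. It is written in Python with only the standard library, purely to show the idea; it is not CollaboratoR's R code or API. The rule set, the column names (species, year, effect), the function names, and the sample rows are all hypothetical.

```python
import csv
import io

# Hypothetical rules: each column maps to a check that returns True for a valid value.
RULES = {
    "species": lambda v: v.strip() != "" and v == v.strip(),  # non-empty, no stray spaces
    "year": lambda v: v.isdecimal() and 1900 <= int(v) <= 2100,
    "effect": lambda v: v in {"positive", "negative", "neutral"},
}


def validate_rows(rows, rules):
    """Return (row_number, column, value) for every value that fails its rule."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for column, is_valid in rules.items():
            value = row.get(column, "")
            if not is_valid(value):
                problems.append((row_number, column, value))
    return problems


def rows_to_csv(rows, columns):
    """Serialize rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Rows as they might arrive from a shared sheet (made-up values).
entered = [
    {"species": "Parus major", "year": "2014", "effect": "positive"},
    {"species": " Turdus merula", "year": "20l5", "effect": "Negative"},
]

# First pass: flag problems for a human to review.
problems = validate_rows(entered, RULES)

# A human corrects the flagged row, then the same checks run again.
corrected = [
    dict(entered[0]),
    {"species": "Turdus merula", "year": "2015", "effect": "negative"},
]
remaining = validate_rows(corrected, RULES)

# Export only when the re-validation pass is clean.
csv_text = rows_to_csv(corrected, list(RULES)) if not remaining else None
```

Here the first pass flags three values in the second row (a leading space, a letter typed in place of a digit, and inconsistent capitalization), and the export is produced only after the second pass finds nothing. Running the same rules before and after correction is what guards against a fix introducing a new error.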
The team tested the workflow on two real databases: a plant competition meta-analysis and an avian interaction database (AvianMetaNetwork). In both cases, the automated checks flagged common entry and formatting problems early. The workflow also made edits easier to trace and reduced the time spent cleaning the data after they were merged.
CollaboratoR matters because many research teams struggle with messy, inconsistent entries when several contributors work on the same dataset. The workflow aims to make data more reproducible and easier to combine by enforcing rules and recording changes. It also follows the FAIR data principles (findable, accessible, interoperable, reusable), which help others discover and reuse the data.
There are some important caveats. CollaboratoR uses Google Sheets and GitHub as part of the workflow, so it depends on those services and requires some setup. The package is customizable in R, so teams will need some R skills to adapt it to their needs. The paper reports success in two case studies, but broader performance across other fields or larger, differently structured projects may vary. The authors also note a planned next step: adding continuous integration (CI) so automated quality checks run on every update and give rapid feedback to contributors.
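As a rough illustration of what such a continuous integration check could look like, the sketch below validates CSV text and turns the result into an exit status, since CI services generally treat a nonzero status as a failed check. It is a Python sketch under stated assumptions and does not describe the authors' planned implementation; the function names, the required columns, and the sample data are hypothetical.

```python
import csv
import io


def check_csv(csv_text, required_columns):
    """Return a list of human-readable problems found in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in required_columns if c not in header]
    # Line 1 is the header; this numbering assumes no multi-line fields.
    for line_number, row in enumerate(reader, start=2):
        for column in required_columns:
            if column in header and not (row[column] or "").strip():
                problems.append(f"line {line_number}: empty {column}")
    return problems


def exit_code(problems):
    """Map a problem list to a process exit status (0 = pass, 1 = fail)."""
    return 1 if problems else 0


# Made-up file contents standing in for data pushed to the repository.
good = "species,year\nParus major,2014\n"
bad = "species,year\n,2014\n"

good_status = exit_code(check_csv(good, ["species", "year"]))
bad_report = check_csv(bad, ["species", "year"])
bad_status = exit_code(bad_report)
```

In a real CI job the status would be passed to the process exit (for example with sys.exit) and the report printed, so that a contributor sees which line failed shortly after pushing an update.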