process.phenotypes: automated phenotype standardization and reporting

Overview

This R package helps make phenotype dataset cleaning automated, rigorous, and transparent. In simplified form, the cleaning process works as follows:

1. A phenotype spreadsheet is exported to any of several supported formats:
   - .tsv (plaintext, tab-delimited)
   - .dta (Stata format)
   - .sas7bdat (SAS format, with accompanying .sas code for category labels)
   - .zsav (SPSS format)
2. The phenotype dataset is configured in YAML format. This allows the user to specify the expected data type (binary, categorical, ordinal, numeric, date, blood pressure, string), bounds for numeric values, levels and aliases for binary/categorical/ordinal variables, special values to be encoded as NA (missing), and other restrictions.
3. The entire cleaning process for the file is run with a single R command.
4. After cleaning is complete, an HTML report is emitted containing summary statistics and data cleaning observations (e.g. invalid values detected for categorical variables). The report serves both as a record of the cleaning run and as a guide for refining the configuration.
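The package performs these steps in R, driven by the YAML configuration. Purely to illustrate what two such rules mean in practice (recoding special values as NA, and enforcing numeric bounds), here is a minimal shell/awk sketch. It is not the package's implementation: the function name, the example column, the NA codes, and the bounds are all hypothetical. It reads a tab-delimited file with a header row, sets special, non-numeric, or out-of-range values in one column to NA, and prints a count of each kind of change to stderr, loosely analogous to the observations the HTML report records.

```shell
# Illustrative sketch only; process.phenotypes itself does this in R from YAML.
# clean_numeric_column FILE COLUMN NA_CODES MIN MAX
#   FILE      tab-delimited input with a header row
#   COLUMN    header name of the numeric column to clean (hypothetical, e.g. age)
#   NA_CODES  comma-separated special values to recode as NA (e.g. "-9,-99")
#   MIN, MAX  inclusive numeric bounds; values outside them become NA
clean_numeric_column() {
  awk -F '\t' -v OFS='\t' -v col="$2" -v na_codes="$3" -v lo="$4" -v hi="$5" '
    BEGIN {
      # build a lookup table of the special values to treat as missing
      n = split(na_codes, codes, ",")
      for (i = 1; i <= n; i++) is_na[codes[i]] = 1
    }
    NR == 1 {
      # locate the requested column by its header name
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print
      next
    }
    {
      v = $c
      if (v == "" || (v in is_na)) { $c = "NA"; recoded++ }
      else if (v !~ /^-?[0-9]+(\.[0-9]+)?$/) { $c = "NA"; invalid++ }
      else if (v + 0 < lo + 0 || v + 0 > hi + 0) { $c = "NA"; out_of_bounds++ }
      print
    }
    END {
      # summary counts go to stderr so stdout stays a clean table
      if (c) printf "recoded=%d invalid=%d out_of_bounds=%d\n", recoded, invalid, out_of_bounds > "/dev/stderr"
    }
  ' "$1"
}
```

For example, `clean_numeric_column pheno.tsv age "-9,-99" 0 120 > pheno.cleaned.tsv` (file name hypothetical) would write the cleaned table and print the three counts to stderr.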

Documentation

Please see any of the following documentation:

- This package's Read the Docs site contains extended documentation on various aspects of the package, including installation, configuration, and use.
- Once the package is installed, the standard R man pages (accessed, for example, with ?create.phenotype.report) contain extensive documentation of function interfaces, with examples.
- The package has several useful vignettes covering, among other topics, manual dataset configuration, configuration from SurveyCTO form definitions, and the creation of derived variables; see the doc directory of the GitLab repo, or run vignette(package = "process.phenotypes") in R.

Version History

See CHANGELOG.md for more information.

19 Aug 2022: public release; tagged version 1.4.0
01 Aug 2022: complete unit test coverage; refactor report and assorted minor fixes
24 May 2022: merged CTO dataset configuration files and corresponding functionality
21 Sep 2021: initial release v1.0.0
27 Aug 2021: derived variables, transformations, many assorted improvements, and better readme
12 Jul 2021: string_cleanup branch merged into default; v0.1.0

How to contribute to development

Step 1: Set up a development environment (OSX and Linux only)

If needed, install Miniconda by following the official Miniconda installation instructions.
If needed, install mamba: conda install mamba
Clone a copy of this repo:

git clone https://gitlab.com/data-analysis5/phenotypes/process.phenotypes.git
# or 
git clone git@gitlab.com:data-analysis5/phenotypes/process.phenotypes.git

Navigate into the repo directory: cd process.phenotypes
Create a conda environment with, at minimum, the dependencies defined in r-dev.yaml. Make sure to activate your dev environment whenever you write or commit code!

# create the env
mamba env create -f r-dev.yaml

# activate the env
conda activate r-dev
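Because the activation step is easy to forget, a small guard like the following can be sourced in your shell or called at the top of local helper scripts. This is a hypothetical convenience, not something the repository provides; it relies on `conda activate` setting `CONDA_DEFAULT_ENV` to the active environment's name, and assumes the environment is named `r-dev` as in the command above.

```shell
# Hypothetical guard, not provided by this repository.
# `conda activate <name>` sets CONDA_DEFAULT_ENV to the active environment's name.
require_conda_env() {
  # $1 = expected environment name (r-dev for this project)
  if [ "${CONDA_DEFAULT_ENV:-}" != "$1" ]; then
    echo "error: run 'conda activate $1' first (active: ${CONDA_DEFAULT_ENV:-none})" >&2
    return 1
  fi
}

# Example: stop a local script early if the dev environment is not active.
# require_conda_env r-dev || exit 1
```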

Install commitizen (requires npm) as follows:

npm install -g commitizen cz-conventional-changelog
commitizen init cz-conventional-changelog --save-dev --save-exact
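With the cz-conventional-changelog adapter, `git cz` builds commit messages whose first line follows the Conventional Commits form `type(scope): subject`. The function below is a rough, hypothetical check of a message against that form, and of whether it mentions an issue number such as `#42` (as Step 3 asks). It is not a hook shipped with this repository, and the list of accepted types is an assumption based on the adapter's defaults.

```shell
# Hypothetical check, not a hook shipped with this repository.
# Accepted types are assumed from cz-conventional-changelog's defaults.
CC_TYPES='feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert'

check_commit_msg() {
  # $1 = full commit message; only its first line is checked for the header
  header=$(printf '%s\n' "$1" | head -n 1)
  if ! printf '%s\n' "$header" | grep -Eq '^('"$CC_TYPES"')(\([^)]+\))?: .+'; then
    echo "header not in 'type(scope): subject' form: $header" >&2
    return 1
  fi
  # look anywhere in the message for an issue reference like #42
  if ! printf '%s\n' "$1" | grep -Eq '#[0-9]+'; then
    echo "no issue reference such as #42 found" >&2
    return 1
  fi
}
```

For instance, `check_commit_msg "$(git log -1 --format=%B)"` would check the message of the most recent commit.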

Set up pre-commit hook scripts. This will apply linting and check for some common code formatting errors every time you commit. See https://pre-commit.com/ for more details.

pre-commit install

Install the precommit R package (from an R console or RStudio):

install.packages("precommit")
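Before moving on, it can help to confirm that the tools used in this step are on your PATH. The function below is a hypothetical helper, not part of the repository; the tool names in the example call are assumptions based on the commands above (`git-cz` is the executable that the global commitizen install provides so that `git cz` works).

```shell
# Hypothetical helper, not provided by this repository.
# Prints the names of any commands that are not on PATH and returns 1;
# returns 0 silently when every command is found.
missing_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
}

# Example (tool list is an assumption based on the setup commands above):
# missing_tools git conda mamba npm git-cz pre-commit R
```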

Step 2: Select an issue to work on, or submit one that you'd like to address

See the current issues for this project.

Step 3: Contribute code

1. All development work should branch off the dev branch. Make sure you're on the right branch: git checkout dev
2. Make sure your repository is up to date with the remote: git pull origin dev
3. Create a new branch named for the feature you're going to work on: git checkout -b <feature_branch>
4. Write code and commit often!
   - Stage changes with git add .
   - Commit code with git cz; make sure to cite the issue number you're working on
   - Push your changes to the remote repository with git push origin <feature_branch>
5. When you're all done, submit a merge request on GitLab. Other developers will review your code, make comments, and merge in your changes when ready!
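The first three steps above (switch to dev, pull, create the feature branch) can be wrapped in a small shell function. This is a hypothetical helper, not part of the repository; it assumes a remote named `origin` with a `dev` branch, validates the new name with `git check-ref-format --branch`, and uses `git checkout -b`, which creates the branch and switches to it in one command.

```shell
# Hypothetical helper, not provided by this repository.
# Assumes a remote named "origin" that has a "dev" branch.
start_feature() {
  # $1 = name for the new feature branch
  if [ -z "${1:-}" ]; then
    echo "usage: start_feature <feature_branch>" >&2
    return 1
  fi
  # reject names git would not accept as a branch name
  git check-ref-format --branch "$1" >/dev/null || return 1
  # switch to dev, bring it up to date, then create and switch to the new branch
  git checkout dev &&
    git pull origin dev &&
    git checkout -b "$1"
}
```

For example, `start_feature 42-html-report` (branch name hypothetical) would leave you on a fresh branch, ready for `git add` and `git cz`.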
