Authors@R:
    c(person(given = "firstname",
             family = "lastname",
             role = c("aut", "cre"), # There must be a "cre", but there can only be one
             email = "your@email.com"),
      person(given = "firstname",
             family = "lastname",
             role = "aut",
             email = "your@email.com"))
Lab 8: Creating a Simple R-package
Package(s)
Schedule
- 08.00 - 08.15: Recap of Lab 7
- 08.15 - 08.45: Lecture
- 08.45 - 09.00: Break
- 09.00 - 12.00: Exercises
Learning Materials
Please prepare the following materials:
- Book: R Packages (2e): Welcome!
- Book: R Packages (2e): Introduction
- Book: R Packages (2e): Chapter 1 The Whole Game
- Video: Building R packages with devtools and usethis | RStudio (This is optional and is basically a video walk-through of the reading material)
Learning Objectives
A student who has met the objectives of the session will be able to:
- Prepare a simple R package for distributing documented functions
- Explain the terms Repository, Dependencies, and Namespace
- Implement testing in an R package
- Collaboratively work on an R package on GitHub
Exercises
Read the steps of these exercises carefully while completing them.
Introduction
First, make sure to read and discuss the feedback you got from last week’s assignment!
The aim of these exercises is to set up a collaborative coding project using GitHub and collaborate on creating a simple R package that replicates the central dogma of molecular biology.
The exercises will build upon what you learned in Lab 7 and its exercises. But don’t worry too much, the setup steps will be given here as well.
TASKS
If you haven’t read the Primer on R packages yet, consider reading it now.
Task 1 - Setting up the R package
In the first task, one team member (you decide who) will initiate a package project and push it to GitHub. Then, when instructed, the remaining group members will connect to that repository, and the real package building will begin.
Task 1.1 - Create R Package
- Go to R for Bio Data Science RStudio Server and log in.
- Click Create a project in the top left and choose New Directory.
- Select R Package and pick a fitting Package name.
  - The package you will create will replicate the central dogma of molecular biology. Let that inspire you.
  - Look up naming rules here
  - The most important rule: _, -, and ' are not allowed.
- Tick the “Create git repository” box.
- Click “Create project”.
RStudio will now create an R project for you that contains all necessary files and folders.
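The naming rule above can be checked before you settle on a name. The sketch below assumes the base rule for the `Package` field in a DESCRIPTION file (only ASCII letters, numbers, and dots, at least two characters, starting with a letter and not ending with a dot). The helper name `is_valid_package_name` is hypothetical and not part of any package, and the linked naming rules may add further recommendations.

```r
# Hypothetical helper: TRUE when `name` follows the base naming rule
# for R packages (letters, numbers and dots; starts with a letter;
# does not end with a dot; at least two characters).
is_valid_package_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

candidate_names <- c("centraldogma", "central.dogma", "central_dogma",
                     "central-dogma", "2dogma")

# Named logical vector: TRUE TRUE FALSE FALSE FALSE
valid <- is_valid_package_name(candidate_names)
names(valid) <- candidate_names
print(valid)
```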
- Open the DESCRIPTION file and write a title for your package, a small description, and add authors.
  - When adding authors, use the Authors@R format shown at the top of this page.
- Create an MIT license (from the console): usethis::use_mit_license("Group name")
- If you didn’t tick the “Create git repository” box: Run usethis::use_git(). If the “Git” tab is not in the lower left panel: reopen the R project.
Optional setup steps (generally good, but not essential for this course)
- Add a Readme file: usethis::use_readme_rmd( open = FALSE ). You will do that in the Group Assignment later
- Add a lifecycle badge: usethis::use_lifecycle_badge( "Experimental" )
- Write vignettes: usethis::use_vignette("vignette-name") (the name may only contain letters, numbers, hyphens, and underscores)
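As a quick sanity check on the authors entry, the Authors@R code can be evaluated in the console before pasting it into DESCRIPTION. This is a minimal sketch: the names and e-mail addresses are placeholders, and the check only counts how many authors carry the "cre" (maintainer) role, which must be exactly one.

```r
# Placeholder authors; replace the names and e-mail addresses with your own.
authors <- c(
  person(given = "Firstname", family = "Lastname",
         role = c("aut", "cre"), email = "first@example.org"),
  person(given = "Othername", family = "Lastname",
         role = "aut", email = "other@example.org")
)

# Collect the roles of each author and count the maintainers ("cre").
roles <- lapply(unclass(authors), function(author) author$role)
n_maintainers <- sum(vapply(roles, function(role) "cre" %in% role, logical(1)))

# Exactly one maintainer is required.
stopifnot(n_maintainers == 1)
```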
Task 1.2 - Setup GitHub Repository for your group’s R package
In the previous class, you used a combination of console and terminal to set up GitHub. This week, you will use only the console.
Still only the first team member
- Go to https://github.com/rforbiodatascience24 and go to repositories.
- Click the green New button
- Create a new repository called group_X_package. Remember to replace the X
- Select rforbiodatascience24 as owner
- Make the repository Public
- Click the green Create repository button
- In this new repository, click the settings tab
- Click Collaborators and Team
- Click the green Add people button
- Invite the remaining group members
- All other team members should now have access to the repository, but do not create your own project yet.
Task 1.3 - Connect your RStudio Server Project to the GitHub Repository
Still only the first team member
- Find your PAT key or create a new one.
  How to create a new one
  - Type in the console usethis::create_github_token(). You will be redirected to the GitHub website.
  - In case you did that step manually, remember to give permission to access your repositories.
  - You need to type your password. Don’t change the default settings for creating the token except for the description (set it to ‘RStudio Server’ or something similar). Then hit Generate token.
  - Copy the generated token (should start with ghp_) and store it securely (e.g., in a password manager). Do not keep it in a plain file.
- Go back to the RStudio Server project
- Type in the console gitcreds::gitcreds_set() and paste the PAT key
- Stage all files (in the git window in the top right) by ticking all boxes under Staged
- Still in the console, run the following commands to link your project to the GitHub repository you created. Replace the X with your group number and use your own GitHub username and email.
usethis::use_git_remote(name = "origin", "https://github.com/rforbiodatascience24/group_X_package.git", overwrite = TRUE) # Remember to replace the X
usethis::use_git_config(user.name = "USERNAME", user.email = "USEREMAIL@EXAMPLE.ORG")
gert::git_commit_all("Package setup")
gert::git_push("origin")
All other team members
After your teammate has pushed to GitHub:
- In the RStudio Server, create a new project based on the GitHub repository just created by your teammate.
  - Click Create a project in the top left.
  - Choose Version Control and then Git.
  - Paste in the repository URL: https://github.com/rforbiodatascience24/group_X_package.git. Remember to replace the X.
  - Give the directory a fitting name and click Create Project.
- Find your PAT key or create a new one.
  How to create a new one
  - Type in the console usethis::create_github_token(). You will be redirected to the GitHub website.
  - In case you did that step manually, remember to give permission to access your repositories.
  - You need to type your password. Don’t change the default settings for creating the token except for the description (set it to ‘RStudio Server’ or something similar). Then hit Generate token.
  - Copy the generated token (should start with ghp_) and store it securely (e.g., in a password manager). Do not keep it in a plain file.
- Go back to the RStudio Server project
- Type in the console gitcreds::gitcreds_set() and paste the PAT key
- Still in the console, run the following command. Replace the placeholders with your GitHub username and email.
usethis::use_git_config(user.name = "USERNAME", user.email = "USEREMAIL@EXAMPLE.ORG")
If RStudio at any point asks you to log in to GitHub, redo steps 4 and 5 (gitcreds::gitcreds_set() and usethis::use_git_config()).
Now you are ready to work on the R package.
Task 2 - Build the package
Each team member should build and implement their own function. The code will be given, but you will be asked to come up with a name for your function and many of the variables, so discuss in the team what naming convention you want to use. Do you want to use snake_case (common in tidyverse) or camelCase (common in Bioconductor)? It doesn’t really matter, but be consistent.
Task 2.1 - Incorporate data
In this task you will include the following codon table in your package.
This task should be done by one team member. The rest should follow along.
- Below is a standard codon table. In the console, store the table in an R object with a name of your own choosing.
c("UUU" = "F", "UCU" = "S", "UAU" = "Y", "UGU" = "C",
"UUC" = "F", "UCC" = "S", "UAC" = "Y", "UGC" = "C",
"UUA" = "L", "UCA" = "S", "UAA" = "_", "UGA" = "_",
"UUG" = "L", "UCG" = "S", "UAG" = "_", "UGG" = "W",
"CUU" = "L", "CCU" = "P", "CAU" = "H", "CGU" = "R",
"CUC" = "L", "CCC" = "P", "CAC" = "H", "CGC" = "R",
"CUA" = "L", "CCA" = "P", "CAA" = "Q", "CGA" = "R",
"CUG" = "L", "CCG" = "P", "CAG" = "Q", "CGG" = "R",
"AUU" = "I", "ACU" = "T", "AAU" = "N", "AGU" = "S",
"AUC" = "I", "ACC" = "T", "AAC" = "N", "AGC" = "S",
"AUA" = "I", "ACA" = "T", "AAA" = "K", "AGA" = "R",
"AUG" = "M", "ACG" = "T", "AAG" = "K", "AGG" = "R",
"GUU" = "V", "GCU" = "A", "GAU" = "D", "GGU" = "G",
"GUC" = "V", "GCC" = "A", "GAC" = "D", "GGC" = "G",
"GUA" = "V", "GCA" = "A", "GAA" = "E", "GGA" = "G",
"GUG" = "V", "GCG" = "A", "GAG" = "E", "GGG" = "G")
- Run usethis::use_data(NAME_OF_YOUR_OBJECT, overwrite = TRUE, internal = TRUE)
- You have now made the data available to our functions, but we also want to make it visible for our users.
  - Run usethis::use_data(NAME_OF_YOUR_OBJECT, overwrite = TRUE).
- Write a data manual (document the data).
  - All non-internal data should be documented in a data.R file in the R folder. Create it with usethis::use_r("data")
  - Add the following scaffold to R/data.R and write a very brief description of the data (see an example here). Don’t spend a lot of time here.
#' Title
#'
#' Description
#'
#'
#' @source \url{https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG1}
"NAME_OF_YOUR_OBJECT"
Normally, you should also describe how the raw data was cleaned. You would do that in the file that opens after running usethis::use_data_raw( name = "NAME_OF_YOUR_OBJECT", open = TRUE ), but that is less relevant here, so we will skip that part.
Your package now includes some data.
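For reference, a filled-in version of the R/data.R scaffold might look like the sketch below. The object name `codon_table` is a hypothetical choice (use whatever name you stored the table under), and the wording of the title and description is only a suggestion. The `@format` tag is optional and gives a one-line summary of the object's structure.

```r
#' Standard codon table
#'
#' A named character vector that maps each of the 64 RNA codons to a
#' one-letter amino acid code. Stop codons are encoded as "_".
#'
#' @format A named character vector of length 64, with the codons as names.
#'
#' @source \url{https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG1}
"codon_table"
```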
Restart R, clean your Environment (small broom in the Environment tab), and run the three lines of code from The Package Workflow section:
rstudioapi::documentSaveAll() # Saves all your files
devtools::document() # Writes all your manuals for you
devtools::load_all() # Simulates library("your package"), allowing you to use your functions
Then run ?NAME_OF_YOUR_OBJECT. OBS: For some unknown reason, this does not work on the current R version (4.3.1). Instead, you can click the “Build” tab next to the “Git” tab and then “Install”. After installing and attaching, your manual should pop up. Try printing the object as well to see what it looks like: print(NAME_OF_YOUR_OBJECT).
Push your changes to GitHub. The other team members should then pull the changes so the data is available to them as well.
Task 2.2 - Implement functions
In this task you will work individually, each implementing one function. If there are fewer team members than functions, the quickest to finish can implement the remaining ones, or you can divide them as you see fit.
If you want an additional challenge, each team member can create their own Git branch (for example with gert::git_branch_create("NAME_OF_BRANCH"), then switch between existing branches with gert::git_branch_checkout("NAME_OF_BRANCH")) and work there. When done, merge your branch with the main branch. That is a cleaner and more ‘proper’ workflow, but completely optional.
If you are in doubt about what the underlying functions do, run ?function_name to get a hint about their purpose. OBS: For some unknown reason, this does not work on the current R version (4.3.1). As mentioned with the data, you can install your package and then load the manual. Also, remember the functions are replicating the central dogma. Let that inspire you when naming the functions and variables. If you get stuck, ask your teammates about their functions.
Function five is a bit more involved. Do it together or help your teammate out if it causes problems.
For each function (found below), complete the following steps:
- Look carefully at your function. Choose a fitting name for it.
- Run usethis::use_r("function_name") to create an .R file for the function.
- Paste in the function and replace every name_me placeholder with a fitting name.
- Click somewhere in the function. In the toolbar at the very top of the page, choose Code and click Insert Roxygen Skeleton.
- Fill out the function documentation
  - Give the function a title
  - Do not write anything after @export
  - The parameters should have the format: @param param_name description of parameter
  - The return has the format: @return description of function output
  - Examples can span multiple lines and what you write will be run, so make it runnable.
  - Important! Either fill out everything or delete what you don’t fill out. Otherwise, the package check will fail (Do not delete @export for these functions).
- Run the three lines of code from The Package Workflow section
  - If at this point, you get a warning that NAMESPACE already exists, delete it and try again.
rstudioapi::documentSaveAll() # Saves all your files
devtools::document() # Writes all your manuals for you
devtools::load_all() # Simulates library("your package"), allowing you to use your functions
- View your function documentation with ?function_name
- Defining a test or a series of tests around your newly created function ensures future corrections will not yield undesired results. Create such a test for your function. Run usethis::use_test("function_name") and write a test. Draw some inspiration from here or from running ?testthat::expect_equal.
  - Skip this step for function five. Instead, write inline code comments for each chunk of code. Press Cmd+Shift+C / Ctrl+Shift+C to comment your currently selected line.
- Rerun the three lines from The Package Workflow section and check that the package works with devtools::check()
Briefly about check
devtools::check() will check every aspect of your package and run the tests you created. You will likely get some warnings or notes like ‘No visible binding for global variables’. They are often harmless, but, if you like, you can get rid of them as described here. The check will tell you what might cause problems with your package and often also how to fix it. If there are any errors, fix those. Warnings and notes are also good to address. Feel free to do that if any pop up.
- If it succeeds without errors, push your changes to GitHub
- If you decided to create branches, review last week’s exercises for guidance
- Always pull before pushing
- Use the RStudio GUI if you prefer
- Remember to pull first
- If you chose to create your own branch, merge it with master/main.
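The documentation steps above can be sketched on a small function that is not one of the five exercise functions. The function `reverse_dna` and its parameter name are hypothetical and chosen only for illustration. The comment block is what a filled-in Roxygen skeleton could look like.

```r
#' Reverse a DNA sequence
#'
#' @param dna_sequence A single character string of DNA bases, e.g. "ATGC".
#'
#' @return A single character string with the bases in reverse order.
#' @export
#'
#' @examples
#' reverse_dna("ATGC")
reverse_dna <- function(dna_sequence) {
  # Split the string into single characters
  bases <- strsplit(dna_sequence, split = "")[[1]]
  # Reverse the order and glue the characters back together
  reversed <- paste0(rev(bases), collapse = "")
  return(reversed)
}

reverse_dna("ATGC") # "CGTA"
```

Every `@param` names an actual argument, `@return` describes the output, and the example is runnable as written, which is what the package check expects.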
Function one
name_me1 <- function(name_me2){
  name_me3 <- sample(c("A", "T", "G", "C"), size = name_me2, replace = TRUE)
  name_me4 <- paste0(name_me3, collapse = "")
  return(name_me4)
}
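Because function one returns a random result, a test cannot compare against one fixed string. A possible approach, sketched here under the assumption that the testthat package is installed, is to test properties of the output instead. The names `random_dna` and `sequence_length` are hypothetical stand-ins for the names you choose, and in a real package the test would live in the file created by usethis::use_test().

```r
# Function one with hypothetical names filled in
random_dna <- function(sequence_length) {
  bases <- sample(c("A", "T", "G", "C"), size = sequence_length, replace = TRUE)
  paste0(bases, collapse = "")
}

# Property-style test: check type, length, and allowed characters
# rather than one specific random string.
testthat::test_that("random_dna returns a DNA string of the requested length", {
  sequence <- random_dna(30)
  testthat::expect_type(sequence, "character")
  testthat::expect_length(sequence, 1)
  testthat::expect_equal(nchar(sequence), 30)
  testthat::expect_match(sequence, "^[ATGC]+$")
})
```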
Function two
name_me1 <- function(name_me2){
  name_me3 <- gsub("T", "U", name_me2)
  return(name_me3)
}
Function three
name_me1 <- function(name_me2, start = 1){
  name_me3 <- nchar(name_me2)
  codons <- substring(name_me2,
                      first = seq(from = start, to = name_me3-3+1, by = 3),
                      last = seq(from = 3+start-1, to = name_me3, by = 3))
  return(codons)
}
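To see how the two seq() calls in function three cut a sequence into triplets, the index arithmetic can be worked through by hand. This sketch uses a made-up nine-base sequence and the hypothetical variable names `rna` and `n`.

```r
rna <- "AUGGCCUAA"
n <- nchar(rna) # 9

# Reading frame starting at position 1
first <- seq(from = 1, to = n - 3 + 1, by = 3)   # 1 4 7
last  <- seq(from = 3 + 1 - 1, to = n, by = 3)   # 3 6 9
codons_frame1 <- substring(rna, first = first, last = last)
codons_frame1 # "AUG" "GCC" "UAA"

# Reading frame starting at position 2: the trailing incomplete codon is dropped
first <- seq(from = 2, to = n - 3 + 1, by = 3)   # 2 5
last  <- seq(from = 3 + 2 - 1, to = n, by = 3)   # 4 7
codons_frame2 <- substring(rna, first = first, last = last)
codons_frame2 # "UGG" "CCU"
```

The upper bound `n - 3 + 1` is the last position at which a full codon can still start, which is why incomplete codons at the end never appear in the result.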
Function four
NAME_OF_YOUR_OBJECT refers to the codon table you stored in Task 2.1.
name_me <- function(codons){
  name_me2 <- paste0(NAME_OF_YOUR_OBJECT[codons], collapse = "")
  return(name_me2)
}
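Function four relies on indexing a named vector by name. The sketch below shows that mechanism on a three-entry subset of the codon table. The object name `codon_subset` is hypothetical, and the full table from Task 2.1 would be used in the package.

```r
# Small subset of the codon table, for illustration only
codon_subset <- c("AUG" = "M", "GCC" = "A", "UAA" = "_")

# Indexing by name looks up one amino acid per codon, in order
amino_acids <- codon_subset[c("AUG", "GCC", "UAA")]
peptide <- paste0(amino_acids, collapse = "")
peptide # "MA_"

# A codon that is not in the table yields NA, which paste0() turns into the text "NA"
unknown <- paste0(codon_subset[c("AUG", "XXX")], collapse = "")
unknown # "MNA"
```

The second case is worth keeping in mind when writing a test: an unexpected codon does not raise an error, it silently adds the letters "NA" to the result.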
Function five
This function will be the first to use dependencies. As a reminder, a dependency is a package that your package depends on. In this case, it will be stringr and ggplot2. They are already installed, so you don’t need to do that. The best way to add these packages to your own is to first add them to your package dependencies with usethis::use_package("package_name"). If you care about reproducibility, you can add min_version = TRUE to the function call to specify a required minimum package version of the dependency. Run usethis::use_package for both dependencies.
For the ggplot2 functions, we will use ggplot2::function_name everywhere a ggplot2 function is used. Also add @import ggplot2 to the function description (this is often done with ggplot2 because it has a lot of useful plotting functions with only rare name overlaps). If you want to be more precise, add @importFrom ggplot2 ggplot aes geom_col theme_bw theme instead. The same applies for stringr, but since we only use a few functions, add @importFrom stringr str_split boundary str_count to the function description.
name_me1 <- function(name_me2){
  name_me3 <- name_me2 |>
    stringr::str_split(pattern = stringr::boundary("character"), simplify = TRUE) |>
    as.character() |>
    unique()

  counts <- sapply(name_me3, function(amino_acid) stringr::str_count(string = name_me2, pattern = amino_acid)) |>
    as.data.frame()

  colnames(counts) <- c("Counts")
  counts[["Name_me2"]] <- rownames(counts)

  name_me4 <- counts |>
    ggplot2::ggplot(ggplot2::aes(x = Name_me2, y = Counts, fill = Name_me2)) +
    ggplot2::geom_col() +
    ggplot2::theme_bw() +
    ggplot2::theme(legend.position = "none")

  return(name_me4)
}
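The dependency tags described for function five can be sketched on a much smaller function. This is an illustration only: `count_residue` and its parameter names are hypothetical, and the example assumes stringr is installed and has been registered once from the console with usethis::use_package("stringr").

```r
#' Count how often one amino acid occurs in a peptide
#'
#' @param peptide A single character string of one-letter amino acid codes.
#' @param residue The one-letter code to count.
#'
#' @return The number of occurrences as an integer.
#' @importFrom stringr str_count
#' @export
#'
#' @examples
#' count_residue("MAAK", "A")
count_residue <- function(peptide, residue) {
  # The package::function() form makes the origin of the call explicit
  stringr::str_count(string = peptide, pattern = residue)
}

count_residue("MAAK", "A") # 2
```

The `@importFrom` tag ends up in NAMESPACE when devtools::document() runs, while the `stringr::` prefix documents the origin of the call at the place where it is used.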
Task 3 - Group discussion
- Describe each function to each other in order - both what it does and which names you gave them and their variables.
- The person(s) responsible for function five, describe how you added the two packages as dependencies.
- Discuss why it is a good idea to limit the number of dependencies your package has. Why can it not always be avoided?
- Discuss the difference between adding an @importFrom package function tag to a function description compared to using package::function(). Read this section if you are not sure or just want to learn more.
GROUP ASSIGNMENT (Important, see: how to)
For this week’s group assignment, write a vignette (user guide) for your package (max 2 pages). The vignette should include a brief description of what the package is about and demonstrate how each function in the package is used (individually and in conjunction with each other). As a final section, discuss use cases for the package and what other functions could be included. Also include the main points from your discussion in Task 3.
Include a link to your group’s GitHub repository at the top of the vignette. Hand it in as a pdf in DTU Learn.
- Create a vignette with usethis::use_vignette("your_package_name").
- When you are done writing it, run devtools::build_vignettes().
- At the top of your vignette, change rmarkdown::html_vignette to rmarkdown::pdf_document
- Rerun devtools::build_vignettes(). The created PDF file in the doc folder is the document to hand in.
- Lastly, duplicate the vignette as the GitHub README
  - Create a README with usethis::use_readme_rmd( open = TRUE ).
  - Copy the content of the vignette R Markdown into the README. Keep the top part of the README as is. If you overwrote it anyway, change rmarkdown::pdf_document to rmarkdown::github_document.
  - Run devtools::build_readme()
  - Push the changes to GitHub.
Last tip on packages
Next week, I will introduce the golem package for building production-grade Shiny apps. However, I personally also use it to quickly get going with packages.
When starting out with an R package, it may seem complicated with a lot of things to remember. golem remembers these things for you. When setting up your package / Shiny app with golem::create_golem("Name of your awesome package/app"), it creates a dev folder with a few files listing all you need to get started with an R package.
It also gives you a lot of other things that you will rarely use, and it sets up some basic structures for Shiny apps. These can simply be deleted if you are not also building a Shiny app (which you will next week).