Lab 8: Creating a Simple R-package

Package(s)

Schedule

Learning Materials

Please prepare the following materials:

Learning Objectives

A student who has met the objectives of the session will be able to:

  • Prepare a simple R package for distributing documented functions
  • Explain the terms Repository, Dependencies, and Namespace
  • Implement testing in an R package
  • Collaboratively work on an R package on GitHub

Exercises

Read the steps of this exercises carefully while completing them

Introduction

The aim of these exercises is to set up a collaborative coding project using GitHub and collaborate on creating a simple R package that replicates the central dogma of molecular biology.

The exercises will build upon what you learned in Lab 7 and its exercises. But don’t worry too much, the setup steps will be given here as well.


TASKS

If you haven’t read the Primer on R packages yet, consider reading it now.

Task 1 - Setting up the R package

In the first task, one team member (you decide who) will initiate a package project and push it to Github. Then, when instructed, the remaining group members will connect to that repository, and the real package building will begin.

Task 1.1 - Create R Package

  1. Go to R for Bio Data Science RStudio Server and login.
  2. Click Create a project in the top left and choose New Directory.
  3. Select R Package and pick a fitting Package name.
    • The package you will create will replicate the central dogma of molecular biology. Let that inspire you
    • Look up naming rules here
    • The most important rule: _, -, and ' are not allowed.
  4. Tick the “Create git repository” box.
  5. Click “Create project”.

RStudio will now create an R project for you that contains all necessary files and folders.

  1. Open the DESCRIPTION file and write a title for your package, a small description, and add authors.
  • When adding authors, the format is:
Authors@R:
  c(person(given = "firstname",
           family = "lastname",
           role = c("aut", "cre"), # There must be a "cre", but there can only be one
           email = "your@email.com"), 
    person(given = "firstname",
           family = "lastname",
           role = "aut",
           email = "your@email.com"))
  1. Create an MIT license (from the console) usethis::use_mit_license("Group name")
  2. If you didn’t tick the “Create git repository”: Run usethis::use_git(). If the “Git” tab is not in the lower left panel: reopen the R project.
  3. Optional setup steps (generally good, but not essential for this course)
    • Add a Readme file usethis::use_readme_rmd( open = FALSE ). You will do that in the Group Assignment later
    • Add a lifecycle badge: usethis::use_lifecycle_badge( "Experimental" )
    • Write vignettes: usethis::use_vignette("Vignette name")

Task 1.2 - Setup GitHub Repository for your group’s R package

During the previous class, you have been using a combination of console and terminal to set up GitHub. This week, you will solely be using the console.

Still only the first team member

  1. Go to https://github.com/rforbiodatascience23
  2. Click the green New-button
  3. Create a new repository called group_X_package. Remember to replace the X
  4. Select rforbiodatascience23 as owner
  5. Make the repository Public
  6. Click the green Create repository-button
  7. In this new repository, click the settings tab
  8. Click Collaborators and Team
  9. Click the green Add people-button
  10. Invite the remaining group members
  11. All other team members should now have access to the repository, but do not create your own project yet.

Task 1.3 - Connect your RStudio Server Project to the GitHub Repository

Still only the first team member

  1. Find your PAT key or create a new
    How to create a new one
    • Type in the console usethis::create_github_token(). You’re going to be redirected to GitHub website.
    • In case you did that step manually, remember to give permission to access your repositories.
    • You need to type your password. Don’t change the default settings of creating the token except for the description – set ‘RStudio Server’ or something similar. Then hit Generate token.
    • Copy the generated token (should start with ghp_) and store it securely (e.g., in a password manager). Do not keep it in a plain file.
  2. Go back to the RStudio Server project
  3. Type in the console gitcreds::gitcreds_set() and paste the PAT key
  4. Stage all files (in the git window in the top right) by ticking all boxes under Staged
  5. Still in the console, run the following commands. Replace your group number and use your GitHub username and email. You run these to link your project to the GitHub repository you created. Remember to replace the X.
usethis::use_git_remote(name = "origin", "https://github.com/rforbiodatascience23/group_X_package.git", overwrite = TRUE) # Remember to replace the X
usethis::use_git_config(user.name = "USERNAME", user.email = "USEREMAIL@EXAMPLE.ORG")
gert::git_commit_all("Package setup")
gert::git_push("origin")

All other team members

After your teammate has pushed to GitHub.

  1. In the RStudio Server, create a new project based on the GitHub Repository just created by your teammate.
    • Click Create a project in the top left.
    • Choose Version Control and then Git.
    • Paste in the repository URL: https://github.com/rforbiodatascience23/group_X_package.git. Remember to replace the X.
    • Give the directory a fitting name and click Create Project.
  2. Find your PAT key or create a new
    How to create a new one
    • Type in the console usethis::create_github_token(). You’re going to be redirected to GitHub website.
    • In case you did it manually, remember to give permission to access your repositories.
    • You need to type your password. Don’t change the default settings of creating the token except for the description – set ‘RStudio Server’ or something similar. Then hit Generate token.
    • Copy the generated token (should start with ghp_) and store it securely (e.g. in password manager). Do not keep it in a plain file.
  3. Go back to the RStudio Server project
  4. Type in the console gitcreds::gitcreds_set() and paste the PAT key
  5. Still in the console, run the following commands. Replace your GitHub username and email.
usethis::use_git_config(user.name = "USERNAME", user.email = "USEREMAIL@EXAMPLE.ORG")

If RStudio at any point asks you to log in to GitHub, redo step 4 and 5.

Now you are ready to work on the R package.

Task 2 - Build the package


Each team member should build and implement their own function. The code will be given, but you will be asked to come up with a name for your function and many of the variables, so discuss in the team what naming convention you want to use. Do you want to use snake_case (common in tidyverse) or camelCase (common in Bioconductor)? It doesn’t really matter, but be consistent.


Task 2.1 - Incorporate data

In this task you will include the following codon table in your package.

This task should be done by one team member. The rest should follow along.

  1. Below is a standard codon table. Store it in an object with a name of your own choosing.
c("UUU" = "F", "UCU" = "S", "UAU" = "Y", "UGU" = "C",
  "UUC" = "F", "UCC" = "S", "UAC" = "Y", "UGC" = "C",
  "UUA" = "L", "UCA" = "S", "UAA" = "_", "UGA" = "_",
  "UUG" = "L", "UCG" = "S", "UAG" = "_", "UGG" = "W",
  "CUU" = "L", "CCU" = "P", "CAU" = "H", "CGU" = "R",
  "CUC" = "L", "CCC" = "P", "CAC" = "H", "CGC" = "R",
  "CUA" = "L", "CCA" = "P", "CAA" = "Q", "CGA" = "R",
  "CUG" = "L", "CCG" = "P", "CAG" = "Q", "CGG" = "R",
  "AUU" = "I", "ACU" = "T", "AAU" = "N", "AGU" = "S",
  "AUC" = "I", "ACC" = "T", "AAC" = "N", "AGC" = "S",
  "AUA" = "I", "ACA" = "T", "AAA" = "K", "AGA" = "R",
  "AUG" = "M", "ACG" = "T", "AAG" = "K", "AGG" = "R",
  "GUU" = "V", "GCU" = "A", "GAU" = "D", "GGU" = "G",
  "GUC" = "V", "GCC" = "A", "GAC" = "D", "GGC" = "G",
  "GUA" = "V", "GCA" = "A", "GAA" = "E", "GGA" = "G",
  "GUG" = "V", "GCG" = "A", "GAG" = "E", "GGG" = "G")
  1. Run usethis::use_data(NAME_OF_YOUR_OBJECT, overwrite = TRUE, internal = TRUE)
  2. You have now made the data available to our functions, but we also want to make it visible for our users.
    • Run usethis::use_data(NAME_OF_YOUR_OBJECT, overwrite = TRUE).
  3. Write a data manual (document the data).
    • All non-internal data should be documented in a data.R file in the R folder. Create it with usethis::use_r("data")
    • Add the following scaffold to R/data.R and write a very brief description of the data (see an example here). Don’t spend a lot of time here.
#' Title
#' 
#' Description
#' 
#' 
#' @source \url{https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG1}
"NAME_OF_YOUR_OBJECT"

Normally, you should also describe how the raw data was cleaned. You would do that in the file that opens after running usethis::use_data_raw( name = "NAME_OF_YOUR_OBJECT", open = TRUE ), but that is less relevant here, so we will skip that part.

Your package now includes some data.

Restart R, clean your Environment (small broom in the Environment tab), run the three lines of code from The Package Workflow section:

rstudioapi::documentSaveAll()  # Saves all you files
devtools::document()  # Writes all your manuals for you
devtools::load_all()  # Simulates library("your package"), allowing you to use your functions

And run ?NAME_OF_YOUR_OBJECT. OBS For some unknown reason, this does not work on the current R version (4.3.1). Instead, you can click the “Build” tab next to the “Git” tab and then “install”. After installing and attaching, your manual should pop up. Try printing the object as well to see what it looks like print(NAME_OF_YOUR_OBJECT).

Push your changes to GitHub. The other team members pull the changes to have the data available to you as well.


Task 2.2 - Implement functions

In this task you will be working individually to implement a function each. If you are fewer in the team than the number of functions. The quickest to finish can implement the remaining, or separate them out as you see fit.

If you want an additional challenge, each team member can create their own Git branch gert::git_branch_checkout("NAME_OF_BRANCH") and work there. When done, merge your branch with the main branch. That is a more clean and ‘proper’ workflow, but completely optional.

If you are in doubt what the underlying functions do, run ?function_name to get a hint about their purpose. OBS For some unknown reason, this does not work on the current R version (4.3.1). As mentioned with the data, you can install your package, and then load the manual. Also, remember the functions are replicating the central dogma. Let that inspire you, when naming the functions and variables. If you get stuck, ask your teammates about their functions.

Function five is a bit more involved. Do it together or help your teammate out if it causes problems.

For each function (found below), complete the following steps:

  1. Look carefully at your function. Choose a fitting name for it
  2. Run usethis::use_r("function_name") to create an .R file for the function
  3. Paste in the function and rename all places it says name_me with fitting names
  4. Click somewhere in the function. In the toolbar at the very top of the page, choose Code and click Insert Roxygen Skeleton
  5. Fill out the function documentation
    • Give the function a title
    • Do not write anything after @export
    • The parameters should have the format: @param param_name description of parameter
    • The return has the format: @return description of function output
    • Examples can span multiple lines and what you write will be run, so make it runnable.
    • Important! Either fill out everything or delete what you don’t. Otherwise, the package check will fail (Do not delete @export for these functions).
  6. Run the three lines of codes from The Package Workflow section
  • If at this point, you get a warning that NAMESPACE already exists, delete it and try again.
rstudioapi::documentSaveAll()  # Saves all you files
devtools::document()  # Writes all your manuals for you
devtools::load_all()  # Simulates library("your package"), allowing you to use your functions
  1. View your function documentation with ?function_name
  2. Defining a test or a series of tests around your newly created function ensures future corrections will not yield undesired results. Create such a test for your function. Run usethis::use_test("function_name") and write a test. Draw some inspiration from here or from running ?testthat::expect_equal.
    • Skip this step for function five. Instead, write inline code comments for each chunk of code. Press Cmd+Shift+c / Ctrl+Shift+C to comment your currently selected line.
  3. Rerun the three lines from The Package Workflow section and check that the package works with devtools::check()
    • Briefly about check devtools::check() will check every aspect of your package and run the tests you created. You will likely get some warnings or notes like ‘No visible binding for global variables’. They are often harmless, but, if you like, you can get rid of them as described here. The check will tell you what might cause problems with your package and often also how to fix it. If there are any errors, fix those. Warnings and notes are also good to address. Feel free to do that if any pops up.
  4. If it succeeds without errors, push your changes to GitHub
Function one
name_me1 <- function(name_me2){
  name_me3 <- sample(c("A", "T", "G", "C"), size = name_me2, replace = TRUE)
  name_me4 <- paste0(name_me3, collapse = "")
  return(name_me4)
}
Function two
name_me1 <- function(name_me2){
  name_me3 <- gsub("T", "U", name_me2)
  return(name_me3)
}
Function three
name_me1 <- function(name_me2, start = 1){
  name_me3 <- nchar(name_me2)
  codons <- substring(name_me2,
                      first = seq(from = start, to = name_me3-3+1, by = 3),
                      last = seq(from = 3+start-1, to = name_me3, by = 3))
  return(codons)
}
Function four

NAME_OF_YOUR_OBJECT refers to the codon table you stored in Task 2.

name_me <- function(codons){
  name_me2 <- paste0(NAME_OF_YOUR_OBJECT[codons], collapse = "")
  return(name_me2)
}
Function five

This function will be the first to use dependencies. As a reminder, a dependency is a package that your package depends on. In this case, it will be stringr and ggplot2. They are already installed, so you don’t need to do that. The best way to add these packages to your own is to first add them to your package dependencies with usethis::use_package("package_name"). If you care about reproducibility, you can add min_version = TRUE to the function call to specify a required minimum package version of the dependency. Run usethis::use_package for both dependencies.

For the ggplot2 functions, we will use ggplot2::function_name everywhere a ggplot2 function is used. Also add @import ggplot2 to the function description (this is often done with ggplot2 because it has a lot of useful plotting functions with only rare name overlaps). If you want to be more precise, add @importFrom ggplot2 ggplot aes geom_col theme_bw theme instead. The same applies for stringr, but since we only use a few functions, add @importFrom stringr str_split boundary str_count to the function description.

name_me1 <- function(name_me2){
  name_me3 <- name_me2 |>  
    stringr::str_split(pattern = stringr::boundary("character"), simplify = TRUE) |>
    as.character() |> 
    unique()
  
  counts <- sapply(name_me3, function(amino_acid) stringr::str_count(string = name_me2, pattern =  amino_acid)) |>  
    as.data.frame()
  
  colnames(counts) <- c("Counts")
  counts[["Name_me2"]] <- rownames(counts)
  
  name_me4 <- counts |>  
    ggplot2::ggplot(ggplot2::aes(x = Name_me2, y = Counts, fill = Name_me2)) +
    ggplot2::geom_col() +
    ggplot2::theme_bw() +
    ggplot2::theme(legend.position = "none")
  
  return(name_me4)
}


Task 3 - Group discussion

  1. Describe each function to each other in order - both what it does and which names you gave them and their variables.
  2. The person(s) responsible for function five, describe how you added the two packages as dependencies.
  3. Discuss why it is a good idea to limit the number of dependencies your package has. When can’t it always be avoided?
  4. Discuss the difference between adding an @importFrom package function tag to a function description compared to using package::function(). Read this section if you are not sure or just want to learn more.

GROUP ASSIGNMENT


For this week’s group assignment, write a vignette (user guide) for your package (max 2 pages). The vignette should include a brief description of what the package is about and demonstrate how each function in the package is used (individually and in conjunction with each other). As a final section, discuss use cases for the package and what other functions could be included. Also include the main points from your discussion in Task 3.

Include a link to your group’s GitHub repository at the top of the vignette. Hand it in as a pdf in DTU Learn.

  1. Create a vignette with usethis::use_vignette("your_package_name").
  2. When you are done writing it, run devtools::build_vignettes().
  3. At the top line of your vignette, change rmarkdown::html_vignette to rmarkdown::pdf_document
  4. Rerun devtools::build_vignettes() - the created pdf-file in the doc-folder is the document to hand it.
  5. Lastly, duplicate the vignette as the GitHub README
    • Create a README with usethis::use_readme_rmd( open = TRUE ).
    • Copy the content of the vignette Rmarkdown into the readme Keep the top part of the README as is. If you overwrote it anyway, change rmarkdown::pdf_document to rmarkdown::github_document.
    • Run devtools::build_readme()
    • Push the changes to GitHub.

Last tip on packages


Next week, I will introduce the golem package for building production-grade Shiny apps. However, I personally also use it to quickly get going with packages.

When starting out with an R package, it may seem complicated with a lot of things to remember. golem remembers these things for you. When setting up your package / Shiny app with golem::create_golem("Name of your awesome package/app"), it creates a dev folder with a few files listing all you need to get started with an R package.

It does also give you a lot of other things that you will rarely use, and it also sets up some basic structures for Shiny apps. These can simply be deleted if you are not also building a Shiny app (which you will next week).