1 Creating an R Package

This tutorial will take you through the steps of creating a new package in R.

There are five primary tasks:

Create a package skeleton (set of empty files and folders).
Add your R code to the skeleton directory.
Create documentation for your package (similar to knitting RMD files).
Install your new package on your computer and test it out.
Upload the package to GitHub.

The entire script you need to build a package will look something like this.

# set your working directory
# if you don't want to create
# the package in your default 
# working directory

getwd()   # default directory, usually my documents 
library(devtools)

# step 1
usethis::create_package( "montyhall" )

# step 2 move R script to montyhall/R folder
# after completing documentation fields 

# step 3
setwd( "montyhall" )
devtools::document()

# step 4
setwd( ".." )
devtools::install( "montyhall" )
library( montyhall )
create_game()

# step 5: close R and re-open new console
devtools::install_github( "yourGitHubName/montyhall" )

These five steps are explained in detail below. The annotated script explains some of the i’s that need dotting and t’s that need crossing while you create a package. For example, different steps are executed while you are inside different folders, so you need to mind your working directories, and roxygen comments are sensitive to formatting. Like the errors you encountered when learning to use RMD documents, these are the types of details that will trip you up the first time you create a package.

Once you have completed the process, a key takeaway from the exercise is that packages are NOT hard to create. R has become a successful language because packages are easy to build, which means that barriers to sharing cool ideas are low, which means more people create packages, which means that the R ecosystem becomes dynamic and robust.

R has been much more successful as a social network than as a computer language[*]. Packages are not only useful for thought leaders who want to share innovative solutions with a global community by uploading packages to CRAN. They are also great for amplifying expertise within organizations (senior analysts sharing existing code with junior analysts), for documentation (institutionalizing knowledge so that turnover is less disruptive), and for quality control. In these cases packages might be used internally and not shared broadly. They can be a powerful management tool, not just a tool for elite R users.

ASIDE ON R AS A LANGUAGE

[*] There is an important caveat to the statement that R is not successful because of its prowess as a computer language. Specifically, R is a powerful language designed for data programming, statistical analysis, and scientific computing. For a high-level language that is easy to learn it does A LOT OF THINGS well, which makes it a great general toolkit for data scientists.

For example, it can utilize object-oriented frameworks, it has packages that optimize speed on specific tasks, and you can run database queries directly in R. However, if object-oriented features are the most important requirement for your project then Java is better suited, if speed is the most important requirement then you might write some of the code in C++, and if database queries are the most important requirement then SQL will be useful. R can do all of the things these other languages can do, but it can’t do them as well as a language designed for a specific purpose.

You will often see the argument made that Python is a better language for data science. The following links discuss the comparison:

https://github.com/matloff/R-vs.-Python-for-Data-Science

https://qz.com/1661487/hadley-wickham-on-the-future-of-r-python-and-the-tidyverse

https://towardsdatascience.com/python-vs-r-for-data-science-cf2699dfff4b

1.0.1 ANNOTATED PACKAGE CREATION SCRIPT

When installing packages there are a few file management and path navigation details that you need to understand. This annotated version of the steps above is more explicit than you will eventually need, but it will be helpful your first time through:

ANNOTATED PACKAGE CREATION SCRIPT (right-click and save)

1.0.2 Basic R Console

It is recommended to complete this lab in a regular R console, NOT in R Studio.

R Studio asserts more control over file paths and working directories, and its behavior depends on your operating system and R Studio settings, so what you see may differ from these instructions.

The basic R console is ‘dumb’ - it does not try to guess what you want. That makes it more predictable for tutorials like this.

You can eventually create packages using RMD docs and other R Studio tools that help manage the process. But the steps will be different from this tutorial.

Note, if you need to Google any steps for additional help try to avoid instructions that use R Studio options. Their template for packages is helpful, but it will follow a different process than the one described here.

1.0.3 Making Mistakes

Note, if you make a mistake or encounter an error, you can always delete the new montyhall folder and start over at Step 1 (just keep a copy of your R script if you have already completed the roxygen comments).

Packages in R are just folders with R files and some extra documentation. They are designed to be minimalist.
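
To make the "just folders" point concrete, here is a minimal sketch that builds a bare package folder by hand with base R inside a temporary directory. The package name toypkg and the function hello() are made up for illustration; in this tutorial usethis::create_package() does this work for you and fills in more fields.

```r
# Build a bare-bones package folder by hand (illustration only).
# "toypkg" and hello() are hypothetical names, not part of the lab.
pkg <- file.path(tempdir(), "toypkg")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION is a plain text file of "Field: value" lines
writeLines(
  c("Package: toypkg",
    "Title: A Toy Package",
    "Version: 0.0.0.9000",
    "Description: Shows that a package is a folder of text files.",
    "License: GPL-3"),
  file.path(pkg, "DESCRIPTION"))

# NAMESPACE lists which functions are exported
writeLines("export(hello)", file.path(pkg, "NAMESPACE"))

# R scripts holding the functions live in the R folder
writeLines('hello <- function() "hello from toypkg"',
           file.path(pkg, "R", "hello.R"))

# Show everything that was created
list.files(pkg, recursive = TRUE)
```

Deleting that folder removes the whole "package", which is why starting over is cheap.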

1.1 Download the Template

You will need to download the Monty Hall Problem functions that we completed during Labs 01 and 02 from the link below. This script contains some roxygen text to get you started with the process of documentation.

DOWNLOAD THE TEMPLATE: monty-hall-pkg.R

Note that you will update this script and place it in the montyhall/R folder after you complete Step 01 and the package skeleton has been created (the working directories for your package are built).

The package code is provided - you just need to complete the documentation and develop the test code for each function (you can adapt these from the unit testing examples in the labs).

Once completed you can move monty-hall-pkg.R to the montyhall/R folder that is created in Step 1.

Note, you should NOT be using RMD files in this assignment. The only files in the R folder should be R scripts with the functions needed for the package and complete roxygen fields. This script becomes the library of functions inside of your new package and the roxygen fields are converted into formal documentation for the package.
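
A quick way to confirm that the R folder holds only R scripts is sketched below. The helper non_r_files() is a hypothetical name, and the demo uses a stand-in folder in a temporary directory rather than your real montyhall folder.

```r
# Hypothetical helper: list files in <pkg>/R that are not R scripts.
non_r_files <- function(pkg_path) {
  files <- list.files(file.path(pkg_path, "R"))
  files[!grepl("\\.[Rr]$", files)]
}

# Demo in a temporary folder standing in for "montyhall"
demo_pkg <- file.path(tempdir(), "montyhall-demo")
dir.create(file.path(demo_pkg, "R"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_pkg, "R", "monty-hall-pkg.R"))
file.create(file.path(demo_pkg, "R", "notes.Rmd"))

non_r_files(demo_pkg)   # flags "notes.Rmd"
```

On your own machine you would call non_r_files( "montyhall" ) from the folder that contains the package; an empty result means the R folder is clean.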

1.2 PACKAGE CREATION STEPS:

1.2.1 Install Development Packages

#   Windows users must also install Rtools from:
#   https://cran.r-project.org/bin/windows/Rtools/ 

install.packages(
  c( "devtools", "roxygen2",
     "usethis", "testthat",
     "knitr" ) )

You can check that you have everything installed and working by running the following code:

library(devtools)
options( buildtools.check = function(action) TRUE )
has_devel()

## Loading required package: usethis

## Your system is ready to build packages!
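
If has_devel() complains, it can help to see which of the packages from the install.packages() call above are actually available. This is a small sketch using only base R; the variable names are arbitrary.

```r
# Packages this tutorial relies on (from the install.packages() call above)
needed <- c("devtools", "roxygen2", "usethis", "testthat", "knitr")

# requireNamespace() returns FALSE, rather than an error,
# when a package cannot be loaded
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- needed[!installed]
if (length(missing_pkgs) > 0) {
  message("Still need to install: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All development packages are installed.")
}
```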

1.2.2 Build a Package Skeleton

# set your directory if you want the package 
# created in a folder other than the default "documents" 

setwd( "some/path/here" )

# older tutorials use devtools::create(); the current function is usethis::create_package()
usethis::create_package( "montyhall" )

This example is using the default “documents” directory:

getwd()  # documents in this example
usethis::create_package( "montyhall" )

documents     # package created inside documents 
├─ montyhall  # new folder that will appear after create_package()
│  ├─ \R 
│  ├─ DESCRIPTION
│  ├─ NAMESPACE

What you should see:

# > usethis::create_package( "montyhall" )
# ✔ Creating 'montyhall/'
# ✔ Setting active project to 'C:/Users/jdlecy/Documents/montyhall'
# ✔ Creating 'R/'
# ✔ Writing 'DESCRIPTION'
# Package: montyhall
# Title: What the Package Does (One Line, Title Case)
# Version: 0.0.0.9000
# Authors@R (parsed):
#     * First Last <first.last@example.com> [aut, cre] (<https://orcid.org/YOUR-ORCID-ID>)
# Description: What the package does (one paragraph).
# License: What license it uses
# Encoding: UTF-8
# LazyData: true
# ✔ Writing 'NAMESPACE'
# ✔ Writing 'montyhall.Rproj'
# ✔ Adding '.Rproj.user' to '.gitignore'
# ✔ Adding '^montyhall\\.Rproj$', '^\\.Rproj\\.user$' to '.Rbuildignore'
# ✔ Opening 'montyhall/' in new RStudio session
# ✔ Setting active project to '<no active project>'

You will now have a directory in your current folder called montyhall. This will contain the files you need for your package.

Note, some of these files, like .gitignore and .Rbuildignore, are housekeeping files used by Git and by the R build tools; you do not need to edit them for this tutorial. Files may vary by operating system, but you should at the very least have the R folder, a DESCRIPTION file, and a NAMESPACE file.

1.2.2.1 Where you will likely go wrong

Note that usethis::create_package( "montyhall" ) creates a new folder called “montyhall” inside your current directory.

documents     # base project folder 
├─ montyhall  # new folder that will appear after create_package()
│  ├─ \R
│  ├─ DESCRIPTION
│  ├─ NAMESPACE

Do NOT create your own folder called “montyhall”, navigate to that folder, and then call create_package() or you will end up with this scenario:

documents     
├─ montyhall     # base project folder
│  ├─ montyhall  # new folder that will appear after create_package()
│     ├─ \R
│     ├─ DESCRIPTION
│     ├─ NAMESPACE

If you try you should get the following warning message:

usethis::create_package( "montyhall" )
New project 'montyhall' is nested inside an existing project './', which is rarely a good idea.
If this is unexpected, the here package has a function, `here::dr_here()` that reveals why './' is regarded as a project.
Do you want to create anyway?

1: No
2: Absolutely not
3: I agree
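
One way to avoid the nesting problem is to check, before calling create_package(), whether your working directory already sits inside a package. The sketch below walks up the folder tree looking for a DESCRIPTION file. The helper find_package_root() is a hypothetical name and the demo runs in a temporary directory; it is not how usethis itself decides what counts as a project.

```r
# Hypothetical helper: walk up from `path` and return the first folder
# that contains a DESCRIPTION file, or NA if none is found.
find_package_root <- function(path = getwd()) {
  path <- normalizePath(path, winslash = "/", mustWork = TRUE)
  repeat {
    if (file.exists(file.path(path, "DESCRIPTION"))) return(path)
    parent <- dirname(path)
    if (identical(parent, path)) return(NA_character_)  # reached the top
    path <- parent
  }
}

# Demo: a fake package in a temporary folder
demo_root <- file.path(tempdir(), "nest-demo", "montyhall")
dir.create(file.path(demo_root, "R"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_root, "DESCRIPTION"))

# Starting from inside the package, the helper finds the package root,
# so calling create_package() here would nest one package in another.
inside <- find_package_root(file.path(demo_root, "R"))
basename(inside)
```

If find_package_root() returns a path instead of NA for your current directory, move up with setwd( ".." ) before creating the package.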

1.2.3 Document Your Functions

Provide documentation for the functions in your package by adding roxygen comments to your R scripts. The template script provided in Step 1.1 above includes complete roxygen comments for one function and some placeholder tags for the others.

The example here is showing the documentation for the base R sum() function.

#' @title
#' Sum of vector elements.
#'
#' @description
#' `sum(x)` returns the sum of all the values present in its arguments.
#'
#' @details
#' This is a generic function: methods can be defined for it directly
#' or via the [Summary] group generic. For this to work properly,
#' the arguments `...` should be unnamed, and dispatch is on the
#' first argument.
#'
#' @param ... Numeric, complex, or logical vectors.
#' @param na.rm A logical scalar. Should missing values (including `NaN`)
#'   be removed?
#' @return If all inputs are integer and logical, then the output
#'   will be an integer. Otherwise it will be a length-one numeric or
#'   complex vector.
#'
#'   Zero-length vectors have sum 0 by definition. See
#'   <http://en.wikipedia.org/wiki/Empty_sum> for more details.
#'
#' @examples
#' sum(1:10)
#' sum(1:5, 6:10)
#' sum(F, F, F, T, T)
#'
#' sum(.Machine$integer.max, 1L)
#' sum(.Machine$integer.max, 1)
#'
#' \dontrun{
#' sum("a")
#' }
sum <- function(..., na.rm = FALSE) {}

Note that good documentation describes all of the arguments needed by the function, including the required data type of each object. It also clearly describes what will be returned when the function runs (type of object, what it contains).

The information you provide becomes the documentation that appears when you type help("function_name").

help( "sum" )

Roxygen, unlike R, is sensitive to the number of spaces you use. So don’t alter the formatting of the default comments in your script.
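
For the Monty Hall package itself, a documented function might look like the sketch below. The title, description, details, and return text are taken from the create_game() help file shown in the grading section of this tutorial; the function body is an assumption about how the lab code works, so use the version in your template script.

```r
#' @title
#' Create a new Monty Hall Problem game.
#'
#' @description
#' `create_game()` generates a new game that consists of two doors
#' with goats behind them, and one with a car.
#'
#' @details
#' The game setup replicates the game on the TV show "Let's Make a
#' Deal" where there are three doors for a contestant to choose from,
#' one of which has a car behind it and two have goats.
#'
#' @param ... no arguments are used by the function.
#'
#' @return The function returns a length 3 character vector
#'   indicating the positions of goats and the car.
#'
#' @examples
#' create_game()
#'
#' @export
create_game <- function() {
  # assumed body: shuffle two goats and one car across three doors
  sample(c("goat", "goat", "car"), size = 3, replace = FALSE)
}

create_game()
```

The roxygen lines are ordinary comments to R, so the script still runs as a normal R file; they only become help pages when devtools::document() processes them.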

Place your documented R scripts into the “R” folder in your package directory, then try:

getwd()   # should be '.../Documents/montyhall'

# OTHERWISE NAVIGATE THERE:
#   move one level up:  setwd( ".." )
#   go into a folder:   setwd( "montyhall" )

devtools::document()

documents  
├─ montyhall    # should be inside here now
│  ├─ \R         # your R scripts should be in here
│  ├─ DESCRIPTION
│  ├─ NAMESPACE

You should see:

# Updating montyhall documentation
# Updating roxygen version in C:\Users\jdlecy\Documents\montyhall/DESCRIPTION
# Writing NAMESPACE
# Loading montyhall
# Writing create_game.Rd

documents     
├─ montyhall  
│  ├─ \R
│  ├─ \man     # new *.Rd files will be here 
│  ├─ DESCRIPTION
│  ├─ NAMESPACE

You will see some errors as well if you have not yet finished documenting your functions. Ignore them for now.

Depending upon your OS and your R devtools version you may be required to complete ALL documentation before proceeding to the next steps. At the very least each function should have a title.

You will now have a new folder in your montyhall directory called “man”, short for “manuals”. The documentation files have an .Rd (R documentation) extension. The man folder should contain one .Rd file for each exported function in your script (change_door.Rd, create_game.Rd, etc.).
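
To check that every function got a help file, you can compare the function names against the contents of the man folder. This is a sketch with a hypothetical helper, missing_rd(), run against a stand-in folder in a temporary directory. It assumes one .Rd file per function, which is roxygen's default but not the only possible layout.

```r
# Hypothetical helper: which functions lack a matching .Rd file in man/?
missing_rd <- function(pkg_path, functions) {
  rd_files <- list.files(file.path(pkg_path, "man"), pattern = "\\.Rd$")
  documented <- sub("\\.Rd$", "", rd_files)
  setdiff(functions, documented)
}

# Demo with a stand-in package folder in a temporary directory
demo_pkg <- file.path(tempdir(), "rd-demo")
dir.create(file.path(demo_pkg, "man"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_pkg, "man", "create_game.Rd"))

missing_rd(demo_pkg, c("create_game", "change_door"))   # "change_door"
```

A function that shows up in the result usually has incomplete roxygen comments or is missing its export tag.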

Skip to the testing step below if you want to see if the package is now functional.

1.2.4 Update Your Package Description

Navigate to the main “montyhall” package folder on your computer and open the file called “DESCRIPTION” in a text editor (your computer will have a text editor like Notepad). You will see something like this:

Package: montyhall
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R: 
    person(given = "First",
           family = "Last",
           role = c("aut", "cre"),
           email = "first.last@example.com",
           comment = c(ORCID = "YOUR-ORCID-ID"))
Description: What the package does (one paragraph).
License: What license it uses
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.1.1

Since we use dplyr in the package, we need to add another line to import and attach it. Add:

Depends: 
   dplyr

Now complete the rest of the fields from “Title” to “Description”.

Package: montyhall
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R: 
    person(given = "First",
           family = "Last",
           role = c("aut", "cre"),
           email = "first.last@example.com",
           comment = c(ORCID = "YOUR-ORCID-ID"))
Description: What the package does (one paragraph).
Depends:
    dplyr
License: What license it uses
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.1.1
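
Because DESCRIPTION is a plain "Field: value" file, base R can read it back, which gives a simple way to spot fields that still contain the template text. The sketch below writes a shortened stand-in DESCRIPTION to a temporary file; on your machine you would point read.dcf() at montyhall/DESCRIPTION instead.

```r
# Write a stand-in DESCRIPTION to a temporary file (same placeholders as above)
desc_file <- file.path(tempdir(), "DESCRIPTION")
writeLines(
  c("Package: montyhall",
    "Title: What the Package Does (One Line, Title Case)",
    "Version: 0.0.0.9000",
    "Description: What the package does (one paragraph).",
    "Depends:",
    "    dplyr",
    "License: What license it uses"),
  desc_file)

# read.dcf() parses "Field: value" files into a character matrix
desc <- read.dcf(desc_file)

# Flag fields that still hold the template text
placeholder <- grepl("^What ", desc[1, ])
colnames(desc)[placeholder]
```

Any field name this prints is one you still need to fill in.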

1.3 TEST IT:

1.3.1 Install Your Package

Go up one level in your directory so you are outside of the package folder, but in the same folder where the package folder lives.

documents     # should be back here 
├─ montyhall  
│  ├─ R
│  ├─ man

Then run the command to install your new package.

The install() command is looking for a folder called “montyhall”. It can’t find the folder if we are currently inside the folder. Thus we move up one level in the directory first:

setwd( ".." )  # move up one level with two periods
getwd()        # should be /documents NOT /montyhall 
devtools::install( "montyhall" )

If successful you will see messages like this:

# Updating montyhall documentation
# Updating roxygen version in C:\Users\jdlecy\Documents\montyhall/DESCRIPTION
# Writing NAMESPACE
# Loading montyhall
# Writing create_game.Rd
# > getwd()
# [1] "C:/Users/jdlecy/Documents/montyhall"
# > setwd( ".." )
# > devtools::install( "montyhall" )
# √  checking for file 'C:\Users\jdlecy\Documents\montyhall/DESCRIPTION'
# -  preparing 'montyhall':
# √  checking DESCRIPTION meta-information ... 
# -  checking for LF line-endings in source and make files and shell scripts
# -  checking for empty or unneeded directories
# -  building 'montyhall_0.0.0.9000.tar.gz'
#    
# Running "C:/PROGRA~1/R/R-36~1.1/bin/x64/Rcmd.exe" INSTALL \
#   "C:\Users\jdlecy\AppData\Local\Temp\RtmpKetkbm/montyhall_0.0.0.9000.tar.gz" --install-tests 
# * installing to library 'C:/Users/jdlecy/Documents/R/win-library/3.6'
# * installing *source* package 'montyhall' ...
# ** using staged installation
# ** R
# ** byte-compile and prepare package for lazy loading
# ** help
# *** installing help indices
# converting help for package 'montyhall'
#     finding HTML links ... done
#     create_game                             html  
# ** building package indices
# ** testing if installed package can be loaded from temporary location
# *** arch - i386
# *** arch - x64
# ** testing if installed package can be loaded from final location
# *** arch - i386
# *** arch - x64
# ** testing if installed package keeps a record of temporary installation path
# * DONE (montyhall)

1.3.2 Try Loading

In a new R session try:

library( montyhall )
create_game()

## [1] "goat" "car"  "goat"

1.3.3 Check Help Files

You should be able to preview the help files that you created with your roxygen comments.

help( "create_game" )

1.3.4 Removing Packages

If you encounter an error or the package is not working properly, you might need to fix your code and try again.

You should be able to update the package then simply reinstall it. R will recognize that there are changes to the package and will install the most recent version.

If you run into problems you can force a package deletion then install fresh.

# library() attaches the package to the current environment
# detach is the opposite of library() - closes the package 
detach( "package:montyhall" )  # closes the package so not locked
remove.packages( "montyhall" ) # deletes from your computer

If that fails, you probably have the package loaded in another R session or have package files locked by editing them in another program.

If all else fails, you can delete the package manually from your personal R packages library, which usually lives in your Documents folder.
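
The detach() and remove.packages() calls above both stop with an error when the package is not attached or not installed. A slightly more defensive version, sketched here with base R only, checks first and then prints the library folders in case you need to delete the package by hand.

```r
pkg <- "montyhall"
search_name <- paste0("package:", pkg)

# detach() errors if the package is not attached, so check first
if (search_name %in% search()) {
  detach(search_name, character.only = TRUE, unload = TRUE)
}

# remove.packages() errors if the package is not installed, so check first
if (pkg %in% rownames(installed.packages())) {
  remove.packages(pkg)
}

# Folders where R keeps installed packages, if you need to delete by hand
.libPaths()
```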

1.4 SHARE YOUR PACKAGE:

You have a couple of options for sharing your new package with others. You could submit the package to CRAN so that everyone in the world could install it in R using install.packages("montyhall").

We cannot use this option because (1) it’s a homework assignment so we don’t want to burden people with the task of reviewing a new package, (2) package names on the CRAN must be unique so everyone from the class would have to name it something different, and (3) the CRAN requires that the package passes some robustness checks to ensure everything is documented correctly and all of the code is running smoothly.

1.4.1 Hosting on GitHub

A simpler option is to host your package on GitHub. Complete the following steps to upload your code to a new repository on GitHub:

Install the git client on your computer: LINK.
Install GitHub Desktop application: LINK.
Open GitHub in a browser and navigate to your profile.
Create a new repository titled “montyhall”. Check the “Create with README” option.
Copy the URL for your new repository.
Open GitHub desktop and select File >> Clone Repository.
Select the desired location of the folder and paste the URL into the appropriate dialogue.
Clone your GitHub repo.
Open the new “montyhall” folder that was just created by GitHub on your machine.
Transfer all of the files inside of your old “montyhall” folder into the new GitHub version.
You should see the files appear in the GitHub Desktop app. In the “summary” field type something like “initial commit” and at the bottom left select “Commit to master” (labeled “Commit to main” on repositories whose default branch is main).
Now up at the top select “Push origin”.

documents      
├─ montyhall  # current package files 
│  ├─ R
│  ├─ man     
projects       
├─ montyhall  # cloned github repo
│  ├─ .git
│  ├─         # copy files from original folder to this one

Do NOT copy the full folder. ONLY copy the package files inside the folder.

projects          
├─ montyhall  
│  ├─ .git
│  ├─ montyhall   # INCORRECT 
│  │  ├─ R
│  │  ├─ man

projects       
├─ montyhall      # CORRECT
│  ├─ .git
│  ├─ R
│  ├─ man   (plus DESCRIPTION, NAMESPACE, etc)
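
If you prefer to do the copy from R rather than by dragging files, the idea is to copy what is inside the old folder, not the folder itself. The sketch below demonstrates this with stand-in folders in a temporary directory; the documents and projects paths are illustrative, so substitute your own locations.

```r
# Stand-in folders in a temporary directory (paths are illustrative)
old_pkg <- file.path(tempdir(), "documents", "montyhall")
repo    <- file.path(tempdir(), "projects", "montyhall")
dir.create(file.path(old_pkg, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(repo, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(old_pkg, "DESCRIPTION"))
file.create(file.path(old_pkg, "R", "monty-hall-pkg.R"))

# List what is INSIDE the old folder, not the folder itself.
# all.files = TRUE keeps dot-files; no.. = TRUE drops "." and ".."
contents <- list.files(old_pkg, full.names = TRUE,
                       all.files = TRUE, no.. = TRUE)

# Copy each item (folders recursively) into the cloned repo
file.copy(contents, repo, recursive = TRUE)

list.files(repo, recursive = TRUE)
```

The result has R and DESCRIPTION directly under the repo folder, with no second montyhall folder in between.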

The last step sends all of the files to GitHub. They should appear on your repo page shortly.

Now others should be able to install your package by typing:

devtools::install_github( "yourGitHubName/montyhall" )

1.4.2 Test from GitHub

If you want to test whether installation works from GitHub, first delete the version you installed earlier from your local machine. This prevents conflicts when installing an identical version of the package from a different location.

remove.packages( "montyhall" ) # deletes from your computer 
devtools::install_github( "yourGitHubName/montyhall" )
montyhall::create_game()

See notes above on removing packages for more details.

1.5 Grading

I will grade assignments by running the script below, which generates a report with printouts of all of the help file text that you added, and also executes all of the test code included in the examples sections. I will look to ensure you have added documentation for all functions and that the examples are able to execute without error.

You can check your own work ahead of time by generating the report using the following script after adding your GitHub username.

First, delete your current package installation. Close down all instances of R or R Studio, open a new R environment and try:

# library() attaches the package to the current environment
# detach is the opposite of library() - closes the package 
detach( "package:montyhall" )   # closes the package so not locked
remove.packages( "montyhall" )  # deletes from your computer

Then you can run the testing script:

# IF NOT ALREADY INSTALLED:
pkgs <- c("devtools","purrr","rmarkdown")
install.packages(pkgs)

wd <- getwd()
dir.create("MONTY")

################

  git.hub.name  <-  "yourGitHubName"   # REPLACE WITH YOUR GITHUB USERNAME 

###############
  
## GENERATE PACKAGE REPORT 

filepath <- paste0( wd, "/MONTY/montyhall-test-", toupper(git.hub.name), ".HTML" )

download.file( 
  url="https://raw.githubusercontent.com/Watts-College/paf-514-template/main/labs/create-r-package-test.rmd",
  destfile="./MONTY/create-r-package-test.rmd" )

rmarkdown::render( 
  input = "./MONTY/create-r-package-test.rmd", 
  output_file = filepath,
  params = list( name=git.hub.name ) )

# file location
filepath

# preview file 
shell( filepath )   # Windows only; on macOS or Linux try browseURL( filepath )

This script will install your package from GitHub, parse the documentation files you have created (RD files in the “man” folder), and attempt to execute all of the examples you created in the Examples section.

To receive full credit for the assignment:

Your package must install without errors
The documentation for all functions must be complete
The code snippets you provided in “Examples” must execute without error

What you should see:

If your package is running properly you will see a documentation section that shows you the help files the user will see if they type help( "function_name" ):

#####   
#####     CREATE GAME 
#####   


create_game             package:montyhall              R Documentation

Create a new Monty Hall Problem game.

Description:

     'create_game()' generates a new game that consists of two doors
     with goats behind them, and one with a car.

Usage:

     create_game()
     
Arguments:

     ...: no arguments are used by the function.

Details:

     The game setup replicates the game on the TV show "Let's Make a
     Deal" where there are three doors for a contestant to choose from,
     one of which has a car behind it and two have goats. The
     contestant selects a door, then the host opens a door to reveal a
     goat, and then the contestant is given an opportunity to stay with
     their original selection or switch to the other unopened door.
     There was a famous debate about whether it was optimal to stay or
     switch when given the option to switch, so this simulation was
     created to test both strategies.

Value:

     The function returns a length 3 character vector indicating the
     positions of goats and the car.

Examples:

       create_game()

You will also see if the code snippets you provide as examples for each function are running properly:

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

#####  
#####   play_game
#####  

>   play_game()
  strategy outcome
1     stay     WIN
2   switch    LOSE

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

#####  
#####   determine_winner
#####  

>   determine_winner()

>>>
>>>  Error in determine_winner() : argument "game" is missing, with no default
>>>

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
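
You can catch errors like the determine_winner() one before submitting by running each example yourself and recording whether it fails. The sketch below is not the grading script; runs_cleanly() is a hypothetical helper and the determine_winner() defined here is only a stand-in with a required argument, used to reproduce the missing-argument error.

```r
# Stand-in for a package function that needs an argument
# (hypothetical body; only the missing-argument behaviour matters here)
determine_winner <- function(game) {
  stopifnot(is.character(game))
  "placeholder result"
}

# Hypothetical helper: TRUE if the example code runs without an error
runs_cleanly <- function(code) {
  tryCatch({
    eval(parse(text = code))
    TRUE
  }, error = function(e) {
    message("Error: ", conditionMessage(e))
    FALSE
  })
}

runs_cleanly("determine_winner()")                    # FALSE: `game` is missing
runs_cleanly('determine_winner(c("goat", "car"))')    # TRUE
```

The fix for a failing example is usually to supply the arguments inside the roxygen examples field, for instance by creating a game first and passing it in.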

2 Additional Resources

These are a few good resources for reference:

A nice tutorial by Fong Chun Chan

The official R Packages book by Hadley Wickham and Jenny Bryan

Roxygen documentation on CRAN

Data Science for
Public Service

Labs designed by Jesse D. Lecy
Source code is available on GitHub

Creative Commons License:
(CC BY-NC-SA 4.0)

CPP 527 Foundations of Data Science II
Part of the MS in Evaluation and Analytics
@ Arizona State University