CPP 526: Foundations of Data Science I


A dashboard template has been provided:



Data

You will again use the traffic accidents dataset from the Tempe Open Data Portal. The details of the dataset are reported at the end of these instructions.



Instructions

Create a dashboard with at least six tabs, four of which have already been provided for you.

The template currently provides:

Using these input widgets, the user should be able to define a query that selects a specific time-period (weekends, afternoon rush-hour, etc.) and the map should display the correct data.

The second tab allows the user to explore the prevalence and severity of different types of accidents by driver characteristics.

The template includes four value boxes at the top of the page to report statistics for each user query:

You can find new icons for the value boxes in the Font Awesome Gallery.
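As a rough sketch of how the numbers behind a value box might be computed, the block below summarizes a small made-up data frame standing in for the crashes selected by a user query. The three statistics are assumptions chosen for illustration and are not necessarily the ones the template reports; the column names Totalinjuries and Totalfatalities come from the data dictionary at the end of these instructions.

```r
# Toy stand-in for the crashes selected by a user query (values are made up)
d2 <- data.frame(
  Totalinjuries   = c( 0, 2, 1, 0 ),
  Totalfatalities = c( 0, 0, 1, 0 )
)

# Hypothetical statistics to report in value boxes
n.crashes    <- nrow( d2 )
n.injuries   <- sum( d2$Totalinjuries )
n.fatalities <- sum( d2$Totalfatalities )

# Inside the dashboard each number would typically be wrapped in
# renderValueBox({ valueBox( ... ) }) so that it updates with the widgets.
# Here a single box is built once, and only if flexdashboard is installed.
if( requireNamespace( "flexdashboard", quietly=TRUE ) )
{
  box <- flexdashboard::valueBox( n.crashes, caption="Crashes", icon="fa-car" )
}
```

The icon argument takes a Font Awesome name with the fa- prefix, so swapping icons is a matter of changing that string.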

YAML Header

The options in the template header embed the source code in your dashboard for easy access. Make sure to include the runtime: shiny argument when you include dynamic widgets.

---
title: "Crash Data: City of Tempe"
output: 
  flexdashboard::flex_dashboard:
    theme: spacelab
    source: embed
    smart: false
runtime: shiny
---

Access to Raw Data

The template includes a data table with the raw data that allows the users to filter and download the data they are interested in.

We use the DT (DataTables) package to embed data tables in a dashboard.

library( DT )      # datatable() and formatStyle()
library( dplyr )   # provides the %>% pipe

these.buttons <- c( 'copy', 'csv', 'excel', 'pdf', 'print' )

datatable( dat,
           filter='bottom', rownames=FALSE, 
           #options=list( pageLength=5, autoWidth=TRUE ),
           fillContainer=TRUE, 
           style="bootstrap",
           class='table-condensed table-striped',
           extensions = 'Buttons', 
           options=list( dom='Bfrtip', 
                         buttons=these.buttons  )) %>%
  
  formatStyle( "name", "white-space"="nowrap" )



Your Task

Add at least two additional tabs that allow for meaningful analysis of the data. Include widgets for user input, and link the widgets to a leaflet map (like tabs 1 and 2) or any other tables or graphs that you find useful for analyzing traffic patterns. You might consider:

Accident characteristics:

Conditions at the time of accident:

Impairment?

Note that some categories, like weather conditions, are mutually exclusive and thus should be radio buttons. Others, such as drug and alcohol use for driver 1 and driver 2, can overlap and could be implemented with check-boxes.
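The difference between the two widget types might look like the sketch below. The input IDs, the weather labels, and the impairment codes are all placeholders invented for illustration; check the actual levels in the data before building your own widgets.

```r
library( shiny )

# Mutually exclusive categories: the user picks exactly one value.
# These weather labels are placeholders, not the dataset's actual levels.
weather.widget <-
  radioButtons( "weather", label = h3("Weather"),
                choices  = c( "Clear", "Cloudy", "Rain" ),
                selected = "Clear" )

# Inclusive categories: any combination can be checked.
# The codes on the right-hand side are hypothetical.
impaired.widget <-
  checkboxGroupInput( "impaired", label = h3("Impairment"),
                      choices = list( "Driver 1 alcohol" = "alc1",
                                      "Driver 1 drugs"   = "drug1",
                                      "Driver 2 alcohol" = "alc2",
                                      "Driver 2 drugs"   = "drug2" ),
                      selected = NULL )

# Static stand-ins for what each widget would return:
weather.choice  <- "Rain"                # radioButtons: one character value
impaired.choice <- c( "alc1", "drug1" )  # checkboxGroupInput: character vector
```

A single value can be used with == in a filter, while a vector of checked values is usually handled with %in% or by testing each checked code separately.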

Instead of using 24-hour periods, you might consider defining your own periods (rush hour, work day, school pick-up, etc.).
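One way to define custom periods is to bin the hour of day with cut(). This is only a sketch: the cut points and labels below are hypothetical, and the hour values are made up.

```r
# Hour of day for a few made-up crashes (0-23)
hour <- c( 0, 7, 8, 12, 15, 17, 22 )

# Hypothetical cut points; adjust them to whatever periods make sense.
# Intervals are closed on the right: (-1,5], (5,9], (9,14], (14,18], (18,23]
period <- cut( hour,
               breaks = c( -1, 5, 9, 14, 18, 23 ),
               labels = c( "Overnight", "Morning rush", "Midday",
                           "Afternoon rush", "Evening" ) )

# Number of crashes in each period
table( period )
```

The resulting period variable could then drive a set of radio buttons or check-boxes in place of an hour slider.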



The About Tab

Include a description of the project including references for the data source, your contact info, and the intended use of the dashboard.

The data dictionary is included in this tab. The data dictionary was created as follows:

# pseudo-code only
data.dictionary <- read.csv()
dd.simple <-
  data.dictionary %>%
  select( variableNames, variableDescriptions )
pander( dd.simple ) # html table, from pander package



BONUS: Principles of Dashboard Design

You will note that on the second tab the input widgets are a little awkward for creating comparisons. The tab could potentially be improved by considering some of the following:

  • Since we have two drivers, perhaps splitting the map into two views is more meaningful for comparisons?
  • Could age be treated as categories to make it easier to select a meaningful group?
  • The use of mutually exclusive categories forces us to use gender in any comparison we make, which makes the choice set less flexible (perhaps we want to only consider age and not gender).
  • Is the map of points the best way to convey the comparisons? What information do you take from the side-by-side maps? Would a heatmap summary of points, a table, or a chart be better?
  • Driver 1 is typically the individual that caused the accident in the study. Is a driver 1 to driver 2 comparison the most meaningful way to slice the data? Perhaps it would be better as driver 1 to driver 1, with different characteristics?
  • Can we organize the widgets into groups to convey that they belong to different maps?

In other words, as the programmer you have a lot of control over how you shape the user’s consumption of the information! There are lots of assumptions about reference points and meaningful comparisons built into the dashboard.
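To illustrate the suggestion above about treating age as categories, here is a minimal sketch using cut(). The age bands are hypothetical and the ages are made up; Age_Drv1 is the column name from the data dictionary. Check the real data for missing or placeholder ages before binning.

```r
# Made-up driver ages
d <- data.frame( Age_Drv1 = c( 16, 19, 24, 37, 52, 68, 81 ) )

# Hypothetical age bands. right=FALSE makes intervals closed on the left:
# [0,18), [18,25), [25,45), [45,65), [65,Inf)
d$age.group <- cut( d$Age_Drv1,
                    breaks = c( 0, 18, 25, 45, 65, Inf ),
                    labels = c( "Under 18", "18-24", "25-44", "45-64", "65+" ),
                    right  = FALSE )

# Count crashes per age group
table( d$age.group )
```

A categorical age variable like this can be offered as its own widget, separate from gender, so the user is not forced to combine the two.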



Shiny Advice

When developing your analysis using shiny widgets, each will entail three steps:

I suggest that you develop each analytical step outside of the dashboard, then add the code in the appropriate dashboard buckets. Start with static versions of your parameters.

# parameters
days.of.week <- c("Sat","Sun")
start.time <- 3
end.time <- 6

# filter data
d2 <-
  dat %>%
  filter( day %in% days.of.week, hour >= start.time & hour <= end.time )

# analyze data
d2 %>%
  count( Collisionmanner ) %>%
  arrange( -n ) %>%
  pander()
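The filter above relies on day and hour variables, which are not listed in the data dictionary; presumably they are derived from DateTime. A minimal sketch of one way to build them follows. The timestamp format string is an assumption, and the three timestamps are made up, so check both against the actual DateTime values.

```r
# Made-up timestamps; the "%m/%d/%y %H:%M" format is an assumption
DateTime <- c( "1/10/15 16:30", "1/12/15 8:05", "1/11/15 23:59" )

date.vec <- as.POSIXlt( DateTime, format = "%m/%d/%y %H:%M", tz = "UTC" )

# Hour of day as a number (0-23)
hour <- date.vec$hour

# Day of week as a three-letter label; wday runs 0 (Sunday) to 6 (Saturday)
day.labels <- c( "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" )
day <- day.labels[ date.vec$wday + 1 ]
```

Indexing a fixed vector of labels by wday avoids depending on the locale settings of the machine that runs the dashboard.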

Now to link to your shiny widgets, you just need to grab the user inputs:

# shiny widgets

checkboxGroupInput("days", label = h3("Day of Week"), 
    choices = list("Monday"    = "Mon", 
                   "Tuesday"   = "Tue", 
                   "Wednesday" = "Wed", 
                   "Thursday"  = "Thu",
                   "Friday"    = "Fri",
                   "Saturday"  = "Sat",
                   "Sunday"    = "Sun" ),
    selected = c("Mon","Tue","Wed","Thu","Fri","Sat","Sun") )

sliderInput("hour", label = h3("Time of Day"), 
            min = 0, max = 23, value = c(6, 12))

# parameters
days.of.week <- input$days    # vector with all checked values
start.time <- input$hour[1]   # sliderInput lower value
end.time  <-  input$hour[2]   # sliderInput upper value
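Keep in mind that input values can only be read inside a reactive context, such as reactive() or one of the render functions. The sketch below shows the idea with a stand-alone server function, toy data, and base R subsetting; the names server, filtered, and collisions are hypothetical, and the collision labels are made up. In a dashboard that uses runtime: shiny there is no explicit server function, so the reactive and render calls would sit directly in a code chunk.

```r
library( shiny )

# Toy data with the day and hour variables used in the filter (values made up)
dat <- data.frame( day  = c( "Sat", "Sun", "Mon", "Sat" ),
                   hour = c(  4,     10,    5,     6    ),
                   Collisionmanner = c( "Rear End", "Angle",
                                        "Rear End", "Rear End" ) )

server <- function( input, output, session )
{
  # Re-runs whenever input$days or input$hour changes
  filtered <- reactive({
    keep <- dat$day %in% input$days &
            dat$hour >= input$hour[1] & dat$hour <= input$hour[2]
    dat[ keep, ]
  })

  # Table of collision types for the current query
  output$collisions <- renderTable({
    as.data.frame( table( Collision = filtered()$Collisionmanner ) )
  })
}
```

Putting the filter in a single reactive means every table, chart, or map on the tab can call filtered() and stay in sync with the widgets.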

Pay attention to the return values of each widget:

Shiny Widget Gallery

They may be a single value or a vector, and they are often character rather than numeric values, so you may need to recast them to the correct type before using them in filters. For example, if you use check boxes for hours of the day, the return value will be a character vector, and the greater-than or less-than operators would compare the values as text rather than as numbers.

filter( day %in% days.of.week, hour >= start.time & hour <= end.time )
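A small illustration of the type problem, using a made-up vector in place of what a check-box group for hours might return:

```r
# What a check-box group for hours might return: character values
hours.checked <- c( "9", "10", "17" )

# Compared as text, "10" sorts before "9", so this gives the wrong answer
"10" > "9"                     # FALSE

# Recast before using comparison operators
hours.num <- as.numeric( hours.checked )
hours.num[2] > hours.num[1]    # TRUE

# For check-box input, set membership is often the simplest filter
hour <- c( 8, 9, 10, 11, 17 )
hour[ hour %in% hours.num ]
```

R does not raise an error when text and numbers are compared, so a filter with the wrong type can silently return the wrong rows.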



Practice with dplyr

If you want to practice some more with data wrangling steps, you might find this tutorial helpful:

How to Identify Hipster Names



Submission Instructions

There are two steps to submitting your final project:

1) Upload RMD to Canvas

Log in to Canvas at http://canvas.asu.edu and navigate to the assignments tab in the course repository. Upload your RMD file to the appropriate lab submission link.

2) Deploy your dashboard

Visit the Shiny Apps platform hosted by RStudio and create your own free account. Upload your dashboard to your account (after you run your file you should see the “Publish” option at the top right; follow the prompts).

Include the link to your dashboard in your submission comments on Canvas.

Remember to:



Data

You will use the traffic accidents dataset (from the Tempe Open Data Portal) that was used in Lab-05:

URL <- "https://github.com/DS4PS/Data-Science-Class/blob/master/DATA/TempeTrafficAccidents.rds?raw=true"
dat <- readRDS( gzcon( url( URL ) ) )

Recall that the dataset contains the following variables:

column | type | label | description
Incidentid | numeric | Incident ID | Unique incident ID number assigned by Arizona Department of Transportation (ADOT).
DateTime | timestamp | Date Time | Date and time that the crash occurred.
Year | numeric | Year | Year that the crash occurred.
StreetName | text | Street Name | The street that the crash occurred on.
CrossStreet | text | Cross-street | The nearest intersecting street or road.
Distance | numeric | Distance from Intersection | The distance, in feet, that the crash occurred from the cross-street.
JunctionRelation | text | Junction Relation | The location of the crash in relation to a junction, either an intersection or connection between a driveway and a roadway.
Totalinjuries | numeric | Total Injuries | Total number of persons with non-fatal injuries involved in the crash.
Totalfatalities | numeric | Total Fatalities | Total number of persons with fatal injuries involved in the crash.
Injuryseverity | text | Injury Severity | The highest severity of injury of all persons involved in the crash.
Collisionmanner | text | Collision Manner | Identifies the manner in which two vehicles initially came into contact.
Lightcondition | text | Lighting Conditions | The type/level of light that existed at the time of the crash.
Weather | text | Weather | The prevailing (most significant) atmospheric conditions that existed at the time of the crash.
SurfaceCondition | text | Surface Condition | The roadway surface condition at the time and place of a crash.
Unittype_One | text | Unit Type One | Driver, Passenger, Pedestrian, Pedalcyclist or Driverless.
Age_Drv1 | numeric | |
Gender_Drv1 | text | |
Traveldirection_One | text | Travel Direction | The direction the unit was traveling before the incident occurred.
Unitaction_One | text | Unit Action One | The maneuver, or last action, of the unit before the crash.
Violation1_Drv1 | text | Violation One | The main violation/behavior of the unit that contributed to the crash.
AlcoholUse_Drv1 | text | Alcohol Use 1 | Indicates whether alcohol was a contributing factor in the crash or not.
DrugUse_Drv1 | text | Drug Use 1 | Indicates whether drug use was a contributing factor in the crash or not.
Unittype_Two | text | Unit Type Two | Driver, Passenger, Pedestrian, Pedalcyclist or Driverless.
Age_Drv2 | numeric | |
Gender_Drv2 | text | |
Traveldirection_Two | text | Travel Direction Two | The direction the unit was traveling before the incident occurred.
Unitaction_Two | text | Unit Action Two | The maneuver, or last action, of the unit before the crash.
Violation1_Drv2 | text | Violation Two | The main violation/behavior of the unit that contributed to the crash.
AlcoholUse_Drv2 | text | Alcohol Use 2 | Indicates whether alcohol was a contributing factor in the crash or not.
DrugUse_Drv2 | text | Drug Use 2 | Indicates whether drug use was a contributing factor in the crash or not.
Latitude | numeric | Latitude | Used to specify the precise location of the crash.
Longitude | numeric | Longitude | Used to specify the precise location of the crash.