Chapter 7 Histograms | Data Visualization with R (2024)

7.1 Introduction

In this chapter, we will learn to:

  • create a bare bones histogram
  • specify the number of bins/intervals
  • represent frequency density on the Y axis
  • add colors to the bars and the border
  • add labels to the bars

A histogram is a plot that can be used to examine the shape and spread of continuous data. It looks very similar to a bar graph and can be used to detect outliers and skewness in data. The histogram graphically shows the following:

  • center (location) of the data
  • spread (dispersion) of the data
  • skewness
  • outliers
  • presence of multiple modes

To construct a histogram, the data is split into intervals called bins. The intervals may or may not be equal-sized. For each bin, the number of data points that fall into it is counted (the frequency). The Y axis of the histogram represents the frequency and the X axis represents the variable.
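
The binning and counting steps described above can be sketched by hand. The six values and the three break points below are made up purely for illustration; cut() assigns each value to a bin and table() counts the values per bin.

```r
# Six made-up observations, used only for illustration
x <- c(2, 3, 5, 7, 8, 12)

# Split the range into three equal-sized bins: [0, 5], (5, 10], (10, 15]
bins <- cut(x, breaks = c(0, 5, 10, 15), include.lowest = TRUE)

# Count how many observations fall into each bin: 3, 2 and 1
freq <- table(bins)
freq

# hist() performs the same counting internally
hist_counts <- hist(x, breaks = c(0, 5, 10, 15), plot = FALSE)$counts
hist_counts
```

Setting plot = FALSE makes hist() return the counts without drawing anything, which is handy for checking a binning scheme.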

7.2 Distributions

Before we learn how to create histograms, let us see how normal and skewed distributions look when represented by a histogram.

7.2.1 Normal Distribution
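
A histogram of this kind can be sketched with simulated data. The sample size, mean, standard deviation and seed below are arbitrary choices for illustration and do not come from the chapter.

```r
set.seed(123)  # arbitrary seed, only so that the sketch is reproducible

# Simulated sample from a normal distribution (illustrative parameters)
normal_sample <- rnorm(1000, mean = 50, sd = 10)

# The bars should form a roughly symmetric, bell-shaped pattern
hist(normal_sample, main = "Normal distribution", xlab = "Value")
```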

[Figure: histogram of a normally distributed variable]

7.2.2 Skewed Distributions
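
Skewed shapes can be sketched with simulated data as well. The exponential distribution is used here only as a convenient assumption to produce a long right tail; negating the values mirrors the tail to the left.

```r
set.seed(123)  # arbitrary seed, only so that the sketch is reproducible

right_skewed <- rexp(1000, rate = 1)    # long tail to the right
left_skewed  <- -rexp(1000, rate = 1)   # mirrored: long tail to the left

# Draw both histograms side by side, then restore the plot layout
old_par <- par(mfrow = c(1, 2))
hist(right_skewed, main = "Right (positive) skew", xlab = "Value")
hist(left_skewed, main = "Left (negative) skew", xlab = "Value")
par(old_par)
```

In a right-skewed sample the mean is pulled above the median by the long tail; in a left-skewed sample it is pulled below.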

[Figure: histograms of skewed distributions]

7.3 Basics

Histograms are created using the hist() function in R. The minimum input required to create a bare bones histogram is a continuous variable. Below is an example:
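
A minimal sketch of such a call, assuming the mpg column of the built-in mtcars data set (the variable that the output later in this section refers to):

```r
# mtcars ships with R, so no import is needed
hist(mtcars$mpg)
```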

[Figure: bare-bones histogram of mtcars$mpg]

The hist() function returns details of the histogram, which can be accessed by assigning the histogram to a variable. Let us assign the above histogram to a variable h and use the $ operator to access the details stored in it.

h <- hist(mtcars$mpg)

# break points of the intervals
h$breaks
## [1] 10 15 20 25 30 35

# frequency of the intervals
h$counts
## [1]  6 12  8  2  4

# frequency density
h$density
## [1] 0.0375 0.0750 0.0500 0.0125 0.0250

# mid points of the intervals
h$mids
## [1] 12.5 17.5 22.5 27.5 32.5

# variable name
h$xname
## [1] "mtcars$mpg"

# whether intervals are of equal size
h$equidist
## [1] TRUE

7.4 Bins

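The hist() function creates equidistant intervals by default. We can specify the number of bins using the breaks argument. R treats this number as a suggestion and may adjust it so that the break points fall on round values.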
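
A minimal sketch, assuming mtcars$mpg as the variable and 10 as an arbitrary number of bins. The number of bins actually drawn can be read from the returned object.

```r
# Request about 10 bins
h10 <- hist(mtcars$mpg, breaks = 10)

# Number of bins that were actually drawn
length(h10$counts)

# The bins are still of equal width
h10$equidist
```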

[Figure: histogram with a specified number of bins]

The plot below displays histograms with different numbers of bins:
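
One way to draw such a comparison is a grid of panels. The bin numbers 4, 6, 8 and 10 and the use of mtcars$mpg are assumptions made for this sketch.

```r
bin_choices <- c(4, 6, 8, 10)  # illustrative values

# Arrange the four histograms in a 2 x 2 grid
old_par <- par(mfrow = c(2, 2))
for (n_bins in bin_choices) {
  hist(mtcars$mpg, breaks = n_bins,
       main = paste("breaks =", n_bins), xlab = "mpg")
}
par(old_par)  # restore the previous plot layout
```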

[Figure: histograms with different numbers of bins]

7.5 Intervals

To create a histogram with specific intervals, supply a vector of break points to the breaks argument.
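
A minimal sketch, using the same break points that appear later in this section. The name h_custom is introduced here only for illustration.

```r
# Break points must cover the full range of the data (10.4 to 33.9)
h_custom <- hist(mtcars$mpg, breaks = c(10, 18, 24, 30, 35))

diff(h_custom$breaks)  # widths of the four intervals: 8 6 6 5
h_custom$equidist      # FALSE, because the intervals differ in width
```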

[Figure: histogram with user-specified intervals]

Observe that the Y axis no longer represents frequency. Because the intervals are of unequal width, hist() plots the frequency density instead. What is frequency density?

7.5.1 Frequency Density

Frequency Density = Relative Frequency / Class Width

Relative Frequency = Frequency / Total Observations

h <- hist(mtcars$mpg, breaks = c(10, 18, 24, 30, 35))

[Figure: histogram of mtcars$mpg with breaks at 10, 18, 24, 30 and 35]

frequency <- h$counts
class_width <- c(8, 6, 6, 5)
rel_freq <- frequency / length(mtcars$mpg)
freq_density <- rel_freq / class_width
d <- data.frame(frequency = frequency,
                class_width = class_width,
                relative_frequency = rel_freq,
                frequency_density = freq_density)
d
##   frequency class_width relative_frequency frequency_density
## 1        13           8            0.40625        0.05078125
## 2        12           6            0.37500        0.06250000
## 3         3           6            0.09375        0.01562500
## 4         4           5            0.12500        0.02500000

Multiplying each frequency density by its class width gives back the relative frequency, so these products always sum to 1.

sum(d$frequency_density * d$class_width)
## [1] 1

We will learn more about frequency density in a bit. Before we end this section, we need to learn about one more way to specify the intervals of the histogram: algorithms. The breaks argument of hist() also accepts the name of one of the following algorithms for computing the number of bins:

  • Sturges (default)
  • Scott
  • Freedman-Diaconis (FD)

In the plot below, we examine how the algorithms work:
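
A sketch of such a comparison, assuming mtcars$mpg as the variable. The algorithm is selected by passing its name as a string to breaks; the nclass.Sturges(), nclass.scott() and nclass.FD() helpers from grDevices report how many bins each rule suggests.

```r
x <- mtcars$mpg

# Number of bins suggested by each rule
c(Sturges = nclass.Sturges(x), Scott = nclass.scott(x), FD = nclass.FD(x))

# One panel per algorithm
old_par <- par(mfrow = c(1, 3))
hist(x, breaks = "Sturges", main = "Sturges", xlab = "mpg")
hist(x, breaks = "Scott", main = "Scott", xlab = "mpg")
hist(x, breaks = "FD", main = "Freedman-Diaconis", xlab = "mpg")
par(old_par)  # restore the previous plot layout
```

As with a numeric value, hist() treats the suggested number of bins as a starting point and rounds the break points to convenient values.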

[Figure: histograms produced by the Sturges, Scott and Freedman-Diaconis algorithms]

7.6 Frequency Density II

Let us come back to frequency density. If you want the Y axis of the histogram to represent frequency density instead of counts, set the freq argument to FALSE.
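
A minimal sketch, assuming mtcars$mpg as the variable. The name h_density is introduced only for illustration; the final line checks that the bar areas add up to 1.

```r
# Plot frequency density instead of counts
h_density <- hist(mtcars$mpg, freq = FALSE)

# Area of each bar = density * bin width; the areas sum to 1
total_area <- sum(h_density$density * diff(h_density$breaks))
total_area
```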

[Figure: histogram with frequency density on the Y axis (freq = FALSE)]

The same result can be achieved with the probability argument, which is the logical opposite of freq. It takes only logical values, and for equal-sized bins it is FALSE by default. If set to TRUE, the Y axis will represent the frequency density instead of counts.

hist(mtcars$mpg, probability = TRUE)

[Figure: histogram with frequency density on the Y axis (probability = TRUE)]

7.7 Color

To add colors to the bars of the histogram, use the col argument. If fewer colors are specified than there are bars, the colors are recycled. Below are a few examples:

7.7.1 Single Color
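
A minimal sketch; the color "blue" and the variable mtcars$mpg are arbitrary choices for illustration.

```r
# Every bar is filled with the same color
hist(mtcars$mpg, col = "blue")
```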

[Figure: histogram with bars in a single color]

7.7.2 Different Colors
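
A minimal sketch that gives each of the five default bars of mtcars$mpg its own color. The color names are arbitrary choices for illustration.

```r
# One color per bar, in left-to-right order
bar_colors <- c("red", "blue", "green", "yellow", "brown")
hist(mtcars$mpg, col = bar_colors)
```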

[Figure: histogram with a different color for each bar]

7.7.3 Recycled Colors
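
A minimal sketch with two arbitrary colors for the five default bars of mtcars$mpg. The call to rep_len() is only there to spell out the order in which the colors are reused.

```r
two_colors <- c("red", "blue")

# Two colors for five bars: the colors repeat from the start
hist(mtcars$mpg, col = two_colors)

# The resulting bar colors, written out explicitly
recycled <- rep_len(two_colors, 5)
recycled  # "red" "blue" "red" "blue" "red"
```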

[Figure: histogram with recycled bar colors]

7.8 Border Color

Colors can be specified for the borders of the histogram bars using the border argument.
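
A minimal sketch; the fill and border colors are arbitrary choices, and the light fill is used only to make the border easier to see.

```r
# Light gray bars outlined in dark blue
hist(mtcars$mpg, col = "lightgray", border = "darkblue")
```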

[Figure: histogram with colored bar borders]

7.8.1 Different Colors
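
A minimal sketch that passes a vector of colors to border, one per bar. The color names are arbitrary choices for illustration.

```r
# One border color per bar, in left-to-right order
border_colors <- c("red", "blue", "green", "yellow", "brown")
hist(mtcars$mpg, border = border_colors)
```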

[Figure: histogram with a different border color for each bar]

7.9 Labels

In certain cases, we might want to show the frequency counts on the histogram bars, since the frequency of each bin is easier to read when it is printed on top of the bar. This is done with the labels argument, which can be set either to TRUE or to a character vector containing the label values. Let us look at both methods.

7.9.1 Method 1

Set labels to TRUE.
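
A minimal sketch, assuming mtcars$mpg as the variable. The ylim value is an added assumption that leaves room above the tallest bar (12 observations) so that its label is not clipped.

```r
# Print the count of each bin on top of its bar
hist(mtcars$mpg, labels = TRUE, ylim = c(0, 15))
```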

[Figure: histogram with frequency counts on top of the bars (labels = TRUE)]

7.9.2 Method 2

Specify the label values in a character vector.
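
A minimal sketch in which the labels are built from the counts returned by hist(). The names h_counts and bar_labels are introduced only for illustration, and the ylim value is an assumption that keeps the top label visible.

```r
# Compute the counts without drawing anything
h_counts <- hist(mtcars$mpg, plot = FALSE)

# Convert the counts to a character vector of labels
bar_labels <- as.character(h_counts$counts)  # "6" "12" "8" "2" "4"

# Draw the histogram with the custom labels
hist(mtcars$mpg, labels = bar_labels, ylim = c(0, 15))
```

Any character vector with one entry per bar can be used in the same way, for example labels that include a unit or a percentage sign.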

[Figure: histogram with labels supplied as a character vector]

7.10 Putting it all together

Let us combine the options covered so far and add a title and an X axis label to the histogram.

hist(mtcars$mpg, labels = TRUE, prob = TRUE,
     ylim = c(0, 0.1), xlab = 'Miles Per Gallon',
     main = 'Distribution of Miles Per Gallon',
     col = rainbow(5))

[Figure: histogram titled 'Distribution of Miles Per Gallon' with labeled, rainbow-colored bars]
