How to Make a Histogram with Basic R (2024)

[This article was first published on The DataCamp Blog » R, and kindly contributed to R-bloggers.]

Over the next week we will cover the basics of how to create your own histograms in R. Three options will be explored: basic R commands, ggplot2 and ggvis. These posts are aimed at beginning and intermediate R users who need an accessible and easy-to-understand resource. Want to learn more? Discover the R tutorials at DataCamp.

What Is A Histogram?

A histogram is a visual representation of the distribution of a dataset. As such, the shape of a histogram is its most obvious and informative characteristic: it allows you to easily see where a relatively large amount of the data is situated and where there is very little data to be found (Verzani 2004). In other words, you can see where the middle of your data distribution is, how closely the data lie around this middle and where possible outliers are to be found. Exactly because of all this, histograms are a great way to get to know your data!

But what does that specific shape of a histogram look like exactly? In short, the histogram consists of an x-axis, a y-axis and various bars of different heights. The y-axis shows how frequently the values on the x-axis occur in the data, while the bars group ranges of values or continuous categories on the x-axis. The latter explains why histograms don’t have gaps between the bars.

Note that the bars of histograms are often called “bins”; this tutorial will also use that name.

How to Make a Histogram with Basic R

Step One – Show Me The Data

Since histograms require some data to be plotted in the first place, you do well importing a dataset or using one that is built into R.

This tutorial makes use of two datasets: the built-in R dataset AirPassengers and a dataset named chol, stored in a .txt file and available for download.

chol <- read.csv("https://s3.amazonaws.com/assets.datacamp.com/blog_assets/chol.txt", sep = " ") #reads the space-separated file into a data frame named “chol”

Step Two – Familiarize Yourself With The hist() Function

You can simply make a histogram by using the hist() function, which computes a histogram of the given data values. You put the name of your dataset in between the parentheses of this function, like this:

hist(AirPassengers)

This results in a histogram of the AirPassengers values.

However, if you want to select only a certain column of a data frame, chol for example, to make a histogram, you will have to use the hist() function with the dataset name in combination with the $ sign, followed by the column name:

hist(chol$AGE) #computes a histogram of the data values in the column AGE of the dataframe named “chol”
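Because hist() computes the histogram before drawing it, you can also look at the computed values. The following is a minimal sketch using the built-in AirPassengers data; plot = FALSE skips the drawing, and the variable name h is only illustrative.

```r
# Compute the histogram without drawing it
h <- hist(AirPassengers, plot = FALSE)

h$breaks  # the breakpoints between the bins
h$counts  # how many observations fall into each bin
h$mids    # the midpoint of each bin

# Every observation ends up in exactly one bin
sum(h$counts) == length(AirPassengers)
```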

Step Three – Take The hist() Function Up A Notch

The histograms of the previous section look a bit dull, don’t they? The default visualizations usually do not contribute much to the understanding of your histograms. You therefore need to take one more step to reach a better and easier understanding of your histograms. Luckily, this is not too hard: R allows for several easy and fast ways to optimize the visualization of diagrams, while still using the hist() function.

In order to adapt your histogram, you simply need to add more arguments to the hist() function, just like this:

hist(AirPassengers, main="Histogram for Air Passengers", xlab="Passengers", border="blue", col="green", xlim=c(100,700), las=1, breaks=5)

This code computes a histogram of the data values from the dataset AirPassengers, gives it “Histogram for Air Passengers” as title, labels the x-axis as “Passengers”, gives a blue border and a green color to the bins, while limiting the x-axis from 100 to 700, printing the values on the y-axis horizontally (las=1) and asking for roughly 5 bins (breaks=5).

Overwhelmed by this large string of code? No worries! Let’s just break it down to smaller pieces to see what each argument does.

Names/colors

Change the title of the histogram by adding main as an argument to the hist() function:

hist(AirPassengers, main="Histogram for Air Passengers") #Histogram of the AirPassengers dataset with title “Histogram for Air Passengers”

To adjust the label of the x-axis, add xlab. Similarly, you can also use ylab to label the y-axis:

hist(AirPassengers, xlab="Passengers", ylab="Frequency of Passengers") #Histogram of the AirPassengers dataset with changed labels on the x-and y-axes

If you want to change the colors of the default histogram, you simply add the arguments border or col. As the names suggest, border sets the color of the bin outlines and col sets the fill color of the bins.

hist(AirPassengers, border="blue", col="green") #Histogram of the AirPassengers dataset with blue-border bins with green filling

Tip: do not forget to put the colors and names in between quotation marks ("").

X and Y Axes

Change the range of the x and y values on the axes by adding xlim and ylim as arguments to the hist() function:

hist(AirPassengers, xlim=c(100,700), ylim=c(0,30)) #Histogram of the AirPassengers dataset with the x-axis limited to values 100 to 700 and the y-axis limited to values 0 to 30

Note that the c() function is used to delimit the values on the axes when you are using xlim and ylim. It takes two values: the first one is the begin value, the second is the end value.

Rotate the labels on the y axis by adding “las = 1” as an argument. las can be 0, 1, 2 or 3.

hist(AirPassengers, las=1) #Histogram of the AirPassengers dataset with the y-values projected horizontally

Depending on the option you choose, the placement of the labels will differ: if you choose 0, the labels will always be parallel to the axis (which is the default); if you choose 1, the labels will be put horizontally. Pick 2 if you want them to be perpendicular to the axis and 3 if you want them to be placed vertically.

Bins

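You can change the number of bins by adding breaks as an argument, together with the number of bins that you would like to have. R treats this number as a suggestion only and rounds the breakpoints to convenient ("pretty") values:

hist(AirPassengers, breaks=5) #Histogram of the AirPassengers dataset with roughly 5 bins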
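To check how R interprets a single number passed to breaks, you can inspect the computed histogram. This is a small sketch; plot = FALSE suppresses the drawing and h5 is an illustrative name.

```r
h5 <- hist(AirPassengers, breaks = 5, plot = FALSE)

h5$breaks          # the breakpoints that R actually chose
length(h5$counts)  # the resulting number of bins, which may differ from 5
h5$equidist        # TRUE when all bins are equally wide
```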

If you want to have more control over the breakpoints between bins, you can enrich the breaks argument by giving it a vector of breakpoints. You can do this by using the c() function:

hist(AirPassengers, breaks=c(100, 300, 500, 700)) #Compute a histogram for the data values in AirPassengers, and set the bins such that they run from 100 to 300, 300 to 500 and 500 to 700.

However, the c() function can make your code very messy sometimes. That is why you can instead use breaks=seq(x, y, z). You choose the values of x, y and z yourself; they represent, in order of appearance, the first breakpoint on the x-axis, the last breakpoint on the x-axis and the interval between consecutive breakpoints.

Note that you can also combine the two functions:

hist(AirPassengers, breaks=c(100, seq(200,700, 150))) #Make a histogram for the AirPassengers dataset with breakpoints at 100, 200, 350, 500 and 650: the first bin is 100 wide, the following bins are 150 wide

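Tip: study the changes in the y-axis thoroughly when you experiment with the numbers used in the seq() function! When the bins are not all equally wide, hist() plots densities instead of frequencies by default.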

Note that bins of different widths might confuse people, and the most interesting parts of your data may end up less visible or even hidden when you apply this technique to your original histogram. So, just experiment with this and see what suits your purposes best!
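To see what happens on the y-axis with unequal bins, you can inspect the object that hist() returns. This is a minimal sketch with the same breakpoints as in the example above; uneven_breaks and h_uneven are illustrative names.

```r
# Breakpoints at 100, 200, 350, 500 and 650: unequal bin widths
uneven_breaks <- c(100, seq(200, 700, 150))
h_uneven <- hist(AirPassengers, breaks = uneven_breaks, plot = FALSE)

h_uneven$equidist  # FALSE: the bins are not equally wide
h_uneven$counts    # number of observations per bin
h_uneven$density   # the heights hist() draws by default for unequal bins

# With densities, the area of the bars (height times width) adds up to 1
sum(h_uneven$density * diff(h_uneven$breaks))
```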

Extra: Probability Density

The hist() function shows you by default the frequency of a certain bin on the y-axis. However, if you want to see how likely it is that an interval of values of the x-axis occurs, you will need a probability density rather than frequency. We thus want to ask for a histogram of proportions. You can change this by setting the freq argument to FALSE or the prob argument to TRUE:

hist(AirPassengers, main="Histogram for Air Passengers", xlab="Passengers", border="blue", col="green", xlim=c(100,700), las=1, breaks=5, prob = TRUE)#Histogram of the AirPassengers dataset with a probability density expressed through the y-axis instead of the regular frequency.

After you’ve called the hist() function to create the above probability density plot, you can subsequently add a density curve to your plot by using the lines() function:

lines(density(AirPassengers)) #Get a density curve to go along with your AirPassengers histogram

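Note that the density curve only lines up with the histogram if you set the prob argument of the histogram to TRUE first! Otherwise the y-axis shows counts and the curve is drawn on a different scale.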
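The density on the y-axis is each bin's count divided by the total number of observations and by the bin width. The sketch below checks that relation on the AirPassengers data and then draws the histogram with its density curve; h_prob, bin_width and by_hand are illustrative names.

```r
h_prob <- hist(AirPassengers, breaks = 5, plot = FALSE)

# Density computed by hand: count / (total count * bin width)
bin_width <- diff(h_prob$breaks)
by_hand <- h_prob$counts / (sum(h_prob$counts) * bin_width)

all.equal(by_hand, h_prob$density)  # compares the hand-computed and stored densities

# Draw the density histogram and overlay the kernel density estimate
hist(AirPassengers, breaks = 5, prob = TRUE)
lines(density(AirPassengers))
```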

Step Four – Want To Go Further?

For an exhaustive list of all the arguments that you can add to the hist() function, have a look at the RDocumentation article on the hist() function.

This is the first of 3 posts on creating histograms with R. The next post will cover the creation of histograms using ggplot2.

FAQs

How to Make a Histogram with Basic R?

You can create histograms with the function hist(x) where x is a numeric vector of values to be plotted. The option freq=FALSE plots probability densities instead of frequencies. The option breaks= controls the number of bins.

How to create a histogram of a data frame in R?

To create histograms of all columns in an R data frame, we can use the hist.data.frame function of the Hmisc package. For example, if we have a data frame df that contains five columns, then the histograms for all the columns can be created with a single line of code.
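If you prefer not to install an extra package, a similar result can be sketched in base R by looping over the numeric columns. This is an alternative to the Hmisc function mentioned above, not its implementation. The built-in iris data frame stands in for df here, and all variable names are illustrative.

```r
df <- iris  # stand-in data frame (assumption for this example)
numeric_cols <- names(df)[sapply(df, is.numeric)]

old_par <- par(mfrow = c(2, 2))  # 2x2 grid, enough for the four numeric columns
for (col_name in numeric_cols) {
  hist(df[[col_name]], main = col_name, xlab = col_name)
}
par(old_par)  # restore the previous layout
```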

What is the basic histogram plot?

A histogram can be defined as a set of rectangles whose bases lie along the intervals between class boundaries. Each rectangle represents the data in one class, and all the rectangles are adjacent. For classes of equal width, the heights of the rectangles are proportional to the corresponding frequencies; for classes of different widths, the heights are proportional to the corresponding frequency densities.

How to create a histogram in R ggplot?

For a histogram, you use the geom_histogram() function. You can then add on other customization layers like labs() for axis and graph titles, xlim() and ylim() to set the ranges of the axes, and theme() to move the legend and make other visual customizations to the graph. ggplot2 makes building visualization in R easy.

What is a histogram in R?

A histogram is a graphical representation commonly used to visualize the distribution of numerical data. It divides the values within a numerical variable into “bins”, and counts the number of observations that fall into each bin.

How to draw a line on a histogram in R?

Start by creating a vector of data and drawing a histogram to visualize its distribution. Then use the abline() function with the argument v = mean(data) to add a vertical line at the mean value of the data. You can also customize the line, for example with the color red, a line width of 2 and a dashed line type.
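The steps described here can be sketched as follows. The data are randomly generated for illustration only, and the title and axis label are made up.

```r
set.seed(1)                             # make the example reproducible
data <- rnorm(200, mean = 50, sd = 10)  # illustrative data

hist(data, main = "Histogram with mean line", xlab = "Value")

# Dashed red vertical line at the mean: col = color, lwd = width, lty = line type
abline(v = mean(data), col = "red", lwd = 2, lty = 2)
```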

What are the 5 steps to making a histogram?

How to make a histogram graph
  1. Put all your data in ascending order. The first step is to gather all your data and put it in ascending order. ...
  2. Put all your data into a chart. Now that you've put all your data in ascending order, you can place it in a chart. ...
  3. Draw a graph. ...
  4. Refer to your data table to draw the graph's bars.

What is the first step in making a histogram?

To construct a histogram, the first step is to "bin" the range of values—that is, divide the entire range of values into a series of intervals—and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable.

How is a histogram created?

To create a histogram, the data need to be grouped into class intervals. Then create a tally to show the frequency (or relative frequency) of the data into each interval. The relative frequency is the frequency in a particular class divided by the total number of observations.
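The grouping and tallying described here can be done by hand in R with cut() and table(). This is a sketch on the AirPassengers data; the class boundaries are chosen for illustration and the variable names are hypothetical.

```r
x <- as.numeric(AirPassengers)
class_breaks <- seq(100, 700, by = 100)  # class boundaries (chosen for illustration)

classes <- cut(x, breaks = class_breaks)     # assign each value to a class interval
frequency <- table(classes)                  # tally of values per interval
relative_frequency <- frequency / length(x)  # frequency divided by the total

frequency
relative_frequency
```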

How do you make a histogram with two sets of data in R?

Call the hist() function twice, once for each dataset, say x1 and x2. Use the col argument to set the color of each histogram, and set the add argument to TRUE in the second call so that the second histogram is overlaid on top of the first.
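A minimal sketch of this approach, with randomly generated data for x1 and x2. Sharing one set of breakpoints and using semi-transparent colors are choices made for this example so that both histograms stay visible and comparable.

```r
set.seed(42)
x1 <- rnorm(500, mean = 0)  # illustrative data
x2 <- rnorm(500, mean = 2)

# Use the same bins for both datasets, covering their combined range
shared_breaks <- pretty(range(c(x1, x2)), n = 20)

hist(x1, breaks = shared_breaks, col = rgb(0, 0, 1, 0.5),
     main = "Two overlaid histograms", xlab = "Value")
hist(x2, breaks = shared_breaks, col = rgb(1, 0, 0, 0.5), add = TRUE)
```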

How to label a histogram in R?

Labels can be added to base graphs using the text or mtext functions and the x locations can be found in the return value from the hist function. Heights for plotting can be computed using the grconvertY function.
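As a small sketch of the text() approach, the bin midpoints and counts returned by hist() can serve as label positions. The ylim value is an assumption that leaves some headroom above the bars.

```r
h <- hist(AirPassengers, ylim = c(0, 30))  # keep the return value for its mids and counts

# Write each bin's count just above its bar (pos = 3 means "above")
text(x = h$mids, y = h$counts, labels = h$counts, pos = 3)
```

For simple count labels, hist() also accepts labels = TRUE, which draws them without a separate text() call.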

How to plot histograms side by side in R?

Use par(mfrow=c(1, 2)) to set up a 1x2 layout before calling hist(), which means two plots will appear side by side.
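A minimal sketch of such a layout, using AirPassengers and its logarithm as two example datasets. Saving and restoring the old settings is a habit worth keeping so that later plots are not affected.

```r
old_par <- par(mfrow = c(1, 2))  # one row, two columns

hist(AirPassengers, main = "Passengers")
hist(log(AirPassengers), main = "log(Passengers)")

par(old_par)  # restore the previous single-plot layout
```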

How do you make a bar graph in base R?

Bar plots can be created in R using the barplot() function. We can supply a vector or matrix to this function. If we supply a vector, the plot will have bars with their heights equal to the elements in the vector.
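For example, a named vector can be passed directly to barplot(). The values below are made up for illustration.

```r
counts <- c(apples = 3, pears = 5, plums = 2)  # made-up values

# One bar per element; the names become the bar labels
bar_mids <- barplot(counts, main = "Bar plot from a named vector", ylab = "Count")

bar_mids  # barplot() returns the x positions of the bar midpoints
```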

How do you plot a histogram distribution?

How to Plot Histogram?
  1. Begin by marking the class intervals on the X-axis and frequencies on the Y-axis.
  2. The scales for both the axes have to be the same.
  3. Class intervals need to be exclusive.
  4. Draw rectangles with bases as class intervals and corresponding frequencies as heights.

Author: Pres. Carey Rath