How to Read and Merge +1000 Excel files using R
Do you want to know how to create R loops that read hundreds of Excel files into R?
And that your R loops clean, validate and join your Excel files data into a single and consolidated dataset?
Well, you are lucky… In this article we will see how to do just that 😉.
(but instead of reading this article you can also watch my video if you prefer:)
You can get the R code of this tutorial by subscribing to my newsletter on felixanalytix.com. Once you confirm your subscription, you will receive a link to access my GitHub account where you can get the R script of this video.
Read and write a single Excel file
To begin I just want to quickly show you how to read and write a single Excel file using R.
First we will install the R packages readxl and writexl, as well as the tidyverse for general data transformation:
As an example dataset, we will use the
txhousing data, available when you load the tidyverse R package. It contains data about sales by city and date in Texas. To save this dataset as an Excel file you can simply run
write_xlsx. To read it you can run
read_excelwith the path of your Excel file:
Write multiple Excel files in R
Now we will see how to write multiple Excel files in R.
Every time we have repetitive task to make in R, we want to create a function. I have created below a function that takes our dataset, filter it by a given city name, and save this filtered dataset as an Excel file. For example, if the
Abilene, our dataset will only contain information about this unique city. And only then our function will save this data as an Excel with the city name in the Excel file name in our folder “txhousing”.
We used the
map function from the R package purrr to iterate our function on each city name, which are listed with the object
Read (and join) multiple Excel files
We will see now how to read (and join together) multiple Excel files.
First we will create an list of all the Excel files paths we want to read into R. For that you can use the
list.files function (don’t forget to add
TRUE to get the full path of your files).
As we want to iterate over multiple files, we want to create a function.The function reads the Excel file into R as well as perform a simple test to check if the file path isn’t a missing value. But note that in most use cases you will have to write additional data cleaning and data validation.
To loop over all the Excel files, we used again the
map function from the purrr R package. The first argument should be the list of all your Excel files, and the second argument is the name of the function we just created. We ran our function on all the file paths and save the data in a
If your datasets have the same data structure, column names and data types as well, you may want to join them together. To do that, you used the
map_dfr from the purrr R package that bind the dataframe by row.
I hope you found this content useful. If you want to download the R code of this tutorial, you can get it by joining my newsletter on felixanalytix.com.
Don’t forget to clap and subscribe. And I see you in another article! 😊