How to Read and Merge +1000 Excel files using R
Would you like to know how to create R loops to read hundreds of Excel files into R?
And that your R loops clean, validate and join your Excel files data into a single and consolidated dataset?
Well, you are lucky. In this tutorial we will see how to do just that 😉.
You can also watch this tutorial on YouTube:
Download the R code of this tutorial by joining my newsletter on www.felixanalytix.com. You will receive an automatic email from me to access the R script.
Read and write a single Excel file
To begin I just want to quickly show you how to read and write a single Excel file using R.
First we will install the R packages readxl and writexl, as well as the tidyverse for general data transformation:
As an example dataset, we will use the txhousing
data, available when you load the tidyverse R package. It contains data about sales by city and date in Texas. To save this dataset as an Excel file you can simply run write_xlsx
. To read it you can run read_excel
with the path of your Excel file:
Write multiple Excel files in R
Now we will see how to write multiple Excel files in R.
Every time we have repetitive task to make in R, we want to create a function. I have created below a function that takes our dataset, filter it by a given city name, and save this filtered dataset as an Excel file. For example, if the city_name
is Abilene
, our dataset will only contain information about this unique city. And only then our function will save this data as an Excel with the city name in the Excel file name in our folder “txhousing”.
We used the map
function from the R package purrr to iterate our function on each city name, which are listed with the object list_city_names
.
Read (and join) multiple Excel files
We will see now how to read (and join together) multiple Excel files.
First we will create an list of all the Excel files paths we want to read into R. For that you can use the list.files
function (don’t forget to add full.names
equal TRUE
to get the full path of your files).
As we want to iterate over multiple files, we want to create a function.The function reads the Excel file into R as well as perform a simple test to check if the file path isn’t a missing value. But note that in most use cases you will have to write additional data cleaning and data validation.
To loop over all the Excel files, we used again the map
function from the purrr R package. The first argument should be the list of all your Excel files, and the second argument is the name of the function we just created. We ran our function on all the file paths and save the data in a df_list
object.
If your datasets have the same data structure, column names and data types as well, you may want to join them together. To do that, you used the map_dfr
from the purrr R package that bind the dataframe by row.
Conclusion
I hope you found this content useful. If you want to download the R code of this tutorial, you can get it by joining my newsletter on felixanalytix.com.
Don’t forget to clap and subscribe. And I see you in another article! 😊