In this post I want to detail my way of implementing automatic WordPress backups for a site hosted on a DigitalOcean VPS. I am looking for advice on how to improve this setup from both an efficiency and a security standpoint.
Requirements:
- DigitalOcean droplet running LEMP/LAMP stack (web server) and a WordPress site
- SSH key authentication between the web server and the backup storage machine
Environment
First, I will explain what I have running in my environment. If you already have your web server up and running with WordPress installed and you want to get straight into the backup process, skip ahead to the “Backup Process” section below.
Web Server:
I am using a DigitalOcean VPS running Ubuntu 18.04 x64.
I basically followed this tutorial to get WordPress installed (along with the guides in its prerequisites section).
So, I have a non-root user with sudo privileges, and I have configured SSH key authentication while disabling password authentication to the server. I have also disabled root SSH login for security purposes.
For the web server we are using Nginx, with MySQL for site data and PHP-FPM for PHP processing.
Additionally, I have secured the site with SSL using Let’s Encrypt and Certbot for automatic renewal.
The tutorial listed above covers installing WordPress on the server. Once that was up and running, I did things like creating a child theme, posts, pages, etc.: all the typical things you do when running a WordPress site.
Naturally, after all that hard work I wanted a way to back up what I've done. Setting up the server for SSH access and serving a site isn't all that much work. However, once you start customizing your theme, uploading content, accruing comments, and so on, it would be a lot of work to recreate it all!
This is when I knew I had to come up with a way to back up my WordPress website. Being the nerd that I am, I didn’t want to use a plugin for this, and I wanted it to be as automatic as possible.
Now it’s time to get into the backup process.
The Automatic WordPress Backup Process
Based on what I've researched, you need two things in order to quickly recreate a WordPress website:
- The database
- WordPress files such as config, wp-content (uploads), etc.
So, in my mind, it's a good idea to create a compressed archive of all the files (to aid transfer speed and save bandwidth when moving the files off the web server to the storage server).
Backing up the database should be as simple as a mysqldump of the WP Database.
What we are going to do is write a script that will create these files and store them in a directory of our choice.
We will then schedule a cronjob to complete this task automatically at a date we determine to be suitable.
Let’s detail that process and then we will discuss getting the files from the server to our backup storage.
The Server-Side Script to Create The Backup Files
Currently, the script is very basic and needs improvement. But here it is in all its glory!
#!/bin/bash
# Script creates a tarball of the WP site files and a mysqldump of the WP database

# create the tarball
tar -C /var/www/html -czvf [pathTo]/[yourBackupDirectory]/wp-files.tar.gz [yourSiteDirectory]

# create the mysqldump of the WP database
mysqldump -u [username] -p'[yourPassword]' [db name] > [pathTo]/[yourBackupDirectory]/db.wp__backup.sql
Pretty self-explanatory. Replace the items inside [. . .] with the paths/variables relevant to your environment. [yourSiteDirectory] is where the WordPress files are installed; in my case it's /var/www/html/website.
Room for Improvement
Currently the script hard-codes the database credentials and paths. This could be improved by using variables for the paths and a .my.cnf file to store the database credentials, as sketched below.
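Here is a minimal sketch of the .my.cnf approach, assuming a hypothetical database user; mysqldump automatically reads the [mysqldump] section of ~/.my.cnf, so the password no longer appears in the script or in the process list:

# ~/.my.cnf -- lock this down with: chmod 600 ~/.my.cnf
[mysqldump]
user=wp_backup_user
password=yourStrongPasswordHere

The mysqldump line in the script then shrinks to something like mysqldump [db name] > [pathTo]/[yourBackupDirectory]/db.wp__backup.sql, with no credentials in sight.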
I would also like to modify the script to append the date and time to the filename. This would create a new backup file on each run instead of a single file that overwrites itself.
It would be nice to then have the script check the backups directory and delete the oldest files after a certain size limit or time period has passed. Maybe a separate script and cronjob could complete that task. A rough sketch of both ideas follows.
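Here is a hypothetical iteration combining both ideas (the BACKUP_DIR path and the 30-day retention window are just examples, and the mysqldump call assumes the .my.cnf setup above):

#!/bin/bash
# Timestamped filenames plus simple age-based retention
BACKUP_DIR=/home/youruser/wp-bak
TIMESTAMP=$(date +%Y-%m-%d_%H%M)

# create timestamped backups so each run produces new files
tar -C /var/www/html -czf "$BACKUP_DIR/wp-files_$TIMESTAMP.tar.gz" [yourSiteDirectory]
mysqldump [db name] > "$BACKUP_DIR/db.wp__backup_$TIMESTAMP.sql"

# delete backups older than 30 days
find "$BACKUP_DIR" -name 'wp-files_*.tar.gz' -mtime +30 -delete
find "$BACKUP_DIR" -name 'db.wp__backup_*.sql' -mtime +30 -delete

Anyway, with that sketched out, let's move on to the next component of the automatic WordPress backup process.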
The Cronjob
Now that we have our script, make sure it is executable and that our user has permission to read, write, and execute it.
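Assuming the script lives at ~/wp-bak/wpBackupScript.sh (as in the cronjob below), restricting it to the owner looks like this:

chmod 700 ~/wp-bak/wpBackupScript.sh   # owner can read/write/execute; everyone else is locked out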
After that is all ready to go, you can test your script to make sure it works. Execute the script and make sure you have a tar.gz archive of your site’s files, and a .sql dump of your database.
If your script is working as anticipated, you can create a cronjob to automatically run the script.
We do this by editing the crontab for our user (crontab -e). The schedule I have chosen is as follows:
0 0 * * 0 ~/wp-bak/wpBackupScript.sh
My script and the backups are stored in the user's ~/wp-bak/ directory. Change this to reflect where your backups are stored.
This runs the script at 12 AM every Sunday. I figure a once-a-week backup is enough for my simple site at this point. I still need to address the overwriting issue, though, which I will get to in iteration two; keep an eye out for that post.
Room for Improvement
It may be nice to run twice a week; it depends on how often your site changes and how much data loss you can tolerate (a twice-weekly variant is sketched below). I would like to find a way to make sure I am being as efficient with CPU and disk space usage as possible. Please comment or email with any improvement suggestions you may have. I will post a link to the code on GitHub at the end of this post as well.
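For reference, a hypothetical twice-weekly schedule only changes the day-of-week field (0 = Sunday, 3 = Wednesday):

# minute hour day-of-month month day-of-week
0 0 * * 0,3 ~/wp-bak/wpBackupScript.sh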
Server-Side Conclusions
That pretty much takes care of the server-side of things. We have a script that creates our files for us, and a cronjob that automatically runs the script when we want it to. Let’s move on to pulling these files down to another computer.
Pulling the Backups to a Remote Machine using SFTP
We want to store these backups on another machine, say our computer at home. Or, if you're lucky enough to have an always-on backup server/file share, you could use that.
Me being a weirdo without an always-on Linux server at home, I wanted to figure out how to pull these backups for storage on a Windows machine.
If I had had a Linux machine running, I would have just used a script on the web server to rsync the files over to my backup server, which would have simplified things a bit.
However, I digress. This is how I went about pulling the files from my Windows 10 laptop at home.
The Batch File to Pull the Automatic WordPress Backups
I created a batch file to automatically pull the files using SFTP. For this to work, my laptop has SSH key authentication configured with the web server, and there is no passphrase on my SSH key.
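As an aside, if you need to generate a passphrase-less key, Windows 10's built-in OpenSSH client can do it with something like this (the filename is just an example; if you use a non-default name, pass it to sftp with -i):

ssh-keygen -t ed25519 -N "" -f %USERPROFILE%\.ssh\wp_backup_key

You would then add the contents of the resulting .pub file to ~/.ssh/authorized_keys on the web server.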
Here is the code in the batch file:
@ECHO OFF
REM Connect to the web server over SFTP and run the commands in the command file
sftp -b "C:\Path\ToWhere\CommandFile\isStored\sftpCommands.txt" [email protected]
EXIT
That’ it! Nice and simple. All it does is connect to my webserver over SFTP (FTP with SSH), and it uses a command file to pass commands to the sftp. Here is what is in the command file:
get -r ~/pathTo/db.wp__backup.sql C:\Path\To\backupFiles\wordpressBak\database
get -r ~/pathTo/wp-files.tar.gz C:\Path\To\backupFiles\wordpressBak\wp-files
exit
Simple enough. You may need to tweak this depending on where you store everything and what you named your backup files, obviously.
Finally, we use the Windows Task Scheduler to run this batch file every Sunday at 3 AM. That 3-hour gap should be more than enough time to make sure my web server is done writing the backup files. Scheduling a task is simple enough, so I won't go into the details of how to do that in this post.
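That said, if you prefer the command line over the GUI, a roughly equivalent task can be created with schtasks (the task name and path here are just examples):

schtasks /Create /TN "WordPress Backup Pull" /TR "C:\Path\To\pullBackups.bat" /SC WEEKLY /D SUN /ST 03:00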
Room for Improvement
Now, we have a batch file that connects to our web server using SFTP, pulls two files, quits the SFTP session and exits the command prompt. This runs automatically on Sundays at 3 AM.
This should be improved first of all by implementing the day/date/time filename change in the server-side script mentioned above.
Then it would be nice for our storage-side script to only pull the backups we don't already have (to save bandwidth). This would be easy using rsync; another way would be to use the synchronization options in WinSCP scripts. Ideally, though, if I could have anything I wanted, I would just run a Linux server at home and use rsync between the web server and the backup storage server, because it has flags built in to transfer only what's needed; see the sketch below.
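For completeness, here's what that hypothetical Linux-to-Linux pull could look like (the hostname and paths are examples; --ignore-existing tells rsync to skip any backup file already present at the destination):

#!/bin/bash
# Pull only new backup files from the web server
rsync -avz --ignore-existing youruser@yourserver.com:~/wp-bak/ /srv/backups/wordpress/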
Conclusion
Currently, we have a process for automatic WordPress backups. I verified it works by first copying all the files to a fresh WordPress install on my local development server.
I then created a database with the same name as my old one and imported the SQL dump.
This worked when loading up the site on my dev server, but clicking any links took me to my live site. That made sense: the live site was still up, and WordPress stores the site URL in the database, so the restored links still point at the live domain rather than my dev server.
So, the theme loaded on the local server, which makes me believe everything restored correctly, and the link structure is intact because it sends me to the live site when I click anything. Cool.
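If you actually wanted the dev copy to stay local, the usual fix (sketched here with an example dev URL, and assuming the default wp_ table prefix) is to update the siteurl and home options in the restored database:

mysql -u [username] -p [db name] -e "UPDATE wp_options SET option_value='http://localhost/website' WHERE option_name IN ('siteurl','home');"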
I also verified that the SQL dump works by checking the .sql file for comments. I left a comment on the site containing a word that had never been used anywhere else on the site as of the last backup.
I then ran the backup, pulled the files, and searched the SQL dump for the comment, which it found. So that appears to be working. Great!
The problem with this approach is that we currently overwrite the backups on the server and on the storage machine every time. This means we can't restore to anything older than the most recent backup. If something goes wrong and we back up the broken state… I don't have to explain what's wrong here, lol.
So, my next steps for improving this are to edit the server-side script to append the date and time to the filename, creating new backups each time instead of overwriting the old ones.
I then need to find a way to edit the batch file/get commands to only pull the files that we don’t have on our backup storage server.
Finally, I should improve the security of the whole thing by removing the database credentials from the backup script and ensuring that all files containing sensitive information can be accessed only by each file's owner.
I also need to protect each system’s SSH keys. Other than that, I think the whole process is fairly secure, since we are using SSH for everything.
Please leave your input on security issues or how to implement any improvements via a comment, an email, or a pull request on GitHub.
You can view the repo for this setup here.
Contact me if you have any questions or want to work together on improving this setup!