As mentioned in the Work globally page, GitHub is an excellent place to keep the repository for a software project, and a WordPress site is no exception. However, a WordPress site differs from a typical software project in two important respects:
- The public_html directory contains a great deal of WordPress itself, along with its Themes and Plugins. That is OK, and tracking changes to those files is worthwhile. For one thing, it is a good way to detect corruption in a site.
- The vast majority of the content of a WP site is in posts and pages. When an author writes a blog post or a page of any sort, no new web page is created in public_html, because WP contains a powerful CMS (content management system). A CMS keeps the content in a database rather than in a directory of fully rendered HTML pages. This is discussed further in the Full stack page. The content of each blog post or page is stored as a separate record in the wp_posts table. Blog posts have a post_type of ‘post’ and pages (like this one) have a post_type of ‘page’. A number of other things are in fact stored in the wp_posts table as well. Because these items are records in a table rather than files, Git's tracking of file text will not show changes to individual posts or pages.
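The record-per-post model described in the second point can be sketched in a few lines. This is only an illustration: it uses an in-memory SQLite table as a stand-in for the real WordPress database, keeps just four of the wp_posts columns, and the two sample rows are made up.

```python
import sqlite3

# In-memory stand-in for the WordPress database (assumption: the real site
# uses a MySQL-family server and wp_posts has many more columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts ("
    " ID INTEGER PRIMARY KEY,"
    " post_title TEXT, post_content TEXT, post_type TEXT)"
)

# Hypothetical sample rows: one blog post and one page.
conn.executemany(
    "INSERT INTO wp_posts (post_title, post_content, post_type) VALUES (?, ?, ?)",
    [
        ("Hello world", "First draft of a blog post.", "post"),
        ("How To", "First draft of a page.", "page"),
    ],
)


def titles_of(post_type):
    """Return the titles of all records with the given post_type."""
    rows = conn.execute(
        "SELECT post_title FROM wp_posts WHERE post_type = ? ORDER BY ID",
        (post_type,),
    )
    return [title for (title,) in rows]


def edit_content(title, new_content):
    """Editing a page only updates a row; no file on disk changes."""
    conn.execute(
        "UPDATE wp_posts SET post_content = ? WHERE post_title = ?",
        (new_content, title),
    )


edit_content("How To", "Second draft of a page.")
```

After the edit, the new text exists only inside the table, which is why a file-based tool such as git has nothing to compare until the records are written out as text.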
Dumping the SQL data for source control
Because the main visible content is stored in a database table, the individual records of wp_posts (and other tables) must be extracted to textual form so that the normal GitHub mechanisms can show them readably. As a first step, I have written a shell script called ‘dumpall’ that is kept in the public_html directory (for now). The script is run in a shell on the WP host before the git commit and push commands that update the GitHub repository. This is step 1 of issue #10 in the GitHub repository. Later, I will write a program to extract each record of the wp_posts table into a separate text file in a .data/posts directory. That will require a nicer format than a simple SQL dump.
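To make the dump step concrete, here is a minimal sketch of what a dumpall-style helper might look like. It is not the actual dumpall script, whose contents are not shown here. The database name is a placeholder, the table list is taken from the file names that appear in the .data directory, and it assumes mysqldump is installed and reads its credentials from an option file such as ~/.my.cnf.

```python
import subprocess
from pathlib import Path

# Placeholder: the real database name is not given in the text.
DATABASE = "wordpress"

# Table names as they appear in the .data directory of the repository.
TABLES = [
    "wp_commentmeta", "wp_comments", "wp_links", "wp_options",
    "wp_postmeta", "wp_posts", "wp_usermeta", "wp_users",
]


def dump_command(database, table, out_dir=".data"):
    """Build the mysqldump argument list that writes one table to <out_dir>/<table>.sql."""
    out_file = Path(out_dir) / f"{table}.sql"
    return ["mysqldump", f"--result-file={out_file}", database, table]


def dump_all(database=DATABASE, tables=TABLES, out_dir=".data"):
    """Dump every listed table to its own .sql file, stopping on the first failure."""
    Path(out_dir).mkdir(exist_ok=True)
    for table in tables:
        subprocess.run(dump_command(database, table, out_dir), check=True)
```

Calling dump_all() from public_html before git add, git commit, and git push would refresh the .sql files so that the commit captures the current database contents.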
One issue with the dumpall script is that the SQL dump of a table produces an INSERT statement that creates all of the table's records, and all of those records are written on a single line. When GitHub shows the differences between successive revisions of the file, the result looks like one VERY long line with lots of differences. What I would rather have is an INSERT statement that puts each record on a separate line. I have written a shell alias that uses the sed command to split that INSERT line into many lines. Thus, in the .data directory of my GitHub repository you will see the original SQL dump in a file named wp_posts.sql and a version with the lines split in a file named wp_posts.split.sql. I am pushing this change on Wednesday, April 17, 2024, so you should see the text of this post change in the wp_posts.split.sql file.
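The splitting idea can be sketched as follows. This is not the sed alias itself, which is not reproduced here. It is an illustrative Python version that assumes the dump uses single-quoted strings with backslash escapes, and it breaks the line only at a "),(" boundary that lies outside a quoted string. The function name split_insert is hypothetical.

```python
def split_insert(line):
    """Put each record of a one-line multi-row INSERT on its own line.

    A record boundary is the sequence "),(" found outside a quoted string,
    so the same characters inside post content are left alone.
    """
    out = []
    in_string = False
    i = 0
    n = len(line)
    while i < n:
        ch = line[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character so \' does not end the string.
                out.append(line[i + 1])
                i += 2
                continue
            if ch == "'":
                in_string = False
        elif ch == "'":
            in_string = True
            out.append(ch)
        elif line.startswith("),(", i):
            out.append("),\n(")
            i += 3
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

A plain textual replacement of "),(" is simpler but would also break lines inside post text that happens to contain those characters. As an alternative, mysqldump has a --skip-extended-insert option that writes one INSERT statement per row, which may be worth trying if splitting after the fact proves fragile.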
Dumping the markup for all pages and posts
In addition to saving the SQL tables generally, I want to export the text of all the Pages and Posts as separate files. This allows GitHub change tracking to present the textual changes to each post and page individually. To do this I wrote two shell scripts named dumpPages and dumpPosts. dumpPages uses the wp command-line utility (WP-CLI) to write the markup text of each page to a file in the .pages directory. The file is named page-<name-of-page>.wpmu. The .wpmu extension is my own name for “WordPress markup”. For example, .pages/page-how-to-use-github-to-source-control-a-wordpress-site.wpmu contains the markup text of this page. The dumpPosts command does the same for posts and stores them in the .posts directory.
When dumpPages and dumpPosts are run in a secure terminal connected to the server running this WordPress site, the markup is exposed to the git program as text files. When I ask git for the status of this local repository, I see this:
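The export step might look roughly like the sketch below. It is not the actual dumpPages or dumpPosts script. It assumes WP-CLI is available as wp on the server and is run from the WordPress directory, and the helper names markup_path, wp, and dump_markup are hypothetical. The file naming follows the page-<name>.wpmu and post-<name>.wpmu pattern described above.

```python
import json
import subprocess
from pathlib import Path


def markup_path(post_type, slug):
    """Return e.g. .pages/page-blog.wpmu for post_type 'page' and slug 'blog'."""
    return Path(f".{post_type}s") / f"{post_type}-{slug}.wpmu"


def wp(*args):
    """Run a WP-CLI command and return its standard output as text."""
    result = subprocess.run(
        ["wp", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def dump_markup(post_type):
    """Write the markup of every record of the given post_type to its own file."""
    listing = wp(
        "post", "list",
        f"--post_type={post_type}",
        "--fields=ID,post_name",
        "--format=json",
    )
    for item in json.loads(listing):
        target = markup_path(post_type, item["post_name"])
        target.parent.mkdir(exist_ok=True)
        content = wp("post", "get", str(item["ID"]), "--field=post_content")
        target.write_text(content, encoding="utf-8")
```

Calling dump_markup("page") and then dump_markup("post") would correspond to running dumpPages followed by dumpPosts.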
[michael]:public_html$ git status
On branch main
Your branch is up to date with 'origin/main'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
new file: .pages/page-blog.wpmu
new file: .pages/page-termius-as-a-secure-interactive-shell-and-file-transfer-client.wpmu
new file: .posts/post-post-7-process-improvements-stick.wpmu
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: .data/wp_commentmeta.split.sql
modified: .data/wp_commentmeta.sql
modified: .data/wp_comments.split.sql
modified: .data/wp_comments.sql
modified: .data/wp_links.sql
modified: .data/wp_options.sql
modified: .data/wp_postmeta.sql
modified: .data/wp_posts.split.sql
modified: .data/wp_posts.sql
modified: .data/wp_usermeta.split.sql
modified: .data/wp_usermeta.sql
modified: .data/wp_users.split.sql
modified: .data/wp_users.sql
modified: .pages/page-development-platforms.wpmu
modified: .pages/page-how-to-do-easy-time-estimates.wpmu
modified: .pages/page-how-to.wpmu
modified: .pages/page-sample-page.wpmu
[michael]:public_html$ git diff .pages/page-how-to.wpmu
diff --git a/.pages/page-how-to.wpmu b/.pages/page-how-to.wpmu
index ed4ba26..2e8c3f7 100644
--- a/.pages/page-how-to.wpmu
+++ b/.pages/page-how-to.wpmu
@@ -4,7 +4,7 @@
-

+