Open Source

How to recover all your posts from a WordPress database

Marco Fioretti shares a tip on how to retrieve WordPress posts by querying the MySQL database.
A while ago a friend of mine set up an Italian blog with WordPress, with me and other friends as co-authors. Most of those people preferred the usual WordPress interface to posting from the command line as I usually do. Unfortunately (long story for another day) after a while the whole blog was shut down. Of course we had complete backups offline, but for the reasons above almost nobody had an immediately reusable, plain text copy of all and only his or her posts!

I will show you how I managed to create all those copies in a way that:

  • only returns the essential data, without the comments, blocks and widgets that would clutter static copies of WordPress HTML pages
  • works when a blog isn't online anymore, on any platform supported by MySql
  • but also on still running WordPress blogs (if you have the right access to it)!

This is possible because WordPress stores all its content, except for pictures and other attachments, in one big MySql database. All you have to do is to query that database (or a clone obtained by its backups) in the right way. You can work at the Linux command line, as shown below, or in any other MySql client. I have tested the procedure on a database from WordPress 3.4.2, but it should work with little or no changes even with other versions of the software.

Clone the WordPress database on your computer

Plugins like BackUpWordPress, or any MySql interface, can dump all the content of a WordPress MySQL database into one big text file, which I'll call database_dump.sql. If, as in my case, the blog isn't online anymore, that's all that is available. In theory, you may get your posts back by directly parsing that file.

In practice, it is much less painful and more flexible to reload that dump into a real, new MySql database that you may also query with any MySQL client, in any way you need. Let's assume that you have already created such a new, empty MySql database just for this task, accessible to a user called "recover_blog" with the password "getposts". This is the command that loads in that database all the content of the WordPress one:
[marco]$ mysql -u recover_blog -pgetposts wordpress_dump < database_dump.sql

WordPress database structure

Before getting to the point I have to explain briefly some properties of the MySQL database parts that contain the information we need. These are the tables called wp_users, wp_posts, wp_terms, wp_term_taxonomy, and wp_term_relationships. Wp_users tells us the numeric IDs associated to each author. In my case, my own posts were only those whose author ID was 3:

  mysql> SELECT ID, display_name FROM wp_users;
  +----+----------------+
  | ID | display_name   |
  +----+----------------+
  |  2 | Angelo         |
  |  3 | Marco          |
  |  4 | Carlo          |
  |  5 | Gaetano        |
  +----+----------------+

Text, title, date, numeric author ID and many other data are inside wp_posts. A complete description of the other three tables and how they work would be too long, so I'll only tell you what is needed to understand this post. WordPress distributes across those tables the lists of all the tags and categories associated to each post, as well as their visible names. As an example, what these two queries (edited for brevity!) mean:

  mysql> select * from wp_term_relationships limit 1\G
  *************************** 1. row ***************************
         object_id: 680
  term_taxonomy_id: 4
  mysql> select * from wp_terms where term_id = 4\G
  *************************** 1. row ***************************
     term_id: 4
        name: linux

is that the post whose ID is 680 was tagged (also) with the taxonomy term number 4, whose name is "linux".

Here is the actual MySql query

I deliberately ignored optimization, because I needed to run that query just once. There are surely better ways to write it. If you know them, please post them in the comments. This said, here is what you can copy and paste (all on one line!!!) in your MySql client to get your posts back:
  SELECT wp_posts.post_title as Title, wp_posts.post_date as DATE,
  GROUP_CONCAT(wp_terms.name) AS TAGS, wp_posts.post_content as CONTENT FROM wp_terms
  INNER JOIN wp_term_taxonomy ON wp_terms.term_id = wp_term_taxonomy.term_id
  INNER JOIN wp_term_relationships ON wp_term_taxonomy.term_taxonomy_id = wp_term_relationships.term_taxonomy_id
  INNER JOIN wp_posts ON wp_posts.ID = wp_term_relationships.object_id
  WHERE post_author = 3 AND post_type LIKE 'post' AND post_status LIKE 'publish' GROUP BY wp_posts.ID order by post_date\G

...and here is the result!

The easiest way to explain what that query does is to show its result. When I saved it into a file called  myquery.sql and ran this command:

mysql -u recover_blog -pgetposts wordpress_dump < myquery.sql > all_my_posts.txt
the file all_my_posts.txt looked like this (I trimmed the "CONTENT" for brevity, leaving just enough to show that it contains the actual text of each post, in HTML format):
  ************************** 1. row ***************************
    Title: Netlive e ITIS Linux
     DATE: 2010-04-12 22:52:13
     TAGS: scuola,software libero
  CONTENT: <a href="http://www.linuxnetlive.org">Netlive</a>, un software libero...
  *************************** 2. row ***************************
    Title: Convegno: Dalla memoria di carta...
     DATE: 2010-04-13 08:16:26
     TAGS: informatica,biblioteche,cultura,pubblica amministrazione
  CONTENT: "La carta cede il passo ad <strong>altri supporti</strong>...

As you can see I got, one after another, title, publication date, tags and content of all my published posts (note the "WHERE post_author = 3" part). Title, date and content are actual columns of the wp_posts table. The list of tag and category names, instead, must be built by linking posts' ID codes in wp_posts to term names in wp_terms through all the tables I mentioned. This is what GROUP_CONCAT and all those JOIN statements do (check the online MySql manual for details). Neat, huh? Next week, I'll show you how to convert that text file to HTML, PDF or other formats, or how to load it into other databases!

About

Marco Fioretti is a freelance writer and teacher whose work focuses on the impact of open digital technologies on education, ethics, civil rights, and environmental issues.

8 comments
Kim SJ
Kim SJ

...why not load it back onto a new blog instance?

janitorman
janitorman

I don't picture ANYONE doing this.

mfioretti
mfioretti

"why not load it back onto a new blog instance?" for the reasons explained at the beginning of the post, not to mention its very title. The point never was to **restore** that blog, or any other WordPress blog. It was to **recover** the text of all its posts, and (almost) nothing else, in forms that would be reusable in other ways.

mfioretti
mfioretti

Janitorman, A bit too vague, you are. "I don't picture ANYONE doing this" could be, and has been said, about practically everything that is done with computers, starting from that CEO that said there would be no market for more than 5 computers in the world. May I ask you what do you mean, exactly, by "I don't picture ANYONE doing this"? Do you mean that nobody would ever need to do such a data recovery operation, or that nobody would do it in this way? Thanks, Marco

ryumaou@hotmail.com
ryumaou@hotmail.com

I'll grant you, most WordPress installations are pretty rock solid and don't require such extreme measures to retrieve data, but if you've ever had a hosting provider suddenly go bust on you, you have no idea the difficulty in retrieving data. Trust me! I actually had to parse several *years* worth of posts once because a host I was using just suddenly shut down and would only give me the flat output from a MySQL database as my "backup".

eosp
eosp

Mabye you have no use for this but it's exactly what I would like to do with a messed up old wordpress site I've inherited. It needs revamping and the best way to deal with that is to save the posts and start fresh.

mfioretti
mfioretti

As a complement to ryumanou's comment, I would like to point out a couple of things. First, there are many reasons why a blog, or any other website, may completely disappear from the Net, that have NOTHING to do with the reliability of the software used and/or the professionalism and financial solidity of the hosting provider. Please note that I did NOT say in the post that these were the reasons why that particular blog disappeared (and in case you're wondering, no, it wasn't my friend's fault, or decision :-) ). Apart from that, most of the potential users of a trick like this would not adopt it for actual data recovery, that is because their blog is "lost" for whatever reason. They will do it because they need that content, that may very well be still online, in formats that are ready to reuse in other ways. See my next post for more.

mfioretti
mfioretti

"it's exactly what I would like to do..." Hi eosp. This (reviving old blogs, wasting as little time as possible) is one of the topics that I will cover in the following posts. Stay tuned. In the meantime, if you have specific questions/needs/doubts about how to proceed, please write them here and I'll do my best to cover them in the follow-up post. Marco