/
Improvements of database backup/restore

Improvements of database backup/restore

Imported From: http://groups.google.com/group/in-portal-dev/browse_thread/thread/f07d5c002cc0682f#

In-Portal very basic backup/restore capabilities of its data. Right now it only allows to do an SQL dump of whole database into a single file and later replace current database with any of the dumps that were made before that.

I'm proposing to make it more intellectual to be able to backup/restore:

  1. all/specific records from selected table(-s) only
  2. associated data (based on data that is exported) from connected tables
  3. associated data  (based on data that is exported) from related tables 

And of course you can combine any of these into a single import/export procedure. 

Connected tables

These are tables, that are used to display data on a separate tab (in Admin Console), while adding/editing an item. For example images, custom fields, etc. Right now we define all connected tables via SubItems/ForeignKey/ParentTableKey options in each unit config.

Related tables

These are tables, where you reference this data. For example in paid listings table you specify link id (or link resource id) where this particular enhancement is applied to. Also when added a link/article to a category you reference particular category in context of that link/article. Right now we don't specify these table relations anywhere and thus can't use this info during backup/restore process. To fill that gap I'm proposing to specify all back-references to a field in field definition.

For example:

  • in ItemResourceId field definition (in unit config) of Listings table we specify that it uses data from ResourceId field of Link table
  • in CategoryId field definition (in unit config) of CategoryItems table we specify that it uses data from CategoryId field of Categories table

In unit config this might look like this:

'Fields' => array( 
	'ItemResourceId' => array(
		'type' => 'int',
		...,
		'default' => null, 'foreign_key' => 'l:ResourceId'
	), 
), 

Thanks to that described above improvements backup/restore code won't be just considering a database as a bunch of unconnected tables, but will know every piece about the data. 

Storage format

Right now all backup is stored as large sql file with all table definitions and data. This is extremely inconvenient to use, because we need to threat data in file as one piece to be able to extract separate database queries from it. When there are a lots of data this becomes very memory consuming process. Storing same data in XML format doesn't solve a problem too, because we yet again need to load ALL file into memory to be able to parse it.

That's why I'm proposing to use SQLLite engine, that is always bundled with PHP) and store a backup as a SQLLite database. Each SQLLite database is 1 file, so no problems with that.

Restoring backups

Since we now would allow a partial backup, then user can restore it back at any time without changing unrelated parts of database. When restoring we can't rely on fact that record ID=A in backup and ID=A in target database are same record. To solve this I'm proposing to record last record id of every database table we backup (if we even backup 1 record from it). This way during restore we will know, that if we have matching ID in target database and this ID is smaller then maximal id existed at time of backup, then it's same record. This check can be extremely useful when LIVE/UAT databases are used. 

Usage case

Some database exists on live website and we need to create close-to-live environment on UAT (user accepted testing) server. At first we just copy paste all the data from live. Over time data is filled in on both live and uat databases and same ID of record in both databases no longer reefer to same record. Luckily for use we recorded last ID in each table from UAT database creation. Now on restore we exactly know which records should be updated and which ones should create and their current ID in live database should be incremented (in all connected/related tables too) to prevent ID conflict in UAT database.

 

P.S.

I know this isn't easy subject to understand, but if we do implement what I was talking about here, then we can do export/import at least following items from dev/sandbox to live without difficulties:

  • pages
  • page content blocks
  • custom field definitions
  • data entered in custom fields