<HEAD> <TITLE>Fast-Checkpointing a Perforce Database Using a Network Appliance Filer</TITLE> </HEAD>
<BODY bgcolor="#ffffff">
<h2>Fast-Checkpointing a Perforce Database Using a Network Appliance Filer</h2>
<br>
<font size=-1>
Richard Geiger<br>
Network Appliance, Inc.<br>
December, 1999
</font>

<p><h3>Introduction</h3>
<p>
This note describes how to use the "Snapshot" feature of Network Appliance
filers to implement a "Perforce fast checkpoint" capability. This can
dramatically reduce the amount of time that the Perforce server is
unavailable during a database checkpoint operation.
<p>
The size of the reduction will vary, depending on the size of the Perforce
depot, as well as the overall size of the filer volume on which it is
stored. But, for example: on a depot with a 1.8Gb database (i.e., of
<tt>db.*</tt> files in <tt>$P4ROOT</tt>), stored on a 17Gb filer volume, the
window of time when the Perforce server was unavailable during a "normal"
checkpoint was typically 40-45 minutes. By using the technique illustrated
here, the window of unavailability was reduced to under 10 seconds.

<h3>How it Works</h3>

The "Snapshot" feature on Network Appliance filers allows the state of an
entire filesystem (volume) to be saved rapidly; it is fast because it
involves only copying pointers. Initially, a moment after the creation of
the snapshot, the snapshot and the live filesystem have identical contents,
and share the same set of data blocks. Subsequently, new writes to files
and directories in the "live" version of the filesystem are done by writing
to free disk blocks and updating the pointers in the live filesystem. No
disk block that was in use at the time the snapshot was taken is
re-allocated until after the snapshot is deleted. Thus, the snapshot
remains as a read-only version of the filesystem as it was at the time the
snapshot was taken.
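As a concrete (if hypothetical) sketch, the lock/snapshot/unlock/checkpoint sequence can be condensed into shell along these lines. All paths, the filer name ("maglite"), the volume name ("perforce"), and the counter values are placeholder assumptions taken from the example output shown later in this note; by default the sketch only echoes the commands rather than running them:

```shell
#!/bin/sh
# Dry-run sketch of the snap_checkpoint sequence. The filer name
# ("maglite"), volume name ("perforce"), paths, and counter values are
# assumptions drawn from the example output; adapt them to your site.
# With DRYRUN=echo (the default) each command is printed, not executed.

snap_checkpoint_sketch() {
  DRYRUN=${DRYRUN:-echo}
  P4=/u/p4/dist/r99.2/bin.osf/p4
  P4D=/u/p4/dist/r99.2/bin.osf/p4d
  P4PORT=p4netapp:1678
  P4ROOT=/u/p4/root.p4netapp:1678
  CKP=/u/p4/checkpoint.p4netapp:1678
  FILER=maglite VOL=perforce
  JNL=789 STAMP=20000110112316

  # --- Fast part: the server is unavailable only during these steps ---
  $DRYRUN $P4 -p $P4PORT counter journal $((JNL + 1))   # bump journal counter
  # (lock $P4ROOT here; snap_checkpoint takes a shared lock on the database)
  $DRYRUN /bin/cp -p $CKP/journal $CKP/$STAMP.jnl.$JNL  # save current journal
  $DRYRUN /bin/rsh $FILER snap delete $VOL checkpoint   # drop any stale snapshot
  $DRYRUN /bin/rsh $FILER snap create $VOL checkpoint   # snapshot the volume
  # (unlock $P4ROOT here; the server is available to users again)

  # --- Slow part: checkpoint from the read-only snapshot copy ---
  $DRYRUN /usr/local/bin/gzip $CKP/$STAMP.jnl.$JNL
  $DRYRUN $P4D -r $P4ROOT/.snapshot/checkpoint -p $P4PORT -z \
      -jd $CKP/$STAMP.ckp.$((JNL + 1)).gz
  $DRYRUN /bin/rsh $FILER snap delete $VOL checkpoint   # free the snapshot blocks
}

snap_checkpoint_sketch
```

Note that the `p4d -jd` in the slow part is pointed at the snapshot copy of `$P4ROOT` (under `.snapshot/`), so it can run for as long as it needs without holding any locks on the live database.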
<p>
The technique illustrated in the <tt>snap_checkpoint</tt> script takes
advantage of this by locking the Perforce database, snapshotting the
filesystem containing the <tt>db.*</tt> files, and then unlocking the
database. These steps happen quickly - just a few seconds on a 17Gb filer
volume, for example. As soon as the snapshot is complete and the database
has been unlocked, the Perforce server is once again available to users.
At this point, a
<blockquote>
<tt>p4d -r <i>snapshot-copy-of-$P4ROOT</i> -jd</tt>
</blockquote>
command is run to create the Perforce checkpoint. This operation can take
as long as it must, without locking down the live database. (There are
actually a couple of other steps involved, to correctly handle the saving
and truncation of the journal file.)
<p>
The <tt>snap_checkpoint</tt> script is intended
<ul>
<li> To illustrate the use of this technique in a concrete way; or
<p><li> For actual use in a production environment.
</ul>
<p>
<tt>snap_checkpoint</tt> has a handful of configuration parameters (see the
comments under "Configuration Settings" at the top of the script), but it
is not completely flexible in every way imaginable; you may need to alter
it to make it fit well with your own Perforce server backup practices.

<h3>Example Output</h3>

Here's what the output looks like:

<blockquote><pre>
$ snap_checkpoint
> /u/p4/dist/r99.2/bin.osf/p4 -p p4netapp:1678 counters
: burt = 976
: change = 1211
: journal = 789
: notify = 442
> /u/p4/dist/r99.2/bin.osf/p4 -p p4netapp:1678 counter journal 790
: Counter journal set.
/u/p4/root.p4netapp:1678 locked.
> /bin/cp -p /u/p4/checkpoint.p4netapp:1678/journal /u/p4/checkpoint.p4netapp:1678/20000110112316.jnl.789
> /bin/rsh maglite snap delete perforce checkpoint 2>&1
: No such snapshot.
> /bin/rsh maglite snap create perforce checkpoint 2>&1
: creating snapshot......
/u/p4/root.p4netapp:1678 unlocked.
</pre>
<blockquote><i>
The steps up to this point execute quickly.
Beyond this point, the Perforce server is available to users, while the
checkpoint operation actually takes place from the snapshot.
</i></blockquote>
<pre>
> /usr/local/bin/gzip /u/p4/checkpoint.p4netapp:1678/20000110112316.jnl.789
> /u/p4/dist/r99.2/bin.osf/p4d -r /u/p4/root.p4netapp:1678/.snapshot/checkpoint -p p4netapp:1678 -z -jd /u/p4/checkpoint.p4netapp:1678/20000110112316.ckp.790.gz
: Dumping to /u/p4/checkpoint.p4netapp:1678/20000110112316.ckp.790.gz...
> /bin/rsh maglite snap delete perforce checkpoint 2>&1
$
</pre>
</blockquote>
<p>
<hr>
<p>
<i>
NEITHER THE AUTHOR, NETWORK APPLIANCE, INC., NOR PERFORCE SOFTWARE MAKES
ANY WARRANTY, EXPLICIT OR IMPLIED, AS TO THE CORRECTNESS, FITNESS FOR ANY
APPLICATION, OR THE SAFETY OF THE <tt>snap_checkpoint</tt> SOFTWARE.
</i>
<p>
<hr>
<table border=1 cellpadding=4>
<tr><th>#</th><th>Change</th><th>User</th><th>Description</th></tr>
<tr><td>10</td><td>1544</td><td>Richard Geiger</td><td>Update to reflect changes in p4d 2002.1: a) the change in the locking order, due to db.changex; b) the fix for job006497</td></tr>
<tr><td>9</td><td>908</td><td>Richard Geiger</td><td>Make all references to "snap_checkpoint" hyperlinks to the actual script.</td></tr>
<tr><td>8</td><td>831</td><td>Richard Geiger</td><td>Add job#</td></tr>
<tr><td>7</td><td>829</td><td>Richard Geiger</td><td>About the Data ONTAP 6.1 bugfix problem</td></tr>
<tr><td>6</td><td>437</td><td>Richard Geiger</td><td>Hack to handle r2000.1's newfound reluctance to do "p4 counter journal NNNN".</td></tr>
<tr><td>5</td><td>264</td><td>Richard Geiger</td><td>Add links back to the NetApp &amp; Perforce web sites per Laura's suggestion.</td></tr>
<tr><td>4</td><td>259</td><td>Richard Geiger</td><td>Update the example output to match the latest version of the script.</td></tr>
<tr><td>3</td><td>246</td><td>Richard Geiger</td><td>Update the script such that we use, verbatim, the p4d_snap_checkpoint function from "p4d_admin", which is the version we're finally really deploying. This should make it much easier to maintain in the future. Also update the html doc to match.</td></tr>
<tr><td>2</td><td>239</td><td>Richard Geiger</td><td>Use LOCK_SH when locking the database; use ALL CAPS when shunning all responsibility for the thing (warranty disclaimer).</td></tr>
<tr><td>1</td><td>238</td><td>Richard Geiger</td><td>Sample script illustrating how to use Data ONTAP snapshots for a "fast checkpoint", plus accompanying notes</td></tr>
</table>