<HEAD>
<TITLE>Fast-Checkpointing a Perforce Database Using a Network Appliance Filer</TITLE>
</HEAD>
<BODY bgcolor="#ffffff">

<h2>Fast-Checkpointing a Perforce Database Using a Network Appliance Filer</h2>
<br>
<font size=-1>
Richard Geiger<br>
Network Appliance, Inc.<br>
January, 2000; revised March 14, 2002
</font>

<blockquote>
<p><b><font color=red>Note for use with Network Appliance Data ONTAP 6.1 and later</font>:</b>
<p>

Release 6.1 of Data ONTAP filer software includes a bug fix for
Network Appliance bug id 34931, "A user does not get 'Permission
Denied' when trying to append to a snapshot file."

<p>

Unfortunately, <a href=snap_checkpoint><tt>snap_checkpoint</tt></a>
has until now unwittingly relied on this bug in order to work:
<tt>p4d -jd</tt> opens the database files for read/write access (even
though it does not <i>use</i> write access to perform the checkpoint),
and in this case those files are in a read-only snapshot.

<p>

Perforce has tracked this defect as <tt>job006497</tt>,
'"p4d -jd" should open all database files read-only'.

<p>

A fix for the problem is present in Perforce release 2002.1.

<p>

Thus, users running Perforce with storage on a Network Appliance filer
running Data ONTAP version 6.1 (or later) <b>must</b> use Perforce
server version (<tt>p4d</tt>) 2002.1 (or later) in order to use <a
href=snap_checkpoint><tt>snap_checkpoint</tt></a>. This is not an
issue if you are using a Data ONTAP release prior to 6.1.

<p>

</blockquote>
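<p>
The compatibility rule in the note above can be stated as a small check. This is a minimal Python sketch, not part of <tt>snap_checkpoint</tt>; the function names are hypothetical, and it assumes purely numeric dotted version strings such as <tt>6.1</tt> or <tt>2002.1</tt> (real release names may carry suffixes that would need stripping first).

```python
def parse_version(text):
    """Turn a purely numeric dotted version ("6.1", "2002.1") into a tuple."""
    return tuple(int(part) for part in text.split("."))


def snap_checkpoint_supported(ontap_version, p4d_release):
    """Return True unless the Data ONTAP 6.1 rule rules snap_checkpoint out.

    Before Data ONTAP 6.1 the read-only-snapshot problem does not arise.
    From 6.1 on, p4d must be 2002.1 or later (the fix for job006497).
    """
    if parse_version(ontap_version) >= (6, 1):
        return parse_version(p4d_release) >= (2002, 1)
    return True


print(snap_checkpoint_supported("6.1", "2001.1"))  # → False
```

Comparing tuples of integers avoids the trap of comparing version strings lexically, where <tt>"6.10"</tt> would sort before <tt>"6.9"</tt>.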

<p>


<blockquote>
<p><b>Note for use with Perforce release 2000.1 and later servers:</b>
<p>
In release 2000.1 of <tt>p4d</tt>, the server will not allow certain
system-defined counters (including "<tt>journal</tt>")
to be altered by the <tt>p4 counter</tt>
command.  The <a href=snap_checkpoint><tt>snap_checkpoint</tt></a> script (described below) must be able to
manipulate the value of the journal counter; therefore,
<tt>snap_checkpoint#5</tt> now relies on the use of a user-defined
counter - "<tt>snap_journal</tt>" - to track the checkpoint number. In
order to use this version of <a href=snap_checkpoint><tt>snap_checkpoint</tt></a>, you must
therefore first establish an initial value for this counter, using
<blockquote>
<tt>p4 counter snap_journal <i>value</i></tt>
</blockquote>
where <tt><i>value</i></tt> should be the same as the current
value of the <tt>journal</tt> counter. This initialization might look,
for example, like:
<blockquote><pre>
% p4 counters
change = 80106
job = 3
journal = 913
notify = 80106
% p4 counter snap_journal 913
Counter snap_journal set.
</pre></blockquote>
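<p>
The initialization step amounts to reading the <tt>journal</tt> counter from <tt>p4 counters</tt> and copying its value into <tt>snap_journal</tt>. The Python sketch below parses that output and builds the corresponding <tt>p4 counter</tt> command without running it. The helper names are hypothetical, and it assumes numeric counter values and the <tt>name = value</tt> output format shown above.

```python
# Sample "p4 counters" output, copied from the example above.
SAMPLE = """\
change = 80106
job = 3
journal = 913
notify = 80106
"""


def parse_counters(output):
    """Parse "p4 counters" output (one "name = value" per line) into a dict.

    Assumes every counter value is an integer.
    """
    counters = {}
    for line in output.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:  # skip blank or unexpected lines
            counters[name.strip()] = int(value.strip())
    return counters


def snap_journal_init_command(counters):
    """Return the command that seeds snap_journal from journal,
    or None if snap_journal has already been set."""
    if "snap_journal" in counters:
        return None
    return ["p4", "counter", "snap_journal", str(counters["journal"])]


print(snap_journal_init_command(parse_counters(SAMPLE)))
# → ['p4', 'counter', 'snap_journal', '913']
```

Returning <tt>None</tt> once <tt>snap_journal</tt> exists keeps the seeding step from overwriting a counter that <tt>snap_checkpoint</tt> has already advanced.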

From this point on, it is important to use the <tt>snap_journal</tt>
counter to track checkpoint numbers, rather than the default
<tt>journal</tt> counter. This essentially means that you will need to
consistently use only <a href=snap_checkpoint><tt>snap_checkpoint</tt></a> to make
checkpoints.

<p>
If you <i>must</i> run a "normal" (non-snapshot-based) checkpoint, do
so with care: use <tt>p4d -j<i>option</i> -J ...</tt> to specify the
checkpoint file name explicitly, save and truncate the journal by hand,
and then increment the <tt>snap_journal</tt> counter by hand.

<p>
Discussions are underway with Perforce Software about defining a
mechanism that could once again allow the <tt>journal</tt> counter to
be changed by <tt>p4 counter</tt>, removing the need for this
<tt>snap_journal</tt> hack.

</blockquote>

<p><h3>Introduction</h3>

<p>
This note describes how to use the "Snapshot" feature of Network
Appliance filers to implement a "Perforce fast checkpoint" capability.
This can dramatically reduce the amount of time that the
Perforce server is unavailable during a database checkpoint operation.

<p>
The size of the reduction will vary, depending on the size of the
Perforce depot as well as the overall size of the filer volume on
which it is stored. For example, on a depot with a 1.8 GB database
(i.e., the <tt>db.*</tt> files in <tt>$P4ROOT</tt>) stored on a 17 GB filer volume, the
window of time when the Perforce server was unavailable during a
"normal" checkpoint was typically 40-45 minutes. Using the
technique illustrated here reduced the window of unavailability
to under 20 seconds.
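<p>
As a rough check on the figures above, the reduction factor is the normal outage divided by the snapshot-based outage. Because the snapshot-based window was <i>under</i> 20 seconds, the factors computed here are lower bounds.

```python
# Outage windows reported above, in seconds.
normal_outage_s = (40 * 60, 45 * 60)  # typical "normal" checkpoint: 40-45 minutes
snapshot_outage_s = 20                # snapshot-based: under 20 seconds

# The snapshot-based figure is an upper bound, so these are lower bounds.
factors = [t / snapshot_outage_s for t in normal_outage_s]
print(factors)  # → [120.0, 135.0]
```

In other words, the server's unavailable window shrank by a factor of at least 120 to 135.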

<h3>How it Works</h3>

The "Snapshot" feature on Network Appliance filers allows the state
of an entire filesystem (volume) to be saved rapidly. It is fast
because it only involves copying pointers. Immediately after the
snapshot is created, the snapshot and the live
filesystem have identical contents and share the same set of data
blocks. Subsequent writes to files and directories in the
"live" version of the filesystem are made by writing to free disk
blocks and updating the pointers in the live filesystem. No disk
block that was in use at the time the snapshot was taken is
reallocated until the snapshot is deleted. Thus, the snapshot
remains a read-only image of the filesystem as it was when the
snapshot was taken.
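<p>
To make the pointer-copying idea concrete, here is a toy Python model of the behavior just described. It is an illustration under simplifying assumptions, not how Data ONTAP's filesystem is actually implemented: a "file" is a list of block numbers, a snapshot copies only those lists, and every write to the live filesystem goes to a block that neither the live filesystem nor any snapshot references. The class and method names are hypothetical.

```python
class ToyVolume:
    """Toy model of pointer-based snapshots (not the real filer design).

    A file is a list of block numbers. Taking a snapshot copies only those
    lists; writes to the live filesystem always go to a free block, and a
    block stays in use while any snapshot still points at it.
    """

    def __init__(self, nblocks):
        self.blocks = [None] * nblocks  # physical disk blocks
        self.live = {}                  # file name -> list of block numbers
        self.snapshots = {}             # snapshot name -> frozen copy of live

    def _in_use(self):
        used = {b for ptrs in self.live.values() for b in ptrs}
        for snap in self.snapshots.values():
            used.update(b for ptrs in snap.values() for b in ptrs)
        return used

    def _allocate(self):
        used = self._in_use()
        for b in range(len(self.blocks)):
            if b not in used:
                return b
        raise RuntimeError("no free blocks")

    def write(self, name, index, data):
        """Write block number index of file name to a fresh block."""
        new = self._allocate()
        self.blocks[new] = data
        ptrs = list(self.live.get(name, []))
        if index == len(ptrs):
            ptrs.append(new)   # extend the file by one block
        else:
            ptrs[index] = new  # repoint; the old block may now be free
        self.live[name] = ptrs

    def snapshot(self, snap_name):
        # Copy pointers only; no data blocks are duplicated.
        self.snapshots[snap_name] = {n: list(p) for n, p in self.live.items()}

    def delete_snapshot(self, snap_name):
        del self.snapshots[snap_name]

    def read(self, name, snap_name=None):
        view = self.live if snap_name is None else self.snapshots[snap_name]
        return [self.blocks[b] for b in view[name]]


vol = ToyVolume(8)
vol.write("db.rev", 0, "rev-A")
vol.snapshot("checkpoint")
vol.write("db.rev", 0, "rev-B")
print(vol.read("db.rev"), vol.read("db.rev", "checkpoint"))
# → ['rev-B'] ['rev-A']
```

After the second write, the live view and the snapshot diverge, yet the snapshot cost nothing but a copy of the pointer lists; the block holding <tt>rev-A</tt> only becomes reusable once the snapshot is deleted.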

<p>

The technique illustrated in the <a href=snap_checkpoint><tt>snap_checkpoint</tt></a> script takes
advantage of this, by locking the Perforce database, snapshotting the
filesystem containing the <tt>db.*</tt> files, and then unlocking the
database. These steps happen quickly: just a few seconds on a
17 GB filer volume, for example. As soon as the snapshot is complete,
and the database has been unlocked, the Perforce server is once again
available to users. At this point, a
<blockquote>
<tt>p4d -r <i>snapshot-copy-of-$P4ROOT</i> -jd</tt>
</blockquote>
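command is run to create the Perforce checkpoint.
This operation can take as long as it needs to, without
locking the live database. (There are actually a couple of
other steps involved, to correctly handle the saving and truncation
of the journal file: as the example output below shows, the live
journal is truncated before the database is unlocked, and the
snapshot's copy of the journal is then saved and compressed as a
numbered journal file.)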
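<p>
The locking step can be sketched as follows. This Unix-only Python sketch assumes, as the script's revision history ("Use LOCK_SH when locking the database") suggests, that a shared <tt>flock()</tt> lock on each <tt>db.*</tt> file is enough to keep <tt>p4d</tt> from writing while the snapshot is created; the function name is hypothetical. The lock order matters (the history also mentions a change in <tt>p4d</tt>'s locking order in 2002.1), so the sketch lets the caller pass an explicit order and otherwise just sorts by name, which is <i>not</i> guaranteed to match what <tt>p4d</tt> expects.

```python
import fcntl
import glob
import os
from contextlib import contextmanager


@contextmanager
def locked_db(p4root, order=None):
    """Hold a shared flock() lock on the db.* files in p4root (Unix only).

    order: optional list of db.* file names in the order to lock them.
    If omitted, files are locked in sorted name order; that default is an
    assumption, not necessarily the order p4d itself uses.
    """
    if order is None:
        order = sorted(os.path.basename(path)
                       for path in glob.glob(os.path.join(p4root, "db.*")))
    fds = []
    try:
        for name in order:
            fd = os.open(os.path.join(p4root, name), os.O_RDONLY)
            fds.append(fd)
            # Waits while any exclusive lock is held on this file.
            fcntl.flock(fd, fcntl.LOCK_SH)
        yield order
    finally:
        # Release in reverse order of acquisition.
        for fd in reversed(fds):
            fcntl.flock(fd, fcntl.LOCK_UN)
            os.close(fd)
```

Inside the <tt>with locked_db(...)</tt> block, a script would create the snapshot and truncate the journal; leaving the block releases every lock, even if one of those steps raises an exception.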

<p>
The <a href=snap_checkpoint><tt>snap_checkpoint</tt></a> script is intended both...

<ul>
<li>
...To illustrate the use of this technique, in a concrete way; and
<p><li>
...For actual use in a production environment.
</ul>

<p>
<a href=snap_checkpoint><tt>snap_checkpoint</tt></a> has a handful of configuration parameters
(see the comments under "Configuration Settings" at the top of the
script), but is not completely flexible in every way imaginable; you
may need to alter it in order to make it fit well with your own
Perforce server backup practices.

<h3>Example Output</h3>

Here's what the output looks like:

<blockquote><pre>
$ snap_checkpoint
&gt; /bin/rsh powermatic snap delete perforce checkpoint 2&gt;&amp;1
: No such snapshot.
: deleting snapshot...
&gt; /u/p4/VERS/bin.osf/p4 -p p4netapp:1672 counters
: change = 26
: journal = 1
: snap_journal = 1
&gt; /u/p4/VERS/bin.osf/p4 -p p4netapp:1672 counter snap_journal 2
: Counter snap_journal set.
snap_checkpoint: /u/p4/root.p4netapp:1672 locked.
&gt; /bin/rsh powermatic snap create perforce checkpoint 2&gt;&amp;1
: creating snapshot...
snap_checkpoint: "/u/p4/checkpoint.p4netapp:1672/journal" truncated.
snap_checkpoint: /u/p4/root.p4netapp:1672 unlocked.
</pre>
<blockquote><i>

The steps up to this point execute quickly. Beyond this point, the
Perforce server is available to users, while the checkpoint operation
actually takes place from the snapshot.

</i></blockquote>
<pre>
&gt; /bin/cp -p /u/p4/checkpoint.p4netapp:1672/.snapshot/checkpoint/journal /u/p4/checkpoint.p4netapp:1672/20000925142937.jnl.1
&gt; /usr/local/bin/gzip /u/p4/checkpoint.p4netapp:1672/20000925142937.jnl.1
&gt; /u/p4/VERS/bin.osf/p4d -r /u/p4/root.p4netapp:1672/.snapshot/checkpoint -p p4netapp:1672 -z -jd /u/p4/checkpoint.p4netapp:1672/20000925142937.ckp.2.gz
: Dumping to /u/p4/checkpoint.p4netapp:1672/20000925142937.ckp.2.gz...
$ 
</pre></blockquote>
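<p>
The commands in this transcript follow a regular pattern, which the Python sketch below reconstructs as data. It is not the script's own code: the configuration keys and the function name are hypothetical, the values are copied from the example run, the shell redirection on the <tt>rsh</tt> lines is left out, and the sketch only builds the command lists (it runs nothing). Note the numbering: with <tt>snap_journal</tt> at 1, the saved journal is <tt>.jnl.1</tt> and the new checkpoint is <tt>.ckp.2.gz</tt>.

```python
# Site values copied from the example run above; adjust for your own setup.
EXAMPLE = {
    "filer": "powermatic",
    "volume": "perforce",
    "snap": "checkpoint",
    "bin": "/u/p4/VERS/bin.osf",
    "port": "p4netapp:1672",
    "root": "/u/p4/root.p4netapp:1672",
    "ckpdir": "/u/p4/checkpoint.p4netapp:1672",
}


def checkpoint_plan(cfg, old_counter, stamp):
    """Build the commands seen in the example transcript (nothing is run).

    old_counter is the snap_journal value read from "p4 counters";
    stamp is a YYYYMMDDhhmmss string, e.g. time.strftime("%Y%m%d%H%M%S").
    Returns (quick, slow): quick runs before the database is unlocked,
    slow runs afterwards against the snapshot.
    """
    new_counter = old_counter + 1
    p4 = cfg["bin"] + "/p4"
    p4d = cfg["bin"] + "/p4d"
    snap_cmd = ["/bin/rsh", cfg["filer"], "snap"]
    saved_jnl = "%s/%s.jnl.%d" % (cfg["ckpdir"], stamp, old_counter)
    checkpoint = "%s/%s.ckp.%d.gz" % (cfg["ckpdir"], stamp, new_counter)
    quick = [
        snap_cmd + ["delete", cfg["volume"], cfg["snap"]],
        [p4, "-p", cfg["port"], "counter", "snap_journal", str(new_counter)],
        # ...the db.* files are locked at this point...
        snap_cmd + ["create", cfg["volume"], cfg["snap"]],
        # ...then the live journal is truncated and the db.* files unlocked.
    ]
    slow = [
        ["/bin/cp", "-p",
         "%s/.snapshot/%s/journal" % (cfg["ckpdir"], cfg["snap"]), saved_jnl],
        ["/usr/local/bin/gzip", saved_jnl],
        [p4d, "-r", "%s/.snapshot/%s" % (cfg["root"], cfg["snap"]),
         "-p", cfg["port"], "-z", "-jd", checkpoint],
    ]
    return quick, slow


quick, slow = checkpoint_plan(EXAMPLE, 1, "20000925142937")
for cmd in quick + slow:
    print(" ".join(cmd))
```

Keeping the plan separate from execution makes the split visible: only the <tt>quick</tt> list (plus the lock, truncate, and unlock steps marked in the comments) affects availability, while everything in <tt>slow</tt> reads from the snapshot.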
<p>
<hr>
<p>
<i>
For more information on Perforce (The Fast Software Configuration Management System),
please visit <a href=http://www.perforce.com/>the Perforce web site</a>.
<p>
For more information on Network Appliance filers (Fast, Simple, Reliable Network Attached Storage),
please visit <a href=http://www.netapp.com/>the Network Appliance web site</a>.
</i>
<p>
<hr>
<p>
<i>
NEITHER THE AUTHOR, NETWORK APPLIANCE, INC. NOR PERFORCE SOFTWARE MAKE ANY
WARRANTY, EXPLICIT OR IMPLIED, AS TO THE CORRECTNESS, FITNESS FOR ANY
APPLICATION, NOR THE SAFETY OF THE <a href=snap_checkpoint><tt>snap_checkpoint</tt></a> SOFTWARE.
</i>
<p>
<table border=1 cellpadding=4>
<tr><th>#</th><th>Change</th><th>User</th><th>Description</th></tr>
<tr><td>#10</td><td>1544</td><td>Richard Geiger</td><td>Update to reflect changes in p4d 2002.1:<br>
a) The change in the locking order, due to db.changex<br>
b) The fix for job006497</td></tr>
<tr><td>#9</td><td>908</td><td>Richard Geiger</td><td>make all references to "snap_checkpoint" hyperlinks actual script.</td></tr>
<tr><td>#8</td><td>831</td><td>Richard Geiger</td><td>Add job#</td></tr>
<tr><td>#7</td><td>829</td><td>Richard Geiger</td><td>About the Data ONTAP 6.1 bugfix problem</td></tr>
<tr><td>#6</td><td>437</td><td>Richard Geiger</td><td>Hack to handle r2000.1's newfound reluctance to do "p4 counter journal NNNN".</td></tr>
<tr><td>#5</td><td>264</td><td>Richard Geiger</td><td>Add links back to the NetApp &amp; Perforce web sites per Laura's suggestion.</td></tr>
<tr><td>#4</td><td>259</td><td>Richard Geiger</td><td>Update the example output to match the latest version of the script.</td></tr>
<tr><td>#3</td><td>246</td><td>Richard Geiger</td><td>Update the script such that we use, verbatim, the p4d_snap_checkpoint function from "p4d_admin", which the version we're finally really deploying. This should make it much easier to maintain in the future. Also update the html doc to match.</td></tr>
<tr><td>#2</td><td>239</td><td>Richard Geiger</td><td>- Use LOCK_SH when locking the database<br>
- Use ALL CAPS when shunning all responsibility for the thing (Warranty disclaimer)</td></tr>
<tr><td>#1</td><td>238</td><td>Richard Geiger</td><td>Sample script illustrating how to use Data ONTAP snapshots for a "fast checkpoint", plus accompanying notes</td></tr>
</table>