Change 16546

Files 5
Comments 11

configlex.l

configparse.y

p4dctl.cpp

server.cpp

server.h

Tip: Use n and p to cycle through the changes.

	Flag as Task Matt Attaway commented 9 years ago Nice change! I'll talk with the developer about bringing this patch into the product if you're ok with that.
	Reply ·0

	Flag as Task lbarbier commented 9 years ago Thanks, I'm totally ok with it. I'm the one that asked for the feature in case #00206843 This change is "required" for users using p4dctl to create checkpoints in a distributed helix server (aka commit/edge) as by default they are compressed and break edge servers. configparse.c is certainly not necessary for the CL as it's generated at compil time.
	Reply ·0

	Flag as Task Matt Attaway commented 9 years ago @nick_poole will be in touch tomorrow; he's the develop for p4dctl.
	Reply ·0

	Flag as Task Nick Poole commented 9 years ago Hi @lbarbier First off, thank you for contributing your solution back to the community. This is a great start and I've got a couple of suggestions that will make this feature even better. The issue you've encountered is that 'p4dctl checkpoint' is calling 'p4d -z -jc', which is compressing the checkpoint and the rotated journal, breaking replication. But I believe that you'd also encounter the same problem if you ran 'p4dctl journal', which rotates the journal by calling 'p4d -z -jj'. There are two solutions to this problem: Disable compression by not passing the '-z' flag (as in your shelf). Compress only the checkpoint, not the journal, by passing the '-Z' flag. I think there are probably use cases for both, so rather than providing a single compression on/off we should add two options: CompressCheckpoint = true/false CompressJournal = false/true In this the case that CompressCheckpoint == true and CompressJournal == true, pass '-z'. In this the case that CompressCheckpoint == true and CompressJournal == false, pass '-Z'. In this the case that CompressCheckpoint == false and CompressJournal == false, pass no flag. In this the case that CompressCheckpoint == false and CompressJournal == true, pass no flag, or raise a config parse error. What do you think? If you like, I can add my suggested changes to the review for you to test out, or if you'd prefer to make the change, I'll be happy to review it. I'd like to work with you, so I'm happy to work whichever way you'd prefer.
	Reply ·0

	Flag as Task lbarbier commented 9 years ago Hi Nick, I wasn't aware of -Z for compressing the checkpoint only. I didn't looked at journal rotation because I never do that (not yet in fact) so it was out of my mind. But considering it, if you want to rotate journal and compress it, shouldn't case 4 (ckp=false, jnl=true) still work for journal rotations ? I agree it would be weird to checkpoint with journal uncompressed while rotate would compress it. Other than this, I totally agree that we then need to go further and add journal compression too. It's your code, it might be faster for you to work on it no ? For me now that I know the toolchain and how is the code generated it should be quite faster
	Reply ·0

	Flag as Task lbarbier commented 9 years ago Hello, P4D::checkpoint might be problematic. I do get the idea about doing it with a 'p4d -jc' but replicas don't like that and enforce usage of 'p4 admin checkpoint'. Which make both journal and checkpoint fail. Thinking about a solution. Current thought is leading me to a new server type in server.conf that would be be p4replica with a p4replica class son of p4d. Checkpoint and RotateJournal being overridden.
	Reply ·0

	Flag as Task Nick Poole commented 9 years ago Hi @lbarbier, The issue with compressing only the journal (not the checkpoint) is that there's no p4d flag to do that. We could compress the rotated journal afterwards, or uncompress the checkpoint, but I think it's an inadvisable enough case that we don't need to worry about it. I'll go ahead and make the changes in the review; I didn't want to go ahead without checking first, in case I caused offence. As for checkpointing replica servers, that could be tricky, but it should be possible to identify if a server is acting as a replica and do something appropriate. I'll do some research and let you know what I come up with.
	Reply ·0

	Flag as Task lbarbier commented 9 years ago (edited) Hi @nick_poole , Feel free to look at this : https://swarm.workshop.perforce.com/changes/16575 It's a first draft on adding a p4replica server type. I'm not extremely happy with how I made it, I didn't hade time yet to try it, at least it compile. It should work fine unless I forgot to take care of specific cases. I also added journal compression but not yet error handling on Journal = true & checkpoint = false. No clue yet how to handle it.
	Reply ·0

	Flag as Task Nick Poole commented 9 years ago Hi @lbarbier, Thanks for your continued work on this. I've been thing a bit more about both cases, and I think we might want to consider a different direction: ever since replication was introduced, compressing journals has become a problem. p4dctl's checkpoint and journal commands predate replication (they were part of the original open-source project: //guest/tony_smith/perforce/p4dctl/). I propose that we don't add an option to rotate journals at all; instead, just change the '-z' flag to '-Z'. I still think that the option to enable/disable the flag is a great idea, and this is very similar to your original review. I'll update the review shortly with this proposal. As for supporting replica Perforce Servers, I think that running 'p4 admin checkpoint' from p4dctl might be problematic: p4dctl needs to be configured to run the command as a privileged user, potentially with authentication. Unlike the P4D::Stop() implementation, there's no good fallback for this at the moment. I think we could adapt both P4D::Checkpoint() and P4D::Journal() to attempt to execute the 'p4 admin' commands, and fall back to calling 'p4d -jc/-jj'. If we added some error handling to detect the replica error message we could return a more useful message to the user. To improve the experience even further, I'll log an enhancement request for 'p4d -jc' against a replica to schedule a checkpoint, so that if 'p4dctl checkpoint' falls back to running that command against a replica, it will still succeed.
	Reply ·0

	Flag as Task lbarbier commented 9 years ago Hi @nick_poole, I picked your changes and merged into my version and pushed back into shelve #16575. I think we need to preserve journal rotations. As a simple example : my edge server has 100GB of journal created each day while my commit only has 5GB a week. I want to rotate daily my edge journal so that weekly checkpoint doesn't take an hour and don't explode journal filesystem too. I got my work working without any modification. As you said you need a privileged user but p4dctl already has all the tools it need to do so : by using P4TICKETS and P4USER in the environment section it totally fix the problem. So now I have a replica able to checkpoint and rotate journal. I did noticed that checkpoints weren't made on replicas until master would do a journal rotation. I also did know that journal rotation on master would launch a rotation on replicas It was less funny to discover that launching a replica journal rotation trigger in fact a master journal rotation. Which mean it's pointless to do a rotation of journal on replicas if something is already doing it on master. Also mean you can't rotate whenever you want a replica which IMO is a bad thing. A daily rotation on 2 sites with time difference could potentially make the rotation happen in the middle of the day. So to do a checkpoint on a replica with an automatic task you have first to schedule the checkpoint on the replica and then launch a journal rotation on master, 2 steps need to be done in the right order. With my current knowledge I think the whole checkpoint / rotation on replicas need a buff. Regards, -L
	Reply ·0

	Flag as Task Nick Poole commented 9 years ago Hi @lbarbier, I see your point: if the server is at the end of a chain of servers, then uncompressed journals are just consuming disk space. I'm glad to hear that the 'p4 admin checkpoint' is working in your environment. I'm actually testing out a change at would use that logic for and P4D, falling back to calling 'p4d -jc' if the 'p4 admin' command failed. The behaviour you're seeing with rotations starting at the master, and checkpoints being scheduled is by design. The journal number is stored in the counters tables, and replicated as part of the journal, so in order to ensure that journals and checkpoints match up between replicas, the journal rotation on the master causes journal rotations on the replicas. If you rotated the journal on just a replica, it would have a different journal number to the master. If the journal numbers somehow got out of sync, and you had some sort of failure that required you to restore the master, you would have no reference point to restore and restart the replica. I agree that this could be made better, possibly by improving visibility of which replicas were scheduled to checkpoint, etc, but journal rotations are relatively cheap, so even if you rotated before you scheduled the checkpoint, and again after, I believe the overall time that the process will take will be about the same. I'll update the review tomorrow with an option to compress the journal. I'm also going to move it into the main branch: I'll need to backport if I it's to be included in a patch release. I'll also include the 'p4 admin checkpoint/journal' calls that I mentioned (the only real difference to your shelf is that they are on the P4D class: I'm not convinced that we need another server type, and I'd rather not add unnecessary configuration options). Thanks, Nick
	Reply ·0

commits/16546

Change 16546