Staggered "build on all" for jobs
A job can be configured to run on all agents. Subject to agent availability, Go runs these instances concurrently. It would be great if I could run a job on all agents, but staggered: if there is more than one agent, the job would start on agent n+1 only after agent n has finished its run.
Our motivating use case is an HA web application that requires a restart. Concurrent jobs would make the web app unavailable, while a staggered run-on-all would upgrade the app server nodes one at a time.
-
Hi Brett
Getting this feature working would require us to change the core of how Go treats jobs. In Go's model, jobs within a stage run in parallel and stages run sequentially.
We suggest splitting the jobs so that the restart is handled in a separate stage.
Regards,
Ali/Princy
-
To go into a little more detail: jobs are meant to run in parallel; it's a core concept of Go. Changing this would take quite a bit of effort, which is why it hasn't been considered yet. However, you can do something that gets you close to what you want, though not exactly.
You can split the agents you want to run-on-all into two groups by giving half of them a resource, say "resource1", and the other half a different resource, say "resource2". Then set up two stages with identical run-on-all jobs: the first stage runs the job on agents with resource1, and the second stage runs it on agents with resource2. Half of your app servers will be upgraded while the other half keep serving requests, and then the second half will be upgraded.
It's a workaround, for sure. But, it might work for you.
-
Thanks for the suggestions on the workaround - we implemented our own within a day of posting this improvement. It uses a task that calls a wrapper script to execute the command (e.g. service x restart), with a lock file on shared storage for control. No special configuration is needed in Go. Here's a pseudo-code example of how we did it:
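# Lock file name, shared by every agent running this stage of this pipeline run
RUN_ID="${GO_PIPELINE_NAME}-${GO_PIPELINE_COUNTER}-${GO_STAGE_NAME}-${GO_STAGE_COUNTER}"
function do_restart() {
    service x restart
}
# Wait while another agent holds the lock
while [ -f "/nfsshare/appdeploy/${RUN_ID}" ]; do
    sleep 10
done
# Take the lock, recording which host holds it
hostname > "/nfsshare/appdeploy/${RUN_ID}"
do_restart
sleep 10 # give the service 10 seconds to start up
# Release the lock so the next agent can proceed
rm -f "/nfsshare/appdeploy/${RUN_ID}"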
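One caveat with the wrapper script above: the existence check and the creation of the lock file are separate steps, so two agents polling at the same moment could both see no lock and both restart. Below is a rough sketch of a variant that takes the lock with mkdir, which fails if the directory already exists, so the check and the take happen in one step. The names LOCK_ROOT, POLL_SECONDS, acquire_lock, release_lock and staggered_run are made up for illustration and are not part of Go. How reliable mkdir is on shared storage depends on the NFS version and mount options, so treat that as an assumption to verify.

```shell
#!/usr/bin/env bash
# Sketch only: LOCK_ROOT, POLL_SECONDS, acquire_lock, release_lock and
# staggered_run are hypothetical names, not part of Go.

LOCK_ROOT="${LOCK_ROOT:-/nfsshare/appdeploy}"   # shared storage path from the post
POLL_SECONDS="${POLL_SECONDS:-10}"
RUN_ID="${GO_PIPELINE_NAME}-${GO_PIPELINE_COUNTER}-${GO_STAGE_NAME}-${GO_STAGE_COUNTER}"
LOCK_PATH="${LOCK_ROOT}/${RUN_ID}.lock"

acquire_lock() {
    # mkdir fails if the directory already exists, so only one agent gets
    # past this loop at a time. LOCK_ROOT must already exist, otherwise
    # mkdir keeps failing and this loop never ends.
    until mkdir "$LOCK_PATH" 2>/dev/null; do
        sleep "$POLL_SECONDS"
    done
    uname -n > "$LOCK_PATH/owner"   # record the holder's host name for debugging
}

release_lock() {
    rm -rf "$LOCK_PATH"
}

staggered_run() {
    # Run the given command while holding the lock, then release the lock
    # and return the command's exit status.
    local status=0
    acquire_lock
    "$@" || status=$?
    release_lock
    return "$status"
}
```

With these functions sourced, the task would call something like `staggered_run service x restart`. As with the original script, an agent that dies while holding the lock leaves it behind, and the remaining agents wait until someone removes it by hand.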