Pipeline locking and build failures

12 comments

  • Brett Cave

    It looks like a single-stage pipeline releases the lock when it fails, but if a multi-stage pipeline fails, the lock is not released.


    Also, is there a way to schedule pipelines for execution once the lock is released?

  • Rajesh Muppalla

    Hi Brett,


    A single-stage pipeline instance is considered complete when its only stage passes, fails, or is cancelled, so the lock gets released.


    For a multi-stage pipeline, the lock is not released if any of the intermediate stages fails or is cancelled. However, if the last stage fails or is cancelled, the lock is released because that pipeline instance is considered complete.


    You can schedule a pipeline by calling our pipeline scheduling API. More info here - http://www.thoughtworks-studios.com/go/2.1/help/Pipeline_API.html


    The link also documents the API to release the lock of a locked pipeline.
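

    For reference, both calls can be made with curl. The host, port, and credentials below are placeholders, and the exact endpoint paths should be verified against the 2.1 docs linked above:

        # Trigger (schedule) a run of a pipeline.
        curl -u admin:password -X POST \
          "http://go-server:8153/go/api/pipelines/my-pipeline/schedule"

        # Release the lock on a locked pipeline.
        curl -u admin:password -X POST \
          "http://go-server:8153/go/api/pipelines/my-pipeline/releaseLock"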


    Regards,


    Rajesh

  • Brett Cave

    Yep, noticed that.


    Like I mentioned in my original post, if we want to automate pipeline unlocking, we would have to add a <runif status="failed" /> task to all our jobs that schedules another pipeline (e.g. via curl), which in turn calls releaseLock on the initial pipeline... We found that we could not configure a pipeline without a material, however - is this something that might be available in future releases? (The releasePipelineLock pipeline doesn't need materials, unless the scripts themselves are in source control or another pipeline.)
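

    To make that concrete, the task would look something like this in cruise-config.xml (untested; the server URL, credentials, and job details are placeholders):

        <job name="build">
          <tasks>
            <exec command="make" />
            <!-- On failure, schedule the separate releasePipelineLock
                 pipeline, which in turn calls releaseLock on this one. -->
            <exec command="curl">
              <arg>-u</arg>
              <arg>admin:password</arg>
              <arg>-X</arg>
              <arg>POST</arg>
              <arg>http://go-server:8153/go/api/pipelines/releasePipelineLock/schedule</arg>
              <runif status="failed" />
            </exec>
          </tasks>
        </job>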

  • Rajesh Muppalla

    Hi Brett,


    How are you planning to invoke the scheduling API? Are you planning to create a pipeline for this and trigger it on a timer (cron-based) schedule?


    Please note that the script to release the pipeline lock needs to poll the pipeline status continuously, as you cannot unlock a pipeline while any of its stages is running.
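

    A rough sketch of the kind of polling I mean (host, credentials, and retry interval are placeholders):

        #!/bin/sh
        # Retry releaseLock until it succeeds; the server rejects the
        # call while any stage of the pipeline is still running.
        PIPELINE="my-pipeline"
        for attempt in $(seq 1 60); do
          if curl -sf -u admin:password -X POST \
               "http://go-server:8153/go/api/pipelines/${PIPELINE}/releaseLock"; then
            echo "lock released"
            exit 0
          fi
          sleep 30
        done
        echo "gave up waiting for ${PIPELINE}" >&2
        exit 1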


    If you can explain the exact use case you are trying to solve and how you are planning to go about it, maybe we can figure out something simpler.


    - Rajesh

  • Brett Cave

    Our build processes are based on "Continuous Delivery", which is one of the reasons we are using Go :)


    Part of our build policy is "every commit produces an artifact", so for each revision we have a deployable artifact. In order to schedule one pipeline run per commit, we have had to script a message queuing system. It would be really great to just schedule using a queue in Go and let it handle the backlog. Our queuing system:



    • Polls the releaseLock API for the pipeline. If there is no lock (or the lock is successfully removed) and there are no other builds waiting in the queue, a build is scheduled.

    • Otherwise, the request that will trigger the pipeline for that specific material is added to the queue.

    • Consumes messages from the queue (FIFO). The consumer is cron-based and queries the releaseLock API; a new build is scheduled if there is no lock or the lock is successfully removed (rough sketch of the consumer below).
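

    Simplified, the consumer is something like the following. The spool-directory queue, paths, and credentials here are stand-ins for what we actually run:

        #!/bin/sh
        # Cron-driven FIFO consumer. Each queued file contains the
        # prepared trigger (e.g. a curl call to the scheduling API for
        # one specific material revision).
        QUEUE_DIR=/var/spool/go-triggers
        NEXT=$(ls "$QUEUE_DIR" 2>/dev/null | head -n 1)
        [ -z "$NEXT" ] && exit 0

        # Only fire the next build once the lock is gone (a "no lock
        # held" response has to be treated as success as well).
        if curl -sf -u admin:password -X POST \
             "http://go-server:8153/go/api/pipelines/my-pipeline/releaseLock"; then
          sh "$QUEUE_DIR/$NEXT" && rm "$QUEUE_DIR/$NEXT"
        fi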


    We were going to use inter-pipeline calls directly, but ran into the issue you mentioned.


    Is there an info / status method in the API? I am currently using the releaseLock API to determine the status, but I would like to be able to query without taking action (e.g. if an intermediate stage failed, I would like to get a "Locked by Pipeline/INSTANCE_NUMBER" result without removing the lock). The docs mention that the GET method returns info, but I cannot find any URIs that work with GET.

  • Anush Ramani

    Hi Brett,


    Could you please answer a couple of questions to help us understand your configuration better:



    1. Why do you need the pipeline to be locked, i.e. why do you want only one build happening at any given point in time? Go can handle multiple builds at the same time while correctly propagating the same checkin revision to all stages of the build. Is this constraint due to agent resource limitations? Or do you have deployments as part of your pipeline?

    2. Regarding your "every commit produces an artifact" policy, could you give us some background on why this is? I know that this is one of the principles of CI, but that was primarily because traditional CI systems could not build a particular historic revision on request. With Go, you can perform "bisects" on an ad-hoc basis, i.e. you can pick a particular revision from your SCM and have Go build that specific revision on the fly. So, technically, you have the ability to produce artifacts for every checkin, but you don't necessarily need to do it all the time.

  • Brett Cave

    Hi Anush,


    1. Our pipeline has a number of stages, and due to slow connectivity between our Go server and our externally hosted SCM, only the initial stage has fetchMaterials set to true. So if we run without locking, the same revision is not correctly maintained across stages. This is the reason we enabled locking in the first place. Our deployments are configured in separate pipelines from builds.
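

    For illustration, the shape of our config (names and details are placeholders; I believe the locking attribute is isLocked):

        <pipeline name="build" isLocked="true">
          <materials>
            <svn url="http://scm.example.com/trunk" />
          </materials>
          <!-- Only the first stage updates from the SCM. -->
          <stage name="compile" fetchMaterials="true">
            <jobs>
              <job name="compile">
                <tasks><exec command="make" /></tasks>
              </job>
            </jobs>
          </stage>
          <!-- Later stages reuse the checkout already on the agent. -->
          <stage name="test" fetchMaterials="false">
            <jobs>
              <job name="test">
                <tasks><exec command="make"><arg>test</arg></exec></tasks>
              </job>
            </jobs>
          </stage>
        </pipeline>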


    2. Yes, because of the principles of CI. It also streamlines the process of identifying exactly who broke the build and where. If a build containing multiple revisions breaks, the problem could lie in a commit from any of several developers. By creating a build for every change, we can quickly identify who is responsible for the break and which revision introduced it.


    We do not deploy every commit; we only build and test each one (to achieve a goal similar to the "pre-flight commit" feature that some other CI systems offer).

  • Rajesh Muppalla

    Hi Brett,


    1. Can you let us know how many agents are configured to run this pipeline, and how many jobs each of these stages has? It seems to me that for the setup you describe you could accomplish this with a single agent that updates to the latest revision from the SCM during the first stage and then runs all the remaining stages sequentially against the same SCM revision. In that case you also don't need locking enabled: having a single agent ensures that at any point in time only one instance of the pipeline is actually running, though more than one instance may be scheduled (waiting for an agent to be assigned).


    How slow is the connectivity? The agents do a full checkout (or a clone, in the case of a DVCS) the first time, which may take a while, but subsequently they update (or pull) from the repo, which is faster. To ensure that you don't check out the same repo multiple times on different agents, you can check out the repo once on one agent and copy it to the other agents under agent-installation-dir/pipelines/<pipeline-name>.
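

    For example (the agent installation path and host name are placeholders):

        # Run on the agent that already has the checkout: seed agent2
        # with the same working copy so it can skip the full checkout.
        rsync -a /opt/go-agent/pipelines/my-pipeline/ \
              agent2:/opt/go-agent/pipelines/my-pipeline/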


    2. Although this setup would make it easier to identify "who broke the build", it can potentially increase the feedback time in case of build failures. For example, if 4 developers commit together and your queuing system orders the commits in such a way that the last commit to trigger the pipeline contains the failure, that developer will have to wait for the 3 prior commits to be built before he knows that his commit failed the build. Alternatively, what happens if the first commit breaks the build and the developer needs to make another commit (commit 5) to fix it? If the scheduling queue runs each of the commits (commits 2-4) sequentially, they will all fail, and commit 5 will sit in the queue until then.


    If you are using a DVCS (git or hg) as your SCM, "pre-flight builds" can be implemented in Go. This needs some setup, the instructions for which I can provide in a separate post.


    Can you attach your cruise-config.xml file (please make sure you mask any sensitive information)? Maybe that can help me understand your setup better and hopefully simplify it.


    - Rajesh

  • Brett Cave

    Hi Rajesh,


    We have 1 build agent. We found that without locking, with a pipeline consisting of numerous stages, the pipeline instances are not executed sequentially. Our pipeline has 7 stages. Without locking, and with a single agent, jobs are allocated to the agent in more of a round-robin fashion across multiple instances: stage 1 of instance 1 (s1i1), then stage 2 of instance 1 (s2i1), then a 2nd instance is triggered, which results in s1i2 being next, followed by s3i1, then s2i2, and so on. We had stages from 4 different pipeline instances in rotation at one point, which is when we switched the lock on.

  • Daniel Alexiuc

    Automatic Pipeline Locking should be split into two separate features: "limiting the number of concurrent builds" and "locking the pipeline if the build breaks".


    There are good reasons I don't want multiple concurrent builds running, e.g. our dodgy tests rely on the database schema being in a certain state :)


    But there are also good reasons why I don't want the pipeline to lock when the build goes red. In fact, I still don't understand why it does this at all. It is really frustrating when someone checks in a fix and then forgets to unlock the build so that it can run, especially if the build takes a long time.


    Rajesh, with regard to your second point:


    2. Although this setup would make it easier to identify "who broke the build", it can potentially increase the feedback time in case of build failures. For example, if 4 developers commit together and your queuing system orders the commits in such a way that the last commit to trigger the pipeline contains the failure, that developer will have to wait for the 3 prior commits to be built before he knows that his commit failed the build. Alternatively, what happens if the first commit breaks the build and the developer needs to make another commit (commit 5) to fix it? If the scheduling queue runs each of the commits (commits 2-4) sequentially, they will all fail, and commit 5 will sit in the queue until then.


    I want the build to ALWAYS run if it is idle and there is new code. If multiple developers have committed code, and commit 5 is the fix, then it should batch all the changes together and run a single build, just like it does normally. If you are concerned about the situation where commit 5 has missed the start of the build that included commits 2-4, then that developer can always cancel the running build and re-trigger it so that it includes his changes. This makes far more sense to me than locking the build completely and ignoring any changes that queue up until someone unlocks it. THAT wastes everybody's time.

  • Brandon Liles

    I have to agree with Daniel.

    Please make it easy to limit the pipeline to one build at a time without forcing this to also lock the pipeline whenever there is a build failure.

  • Aditya Sood

    Hi,

    This forum is only for reference. GoCD is now open source and the community has moved here. Please redirect your feature requests here.

    Regards,

    Aditya
