Git polling causing excessive load through frequent submodule cloning

Comments

  • janmejay

    Hi Andy,


    The Go server performs a material check every minute and, the first time, uses a UUID as the destination directory for 'git clone'. From the second time onwards, Go does not clone again; instead it reuses the existing clone and performs a fetch over it.


    The 'wipe the directory and perform a fresh clone' flow only engages when a different git repository (one with a different URL) is already cloned into the destination directory.


    This can happen in a few scenarios: for instance, if there is a UUID conflict (a new repo trying to use a UUID that is already used by another repo), or if the destination is not writable by the user that Go is running as.
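    The decision described above can be modelled roughly as follows. This is a minimal Python sketch, not Go's actual code: `ExistingCheckout` and `plan_material_check` are hypothetical names, and only the URL comparison is modelled (the unwritable-destination case is left out).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExistingCheckout:
    """Hypothetical snapshot of the UUID-named destination directory."""
    exists: bool
    remote_url: Optional[str] = None  # URL of the repo already cloned there, if any


def plan_material_check(configured_url: str, checkout: ExistingCheckout) -> str:
    """Return "clone", "fetch" or "wipe_and_clone" (illustrative model only)."""
    if not checkout.exists:
        # First check: the destination directory is empty, so clone into it.
        return "clone"
    if checkout.remote_url == configured_url:
        # Same repository already present: reuse it and fetch incrementally.
        return "fetch"
    # A repo with a different URL occupies the directory (e.g. after a UUID
    # conflict), so wipe it and clone from scratch.
    return "wipe_and_clone"


if __name__ == "__main__":
    url = "https://example.com/project.git"  # hypothetical material URL
    print(plan_material_check(url, ExistingCheckout(exists=False)))  # clone
    print(plan_material_check(url, ExistingCheckout(True, url)))  # fetch
    print(plan_material_check(url, ExistingCheckout(True, "https://example.com/other.git")))  # wipe_and_clone
```

    Only the third case throws away the existing clone, which is why it should be rare in normal operation.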


    Can you please enable debug logging? To do so, modify /etc/go/log4j.properties (or the equivalent for your installation environment), changing the line 'log4j.logger.com.thoughtworks.cruise=INFO' to 'log4j.logger.com.thoughtworks.cruise=DEBUG', and then restart the server.
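    If you prefer to script the log4j change, here is a minimal Python sketch. `enable_cruise_debug` is a hypothetical helper; it assumes the logger line looks like the one quoted above (any appender list after `INFO` is kept), and it only rewrites text, so the server restart is still up to you.

```python
import re

# The logger line quoted above; anything after INFO (e.g. appenders) is kept.
_CRUISE_INFO = re.compile(
    r"^([ \t]*log4j\.logger\.com\.thoughtworks\.cruise[ \t]*=[ \t]*)INFO\b",
    re.MULTILINE,
)


def enable_cruise_debug(properties_text: str) -> str:
    """Switch the com.thoughtworks.cruise logger from INFO to DEBUG.

    Other lines, including the root logger, are left untouched.
    """
    return _CRUISE_INFO.sub(r"\g<1>DEBUG", properties_text)


if __name__ == "__main__":
    # Hypothetical sample; read your real log4j.properties instead.
    sample = "log4j.rootLogger=INFO\nlog4j.logger.com.thoughtworks.cruise=INFO\n"
    print(enable_cruise_debug(sample), end="")
```

    To apply it, read your log4j.properties, pass the contents through `enable_cruise_debug`, and write the result back before restarting.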


    Please let the server run for some time (a few hours) after doing this, and then send us the server log for that period.


    Regards,
    Janmejay 

  • Andy Thompson

    Hi Janmejay


    I've got logs for an hour of debug activity. How would you like me to pass them on?


    Regarding what I meant by cloning: Go doesn't call git clone directly but, as described in http://community.thoughtworks.com/posts/45dcafc35c, goes through each submodule and deletes its data, then runs


    git submodule update --init


    which internally runs "git clone -n {submodule url} {submodule dir}" for each submodule (because Go has deleted the original).
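    The per-poll sequence described above can be sketched as a list of commands (built, not executed). `rm -rf` is an assumed stand-in for however Go actually deletes the submodule data, and `submodule_refresh_commands` is a hypothetical helper. With `wipe_first=False`, submodules that are already cloned stay in place, so `git submodule update --init` does not need to clone them again.

```python
from typing import List


def submodule_refresh_commands(submodule_paths: List[str], wipe_first: bool = True) -> List[List[str]]:
    """Build (but do not run) the commands for one poll, as described above.

    `rm -rf` is an assumed stand-in for however Go deletes the submodule data.
    Once a submodule directory is gone, `git submodule update --init` has to
    clone that submodule again from scratch.
    """
    commands: List[List[str]] = []
    if wipe_first:
        # One delete per submodule, mirroring "goes through each submodule".
        commands.extend(["rm", "-rf", path] for path in submodule_paths)
    commands.append(["git", "submodule", "update", "--init"])
    return commands


if __name__ == "__main__":
    for cmd in submodule_refresh_commands(["libs/a", "libs/b"]):  # hypothetical paths
        print(" ".join(cmd))
```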


    This can cause hundreds of MB of data to be transferred over the network or filesystem every minute, and the amount grows as the submodules gain history.
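    To put rough numbers on that, here is a back-of-the-envelope estimate in Python. The submodule sizes are hypothetical, and the sketch assumes every poll re-clones every submodule in full while ignoring compression and protocol overhead.

```python
from typing import Iterable

MINUTES_PER_DAY = 24 * 60


def daily_transfer_mb(submodule_sizes_mb: Iterable[float], poll_interval_minutes: int = 1) -> float:
    """Estimate MB transferred per day if every poll re-clones all submodules in full."""
    polls_per_day = MINUTES_PER_DAY // poll_interval_minutes
    return sum(submodule_sizes_mb) * polls_per_day


if __name__ == "__main__":
    # Hypothetical: three submodules of 50, 100 and 150 MB, polled every minute.
    print(daily_transfer_mb([50, 100, 150]))  # 432000 (MB, i.e. about 432 GB/day)
```

    With the one-minute interval that is 1,440 full re-clones per day, so even a modest 300 MB of submodules adds up to roughly 432 GB of transfer daily.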


    If the submodules are removed from the repo (so they are no longer cloned), the polling causes barely any load, since only incremental fetches are made to the repository.


    Kind Regards


    Andy

  • Matthew Skelton

    Andy

    We faced similar issues with git submodules. I understand that a fix for the git submodule behaviour is coming in Go 12.4. In versions prior to 12.4, you can work around the issue by adding an extra config line to the wrapper config file, like this:

    wrapper.java.additional.XX=-Dmaterial.check.threads=4

    (where XX is a unique sequential config number)
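    To pick XX, one option is to take the highest existing `wrapper.java.additional.<N>` index and add one, which keeps the numbering unique and sequential. A minimal Python sketch (the helper names are hypothetical; it only inspects text and does not edit your wrapper config):

```python
import re

# Matches active (non-commented) wrapper.java.additional.<N>= lines.
_ADDITIONAL = re.compile(r"^\s*wrapper\.java\.additional\.(\d+)\s*=", re.MULTILINE)


def next_additional_index(wrapper_conf: str) -> int:
    """Return the next free wrapper.java.additional index (highest existing + 1)."""
    indices = [int(m.group(1)) for m in _ADDITIONAL.finditer(wrapper_conf)]
    return max(indices, default=0) + 1


def material_check_threads_line(wrapper_conf: str, threads: int = 4) -> str:
    """Build the workaround line using the next free index."""
    n = next_additional_index(wrapper_conf)
    return f"wrapper.java.additional.{n}=-Dmaterial.check.threads={threads}"


if __name__ == "__main__":
    # Hypothetical existing entries; use the contents of your own wrapper config.
    conf = "wrapper.java.additional.1=-Xms512m\nwrapper.java.additional.2=-Xmx1024m\n"
    print(material_check_threads_line(conf))
```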

    This reduces the number of simultaneous material check subprocesses to 4 (instead of the default 10). Result: disk activity is reduced, and pipeline trigger speed is not noticeably affected. You could try other values that work better for you.
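    The effect of the setting can be illustrated with a bounded thread pool: however many materials are queued, at most `threads` checks run at once. This is a Python sketch of the idea only, assuming nothing about how Go implements `material.check.threads` internally; the sleep stands in for a git subprocess.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MATERIAL_CHECK_THREADS = 4  # mirrors -Dmaterial.check.threads=4 (illustrative)


class ConcurrencyTracker:
    """Records how many simulated material checks run at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1
        return False


def run_material_checks(materials, threads=MATERIAL_CHECK_THREADS):
    """Check every material using at most `threads` workers; return results and peak."""
    tracker = ConcurrencyTracker()

    def check(material):
        with tracker:
            time.sleep(0.01)  # stand-in for a git fetch / submodule update subprocess
        return material

    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(check, materials))
    return results, tracker.peak


if __name__ == "__main__":
    materials = [f"repo-{i}" for i in range(20)]  # hypothetical materials
    _, peak = run_material_checks(materials)
    print(f"peak concurrent checks: {peak}")
```

    Lowering `threads` caps peak concurrency (and hence simultaneous disk and network activity) at the cost of working through the queue more slowly.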

    Matthew
