Convalesco

Current revision: 0.7

Last update: 2020-06-24 20:47:52 +0000 UTC



Parallel stages in Jenkins scripted pipelines

Date: 26/05/2020, 19:25

Category: technology

Revision: 3



Jenkins is an automation tool commonly used for software deployment. Scripted pipelines are written in Groovy and can run stages in parallel, with branches that behave much like threads. To run parallel stages, a map of branch names to closures has to be created and passed as an argument to the parallel step.

Common pitfalls when running parallel stages will be discussed in this article.

Groovy closures

The most common pitfall is Groovy closures. The usual approach for the uninitiated is to build the map of stages with a for loop. For example:

def environments = ["development", "staging", "production"]

stage("deploy to multiple environments") {
    def deployments = [:]
    for (e in environments) {
        deployments[e] = {
            stage(e) {
                // do work
            }
        }
    }
    parallel deployments
}

The equivalent code in plain Groovy, using threads, would be:

def environments = ["development", "staging", "production"]
for (e in environments) {
  Thread.start { println e }
}

Surprisingly, the above example can produce the following output:

production
production
production

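The for loop declares a single variable e that every iteration reuses, and each closure captures the variable itself rather than the value it held when the closure was created. The output therefore depends on what e contains at the moment each thread actually runs. The above list is small. For even slightly bigger lists, results will vary from run to run: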

def environments = ["development", "staging", "production", "testing", "stress", "pre-production", "post-production", "qa"]
for (e in environments) {
  Thread.start { println e }
}

$ groovy test.groovy
production
stress
production
testing
pre-production
qa
qa
post-production
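The capture behaviour can be reproduced without threads, which removes the timing element. The following is a minimal sketch in plain Groovy; the names current and reader are made up for illustration. It shows that a closure reads the variable at call time, not at creation time:

```groovy
// A closure captures the variable itself, not a snapshot of its value.
def current = "development"
def reader = { -> current }

// Reassign the variable after the closure has been created.
current = "production"

// The closure reads the variable when it is called, so it sees the new value.
assert reader() == "production"
println reader()
```

A thread started in a loop is in the same position as reader here: by the time it runs, the loop may already have moved the shared variable to a later element.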

To loop over the whole list safely, Groovy’s each method can be used with a closure. In the example below, every invocation of the closure receives its own parameter e, so each thread captures a distinct variable:

def environments = ["development", "staging", "production"]
environments.each { e -> Thread.start { println e } }

The above code yields the expected result, each environment exactly once (the order may vary between runs):

staging
development
production
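Because thread output order is not deterministic, the result is easier to verify by collecting the values and waiting for the threads to finish. This is a small sketch under that assumption; the synchronized list named seen is a helper introduced here, not part of the original example:

```groovy
def environments = ["development", "staging", "production"]

// Thread-safe list that the worker threads append to.
def seen = Collections.synchronizedList(new ArrayList<String>())

// each closure invocation gets its own 'e', so every thread sees a distinct value.
def threads = environments.collect { e ->
    Thread.start { seen.add(e) }
}

// Wait for all threads before inspecting the result.
threads.each { it.join() }

// Order of arrival varies, so compare sorted copies.
def sortedSeen = new ArrayList<String>(seen).sort()
assert sortedSeen == ["development", "production", "staging"]
println sortedSeen
```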

The same can be achieved with a for loop by declaring a new local variable inside the loop body and capturing that variable in the thread:

def environments = ["development", "staging", "production", "testing", "stress", "pre-production", "post-production", "qa"]
for (e in environments) {
  def b = e
  Thread.start { println b }
}

The above code also prints every environment exactly once, in whatever order the threads happen to run:

production
testing
stress
post-production
qa
staging
pre-production
development

Here is the complete loop in a scripted pipeline:

def environments = ["development", "staging", "production"]

stage("deploy to multiple environments") {
    def deployments = [:]
    environments.each { e ->
        deployments[e] = {
            stage(e) {
                // do work
            }
        }
    }
    parallel deployments
}
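The shape of the deployments map can be checked in plain Groovy, outside Jenkins. In the sketch below the closures only return a string as a stand-in for real stage steps, and collectEntries is used as one possible way to build the map in a single expression. Whether collectEntries behaves the same inside a given Jenkins installation is an assumption that should be verified there:

```groovy
def environments = ["development", "staging", "production"]

// Build a map of branch name -> closure; each closure captures its own 'e'.
def deployments = environments.collectEntries { e ->
    [(e): { -> "deploying to ${e}".toString() }]
}

// In a pipeline this map would be handed to 'parallel'; here the closures are called directly.
assert deployments.keySet().toList() == environments
assert deployments["staging"].call() == "deploying to staging"
deployments.each { name, work -> println work.call() }
```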

Variables must always be declared to keep inconsistencies at bay. Groovy’s documentation is explicit about variable definition:

It is mandatory for variable definitions to have a type or placeholder. If left out, the type name will be deemed to refer to an existing variable (presumably declared earlier).

So even inside a function, def or var must be used when declaring a variable. Let’s take a look at the following example:

def branch(env) {
  //
  // Without 'def' or 'var', b is stored in the script binding and
  // shared by all threads, so concurrent calls can overwrite each
  // other's value and return the wrong branch.
  //
  // b = env <- wrong!
  def b = env // <- right!
  switch(b) {
    case "development":
      return "development-branch"
    case "staging":
      return "staging-branch"
    default:
      return "master"
  }
}

def environments = ["development", "staging", "production"]
for (e in environments) {
  def b = e
  Thread.start {
    // z = branch(b) // <- wrong
    def z = branch(b) // <- right
    printf("environment: ${b}\tbranch: ${z}\n")
  }
}

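Once z has been declared with def, it is local to each closure invocation, so every thread works with its own value. Without def, the assignment z = branch(b) writes to the script binding, a single scope shared by every thread, and the threads overwrite each other's values. Groovy closures are also able to read and modify variables defined outside the closure’s scope, a non-intuitive characteristic that leads to all sorts of issues!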

The proper definition, def variable = <value>, must be used everywhere to avoid the closure-related issues described above.
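The difference between a declared and an undeclared variable can be observed without threads by inspecting the script binding. This is a minimal sketch meant to be run as a plain Groovy script; the function names leaky and scoped and the variable names shared and own are hypothetical:

```groovy
// No 'def': the assignment is stored in the script binding (global to the script).
def leaky(value) {
    shared = value
    return shared
}

// With 'def': the variable is local to this call and disappears afterwards.
def scoped(value) {
    def own = value
    return own
}

leaky("development")
scoped("staging")

// The undeclared variable leaked into the binding; the declared one did not.
assert binding.hasVariable("shared")
assert binding.getVariable("shared") == "development"
assert !binding.hasVariable("own")
println "binding variables: ${binding.variables.keySet()}"
```

Since every thread started from the script sees the same binding, an undeclared variable behaves like a global that all parallel branches write to.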

Workspace isolation

Running stages in parallel speeds up the pipeline considerably. The anticipated trade-off is increased complexity. By default, parallel branches started inside the same node block share a common workspace, which might lead to race conditions.

To achieve isolation, especially in setups with multiple slave nodes, running each stage inside its own node block is advised:

def environments = ["development", "staging", "production"]

stage("deploy to multiple environments") {
    def deployments = [:]
    environments.each { e ->
        deployments[e] = {
            node('slave') { // run deployment(s) on 'slave' nodes, which might be the same node or different ones
                stage(e) {
                    // println e
                }
            }
        }
    }
    parallel deployments
}

Now Jenkins allocates a separate executor for each stage, so the stages may run on different slave nodes. If the scheduler places two of them on the same node, each one still gets its own workspace.
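One way to confirm the isolation is to print where each branch actually runs. The sketch below assumes nodes labelled 'slave' as in the example above and relies on the NODE_NAME and WORKSPACE environment variables that Jenkins sets inside a node block; the stage name and the checks map are made up for illustration:

```groovy
def environments = ["development", "staging", "production"]

stage("inspect workspaces") {
    def checks = [:]
    environments.each { e ->
        checks[e] = {
            node('slave') {
                // Report the node and workspace this branch was given.
                echo "${e}: node=${env.NODE_NAME} workspace=${env.WORKSPACE}"
            }
        }
    }
    parallel checks
}
```

If two branches report the same node, their workspace paths should still differ.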

Stashing files and folders

Most common workflows start by cloning source code. Although it’s possible to run a clone stage on every node, it is not efficient. One way to optimise the workflow is to clone the source code in one stage, create an archive and move the archive to the respective nodes. Jenkins will handle moving archives around transparently:

stage("stash artifacts") {
    stash name: 'artifacts', useDefaultExcludes: false
}

stage("deploy to multiple environments") {
    def deployments = [:]
    environments.each { e ->
        deployments[e] = {
            node('slave') { // run the deployment on a 'slave' node
                // Check if unstash is needed
                def exists = fileExists '.git'
                if (!exists) {
                    unstash 'artifacts'
                }
                stage(e) {
                    // println e
                }
            }
        }
    }
    parallel deployments
}

The useDefaultExcludes: false argument is needed if the .git directory has to be stashed as well, since the default excludes skip it. The second important bit is the following check:

          def exists = fileExists '.git'
          if (!exists) {
            unstash 'artifacts'
          }

Jenkins is going to reuse the original workspace for at least one of the parallel jobs, and the unstash command will fail if the files already exist in the current workspace. The check detects that case and skips the unstash, as the files are already there.
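The stash stage shown earlier presupposes that the source has already been cloned inside a node block. A minimal sketch of that first half could look like the following, assuming the job is configured with an SCM so that checkout scm is available; the stage name "checkout" is made up here:

```groovy
node {
    stage("checkout") {
        // Clone the repository configured for this job (assumes an SCM-backed job).
        checkout scm
    }
    stage("stash artifacts") {
        // Archive the whole workspace, including .git, for the parallel branches.
        stash name: 'artifacts', useDefaultExcludes: false
    }
}
```

The parallel deployment stage then follows, with each branch unstashing 'artifacts' only when its workspace does not already contain the files.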