The fool doth think he is wise, but the wise man knows himself to be a fool.
W. Shakespeare, As You Like It, 1603 A.D.
Jenkins is an automation tool commonly used for software deployment. Scripted pipelines use Groovy threads to run parallel stages: a map of stage closures has to be created and passed as an argument to the parallel keyword.
This article discusses common pitfalls when running parallel stages.
The most common pitfall is Groovy closures. The uninitiated usually build the map of stages with a for loop. For example:
def environments = ["development", "staging", "production"]
stage("deploy to multiple environments") {
    def deployments = [:]
    for (e in environments) {
        deployments[e] = {
            stage(e) {
                // do work
            }
        }
    }
    parallel deployments
}
The equivalent plain Groovy code would be:
def environments = ["development", "staging", "production"]
for (e in environments) {
    Thread.start { println e }
}
Surprisingly, the above example produces the following output:
production
production
production
The variable e is shared by all closures: each closure captures the variable itself, not its value, so it keeps mutating as the for loop advances. By the time a thread actually runs, e usually holds the last value. The above array is small; for even slightly bigger arrays, results will vary based on the value the variable holds at thread launch:
def environments = ["development", "staging", "production", "testing", "stress", "pre-production", "post-production", "qa"]
for (e in environments) {
    Thread.start { println e }
}
$ groovy test.groovy
production
stress
production
testing
pre-production
qa
qa
post-production
To iterate over the whole array reliably, Groovy's each method can be used alongside closures. The example below binds each element to a new variable:
def environments = ["development", "staging", "production"]
environments.each { e -> Thread.start { println e } }
The above code yields the expected result:
staging
development
production
The same can be achieved with a for loop, provided a new variable is defined inside the loop before the thread is launched:
def environments = ["development", "staging", "production", "testing", "stress", "pre-production", "post-production", "qa"]
for (e in environments) {
    def b = e
    Thread.start { println b }
}
The above code yields the expected result:
production
testing
stress
post-production
qa
staging
pre-production
development
Here is the complete loop in a scripted pipeline:
def environments = ["development", "staging", "production"]
stage("deploy to multiple environments") {
    def deployments = [:]
    environments.each { e ->
        deployments[e] = {
            stage(e) {
                // do work
            }
        }
    }
    parallel deployments
}
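As a more compact alternative (a sketch, not part of the original recipe), Groovy's collectEntries can build the same map in a single expression; the stage body below is a placeholder:

```groovy
def environments = ["development", "staging", "production"]
stage("deploy to multiple environments") {
    // collectEntries builds the [name: closure] map directly;
    // the closure parameter e is a fresh variable on every iteration,
    // so no mutable-variable capture issues arise
    def deployments = environments.collectEntries { e ->
        [(e): {
            stage(e) {
                // do work
            }
        }]
    }
    parallel deployments
}
```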
Variables must be defined in all cases to keep inconsistencies at bay. Groovy's documentation is explicit about variable definitions:
It is mandatory for variable definitions to have a type or placeholder. If left out, the type name will be deemed to refer to an existing variable (presumably declared earlier).
So def or var must be used even when assigning inside a function, or when storing the result of a function call. Let's take a look at the following example:
def branch(env) {
    //
    // The function will always return the same result
    // when run on a thread, if 'def' or 'var' is not used!
    //
    // b = env <- wrong!
    def b = env // <- right!
    switch (b) {
        case "development":
            return "development-branch"
        case "staging":
            return "staging-branch"
        default:
            return "master"
    }
}
def environments = ["development", "staging", "production"]
for (e in environments) {
    def b = e
    Thread.start {
        // z = branch(b) // <- wrong
        def z = branch(b) // <- right
        printf("environment: ${b}\tbranch: ${z}\n")
    }
}
Once z has been defined with def, each thread holds its own local value instead of sharing one that every thread overwrites. This happens because Groovy closures can access variables defined outside the closure's scope prior to execution, a non-intuitive characteristic that leads to all sorts of issues!
The proper definition, def variable = <value>, must be used everywhere to avoid the closure-related issues described above.
Running stages in parallel speeds up the pipeline considerably. The anticipated trade-off is increased complexity. By default, Jenkins runs the parallel stages in a common workspace, which might lead to race conditions.
To achieve isolation, especially across multiple slave nodes, running each stage on a slave node is advised:
def environments = ["development", "staging", "production"]
stage("deploy to multiple environments") {
    def deployments = [:]
    environments.each { e ->
        deployments[e] = {
            node('slave') { // run deployment(s) on 'slave' nodes, which may be the same or different
                stage(e) {
                    // println e
                }
            }
        }
    }
    parallel deployments
}
Now Jenkins will run each stage on a slave node. Even if the scheduler places two branches on the same node, each one gets a different workspace.
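To see the allocation for yourself, a sketch like the following (hypothetical, not part of the original recipe) can echo the workspace assigned to each branch; branches sharing a node typically get suffixed directories such as workspace@2:

```groovy
def environments = ["development", "staging", "production"]
def checks = [:]
environments.each { e ->
    checks[e] = {
        node('slave') {
            // env.WORKSPACE holds the workspace allocated to this node step
            echo "branch ${e} runs in: ${env.WORKSPACE}"
        }
    }
}
parallel checks
```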
Most common workflows start by cloning source code. Although it is possible to run a clone stage on every node, it is not efficient. One way to optimise the workflow is to clone the source code in one stage, create an archive, and move the archive to the respective nodes. Jenkins will handle moving archives around transparently:
stage("stash artifacts") {
    stash name: 'artifacts', useDefaultExcludes: false
}
stage("deploy to multiple environments") {
    def deployments = [:]
    environments.each { e ->
        deployments[e] = {
            node('slave') { // run the deployment on a 'slave' node
                // Check if unstash is needed
                def exists = fileExists '.git'
                if (!exists) {
                    unstash 'artifacts'
                }
                stage(e) {
                    // println e
                }
            }
        }
    }
    parallel deployments
}
The useDefaultExcludes: false argument is useful if the .git directory needs to be stashed as well. The second important bit is the following check:
def exists = fileExists '.git'
if (!exists) {
    unstash 'artifacts'
}
Jenkins is going to use the same workspace for at least one of the parallel jobs, and the unstash command will fail if the files already exist in the current workspace. The check ensures that unstash is skipped when the files are already there.
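Putting the pieces together, a complete pipeline might look like the sketch below. The node labels and the checkout step are assumptions; adapt them to the actual job configuration:

```groovy
def environments = ["development", "staging", "production"]

node('master') {
    stage("clone") {
        checkout scm // assumes the job is configured with an SCM
    }
    stage("stash artifacts") {
        stash name: 'artifacts', useDefaultExcludes: false
    }
}

stage("deploy to multiple environments") {
    def deployments = [:]
    environments.each { e ->
        deployments[e] = {
            node('slave') {
                // unstash only when the workspace is empty
                if (!fileExists('.git')) {
                    unstash 'artifacts'
                }
                stage(e) {
                    // do work
                }
            }
        }
    }
    parallel deployments
}
```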