How to Retry Failed Steps in GitHub Action Workflows

Published Nov 29, 2022

Updated Feb 13, 2023

Sometimes things can go wrong in your GitHub Action workflow step(s), and you may want to retry them. In this article, we'll cover two methods for doing this!

Pre-requisites

Git: This should be installed and available on your PATH.
GitHub account: We'll need this to use GitHub Actions.

Initial setup

To follow along, take the following steps to set up your GitHub Actions workflow:

Initialize your git repository

In your terminal, run git init to create an empty git repository or skip this step if you already have an existing git repository.

Create a workflow file

GitHub workflow files are usually .yaml/.yml files that contain a series of jobs and steps to be executed by GitHub Actions. These files often reside in .github/workflows. If the directories do not exist, go ahead and create them. Create a file retry.yml in .github/workflows. For now, the file can contain the following:

name: "Retry action using retry step"
on:
  # This workflow runs when changes are pushed to GitHub
  push:
  # This workflow can also be triggered manually
  workflow_dispatch:
jobs:
  # This name is up to you
  retry-job:
    runs-on: "ubuntu-latest"
    name: My Job
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Print
        run: |
          echo 'Hello'

Testing your workflow

You can test your GitHub Actions workflow by pushing your changes to GitHub and opening the Actions tab of the repository. You can also test locally using act.

Retrying failed steps

Approach 1: Using the retry-step action

The retry-step action (nick-fields/retry) retries failed shell commands, so it fits when the step or series of steps you want to retry consists of shell commands.

If, however, you'd like to retry a step that uses another action, then the retry-step action will NOT work for you. In that case, you may want to try the alternative approach described below.

Modify your workflow file to contain the following:

name: "Retry action using retry step"
on:
  # This workflow runs when changes are pushed to GitHub
  push:
  # This workflow can also be triggered manually
  workflow_dispatch:
jobs:
  # This name is up to you
  retry-job:
    runs-on: "ubuntu-latest"
    name: My Job
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Run a command with retries
        # Use the retry action
        uses: nick-fields/retry@v2
        with:
          max_attempts: 3
          retry_on: error
          timeout_seconds: 5
          # You can specify the shell commands you want to retry here
          command: |
            echo 'some command that would potentially fail'
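
If the command fails because of something transient, such as a network hiccup, it can help to pause between attempts. The step below is a minimal sketch of that idea. It assumes the retry_wait_seconds input described in the nick-fields/retry README; confirm the input names against the README for the version you pin, and treat the values as placeholders.

```yaml
- name: Run a command with retries and a pause between attempts
  uses: nick-fields/retry@v2
  with:
    max_attempts: 3
    retry_on: error
    # Each attempt may run for up to 30 seconds before it is treated as timed out
    timeout_seconds: 30
    # Assumed input: seconds to wait before the next attempt starts
    retry_wait_seconds: 10
    command: |
      echo 'some command that would potentially fail'
```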

Approach 2: Duplicate steps

If you are trying to retry steps that use other actions, the retry-step action may not get the job done. In this case, you can duplicate the step and run the copy conditionally, depending on whether the original step failed.

GitHub provides three attributes that make this possible:

continue-on-error - Setting this to true means that even if the current step fails, the job continues to the next step (by default, a failed step stops the job).
steps.{id}.outcome - where {id} is an id you add to the steps you want to retry. This tells you whether a step failed or not; possible values include 'failure' and 'success'.
if - allows us to conditionally run a step.

name: "Retry action using duplicate steps"
on:
  # This workflow runs when changes are pushed to GitHub
  push:
  # This workflow can also be triggered manually
  workflow_dispatch:
jobs:
  # This name is up to you
  retry-job:
    runs-on: "ubuntu-latest"
    name: My Job
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Some action that can fail
        # You need to specify an id to be able to tell what the outcome of this step was
        id: myStepId1
        # This needs to be true so the job proceeds to the next step on failure
        continue-on-error: true
        # Placeholder: replace with the action you want to retry
        uses: actions/someaction

      # Duplicate of the step that might fail ~ manual retry
      - name: Some action that can fail (retry)
        id: myStepId2
        # Only run this step if the first step failed. We can check this because the first step has an `id`
        if: steps.myStepId1.outcome == 'failure'
        # This needs to be true so the job proceeds to the next step on failure
        continue-on-error: true
        uses: actions/someaction
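
Because both attempts in the workflow above set continue-on-error: true, the job is reported as successful even when the retry fails too. If you would rather have the job fail in that case, one option is a final guard step appended to the same steps list. This is a sketch, not part of the original workflow; it reuses the myStepId2 id from above, and the step name is only illustrative.

```yaml
# Append to the `steps` list of the workflow above
- name: Fail the job if the retry also failed
  # `outcome` is the step result before continue-on-error is applied,
  # so it still reads 'failure' even though the job kept going
  if: steps.myStepId2.outcome == 'failure'
  run: exit 1
```

When the first attempt succeeds, the retry step is skipped, its outcome is not 'failure', and the guard step does not run.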

Bonus: Retrying multiple steps

If you want to retry multiple steps at once, then you can use composite actions to group the steps you want to retry, and then use the duplicate steps approach mentioned above.
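
As a rough sketch of this idea, the steps to retry could live in a local composite action. The path .github/actions/flaky-steps/action.yml, the step names, and the echo commands are all hypothetical placeholders:

```yaml
# .github/actions/flaky-steps/action.yml (hypothetical path)
name: "Flaky steps"
description: "Groups the steps that should be retried together"
runs:
  using: "composite"
  steps:
    - name: First grouped step
      # `run` steps inside a composite action must declare a shell
      shell: bash
      run: echo 'first command that might fail'
    - name: Second grouped step
      shell: bash
      run: echo 'second command that might fail'
```

The workflow can then treat the whole group as a single step and apply the duplicate-steps approach to it:

```yaml
# Fragment of the `steps` list in the workflow file
- name: Checkout repository
  # Local actions are read from the checked-out repository
  uses: actions/checkout@v3

- name: Grouped steps (first attempt)
  id: attempt1
  continue-on-error: true
  uses: ./.github/actions/flaky-steps

- name: Grouped steps (retry)
  # Runs only when the first attempt failed
  if: steps.attempt1.outcome == 'failure'
  uses: ./.github/actions/flaky-steps
```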

Conclusion

How do you decide which approach to use?

If you are retrying a step that consists only of shell commands, then you can use the retry-step action.
If you are retrying a step that needs to use another action, then you can duplicate the step and run the copy conditionally to retry it manually.

About the author(s)

Allan M. Jeremy
Allan Jeremy is a self-proclaimed polymath, entrepreneur and Senior Software Engineer at ThisDot Labs. He is an AI and crypto enthusiast that loves to learn most things tech and business as well as sharing what he learns.
@theallanjeremy @AllanJeremy
