Hosting Automations with Build Agents
Why use many host when few host do trick?
Right now, there’s something wrong at your job. An obvious gap, an inefficiency, a lesser of two evils. Often it arises from a lack of integration between different internal systems. The problem isn’t inherently hard to solve. It’s a couple dozen or a couple hundred lines of code, really, but you have to deploy that code somewhere.
The high cost of doing things right
Your small automation, integration, or whatever isn’t where the majority of the work is. The 80/20 rule looms large in all things. Let’s say I have a really basic problem to solve: a monorepo with hundreds of devs and thousands of stale PRs. I want to abandon them automatically if they’re older than 30 days.
Now, there’s a good chance you’re thinking this is a dumb problem. You’re right, but also wrong. You might have the luxury of installing OAuth-based tools to your heart’s content, but those of us working in big corporations often don’t. There are extreme access controls on your repositories. If anything can be accessed at all, it probably requires setting up a service account, which is more burdensome and might not be compatible with the existing tool you have in mind.
Or consider this one. You have a legacy system that creates a new pipeline for each branch, duplicating the pipeline from main. This is messy, but it’s also a compliance burden. Over time you will update the pipeline, but all these stale copies are floating around getting flagged. Ideally, then, we’d automatically delete any pipeline that hasn’t been used recently, unless it belongs to an approved branch.
In either hypothetical (by which I mean I’ve faced both of these issues in my job), all you need to do is make a basic REST API call or, better yet, run the product’s CLI. For instance, if these are in ADO:
Abandon old PRs
$prs = az repos pr list `
    --org https://dev.azure.com/msazure `
    --project One `
    --repository Compute-Fabric-HostAgent `
    --status active `
    --top 2000 `
    | ConvertFrom-Json -AsHashTable

# First pass: anything created more than 30 days ago
$oldPrs = ($prs | Where-Object {
    $_.creationDate -lt (Get-Date).AddDays(-30)
}).pullRequestId

$oldPrDetails = $oldPrs | ForEach-Object {
    az repos pr show --id $_ | ConvertFrom-Json -AsHashTable
}

# Second pass: only abandon PRs whose source branch is also idle
$stalePrs = ($oldPrDetails | Where-Object {
    $cutOff = (Get-Date).AddDays(-30)
    # Committer date of the PR's current source commit (requires a local clone)
    $commitTime = git show --no-patch --format=%ci $_.lastMergeSourceCommit.commitId
    $commitTime = [datetime]::ParseExact($commitTime, "yyyy-MM-dd HH:mm:ss K", $null)
    # Might be null, check both dates
    ($_.lastMergeCommit.author.date -lt $cutOff) -and ($commitTime -lt $cutOff)
})

$stalePrs | ForEach-Object { az repos pr update --id $_.pullRequestId --status abandoned }
Delete old generated Pipelines
$pipelines = az pipelines build definition list `
    --org https://dev.azure.com/myorg `
    --project MyProject `
    --repository FooBar `
    --query "[].{id:id, name:name}" `
    | ConvertFrom-Json -AsHashTable `
    | Where-Object { $_.name -eq "LegacyAgent-OfficialBuild" } `
    | Where-Object { $_.id -ne 'id_of_primary_one_to_keep' }

# Grab the most recent run of each candidate pipeline and keep only the
# ones that haven't finished a run in the last two weeks
$staleRuns = $pipelines | ForEach-Object {
    az pipelines runs list `
        --org https://dev.azure.com/myorg `
        --project MyProject `
        --pipeline-ids $_.id `
        --query-order FinishTimeDesc `
        --top 1 `
        | ConvertFrom-Json -AsHashTable
} | Where-Object { $_.finishTime -lt (Get-Date).AddDays(-14) }

$staleRuns | ForEach-Object {
    az pipelines delete `
        --org https://dev.azure.com/myorg `
        --project MyProject `
        --id $_.definition.id `
        --yes
}
Deployment
These scripts are super simple; we just have two core issues:
- We need a place to host the script and run it on a schedule.
- We need to authenticate with our instance.
These are no small hurdles. We need hosting infrastructure and the capacity to update it. It needs to be maintained and compliant: security patches, corporate policy, authenticating the right users, authenticating to the other internal systems that hold the data you need. All of that… for this? Yikes.
Build agents as intended
Build agents have a pretty narrow intended scope. They provide compute resources, orchestration, and access to your internal systems. The inputs tend to be source code / packages, and the outputs are a mix of packages, logs, and API calls. This makes them a great solution for anything that is deployed based on the state of your repo.
One year, for a hackathon, my group wanted to solve a problem: we had no hosted rustdoc for our internal projects. This was annoying, because a hosted experience really is just better. It’s more accessible, there are no time delays, and you don’t need to have a valid build environment set up on your machine, switch branches, etc.
We created two build automations for this issue:
- A repo / service targeting solution
- A package registry targeting solution
Both of these involved a secondary hosting arrangement, but that’s because we needed a website for people to connect to. At the end of the day though, all we needed was the easy one-time setup of a PaaS hosting solution, and we could use build agents to manage all updates. Set and forget.
Service Targeted
This was a straightforward, traditional use of a build pipeline set to trigger on updates to the main branch. Clone the repo, install rust, run cargo doc --no-deps, and publish the result to an Azure Static Web App that was authenticated to only be accessible from our corporate accounts.
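A minimal sketch of that pipeline, assuming rustdoc output lands in the default target/doc folder and that the Static Web App’s deployment token is stored as a pipeline secret (the variable and input names here are illustrative, not our exact production config):

trigger:
  branches:
    include:
      - main

pool:
  vmImage: ubuntu-latest

steps:
  # The repo is checked out automatically as the first step
  - script: |
      rustup update stable
      cargo doc --no-deps
    displayName: Build rustdoc output

  - task: AzureStaticWebApp@0
    displayName: Publish docs to the Static Web App
    inputs:
      app_location: 'target/doc'
      skip_app_build: true
      azure_static_web_apps_api_token: $(DocsSiteDeploymentToken)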
Registry Targeted
This was a more interesting application, and the one that inspired this post and later automations. Here, the goal was a little more ambitious: we wanted a docs.rs-like experience for all our internal libraries. This would need to be hosted as an Azure App Service, because we were going to dramatically exceed file limits.
Here, the build agent’s job was a little more complicated, set to run on an hourly schedule (the delta check is sketched just after this list):
- Clone repo
- Use ADO Feed APIs to scan our internal cargo registry and create a manifest of all packages
- Compare this manifest to the one from our storage account that backed the docs website. Only continue if a delta exists
- Install rust & compile our docgen tool
- Run the docgen tool against the delta. The tool downloads missing crates, runs cargo doc, then stitches that output together into the bigger web structure + adds entries needed for our index / search page.
- Publish the new files to the storage account so they’re accessible on the website.
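The interesting part is the delta check. A rough sketch in PowerShell, assuming the Azure Artifacts “Packages - List” REST endpoint, with the feed name and manifest handling simplified for illustration:

# Illustrative names; the real feed and manifest live elsewhere
$org  = "https://feeds.dev.azure.com/myorg"
$feed = "internal-cargo-registry"

# The pipeline step must map $(System.AccessToken) into SYSTEM_ACCESSTOKEN
$headers  = @{ Authorization = "Bearer $env:SYSTEM_ACCESSTOKEN" }
$packages = (Invoke-RestMethod `
    -Uri "$org/_apis/packaging/feeds/$feed/packages?includeAllVersions=true&api-version=7.1-preview.1" `
    -Headers $headers).value

# Flatten the feed into a "name@version" manifest
$current = $packages | ForEach-Object {
    $name = $_.name
    $_.versions | ForEach-Object { "$name@$($_.version)" }
} | Sort-Object

# Compare against the manifest backing the docs site (pulled from the
# storage account earlier in the pipeline)
$published = Get-Content "manifest.txt"
$delta = $current | Where-Object { $_ -notin $published }

# Let later pipeline steps condition on whether there's anything to do
Write-Host "##vso[task.setvariable variable=HasDelta]$([bool]$delta)"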
If you look at the implementation of docs.rs, it’s way more involved than this. That makes sense; it has way more features and operates at a way bigger scale. At the time, it was also missing a few minor configuration features we needed, and its S3 dependency made a low-effort hosting solution unavailable. So what did we do? The bare minimum of scripting and tooling to create an 80% equivalent, then use build agents to solve all the hosting for that job system. Ez.
Scripting sweet spot
When we outlined the feature set of build agents, it should have sounded familiar. Their capabilities overlap 1:1 with our requirements for deploying our little scripts. Originally, the challenge with solving these problems was that they exist in an awkward middle ground. They aren’t so large as to justify building a “real service”, but they aren’t so small that they can be (reasonably) managed manually with occasional script runs by an administrator.
This tool is already there right now. It’s easy to use, and the additional agent time cost you’re adding is minuscule. If you’re big enough to hit a lot of this type of problem, it’s probably even mostly free (as in, you already have agent pools with some amount of idle time to recoup).
Moreover, it ties into your existing workflow perfectly. Your automations exist in your repo, just like the rest of your service and (hopefully) devops related scripts + configuration.
Build Agents to the rescue
Revisiting our scripts above then, the task before us is straightforward. We just need to wrap the operations in a YAML file that defines a new pipeline / workflow and set a schedule. Run it hourly, daily, whatever. There will already be auth tokens available for us to add to the command, or better yet, a pre-built wrapper task for your system. Continuing with the ADO example, the PR automation could look like this:
trigger: none

schedules:
  - cron: '0 0 * * *'
    displayName: Nightly old PR purge
    branches:
      include:
        - main
    always: true

pool:
  vmImage: ubuntu-latest

steps:
  - task: AzureCLI@2
    inputs:
      azureSubscription: 'Your existing dev-test sub here'
      scriptType: 'pscore'
      scriptLocation: 'inlineScript'
      inlineScript: |
        $prs = az repos pr list `
            --org https://dev.azure.com/myorg `
            --project MyProject `
            --repository FooBar `
            | ConvertFrom-Json -AsHashTable
        $stalePrs = ($prs | Where-Object {
            $_.creationDate -lt (Get-Date).AddDays(-30)
        }).pullRequestId
        $stalePrs | ForEach-Object { az repos pr update --id $_ --status abandoned }
Increasing complexity
There’s really no limit to what you can accomplish with this type of system. My most recent application was to solve a classic example of big-corporate nonsense. Microsoft, like other big tech companies, has massive internal-only engineering systems for building its biggest projects. These systems are bespoke; they expect you to onboard in a very specific way using very specific tools.
As a part of Azure Boost, there’s a new system with first-class support for rust. But it’s only meant for managing builds targeting the hyper-specific custom embedded hardware that is Azure Boost. The existing systems for our existing software don’t support rust natively yet. We want to write our code once and have it work in both worlds. We also want all the benefits of using open source standard tooling, so anything we add on should be a compatibility layer for CI; it should never impact the local dev loop experience.
One issue we had to solve with both systems is that they didn’t have a way to invoke cargo, nor to download crates (be it from our crates.io mirror or from our internal registries). The initial solution? Vendoring the crates. This was a nightmare: millions of extra lines in the repo, massive PR diffs, lower security / auditability, and lots of false-positive compliance tickets being created. A less bad, but still painful, alternative was that people could vendor, wrap that output in a NuGet package, and upload that. This had size limit issues and manual steps.
Having grown tired of this, I used build agents for an ugly, yet wonderful workaround. The build agent runs every 10 minutes. It has a list of registries it’s responsible for mirroring. It uses ADO API calls to determine any deltas. If any are found, the crates are downloaded. The raw .crate file and an extracted “pre-vendored” version are added to a NuGet package named <source_feed>.<crate_name>, with a version number matching that of the crate. These are uploaded to a new mirror feed that existing source feeds can take as an upstream.
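The packaging half of that loop looks roughly like the sketch below. The crate, paths, and feed names are made up for illustration, and it assumes the mirror feed is already registered as an authenticated NuGet source on the agent:

# One iteration of the mirror loop, for a single new crate version
$sourceFeed   = "crates-io-mirror"             # which registry the crate came from
$mirrorFeed   = "cargo-nuget-mirror"           # the feed existing feeds take as an upstream
$crateName    = "serde"
$crateVersion = "1.0.200"
$crateFile    = "downloads/serde-1.0.200.crate"

$pkgId   = "$sourceFeed.$crateName"
$staging = "staging/$pkgId"
New-Item -ItemType Directory -Path $staging -Force | Out-Null

# Ship both the raw .crate and a pre-extracted ("pre-vendored") copy
Copy-Item $crateFile $staging
tar -xzf $crateFile -C $staging                # a .crate file is just a gzipped tarball

# Minimal nuspec so the whole staging folder gets packed as content
@"
<?xml version="1.0"?>
<package>
  <metadata>
    <id>$pkgId</id>
    <version>$crateVersion</version>
    <authors>crate-mirror-bot</authors>
    <description>Mirror of cargo crate $crateName $crateVersion</description>
  </metadata>
  <files>
    <file src="**" target="content" />
  </files>
</package>
"@ | Set-Content "$staging/$pkgId.nuspec"

nuget pack "$staging/$pkgId.nuspec" -OutputDirectory packed
# Any non-empty api-key works here; real auth comes from the agent's feed credentials
dotnet nuget push "packed/$pkgId.$crateVersion.nupkg" --source $mirrorFeed --api-key AzureDevOps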
Is it a bit derpy? You bet. Was it super easy to set up in spare time over a few days? Also yes. Does it work? Perfectly.
Now, in our other repos, our bootstrap scripts check whether they’re running in the related CI builds. If so, they can source the crates from these NuGet dependencies (which are added to those files automatically via git hooks based on rust lock files). Then either the .crate files can be placed in the cargo home registry cache, or a vendor folder can be generated symlinking against each pre-extracted crate.
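For the vendor-folder flavor, the bootstrap step is roughly this sketch (the package layout and paths are invented for illustration; the real tooling also emits the .cargo-checksum.json files that cargo’s directory sources require):

# Each mirrored NuGet package was restored to packages/<feed>.<crate>/<version>/content
$vendorDir = "vendor"
New-Item -ItemType Directory -Path $vendorDir -Force | Out-Null

Get-ChildItem "packages" -Directory | ForEach-Object {
    # The pre-extracted crate sits under content/<crate>-<version> inside each package
    $extracted = Get-ChildItem "$($_.FullName)/*/content/*" -Directory | Select-Object -First 1
    if ($extracted) {
        # Symlink rather than copy so building the vendor folder stays cheap
        New-Item -ItemType SymbolicLink `
            -Path (Join-Path $vendorDir $extracted.Name) `
            -Target $extracted.FullName | Out-Null
    }
}

# Point cargo at the vendor folder for this CI build only
# (internal registry sources get replaced the same way)
New-Item -ItemType Directory -Path ".cargo" -Force | Out-Null
@"
[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "vendor"
"@ | Set-Content ".cargo/config.toml"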
I know this isn’t an example of a “good solution”. It will be killed off when first-class support for rust arrives in the two internal ES systems we’re dependent on. But, at extremely low cost, it has solved many problems, at scale, within Azure. There are hundreds of engineers no longer facing these day-to-day inefficiencies. Millions of lines of sketchy, impossible-to-review diffs have been cut out of repositories. And it was only feasible because build agents were available as a way to automatically create the necessary mirrors for all crates used within the organization, at low development and ongoing cost.