Skip to content

fix(toolkit-lib): hotswap deployments swap all changes since first template of session#1400

Merged
aws-cdk-automation merged 12 commits intomainfrom
hotswap-resource-state
Apr 24, 2026
Merged

fix(toolkit-lib): hotswap deployments swap all changes since first template of session#1400
aws-cdk-automation merged 12 commits intomainfrom
hotswap-resource-state

Conversation

@ShadowCat567
Copy link
Copy Markdown
Contributor

@ShadowCat567 ShadowCat567 commented Apr 20, 2026

Fixes #885

Previously when performing a hotswap deployment, in order to figure out what changed, the CLI would create a diff between the Cloudformation template from the current hotswap deployment and the deployed Cloudformation state.
This strategy had some issues which are outlined in issue #885 when multiple hotswap deployment happen in a row without a Cloudformation deployment between them.

In this PR, a hotswap cache is introduced which saves the Cloudformation template created by a hotswap deployment in a folder called .hotswap-cache in cdk.out. If another hotswap deployment is performed afterwards, it will perform a diff against the template that is in this folder rather than getting the currently deployed Cloudformation template. When a full Cloudformation deployment happens through cdk deploy or if non hotswappable changes are detected when running cdk hotswap --hotswap-fallback, then the template for that stack will be cleared from the cache.
Whenever a hotswap deployment happens and there is no template in the hotswap cache for the current stack, the template from Cloudformation will be diff-ed against to detect hotswappable changes.

Added some integ tests for these changes and enforced that the CCAPI tests which involve Bedrock Agents only run in regions that support Bedrock Agents.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@aws-cdk-automation aws-cdk-automation requested a review from a team April 20, 2026 18:03
Copy link
Copy Markdown
Contributor

@rix0rrr rix0rrr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just based on reading the PR description, why not keep this cache in memory?

Is that to catch cdk deploy --hotswap cases (as opposed to cdk watch) ?

Copy link
Copy Markdown
Contributor

@rix0rrr rix0rrr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it!

Is there a way in which we can write an integ test against this? And have you tested it manually?

* Invalidate the hotswap cache for a stack (e.g. after a full CloudFormation deploy).
*/
export async function invalidateHotswapTemplateCache(assemblyDir: string, stackName: string): Promise<void> {
await fs.remove(cachePath(assemblyDir, stackName));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ force: true } ?

Comment on lines +383 to +384
if (existingHotswapCache(this.stackArtifact.assembly.directory, this.stackArtifact.stackName)) {
await invalidateHotswapTemplateCache(this.stackArtifact.assembly.directory, this.stackArtifact.stackName);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically speaking this is a TOCTOU. It might be rare, but it's a pattern to be cognizant of.

Any type of code that looks like:

if (condition) {
  // 1️⃣
  performActionBasedOnCondition();
}

Is vulnerable to a race condition that causes condition to be false at 1️⃣.

  • For fully in-memory conditions:
    • In a single-threaded application (the common case) that can't happen, and because that is so common that lulls you into forgetting about this problem.
    • For multi-threaded applications, you now need to start paying attention to whether some other thread can concurrently invalidate 1️⃣. In JavaScript, which is single-threaded, that either also can't happen or it will be obvious if it can or cannot due to await statements.
  • If condition is based on an resource that's external to your application, like the filesystem or a web service or what have you -- this is quite relevant!

If you do:

if (fileExists) {
  readFile();
}

Then you are vulnerable to readFile exploding with a FileNotFound error, if a concurrent process happens to delete it between the existence check and the readFile!

A traditional vulnerability in UNIX systems is a vulnerability to a file being replaced with a symlink in between some access check and actually operating on the file, and presto, you now have performed an action on a file you didn't intend to!


Common fixes to this are:

  • In the condition check, return/operate on an unfakeable handle that cannot disappear or change meaning. For example, a file handle instead of a file path.
  • Don't do a predictSuccess + act, instead do an act + recover. Try to perform the action, and if the result is a failure try to determine after the fact whether that error was actually acceptable/expeted and in that case ignore it.

In this case, you can try the removal and catch + ignore any errors. Or you can use a built-in mechanism for that: rm(..., { force: true }).

@ShadowCat567
Copy link
Copy Markdown
Contributor Author

ShadowCat567 commented Apr 21, 2026

Love it!

Is there a way in which we can write an integ test against this? And have you tested it manually?

Yes I have tested manually (and caught several nested stack issues while testing), I'll see about adding an integ test.

@ShadowCat567
Copy link
Copy Markdown
Contributor Author

ShadowCat567 commented Apr 21, 2026

Just based on reading the PR description, why not keep this cache in memory?

Is that to catch cdk deploy --hotswap cases (as opposed to cdk watch) ?

Yes, this solution is more focused on cdk deploy --hotswap cases (have not tested with watch yet). This was the solution I had in mind since the RFC, so I did not consider caching in memory does it make sense for me to look into whether that would work for this?

Have tested with both cdk deploy --hotswap and cdk watch - this works with both which is what I wanted out of this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

(hotswap): Hotswap always deploys all resources changed since last full deployment

4 participants