Skip to content

Layers -15 Refactors validation logic to improve code quality and provide better error message#1847

Open
bhanvimenghani wants to merge 12 commits intokruize:mvp_demofrom
bhanvimenghani:fix_layer_optional_fields_null
Open

Layers -15 Refactors validation logic to improve code quality and provide better error message#1847
bhanvimenghani wants to merge 12 commits intokruize:mvp_demofrom
bhanvimenghani:fix_layer_optional_fields_null

Conversation

@bhanvimenghani
Copy link
Copy Markdown
Contributor

@bhanvimenghani bhanvimenghani commented Mar 12, 2026

Description

  • Fixed optional field handling: details field now correctly handles null and missing values using optString()
  • Added getRequiredString() helper method: DRY validation for required string fields with consistent error messages
    • Validates field existence, null checks, and empty string validation
    • Provides clear, specific error messages

Fixes # (issue)

Type of change

  • Bug fix
  • New feature
  • Docs update
  • Breaking change (What changes might users need to make in their application due to this PR?)
  • Requires DB changes

How has this been tested?

Please describe the tests that were run to verify your changes and steps to reproduce. Please specify any test configuration required.

  • New Test X
  • Functional testsuite

Test Configuration

  • Kubernetes clusters tested on:

Checklist 🎯

  • Followed coding guidelines
  • Comments added
  • Dependent changes merged
  • Documentation updated
  • Tests added or updated

Additional information

Include any additional information such as links, test results, screenshots here

Summary by Sourcery

Add full CRUD support and stricter validation for layer management APIs while expanding test coverage and simplifying test cleanup.

New Features:

  • Introduce updateLayer and deleteLayer REST endpoints for managing existing layers.
  • Return structured JSON responses with status and httpcode for both successful and error layer API calls.
  • Validate create and update layer requests for required fields, value types, and array contents, including tunables, queries, and labels.

Bug Fixes:

  • Ensure optional fields like details are handled correctly when missing or null in layer creation.
  • Return appropriate 4xx errors for invalid or empty layer JSON bodies and invalid query parameters.
  • Align validation and error messages for tunable bounds, steps, and configuration to remove redundant prefixes and mismatches with tests.

Enhancements:

  • Refine layer creation logic with shared helpers for required string fields and richer validation feedback.
  • Track database update and delete operations for layers with new metrics timers.
  • Refactor internal success and error response handling into a common JSON response helper for consistency across layer APIs.
  • Update tests to use the layer delete API and cleanup helpers instead of direct DB manipulation, improving isolation and readability.

Tests:

  • Expand createLayer negative tests to cover invalid API structure, query, label, and tunable configurations, including new error messages.
  • Add tests for optional field handling in createLayer requests.
  • Add listLayers tests for performance, case sensitivity, sorting behavior, and additional special-character edge cases in layer names.
  • Adjust existing create/list layer tests to use API-based cleanup and validate new error/response semantics.

@bhanvimenghani bhanvimenghani self-assigned this Mar 12, 2026
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Mar 12, 2026

Reviewer's Guide

Adds full CRUD support and significantly enhances validation and error handling for the Layer API, updates DB/metrics plumbing, and refactors tests to exercise new behaviors and messages.

Sequence diagram for updated createLayer API flow

sequenceDiagram
    actor Client
    participant LayerService
    participant Converters
    participant LayerValidation
    participant ExperimentDAOImpl
    participant MetricsConfig

    Client->>LayerService: HTTP POST /createLayer (JSON body)
    LayerService->>LayerService: read request body
    alt Empty body
        LayerService->>LayerService: sendErrorResponse(..., INVALID_LAYER_JSON)
        LayerService->>Client: JSON error response 400
    else Non empty body
        LayerService->>Converters: convertInputJSONToCreateLayer(inputData)
        alt Invalid JSON or missing fields
            Converters-->>LayerService: throws IllegalArgumentException or JSONException
            LayerService->>LayerService: catch IllegalArgumentException
            LayerService->>LayerService: sendErrorResponse(..., 400, validation message)
            LayerService->>Client: JSON error response 400
        else Valid KruizeLayer
            Converters-->>LayerService: KruizeLayer
            LayerService->>LayerValidation: validate(kruizeLayer)
            LayerValidation-->>LayerService: ValidationOutputData
            alt Validation fails
                LayerService->>LayerService: sendErrorResponse(..., 400, VALIDATION_FAILED)
                LayerService->>Client: JSON error response 400
            else Validation succeeds
                LayerService->>ExperimentDAOImpl: addLayerToDB(layerEntry)
                ExperimentDAOImpl->>MetricsConfig: start timerAddLayerDB
                ExperimentDAOImpl-->>LayerService: ValidationOutputData
                alt DB add succeeds
                    LayerService->>LayerService: sendSuccessResponse(message, 201)
                    LayerService->>Client: JSON success response 201
                else DB add fails
                    LayerService->>LayerService: sendErrorResponse(..., 500, ADD_LAYER_TO_DB_FAILURE)
                    LayerService->>Client: JSON error response 500
                end
            end
        end
    end

    note over LayerService: sendSuccessResponse/sendErrorResponse now delegate to sendJsonResponse for uniform JSON structure
Loading

Sequence diagram for new updateLayer API flow

sequenceDiagram
    actor Client
    participant LayerService
    participant Converters
    participant LayerValidation
    participant ExperimentDAOImpl
    participant MetricsConfig

    Client->>LayerService: HTTP PUT /updateLayer?name={layerName}
    LayerService->>LayerService: getParameter(LAYER_NAME)
    alt Missing or empty name param
        LayerService->>LayerService: sendErrorResponse(..., INVALID_LAYER_NAME)
        LayerService->>Client: JSON error response 400
    else Has name param
        LayerService->>LayerService: validate query params against UPDATE_LAYERS_QUERY_PARAMS_SUPPORTED
        alt Unsupported query params
            LayerService->>LayerService: sendErrorResponse(..., INVALID_QUERY_PARAMS)
            LayerService->>Client: JSON error response 400
        else Query params valid
            LayerService->>LayerService: read request body
            alt Empty body
                LayerService->>LayerService: sendErrorResponse(..., INVALID_LAYER_JSON)
                LayerService->>Client: JSON error response 400
            else Non empty body
                LayerService->>Converters: convertInputJSONToCreateLayer(inputData)
                alt Converter throws IllegalArgumentException
                    Converters-->>LayerService: exception
                    LayerService->>LayerService: sendErrorResponse(..., 400, VALIDATION_FAILED)
                    LayerService->>Client: JSON error response 400
                else Valid KruizeLayer
                    Converters-->>LayerService: KruizeLayer
                    LayerService->>LayerService: compare URL name vs kruizeLayer.getLayerName()
                    alt Names mismatch
                        LayerService->>LayerService: sendErrorResponse(..., LAYER_NAME_MISMATCH)
                        LayerService->>Client: JSON error response 400
                    else Names match
                        LayerService->>LayerValidation: validate(kruizeLayer)
                        LayerValidation-->>LayerService: ValidationOutputData
                        alt Validation fails
                            LayerService->>LayerService: sendErrorResponse(..., VALIDATION_FAILED)
                            LayerService->>Client: JSON error response 400
                        else Validation succeeds
                            LayerService->>ExperimentDAOImpl: loadLayerByName(layerName)
                            ExperimentDAOImpl-->>LayerService: List<KruizeLMLayerEntry>
                            alt Layer not found
                                LayerService->>LayerService: sendErrorResponse(..., LAYER_NOT_FOUND)
                                LayerService->>Client: JSON error response 404
                            else Layer exists
                                LayerService->>LayerService: convertToLayerEntry(kruizeLayer)
                                LayerService->>ExperimentDAOImpl: updateLayerToDB(layerEntry)
                                ExperimentDAOImpl->>MetricsConfig: start timerUpdateLayerDB
                                ExperimentDAOImpl-->>LayerService: ValidationOutputData
                                alt Update succeeds
                                    LayerService->>LayerService: sendSuccessResponse(message, 200)
                                    LayerService->>Client: JSON success response 200
                                else Update fails
                                    LayerService->>LayerService: sendErrorResponse(..., 500, UPDATE_LAYER_TO_DB_FAILURE)
                                    LayerService->>Client: JSON error response 500
                                end
                            end
                        end
                    end
                end
            end
        end
    end
Loading

Sequence diagram for new deleteLayer API flow

sequenceDiagram
    actor Client
    participant LayerService
    participant ExperimentDAOImpl
    participant MetricsConfig

    Client->>LayerService: HTTP DELETE /deleteLayer?name={layerName}
    LayerService->>LayerService: getParameter(LAYER_NAME)
    alt Missing or empty name param
        LayerService->>LayerService: sendErrorResponse(..., INVALID_LAYER_NAME)
        LayerService->>Client: JSON error response 400
    else Has name param
        LayerService->>LayerService: validate query params against DELETE_LAYERS_QUERY_PARAMS_SUPPORTED
        alt Unsupported query params
            LayerService->>LayerService: sendErrorResponse(..., INVALID_QUERY_PARAMS)
            LayerService->>Client: JSON error response 400
        else Query params valid
            LayerService->>ExperimentDAOImpl: loadLayerByName(layerName)
            ExperimentDAOImpl-->>LayerService: List<KruizeLMLayerEntry>
            alt Layer not found
                LayerService->>LayerService: sendErrorResponse(..., DELETE_LAYER_ENTRY_NOT_FOUND_WITH_NAME)
                LayerService->>Client: JSON error response 404
            else Layer exists
                LayerService->>ExperimentDAOImpl: deleteLayerByName(layerName)
                ExperimentDAOImpl->>MetricsConfig: start timerDeleteLayerDB
                ExperimentDAOImpl-->>LayerService: ValidationOutputData
                alt Delete succeeds
                    LayerService->>LayerService: sendSuccessResponse(message, 200)
                    LayerService->>Client: JSON success response 200
                else Delete fails
                    LayerService->>LayerService: sendErrorResponse(..., 500, message)
                    LayerService->>Client: JSON error response 500
                end
            end
        end
    end
Loading

Class diagram for updated Layer API, converters, DAO, and metrics

classDiagram
    class LayerService {
        +doPost(HttpServletRequest request, HttpServletResponse response) void
        +doGet(HttpServletRequest request, HttpServletResponse response) void
        +doPut(HttpServletRequest request, HttpServletResponse response) void
        +doDelete(HttpServletRequest request, HttpServletResponse response) void
        -convertToLayerEntry(KruizeLayer kruizeLayer) KruizeLMLayerEntry
        -sendSuccessResponse(HttpServletResponse response, String message, int httpStatusCode) void
        -sendErrorResponse(HttpServletResponse response, Exception e, int httpStatusCode, String errorMsg) void
        -sendJsonResponse(HttpServletResponse response, String message, int statusCode, String status) void
    }

    class Converters {
        <<utility>>
        -getRequiredString(JSONObject json, String fieldName, String objectType, String fieldDescription) String
        +convertInputJSONToCreateLayer(String inputData) KruizeLayer
    }

    class ExperimentDAO {
        <<interface>>
        +addLayerToDB(KruizeLMLayerEntry kruizeLayerEntry) ValidationOutputData
        +updateLayerToDB(KruizeLMLayerEntry kruizeLayerEntry) ValidationOutputData
        +deleteLayerByName(String layerName) ValidationOutputData
        +loadAllLayers() List~KruizeLMLayerEntry~
        +loadLayerByName(String layerName) List~KruizeLMLayerEntry~
    }

    class ExperimentDAOImpl {
        +addLayerToDB(KruizeLMLayerEntry kruizeLayerEntry) ValidationOutputData
        +updateLayerToDB(KruizeLMLayerEntry kruizeLayerEntry) ValidationOutputData
        +deleteLayerByName(String layerName) ValidationOutputData
        +loadAllLayers() List~KruizeLMLayerEntry~
        +loadLayerByName(String layerName) List~KruizeLMLayerEntry~
    }

    class MetricsConfig {
        +meterRegistry() MeterRegistry
        +timerAddLayerDB Timer
        +timerLoadAllLayers Timer
        +timerLoadLayerByName Timer
        +timerUpdateLayerDB Timer
        +timerDeleteLayerDB Timer
        +timerBAddLayerDB Timer.Builder
        +timerBLoadAllLayers Timer.Builder
        +timerBLoadLayerByName Timer.Builder
        +timerBUpdateLayerDB Timer.Builder
        +timerBDeleteLayerDB Timer.Builder
    }

    class KruizeLayer {
        +getLayerName() String
        +getLayerPresence() Object
        +getTunables() List~Tunable~
    }

    class KruizeLMLayerEntry {
        +layer_name String
        +presence String
        +details String
        +tunables String
    }

    class Tunable {
        +name String
        +value_type String
        +validate() void
        -validateBounds() void
        -validateCategorical() void
    }

    class ValidationOutputData {
        +isSuccess() boolean
        +getMessage() String
        +setSuccess(boolean success) void
        +setMessage(String message) void
    }

    LayerService --> Converters : uses
    LayerService --> ExperimentDAO : uses
    LayerService --> KruizeLayer : uses
    LayerService --> KruizeLMLayerEntry : converts to
    LayerService --> ValidationOutputData : uses
    ExperimentDAOImpl ..|> ExperimentDAO
    ExperimentDAOImpl --> KruizeLMLayerEntry : persists
    ExperimentDAOImpl --> ValidationOutputData : returns
    ExperimentDAOImpl --> MetricsConfig : uses timers
    Converters --> KruizeLayer : creates
    Converters --> Tunable : validates and constructs
    Tunable --> ValidationOutputData : throws validation exceptions referenced by errors
Loading

File-Level Changes

Change Details Files
Implement update and delete operations for the Layer API with consistent JSON responses and query validation.
  • Extend LayerService servlet description and registration to cover create, list, update, and delete endpoints
  • Implement doPut for /updateLayer with name parameter validation, payload/URL name consistency checks, converter-based validation, DB existence check, and update via DAO
  • Implement doDelete for /deleteLayer with name validation, supported query param enforcement, existence check, and delete via DAO
  • Introduce JSON-based success/error response helper, change sendSuccessResponse signature to accept explicit HTTP status code, and ensure all errors return structured JSON instead of sendError
src/main/java/com/autotune/analyzer/services/LayerService.java
src/main/java/com/autotune/analyzer/Analyzer.java
src/main/java/com/autotune/utils/ServerContext.java
src/main/java/com/autotune/utils/KruizeConstants.java
src/main/java/com/autotune/analyzer/utils/AnalyzerConstants.java
src/main/java/com/autotune/utils/KruizeSupportedTypes.java
src/main/java/com/autotune/analyzer/utils/AnalyzerErrorConstants.java
Strengthen Layer create/update JSON validation and error messaging, including optional field handling and tunable/query/label semantics.
  • Add getRequiredString helper to Converters to standardize required string extraction with detailed messages
  • Tighten convertInputJSONToCreateLayer to validate presence/format of apiVersion, kind, metadata, and layer_name, use optString for optional details, and enforce non-null/typed elements in queries, labels, and tunables arrays
  • Add specific validation and error messages for queries, labels, tunable names/value types/bounds, and API structure, and adjust Tunable validation exceptions to remove redundant prefixes so messages match test expectations
  • Add special handling in LayerService for IllegalArgumentException and JSONException to map converter/JSON failures into user-friendly BAD_REQUEST messages, including missing-field reporting for update
src/main/java/com/autotune/analyzer/serviceObjects/Converters.java
src/main/java/com/autotune/analyzer/kruizeLayer/Tunable.java
tests/scripts/helpers/utils.py
src/main/java/com/autotune/analyzer/services/LayerService.java
src/main/java/com/autotune/analyzer/utils/AnalyzerErrorConstants.java
Extend Experiment DAO and metrics to support layer updates and deletes.
  • Extend ExperimentDAO interface with updateLayerToDB and deleteLayerByName
  • Implement updateLayerToDB using Hibernate merge with timing metric and proper transaction rollback and messaging
  • Implement deleteLayerByName using HQL delete, handling zero-row cases and logging, and returning ValidationOutputData
  • Add new DB timer builders/instances for updateLayerToDB and deleteLayerByName in MetricsConfig and wire them into DAO
  • Add DELETE_FROM_LAYER_BY_NAME HQL constant
src/main/java/com/autotune/database/dao/ExperimentDAO.java
src/main/java/com/autotune/database/dao/ExperimentDAOImpl.java
src/main/java/com/autotune/utils/MetricsConfig.java
src/main/java/com/autotune/database/helper/DBConstants.java
Refactor Python helpers and tests to use the new DELETE API, improve cleanup, and cover new validation and behavior.
  • Replace kubectl-based delete_layer_from_db helper with HTTP DELETE-based delete_layer and add cleanup_all_layers to remove all layers via the API
  • Remove the autouse cleanup_test_layers fixtures and explicitly clean up using delete_layer/cleanup_all_layers within tests to avoid cross-test coupling
  • Add positive tests for listLayers performance, case sensitivity, and sorting order
  • Add positive create-layer tests around optional fields (details, presence.queries without key) and negative matrix tests for queries, labels, tunable names/value types/bounds, and API structural fields, including custom JSON construction when fields are missing
  • Update existing tests’ expected error messages to match the new, cleaner validation messages and use delete_layer instead of delete_layer_from_db
tests/scripts/helpers/kruize.py
tests/scripts/local_monitoring_tests/rest_apis/test_list_layers.py
tests/scripts/local_monitoring_tests/rest_apis/test_create_layer.py
tests/scripts/helpers/utils.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The new helper getRequiredString() and the various validation strings hardcode user-facing messages; consider wiring these through AnalyzerErrorConstants (similar to other APIs) so that error wording remains centralized and consistent across the service.
  • In MetricsConfig you define timerBUpdateLayerDB and timerBDeleteLayerDB, but only updateLayerToDB uses its timer; consider instrumenting deleteLayerByName with the corresponding timer to keep DB metrics coverage consistent.
  • The sendJsonResponse method recreates a Gson instance on every call via createGsonObject(); you could cache a single Gson instance in LayerService to avoid repeated object creation on hot paths.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new helper `getRequiredString()` and the various validation strings hardcode user-facing messages; consider wiring these through `AnalyzerErrorConstants` (similar to other APIs) so that error wording remains centralized and consistent across the service.
- In `MetricsConfig` you define `timerBUpdateLayerDB` and `timerBDeleteLayerDB`, but only `updateLayerToDB` uses its timer; consider instrumenting `deleteLayerByName` with the corresponding timer to keep DB metrics coverage consistent.
- The `sendJsonResponse` method recreates a `Gson` instance on every call via `createGsonObject()`; you could cache a single `Gson` instance in `LayerService` to avoid repeated object creation on hot paths.

## Individual Comments

### Comment 1
<location path="src/main/java/com/autotune/analyzer/services/LayerService.java" line_range="391-401" />
<code_context>
+            }
+
+            // Delete from database
+            ValidationOutputData deletedFromDB = experimentDAO.deleteLayerByName(layerName);
+
+            if (deletedFromDB.isSuccess()) {
+                LOGGER.debug(KruizeConstants.LayerAPIMessages.DELETE_LAYER_FROM_DB, layerName);
+                sendSuccessResponse(response, String.format(KruizeConstants.LayerAPIMessages.DELETE_LAYER_SUCCESS_MSG, layerName),
+                        HttpServletResponse.SC_OK);
+            } else {
+                sendErrorResponse(response, null, HttpServletResponse.SC_INTERNAL_SERVER_ERROR,
+                        deletedFromDB.getMessage());
+            }
</code_context>
<issue_to_address>
**suggestion (bug_risk):** A race on delete can incorrectly surface as a 500 instead of a 404 when the layer disappears between the existence check and deletion.

You pre-check existence via `loadLayerByName`, then call `deleteLayerByName`. If the row is deleted between those calls, `deleteLayerByName` returns `success=false` with a "not found" message, which you currently map to a 500. For a REST delete, that concurrent "already deleted" case should be treated as 404 (or possibly 204), not an internal error. Consider detecting the `DELETE_LAYER_ENTRY_NOT_FOUND_WITH_NAME` case (or introducing a dedicated flag) and returning 404 instead of 500.

```suggestion
            // Delete from database
            ValidationOutputData deletedFromDB = experimentDAO.deleteLayerByName(layerName);

            if (deletedFromDB.isSuccess()) {
                LOGGER.debug(KruizeConstants.LayerAPIMessages.DELETE_LAYER_FROM_DB, layerName);
                sendSuccessResponse(
                        response,
                        String.format(KruizeConstants.LayerAPIMessages.DELETE_LAYER_SUCCESS_MSG, layerName),
                        HttpServletResponse.SC_OK
                );
            } else {
                String notFoundMessage = String.format(
                        AnalyzerErrorConstants.APIErrors.DeleteLayerAPI.DELETE_LAYER_ENTRY_NOT_FOUND_WITH_NAME,
                        layerName
                );

                if (notFoundMessage.equals(deletedFromDB.getMessage())) {
                    // Layer was concurrently deleted between existence check and delete call; treat as 404
                    sendErrorResponse(response, null, HttpServletResponse.SC_NOT_FOUND, notFoundMessage);
                } else {
                    // Unexpected failure while deleting the layer; surface as 500
                    sendErrorResponse(
                            response,
                            null,
                            HttpServletResponse.SC_INTERNAL_SERVER_ERROR,
                            deletedFromDB.getMessage()
                    );
                }
            }
```
</issue_to_address>

### Comment 2
<location path="src/main/java/com/autotune/database/dao/ExperimentDAOImpl.java" line_range="738-689" />
<code_context>
+    public ValidationOutputData deleteLayerByName(String layerName) {
</code_context>
<issue_to_address>
**suggestion:** Delete-layer DB method does not use the timer/metrics hooks that were added for other layer DB operations.

Since `updateLayerToDB` is already instrumented with `Timer.Sample` using `timerDeleteLayerDB` and `timerBDeleteLayerDB`, it would be helpful to apply the same pattern here: wrap `deleteLayerByName` in a `Timer.Sample`, tag by status, and stop the timer in a `finally` block to keep metrics consistent and reliable.

Suggested implementation:

```java
    @Override
    public ValidationOutputData deleteLayerByName(String layerName) {
        // Track DB delete-layer timing metrics (same pattern as updateLayerToDB)
        Timer.Sample deleteLayerDBSample = Timer.start(meterRegistry);
        String deleteLayerDBStatus = KruizeConstants.MetricsDB.DB_OPERATION_STATUS_FAILURE;

        ValidationOutputData validationOutputData = new ValidationOutputData(false, null, null);
        Transaction tx = null;
        try (Session session = KruizeHibernateUtil.getSessionFactory().openSession()) {
            try {
                tx = session.beginTransaction();
                Query query = session.createQuery(DBConstants.SQLQUERY.DELETE_FROM_LAYER_BY_NAME, null);
                query.setParameter("layerName", layerName);
                int deletedCount = query.executeUpdate();

                if (deletedCount == 0) {

```

1. At the point in `deleteLayerByName` where the delete is considered successful (e.g. when `deletedCount > 0` and the transaction is committed, and `validationOutputData` is set to success), set `deleteLayerDBStatus` to the appropriate success constant, mirroring what `updateLayerToDB` uses (for example `KruizeConstants.MetricsDB.DB_OPERATION_STATUS_SUCCESS`).
2. Wrap the entire existing body of `deleteLayerByName` (everything after the opening brace and before the `return validationOutputData;`) in an outer `try`/`finally` block:
   - In the `try` block, keep the existing logic unchanged.
   - In the `finally` block, stop and record the timer sample in both timers, using the same tag set as `updateLayerToDB`. For example (adapt names/registry to exactly match how `updateLayerToDB` is implemented):
   ```java
   finally {
       try {
           Timer timer = timerDeleteLayerDB
                   .tag("status", deleteLayerDBStatus);
           deleteLayerDBSample.stop(timer);
       } catch (Exception e) {
           // optionally log but do not rethrow, to avoid impacting main flow
       }

       try {
           Timer timerB = timerBDeleteLayerDB
                   .tag("status", deleteLayerDBStatus);
           deleteLayerDBSample.stop(timerB);
       } catch (Exception e) {
           // optionally log
       }
   }
   ```
3. Ensure `meterRegistry`, `timerDeleteLayerDB`, `timerBDeleteLayerDB`, and `KruizeConstants.MetricsDB.DB_OPERATION_STATUS_*` are already defined/imported as in `updateLayerToDB`. If any of these are not yet available in `ExperimentDAOImpl`, import and initialize them exactly the same way they are for the update-layer DB metrics.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +391 to +401
// Delete from database
ValidationOutputData deletedFromDB = experimentDAO.deleteLayerByName(layerName);

if (deletedFromDB.isSuccess()) {
LOGGER.debug(KruizeConstants.LayerAPIMessages.DELETE_LAYER_FROM_DB, layerName);
sendSuccessResponse(response, String.format(KruizeConstants.LayerAPIMessages.DELETE_LAYER_SUCCESS_MSG, layerName),
HttpServletResponse.SC_OK);
} else {
sendErrorResponse(response, null, HttpServletResponse.SC_INTERNAL_SERVER_ERROR,
deletedFromDB.getMessage());
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): A race on delete can incorrectly surface as a 500 instead of a 404 when the layer disappears between the existence check and deletion.

You pre-check existence via loadLayerByName, then call deleteLayerByName. If the row is deleted between those calls, deleteLayerByName returns success=false with a "not found" message, which you currently map to a 500. For a REST delete, that concurrent "already deleted" case should be treated as 404 (or possibly 204), not an internal error. Consider detecting the DELETE_LAYER_ENTRY_NOT_FOUND_WITH_NAME case (or introducing a dedicated flag) and returning 404 instead of 500.

Suggested change
// Delete from database
ValidationOutputData deletedFromDB = experimentDAO.deleteLayerByName(layerName);
if (deletedFromDB.isSuccess()) {
LOGGER.debug(KruizeConstants.LayerAPIMessages.DELETE_LAYER_FROM_DB, layerName);
sendSuccessResponse(response, String.format(KruizeConstants.LayerAPIMessages.DELETE_LAYER_SUCCESS_MSG, layerName),
HttpServletResponse.SC_OK);
} else {
sendErrorResponse(response, null, HttpServletResponse.SC_INTERNAL_SERVER_ERROR,
deletedFromDB.getMessage());
}
// Delete from database
ValidationOutputData deletedFromDB = experimentDAO.deleteLayerByName(layerName);
if (deletedFromDB.isSuccess()) {
LOGGER.debug(KruizeConstants.LayerAPIMessages.DELETE_LAYER_FROM_DB, layerName);
sendSuccessResponse(
response,
String.format(KruizeConstants.LayerAPIMessages.DELETE_LAYER_SUCCESS_MSG, layerName),
HttpServletResponse.SC_OK
);
} else {
String notFoundMessage = String.format(
AnalyzerErrorConstants.APIErrors.DeleteLayerAPI.DELETE_LAYER_ENTRY_NOT_FOUND_WITH_NAME,
layerName
);
if (notFoundMessage.equals(deletedFromDB.getMessage())) {
// Layer was concurrently deleted between existence check and delete call; treat as 404
sendErrorResponse(response, null, HttpServletResponse.SC_NOT_FOUND, notFoundMessage);
} else {
// Unexpected failure while deleting the layer; surface as 500
sendErrorResponse(
response,
null,
HttpServletResponse.SC_INTERNAL_SERVER_ERROR,
deletedFromDB.getMessage()
);
}
}

@@ -689,6 +689,83 @@ public ValidationOutputData addLayerToDB(KruizeLMLayerEntry kruizeLayerEntry) {
return validationOutputData;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Delete-layer DB method does not use the timer/metrics hooks that were added for other layer DB operations.

Since updateLayerToDB is already instrumented with Timer.Sample using timerDeleteLayerDB and timerBDeleteLayerDB, it would be helpful to apply the same pattern here: wrap deleteLayerByName in a Timer.Sample, tag by status, and stop the timer in a finally block to keep metrics consistent and reliable.

Suggested implementation:

    @Override
    public ValidationOutputData deleteLayerByName(String layerName) {
        // Track DB delete-layer timing metrics (same pattern as updateLayerToDB)
        Timer.Sample deleteLayerDBSample = Timer.start(meterRegistry);
        String deleteLayerDBStatus = KruizeConstants.MetricsDB.DB_OPERATION_STATUS_FAILURE;

        ValidationOutputData validationOutputData = new ValidationOutputData(false, null, null);
        Transaction tx = null;
        try (Session session = KruizeHibernateUtil.getSessionFactory().openSession()) {
            try {
                tx = session.beginTransaction();
                Query query = session.createQuery(DBConstants.SQLQUERY.DELETE_FROM_LAYER_BY_NAME, null);
                query.setParameter("layerName", layerName);
                int deletedCount = query.executeUpdate();

                if (deletedCount == 0) {
  1. At the point in deleteLayerByName where the delete is considered successful (e.g. when deletedCount > 0 and the transaction is committed, and validationOutputData is set to success), set deleteLayerDBStatus to the appropriate success constant, mirroring what updateLayerToDB uses (for example KruizeConstants.MetricsDB.DB_OPERATION_STATUS_SUCCESS).
  2. Wrap the entire existing body of deleteLayerByName (everything after the opening brace and before the return validationOutputData;) in an outer try/finally block:
    • In the try block, keep the existing logic unchanged.
    • In the finally block, stop and record the timer sample in both timers, using the same tag set as updateLayerToDB. For example (adapt names/registry to exactly match how updateLayerToDB is implemented):
    finally {
        try {
            Timer timer = timerDeleteLayerDB
                    .tag("status", deleteLayerDBStatus);
            deleteLayerDBSample.stop(timer);
        } catch (Exception e) {
            // optionally log but do not rethrow, to avoid impacting main flow
        }
    
        try {
            Timer timerB = timerBDeleteLayerDB
                    .tag("status", deleteLayerDBStatus);
            deleteLayerDBSample.stop(timerB);
        } catch (Exception e) {
            // optionally log
        }
    }
  3. Ensure meterRegistry, timerDeleteLayerDB, timerBDeleteLayerDB, and KruizeConstants.MetricsDB.DB_OPERATION_STATUS_* are already defined/imported as in updateLayerToDB. If any of these are not yet available in ExperimentDAOImpl, import and initialize them exactly the same way they are for the update-layer DB metrics.

@bhanvimenghani bhanvimenghani changed the title Refactors validation logic to improve code quality and provide better error message Layers -15 Refactors validation logic to improve code quality and provide better error message Mar 12, 2026
@bhanvimenghani bhanvimenghani added this to the Kruize 0.11.0 Release milestone Apr 8, 2026
@bhanvimenghani bhanvimenghani moved this to Under Review in Monitoring Apr 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Under Review

Development

Successfully merging this pull request may close these issues.

1 participant