feature domain adaption implementation #256
Conversation
Pull request overview
This PR adds an initial implementation of a Direct Standardization (DS) transformer under chemotools.domain_adaption, along with a basic unit test, and includes several small typing/validation-related maintenance updates across the codebase.
Changes:
- Add `DirectStandardization` transformer implementation and a corresponding test.
- Tighten `mode` typing/constraints for `MeanFilter` and `NorrisWilliams` (removing unsupported `"interp"`).
- Standardize `# type: ignore[...]` annotations to use `ty:` error codes in several modules.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| tests/domain_adaption/test_direct_standardization.py | Adds a unit test for DS behavior (shape + improvement), but does not currently enforce sklearn estimator compatibility. |
| chemotools/domain_adaption/_direct_standardization.py | Introduces the DS transformer, but currently has sklearn-API and fitted-state handling issues. |
| chemotools/smooth/_mean_filter.py | Removes unsupported "interp" mode and adds Literal[...] typing for mode. |
| chemotools/derivative/_norris_william.py | Removes unsupported "interp" mode and adds Literal[...] typing for mode. |
| chemotools/outliers/_hotelling_t2.py | Updates type-ignore annotations to ty: codes. |
| chemotools/inspector/core/spectra.py | Updates type-ignore annotations to ty: codes. |
| chemotools/inspector/core/regression.py | Updates type-ignore annotations to ty: codes. |
| chemotools/inspector/core/latent.py | Updates type-ignore annotations to ty: codes. |
| chemotools/inspector/_preprocessing_inspector.py | Updates type-ignore annotations to ty: codes. |
| chemotools/inspector/_pca_inspector.py | Updates type-ignore annotations to ty: codes. |
| chemotools/feature_selection/_vip_selector.py | Updates type-ignore annotations to ty: codes. |
| chemotools/feature_selection/_range_cut.py | Updates type-ignore annotations to ty: codes. |
| chemotools/feature_selection/_index_selector.py | Updates type-ignore annotations to ty: codes. |
| chemotools/_validation.py | Updates type-ignore annotations to ty: codes. |
```python
    X_slave = np.random.randn((100,50))
    X_portbale = X_slave*2+5
    DS = DirectStandardization().fit(X_slave,X_master)
    X_slave_transf = DS.transform(X_slave)
    """

    def __init__(self):
        pass

    def fit(self, X_slave: np.ndarray, X_master: np.ndarray) -> "DirectStandardization":
        """
        Fit the DirectStandardization to the input data.

        Parameters
        ----------
        X_slave : np.ndarray of shape (n_samples, n_features)
            The slave data
        X_master : np.ndarray of shape (n_samples, n_features)
            The master data

        Returns
        -------
        self : DirectStandardization
            The fitted model.
        """
        X_slave = np.asarray(X_slave, dtype=float)
        X_master = np.asarray(X_master, dtype=float)
```
fit() currently takes (X_slave, X_master) instead of the scikit-learn convention (X, y=None). If this transformer is meant to be sklearn-compatible (per the test/docstring), consider using fit(self, X, y) where y is the master spectra, so utilities like check_estimator, pipelines, and meta-estimators work as expected.
Suggested change:

```diff
-    X_slave = np.random.randn((100,50))
-    X_portbale = X_slave*2+5
-    DS = DirectStandardization().fit(X_slave,X_master)
-    X_slave_transf = DS.transform(X_slave)
+    X_slave = np.random.randn(100, 50)
+    X_master = X_slave * 2 + 5
+    DS = DirectStandardization().fit(X_slave, X_master)
+    X_slave_transf = DS.transform(X_slave)
     """

     def __init__(self):
-        pass
+        self.T = None

-    def fit(self, X_slave: np.ndarray, X_master: np.ndarray) -> "DirectStandardization":
+    def fit(self, X: np.ndarray, y: np.ndarray = None) -> "DirectStandardization":
         """
         Fit the DirectStandardization to the input data.

         Parameters
         ----------
-        X_slave : np.ndarray of shape (n_samples, n_features)
-            The slave data
-        X_master : np.ndarray of shape (n_samples, n_features)
-            The master data
+        X : np.ndarray of shape (n_samples, n_features)
+            The slave data.
+        y : np.ndarray of shape (n_samples, n_features), default=None
+            The master data corresponding to ``X``.

         Returns
         -------
         self : DirectStandardization
             The fitted model.
         """
-        X_slave = np.asarray(X_slave, dtype=float)
-        X_master = np.asarray(X_master, dtype=float)
+        if y is None:
+            raise ValueError("y must be provided and contain the master spectra")
+        X_slave = np.asarray(X, dtype=float)
+        X_master = np.asarray(y, dtype=float)
```
| """ | ||
| The :mod:'chemotools.domain_adaption:DirectStandardization' | ||
| module implements a Direct Standardization transformer | ||
| """ | ||
|
|
||
| # Authors: Ruggero Guerrini | ||
| # License: MIT | ||
|
|
||
| import numpy as np | ||
| from sklearn.base import BaseEstimator, TransformerMixin | ||
|
|
||
|
|
||
| class DirectStandardization(BaseEstimator, TransformerMixin): | ||
| """ | ||
| Description | ||
| ----------- | ||
| Implement a direct standardization transformer for the calibration | ||
| transfer application. | ||
| X_master contains the reference measurements acquired | ||
| on the master instrument. | ||
| X_slave contains the corresponding measurements of the same samples | ||
| acquired on the slave instrument. | ||
| The transformer estimates a mapping from the slave space to | ||
| the master space. | ||
| After fitting, new X_slave spectra can be transformed into | ||
| the X_master space. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| None | ||
|
|
||
| Attributes | ||
| ---------- | ||
| T : np.ndarray of shape (n_features, n_features) | ||
| The pxp matrix that solver the problem X_slave T = X_master | ||
| using the method of least squares | ||
|
|
||
| Examples | ||
| -------- | ||
| X_slave = np.random.randn((100,50)) | ||
| X_portbale = X_slave*2+5 | ||
| DS = DirectStandardization().fit(X_slave,X_master) | ||
| X_slave_transf = DS.transform(X_slave) |
The module/class docstrings have several documentation issues: the Sphinx role markup in the module docstring is malformed, and the example code contains typos and invalid code (e.g., np.random.randn((100,50)) and an undefined X_master). Please fix the markup and make the example runnable so it can be used in generated docs.
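A runnable version of the example might look like this (a sketch: it assumes the `X_portbale` line was meant to define `X_master`, and computes the least-squares mapping inline so the snippet runs without the class itself):

```python
# Hypothetical corrected docstring example: np.random.randn takes separate
# dimensions rather than a tuple, and X_master must be defined before use.
import numpy as np

rng = np.random.default_rng(17)           # seeded for reproducibility
X_slave = rng.standard_normal((100, 50))
X_master = X_slave * 2 + 5                # master spectra linked to the slave ones

# The DS mapping T is the least-squares solution of X_slave @ T ≈ X_master,
# computed here directly to keep the example self-contained.
T = np.linalg.pinv(X_slave) @ X_master
X_slave_transf = X_slave @ T
assert X_slave_transf.shape == X_master.shape
```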
```python
class Test_Direct_Standardization:
    """
    Test that enhanced Direct Standardization maintains sklearn API compatibility
    """

    def test_shape_consistency_and_improvement(self):
        # Arrange - I create a slave linked to my mster
        np.random.seed(17)
        X_master = np.random.rand(100, 50)
        X_slave = 1.2 * X_master + 0.01 * np.random.randn(100, 50)

        # Fit model
        model = DirectStandardization().fit(X_slave, X_master)
```
This test claims sklearn API compatibility, but it doesn't run sklearn.utils.estimator_checks.check_estimator like other transformer tests in this repo, and it imports from a private module path. Consider adding a check_estimator(DirectStandardization()) test (or adjusting the estimator to pass) and importing from the public package API to keep tests consistent and catch API regressions.
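A self-contained sketch of what such a compliance test could exercise. The real test would import `DirectStandardization` from the public package path and call `sklearn.utils.estimator_checks.check_estimator` on it; the `_StandinDS` class here is hypothetical and exists only so the snippet runs on its own:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.utils.validation import check_array, check_is_fitted

class _StandinDS(TransformerMixin, BaseEstimator):
    """Minimal stand-in transformer with an identity mapping."""

    def fit(self, X, y=None):
        X = check_array(X, dtype=np.float64)  # enforce 2D numeric input
        self.n_features_in_ = X.shape[1]
        self.T_ = np.eye(X.shape[1])          # fitted attribute, trailing underscore
        return self

    def transform(self, X):
        check_is_fitted(self, "T_")
        X = check_array(X, dtype=np.float64)
        return X @ self.T_

# check_estimator-style round trip: clone (get_params/set_params), fit, transform.
# In the repo one would instead run: check_estimator(DirectStandardization())
est = clone(_StandinDS())
Xt = est.fit(np.ones((4, 3))).transform(np.ones((4, 3)))
assert Xt.shape == (4, 3)
```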
```diff
     _parameter_constraints: dict = {
         "window_length": [Interval(Integral, 3, None, closed="left")],
-        "mode": [
-            StrOptions({"nearest", "constant", "reflect", "wrap", "mirror", "interp"})
-        ],
+        "mode": [StrOptions({"nearest", "constant", "reflect", "wrap", "mirror"})],
         "window_size": [
             Interval(Integral, 3, None, closed="left"),
             deprecated_parameter_constraint(),
         ],
     }

     def __init__(
-        self, window_length: int = 3, mode="nearest", window_size=DEPRECATED_PARAMETER
+        self,
+        window_length: int = 3,
+        mode: Literal["nearest", "constant", "reflect", "wrap", "mirror"] = "nearest",
+        window_size=DEPRECATED_PARAMETER,
```
MeanFilter no longer allows mode="interp" (constraints/type now exclude it), but the docstring for mode still lists "interp" as a supported option. Please update the docstring to match the actual accepted values so users don't pass a mode that will now be rejected.
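For background, `mode="interp"` belongs to `scipy.signal.savgol_filter`; the options kept in the constraint set match the boundary modes of SciPy's `ndimage` 1-D filters. A small sketch, assuming the mean filter is built on `scipy.ndimage.uniform_filter1d` (an assumption about chemotools internals):

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

x = np.arange(10, dtype=float)
# All boundary modes kept in the constraint set are accepted here;
# "interp" is not a valid ndimage mode and would raise an error.
for mode in ("nearest", "constant", "reflect", "wrap", "mirror"):
    y = uniform_filter1d(x, size=3, mode=mode)
    assert y.shape == x.shape
```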
```diff
     _parameter_constraints: dict = {
         "window_length": [Interval(Integral, 3, None, closed="left")],
         "gap_size": [Interval(Integral, 1, None, closed="left")],
         "deriv": [Interval(Integral, 1, 2, closed="both")],
-        "mode": [
-            StrOptions({"nearest", "constant", "reflect", "wrap", "mirror", "interp"})
-        ],
+        "mode": [StrOptions({"nearest", "constant", "reflect", "wrap", "mirror"})],
         "window_size": [
             Interval(Integral, 3, None, closed="left"),
             deprecated_parameter_constraint(),
         ],
         "derivative_order": [
             Interval(Integral, 1, 2, closed="both"),
             deprecated_parameter_constraint(),
         ],
     }

     def __init__(
         self,
         window_length: int = 5,
         gap_size: int = 3,
         deriv: int = 1,
-        mode="nearest",
+        mode: Literal["nearest", "constant", "reflect", "wrap", "mirror"] = "nearest",
         window_size=DEPRECATED_PARAMETER,
         derivative_order=DEPRECATED_PARAMETER,
```
NorrisWilliams no longer allows mode="interp" (constraints/type now exclude it), but the docstring for mode still lists "interp" as a supported option. Please update the docstring to match the accepted values.
```python
        X_slave = np.asarray(X_slave, dtype=float)
        X_master = np.asarray(X_master, dtype=float)
        if X_master.shape != X_slave.shape:
            raise ValueError("master and slave must have the same dimensions")
        self.T = np.linalg.pinv(X_slave) @ X_master
        return self
```
For sklearn compatibility, fit()/transform() should use validate_data(...) to enforce 2D numeric input and set n_features_in_, and fitted attributes should follow the trailing-underscore convention (e.g., T_ instead of T). As written, DirectStandardization won't satisfy sklearn's estimator checks and may accept inputs that later fail at @ multiplication.
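One way the suggested shape could look (a sketch, not the PR's final code): it uses the long-standing `check_array`/`check_is_fitted` helpers, sets `n_features_in_`, and stores the mapping as `T_`; newer scikit-learn also exposes a public `validate_data` helper that handles `n_features_in_` and feature names in one call:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_array, check_is_fitted

class DirectStandardization(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        if y is None:
            raise ValueError("y must contain the master spectra")
        X = check_array(X, dtype=np.float64)   # slave spectra, validated 2D numeric
        y = check_array(y, dtype=np.float64)   # master spectra, validated 2D numeric
        if y.shape != X.shape:
            raise ValueError("master and slave must have the same dimensions")
        self.n_features_in_ = X.shape[1]
        self.T_ = np.linalg.pinv(X) @ y        # least-squares solution of X @ T = y
        return self

    def transform(self, X):
        check_is_fitted(self, "T_")            # raises NotFittedError before fit
        X = check_array(X, dtype=np.float64)
        return X @ self.T_

# Usage: when slave and master really are linearly related, the mapping is exact.
rng = np.random.default_rng(0)
X_master = rng.standard_normal((100, 10))
X_slave = X_master @ rng.standard_normal((10, 10))
X_back = DirectStandardization().fit(X_slave, X_master).transform(X_slave)
```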
| """ | ||
|
|
||
| def test_shape_consistency_and_improvement(self): | ||
| # Arrange - I create a slave linked to my mster |
Typo in comment: "mster" should be "master".
Suggested change:

```diff
-        # Arrange - I create a slave linked to my mster
+        # Arrange - I create a slave linked to my master
```
```python
        if self.T is None:
            raise RuntimeError("Model not trained. Call train() first.")
        return X_slave @ self.T
```
transform() checks if self.T is None, but self.T is never initialized in __init__, so calling transform() before fit() will raise AttributeError instead of the intended error. Use sklearn.utils.validation.check_is_fitted (and a fitted attribute like T_) or initialize the attribute in __init__ and check with hasattr/check_is_fitted.
```python
            The data transformed
        """
        if self.T is None:
            raise RuntimeError("Model not trained. Call train() first.")
```
The error message in transform() says "Call train() first", but the public API is fit(). This is confusing for users and makes debugging harder; update the message (and consider raising the standard NotFittedError used by scikit-learn).
Adding Direct Standardization algorithm.