Skip to content

245 feature domain adaption implementation#256

Open
inarteroger wants to merge 10 commits intomainfrom
245-feature-domain-adaption-implementation
Open

245 feature domain adaption implementation#256
inarteroger wants to merge 10 commits intomainfrom
245-feature-domain-adaption-implementation

Conversation

@inarteroger
Copy link
Copy Markdown
Collaborator

Adding Direct Standardization algorithm.

@inarteroger inarteroger added this to the v1.0.0 milestone Apr 15, 2026
@inarteroger inarteroger requested a review from paucablop April 15, 2026 07:46
@inarteroger inarteroger self-assigned this Apr 15, 2026
@inarteroger inarteroger added the enhancement New feature or request label Apr 15, 2026
@inarteroger inarteroger linked an issue Apr 15, 2026 that may be closed by this pull request
@paucablop paucablop requested a review from Copilot April 15, 2026 21:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an initial implementation of a Direct Standardization (DS) transformer under chemotools.domain_adaption, along with a basic unit test, and includes several small typing/validation-related maintenance updates across the codebase.

Changes:

  • Add DirectStandardization transformer implementation and a corresponding test.
  • Tighten mode typing/constraints for MeanFilter and NorrisWilliams (removing unsupported "interp").
  • Standardize # type: ignore[...] annotations to use ty: error codes in several modules.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
tests/domain_adaption/test_direct_standardization.py Adds a unit test for DS behavior (shape + improvement), but does not currently enforce sklearn estimator compatibility.
chemotools/domain_adaption/_direct_standardization.py Introduces the DS transformer, but currently has sklearn-API and fitted-state handling issues.
chemotools/smooth/_mean_filter.py Removes unsupported "interp" mode and adds Literal[...] typing for mode.
chemotools/derivative/_norris_william.py Removes unsupported "interp" mode and adds Literal[...] typing for mode.
chemotools/outliers/_hotelling_t2.py Updates type-ignore annotations to ty: codes.
chemotools/inspector/core/spectra.py Updates type-ignore annotations to ty: codes.
chemotools/inspector/core/regression.py Updates type-ignore annotations to ty: codes.
chemotools/inspector/core/latent.py Updates type-ignore annotations to ty: codes.
chemotools/inspector/_preprocessing_inspector.py Updates type-ignore annotations to ty: codes.
chemotools/inspector/_pca_inspector.py Updates type-ignore annotations to ty: codes.
chemotools/feature_selection/_vip_selector.py Updates type-ignore annotations to ty: codes.
chemotools/feature_selection/_range_cut.py Updates type-ignore annotations to ty: codes.
chemotools/feature_selection/_index_selector.py Updates type-ignore annotations to ty: codes.
chemotools/_validation.py Updates type-ignore annotations to ty: codes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +40 to +67
X_slave = np.random.randn((100,50))
X_portbale = X_slave*2+5
DS = DirectStandardization().fit(X_slave,X_master)
X_slave_transf = DS.transform(X_slave)

"""

def __init__(self):
pass

def fit(self, X_slave: np.ndarray, X_master: np.ndarray) -> "DirectStandardization":
"""
Fit the DirectStandardization to the input data.

Parameters
----------
X_slave : np.ndarray of shape (n_samples, n_features)
The slave data
X_master : np.ndarray of shape (n_samples, n_features)
The master data

Returns
-------
self : DirectStandardization
The fitted model.
"""
X_slave = np.asarray(X_slave, dtype=float)
X_master = np.asarray(X_master, dtype=float)
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fit() currently takes (X_slave, X_master) instead of the scikit-learn convention (X, y=None). If this transformer is meant to be sklearn-compatible (per the test/docstring), consider using fit(self, X, y) where y is the master spectra, so utilities like check_estimator, pipelines, and meta-estimators work as expected.

Suggested change
X_slave = np.random.randn((100,50))
X_portbale = X_slave*2+5
DS = DirectStandardization().fit(X_slave,X_master)
X_slave_transf = DS.transform(X_slave)
"""
def __init__(self):
pass
def fit(self, X_slave: np.ndarray, X_master: np.ndarray) -> "DirectStandardization":
"""
Fit the DirectStandardization to the input data.
Parameters
----------
X_slave : np.ndarray of shape (n_samples, n_features)
The slave data
X_master : np.ndarray of shape (n_samples, n_features)
The master data
Returns
-------
self : DirectStandardization
The fitted model.
"""
X_slave = np.asarray(X_slave, dtype=float)
X_master = np.asarray(X_master, dtype=float)
X_slave = np.random.randn(100, 50)
X_master = X_slave * 2 + 5
DS = DirectStandardization().fit(X_slave, X_master)
X_slave_transf = DS.transform(X_slave)
"""
def __init__(self):
self.T = None
def fit(self, X: np.ndarray, y: np.ndarray = None) -> "DirectStandardization":
"""
Fit the DirectStandardization to the input data.
Parameters
----------
X : np.ndarray of shape (n_samples, n_features)
The slave data.
y : np.ndarray of shape (n_samples, n_features), default=None
The master data corresponding to ``X``.
Returns
-------
self : DirectStandardization
The fitted model.
"""
if y is None:
raise ValueError("y must be provided and contain the master spectra")
X_slave = np.asarray(X, dtype=float)
X_master = np.asarray(y, dtype=float)

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +43
"""
The :mod:'chemotools.domain_adaption:DirectStandardization'
module implements a Direct Standardization transformer
"""

# Authors: Ruggero Guerrini
# License: MIT

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class DirectStandardization(BaseEstimator, TransformerMixin):
"""
Description
-----------
Implement a direct standardization transformer for the calibration
transfer application.
X_master contains the reference measurements acquired
on the master instrument.
X_slave contains the corresponding measurements of the same samples
acquired on the slave instrument.
The transformer estimates a mapping from the slave space to
the master space.
After fitting, new X_slave spectra can be transformed into
the X_master space.

Parameters
----------
None

Attributes
----------
T : np.ndarray of shape (n_features, n_features)
The pxp matrix that solver the problem X_slave T = X_master
using the method of least squares

Examples
--------
X_slave = np.random.randn((100,50))
X_portbale = X_slave*2+5
DS = DirectStandardization().fit(X_slave,X_master)
X_slave_transf = DS.transform(X_slave)
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module/class docstrings have several documentation issues: the Sphinx role markup in the module docstring is malformed, and the example code contains typos and invalid code (e.g., np.random.randn((100,50)) and an undefined X_master). Please fix the markup and make the example runnable so it can be used in generated docs.

Copilot uses AI. Check for mistakes.
Comment on lines +21 to +34
class Test_Direct_Standardization:
"""
Test that enhanced Direct Standardization maintains sklearn API compatibility
"""

def test_shape_consistency_and_improvement(self):
# Arrange - I create a slave linked to my mster
np.random.seed(17)
X_master = np.random.rand(100, 50)
X_slave = 1.2 * X_master + 0.01 * np.random.randn(100, 50)

# Fit model
model = DirectStandardization().fit(X_slave, X_master)

Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test claims sklearn API compatibility, but it doesn't run sklearn.utils.estimator_checks.check_estimator like other transformer tests in this repo, and it imports from a private module path. Consider adding a check_estimator(DirectStandardization()) test (or adjusting the estimator to pass) and importing from the public package API to keep tests consistent and catch API regressions.

Copilot uses AI. Check for mistakes.
Comment on lines 59 to +72
_parameter_constraints: dict = {
"window_length": [Interval(Integral, 3, None, closed="left")],
"mode": [
StrOptions({"nearest", "constant", "reflect", "wrap", "mirror", "interp"})
],
"mode": [StrOptions({"nearest", "constant", "reflect", "wrap", "mirror"})],
"window_size": [
Interval(Integral, 3, None, closed="left"),
deprecated_parameter_constraint(),
],
}

def __init__(
self, window_length: int = 3, mode="nearest", window_size=DEPRECATED_PARAMETER
self,
window_length: int = 3,
mode: Literal["nearest", "constant", "reflect", "wrap", "mirror"] = "nearest",
window_size=DEPRECATED_PARAMETER,
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MeanFilter no longer allows mode="interp" (constraints/type now exclude it), but the docstring for mode still lists "interp" as a supported option. Please update the docstring to match the actual accepted values so users don't pass a mode that will now be rejected.

Copilot uses AI. Check for mistakes.
Comment on lines 77 to 99
_parameter_constraints: dict = {
"window_length": [Interval(Integral, 3, None, closed="left")],
"gap_size": [Interval(Integral, 1, None, closed="left")],
"deriv": [Interval(Integral, 1, 2, closed="both")],
"mode": [
StrOptions({"nearest", "constant", "reflect", "wrap", "mirror", "interp"})
],
"mode": [StrOptions({"nearest", "constant", "reflect", "wrap", "mirror"})],
"window_size": [
Interval(Integral, 3, None, closed="left"),
deprecated_parameter_constraint(),
],
"derivative_order": [
Interval(Integral, 1, 2, closed="both"),
deprecated_parameter_constraint(),
],
}

def __init__(
self,
window_length: int = 5,
gap_size: int = 3,
deriv: int = 1,
mode="nearest",
mode: Literal["nearest", "constant", "reflect", "wrap", "mirror"] = "nearest",
window_size=DEPRECATED_PARAMETER,
derivative_order=DEPRECATED_PARAMETER,
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NorrisWilliams no longer allows mode="interp" (constraints/type now exclude it), but the docstring for mode still lists "interp" as a supported option. Please update the docstring to match the accepted values.

Copilot uses AI. Check for mistakes.
Comment on lines +66 to +71
X_slave = np.asarray(X_slave, dtype=float)
X_master = np.asarray(X_master, dtype=float)
if X_master.shape != X_slave.shape:
raise ValueError("master and slave must have the same dimensions")
self.T = np.linalg.pinv(X_slave) @ X_master
return self
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sklearn compatibility, fit()/transform() should use validate_data(...) to enforce 2D numeric input and set n_features_in_, and fitted attributes should follow the trailing-underscore convention (e.g., T_ instead of T). As written, DirectStandardization won't satisfy sklearn's estimator checks and may accept inputs that later fail at @ multiplication.

Copilot uses AI. Check for mistakes.
"""

def test_shape_consistency_and_improvement(self):
# Arrange - I create a slave linked to my mster
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "mster" should be "master".

Suggested change
# Arrange - I create a slave linked to my mster
# Arrange - I create a slave linked to my master

Copilot uses AI. Check for mistakes.
Comment on lines +87 to +89
if self.T is None:
raise RuntimeError("Model not trained. Call train() first.")
return X_slave @ self.T
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transform() checks if self.T is None, but self.T is never initialized in __init__, so calling transform() before fit() will raise AttributeError instead of the intended error. Use sklearn.utils.validation.check_is_fitted (and a fitted attribute like T_) or initialize the attribute in __init__ and check with hasattr/check_is_fitted.

Copilot uses AI. Check for mistakes.
The data transformed
"""
if self.T is None:
raise RuntimeError("Model not trained. Call train() first.")
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message in transform() says "Call train() first", but the public API is fit(). This is confusing for users and makes debugging harder; update the message (and consider raising the standard NotFittedError used by scikit-learn).

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feature: Domain Adaption implementation

2 participants