feature domain adaption implementation #256
Conversation
Pull request overview
This PR adds an initial implementation of a Direct Standardization (DS) transformer under chemotools.domain_adaption, along with a basic unit test, and includes several small typing/validation-related maintenance updates across the codebase.
Changes:
- Add `DirectStandardization` transformer implementation and a corresponding test.
- Tighten `mode` typing/constraints for `MeanFilter` and `NorrisWilliams` (removing unsupported `"interp"`).
- Standardize `# type: ignore[...]` annotations to use `ty:` error codes in several modules.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| tests/domain_adaption/test_direct_standardization.py | Adds a unit test for DS behavior (shape + improvement), but does not currently enforce sklearn estimator compatibility. |
| chemotools/domain_adaption/_direct_standardization.py | Introduces the DS transformer, but currently has sklearn-API and fitted-state handling issues. |
| chemotools/smooth/_mean_filter.py | Removes unsupported "interp" mode and adds Literal[...] typing for mode. |
| chemotools/derivative/_norris_william.py | Removes unsupported "interp" mode and adds Literal[...] typing for mode. |
| chemotools/outliers/_hotelling_t2.py | Updates type-ignore annotations to ty: codes. |
| chemotools/inspector/core/spectra.py | Updates type-ignore annotations to ty: codes. |
| chemotools/inspector/core/regression.py | Updates type-ignore annotations to ty: codes. |
| chemotools/inspector/core/latent.py | Updates type-ignore annotations to ty: codes. |
| chemotools/inspector/_preprocessing_inspector.py | Updates type-ignore annotations to ty: codes. |
| chemotools/inspector/_pca_inspector.py | Updates type-ignore annotations to ty: codes. |
| chemotools/feature_selection/_vip_selector.py | Updates type-ignore annotations to ty: codes. |
| chemotools/feature_selection/_range_cut.py | Updates type-ignore annotations to ty: codes. |
| chemotools/feature_selection/_index_selector.py | Updates type-ignore annotations to ty: codes. |
| chemotools/_validation.py | Updates type-ignore annotations to ty: codes. |
```python
    X_slave = np.random.randn((100,50))
    X_portbale = X_slave*2+5
    DS = DirectStandardization().fit(X_slave,X_master)
    X_slave_transf = DS.transform(X_slave)
    """

    def __init__(self):
        pass

    def fit(self, X_slave: np.ndarray, X_master: np.ndarray) -> "DirectStandardization":
        """
        Fit the DirectStandardization to the input data.

        Parameters
        ----------
        X_slave : np.ndarray of shape (n_samples, n_features)
            The slave data
        X_master : np.ndarray of shape (n_samples, n_features)
            The master data

        Returns
        -------
        self : DirectStandardization
            The fitted model.
        """
        X_slave = np.asarray(X_slave, dtype=float)
        X_master = np.asarray(X_master, dtype=float)
```
fit() currently takes (X_slave, X_master) instead of the scikit-learn convention (X, y=None). If this transformer is meant to be sklearn-compatible (per the test/docstring), consider using fit(self, X, y) where y is the master spectra, so utilities like check_estimator, pipelines, and meta-estimators work as expected.
Suggested change:

```diff
-    X_slave = np.random.randn((100,50))
-    X_portbale = X_slave*2+5
-    DS = DirectStandardization().fit(X_slave,X_master)
-    X_slave_transf = DS.transform(X_slave)
+    X_slave = np.random.randn(100, 50)
+    X_master = X_slave * 2 + 5
+    DS = DirectStandardization().fit(X_slave, X_master)
+    X_slave_transf = DS.transform(X_slave)
     """

     def __init__(self):
-        pass
+        self.T = None

-    def fit(self, X_slave: np.ndarray, X_master: np.ndarray) -> "DirectStandardization":
+    def fit(self, X: np.ndarray, y: np.ndarray = None) -> "DirectStandardization":
         """
         Fit the DirectStandardization to the input data.

         Parameters
         ----------
-        X_slave : np.ndarray of shape (n_samples, n_features)
-            The slave data
-        X_master : np.ndarray of shape (n_samples, n_features)
-            The master data
+        X : np.ndarray of shape (n_samples, n_features)
+            The slave data.
+        y : np.ndarray of shape (n_samples, n_features), default=None
+            The master data corresponding to ``X``.

         Returns
         -------
         self : DirectStandardization
             The fitted model.
         """
-        X_slave = np.asarray(X_slave, dtype=float)
-        X_master = np.asarray(X_master, dtype=float)
+        if y is None:
+            raise ValueError("y must be provided and contain the master spectra")
+        X_slave = np.asarray(X, dtype=float)
+        X_master = np.asarray(y, dtype=float)
```
| """ | ||
| The :mod:'chemotools.domain_adaption:DirectStandardization' | ||
| module implements a Direct Standardization transformer | ||
| """ | ||
|
|
||
| # Authors: Ruggero Guerrini | ||
| # License: MIT | ||
|
|
||
| import numpy as np | ||
| from sklearn.base import BaseEstimator, TransformerMixin | ||
|
|
||
|
|
||
| class DirectStandardization(BaseEstimator, TransformerMixin): | ||
| """ | ||
| Description | ||
| ----------- | ||
| Implement a direct standardization transformer for the calibration | ||
| transfer application. | ||
| X_master contains the reference measurements acquired | ||
| on the master instrument. | ||
| X_slave contains the corresponding measurements of the same samples | ||
| acquired on the slave instrument. | ||
| The transformer estimates a mapping from the slave space to | ||
| the master space. | ||
| After fitting, new X_slave spectra can be transformed into | ||
| the X_master space. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| None | ||
|
|
||
| Attributes | ||
| ---------- | ||
| T : np.ndarray of shape (n_features, n_features) | ||
| The pxp matrix that solver the problem X_slave T = X_master | ||
| using the method of least squares | ||
|
|
||
| Examples | ||
| -------- | ||
| X_slave = np.random.randn((100,50)) | ||
| X_portbale = X_slave*2+5 | ||
| DS = DirectStandardization().fit(X_slave,X_master) | ||
| X_slave_transf = DS.transform(X_slave) |
The module/class docstrings have several documentation issues: the Sphinx role markup in the module docstring is malformed, and the example code contains typos and invalid code (e.g., np.random.randn((100,50)) and an undefined X_master). Please fix the markup and make the example runnable so it can be used in generated docs.
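A runnable version of the example might look like this (a sketch: it assumes the `X_portbale` line was meant to define `X_master`, and computes the least-squares mapping inline so the snippet runs without the class itself):

```python
# Hypothetical corrected docstring example: np.random.randn takes separate
# dimensions rather than a tuple, and X_master must be defined before use.
import numpy as np

rng = np.random.default_rng(17)           # seeded for reproducibility
X_slave = rng.standard_normal((100, 50))
X_master = X_slave * 2 + 5                # master spectra linked to the slave ones

# The DS mapping T is the least-squares solution of X_slave @ T ≈ X_master,
# computed here directly to keep the example self-contained.
T = np.linalg.pinv(X_slave) @ X_master
X_slave_transf = X_slave @ T
assert X_slave_transf.shape == X_master.shape
```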
```python
class Test_Direct_Standardization:
    """
    Test that enhanced Direct Standardization maintains sklearn API compatibility
    """

    def test_shape_consistency_and_improvement(self):
        # Arrange - I create a slave linked to my mster
        np.random.seed(17)
        X_master = np.random.rand(100, 50)
        X_slave = 1.2 * X_master + 0.01 * np.random.randn(100, 50)

        # Fit model
        model = DirectStandardization().fit(X_slave, X_master)
```
This test claims sklearn API compatibility, but it doesn't run sklearn.utils.estimator_checks.check_estimator like other transformer tests in this repo, and it imports from a private module path. Consider adding a check_estimator(DirectStandardization()) test (or adjusting the estimator to pass) and importing from the public package API to keep tests consistent and catch API regressions.
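A self-contained sketch of what such a compliance test could exercise. The real test would import `DirectStandardization` from the public package path and call `sklearn.utils.estimator_checks.check_estimator` on it; the `_StandinDS` class here is hypothetical and exists only so the snippet runs on its own:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.utils.validation import check_array, check_is_fitted

class _StandinDS(TransformerMixin, BaseEstimator):
    """Minimal stand-in transformer with an identity mapping."""

    def fit(self, X, y=None):
        X = check_array(X, dtype=np.float64)  # enforce 2D numeric input
        self.n_features_in_ = X.shape[1]
        self.T_ = np.eye(X.shape[1])          # fitted attribute, trailing underscore
        return self

    def transform(self, X):
        check_is_fitted(self, "T_")
        X = check_array(X, dtype=np.float64)
        return X @ self.T_

# check_estimator-style round trip: clone (get_params/set_params), fit, transform.
# In the repo one would instead run: check_estimator(DirectStandardization())
est = clone(_StandinDS())
Xt = est.fit(np.ones((4, 3))).transform(np.ones((4, 3)))
assert Xt.shape == (4, 3)
```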
```diff
     _parameter_constraints: dict = {
         "window_length": [Interval(Integral, 3, None, closed="left")],
-        "mode": [
-            StrOptions({"nearest", "constant", "reflect", "wrap", "mirror", "interp"})
-        ],
+        "mode": [StrOptions({"nearest", "constant", "reflect", "wrap", "mirror"})],
         "window_size": [
             Interval(Integral, 3, None, closed="left"),
             deprecated_parameter_constraint(),
         ],
     }

     def __init__(
-        self, window_length: int = 3, mode="nearest", window_size=DEPRECATED_PARAMETER
+        self,
+        window_length: int = 3,
+        mode: Literal["nearest", "constant", "reflect", "wrap", "mirror"] = "nearest",
+        window_size=DEPRECATED_PARAMETER,
```
MeanFilter no longer allows mode="interp" (constraints/type now exclude it), but the docstring for mode still lists "interp" as a supported option. Please update the docstring to match the actual accepted values so users don't pass a mode that will now be rejected.
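For background, `mode="interp"` belongs to `scipy.signal.savgol_filter`; the options kept in the constraint set match the boundary modes of SciPy's `ndimage` 1-D filters. A small sketch, assuming the mean filter is built on `scipy.ndimage.uniform_filter1d` (an assumption about chemotools internals):

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

x = np.arange(10, dtype=float)
# All boundary modes kept in the constraint set are accepted here;
# "interp" is not a valid ndimage mode and would raise an error.
for mode in ("nearest", "constant", "reflect", "wrap", "mirror"):
    y = uniform_filter1d(x, size=3, mode=mode)
    assert y.shape == x.shape
```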
```diff
     _parameter_constraints: dict = {
         "window_length": [Interval(Integral, 3, None, closed="left")],
         "gap_size": [Interval(Integral, 1, None, closed="left")],
         "deriv": [Interval(Integral, 1, 2, closed="both")],
-        "mode": [
-            StrOptions({"nearest", "constant", "reflect", "wrap", "mirror", "interp"})
-        ],
+        "mode": [StrOptions({"nearest", "constant", "reflect", "wrap", "mirror"})],
         "window_size": [
             Interval(Integral, 3, None, closed="left"),
             deprecated_parameter_constraint(),
         ],
         "derivative_order": [
             Interval(Integral, 1, 2, closed="both"),
             deprecated_parameter_constraint(),
         ],
     }

     def __init__(
         self,
         window_length: int = 5,
         gap_size: int = 3,
         deriv: int = 1,
-        mode="nearest",
+        mode: Literal["nearest", "constant", "reflect", "wrap", "mirror"] = "nearest",
         window_size=DEPRECATED_PARAMETER,
         derivative_order=DEPRECATED_PARAMETER,
```
NorrisWilliams no longer allows mode="interp" (constraints/type now exclude it), but the docstring for mode still lists "interp" as a supported option. Please update the docstring to match the accepted values.
```python
        X_slave = np.asarray(X_slave, dtype=float)
        X_master = np.asarray(X_master, dtype=float)
        if X_master.shape != X_slave.shape:
            raise ValueError("master and slave must have the same dimensions")
        self.T = np.linalg.pinv(X_slave) @ X_master
        return self
```
For sklearn compatibility, fit()/transform() should use validate_data(...) to enforce 2D numeric input and set n_features_in_, and fitted attributes should follow the trailing-underscore convention (e.g., T_ instead of T). As written, DirectStandardization won't satisfy sklearn's estimator checks and may accept inputs that later fail at @ multiplication.
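One way the suggested shape could look (a sketch, not the PR's final code): it uses the long-standing `check_array`/`check_is_fitted` helpers, sets `n_features_in_`, and stores the mapping as `T_`; newer scikit-learn also exposes a public `validate_data` helper that handles `n_features_in_` and feature names in one call:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_array, check_is_fitted

class DirectStandardization(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        if y is None:
            raise ValueError("y must contain the master spectra")
        X = check_array(X, dtype=np.float64)   # slave spectra, validated 2D numeric
        y = check_array(y, dtype=np.float64)   # master spectra, validated 2D numeric
        if y.shape != X.shape:
            raise ValueError("master and slave must have the same dimensions")
        self.n_features_in_ = X.shape[1]
        self.T_ = np.linalg.pinv(X) @ y        # least-squares solution of X @ T = y
        return self

    def transform(self, X):
        check_is_fitted(self, "T_")            # raises NotFittedError before fit
        X = check_array(X, dtype=np.float64)
        return X @ self.T_

# Usage: when slave and master really are linearly related, the mapping is exact.
rng = np.random.default_rng(0)
X_master = rng.standard_normal((100, 10))
X_slave = X_master @ rng.standard_normal((10, 10))
X_back = DirectStandardization().fit(X_slave, X_master).transform(X_slave)
```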
| """ | ||
|
|
||
| def test_shape_consistency_and_improvement(self): | ||
| # Arrange - I create a slave linked to my mster |
Typo in comment: "mster" should be "master".
Suggested change:

```diff
-        # Arrange - I create a slave linked to my mster
+        # Arrange - I create a slave linked to my master
```
```python
        if self.T is None:
            raise RuntimeError("Model not trained. Call train() first.")
        return X_slave @ self.T
```
transform() checks if self.T is None, but self.T is never initialized in __init__, so calling transform() before fit() will raise AttributeError instead of the intended error. Use sklearn.utils.validation.check_is_fitted (and a fitted attribute like T_) or initialize the attribute in __init__ and check with hasattr/check_is_fitted.
```python
            The data transformed
        """
        if self.T is None:
            raise RuntimeError("Model not trained. Call train() first.")
```
The error message in transform() says "Call train() first", but the public API is fit(). This is confusing for users and makes debugging harder; update the message (and consider raising the standard NotFittedError used by scikit-learn).
Adding Direct Standardization algorithm.