# Performance Issue: Unnecessary Matrix Copy in MMCS Sampling

## 📋 Summary

The MMCS (Multiphase Monte Carlo Sampling) implementation performs an expensive matrix transpose operation that creates an unnecessary full matrix copy. This impacts performance, especially for high-dimensional sampling scenarios with large sample sizes.

## 🔍 Problem Description

### Current Implementation

In `include/sampling/mmcs.hpp` (line 234) and related files, the code performs:

```cpp
MT Samples = TotalRandPoints.transpose(); //do not copy TODO!
for (int i = 0; i < total_samples; i++)
{
    Samples.col(i) = T * Samples.col(i) + T_shift;
}
S.conservativeResize(P.dimension(), total_number_of_samples_in_P0 + total_samples);
S.block(0, total_number_of_samples_in_P0, P.dimension(), total_samples) = 
    Samples.block(0, 0, P.dimension(), total_samples);
```

### The Problem

**Current Flow (Inefficient):**

<img width="1664" height="850" alt="Image" src="https://github.com/user-attachments/assets/ac103c55-4e73-4f41-b11f-eb05b3036ea4" />

**Memory Flow:**
1. Read `TotalRandPoints` (n×d elements)
2. **Allocate new matrix** `Samples` (d×n elements) ← **WASTED MEMORY**
3. **Copy and transpose** all elements ← **EXPENSIVE OPERATION**
4. Transform each column: `T * Samples.col(i) + T_shift`
5. Copy transformed data to `S`

**Issues:**
1. **Memory Overhead**: Creates a full copy of the transposed matrix
   - For a typical case: 50 dimensions × 5000 samples = 250,000 elements
   - Memory: ~2 MB (250,000 × 8 bytes, double precision) allocated and discarded per iteration

2. **Computation Overhead**: O(n×d) transpose operation
   - The transform only needs each sample as a vector, which a row of `TotalRandPoints` already provides

3. **Memory Bandwidth**: Every sample is written to and read back from an intermediate matrix
   - Read from `TotalRandPoints` → Write to `Samples` → Read from `Samples` → Write to `S`
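
The memory figure in the first item follows from simple arithmetic. A minimal sketch, assuming 8-byte `double` entries and the 50 × 5000 case quoted above; the helper name `transpose_copy_bytes` is hypothetical and not part of the code base:

```cpp
#include <cstddef>

// Bytes needed for a dense (dimension x num_samples) matrix of doubles,
// i.e. the size of the temporary `Samples` copy.
// Hypothetical helper, for illustration only.
constexpr std::size_t transpose_copy_bytes(std::size_t dimension,
                                           std::size_t num_samples)
{
    return dimension * num_samples * sizeof(double);
}

// 50 dimensions x 5000 samples = 250,000 doubles = 2,000,000 bytes (~2 MB).
// The check is skipped on platforms where double is not 8 bytes wide.
static_assert(sizeof(double) != 8 ||
              transpose_copy_bytes(50, 5000) == 2000000u,
              "50 x 5000 doubles should occupy 2,000,000 bytes");
```

The same helper can be evaluated for other dimension and sample counts to estimate how the overhead scales.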

### Matrix Dimensions

```
TotalRandPoints: (num_samples × dimension)
  - Each row represents one sample point
  - Shape: [total_samples, P.dimension()]

Samples (transposed copy): (dimension × num_samples)
  - Each column represents one sample point (transposed)
  - Shape: [P.dimension(), total_samples]

S (final result): (dimension × (total_number_of_samples_in_P0 + total_samples))
  - Each column is a transformed sample; the new samples fill the last total_samples columns
  - Shape: [P.dimension(), total_number_of_samples_in_P0 + total_samples]
```

## 💡 Proposed Solution

### Optimized Implementation

Instead of creating a transposed copy, work directly with rows of `TotalRandPoints`:

```cpp
// Optimized: avoid transpose copy by working directly with rows
S.conservativeResize(P.dimension(), total_number_of_samples_in_P0 + total_samples);
for (int i = 0; i < total_samples; i++)
{
    // Transform each sample: T * sample + T_shift, then store as column in S
    S.col(total_number_of_samples_in_P0 + i).noalias() = 
        T * TotalRandPoints.row(i).transpose() + T_shift;
}
```
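
As a sanity check that the row-wise loop yields the same columns as the transpose-then-transform loop, the two flows can be compared on a small input. The sketch below is dependency-free and deliberately does not use Eigen: `Mat`, `transform_with_copy`, and `transform_rowwise` are hypothetical stand-ins, `T` is assumed to be a dense d×d matrix, and the `total_number_of_samples_in_P0` column offset is omitted for brevity.

```cpp
#include <cstddef>
#include <vector>

// Tiny row-major dense matrix used only for this illustration (not Eigen).
struct Mat {
    std::size_t rows, cols;
    std::vector<double> a;
    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), a(r * c, 0.0) {}
    double& operator()(std::size_t r, std::size_t c) { return a[r * cols + c]; }
    double operator()(std::size_t r, std::size_t c) const { return a[r * cols + c]; }
};

// Original flow: build the d x n transposed copy, then transform its columns.
Mat transform_with_copy(const Mat& points, const Mat& T,
                        const std::vector<double>& shift)
{
    const std::size_t n = points.rows, d = points.cols;
    Mat samples(d, n);  // the intermediate copy the issue wants to remove
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < d; ++j)
            samples(j, i) = points(i, j);

    Mat out(d, n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t r = 0; r < d; ++r) {
            double acc = shift[r];
            for (std::size_t k = 0; k < d; ++k)
                acc += T(r, k) * samples(k, i);
            out(r, i) = acc;
        }
    return out;
}

// Proposed flow: read row i of `points` directly and write column i of `out`.
Mat transform_rowwise(const Mat& points, const Mat& T,
                      const std::vector<double>& shift)
{
    const std::size_t n = points.rows, d = points.cols;
    Mat out(d, n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t r = 0; r < d; ++r) {
            double acc = shift[r];
            for (std::size_t k = 0; k < d; ++k)
                acc += T(r, k) * points(i, k);  // no transposed copy needed
            out(r, i) = acc;
        }
    return out;
}
```

Both functions accumulate terms in the same order, so their outputs should match exactly. An equivalent check against the real Eigen types would be a reasonable regression test to accompany the change.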

### Optimized Flow

**Optimized Flow (Efficient):**

<img width="1284" height="760" alt="Image" src="https://github.com/user-attachments/assets/4a8a3a6f-f72c-49fd-a690-923308a3b1ad" />

**Memory Flow:**
1. Read `TotalRandPoints.row(i)` (single row, d elements)
2. Transform: `T * row(i).transpose() + T_shift` (the transpose here is a lazy view of one row, not a matrix copy)
3. Assign directly to `S.col(total_number_of_samples_in_P0 + i)` with `noalias()` ← **NO INTERMEDIATE COPY**
4. Repeat for all rows

**Key Improvement:**
- **Before**: Read → Copy → Transpose → Transform → Copy = 5 operations
- **After**: Read → Transform → Assign = 3 operations

**Benefits:**
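1. **No Memory Copy**: Eliminates the intermediate `Samples` matrix
2. **Direct Computation**: Transform and assign in one step
3. **Better Cache Usage**: Single pass through the data, with no second read of an intermediate matrix
4. **Uses `noalias()`**: Tells Eigen the destination does not alias the operands, so it can skip the temporary it would otherwise create to guard against aliasing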
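
One caveat on the cache-usage point: Eigen's default storage order is column-major. If `TotalRandPoints` uses that default (an assumption; the issue does not state the storage order of `MT`), then `row(i)` is a strided read while a column of `S` is contiguous. The sketch below shows the index arithmetic with hypothetical helpers:

```cpp
#include <cstddef>

// Linear offset of element (r, c) in a column-major matrix with `rows` rows.
// Hypothetical helper; mirrors the layout Eigen uses by default.
constexpr std::size_t col_major_offset(std::size_t r, std::size_t c,
                                       std::size_t rows)
{
    return r + c * rows;
}

// Distance, in elements, between consecutive entries of one row.
constexpr std::size_t row_stride(std::size_t rows)
{
    return col_major_offset(0, 1, rows) - col_major_offset(0, 0, rows);
}

// Distance, in elements, between consecutive entries of one column.
constexpr std::size_t col_stride(std::size_t rows)
{
    return col_major_offset(1, 0, rows) - col_major_offset(0, 0, rows);
}
```

Under that assumption, consecutive entries of a row in a 5000 × 50 `TotalRandPoints` sit 5000 elements apart, so the gain from dropping the copy is worth confirming with a benchmark rather than taking for granted.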


### References

- [Eigen Documentation - Aliasing](https://eigen.tuxfamily.org/dox/group__TopicAliasing.html)
- [Eigen Documentation - Matrix Operations](https://eigen.tuxfamily.org/dox/group__TutorialMatrixArithmetic.html)



Performance Issue: Unnecessary Matrix Copy in MMCS Sampling #413