Faulty Stego Embedding Validation #3

@birne420

Currently, the following (faulty) stego validation methodology is implemented:

  1. Encode Message
  2. Decode Message
  3. Compare
# embed the message
secret_data = method.encode(data=file.samples, message=secret_msg)

# extract the message (the embedding has not yet been written to the file)
decoded_message = method.decode(secret_data, len(secret_msg))

# compare
recoverable = np.array_equal(secret_msg, decoded_message)
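To illustrate why this check is too weak, here is a minimal sketch using a hypothetical toy method (`toy_lsb_encode`/`toy_lsb_decode`, not part of the toolset) that hides one bit per sample in the least significant mantissa bit of each float32 sample. Because nothing touches the samples between encode and decode, the in-memory comparison succeeds by construction, whether or not the embedding would survive being saved to a file.

```python
import numpy as np


# Hypothetical toy method (not from the toolset): one bit per sample,
# stored in the least significant mantissa bit of the float32 bit pattern.
def toy_lsb_encode(samples: np.ndarray, bits: np.ndarray) -> np.ndarray:
    raw = samples.astype(np.float32).view(np.uint32)  # astype copies, input is untouched
    n = len(bits)
    raw[:n] = (raw[:n] & ~np.uint32(1)) | bits.astype(np.uint32)
    return raw.view(np.float32)


def toy_lsb_decode(samples: np.ndarray, n_bits: int) -> np.ndarray:
    raw = samples.astype(np.float32).view(np.uint32)
    return (raw[:n_bits] & np.uint32(1)).astype(np.uint8)


rng = np.random.default_rng(0)
cover = rng.uniform(-0.5, 0.5, size=1000).astype(np.float32)
secret = rng.integers(0, 2, size=64, dtype=np.uint8)

stego = toy_lsb_encode(cover, secret)

# Faulty check: decode the in-memory array directly, no file in between
recoverable = np.array_equal(secret, toy_lsb_decode(stego, len(secret)))
print(recoverable)  # True
```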

A correct validation methodology (much closer to a real-world scenario) writes the stego embedding to a file and then re-reads the file's samples before decoding, so that quantization noise introduced when saving (e.g. when float samples are converted to 16-bit PCM) is taken into account:

  1. Encode Message
  2. Write the stego samples to a file
  3. Read the samples back from the file
  4. Decode Message
  5. Compare
# embed the message
secret_data = method.encode(data=file.samples, message=secret_msg)

# write embedded message to file
file.samples = secret_data
outfile_name = file.save_steganography_file(output, file.samples, type(method).__name__)

# load samples back from the written embedding
check = WavFile.load(output / outfile_name)
check_message = method.decode(check.samples, len(secret_msg))

# compare
recoverable2 = np.array_equal(secret_msg, check_message)

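# with the following modified WavFile class
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

import numpy as np
import soundfile as sf


@dataclass
class WavFile:
    samplerate: int
    samples: np.ndarray
    path: Path

    @staticmethod
    def load(path: Path) -> WavFile:
        # sf.read returns (data, samplerate); samples are decoded to float32
        samples, fs = sf.read(path, dtype='float32')
        return WavFile(samplerate=fs, samples=samples, path=path)

    def save_steganography_file(self, output_path: Path, a_samples: np.ndarray, suffix: str | None = None) -> str:
        # Writing converts the float samples to the file's sample format
        # (PCM_16 by default for WAV in soundfile), which quantizes them.
        filename = self._steganography_filename(suffix)
        sf.write(file=output_path / filename, data=a_samples, samplerate=self.samplerate)
        return filename

    def _steganography_filename(self, suffix: str | None = None) -> str:
        # e.g. speech.wav with suffix "LSB" -> speech_stego-LSB.wav
        return f"{self.path.stem}_stego-{suffix or ''}{self.path.suffix}"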
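The effect of the write/read step can also be approximated without touching the file system. The sketch below uses a hypothetical toy method (not part of the toolset) that hides one bit per sample in the least significant mantissa bit of each float32 sample, and models "save as 16-bit PCM, read back as float32" as rounding to multiples of 1/32768. `pcm16_roundtrip` is an assumed, simplified model; libsndfile's exact scaling and clipping may differ. `is_recoverable` mirrors the two methodologies: with an identity channel it reproduces the faulty in-memory check, and with the quantizing channel the hidden bits are wiped out.

```python
from typing import Callable

import numpy as np

Channel = Callable[[np.ndarray], np.ndarray]


# Hypothetical toy method: one bit per sample in the float32 mantissa LSB
def toy_lsb_encode(samples: np.ndarray, bits: np.ndarray) -> np.ndarray:
    raw = samples.astype(np.float32).view(np.uint32)  # astype copies the input
    n = len(bits)
    raw[:n] = (raw[:n] & ~np.uint32(1)) | bits.astype(np.uint32)
    return raw.view(np.float32)


def toy_lsb_decode(samples: np.ndarray, n_bits: int) -> np.ndarray:
    raw = samples.astype(np.float32).view(np.uint32)
    return (raw[:n_bits] & np.uint32(1)).astype(np.uint8)


def pcm16_roundtrip(samples: np.ndarray) -> np.ndarray:
    # Simplified model of writing 16-bit PCM and reading it back as float32
    ints = np.clip(np.round(samples * 32768.0), -32768, 32767)
    return (ints / 32768.0).astype(np.float32)


def is_recoverable(cover: np.ndarray, secret: np.ndarray, channel: Channel) -> bool:
    stego = toy_lsb_encode(cover, secret)
    received = channel(stego)  # identity = faulty check, pcm16 = write/read
    decoded = toy_lsb_decode(received, len(secret))
    return bool(np.array_equal(secret, decoded))


rng = np.random.default_rng(0)
cover = rng.uniform(-0.5, 0.5, size=1000).astype(np.float32)
secret = np.tile(np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8), 8)

print(is_recoverable(cover, secret, channel=lambda x: x))      # True
print(is_recoverable(cover, secret, channel=pcm16_roundtrip))  # False
```

After the modeled round trip every sample is an exact multiple of 1/32768, whose float32 mantissa LSB is always 0, so all 32 one-bits of this message are lost. Whether a real method survives depends on which sample representation it embeds in, so a quick check like this can complement, but not replace, the file-based validation.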

We implemented both approaches in https://gitti.cs.uni-magdeburg.de/birnbaum/audio-stego-stega-toolset/-/blob/main/taf-wrapper/entrypoint.py (line 193 “recoverable” vs. line 205 “recoverable2”).
The two methodologies sometimes yield very different results depending on the stego method, most notably for LSB. See https://gitti.cs.uni-magdeburg.de/birnbaum/audio-stego-stega-toolset/-/blob/main/io/example_output/taf/tab.pdf for our full results across formats and methods (tested at 16 kHz and 44.1 kHz).
Audio files we used for testing (converted to different formats using ffmpeg): https://gitti.cs.uni-magdeburg.de/birnbaum/audio-stego-stega-toolset/-/tree/main/io/dataset