Currently, the following (faulty) stego validation methodology is implemented:
- Encode Message
- Decode Message
- Compare
# embed the message
secret_data = method.encode(data=file.samples, message=secret_msg)
# extract the message (the embedding has not yet been written to the file)
decoded_message = method.decode(secret_data, len(secret_msg))
# compare
np.array_equal(secret_msg, decoded_message)
A correct validation methodology (much closer to a real-world scenario) would write the stego embedding to a file and then re-read the samples from that file, so that the quantization noise introduced on write is taken into account:
- Encode Message
- Write stego samples to file
- Read samples back from file
- Decode Message
- Compare
# embed the message
secret_data = method.encode(data=file.samples, message=secret_msg)
# write embedded message to file
file.samples = secret_data
outfile_suffix = file.save_steganography_file(output, file.samples, type(method).__name__)
# load samples back from the written embedding
check = WavFile.load(output / outfile_suffix)
check_message = method.decode(check.samples, len(secret_msg))
# compare
np.array_equal(secret_msg, check_message)
# with following altered WavFile class
from dataclasses import dataclass
from pathlib import Path
import numpy as np
import soundfile as sf

@dataclass
class WavFile:
    samplerate: int
    samples: np.ndarray
    path: Path

    @staticmethod
    def load(path: Path) -> "WavFile":
        # read samples as float32 regardless of the subtype stored on disk
        samples, fs = sf.read(path, dtype='float32')
        return WavFile(samplerate=fs, samples=samples, path=path)

    def save_steganography_file(self, output_path: Path, a_samples: np.ndarray, suffix: str | None = None) -> str:
        # write the stego samples into output_path and return the file name used
        filename = self._steganography_filename(suffix)
        sf.write(file=output_path / filename, data=a_samples, samplerate=self.samplerate)
        return filename

    def _steganography_filename(self, suffix: str | None = None) -> str:
        # e.g. "cover.wav" with suffix "LSB" becomes "cover_stego-LSB.wav"
        return "{0}_{2}{1}".format(self.path.stem, self.path.suffix, "stego-" + (suffix if suffix is not None else ""))
We implemented both approaches in https://gitti.cs.uni-magdeburg.de/birnbaum/audio-stego-stega-toolset/-/blob/main/taf-wrapper/entrypoint.py (line 193 “recoverable” vs. line 205 “recoverable2”).
The two methodologies yield noticeably different results for some stego methods, most notably LSB. See https://gitti.cs.uni-magdeburg.de/birnbaum/audio-stego-stega-toolset/-/blob/main/io/example_output/taf/tab.pdf for our full results across formats and methods (tested at 16 kHz and 44.1 kHz).
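One plausible explanation for the LSB gap can be sketched with NumPy alone. The sketch assumes a hypothetical LSB method (`encode_lsb` / `decode_lsb`, not the toolset's implementation) that hides bits in the lowest bit of a 24-bit integer view of the float samples, and it models the file round trip with `quantize_pcm16`, a simplified stand-in for writing and re-reading 16-bit PCM. Under those assumptions the in-memory check passes while the check after quantization fails, because the 16-bit grid discards the bit that carried the message.

```python
import numpy as np

SCALE_24 = 2 ** 23  # assumed embedding resolution (24-bit signed range)
SCALE_16 = 2 ** 15  # 16-bit PCM resolution


def encode_lsb(samples: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Hypothetical LSB embedder: overwrite the lowest bit of a 24-bit integer view."""
    ints = np.round(samples * SCALE_24).astype(np.int64)
    n = len(bits)
    ints[:n] = (ints[:n] & ~1) | bits
    return ints / SCALE_24


def decode_lsb(samples: np.ndarray, n_bits: int) -> np.ndarray:
    """Read the lowest bit of the same 24-bit integer view."""
    ints = np.round(samples * SCALE_24).astype(np.int64)
    return (ints[:n_bits] & 1).astype(np.uint8)


def quantize_pcm16(samples: np.ndarray) -> np.ndarray:
    """Simplified stand-in for a 16-bit PCM write/read round trip (assumption)."""
    ints = np.clip(np.round(samples * SCALE_16), -SCALE_16, SCALE_16 - 1)
    return ints / SCALE_16


rng = np.random.default_rng(0)
cover = rng.uniform(-0.5, 0.5, size=64)
secret_msg = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)

stego = encode_lsb(cover, secret_msg)

# methodology 1: decode straight from memory
in_memory_ok = np.array_equal(secret_msg, decode_lsb(stego, len(secret_msg)))

# methodology 2: decode after the (simulated) file round trip
after_file_ok = np.array_equal(
    secret_msg, decode_lsb(quantize_pcm16(stego), len(secret_msg))
)

print(in_memory_ok, after_file_ok)  # True False
```

With a real file, `quantize_pcm16` would be replaced by the `save_steganography_file` / `WavFile.load` round trip; whether the embedding survives then depends on the subtype the file is written with (`sf.write` accepts a `subtype` argument for this).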
Audio files we used for testing (converted to different formats using ffmpeg): https://gitti.cs.uni-magdeburg.de/birnbaum/audio-stego-stega-toolset/-/tree/main/io/dataset