1.0

These notes cover the changes made before 2025-01-13, prior to major version 2.0.

Features

  • Added a configurable whitelist for hyperlinks using regular expressions, allowing precise control over which links remain in sanitized documents.

  • Introduced an option to strip comments from Excel workbooks, removing hidden annotations and preventing potential information leakage.

  • Added decryption support for CDFV2‐encrypted Microsoft Office files so that previously locked documents can now be fully sanitized.

  • Expanded the result payload to include granular details—such as file size before and after sanitization, exact modifications applied, and relevant warnings—enabling more thorough auditing and debugging.
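
As a rough illustration of the regex-based hyperlink whitelist described above, the following Python sketch checks each link against a list of allowed patterns. The pattern list, the function names, and the example domains are all hypothetical; the product's actual configuration keys and matching rules are not documented here and may differ.

```python
import re

# Hypothetical allowlist patterns, for illustration only.
ALLOWED_LINK_PATTERNS = [
    r"https://docs\.example\.com/.*",
    r"https://([a-z0-9-]+\.)?example\.org(/.*)?",
]

# Compile once so repeated checks stay cheap.
_COMPILED = [re.compile(p, re.IGNORECASE) for p in ALLOWED_LINK_PATTERNS]


def is_link_allowed(url: str) -> bool:
    """Return True if the whole URL matches at least one allowed pattern."""
    # fullmatch avoids accepting URLs that merely contain an allowed prefix.
    return any(p.fullmatch(url) for p in _COMPILED)


def filter_links(urls):
    """Split URLs into (kept, removed) according to the allowlist."""
    kept, removed = [], []
    for url in urls:
        (kept if is_link_allowed(url) else removed).append(url)
    return kept, removed
```

Matching the full URL rather than a prefix is a deliberate choice in this sketch: it prevents a link such as `https://example.org.evil.com/` from passing a pattern intended for `example.org`.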

Changes

  • Refactored the CDR module for improved code organization, readability, and maintainability across the sanitization pipeline.

  • Updated the file‐type checker to recognize a broader range of legacy and modern formats, reducing false positives and false negatives during detection.

  • Restructured the cdr configuration schema in config.yaml, consolidating related settings and clarifying default values for easier management.
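
File-type checkers of the kind mentioned above commonly compare the first bytes of a file against known signatures. The sketch below shows that general technique in Python with a handful of well-known magic numbers; it is not the product's detection code, and the `SIGNATURES` table and `detect_type` function are names invented for this example.

```python
# Illustrative signature table; a production checker would cover far more formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy Office compound file
    (b"PK\x03\x04", "zip"),  # also the container for OOXML, ODF, APK, JAR
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
]


def detect_type(data: bytes) -> str:
    """Return a coarse type label based on the leading bytes of the data."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"
```

A signature match only identifies the container. Formats that share a container, such as the many ZIP-based document types, need a second look at the contents before they can be told apart.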

Fixes

  • Improved PDF image sanitization by detecting and deduplicating identical images, preventing redundant processing and reducing output size.

  • Fixed APK files being misdetected as ZIP archives, ensuring accurate file‐type classification during scanning.

  • Optimized LSB‐based image sanitization for significantly faster execution without sacrificing accuracy.

  • Addressed OpenDocument (ODT/ODS) ZIP compression corruption so that documents remain intact after sanitization.

  • Resolved issues in PPT image sanitization by correctly detecting EMF magic bytes, ensuring all embedded images are processed properly.

  • Fixed the temporary‐file naming logic to guarantee unique names and prevent runtime errors caused by collisions.

  • Improved hyperlink sanitization in Office files to remove only unsafe or broken links, preserving valid relationships.

  • Fixed macro handling so that macro object metadata is appended to the result payload after sanitization.

  • Updated the OLE‐object macro‐removal routine to support a wider range of embedded objects.

  • Applied minor bug fixes to eliminate various logical errors discovered during testing.
