Advanced configurations

CDR takes its configuration in YAML format. For each file type, you can choose which objects to remove. For example, you can remove macros while keeping hyperlinks.
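To illustrate how per-file-type options combine, here is a minimal Python sketch. It assumes that a parsed configuration is a set of nested dictionaries, one section per file type, where each option is a boolean and `true` means the object type is removed. `objects_to_remove` is a hypothetical helper for illustration, not part of CDR.

```python
from __future__ import annotations

# Hypothetical sketch: a parsed CDR config is assumed to be nested dicts, one
# section per file type, where True means "remove this object type".
# Assumed YAML equivalent:
#   word_conf:
#     macro: true       # remove macros
#     hyperlink: false  # keep hyperlinks
config = {"word_conf": {"macro": True, "hyperlink": False}}


def objects_to_remove(config: dict, section: str) -> list[str]:
    """List the object types enabled for removal in one file-type section."""
    options = config.get(section, {})
    return sorted(name for name, enabled in options.items() if enabled is True)


print(objects_to_remove(config, "word_conf"))   # ['macro']
print(objects_to_remove(config, "excel_conf"))  # [] (section not configured)
```

With `macro` enabled and `hyperlink` disabled for Word files, only macros are selected for removal, so hyperlinks are kept.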

log_backup_count: Number of backup log files to keep on rollover (CDR rotates the log file every week, so a value of 4 keeps about one month of logs)
include_object: Include details of sanitized objects in the result
skip_errors: Defaults to False. If True and an error occurs in any part of sanitization, CDR stops sanitizing the file and returns a failed result
hyperlink_allowlist: List of regular expressions, or a path to a file containing one regex pattern per line
pdf_conf:
    hyperlink: Sanitize hyperlinks in file
    annotations: PDF annotations other than URI (link) annotations
    embedded_file: Remove embedded files in document
    metadata: File metadata
    macro: Macro code removal
    decrypt_force: If True and the PDF is encrypted but has no password, decrypt and sanitize it (the sanitized file is saved without encryption)
    image: Sanitize images in file
html_conf:
    script: JavaScript code removal
    hyperlink: Sanitize hyperlinks in file
word_conf:
    hyperlink: Sanitize hyperlinks in file
    metadata: File metadata
    image: Sanitize images in file
    macro: Macro code removal
    ole_object: OLE object removal
    activex_object: ActiveX object removal
    dde: DDE payload removal
    embedded_file: Sanitize embedded files in document
excel_conf:
    hyperlink: Sanitize hyperlinks in file
    metadata: File metadata
    comment: Excel record comments
    image: Sanitize images in file
    macro: Macro code removal
    ole_object: OLE object removal
    activex_object: ActiveX object removal
    dde: DDE payload removal
    embedded_file: Sanitize embedded files in document
ppt_conf:
    hyperlink: Sanitize hyperlinks in file
    metadata: File metadata
    image: Sanitize images in file
    macro: Macro code removal
    ole_object: OLE object removal
    activex_object: ActiveX object removal
    dde: DDE payload removal
    embedded_file: Sanitize embedded files in document
opendocument_conf:
    hidden_text: OpenDocument hidden text
    hyperlink: Sanitize hyperlinks in file
    macro: OpenDocument scripting framework macros
    ole_object: OLE object removal
    image: Sanitize images in file
xml_conf:
    macro: Macro objects
    script: Script objects
    cdata: Character data (CDATA sections)
archive_conf:
    zip_compress_all: Compress all files, even those that were not sanitized
image_conf:
    lsb_image: Sanitize the least significant bit (LSB, 1 bit) of image data in RGB, RGBA, L (grayscale), and P (indexed color) images to prevent steganography
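LSB steganography hides data in the lowest bit of each pixel value. The sketch below shows one common way to defeat it, clearing that bit, on pixels modeled as plain tuples of 8-bit integers. `clear_lsb` is a hypothetical name, and whether CDR zeroes, randomizes, or otherwise rewrites the bit is an assumption not stated here. Single-channel L and P images are modeled as 1-tuples (for P, the value is a palette index).

```python
from __future__ import annotations

# Hypothetical sketch of LSB sanitization: clear the lowest bit of every
# channel value so that any data hidden in those bits is destroyed.
# Pixels are plain tuples of 8-bit ints; a real implementation would read
# and write them through an imaging library.


def clear_lsb(pixels: list[tuple[int, ...]]) -> list[tuple[int, ...]]:
    """Return a copy of `pixels` with the least significant bit of each channel set to 0."""
    return [tuple(value & 0xFE for value in pixel) for pixel in pixels]


# Two RGB pixels; the odd values carry a 1 in their least significant bit.
rgb_pixels = [(255, 128, 7), (10, 11, 12)]
print(clear_lsb(rgb_pixels))  # [(254, 128, 6), (10, 10, 12)]

# Grayscale (L) pixels modeled as 1-tuples.
print(clear_lsb([(3,), (200,)]))  # [(2,), (200,)]
```

Clearing rather than randomizing keeps the output deterministic. Either approach changes each RGB, RGBA, or L channel value by at most 1, which is rarely visible; for P images the bit belongs to a palette index, so the visible effect depends on the palette.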

Hyperlinks are integral to navigating the web, but they are also a common attack vector. To keep users safe, hyperlinks are removed by default during the CDR process. Because some hyperlinks are essential for business operations, administrators can allow specific hyperlinks through customizable regex patterns: only links or domains that match the configured patterns are kept, balancing usability and security.

Allowlist Pattern

hyperlink_allowlist: ['https://.*w3schools\.com.*', 'http://.*testuri\.org.*']
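The following Python sketch shows how such an allowlist could be applied. It accepts either form of `hyperlink_allowlist` (a list of patterns, or a path to a file with one pattern per line) and keeps a hyperlink only if it matches at least one pattern. `load_allowlist` and `is_allowed` are hypothetical helpers, and requiring the whole URL to match (`re.fullmatch`) is an assumption; CDR may use a different matching mode.

```python
from __future__ import annotations

import re
from pathlib import Path

# Patterns copied from the allowlist example above.
ALLOWLIST = [r"https://.*w3schools\.com.*", r"http://.*testuri\.org.*"]


def load_allowlist(value: list[str] | str) -> list[re.Pattern[str]]:
    """Compile the allowlist from a list of regexes, or from a file path
    whose non-blank lines are each one regex (hypothetical helper)."""
    if isinstance(value, str):
        lines = Path(value).read_text(encoding="utf-8").splitlines()
        value = [line.strip() for line in lines if line.strip()]
    return [re.compile(pattern) for pattern in value]


def is_allowed(url: str, allowlist: list[re.Pattern[str]]) -> bool:
    """Keep a hyperlink only if the entire URL matches at least one pattern."""
    return any(pattern.fullmatch(url) for pattern in allowlist)


patterns = load_allowlist(ALLOWLIST)
for url in [
    "https://www.w3schools.com/html/",
    "http://api.testuri.org/v1",
    "https://attacker.example/?next=w3schools.com",
    "http://www.w3schools.com/",
]:
    print(url, "->", "keep" if is_allowed(url, patterns) else "remove")
```

The first three URLs are kept and the fourth is removed, because the first pattern only allows `https://`. Note the third URL: since the pattern has `.*` before the domain, `w3schools.com` may appear anywhere after the scheme, so a link to an unrelated host is kept. Anchoring the host, for example with `https://([A-Za-z0-9-]+\.)*w3schools\.com(/.*)?`, still keeps `https://www.w3schools.com/html/` but removes that link.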

Hyperlink allowlisting is supported for these file types: MS Office, HTML, PDF, OpenDocument
