Advanced configurations
CDR takes its configuration in YAML format. For each file type, you can choose which objects to remove. For example, you can remove macros while keeping hyperlinks.
log_backup_count: Number of backup files kept for log rollover. CDR rotates the log file weekly, so a value of 4 keeps about one month of logs.
include_object: Include sanitized object details in result
skip_errors: Default: False. When True, if an error occurs in some part of sanitization, CDR does not continue sanitizing the file and returns a failed result
hyperlink_allowlist: A list of regular expressions, or the path to a file that contains one regex pattern per line
pdf_conf:
hyperlink: Sanitize hyperlinks in file
annotations: Annotations in PDF files, other than URI annotations
embedded_file: Remove embedded files in document
metadata: File metadata
macro: true
decrypt_force: When True, if the PDF is encrypted but has no password, decrypt and sanitize it (this removes the encryption when the sanitized file is saved)
image: Sanitize images in file
html_conf:
script: JavaScript code removal
hyperlink: Sanitize hyperlinks in file
word_conf:
hyperlink: Sanitize hyperlinks in file
metadata: File metadata
image: Sanitize images in file
macro: Macro code removal
ole_object: OLE object removal
activex_object: ActiveX object removal
dde: DDE payload removal
embedded_file: Sanitize embedded files in document
excel_conf:
hyperlink: Sanitize hyperlinks in file
metadata: File metadata
comment: Excel record comments
image: Sanitize images in file
macro: Macro code removal
ole_object: OLE object removal
activex_object: ActiveX object removal
dde: DDE payload removal
embedded_file: Sanitize embedded files in document
ppt_conf:
hyperlink: Sanitize hyperlinks in file
metadata: File metadata
image: Sanitize images in file
macro: Macro code removal
ole_object: OLE object removal
activex_object: ActiveX object removal
dde: DDE payload removal
embedded_file: Sanitize embedded files in document
opendocument_conf:
hidden_text: OpenDocument hidden text
hyperlink: Sanitize hyperlinks in file
macro: OpenDocument scripting framework macros
ole_object: OLE object removal
image: Sanitize images in file
xml_conf:
macro: Macro objects
script: Script objects
cdata: Character data (CDATA)
archive_conf:
zip_compress_all: Compress all files, even those that were not sanitized
image_conf:
lsb_image: Sanitize the least significant bit (1 bit) of image data (RGB, RGBA, L (grayscale), P (indexed color)) to prevent steganography
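To illustrate what the lsb_image option targets, here is a minimal Python sketch of least-significant-bit sanitization on raw 8-bit samples. It is an assumption about the general technique, not CDR's implementation: the function name clear_lsb is hypothetical, and this documentation does not say whether CDR zeroes the bit, randomizes it, or treats indexed-color (P) images differently.

```python
def clear_lsb(samples: bytes) -> bytes:
    """Return a copy of the samples with the lowest bit of every byte set to 0.

    Hypothetical helper for illustration only. A payload hidden in the
    least significant bits does not survive this, while each sample
    changes by at most 1 out of 255.
    """
    return bytes(b & 0xFE for b in samples)


# Made-up 8-bit samples, for example part of a raw RGB pixel buffer.
pixels = bytes([255, 128, 7, 0])
cleaned = clear_lsb(pixels)
print(list(cleaned))  # [254, 128, 6, 0]
```

Because only the lowest bit changes, the visual difference is normally imperceptible for direct color and grayscale samples.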
Allow Hyperlinks
Hyperlinks are a common attack vector, so CDR removes them by default. Because some hyperlinks are essential for business operations, administrators can allow specific links or domains through configurable regex patterns. Only hyperlinks that match an allowlist pattern are kept, which balances usability and security.
Allowlist Pattern
hyperlink_allowlist: ['https://.*w3schools\.com.*', 'http://.*testuri\.org.*']
File types that support the hyperlink allowlist: MS Office, HTML, PDF, OpenDocument
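To check a pattern before deploying it, the Python sketch below tests a few URLs against the two patterns from the allowlist example above. The helper is_allowed and the sample URLs are hypothetical, and it assumes patterns are matched from the start of the URL (re.match). This documentation does not state which matching mode CDR uses, so treat the results as indicative only.

```python
import re

# Patterns copied from the hyperlink_allowlist example above.
hyperlink_allowlist = [
    r"https://.*w3schools\.com.*",
    r"http://.*testuri\.org.*",
]


def is_allowed(url: str, patterns=hyperlink_allowlist) -> bool:
    """Hypothetical helper: True if the URL matches any allowlist pattern.

    Assumes matching starts at the beginning of the URL (re.match).
    """
    return any(re.match(pattern, url) for pattern in patterns)


print(is_allowed("https://www.w3schools.com/html/"))           # True
print(is_allowed("http://www.w3schools.com/"))                 # False: scheme is http
print(is_allowed("http://api.testuri.org/v1"))                 # True
print(is_allowed("https://evil.example/?next=w3schools.com"))  # True: see note
```

The last result shows that a leading `.*` also matches URLs where the domain only appears in the path or query string. If that is not intended, a stricter pattern that pins the host portion may be preferable.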