DLP

From Halon, SMTP software for hosting providers

The Halon SMTP software features a Data Loss Prevention (DLP) engine that you can use to comply with DLP policy requirements. The engine operates on "data in motion", that is, on data (e-mail) in transit between two endpoints (clients and/or mail servers). It offers several techniques for detecting policy violations, all covered below. Once a violation is detected, the administrator may choose an appropriate action, such as quarantining, logging or rejecting the message.

Implementation

The Data Loss Prevention (DLP) engine is implemented by a process called maildlpd. It is used from within the DATA flow using the ScanDLP function. It behaves much like an anti-virus engine in the sense that it operates on (user-defined) patterns, unpacks compressed archives, searches for violations and, once done, returns them to the Content Flow so that an action may be taken.

  • Different parts of the organization may have different policies.
  • There is a basic module in the Content Flow that may be used, and advanced users may choose to call the HSL function ScanDLP themselves.
  • It should primarily be used to detect outbound violations.

For example, to reject messages that violate a policy named PROFANITY:

if (in_array("PROFANITY", ScanDLP(["PROFANITY"])))
    Reject("Message was not sent due to profanity");

Filter types

When creating a DLP policy, you get to select the policy type, described in the following sections.

Content scanning

"Content" scanning allows user-defined rules (regular expressions) to detect well-known patterns such as credit card numbers or "secret" project names. This is useful when you know that no such information should leave the organization. Matching is case-insensitive.

This example may detect credit card numbers.

\b4\s?-?\s?(?:\d{4}\s?-?\s?){3}\b
\b6011\s?-?\s?(?:\d{4}\s?-?\s?){3}\b
\b4\d{3}\s?-?\s?(?:\d{4}\s?-?\s?){3}\b
\b3(?:0[0-5]|6\d|8\d)\d\s?-?\s?\d{6}\s?-?\s?\d{4}\b
\b(?:213\s?-?\s?1|180\s?-?\s?0)\d{3}\s?-?\s?(?:\d{4}\s?-?\s?){2}\b
\b3[47]\d{2}\s?-?\s?\d{6}\s?-?\s?\d{5}\b
\b5[1-5]\d{2}\s?-?\s?(?:\d{4}\s?-?\s?){3}\b
\b35\d{2}\s?-?\s?(?:\d{4}\s?-?\s?){3}\b
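
Before adding patterns like these to a policy, it can help to try them against sample text. The following is a minimal Python sketch and not part of Halon. It assumes that Python's re module interprets these particular patterns the same way the DLP engine does. The helper name find_card_like and the sample values are hypothetical placeholders.

```python
import re

# Two of the patterns listed above, copied verbatim.
PATTERNS = [
    r"\b3[47]\d{2}\s?-?\s?\d{6}\s?-?\s?\d{5}\b",
    r"\b5[1-5]\d{2}\s?-?\s?(?:\d{4}\s?-?\s?){3}\b",
]


def find_card_like(text, patterns=PATTERNS):
    """Return every substring of text matched by any of the patterns."""
    hits = []
    for pattern in patterns:
        # IGNORECASE mirrors the case-insensitive matching described above.
        for match in re.finditer(pattern, text, re.IGNORECASE):
            # A match may include one trailing separator, so strip it.
            hits.append(match.group(0).strip())
    return hits


sample = "Please charge 5500-0000-0000-0004 or 3782 822463 10005 today."
print(find_card_like(sample))
print(find_card_like("Invoice 2024-0012 is attached."))
```

The second call returns an empty list, which is a quick way to confirm that a pattern does not fire on ordinary text such as invoice numbers.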

You can add a comment by appending it to a row using the following syntax (without a blank space in between):

(?# Your comment goes here)
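
As an illustration, the short Python sketch below appends such a comment to one of the patterns above and checks that matching is unaffected. It assumes Python's re module handles the (?# ...) syntax the same way the DLP engine does; the sample text is a made-up placeholder.

```python
import re

# A pattern from the list above, with the comment appended directly after it.
pattern = r"\b6011\s?-?\s?(?:\d{4}\s?-?\s?){3}\b(?# Your comment goes here)"

match = re.search(pattern, "ref 6011 0000 0000 0004", re.IGNORECASE)
# The comment is ignored by the regex engine, so only the number is matched.
matched_text = match.group(0) if match else None
print(matched_text)
```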

File type

"File name" and "MIME type" detection may not be a true DLP feature, but a software company may, for example, have a filter that detects source code files (text/x-c or .cpp) and quarantines them until an administrator or senior developer has cleared the intent.

Our engine implements a technique called "magic": it inspects the beginning of a file to detect the appropriate MIME type for that file. A tool that detects file types (regardless of extension) is available in almost every Unix installation and is called "file". To detect the MIME type of a file, run file --mime-type filename.ext. The result shown is what should be used in your rules.

# file --mime-type main.cpp 
main.cpp: text/x-c

Add ^text/x-c$ on a single line. Matching is done with regular expressions, so the start (^) and end ($) should be marked, as in ^text/x-c$. For file extensions, you should escape the . and mark the end as well, as in \.cpp$. Otherwise an unescaped . matches any character, and the filter .cpp would also match a filename like acpp-report.doc. Matching is case-insensitive.
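
The effect of escaping and anchoring can be checked outside the product. This is a small Python sketch, assuming Python's re module behaves like the DLP engine for these simple patterns; the helper name matches and the MIME type text/x-c++ are illustrative assumptions.

```python
import re


def matches(pattern, value):
    """Case-insensitive regular expression search, as described above."""
    return re.search(pattern, value, re.IGNORECASE) is not None


# Unescaped and unanchored: "." matches any character, so "acpp" is a hit.
print(matches(r".cpp", "acpp-report.doc"))      # True (unwanted)
# Escaped and anchored: only names ending in ".cpp" match.
print(matches(r"\.cpp$", "acpp-report.doc"))    # False
print(matches(r"\.cpp$", "Main.CPP"))           # True
# An unanchored MIME rule also matches longer type names.
print(matches(r"text/x-c", "text/x-c++"))       # True (unwanted)
print(matches(r"^text/x-c$", "text/x-c++"))     # False
print(matches(r"^text/x-c$", "text/x-c"))       # True
```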

Document fingerprinting

"MD5 fingerprint", "SHA1 fingerprint" and "SHA2 fingerprint" allow for exact file matching. They should primarily be used on files that are static by nature, such as images and binaries, because even the smallest change will alter the document fingerprint. MD5, SHA1 and SHA2 are all one-way hash algorithms: they take any data or document as input and output a string of text that is, for practical purposes, unique to that document. They are (for this purpose) equally good.

Tools to generate these hashes are available on all operating systems. On Linux, these tools are called "md5sum" and "sha1sum".

# md5sum document.ext
b07a682853e7bbafea145fa189dc7444 document.ext
# sha1sum document.ext
0cd377adf7ebbef00d7e4b0b388c05e21cfda9c7 document.ext

Add b07a682853e7bbafea145fa189dc7444 on a single line in an MD5 fingerprint rule.
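
Where the command-line tools are not at hand, the same fingerprints can be computed with a few lines of Python. The sketch below uses the standard hashlib module. It assumes SHA-256 as the SHA2 variant, which this page does not specify, and the helper name fingerprints is hypothetical. It also demonstrates why fingerprints only suit static files: changing a single byte alters every digest.

```python
import hashlib
import os
import tempfile


def fingerprints(path, chunk_size=65536):
    """Return hex digests of a file for MD5, SHA1 and SHA-256."""
    hashers = {
        "md5": hashlib.md5(),
        "sha1": hashlib.sha1(),
        "sha256": hashlib.sha256(),  # assumption: SHA-256 as the SHA2 variant
    }
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Demonstration on a throwaway file: a one-byte change alters every digest.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "document.ext")
    with open(path, "wb") as handle:
        handle.write(b"hello")
    original = fingerprints(path)
    with open(path, "wb") as handle:
        handle.write(b"hellp")
    changed = fingerprints(path)

print(original["md5"])
print(all(original[name] != changed[name] for name in original))
```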

File name

File names are matched as regular expressions. For example, in order to block Windows executable (.exe) files, even if zipped, use the following file name pattern:

\.exe$
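
To get a feel for which names inside an archive such a pattern would catch, the member names of a ZIP file can be listed and tested. This is only a Python sketch of the idea, not how maildlpd is implemented; the helper name matching_members and the archive contents are made up, and case-insensitive matching is assumed.

```python
import io
import re
import zipfile


def matching_members(zip_bytes, pattern):
    """List archive member names that match the file name pattern."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [
            name
            for name in archive.namelist()
            if re.search(pattern, name, re.IGNORECASE)
        ]


# Build a small archive in memory to try the pattern against.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("readme.txt", "hello")
    archive.writestr("tools/Setup.EXE", "not a real executable")
    archive.writestr("exe-notes.txt", "notes")

print(matching_members(buffer.getvalue(), r"\.exe$"))
```

Only tools/Setup.EXE is reported; exe-notes.txt is left alone because the pattern is anchored to the end of the name.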

Testing

You may now test the rule by sending a ZIP file containing source code (C/C++) files. The message should end up in the quarantine, from where it may be released or deleted.

Oct 2 17:04:57 (warning) maildlpd: [67332] [...] Attachment mime-type 'text/x-c' violates DLP policy 'SOURCECODE'
Oct 2 17:04:57 (info) maildlpd: [67332] [...] Found DLP violation SOURCECODE
Oct 2 17:04:58 (info) mailscand: [67332] [...] Message was accepted for <[email protected]> (quarantined)
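
When testing several policies, it may be convenient to pull the violated policy names out of the log. The Python sketch below assumes the log lines keep exactly the "Found DLP violation" wording shown above; the helper name found_policies is hypothetical.

```python
import re

# Based on the "Found DLP violation <POLICY>" line format shown above.
VIOLATION = re.compile(r"maildlpd: .*Found DLP violation (\S+)")

LOG = [
    "Oct 2 17:04:57 (warning) maildlpd: [67332] [...] "
    "Attachment mime-type 'text/x-c' violates DLP policy 'SOURCECODE'",
    "Oct 2 17:04:57 (info) maildlpd: [67332] [...] "
    "Found DLP violation SOURCECODE",
]


def found_policies(log_lines):
    """Return the policy names reported as violated, in log order."""
    policies = []
    for line in log_lines:
        match = VIOLATION.search(line)
        if match:
            policies.append(match.group(1))
    return policies


print(found_policies(LOG))
```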