MalDocA - Malicious Document Analyzer
MalDocA is a library to parse and extract features from Microsoft Office documents. It supports both OLE and OOXML documents.
The project's goal is to analyze potentially malicious documents to improve user safety and security.
REQUIREMENTS
GENERAL
Some testdata files contain malicious code! As a safety measure, we XOR-encode some of them (key = 0x42). These files are additionally prefixed with "MALICIOUS_" and suffixed with "_xor_0x42_encoded". In general, be very careful when opening or processing test files!
For convenience, we provide a Python script ("testdata_encode.py") to encode and decode those files. The script writes its output to the same path as the input, with "_xored" appended to the file name. Keep in mind that XOR encoding is its own inverse: encoding a file twice decodes it again, i.e. restores the original file.
Example usage: python testdata_encode.py maldoca/service/testdata/c98661bcd5bd2e5df06d3432890e7a2e8d6a3edcb5f89f6aaa2e5c79d4619f3d.docx
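To illustrate why encoding twice restores the original, here is a minimal sketch of single-byte XOR encoding with the key 0x42. It is not the actual "testdata_encode.py"; the names xor_bytes and xor_file are hypothetical, and appending "_xored" after the full file name (including the extension) is an assumption.

```python
from pathlib import Path

XOR_KEY = 0x42  # key stated in this README


def xor_bytes(data: bytes, key: int = XOR_KEY) -> bytes:
    """XOR every byte with a single-byte key (hypothetical helper)."""
    return bytes(b ^ key for b in data)


def xor_file(path: str, suffix: str = "_xored") -> Path:
    """Write an XOR-ed copy next to the input file.

    Assumption: the suffix is appended after the full file name.
    """
    src = Path(path)
    dst = src.with_name(src.name + suffix)
    dst.write_bytes(xor_bytes(src.read_bytes()))
    return dst
```

Because (b ^ 0x42) ^ 0x42 == b for every byte b, applying xor_bytes to its own output returns the input unchanged.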
WINDOWS
- Bazel has some Windows-related problems, e.g. maximum path length limitations. Make sure to read the best practices to avoid them.
- Enable symlink support (how-to) as it is required by Bazel.
CHECKOUT
git clone --recurse-submodules https://github.com/google/maldoca.git
cd maldoca
BUILD
Linux: bazel build --config=linux //maldoca/...
Windows: bazel build --config=windows //maldoca/...
TEST
Linux: bazel test --config=linux //maldoca/...
Windows: bazel test --config=windows //maldoca/...
DOCKER
We provide a Dockerfile in "docker/Dockerfile". It is the reference platform we use for continuous integration, and we recommend it (though it is optional) for development as well. See the documentation in "docker/Dockerfile" for how to build the image and use it for development.
CONTACT
DISCLAIMER
This is not an official Google product.

