PDF | Search Results | Didier Stevens

Tuesday 11 June 2024

Update: pdf-parser.py Version 0.7.9

Filed under: My Software,Update — Didier Stevens @ 0:00

I added option -j –jsonoutput to my pdf-parser.py tool.

This option produces JSON output with the content of all of the streams, unfiltered.

To have the filtered stream content as JSON output, include option -f.

pdf-parser_V0_7_9.zip (http)
MD5: E435A374A233C9DFEDA8A4E16887FB99
SHA256: 99F50D4F030A5B3E9F9CBA20A7BB8C51FBA368526077CCA3466C784DA39D42DB

Tuesday 29 August 2023

Quickpost: PDF/ActiveMime Maldocs YARA Rule

Filed under: maldoc,Malware,Quickpost — Didier Stevens @ 18:07

Here is a YARA rule I developed to detect PDF/ActiveMime maldocs I wrote about in “Quickpost: Analysis of PDF/ActiveMime Polyglot Maldocs“.

It looks for files that start with %PDF- (this header can be obfuscated) and contain string QWN0aXZlTWlt (string ActiveMim in BASE64), possibly obfuscated with whitespace characters.

rule rule_pdf_activemime {
    meta:
        author = "Didier Stevens"
        date = "2023/08/29"
        version = "0.0.1"
        samples = "5b677d297fb862c2d223973697479ee53a91d03073b14556f421b3d74f136b9d,098796e1b82c199ad226bff056b6310262b132f6d06930d3c254c57bdf548187,ef59d7038cfd565fd65bae12588810d5361df938244ebad33b71882dcf683058"
        description = "look for files that start with %PDF- and contain BASE64 encoded string ActiveMim (QWN0aXZlTWlt), possibly obfuscated with extra whitespace characters"
        usage = "if you don't have to care about YARA performance warnings, you can uncomment string $base64_ActiveMim0 and remove all other $base64_ActiveMim## strings"
    strings:
        $pdf = "%PDF-"
//        $base64_ActiveMim0 = /[ \t\r\n]*Q[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim1 = /Q  [ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim2 = /Q \t[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim3 = /Q \r[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim4 = /Q \n[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim5 = /Q\t [ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim6 = /Q\t\t[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim7 = /Q\t\r[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim8 = /Q\t\n[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim9 = /Q\r [ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim10 = /Q\r\t[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim11 = /Q\r\r[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim12 = /Q\r\n[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim13 = /Q\n [ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim14 = /Q\n\t[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim15 = /Q\n\r[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim16 = /Q\n\n[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim17 = /QW [ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim18 = /QW\t[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim19 = /QW\r[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim20 = /QW\n[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim21 = /QWN[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
    condition:
        $pdf at 0 and any of ($base64_ActiveMim*)
}

The regex used to detect characters QWN0aXZlTWlt interspersed with whitespace characters (YARA string $base64_ActiveMim0) has no atoms (for YARA’s Aho-Corasic algorithm) larger than 1 byte, and thus generates a warning, that prohibits its use for hunting with VirusTotal.

That is why I replaced that regex with 21 regexes that all start with 3 fixed bytes and thus allow YARA to select atoms that are large enough.

Quickpost info

Quickpost: Analysis of PDF/ActiveMime Polyglot Maldocs

Filed under: maldoc,Malware,My Software,Quickpost — Didier Stevens @ 10:50

jpcert reported a new type of maldoc: “MalDoc in PDF – Detection bypass by embedding a malicious Word file into a PDF file –“.

These maldocs are PDF files that embed a Word document (ActiveMime) in MIME format.

ActiveMime documents can be analyzed by combining my emldump.py tool and oledump.py.

ActiveMime documents were heavily obfuscated in the past, and this is also the case here. As emldump.py version 0.0.11 was only able to handle the obfuscation of 2 of the 3 samples mentioned by jpcert, I released a new version to handle more obfuscation.

Here is an analysis example for sample 5b677d297fb862c2d223973697479ee53a91d03073b14556f421b3d74f136b9d.

Run emldump (version 0.0.12 or later) with option -F to fix the obfuscation of the mime-version header:

To find the part where the ActiveMime file was hidden, use option -E %HEADASCII% to view the first 20 characters of each part:

Here we can see that part 14 is not a JPEG file, but an ActiveMime file.

We extract it and pipe it into oledump.py:

That ActiveMime file contains VBA code:

These maldocs (at least the 3 samples shared by jpcert) can be detected by pdfid with option -e to display extra information:

There are a lot of bytes outside streams (usually for PDFs, there shouldn’t be) and the count of stream and endstream documents is different.

But like I said, these are detections for these 3 samples, it’s possible to modify those samples to remove the anomalies.

Quickpost info

Comments (2)

Sunday 12 February 2023

Update: pdf-parser.py Version 0.7.8

Filed under: My Software,Update — Didier Stevens @ 12:15

A small feature update for pdf-parser.py Statistics include unreferenced objects now:

pdf-parser_V0_7_8.zip (http)
MD5: 7BBEA9497666397CBBB88B012A710210
SHA256: FE393865861E00B48124B99CD5AEBBB5A632F1FBD883F4E4044DF8C8FA75BE9D

Thursday 10 November 2022

Update: pdf-parser.py Version 0.7.7

Filed under: My Software,Update — Didier Stevens @ 0:00

This is a small update: you can now select which hash algorithm to use for option -H by setting environment variable DSS_DEFAULT_HASH_ALGORITHMS.

And the statistics options (-a) also display a list of objects with streams.

pdf-parser_V0_7_7.zip (http)
MD5: BCAE193F171184F979603DFB1380FF43
SHA256: 576C429FA88CF0A7A110DAB25851D90670C88EC4CD7728329E754E06D8D26A70

Comments (1)

Monday 27 June 2022

Quickpost: Cracking PDF Owner Passwords

Filed under: Encryption,PDF — Didier Stevens @ 0:00

I added code to John the Ripper to crack PDF owner passwords (JtR cracks PDF user passwords only).

Source code can be found here.

Compiled Windows (Cygwin) and Linux (Ubuntu) executables can be found here.

This change introduces a new format: $pdfo$.

There is no tool for the moment to create this format. Just use pdf2john.pl to create a $pdf$ hash, and then change it into a $pdfo$ hash. To crack the owner password, one needs to recover the user password first.

This is the illustrated process:

There will be a PR for this change.

Cracking PDF owner passwords is just an academic exercise (writing this code was also just an exercise), as tools like QPDF can decrypt PDFs encrypted with a PDF owner password only without requiring the cleartext PDF owner password as argument.

Quickpost info

Comments (1)

Thursday 26 May 2022

Update: pdf-parser.py Version 0.7.6

Filed under: My Software,Update — Didier Stevens @ 9:56

This new version of pdf-parser fixes a couple of bug and has a work around for non compliant PDFs.

pdf-parser_V0_7_6.zip (http)
MD5: 3B6F837AF147422B1256596BCA69D737
SHA256: 34379A9987B2286706AF4C43AC72C93611AE3E9C0C571DD729EBB09C7A707A0D

Comments (1)

Friday 20 August 2021

Update: pdf-parser.py Version 0.7.5

Filed under: My Software,PDF,Uncategorized,Update — Didier Stevens @ 0:00

This is a bug fix version.

pdf-parser_V0_7_5.zip (https)
MD5: D39E98981E6FEA48BF61CA2F78ED0B09
SHA256: 5D970AFAC501A71D4FDDEECBD63060062226BF1D587A6A74702DDA79B5C2D3FB

Comments (1)

Update: pdfid.py Version 0.2.8

Filed under: My Software,PDF,Update — Didier Stevens @ 0:00

This is a bug fix version

pdfid_v0_2_8.zip (https)
MD5: 9DDE1D9010D860303B03F3317DAF07B4
SHA256: 0D0AA12592FA29BC5E7A9C3CFA0AAEBB711CEF373A0AE0AD523723C64C9D02B4

Comments (3)

Sunday 31 January 2021

New Tool: pdftool.py

Filed under: My Software,PDF — Didier Stevens @ 0:00

pdftool.py is a new tool I developed. This version has only one command: iu (incremental updates).

With this command, one can check if a PDF has incremental updates, and then select different versions of this PDF with incremental updates.

A PDF with incremental updates, is a PDF that has been modified by appending changes to the document at the end of the PDF file, without modifying the original content.

Here is a video explaining incremental updates and the use of my tool.

I reference 2 blog posts in the video: “Solving a Little PDF Puzzle” and “Shoulder Surfing a Malicious PDF Author“.

pdftool_V0_0_1.zip (https)
MD5: ED2BBE886008C737CC06E22F4F0FE8A1
SHA256: 401E88FBFAEC4382A50FE59430D04FE6111F9911958AB09BA7530C26043FDA87

Comments (3)

Didier Stevens

Tuesday 11 June 2024

Update: pdf-parser.py Version 0.7.9

Tuesday 29 August 2023

Quickpost: PDF/ActiveMime Maldocs YARA Rule

Quickpost: Analysis of PDF/ActiveMime Polyglot Maldocs

Sunday 12 February 2023

Update: pdf-parser.py Version 0.7.8

Thursday 10 November 2022

Update: pdf-parser.py Version 0.7.7

Monday 27 June 2022

Quickpost: Cracking PDF Owner Passwords

Thursday 26 May 2022

Update: pdf-parser.py Version 0.7.6

Friday 20 August 2021

Update: pdf-parser.py Version 0.7.5

Update: pdfid.py Version 0.2.8

Sunday 31 January 2021

New Tool: pdftool.py

Pages

Top Posts

Categories

Blog Stats

Twitter @DidierStevens

Archives