Understanding a Minimal PDF Header with Hexdump

What is a PDF Header?

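The PDF header is the first line of a PDF file: %PDF- followed by a version number such as 1.4. It tells a PDF reader (such as Adobe Acrobat) which version of the PDF specification the file follows. The header does not describe how the file's internal objects are arranged or where to find them. That job belongs to the cross-reference table and trailer near the end of the file, which is why the walkthrough below covers a whole minimal file and not just its first line.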

Hexdump of a Minimal PDF

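Nothing in this minimal file is actually binary. Every byte is either printable ASCII or a newline, so a hex dump simply shows each byte's value next to its character. Below is a simplified form of hexdump -C output for a minimal PDF. The byte offset (in hex) is on the left, sixteen bytes appear per row, and the printable characters are on the right, with non-printable bytes such as the newline 0a shown as a dot: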

00000000 25 50 44 46 2d 31 2e 34 0a 31 20 30 20 6f 62 6a |%PDF-1.4.1 0 obj|
00000010 0a 3c 3c 20 2f 54 79 70 65 20 2f 43 61 74 61 6c |.<< /Type /Catal|
00000020 6f 67 20 3e 3e 0a 65 6e 64 6f 62 6a 0a 78 72 65 |og >>.endobj.xre|
00000030 66 0a 30 20 31 0a 30 30 30 30 30 30 30 30 30 30 |f.0 1.0000000000|
00000040 20 36 35 35 33 35 20 66 0a 74 72 61 69 6c 65 72 | 65535 f.trailer|
00000050 0a 3c 3c 20 2f 52 6f 6f 74 20 31 20 30 20 52 20 |.<< /Root 1 0 R |
00000060 3e 3e 0a 73 74 61 72 74 78 72 65 66 0a 30 0a 25 |>>.startxref.0.%|
00000070 25 45 4f 46                                     |%EOF|

Written out as text, with \n standing for the newline byte 0a, the same bytes read:

%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\nxref\n0 1\n0000000000 65535 f\ntrailer\n<< /Root 1 0 R >>\nstartxref\n0\n%%EOF
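To check a dump like this yourself, you can build the exact bytes and format them. The sketch below is a minimal, hypothetical helper (the names PDF and hexdump are ours, not from any library). It assumes the file ends right after %%EOF with no trailing newline and prints rows in the same simplified offset / hex / ASCII layout.

```python
# The minimal PDF from this article, as raw bytes (assumed: no trailing newline).
PDF = (
    b"%PDF-1.4\n"
    b"1 0 obj\n"
    b"<< /Type /Catalog >>\n"
    b"endobj\n"
    b"xref\n"
    b"0 1\n"
    b"0000000000 65535 f\n"
    b"trailer\n"
    b"<< /Root 1 0 R >>\n"
    b"startxref\n"
    b"0\n"
    b"%%EOF"
)


def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of: hex offset, hex byte values, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes (such as the newline 0x0a) are shown as "."
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Pad short final rows so the ASCII column stays aligned.
        lines.append(f"{offset:08x} {hex_part:<{width * 3 - 1}} |{text}|")
    return "\n".join(lines)


print(len(PDF))  # 116
print(hexdump(PDF))
```

The real hexdump -C adds an extra space after the offset and between the two groups of eight bytes; the layout here is kept to the simpler single-space form.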

Breaking Down the PDF Structure

1. %PDF-1.4

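This is the header, the PDF version identifier. It is stored as the ASCII bytes of the characters %PDF-1.4, which in hex are 25 50 44 46 2d 31 2e 34, and it tells the reader which version of the PDF specification (here 1.4) the file follows. Because it begins with %, the PDF comment character, the header is syntactically a comment line. If a file contains binary data, the specification recommends following the header with a second comment line containing at least four bytes with values of 128 or greater, so that file-transfer tools treat the file as binary. This minimal file is pure ASCII and has no such line.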
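Reading the declared version back out takes only a pattern match on the first bytes. The sketch below is a minimal illustration with a hypothetical function name. It assumes the header sits at the very start of the data; tolerating junk bytes before the header is left out.

```python
import re
from typing import Optional


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version declared by a '%PDF-x.y' header, or None if absent.

    Sketch only: the header is assumed to be at byte offset 0.
    """
    match = re.match(rb"%PDF-(\d+\.\d+)", data)
    return match.group(1).decode("ascii") if match else None


header = bytes.fromhex("25 50 44 46 2d 31 2e 34")  # the first 8 bytes of the file
print(header)                      # b'%PDF-1.4'
print(pdf_header_version(header))  # 1.4
```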

2. 1 0 obj and endobj

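This defines an indirect object. Each indirect object is identified by an object number and a generation number: 1 0 obj declares object 1, generation 0, and endobj closes it. In hex, 31 20 30 20 6f 62 6a is 1 0 obj and 65 6e 64 6f 62 6a is endobj. The object begins at byte offset 9 (0x09), immediately after the header line.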
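The obj / endobj pairs can be listed with a regular expression. This is a minimal sketch, not how production parsers work. It assumes no object body contains the word endobj, which holds for this file but not necessarily for files with stream or string data, and the name OBJ_RE is hypothetical.

```python
import re

# The article's minimal file as raw bytes (no trailing newline).
PDF = (b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\nxref\n0 1\n"
       b"0000000000 65535 f\ntrailer\n<< /Root 1 0 R >>\nstartxref\n0\n%%EOF")

# "N G obj ... endobj": object number, generation number, then the body.
# Simplification: assumes the body never contains the word "endobj".
OBJ_RE = re.compile(rb"(\d+) (\d+) obj\s*(.*?)\s*endobj", re.DOTALL)

for m in OBJ_RE.finditer(PDF):
    num, gen, body = int(m.group(1)), int(m.group(2)), m.group(3)
    print(f"object {num}, generation {gen}, at byte offset {m.start()}: {body!r}")
# object 1, generation 0, at byte offset 9: b'<< /Type /Catalog >>'
```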

3. << /Type /Catalog >>

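This is a dictionary: key/value pairs enclosed in << and >>, where /Type and /Catalog are names. The pair /Type /Catalog marks this object as the document catalog, the root of the PDF's object hierarchy. In a conforming file the catalog must also contain a /Pages entry pointing to the page tree; this minimal example omits it. The hex for this string is 3c 3c 20 2f 54 79 70 65 20 2f 43 61 74 61 6c 6f 67 20 3e 3e.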
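A dictionary this simple can be split into tokens and paired up. The sketch below is a deliberately tiny, hypothetical tokenizer. It assumes a flat dictionary whose keys are names and whose values are names, integers, or indirect references (N G R); nested dictionaries, arrays, and strings are not handled.

```python
import re

# Tokens: "<<", ">>", a name like /Type, an indirect reference "N G R", or an integer.
# Assumption: flat dictionaries only; real PDF parsers handle far more syntax.
TOKEN_RE = re.compile(rb"<<|>>|/[^\s/<>\[\]()]+|\d+ \d+ R|-?\d+")


def parse_flat_dict(src: bytes) -> dict:
    tokens = TOKEN_RE.findall(src)
    if not tokens or tokens[0] != b"<<" or tokens[-1] != b">>":
        raise ValueError("not a dictionary")
    inner = tokens[1:-1]
    if len(inner) % 2:
        raise ValueError("keys and values must come in pairs")
    result = {}
    for key, value in zip(inner[0::2], inner[1::2]):
        if not key.startswith(b"/"):
            raise ValueError(f"dictionary key must be a name, got {key!r}")
        result[key.decode("ascii")] = value.decode("ascii")
    return result


print(parse_flat_dict(b"<< /Type /Catalog >>"))  # {'/Type': '/Catalog'}
```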

4. xref

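This keyword begins the cross-reference table, which lists where each indirect object lives so a reader can jump straight to it without scanning the file. The hex is 78 72 65 66, and the keyword starts at byte offset 45 (0x2d).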
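Finding that offset programmatically is a one-line search, as in this minimal sketch. It assumes LF line endings and a single xref section; the search includes the surrounding newlines so it cannot match the xref inside startxref.

```python
# The article's minimal file as raw bytes (no trailing newline).
PDF = (b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\nxref\n0 1\n"
       b"0000000000 65535 f\ntrailer\n<< /Root 1 0 R >>\nstartxref\n0\n%%EOF")

# "xref" starts a line, so look for it between two newlines and skip the first one.
xref_offset = PDF.index(b"\nxref\n") + 1
print(xref_offset)                                # 45
print(f"{xref_offset:#x}")                        # 0x2d
print(PDF[xref_offset:xref_offset + 4].hex(" "))  # 78 72 65 66
```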

5. 0 1

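This is a subsection header. The first number is the object number of the first entry in the subsection (0), and the second is how many consecutive entries follow (1). So this table covers only object 0. Object 1 has no entry at all; a conforming file would use 0 2 and add an in-use entry for object 1 pointing at its byte offset, 9. The hex value is 30 20 31.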
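Interpreting the header as "first object number, count" makes the gap easy to see. The sketch below uses a hypothetical helper that turns the header line into the range of object numbers it covers.

```python
def parse_subsection_header(line: bytes) -> range:
    """Parse an xref subsection header 'first count' into the object numbers covered."""
    first, count = (int(field) for field in line.split())
    return range(first, first + count)


covered = parse_subsection_header(b"0 1")
print(list(covered))  # [0]
print(1 in covered)   # False: object 1 has no entry in this table
```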

6. 0000000000 65535 f

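This is the entry for object 0, which is special. It is always free (the f), always has generation number 65535, and heads the list of free objects, so its first field holds the number of the next free object (0 here, meaning there is none) rather than a byte offset. Entries for objects in use are marked n instead, and their first field is the object's byte offset in the file. Each entry is supposed to be exactly 20 bytes long, including a two-byte end-of-line (space plus LF, or CR plus LF); the entry in this file ends with a bare LF, so it is one byte short. The hex is 30 30 30 30 30 30 30 30 30 30 20 36 35 35 33 35 20 66.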
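The fixed-width layout is straightforward to produce and to read back. In this sketch the XrefEntry type and its field names are our own, hypothetical choices. Entries are written with space plus LF as the two-byte end-of-line.

```python
from typing import NamedTuple


class XrefEntry(NamedTuple):
    field1: int      # byte offset for 'n' entries; next free object number for 'f' entries
    generation: int
    kind: str        # 'n' = in use, 'f' = free


def format_entry(e: XrefEntry) -> bytes:
    """Serialize as 10 digits, space, 5 digits, space, type letter, then space + LF."""
    line = f"{e.field1:010d} {e.generation:05d} {e.kind} \n".encode("ascii")
    if len(line) != 20:
        raise ValueError("field values too large for the fixed 20-byte layout")
    return line


def parse_entry(line: bytes) -> XrefEntry:
    field1, gen, kind = line.split()
    return XrefEntry(int(field1), int(gen), kind.decode("ascii"))


print(format_entry(XrefEntry(0, 65535, "f")))  # b'0000000000 65535 f \n'
print(parse_entry(b"0000000000 65535 f"))      # XrefEntry(field1=0, generation=65535, kind='f')
# An in-use entry for object 1, which starts at byte offset 9 in this file:
print(format_entry(XrefEntry(9, 0, "n")))      # b'0000000009 00000 n \n'
```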

7. trailer

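The trailer keyword introduces the trailer dictionary, which gives the reader its entry points into the file, most importantly /Root. A conforming trailer also requires /Size, one more than the highest object number in the file (2 here); this minimal example leaves it out. The hex is 74 72 61 69 6c 65 72.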

8. << /Root 1 0 R >>

/Root 1 0 R is an indirect reference (object 1, generation 0; R stands for reference) that points the reader at the document catalog defined earlier. The hex for this is 3c 3c 20 2f 52 6f 6f 74 20 31 20 30 20 52 20 3e 3e.
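Following the reference means pulling /Root out of the trailer and then finding object 1. The sketch below does this with regular expressions and assumes a single, flat trailer dictionary. Because this file's xref table has no entry for object 1, the object is found here by scanning for 1 0 obj, which is a shortcut; a real reader would look the offset up in the table.

```python
import re

# The article's minimal file as raw bytes (no trailing newline).
PDF = (b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\nxref\n0 1\n"
       b"0000000000 65535 f\ntrailer\n<< /Root 1 0 R >>\nstartxref\n0\n%%EOF")

# Pull out the trailer dictionary (assumes one trailer, no nested dictionaries).
trailer = re.search(rb"trailer\s*(<<.*?>>)", PDF, re.DOTALL).group(1)
# "/Root N G R" is an indirect reference to object N, generation G.
root = re.search(rb"/Root\s+(\d+)\s+(\d+)\s+R", trailer)
root_num, root_gen = int(root.group(1)), int(root.group(2))
print(trailer)             # b'<< /Root 1 0 R >>'
print(root_num, root_gen)  # 1 0

# Shortcut: scan for "1 0 obj" instead of consulting the xref table.
obj_offset = PDF.index(f"{root_num} {root_gen} obj".encode("ascii"))
print(obj_offset)          # 9
```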

9. startxref

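The startxref keyword is followed, on the next line, by the byte offset of the xref keyword, which is how a reader finds the cross-reference table without scanning the whole file. Its hex value is 73 74 61 72 74 78 72 65 66. In this example the value is 0, which points at %PDF rather than at the table; the correct value for this file is 45 (0x2d), where xref begins.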
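Patching the placeholder is safe here because startxref comes after the table: changing its value does not shift any earlier offsets. The sketch below uses a hypothetical helper. It assumes LF line endings and exactly one xref section and one startxref keyword.

```python
import re

# The article's minimal file as raw bytes (no trailing newline).
PDF = (b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\nxref\n0 1\n"
       b"0000000000 65535 f\ntrailer\n<< /Root 1 0 R >>\nstartxref\n0\n%%EOF")


def fix_startxref(data: bytes) -> bytes:
    """Return a copy of data whose startxref value is the real offset of 'xref'."""
    xref_offset = data.index(b"\nxref\n") + 1
    fixed, count = re.subn(
        rb"(startxref\s+)\d+",
        lambda m: m.group(1) + str(xref_offset).encode("ascii"),
        data,
    )
    if count != 1:
        raise ValueError("expected exactly one startxref")
    return fixed


print(fix_startxref(PDF).splitlines()[-3:])  # [b'startxref', b'45', b'%%EOF']
```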

10. %%EOF

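This marks the end of the file. The hex is 25 25 45 4f 46. Rather than being a signal to stop reading, it is where a reader typically starts: it searches backward from the end of the file for %%EOF, reads the startxref value just before it, and jumps to the cross-reference table from there.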
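That backward search can be sketched in a few lines. The helper name is hypothetical, and the 1024-byte window is an arbitrary choice for this example. Run against this file, it returns the placeholder value 0 rather than the table's real offset, 45.

```python
# The article's minimal file as raw bytes (no trailing newline).
PDF = (b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\nxref\n0 1\n"
       b"0000000000 65535 f\ntrailer\n<< /Root 1 0 R >>\nstartxref\n0\n%%EOF")


def read_startxref(data: bytes, window: int = 1024) -> int:
    """Find %%EOF near the end of the file and read the startxref value before it."""
    tail = data[-window:]
    eof = tail.rfind(b"%%EOF")
    if eof == -1:
        raise ValueError("no %%EOF marker near the end of the file")
    keyword = tail.rfind(b"startxref", 0, eof)
    if keyword == -1:
        raise ValueError("no startxref keyword before %%EOF")
    return int(tail[keyword + len(b"startxref"):eof].split()[0])


print(read_startxref(PDF))  # 0
```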