A PDF header is the first line of any PDF file. It declares the version of the PDF format so that PDF readers (like Adobe Acrobat) know how to interpret what follows. The information on how the PDF's internal objects are arranged and how to find them comes later in the file, in the cross-reference table and the trailer, both of which appear in the minimal example below.
The PDF header structure is human-readable ASCII text, stored in the file as the corresponding bytes. The minimal PDF structure examined here is the binary encoding of the following string, where \n stands for a newline byte (hex 0a):
%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\nxref\n0 1\n0000000000 65535 f\ntrailer\n<< /Root 1 0 R >>\nstartxref\n0\n%%EOF
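To see the bytes behind that string, a hex dump can be generated directly. The following is a minimal sketch: the names `MINIMAL_PDF` and `hex_dump` are chosen for illustration, and it assumes each `\n` is a single newline byte.

```python
# The minimal PDF string from the text, with "\n" as a single newline byte (0x0a).
MINIMAL_PDF = (
    b"%PDF-1.4\n"
    b"1 0 obj\n"
    b"<< /Type /Catalog >>\n"
    b"endobj\n"
    b"xref\n"
    b"0 1\n"
    b"0000000000 65535 f\n"
    b"trailer\n"
    b"<< /Root 1 0 R >>\n"
    b"startxref\n"
    b"0\n"
    b"%%EOF"
)


def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as an offset, a hex column, and a printable-ASCII column."""
    hex_width = width * 3 - 1  # two hex digits per byte plus separating spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes (such as the newline) are shown as "."
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{hex_width}}  {text_part}")
    return "\n".join(lines)


print(hex_dump(MINIMAL_PDF))
```

Each output row shows 16 bytes, so the version identifier and the start of the first object share the first row.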
%PDF-1.4
This is the PDF version identifier. It’s stored as the ASCII values for the characters %PDF-1.4, which translates to hex as 25 50 44 46 2d 31 2e 34. This tells the PDF reader what version of PDF is being used.
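As an illustration of how a reader might use this identifier, here is a small sketch that extracts the version from the first bytes of a file. The function name `parse_pdf_version` is hypothetical, and the pattern assumes a single-digit major and minor version, as in %PDF-1.4.

```python
import re
from typing import Optional

# Matches the version identifier only at the very start of the data.
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def parse_pdf_version(data: bytes) -> Optional[str]:
    """Return the declared version (for example '1.4'), or None if no header is found."""
    match = HEADER_RE.match(data)
    if match is None:
        return None
    major = match.group(1).decode("ascii")
    minor = match.group(2).decode("ascii")
    return f"{major}.{minor}"


# The hex bytes quoted in the text decode back to the same identifier.
header_bytes = bytes.fromhex("255044462d312e34")
print(header_bytes, parse_pdf_version(header_bytes))
```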
1 0 obj and endobj
This defines an object in the PDF. Each object is identified by an object number and a generation number: here 1 0 obj declares object 1, generation 0. In hex, 1 0 obj is 31 20 30 20 6f 62 6a. endobj marks the end of the object, which in hex is 65 6e 64 6f 62 6a.
<< /Type /Catalog >>
This is a dictionary, which forms the body of object 1. Its /Type entry says what kind of object this is: in this case a Catalog, which is the root of the PDF structure. The hex for this string is 3c 3c 20 2f 54 79 70 65 20 2f 43 61 74 61 6c 6f 67 20 3e 3e.
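The object framing described above can be sketched with a regular expression that pulls out the object number, the generation, and the body between obj and endobj. This is a deliberately naive illustration: `find_objects` is a hypothetical helper, and scanning like this can be fooled by binary stream data, which is why real readers locate objects through the cross-reference table.

```python
import re

# "N G obj", then the body (matched lazily), then "endobj".
OBJ_RE = re.compile(rb"(\d+) (\d+) obj\s*(.*?)\s*endobj", re.DOTALL)


def find_objects(data: bytes):
    """Yield (object_number, generation, body_bytes) for each obj ... endobj span."""
    for match in OBJ_RE.finditer(data):
        yield int(match.group(1)), int(match.group(2)), match.group(3)


sample = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
objects = list(find_objects(sample))
print(objects)
```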
xref
The cross-reference table starts here, helping the PDF reader locate objects. The hex is 78 72 65 66.
0 1
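This line has two numbers: the object number of the first entry and the number of entries that follow. Here the table starts at object 0 and contains just one entry. (A complete table for this file would use 0 2 and add an entry for object 1.) The hex value is 30 20 31.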
0000000000 65535 f
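This is an entry in the cross-reference table. Each entry has a 10-digit byte offset, a 5-digit generation number, and a flag: n for an object in use, f for a free one. This particular entry is the conventional first entry, which marks object 0 as free with generation 65535, so it does not point at a real object. The hex is 30 30 30 30 30 30 30 30 30 30 20 36 35 35 33 35 20 66.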
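The fixed layout of a cross-reference entry makes it easy to parse. Below is a sketch under the assumption that the end-of-line bytes have already been stripped; `XrefEntry` and `parse_xref_entry` are illustrative names, not part of any library.

```python
from typing import NamedTuple


class XrefEntry(NamedTuple):
    offset: int      # byte offset of the object (for a free entry: next free object number)
    generation: int  # generation number
    in_use: bool     # True for flag "n", False for flag "f"


def parse_xref_entry(line: bytes) -> XrefEntry:
    """Parse one entry of the form 'nnnnnnnnnn ggggg n' or 'nnnnnnnnnn ggggg f'."""
    fields = line.split()
    if len(fields) != 3 or len(fields[0]) != 10 or len(fields[1]) != 5:
        raise ValueError(f"malformed xref entry: {line!r}")
    if fields[2] not in (b"n", b"f"):
        raise ValueError(f"unknown xref flag: {fields[2]!r}")
    return XrefEntry(int(fields[0]), int(fields[1]), fields[2] == b"n")


entry = parse_xref_entry(b"0000000000 65535 f")
print(entry)
```

The length checks catch a 9-digit offset, which is an easy mistake to make when writing an entry by hand.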
trailer
The trailer keyword introduces the trailer dictionary, which gives the reader its starting point, most importantly which object is the root. The hex is 74 72 61 69 6c 65 72.
<< /Root 1 0 R >>
This is the trailer dictionary. Its /Root entry refers to the root object: 1 0 R is a reference to object 1, generation 0, which is the Catalog. The hex for this is 3c 3c 20 2f 52 6f 6f 74 20 31 20 30 20 52 20 3e 3e.
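A reader has to turn that /Root reference into an object number before it can fetch the Catalog. The following sketch shows one way to do it with a regular expression; `find_root_reference` is a hypothetical helper and handles only this simple, uncompressed form of the dictionary.

```python
import re
from typing import Optional, Tuple

# "/Root", then object number, generation number, and the "R" keyword.
ROOT_RE = re.compile(rb"/Root\s+(\d+)\s+(\d+)\s+R")


def find_root_reference(trailer_dict: bytes) -> Optional[Tuple[int, int]]:
    """Return (object_number, generation) of the /Root reference, or None if absent."""
    match = ROOT_RE.search(trailer_dict)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


root = find_root_reference(b"<< /Root 1 0 R >>")
print(root)
```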
startxref
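This keyword is followed, on the next line, by the byte offset at which the cross-reference table starts, counted from the beginning of the file. Its hex value is 73 74 61 72 74 78 72 65 66. In this minimal example the offset is written as 0, which is only a placeholder: with single-byte newlines, the xref keyword actually begins at byte 45.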
%%EOF
This marks the end of the file. The hex is 25 25 45 4f 46. Readers typically look for this marker at the end of the file and work backward from it to startxref, which tells them where the cross-reference table is.
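To tie the last two keywords together, here is a sketch of how a reader might start from the end of the file: find %%EOF, step back to startxref, and read the offset that follows it. The function name `read_startxref` is an assumption for illustration, and the sample bytes assume single-byte newlines.

```python
def read_startxref(data: bytes) -> int:
    """Return the integer that follows the last startxref keyword before %%EOF."""
    eof_pos = data.rfind(b"%%EOF")
    if eof_pos == -1:
        raise ValueError("no %%EOF marker found")
    keyword = b"startxref"
    kw_pos = data.rfind(keyword, 0, eof_pos)
    if kw_pos == -1:
        raise ValueError("no startxref keyword found")
    # Everything between the keyword and %%EOF should be the offset plus whitespace.
    return int(data[kw_pos + len(keyword):eof_pos].strip())


pdf = (
    b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"xref\n0 1\n0000000000 65535 f\n"
    b"trailer\n<< /Root 1 0 R >>\nstartxref\n0\n%%EOF"
)

declared_offset = read_startxref(pdf)
actual_offset = pdf.find(b"xref")  # first occurrence is the table keyword itself
print(declared_offset, actual_offset)
```

Comparing the two numbers shows whether the value after startxref really points at the table; in a file meant to be opened by a reader, they should match.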