I need help with removing headers and footers from PDF documents using Python (PyMuPDF / fitz).

Archived 2 months ago

4 messages

3 members

Created 3 months ago

Updated 3 months ago

Laiba

Verified

1. The PDFs are not well formatted: some have only a date at the bottom, and others have custom headers/footers at inconsistent Y-positions.
2. I want to automatically detect and remove the header/footer text from all pages.
3. I have already handled image removal using xref (i.e., document.xref_is_image(xref)), so that part is done.
Now I need a similar approach for header/footer detection (text blocks, XObjects, or any reliable method), preferably using PyMuPDF (fitz).
If anyone has code examples or a solid strategy for detecting and removing header/footer text based on coordinates or object types, I'd really appreciate your guidance.
Thank you!
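One possible starting point for the coordinate-based strategy asked about above is sketched here. It is a rough sketch, not a definitive solution, and rests on assumptions that are not in the original post: headers and footers sit within the top or bottom 12% of the page, and their text repeats (after digits are masked) on at least 60% of pages. The helper names `normalize`, `find_repeated_margin_blocks`, and `strip_headers_footers`, and the parameters `margin_ratio` and `min_share`, are hypothetical. The detection step is plain Python over block tuples in the shape returned by `page.get_text("blocks")`; removal uses PyMuPDF redaction annotations.

```python
import re
from collections import Counter


def normalize(text):
    """Collapse whitespace, lowercase, and mask digit runs so that
    'Page 3' and 'Page 4' (or two different dates) compare equal."""
    return re.sub(r"\d+", "#", " ".join(text.split())).lower()


def find_repeated_margin_blocks(pages, page_heights, margin_ratio=0.12, min_share=0.6):
    """pages: one list per page of (x0, y0, x1, y1, text) tuples.
    Returns, per page, the rectangles of blocks that lie in the top or
    bottom margin band and whose normalized text repeats across pages.
    margin_ratio and min_share are assumed defaults; tune them per corpus."""
    counts = Counter()
    candidates = []
    for blocks, height in zip(pages, page_heights):
        band = height * margin_ratio
        page_candidates = []
        for x0, y0, x1, y1, text in blocks:
            in_margin = y1 <= band or y0 >= height - band
            key = normalize(text)
            if in_margin and key:
                page_candidates.append((key, (x0, y0, x1, y1)))
        # Count each distinct text once per page.
        counts.update({key for key, _ in page_candidates})
        candidates.append(page_candidates)
    # Require at least two pages so a one-off title is never treated as a header.
    threshold = max(2, min_share * len(pages))
    return [
        [rect for key, rect in page_candidates if counts[key] >= threshold]
        for page_candidates in candidates
    ]


def strip_headers_footers(src_path, dst_path):
    """Redact the detected header/footer blocks and save a cleaned copy."""
    import fitz  # PyMuPDF, imported lazily so the detection above runs without it

    doc = fitz.open(src_path)
    pages, heights = [], []
    for page in doc:
        # Each block is (x0, y0, x1, y1, text, block_no, block_type); type 0 is text.
        blocks = [b[:5] for b in page.get_text("blocks") if b[6] == 0]
        pages.append(blocks)
        heights.append(page.rect.height)
    for page, rects in zip(doc, find_repeated_margin_blocks(pages, heights)):
        for rect in rects:
            page.add_redact_annot(fitz.Rect(rect))
        if rects:
            # Removes the text under the redaction rectangles from the page content.
            page.apply_redactions()
    doc.save(dst_path, garbage=4, deflate=True)
```

Calling `strip_headers_footers("input.pdf", "cleaned.pdf")` (both file names are placeholders) would write a cleaned copy. Because the margin test uses a band rather than a fixed Y value, headers at slightly different heights are still caught; text that repeats nowhere else, such as a chapter title near the top of one page, is left alone. Headers drawn as vector graphics or Form XObjects without extractable text would not be detected by this approach.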


Replies (3)