Coding Global

I need help with removing headers and footers from PDF documents using Python (PyMuPDF / fitz).

Archived 8 days ago
4 messages
3 members
a month ago
Laiba
Verified

1. The PDFs are not consistently formatted: some have only a date at the bottom, while others have custom headers/footers at varying Y-positions.
2. I want to automatically detect and remove the header/footer text from every page.
3. I have already handled image removal by checking each xref with document.xref_is_image(xref), so that part is done. Now I need a similarly reliable approach for detecting headers/footers (via text blocks, XObjects, or any other robust method), preferably using PyMuPDF (fitz).

If anyone has code examples or a solid strategy for detecting and removing header/footer text based on coordinates or object types, I'd really appreciate your guidance. Thank you!
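A minimal sketch of one common strategy (not the only one). Headers and footers tend to repeat across pages, so the idea is to collect the text blocks that lie in a top or bottom margin band of each page and normalize their text. Normalizing here means collapsing whitespace, lowercasing, and replacing each run of digits with #, so "Page 3" and "Page 12", or two dates in the same format, compare equal. Any text that recurs on at least min_ratio of the pages is treated as a header/footer and removed with PyMuPDF redactions (page.add_redact_annot followed by page.apply_redactions). The helper names, the 10% band, and the 50% repetition threshold are hypothetical choices you would tune for your files. The sketch also assumes unrotated pages and needs at least two pages before it flags anything.

```python
import re
from collections import Counter

# Hypothetical defaults: tune these for your documents.
BAND = 0.1        # top/bottom fraction of the page height treated as margin
MIN_RATIO = 0.5   # fraction of pages a text must recur on to count as header/footer


def normalize(text):
    """Collapse whitespace, lowercase, and replace each run of digits with '#',
    so 'Page 3' and 'Page 12' (or two dates in the same format) compare equal."""
    text = re.sub(r"\s+", " ", text).strip().lower()
    return re.sub(r"\d+", "#", text)


def in_band(y0, y1, height, band=BAND):
    """True if a block lies entirely inside the top or bottom margin band."""
    return y1 <= band * height or y0 >= (1 - band) * height


def find_header_footer_blocks(pages, band=BAND, min_ratio=MIN_RATIO):
    """pages: list of (page_height, blocks), each block an (x0, y0, x1, y1, text) tuple.
    Returns, per page, the indices of blocks judged to be headers/footers."""
    # Pass 1: count on how many pages each normalized margin text appears.
    counts = Counter()
    for height, blocks in pages:
        counts.update({normalize(b[4]) for b in blocks
                       if in_band(b[1], b[3], height, band)})
    # Require at least two pages so a single page never flags anything.
    threshold = max(2, min_ratio * len(pages))
    repeated = {t for t, n in counts.items() if t and n >= threshold}
    # Pass 2: flag every margin block whose normalized text recurs.
    return [[i for i, b in enumerate(blocks)
             if in_band(b[1], b[3], height, band) and normalize(b[4]) in repeated]
            for height, blocks in pages]


def strip_headers_footers(src, dst, band=BAND, min_ratio=MIN_RATIO):
    """Redact detected header/footer text from src and save the result to dst."""
    import fitz  # PyMuPDF; imported here so the pure helpers above work without it

    doc = fitz.open(src)
    pages = []
    for page in doc:
        # "blocks" tuples are (x0, y0, x1, y1, text, block_no, block_type);
        # block_type 0 means a text block.
        blocks = [b[:5] for b in page.get_text("blocks") if b[6] == 0]
        pages.append((page.rect.height, blocks))

    flagged = find_header_footer_blocks(pages, band, min_ratio)
    for page, (_, blocks), idxs in zip(doc, pages, flagged):
        if not idxs:
            continue
        for i in idxs:
            page.add_redact_annot(fitz.Rect(blocks[i][:4]))
        # Delete the text under the redactions but leave images alone.
        page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)

    doc.save(dst, garbage=4, deflate=True)
    doc.close()
```

Usage: strip_headers_footers("input.pdf", "cleaned.pdf"). Because the band is a fraction of the page height rather than a fixed Y value, small shifts in header/footer position between pages or files are tolerated. A header that drifts outside the band will be missed, so widen band if needed. Keeping the detection logic free of fitz also makes it easy to test on plain tuples before touching real PDFs.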

Replies (4)