If PDF is electronic paper, then pdftk is an electronic staple-remover, hole-punch, binder, secret-decoder-ring, and X-Ray-glasses. Pdftk is a command-line tool for doing everyday things with PDF documents. Keep one in the top drawer of your desktop and use it to:
Merge PDF Documents
Split PDF Pages into a New Document
Decrypt Input as Necessary (Password Required)
Encrypt Output as Desired
Fill PDF Forms with FDF Data and/or Flatten Forms
Apply a Background Watermark
Report on PDF Metrics such as Metadata, Bookmarks, and Page Labels
Update PDF Metadata
Attach Files to PDF Pages or the PDF Document
Unpack PDF Attachments
Burst a PDF Document into Single Pages
Uncompress and Re-Compress Page Streams
Repair Corrupted PDF (Where Possible)
Pdftk allows you to manipulate PDF easily and freely. It does not require Acrobat, and it runs on Windows, Linux, Mac OS X, FreeBSD and Solaris.
Friday, July 08 2005 @ 07:18 AM PDT
Contributed by: drolevar
It turns out that pdftk doesn't work with PDF 1.6 forms. They are filled without an error, but the fields appear empty in Acrobat 7.0. Older Acrobat versions complain about the newer PDF version but display the field contents.
What could be the problem?
Friday, July 08 2005 @ 03:48 AM PDT
Contributed by: Anonymous
I wanted to print only the front pages of patents downloaded as PDFs. I couldn't work out how to do this with pdftk in one pass, so I did the following. It runs in a COMMAND.COM DOS box (not CMD.EXE) under Windows 2000 and assumes a directory c:\zdir that contains nothing but the PDFs under consideration. It uses Horst Schaeffer's wonderful LMOD (List MODifier) to build the batch file. The idea is to use pdftk to split off only the first page of each PDF, with LMOD giving the new files unique serial numbers, then to make a second pass with pdftk to reassemble those files into a single one, followed by a call to Acrobat Reader's command-line printing function (you still have to press the OK button manually). Those "skilled in the art" will understand the rest; I hope line wrapping and your system's odd habit of dropping backslashes don't screw things up too much.
PRPATPDF.BAT
------------
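@echo off
rem Remember the current directory (COMMAND.COM has no built-in PUSHD)
c:\dosutils\pushd.exe
c:
cd c:\zdir
rem Delete leftovers from any previous run
for %%a in (do_it.bat pdf.dir front_pages.pdf $*.pdf) do if exist %%a del %%a > nul
rem List the PDFs, sorted by name, one bare filename per line
dir *.pdf /O:N /N /b > pdf.dir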
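rem LMOD turns each name in pdf.dir into a pdftk command that saves page 1 as $1.pdf, $2.pdf, ...
lmod /L* pdftk [] cat 1-1 output $[#].pdf < pdf.dir > do_it.bat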
call do_it.bat
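rem Merge the extracted front pages, then print them with Acrobat Reader
pdftk $*.pdf cat output front_pages.pdf
"C:\Program Files\Adobe\Acrobat 6.0\Reader\AcroRd32.exe" /p "front_pages.pdf"
rem Clean up and return to the original directory
for %%a in (do_it.bat pdf.dir front_pages.pdf $*.pdf) do if exist %%a del %%a > nul
c:\dosutils\popd.exe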
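For comparison, pdftk's input handles (A=file.pdf) let a single cat operation pick page 1 from every input, which may make the intermediate $n.pdf files unnecessary. The Python sketch below builds such a command line; it is not the poster's method. It caps a batch at 26 files on the assumption that your pdftk release accepts only single upper-case letters as handles (check your version's documentation). The default output name front_pages.pdf is taken from the batch file above.

```python
import string

def front_pages_command(pdfs, output="front_pages.pdf"):
    """Build one pdftk argument list that concatenates page 1 of each input PDF."""
    pdfs = [str(p) for p in pdfs]
    if not pdfs:
        raise ValueError("no input PDFs")
    if len(pdfs) > 26:
        # Assumption: handles are single upper-case letters, so only A..Z are available.
        raise ValueError("at most 26 inputs with single-letter handles")
    handles = string.ascii_uppercase[:len(pdfs)]
    inputs = [f"{h}={p}" for h, p in zip(handles, pdfs)]  # A=first.pdf B=second.pdf ...
    pages = [f"{h}1" for h in handles]                    # A1 B1 ... = page 1 of each input
    return ["pdftk", *inputs, "cat", *pages, "output", output]
```

A possible call is `subprocess.run(front_pages_command(sorted(pathlib.Path(r"c:\zdir").glob("*.pdf"))), check=True)`. Passing a list rather than a shell string keeps filenames with spaces intact; write the output outside c:\zdir (or exclude it) so a rerun does not merge it back in.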
Tuesday, June 21 2005 @ 10:27 AM PDT
Contributed by: Admin
Some pdftk users process hundreds of files, and doing this work on a Windows machine can yield unexpected results. The problem arises from the Windows command-prompt shell, not pdftk. For every long filename, Windows also creates a short, DOS-compatible (8.3) filename, and this short filename might match a wildcard expression even when the long filename does not. With pdftk, the result is that you end up with more input files than you wanted.
This article offers a couple of workarounds and then describes the case where this problem arose.
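To make the failure concrete, the Python sketch below models a directory in which each file has a long name and an 8.3 alias, then compares an expansion that accepts a match on either name (the behavior described above) with a filter that re-checks the long name only. The filenames and aliases are made up, and the long-name re-check is just one illustration of the idea, not necessarily one of the article's workarounds.

```python
import fnmatch

# Hypothetical directory: (long filename, Windows-generated 8.3 alias).
# Real aliases are assigned by the file system; these are illustrative.
FILES = [
    ("invoice1.pdf", "INVOICE1.PDF"),
    ("invoice2.pdf", "INVOICE2.PDF"),
    ("Quarterly Report 2005.pdf", "QUARTE~1.PDF"),
]

def win_match(name: str, pattern: str) -> bool:
    """Case-insensitive wildcard match, as Windows does it."""
    return fnmatch.fnmatchcase(name.lower(), pattern.lower())

def expand_either_name(pattern: str) -> list[str]:
    """A file is selected if its long name OR its 8.3 alias matches."""
    return [long_name for long_name, short_name in FILES
            if win_match(long_name, pattern) or win_match(short_name, pattern)]

def expand_long_name_only(pattern: str) -> list[str]:
    """Re-check each candidate against its long name and drop alias-only hits."""
    return [name for name in expand_either_name(pattern) if win_match(name, pattern)]

print(expand_either_name("*1.pdf"))     # ['invoice1.pdf', 'Quarterly Report 2005.pdf']
print(expand_long_name_only("*1.pdf"))  # ['invoice1.pdf']
```

The report matches `*1.pdf` only through its alias QUARTE~1.PDF, which is exactly the kind of surprise extra input the text describes.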
I wrote an article for MacTech Magazine (Nov. 2004 ed.) about collecting form data using HTML forms and then packing this data into a PDF form for download. I posted the code online, along with a working example.
While merging data into a PDF form is old news to pdftk users, this example has an interesting twist. It uses pdftk's dump_data_fields operation to discover exactly what the PDF form wants, then creates a dynamic HTML form using this information. In other words, it automatically creates an HTML form to match your PDF form.
This HTML form is bare-bones, but it makes a good foundation for your web interface. It helps if you fill in the PDF fields' "Short Description," available via Acrobat's field properties dialog.
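The idea of generating an HTML form from dump_data_fields can be sketched in Python (the article's own code is PHP). This sketch assumes the record layout pdftk prints: each field starts with a `---` line followed by `Key: value` lines such as FieldType, FieldName, FieldNameAlt, and FieldStateOption. FieldNameAlt should carry the Short Description mentioned above. The sample data, the fill.php action URL, and the rule that every Button field becomes a checkbox are simplifying assumptions.

```python
import html

# Sample in the layout pdftk's dump_data_fields prints (values are made up).
SAMPLE_DUMP = """---
FieldType: Text
FieldName: name
FieldNameAlt: Your full name
FieldFlags: 0
FieldJustification: Left
---
FieldType: Button
FieldName: subscribe
FieldFlags: 0
FieldStateOption: Off
FieldStateOption: Yes
"""

def parse_fields(dump: str) -> list[dict[str, list[str]]]:
    """One dict per field; keys such as FieldStateOption repeat, so values are lists."""
    fields: list[dict[str, list[str]]] = []
    for line in dump.splitlines():
        if line.strip() == "---":
            fields.append({})  # a new field record starts
        elif ": " in line and fields:
            key, value = line.split(": ", 1)
            fields[-1].setdefault(key, []).append(value)
    return fields

def field_to_html(field: dict[str, list[str]]) -> str:
    name = field["FieldName"][0]
    label = field.get("FieldNameAlt", [name])[0]  # prefer the Short Description
    kind = field.get("FieldType", ["Text"])[0]
    qname, qlabel = html.escape(name), html.escape(label)
    if kind == "Button":
        # Simplification: every button becomes a checkbox whose "on" value
        # is the first state option other than Off.
        on = [s for s in field.get("FieldStateOption", []) if s != "Off"]
        qvalue = html.escape(on[0] if on else "Yes")
        return f'<label><input type="checkbox" name="{qname}" value="{qvalue}"> {qlabel}</label>'
    return f'<label>{qlabel} <input type="text" name="{qname}"></label>'

def build_form(dump: str, action: str) -> str:
    rows = [field_to_html(f) for f in parse_fields(dump) if "FieldName" in f]
    return "\n".join([f'<form method="post" action="{html.escape(action)}">',
                      *rows, '<input type="submit">', "</form>"])

print(build_form(SAMPLE_DUMP, "fill.php"))
```

In practice the dump would come from `pdftk form.pdf dump_data_fields`; radio buttons and push buttons also report FieldType Button, so a real interface would need to tell them apart.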
My MacTech article described the process in detail and introduced the reader to related topics, such as PDF forms, pdftk, and the FDF format. This article offers some tips on getting started and shows how to discover form field data using pdftk. Download the example PHP code here.
I haven't tested this on Adobe's new-fangled Acrobat 7 forms.
I ported forge_fdf to Python for work and thought you might like to use it, or post it on the website alongside forge_fdf.php for others. I've attached it. It's a direct block-for-block port; it could probably be optimised a fair bit, but porting it directly was the least amount of effort :) It makes an interesting comparison between the languages; just look at the line count ;)
Forge_fdf is a little PHP script I created for casting form data into the FDF syntax. This is handy for filling online PDF forms automatically, or for filling forms using pdftk.
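As a rough illustration of what casting form data into FDF involves, the Python sketch below writes each field as a `<< /T (name) /V (value) >>` dictionary in the /Fields array of an FDF catalog. It is not a port of forge_fdf.php: it assumes plain-ASCII text values and flat field names, and it does not handle checkbox values (which are PDF names such as /Yes rather than strings).

```python
def fdf_escape(text: str) -> str:
    """Escape a PDF literal string: backslash first, then parentheses."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def make_fdf(fields: dict[str, str]) -> bytes:
    """Build a minimal FDF file that sets each text field's value.

    Assumes plain-ASCII values (encode() raises otherwise) and flat field names.
    """
    entries = "\n".join(
        f"<< /T ({fdf_escape(name)}) /V ({fdf_escape(value)}) >>"
        for name, value in fields.items()
    )
    fdf = (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n"
        f"{entries}\n"
        "] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )
    return fdf.encode("ascii")
```

Write the bytes to a file such as data.fdf and fill the form with `pdftk form.pdf fill_form data.fdf output filled.pdf` (add `flatten` to make the result non-editable).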
Thursday, April 21 2005 @ 08:22 AM PDT
Contributed by: Anonymous
I would like to split a large PDF file into several smaller files based on the account number on each page, and give each resulting PDF a custom filename. I would like to do this as a batch job on Solaris. How can I do it?
Any help is greatly appreciated.
I suggest using pdftotext to extract your PDF's text. Run pdftotext --help to see its options. Its output uses the formfeed (0x0C) character to mark page breaks. Scan the output from pdftotext for the account number, or possibly some other distinctive feature, to find where to split the PDF, counting pages as you go by counting formfeeds. Then create a pdftk command line that performs the split and writes the new file to your custom filename. Script it in bash, if you like.
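The same steps can be sketched in Python instead of bash. The sketch splits pdftotext output on formfeeds, groups consecutive pages that share an account number, and emits one `pdftk ... cat first-last output ...` command per group. The `Account No: 12345` pattern and the acct_<number>.pdf naming scheme are hypothetical; it also assumes a page without a visible number continues the previous account, and that each account appears in one contiguous run (a repeat later in the file would overwrite the earlier output).

```python
import re
import shlex

# Hypothetical: how the account number is printed on each page.
ACCOUNT_RE = re.compile(r"Account No\.?:?\s*(\d+)")

def account_ranges(text: str) -> list[tuple[str, int, int]]:
    """Group pages (numbered from 1) into runs that share an account number.

    `text` is pdftotext output, where a formfeed (0x0C) ends each page.
    """
    pages = text.split("\f")
    if pages and not pages[-1].strip():
        pages.pop()  # drop the empty piece after the last formfeed
    runs: list[list] = []
    for page_no, page in enumerate(pages, start=1):
        match = ACCOUNT_RE.search(page)
        # No number on this page: assume it continues the previous account.
        account = match.group(1) if match else (runs[-1][0] if runs else "unknown")
        if runs and runs[-1][0] == account:
            runs[-1][2] = page_no  # extend the current run
        else:
            runs.append([account, page_no, page_no])
    return [(acct, first, last) for acct, first, last in runs]

def pdftk_split_commands(src: str, text: str) -> list[str]:
    """One shell-quoted pdftk command per run, writing acct_<number>.pdf."""
    return [
        f"pdftk {shlex.quote(src)} cat {first}-{last} "
        f"output {shlex.quote('acct_' + acct + '.pdf')}"
        for acct, first, last in account_ranges(text)
    ]
```

Typical use: run `pdftotext big.pdf big.txt`, read big.txt, and write the returned commands to a shell script for the batch job.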
PDF has features for controlling how a document first appears in the viewer. These include page layout settings: Single Page, Continuous, and Continuous Facing Pages. These also include page mode settings: Show Bookmarks, Show Thumbnails, and Full Screen.
Pdftk does not currently have built-in features for setting these options, but you will find they are easy to set using a sed script. My article about editing PDF using sed will give you the background on this technique. Here I will describe how it applies to page layout and page mode settings.
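A sed-style edit of this kind can be sketched in Python: find the document catalog's `/Type /Catalog` entry and append /PageLayout and /PageMode keys after it. The mapping from viewer settings to PDF names (/SinglePage, /OneColumn, /TwoColumnLeft, /UseOutlines, /UseThumbs, /FullScreen) follows the PDF specification; the function itself is an illustrative assumption, not the sed script from the article. It assumes an uncompressed PDF whose catalog does not already set these keys.

```python
import re

# PDF catalog values for the viewer settings named in the text.
PAGE_LAYOUT = {
    "single": b"/SinglePage",     # Single Page
    "continuous": b"/OneColumn",  # Continuous
    "facing": b"/TwoColumnLeft",  # Continuous Facing Pages (odd pages on the left)
}
PAGE_MODE = {
    "bookmarks": b"/UseOutlines",  # Show Bookmarks
    "thumbnails": b"/UseThumbs",   # Show Thumbnails
    "fullscreen": b"/FullScreen",  # Full Screen
}

CATALOG = re.compile(rb"/Type\s*/Catalog\b")

def set_view_prefs(pdf: bytes, layout: str, mode: str) -> bytes:
    """Append /PageLayout and /PageMode right after the first /Type /Catalog.

    Byte offsets shift, so the cross-reference table is stale afterwards.
    """
    entry = (b"/Type /Catalog /PageLayout " + PAGE_LAYOUT[layout]
             + b" /PageMode " + PAGE_MODE[mode])
    edited, count = CATALOG.subn(entry, pdf, count=1)
    if count == 0:
        raise ValueError("no /Type /Catalog found; uncompress the PDF first")
    return edited
```

A plausible workflow: `pdftk in.pdf output plain.pdf uncompress`, apply the edit, then run the result back through pdftk (for example `pdftk edited.pdf output final.pdf compress`) so it can rebuild the cross-reference table.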
Friday, February 04 2005 @ 08:52 AM PST
Contributed by: Admin
Here is an email I received today that describes a common PDF problem. I sketched out a solution, and some kind folks have offered scripts. Feel free to contribute.
The old HP 6350 scanner (with sheet feeder) came with the HP Precision Scan program, which lets the user scan two-sided paper in two passes.
Specifically, after scanning all the pages on one side, you are asked to turn over the whole pile and scan the back side, i.e.
first pass: pages 1 3 5 7; second pass: pages 8 6 4 2
combined output: pages 1 2 3 4 5 6 7 8
I wonder if your masterpiece "pdftk" can be used to serve this purpose, i.e. taking the first page from PDF A, then the last page from PDF B; then the 2nd page from PDF A, and then the (n-1)th page from PDF B...
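One way to get this order is to generate the page list for pdftk's cat operation with input handles: A for the front sides (in order) and B for the back sides (scanned last-to-first). The Python sketch below assumes you already know both page counts (`pdftk file.pdf dump_data` reports NumberOfPages) and that, if B has one page fewer, the missing back is the last sheet's. The filenames front.pdf, back.pdf, and combined.pdf are hypothetical.

```python
def collate_cat_args(n_front: int, n_back: int) -> list[str]:
    """Page references for pdftk's cat operation that interleave
    A (front sides, in order) with B (back sides, scanned last-to-first)."""
    if n_back not in (n_front, n_front - 1):
        raise ValueError("B should have as many pages as A, or one fewer")
    refs = []
    for i in range(1, n_front + 1):
        refs.append(f"A{i}")
        j = n_back - i + 1  # B is reversed, so sheet i's back is page n_back-i+1
        if j >= 1:
            refs.append(f"B{j}")
    return refs

def collate_command(front: str, back: str, n_front: int, n_back: int,
                    output: str = "combined.pdf") -> list[str]:
    return ["pdftk", f"A={front}", f"B={back}", "cat",
            *collate_cat_args(n_front, n_back), "output", output]

print(" ".join(collate_command("front.pdf", "back.pdf", 4, 4)))
# pdftk A=front.pdf B=back.pdf cat A1 B4 A2 B3 A3 B2 A4 B1 output combined.pdf
```

Later pdftk releases also added a shuffle operation that does this directly, e.g. `pdftk A=front.pdf B=back.pdf shuffle A Bend-1 output combined.pdf`.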
Tuesday, January 25 2005 @ 10:11 AM PST
Contributed by: Admin
Pdftk is a command-line program, so it helps to know command-line (a/k/a "shell") programming, especially when you want to do something that pdftk doesn't know how to do by itself.
A good example of this is using pdftk to combine PDFs in a special order. This article describes how to use the Windows Command Prompt batch language to combine PDFs by file creation date. I also show how to do this using the bash shell that comes with MSYS.
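For readers who prefer a cross-platform script, here is a Python sketch of the same idea: sort a folder's PDFs by creation time and build a pdftk cat command in that order. It is not the article's batch or bash code. Creation time is platform-dependent: the sketch uses st_birthtime where Python provides it (macOS/BSD, and Windows with Python 3.12+) and otherwise falls back to st_ctime, which is creation time on Windows but the last metadata change on Linux.

```python
from pathlib import Path

def creation_time(path: Path) -> float:
    st = path.stat()
    # st_birthtime: true creation time on macOS/BSD and (Python 3.12+) Windows.
    # Fallback st_ctime: creation time on Windows, metadata-change time on Linux/Unix.
    return getattr(st, "st_birthtime", st.st_ctime)

def pdfs_by_creation(folder: str) -> list[Path]:
    """PDFs in `folder`, oldest first."""
    return sorted(Path(folder).glob("*.pdf"), key=creation_time)

def merge_command(inputs: list[Path], output: str) -> list[str]:
    """pdftk arguments that concatenate `inputs` in the given order."""
    return ["pdftk", *(str(p) for p in inputs), "cat", "output", output]
```

Run it with something like `subprocess.run(merge_command(pdfs_by_creation("scans"), "combined.pdf"), check=True)`, where the folder name is hypothetical; keep the output outside the scanned folder so a rerun does not pick it up.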