Thursday, August 25 2005 @ 11:21 AM PDT
Contributed by: Admin
Views: 78,586
Sorry to be so quiet for so long; I've been tilting at windmills.
Here are direct links to the projects I created for PDF Hacks. I still have some contributions to fold in (thanks!), and I need to add descriptions, but it's a start.
Monday, July 25 2005 @ 02:34 AM PDT
Contributed by: bratkus
Views: 10,521
Hello all,
Over the last two months I've developed a PDF testing library.
JPdfUnit is a framework for testing generated PDF documents with the JUnit test framework. It is a high-level API designed to give easy access to the PDFBox library, which opens up a lot of possibilities for handling PDF documents. For instance, you can test the metadata of a PDF document, such as the author or the creation date, or search its content for strings, word fragments, or even regular expressions. Simple, ready-to-use assertions let you compare the expected data with the actual data in the PDF document. JPdfUnit is designed to test a single PDF document. There are three different ways to use the framework; for example, you can inherit from our DocumentTestCase, as shown in the example, or you can work with our DocumentTester class to avoid inheriting from our framework. Here is a small example:
public class OioTest extends DocumentTestCase {

    public OioTest(String name) {
        super(name);
    }

    // Point the test case at the PDF document under test.
    protected DocumentDataSource getDataSource() {
        DocumentDataSource datasource =
            new PdfDataSource("etc/testing-pdfs/DocumentInformationTest.pdf");
        return datasource;
    }

    // Compare the author recorded in the document's metadata.
    public void testAssertAuthorsNameEquals() {
        assertAuthorsNameEquals("Benjamin Bratkus");
    }
}
The MacTech article I mentioned, Fill Online PDF Forms Using HTML Forms, is now freely available as part of MacTech's Sampler! The Sampler is a 167-page PDF that includes these 21 articles:
Creating A Cocoa AppController Class
Python For AppleScripters
Perforce
NoCode Browser
The Application Formerly Known As...
The Webserver In OS X
Thinking Logically
Using Entourage And Mail With An Exchange Server
Backup! Backup! Backup!
Remote Control
BZFlag: A SourceForge Open Source Project
Getting Started With PHP
More Finder Scripting
Fill Online PDF Forms Using HTML Forms
Becoming A Blogger With iBlog
Active Directory & Mac OS X
Screen Savers In Cocoa
Securing Mac OS X
The Motorola RAZR Is Cool, But I Want To Use Bluetooth
Podcasting 101
The Terminal: Why?
This PDF file ~was~ a 26MB download, but with some work I got it down to 12MB. I sent this smaller PDF to MacTech and hopefully it will be available soon.
Friday, July 08 2005 @ 07:18 AM PDT
Contributed by: drolevar
Views: 30,705
It turns out that PDFTK doesn't work with PDF forms of version 1.6. The forms are filled without an error, but the fields appear empty in Acrobat 7.0. Older Acrobat versions complain about the newer PDF version but do display the contents of the fields.
What could the problem be?
Friday, July 08 2005 @ 03:48 AM PDT
Contributed by: Anonymous
Views: 16,047
I wanted to print ONLY the front pages of patents downloaded as PDFs. I couldn't work out how to do this with PDFTK in one pass, so I did it as follows. This runs in a COMMAND.COM DOS box (not CMD.EXE) under Windows 2000 and assumes a directory c:\zdir that is clean apart from the particular PDFs under consideration. It uses Horst Schaeffer's wonderful LMOD (List MODifier) to make the batch files. The idea is to use PDFTK to split the first page only from each PDF, with LMOD giving the new files unique serial numbers, then to make a second pass with PDFTK to reassemble the new files into a single one, followed by a call to Acrobat Reader's command-line function for printing (you still have to press the OK button manually). Those "skilled in the art" will understand the rest; I hope line wrapping and your system's odd habit of dropping backslashes don't screw things up too much.
PRPATPDF.BAT
------------
@echo off
c:\dosutils\pushd.exe
c:
cd c:\zdir
for %%a in (do_it.bat pdf.dir front_pages.pdf $*.pdf) do if exist %%a del %%a > nul
dir *.pdf /O:N /N /b > pdf.dir
lmod /L* pdftk [] cat 1-1 output $[#].pdf < pdf.dir > do_it.bat
call do_it.bat
pdftk $*.pdf cat output front_pages.pdf
"C:\Program Files\Adobe\Acrobat 6.0\Reader\AcroRd32.exe" /p "front_pages.pdf"
for %%a in (do_it.bat pdf.dir front_pages.pdf $*.pdf) do if exist %%a del %%a > nul
c:\dosutils\popd.exe
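For readers without LMOD, the same two-pass idea can be sketched in Python. This is only an illustration, not the contributor's method: the function names and the front_NNNN.pdf naming are made up here, and the sketch assumes pdftk is on the PATH. It relies only on pdftk's cat operation, as the batch file does, and it leaves the printing step out.

```python
import subprocess


def front_page_commands(pdf_names, combined="front_pages.pdf"):
    """Build pdftk command lines: one per input PDF to extract page 1,
    then one more to join the extracted pages into a single file."""
    commands = []
    parts = []
    # Sort by name, ignoring case, much like the batch file's `dir /O:N`.
    for number, name in enumerate(sorted(pdf_names, key=str.lower), start=1):
        part = f"front_{number:04d}.pdf"  # hypothetical temp-file naming
        commands.append(["pdftk", name, "cat", "1", "output", part])
        parts.append(part)
    if parts:
        commands.append(["pdftk", *parts, "cat", "output", combined])
    return commands


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

To use it, pass the list of patent PDFs to front_page_commands() and hand the result to run_all(); the temporary front_NNNN.pdf files still need to be deleted afterwards, as the batch file does with its $*.pdf files.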
Tuesday, June 21 2005 @ 10:27 AM PDT
Contributed by: Admin
Views: 17,878
Some pdftk users process hundreds of files. Doing this work on a Windows machine can yield unexpected results. The problem lies in the Windows command-prompt shell, not in pdftk: for every long filename, Windows creates a short, DOS-compatible (8.3) filename, and this short filename might match a wildcard expression even when the long filename does not. With pdftk, the result is that you end up with more input files than you wanted.
This article offers a couple of workarounds and then describes the case where this problem arose.
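One further way to sidestep the issue, sketched here in Python and not taken from the article, is to skip wildcard expansion altogether: list the directory, match the pattern against the long filenames yourself, and pass pdftk an explicit list of files. The function names are hypothetical.

```python
import fnmatch
import os


def select_inputs(names, pattern):
    """Return the names matching `pattern`, in sorted order.

    The match is a plain string comparison on the long filenames,
    so 8.3 short names never take part in it."""
    wanted = pattern.lower()
    return sorted(n for n in names if fnmatch.fnmatchcase(n.lower(), wanted))


def pdftk_cat_command(directory, pattern, output):
    """Build a pdftk command with every input file listed explicitly."""
    inputs = select_inputs(os.listdir(directory), pattern)
    if not inputs:
        raise ValueError(f"no files match {pattern!r} in {directory!r}")
    paths = [os.path.join(directory, n) for n in inputs]
    return ["pdftk", *paths, "cat", "output", output]
```

A file such as old.pdf_backup, whose short name can end in .PDF, is not selected by "*.pdf" here, because only the long name is compared.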
Sunday, May 29 2005 @ 04:08 PM PDT
Contributed by: Brian High
Views: 14,975
Where I work, we need to edit the document properties (metadata) of PDFs so that our search engine can find the documents more easily. Up until now, we had to purchase copies of Adobe Acrobat (Full Version) just to do this simple task. This seemed like a lot of money to pay for something so simple.
Last Friday I stopped by my local library and checked out PDF Hacks and saw the page about pdftk. This was exactly what I was looking for! I toyed with it for a little while and decided to write a front-end for it. I know there is PDFTK Builder, but it does not do what I need.
So I have created two graphical front-ends to pdftk, designed specifically for editing metadata (using pdftk's update_info feature).
Both scripts allow you to edit Title, Author, Subject, and Keywords.
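As a rough sketch of what a front-end like this hands to pdftk (this is not the code of either script), the Python below writes the four fields as InfoKey/InfoValue pairs, the layout pdftk's dump_data prints for document metadata, and builds the matching update_info command. The function names and file names are hypothetical, and some later pdftk releases also print an InfoBegin line before each pair, so check your own version's dump_data output first.

```python
INFO_KEYS = ("Title", "Author", "Subject", "Keywords")


def build_info_text(values):
    """Render metadata as InfoKey/InfoValue pairs for update_info."""
    lines = []
    for key in INFO_KEYS:
        if key in values:
            # Collapse whitespace so each value stays on a single line.
            value = " ".join(str(values[key]).split())
            lines.append(f"InfoKey: {key}")
            lines.append(f"InfoValue: {value}")
    return "\n".join(lines) + "\n"


def update_info_command(pdf_in, info_file, pdf_out):
    """Build the pdftk command that applies the info file to a PDF."""
    return ["pdftk", pdf_in, "update_info", info_file, "output", pdf_out]
```

Write the text returned by build_info_text() to a file, then run the command from update_info_command() with that file name.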
The first script is a very simple VBScript program that can be run under Windows:
pdfmeta.vbs.
Just drag the PDF onto the script icon and follow the prompts. No other software is required, so long as you have Windows Script Host installed. (Most Windows systems come with WSH installed by default.)
I also wrote a cross-platform Perl/Tk version, which is here: pdfmeta.pl.
It has a nicer interface and is a little better written, so it should be very easy to modify the script to use different fields. (Just change one line.)
If your operating system supports it, you can drag a PDF file onto the script icon, or you can simply run the script and use its drag-and-drop interface to select the input file.
I have tested pdfmeta.pl under Debian GNU/Linux and Windows (2K Pro and XP Pro).
All Windows users need to do is install the Standard Edition of ActiveState's ActivePerl before using pdfmeta.pl.
(Most Linux/Unix systems should already be set up for Perl/Tk support.)
Please let me know of any bugs or suggestions. Thanks!
I have downloaded and tried a *large* number of programs that claim to convert PDF pages to images, but have encountered problems with the images generated by every one of them, including significant changes to the coloring of the images, text overlaid on images becoming unreadable, etc.
Among other things, I have documents that have different sized pages.
Adobe Acrobat 7.0 did the best job of saving each page as a JPEG with the best looking colors, etc., but I want to do this in batch mode.
So I was going to write a program that utilized the Acrobat API to do so: query for page properties, export as JPEG, etc.
Has anyone seen any good discussions of doing this in books, etc.? I didn't see any samples in the SDK documentation that addressed this directly.
I wrote an article for MacTech Magazine (Nov. 2004 ed.) about collecting form data using HTML forms and then packing this data into a PDF form for download. I posted the code online, along with a working example.
While merging data into a PDF form is old news to pdftk users, this example has an interesting twist. It uses pdftk's dump_data_fields operation to discover exactly what the PDF form wants, then creates a dynamic HTML form using this information. I.e., it automatically creates an HTML form to match your PDF form.
This HTML form is bare-bones, but it makes a good foundation for your web interface. It helps if you fill in the PDF fields' "Short Description," available via Acrobat's field properties dialog.
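The downloadable example is written in PHP; the Python below is only a minimal sketch of the same idea, not a port of it. It parses the text printed by pdftk's dump_data_fields operation (records separated by --- lines, each holding Key: Value pairs) and emits a bare-bones HTML form. It assumes FieldNameAlt carries the field's short description, renders every field as a plain text input for simplicity, and uses a made-up form action, fill.php.

```python
import html


def parse_fields(dump):
    """Parse `pdftk form.pdf dump_data_fields` output into a list of dicts."""
    fields, current = [], None
    for line in dump.splitlines():
        line = line.strip()
        if line == "---":  # a new field record starts here
            current = {"FieldStateOption": []}
            fields.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            if key == "FieldStateOption":  # may repeat within one record
                current[key].append(value)
            else:
                current[key] = value
    return [f for f in fields if "FieldName" in f]


def html_form(fields, action="fill.php"):
    """Emit one labelled text input per PDF field."""
    rows = []
    for f in fields:
        name = html.escape(f["FieldName"], quote=True)
        label = html.escape(f.get("FieldNameAlt", f["FieldName"]))
        rows.append(f'<label>{label} <input type="text" name="{name}"></label><br>')
    body = "\n".join(rows)
    target = html.escape(action, quote=True)
    return f'<form method="post" action="{target}">\n{body}\n<input type="submit">\n</form>'
```

Fields without a short description fall back to the raw field name as their label, which is why filling in the descriptions in Acrobat pays off.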
My MacTech article described the process in detail and introduced the reader to related topics, such as PDF forms, pdftk, and the FDF format. This article offers some tips on getting started and shows how to discover form field data using pdftk. Download the example PHP code here.
I haven't tested this on Adobe's new-fangled Acrobat 7 forms.
I ported forge_fdf to Python for work, and I thought it might be useful for you to use or post along with forge_fdf.php on the website for others. I've attached it. It's a direct block-for-block port; it could probably be optimised a fair bit, but porting it directly was the least amount of effort :) It makes an interesting comparison between the languages: just look at the line count ;)
Forge_fdf is a little PHP script I created for casting form data into the FDF syntax. This is handy for filling online PDF forms automatically, or for filling forms using pdftk.
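To give a feel for the FDF syntax, here is a minimal Python sketch. It is neither forge_fdf nor the contributed port, and its function names are made up. It handles flat text field names only and escapes the characters that are special inside a PDF literal string; checkboxes, hidden or read-only flags, and non-Latin text are left out.

```python
def fdf_escape(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_fdf(values):
    """Return a minimal FDF document assigning each field name its value."""
    fields = "\n".join(
        f"<< /T ({fdf_escape(str(name))}) /V ({fdf_escape(str(value))}) >>"
        for name, value in values.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n"
        f"{fields}\n"
        "] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )
```

Save the returned text to a file such as data.fdf (a hypothetical name) and merge it with `pdftk form.pdf fill_form data.fdf output filled.pdf`.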