PDF files

PDF file tricks.

No tricks yet, but I generally use evince as a viewer for PDF files. You can download acroread (Adobe Reader for Linux) from Adobe, but it can be a bit of a pig.

I have sometimes come across PDF files which contain images that I would like to extract and use for other purposes. Here are two approaches that work for pulling images out of a PDF file.

pdfimages -j file.pdf xxx

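This extracts all (well, almost all) images from the PDF file and stores them as xxx-nnn.jpg or xxx-nnn.ppm. With -j, images that are stored inside the PDF as JPEG data are written out as .jpg files, and everything else as .ppm (or .pbm for black-and-white images). The string "xxx" is used as a prefix for all of the files generated. Not everything that looks like an image in the PDF file will be extracted, though; a picture drawn with vector graphics, for instance, is not stored as an image at all. Past that, at this point the details are beyond me.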
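To keep the xxx-nnn files from cluttering the current directory, the same command can be wrapped in a small shell function. This is only a sketch: it assumes the pdfimages that ships with poppler-utils (the usual one on Linux), and the function name extract_images and its output-directory argument are made up for illustration.

```shell
# extract_images: hypothetical helper wrapping pdfimages.
# Usage: extract_images file.pdf [outdir]
extract_images() {
    pdf=$1
    outdir=${2:-images}

    # Refuse to run on a missing input file.
    if [ ! -f "$pdf" ]; then
        echo "extract_images: no such file: $pdf" >&2
        return 1
    fi

    # pdfimages comes from poppler-utils (or xpdf); make sure it is installed.
    if ! command -v pdfimages >/dev/null 2>&1; then
        echo "extract_images: pdfimages not found" >&2
        return 2
    fi

    mkdir -p "$outdir" || return 1
    # -j: write JPEG-encoded images as .jpg; the rest come out as .ppm/.pbm.
    # The last argument is the file name prefix, here placed inside outdir,
    # so the results are named outdir/img-000.jpg, outdir/img-001.ppm, ...
    pdfimages -j "$pdf" "$outdir/img"
}
```

For example:

extract_images file.pdf file-images

With the poppler version, pdfimages -list file.pdf prints a table of the images in the file without extracting anything, which is a quick way to see what pdfimages thinks is there before you go looking for something it missed.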

convert file.pdf xxx.jpg

This command (convert is part of ImageMagick) converts each page of the PDF file into an image, generating a series of files of the form xxx-n.jpg. This works quite well, and afterwards you can try to use the GIMP (good luck!) or some other tool to crop and scale the image you are really after out of the page-sized image.
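If the picture you want sits at a known spot on the page, ImageMagick itself can do the cropping instead of the GIMP. The sketch below assumes ImageMagick 6 (the convert command; ImageMagick 7 calls it magick) with Ghostscript installed for reading PDF files. The function names crop_geometry and crop_page are hypothetical. -density is given before the input file so the page is rendered at that many dots per inch instead of the default 72, which makes the cropped picture much less blurry.

```shell
# crop_geometry: build an ImageMagick geometry string WIDTHxHEIGHT+X+Y,
# where X,Y is the top-left corner of the region, in pixels.
crop_geometry() {
    echo "${1}x${2}+${3}+${4}"
}

# crop_page: hypothetical helper. Render one page of a PDF and cut a
# rectangle out of it.
# Usage: crop_page file.pdf page width height x y out.png
# Pages are numbered from 0, like the xxx-n.jpg files convert writes.
crop_page() {
    if [ $# -ne 7 ]; then
        echo "usage: crop_page file.pdf page width height x y out.png" >&2
        return 1
    fi
    pdf=$1 page=$2 out=$7
    geom=$(crop_geometry "$3" "$4" "$5" "$6")

    # -density 150: render the PDF at 150 dpi.
    # file.pdf[N] selects a single page; +repage throws away the old
    # page offset so the output is a plain image of the cropped size.
    convert -density 150 "${pdf}[${page}]" -crop "$geom" +repage "$out"
}
```

For example, to cut a 600x400 pixel region whose top-left corner is at (100, 250) out of the third page:

crop_page file.pdf 2 600 400 100 250 figure.png

The coordinates are pixels at the chosen density, so it is easiest to first convert the whole page at the same -density, read the corner positions off in a viewer, and then crop.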

The command ls /usr/bin/pdf* shows a long list of commands worth investigating, including:

pdftotext
pdftops
pdfinfo
pdftohtml
pdffonts
pdftosrc
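As an example of putting a few of these together, here is a sketch that reports the page count from pdfinfo, lists the fonts with pdffonts, and dumps the text with pdftotext. It assumes the poppler-utils versions of these tools; pdfinfo prints a line such as "Pages:          12", and the awk filter just picks the number off that line. The function names parse_pages and pdf_summary are made up for this example.

```shell
# parse_pages: read pdfinfo output on stdin and print the page count.
parse_pages() {
    awk '/^Pages:/ { print $2 }'
}

# pdf_summary: hypothetical helper. Print the page count and fonts,
# then save the text of the PDF next to it as file.txt.
pdf_summary() {
    pdf=$1
    echo "pages: $(pdfinfo "$pdf" | parse_pages)"
    # pdffonts lists each font used and whether it is embedded.
    pdffonts "$pdf"
    # -layout tries to keep the physical layout (columns and so on).
    # With no output name given, pdftotext writes file.txt.
    pdftotext -layout "$pdf"
}
```

Giving pdftotext a "-" as the output name sends the text to standard output instead, which is handy for piping it into grep or less.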

There may be other clever tricks too; if you find any, let me know.

Have any comments? Questions? Drop me a line!

Adventures in Computing / ttrebisky@as.arizona.edu