Monday, July 21, 2008

PDF Unencrypting

Updated 07/29/2008: the bookmarks (index) of the original document are now copied too.

Usually, when you buy a PDF book, it's encrypted and protected by a password. That's fine: I agree that authors and publishers need to protect their work, even if this technique offers very poor protection.

Unfortunately, some portable readers, like my PRS-505, don't support encryption or passwords, so you can't load those PDFs onto them.

So you need to unencrypt the PDF to read it on the device, which is a perfectly legal operation if you bought the book legally.

The easiest way to unencrypt a PDF should be the pdftk command-line tool:

pdftk abook.pdf input_pw your_password_here output abook_unencrypted.pdf

Unfortunately, it always fails with this error:

Error: Failed to open PDF file:
OWNER PASSWORD REQUIRED, but not given (or incorrect)

Of course the password was correct, and I tried different combinations of the input/owner/user password parameters. I think it's a bug in pdftk (version 1.41).

So I wrote a small Java program to unencrypt my files:

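package com.benfante.bpdf;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfImportedPage;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.SimpleBookmark;

public class Unencrypt {
    public static void main(String[] args)
            throws IOException, DocumentException {
        String inFile = args[0];
        String outFile = args[1];
        String password = args[2];
        // Open the encrypted PDF using its password.
        PdfReader pdfReader = new PdfReader(inFile, password.getBytes());
        Document document = new Document();
        // PdfCopy writes the copied pages to a new, unencrypted file.
        PdfCopy copy = new PdfCopy(document, new FileOutputStream(outFile));
        document.open();
        for (int i = 1; i <= pdfReader.getNumberOfPages(); i++) {
            PdfImportedPage importedPage = copy.getImportedPage(pdfReader, i);
            copy.addPage(importedPage);
        }
        // Copy the bookmarks (index) of the original document, if any.
        List bookmarks = SimpleBookmark.getBookmark(pdfReader);
        if (bookmarks != null) {
            copy.setOutlines(bookmarks);
        }
        document.close();
    }
}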
To run it, download this file and type:

java -cp bpdf.jar com.benfante.bpdf.Unencrypt \
abook.pdf abook_unencrypted.pdf \
your_password_here
*WARNING*: I'm not responsible for the use you make of this code. It's not intended as a tool for infringing copyright, only for reading PDFs you legally own on devices that don't support encryption.
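To double-check that the output file is really no longer encrypted, a rough heuristic is usually enough: an encrypted PDF declares an /Encrypt entry in its trailer (or cross-reference stream) dictionary, and that dictionary is stored uncompressed. The following stdlib-only Java sketch just searches the raw bytes for that key. The EncryptCheck class is hypothetical (it is not part of bpdf.jar), and the check can give a false positive if the string "/Encrypt" happens to appear uncompressed elsewhere in the file, so treat the result as a hint rather than a proof.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Hypothetical helper: a rough heuristic that looks for the /Encrypt key
 * which encrypted PDFs declare in their (uncompressed) trailer dictionary.
 */
public class EncryptCheck {

    /** Returns true if the raw file bytes contain the "/Encrypt" key. */
    static boolean looksEncrypted(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains("/Encrypt");
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + (looksEncrypted(bytes)
                ? ": /Encrypt entry found, probably still encrypted"
                : ": no /Encrypt entry found"));
    }
}
```

For example, `java EncryptCheck abook_unencrypted.pdf` should report that no /Encrypt entry was found if the copy worked.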

Friday, July 18, 2008

Pattern matching

I konw, it's a tiarivl orsbaveiton, but ofetn our barin wokrs mcuh bteter tahn any cpumteor. For eaxmple, uelnss yuor biarn has smoe pbrelom, you sohuld be petfrcely albe to raed tihs txet, eevn tgouhh the lteerts of the wodrs, aaprt from the fsrit and the lsat, hvae been slfhufed. Tihs hpapens bcuaese wehn we raed we aactlluy don't pay anetoittn to the sinlge lteters, but to the wrod as a wolhe.

If you want to play at recombining the words of your own texts, try this little program I wrote:

Download this file if you want to look at the sources (released under the Apache License, Version 2.0).
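The little program itself isn't reproduced in this post. To give a rough idea of how such a word shuffler can work, here is a minimal Java sketch (not the author's program; the Scrambler class and its methods are hypothetical). It treats every run of letters as a word, keeps the first and last letters in place, and shuffles only the inner ones with Collections.shuffle, leaving punctuation and spacing untouched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: shuffles the inner letters of every word. */
public class Scrambler {

    // A "word" is a run of Unicode letters, so accented letters count too.
    // Assumes letters in the Basic Multilingual Plane (one char each).
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    /** Keeps the first and last letter, shuffles everything in between. */
    static String scrambleWord(String word, Random rnd) {
        if (word.length() <= 3) {
            return word; // at most one inner letter: nothing to shuffle
        }
        List<Character> inner = new ArrayList<>();
        for (int i = 1; i < word.length() - 1; i++) {
            inner.add(word.charAt(i));
        }
        Collections.shuffle(inner, rnd);
        StringBuilder sb = new StringBuilder(word.length());
        sb.append(word.charAt(0));
        for (char c : inner) {
            sb.append(c);
        }
        sb.append(word.charAt(word.length() - 1));
        return sb.toString();
    }

    /** Scrambles each word of the text, leaving non-letters unchanged. */
    static String scramble(String text, Random rnd) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out,
                    Matcher.quoteReplacement(scrambleWord(m.group(), rnd)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String text = args.length > 0 ? String.join(" ", args)
                : "Reading scrambled words is surprisingly easy";
        System.out.println(scramble(text, new Random()));
    }
}
```

Passing a fixed seed (for example `new Random(42)`) makes the shuffling repeatable, which is handy when comparing outputs.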

You can find this, and much other information, along with some tips for improving your learning skills, in an interesting book I'm reading: "Pragmatic Thinking and Learning" by Andy Hunt.

Update 08-14-2008: I just finished reading this book, on my PRS-505 of course. It's a great book, a must-read if you need to be a continuous learner. And if you are in the IT business, you certainly need to be. I'm a self-employed, mostly home-based consultant and developer: the "Manage Focus" chapter is illuminating, and very helpful for improving the productivity of my work environment. In general, the book is perfect for becoming aware of your learning paths and gaps, with a lot of good tips for improving yourself. What did I take away from the book? So far: I started writing "Morning Pages"; I already have a two-monitor desktop and, as a Linux user, I'm already used to virtual desktops; I'm limiting "down" time and staying aware of it; I'm watching out for distractions; and I'm trying to practice more deliberate reading and information gathering.

Tuesday, July 8, 2008

Reading PDFs on a PRS-505

Update 07/28/2008: converting PDF to LRF may now be obsolete on the PRS-505. With this firmware update, PDFs are rendered much better (though still not as well as with the conversion below).

For a few months now I've been a happy user of a Sony Portable Reader System (PRS-505).

It's a fantastic device: I can now read digital articles and books comfortably, instead of straining my eyes in front of a bright monitor.

I can even use it with Linux, thanks to Calibre.

The PRS-505 can read PDF files natively, but the rendering is not always adequate, and reading long texts without getting a headache is very difficult. So I usually convert them from PDF to LRF.

Look at the result: on the left, the original PDF; on the right, the same page in the resulting LRF.

The conversion is very easy. You mainly need Ghostscript, to embed all the fonts in the PDF, and the pdflrf converter.

I wrote a simple script to run them easily with the right parameters:

#!/bin/sh
# by Lucio Benfante

gs -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pdfwrite \
-dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer \
-dMaxSubsetPct=100 \
-dSubsetFonts=true -dEmbedAllFonts=true \
-sOutputFile="$1.embedding" -f "$1"

pdflrf -r --rotation="0" --colors=8 -i "$1.embedding" -o "$1.lrf"

rm "$1.embedding"

Launch it passing your PDF file as its only parameter, for example myBook.pdf; the result is written to myBook.pdf.lrf.
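If you prefer to drive the same two steps from Java (for example to stop when one of the tools fails, which the plain script doesn't check), here is a minimal sketch using ProcessBuilder. The Pdf2Lrf class is hypothetical. The Ghostscript and pdflrf options mirror the script; the leading gs options (-dNOPAUSE, -dBATCH, -dSAFER, -sDEVICE=pdfwrite) are the usual ones for rewriting a PDF with Ghostscript and are an assumption here. Both gs and pdflrf must be on the PATH.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: runs Ghostscript, then pdflrf, on one PDF file. */
public class Pdf2Lrf {

    /** Ghostscript step: rewrite the PDF embedding all fonts.
     *  ASSUMPTION: the gs/pdfwrite leading options are not from the post. */
    static List<String> gsCommand(String pdf) {
        return Arrays.asList("gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
                "-sDEVICE=pdfwrite",
                "-dCompatibilityLevel=1.4", "-dPDFSETTINGS=/printer",
                "-dMaxSubsetPct=100", "-dSubsetFonts=true",
                "-dEmbedAllFonts=true",
                "-sOutputFile=" + pdf + ".embedding", "-f", pdf);
    }

    /** pdflrf step, with the same options as the shell script. */
    static List<String> pdflrfCommand(String pdf) {
        return Arrays.asList("pdflrf", "-r", "--rotation=0", "--colors=8",
                "-i", pdf + ".embedding", "-o", pdf + ".lrf");
    }

    /** Runs a command, sharing this process's console, and checks its exit status. */
    private static void run(List<String> cmd)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException(cmd.get(0) + " exited with status " + exit);
        }
    }

    public static void main(String[] args)
            throws IOException, InterruptedException {
        String pdf = args[0];
        try {
            run(gsCommand(pdf));
            run(pdflrfCommand(pdf));
        } finally {
            // Remove the intermediate file, like "rm $1.embedding".
            new File(pdf + ".embedding").delete();
        }
    }
}
```

Building the argument lists in separate methods keeps them easy to inspect or tweak before anything is executed.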

Please let me know if it's useful to you, or if you find a better set of parameters for gs/pdflrf.