Converting a PDF that has multiple pages per page into all single pages

I recently started a project converting a magazine’s PDFs into a digital reader app.

Up until now, the PDFs that were given to me were all set up so that each page of the magazine was a separate page of the PDF.

Recently, however, I have been getting them formatted differently. The front and back covers are still single pages, but every inside page is now a landscape PDF page that contains two magazine pages side by side (I hope that makes sense).

The problem I have is the digital reader app cannot support this and I am at a loss trying to figure out how to split each page.

Any advice?

Answer

@Scott is correct. It sounds like they are sending you full spreads rather than the single page PDFs, and that should be very easy for them to fix.

However, for the “programmer/designers” among us, I thought I would share another solution. This requires:

  • pdfinfo from xpdf to identify the bounding box of the page in points
  • gs from Ghostscript to split the left and right pages
  • pdftk from the PDF Toolkit to put the PDF back together again.

Once these are all installed and in your PATH, you should be able to do the following.

Imagine you are dealing with a file named Example.pdf. You first run:

pdfinfo -f 1 -l 1000 -box Example.pdf

This will give you a lot of info about your PDF, including, for each page, the size in points, in a line such as Page 1 size: 595 x 842 pts (A4). We need this information for our gs command. In this example, a landscape spread is 842 points wide by 595 points high, so each page (after being split) is 421 points wide by 595 points high. Ghostscript’s -g option takes the page size in device pixels. Since pdfwrite’s default resolution is 720 dpi and 1 inch is 72 points in PDF, there are 10 pixels per point, so we’ll multiply those values by 10 in the pdfwrite step below.
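The arithmetic above can be sketched as a small shell helper. This is only an illustration: the function name half_geometry is made up here, and it assumes the pdfinfo line has the "Page N size: W x H pts" layout shown above and that pdfwrite is left at its default 720 dpi.

```shell
#!/bin/sh
# Hypothetical helper (not part of pdfinfo or Ghostscript): take one
# "Page N size: W x H pts" line and print the -g value for half a spread.
# Assumes 10 device pixels per point (pdfwrite at its default 720 dpi).
half_geometry() {
    printf '%s\n' "$1" | awk '{
        # Fields: "Page", N, "size:", W, "x", H, "pts", ...
        w = $4
        h = $6
        # Halve the width, scale both to pixels, round to the nearest pixel.
        printf "%dx%d\n", w / 2 * 10 + 0.5, h * 10 + 0.5
    }'
}

half_geometry "Page    2 size: 842 x 595 pts"
# prints 4210x5950
```

With a real file, the line could be taken from something like pdfinfo -f 2 -l 2 Example.pdf | grep size, using page 2 because page 1 is a single cover page.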

gs \
    -o out%03d_A.pdf \
    -sDEVICE=pdfwrite \
    -g4210x5950 \
    -c "<</PageOffset [0 0]>> setpagedevice" \
    -f Example.pdf
gs \
    -o out%03d_B.pdf \
    -sDEVICE=pdfwrite \
    -g4210x5950 \
    -c "<</PageOffset [-421 0]>> setpagedevice" \
    -f Example.pdf

If you are using Windows, replace the “\” with “^”. In a batch file you will also need to write %03d as %%03d.

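The above commands will create a bunch of files named “out001_A.pdf”, “out001_B.pdf” and so on, where the A files hold the left halves and the B files hold the right halves. Notice that for the right pages we use a negative page offset: it shifts the content 421 points to the left, so the right half of the spread lands on the output page.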

Finally, use pdftk to put it all back together again.

pdftk out*_?.pdf cat output combined.pdf
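One caveat for the layout in the question: the front and back covers are single pages, so their B halves will probably be blank. A minimal sketch for leaving them out, assuming the out001_A.pdf naming from the gs step and assuming the covers are the first and last pages of the input (list_halves is a made-up name, not a pdftk feature):

```shell
#!/bin/sh
# Hypothetical helper: print the split files in page order, skipping the
# B half of the first and last input pages (assumed to be single covers).
list_halves() {
    dir=$1
    # Globs expand in sorted order, so the last match is the last page.
    for f in "$dir"/out*_A.pdf; do
        last_a=$f
    done
    last_b="${last_a%_A.pdf}_B.pdf"
    first_b="$dir/out001_B.pdf"
    for f in "$dir"/out*_?.pdf; do
        if [ "$f" = "$first_b" ] || [ "$f" = "$last_b" ]; then
            continue
        fi
        printf '%s\n' "$f"
    done
}
```

It could then be used as pdftk $(list_halves .) cat output combined.pdf, which relies on the file names containing no spaces. Check the skipped files by eye first, in case your covers are laid out differently.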

This worked like a charm for me 😉

Attribution
Source : Link , Question Author : Brodie , Answer Author : Ananda Mahto
