Tips for Converting PDFs to Plone Content

Converting PDFs to Plone content can be challenging the first few times. These are a few tips that cover the basic conversion process.

Using an Appropriate Browser

Always use the Firefox web browser when editing content in Plone.

Why we recommend Firefox.

Keyboard Shortcuts

Many steps in this process involve selecting, copying, cutting, and pasting text. This is generally easiest using the following keyboard shortcuts on a Windows PC:

  • Select All: Control+A
  • Copy: Control+C
  • Cut: Control+X
  • Paste: Control+V
  • Find: Control+F

On a Mac, these will be "Command" (or "Apple") rather than "Control" keyboard shortcuts.

Copy and clean up the text in Notepad

  1. Open the PDF in Adobe Acrobat, or another PDF reader
  2. Open the program called "Notepad" (click the Windows Start button in the lower left of your screen, then "All Programs", then "Accessories", then "Notepad").
  3. In the PDF, Select All and Copy the contents.
  4. Paste the contents of the PDF into Notepad.
  5. In Notepad, click on the Format menu, and select "Word Wrap". This allows long lines to "wrap" and ensures that you can see all of the content.
  6. Verify that the text in Notepad reads in the same order as the PDF.

Potential problems

In some PDFs (particularly two- or three-column PDFs), copying and pasting all of the text in one action may not work, because the text appears out of order in Notepad. PDFs sometimes store text out of order to achieve a specific visual result.

To get around this, you can select blocks of text (a few paragraphs at a time) and Copy and Paste into Notepad until the whole document is copied. Many times, the "breakpoint" is evident because the highlighted text jumps into another column, or skips text.

Moving content to Plone

  1. In Plone, navigate to the folder in which you will be creating the page
  2. In the green bar, select "Add New -> Page"
  3. Add the title of the PDF in the Title field. Do not duplicate this title later in the Body Text field.
  4. In Notepad, use Select All and Copy the contents.
  5. In Plone, scroll down to the Body Text field (with the rich text editor) and place the cursor in that field.
  6. From the "Style" dropdown menu (which currently shows "Normal paragraph") select "(remove style)"
  7. Place the cursor (again!) in the Body Text field
  8. Paste the content into the Body Text field.
  9. Select All of the content in the Body Text field.
  10. From the "Style" dropdown menu (which currently shows "<no style>") select "Normal paragraph"
  11. Cut and Paste the first sentence or two of the body text to the Description or Summary field below the title. Or, create a short summary describing the document contents.
  12. Save the Plone page to create the object.

Editing the content in Plone

After saving the initial copy of the PDF text as a new page in Plone, you can now edit the page to format the text. Click the "Edit" tab in the green bar at the top to edit the page, and use the information in the "Formatting the text in Plone" and "Additional Text Cleanup" sections to reapply formatting to the text.

When editing an existing document, you will see a "Save" icon (floppy disk) in the rich text editor toolbar. You can click this at any time to save the changes you've made without having to save and edit again.

Formatting the text in Plone

Headings and Subheadings

You can find these styles in the Style dropdown menu (far right side) of the toolbar.

Styles

  • Headings are the largest font size and used to emphasize the main categories of the fact sheet
  • Subheadings are the second largest font size and are used to emphasize sub-categories
  • h4 is the third largest font size and used to emphasize "sub-sub-categories"
  • Discreet is small, light gray text and used for image captions, author information, footnotes, etc.

Bold and Italics

Avoid using Bold unless it is to emphasize a small number of words within the text. Don't use Bold in place of headings.

Since Italics are hard to read online, only use them where they're required by convention (e.g. publication titles).

Lists (numbered or bulleted)

Creating lists makes text more readable on the web. There are two types of lists available on the tool bar:

  • Bulleted (Unordered list) for listing items when sequence is not important.
  • Numbered (Ordered list) for listing items when sequence is important.

Creating Lists

  1. Remove any "bullet" icons or numbers in front of the words
  2. Highlight the items that will appear in the list
  3. In the toolbar, click either the Numbered or Bulleted list icon.
  4. If "empty" bullets show in the list, place your cursor to the right of the bullet, and press the Backspace key.
  5. If multiple lines are combined into one, put the cursor in front of the first word of the second item in the list, then press the Enter key, and that item will be moved to the next line with a new bullet or number.
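
If a list is long, step 1 can be done on the text before it is pasted into Plone. The following is a minimal Python sketch of that idea, not part of the Plone workflow itself. The function name `strip_list_markers` and the set of marker characters it recognizes are assumptions made for illustration; adjust them to match what your PDF actually produces.

```python
import re

# Assumed marker styles: a bullet-like character, or a number followed by
# "." or ")", each followed by at least one space.
LIST_MARKER = re.compile(r"^\s*(?:[•·▪◦*-]|\d+[.)])\s+")

def strip_list_markers(text: str) -> str:
    """Remove leading bullets or numbers from each line and drop blank lines."""
    cleaned = [LIST_MARKER.sub("", line) for line in text.splitlines()]
    # Blank lines would otherwise turn into "empty" bullets in the editor.
    return "\n".join(line for line in cleaned if line.strip())

if __name__ == "__main__":
    pasted = "• Apples\n1. Pears\n\n2) Plums\n10 acres of corn"
    print(strip_list_markers(pasted))
```

A line such as "10 acres of corn" is left alone because the number is not followed by "." or ")". Review the result before pasting, since an unusual marker character will not be caught.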

Indented text

You can indent text by highlighting the paragraph of text and clicking the "increase quote level" icon. This should only be used rarely, generally for including quotes as part of the text.

Call-outs

Call-outs highlight a short piece of text by putting the text in a colored box.

Pull quotes

Pull quotes are similar to a call-out, but put the text (usually a quote that's relevant to, but not part of, the body text) in a box on the right side of the page.

Tables

Tables should only be used for tabular data, and not for layout purposes.

When copying from a PDF, tables must be recreated, since the data is not stored as a table in the PDF. There are two methods that can be used.

Short tables

  1. In the body text tool bar, click on the table icon (second from the right). A popup window will present some options for the table:
    • Table Class (select "listing" for most cases)
    • Rows
    • Columns
    • Create Headings (leave checked)
  2. Click the "Add Table" button.
  3. Add content into each cell.

Longer tables

In some cases, it may be easier to recreate the table in Microsoft Excel and then use the Convert Excel tables to HTML utility to add it to Plone. The instructions for using that utility are in a video on that page.
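
To illustrate what such a conversion involves, here is a minimal Python sketch. It is not the utility mentioned above. It assumes the cells were copied out of Excel as tab-separated text, treats the first row as the headings, and uses a hypothetical function name, `tsv_to_html_table`. The "listing" class is the one recommended for short tables.

```python
from html import escape

def tsv_to_html_table(tsv: str, table_class: str = "listing") -> str:
    """Convert tab-separated rows into an HTML table; the first row is the header."""
    lines = tsv.strip("\r\n").splitlines()
    rows = [line.split("\t") for line in lines if line.strip()]
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    parts = [f'<table class="{escape(table_class)}">']
    head_cells = "".join(f"<th>{escape(cell.strip())}</th>" for cell in header)
    parts.append(f"<thead><tr>{head_cells}</tr></thead>")
    parts.append("<tbody>")
    for row in body:
        # escape() turns characters such as "&" and "<" into HTML entities.
        cells = "".join(f"<td>{escape(cell.strip())}</td>" for cell in row)
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</tbody>")
    parts.append("</table>")
    return "\n".join(parts)

if __name__ == "__main__":
    print(tsv_to_html_table("Crop\tYield\nCorn\t150\nSoy & beans\t50"))
```

The resulting HTML could then be pasted into the Body Text field while the editor is in HTML view.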

Additional Text Cleanup

Removing excess whitespace

  1. In the rich text editor in Plone, click the HTML button in the green toolbar.
  2. Select All and Copy all of the HTML code
  3. Paste the HTML code into Notepad
  4. In Notepad, select "Edit" and then "Replace" to do a "Find and replace" on the HTML code. This feature will go through the entire text and replace one piece of text with another piece of text.
  5. Type "<br />" (note the space) in the "Find what" box.
  6. Type a space in the "Replace with" box.
  7. Click the "Replace All" button
  8. Repeat this for the following HTML code:
    • <p>&nbsp;</p>
    • &nbsp;
    This will replace all excess line breaks and whitespace in the HTML code.
  9. From Notepad, select all the text, copy it, and paste into Plone to overwrite the HTML code.
  10. Click the HTML button again.
  11. Preview the Body Text field
  12. Save the page
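
The three find-and-replace passes in steps 5 through 8 can also be expressed as a short script. This is only a sketch of the same replacements, assuming the HTML has been copied out of the editor into a string; the function name `remove_excess_whitespace` is made up for illustration.

```python
def remove_excess_whitespace(html: str) -> str:
    """Replace each target string with a single space, in the order listed above."""
    # Order matters: "<p>&nbsp;</p>" must be handled before the bare "&nbsp;",
    # otherwise it would become "<p> </p>" and no longer match.
    for target in ("<br />", "<p>&nbsp;</p>", "&nbsp;"):
        html = html.replace(target, " ")
    return html

if __name__ == "__main__":
    sample = "<p>First line<br />second line</p><p>&nbsp;</p><p>Two&nbsp;words</p>"
    print(remove_excess_whitespace(sample))
```

As with the Notepad method, this removes every line break and non-breaking space, including any that were intentional, so preview the result before saving.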

Hyphenated Words

In many cases, programs that generate PDFs will hyphenate words when laying out the document. Unfortunately, these split words (e.g. "Pennsylva- nia") survive the copy and paste process.

They are usually easy to detect and fix, since Firefox's built-in spell check will underline them in red. However, some words will not be caught because they're valid words (e.g. "re- solving") and require manual fixes. The easiest way to identify them is to search for "- " (dash space) using the browser search (Control+F).
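
For long documents, the "- " search can be approximated with a regular expression that lists every candidate at once. The sketch below is an illustration only, assuming the text is available as a plain string; the name `find_hyphen_breaks` is hypothetical. It deliberately reports candidates rather than joining them automatically, because legitimate phrases such as "two- or three-column" match the same pattern.

```python
import re

# Letters, a hyphen, exactly one space, then more letters,
# e.g. "Pennsylva- nia" or "re- solving".
HYPHEN_BREAK = re.compile(r"\b([A-Za-z]+)- ([A-Za-z]+)\b")

def find_hyphen_breaks(text: str) -> list[str]:
    """Return each 'frag- ment' candidate so it can be reviewed by hand."""
    return [match.group(0) for match in HYPHEN_BREAK.finditer(text)]

if __name__ == "__main__":
    sample = "Pennsylva- nia farms are re- solving disputes - slowly."
    for candidate in find_hyphen_breaks(sample):
        print(candidate)
```

A free-standing dash with spaces on both sides, as in "disputes - slowly", is not reported, because the pattern requires letters directly before the hyphen.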