05-23-2011, 11:52 PM   #1   mokargas
PDFs and selecting/highlighting text: Translating touches

Hey there all!

In my app I can currently load and display basic PDFs using Apple's somewhat limited CGPDF API functions (ZoomingPDF example on dev.apple). I can even look at the objects in the underlying PDF structure and place those in an array without too much trouble.

I decided recently that it would be dandy if I could enable touch & swipe highlighting on my drawn PDF view (I assume apps like GoodReader and iAnnotate do this, but don't know for certain).

Of course, PDFs are basically graphics, not bona fide text like in a UITextField, so I figured I would need to:
  1. Draw the highlight effect as an overlay.
  2. Convert the co-ordinates and rect of the overlay into a selection of the underlying PDF objects.
  3. Write the result out to the PDF using a custom library.

The part I can't get my head around is translating my touch co-ordinates and rects into a word or sentence selection from my PDF's data structure. Does anyone have an idea on how to do this?
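
As a rough sketch of the geometry behind step 2 (not anything CGPDF provides): if a parser has already produced a bounding box per word in PDF page space, a touch can be mapped into that space and hit-tested. Everything here is an assumption for illustration: the `WordBox` type, a page drawn with one uniform `scale` and an `offset` inside the view, and an unrotated page whose origin is at the bottom-left (UIKit's is at the top-left). Python is used only because the arithmetic is language-neutral.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WordBox:
    """Hypothetical parser output: one word and its box in PDF page space."""
    text: str
    x: float       # left edge, in points
    y: float       # bottom edge, in points (PDF origin is bottom-left)
    width: float
    height: float


def view_to_page(vx: float, vy: float, scale: float,
                 offset_x: float, offset_y: float,
                 page_height: float) -> Tuple[float, float]:
    """Map a touch in view coordinates (origin top-left) to page space."""
    px = (vx - offset_x) / scale
    # Undo the scale first, then flip the y axis.
    py = page_height - (vy - offset_y) / scale
    return px, py


def word_at(words: List[WordBox], px: float, py: float) -> Optional[WordBox]:
    """Return the first word whose box contains the page-space point."""
    for w in words:
        if w.x <= px <= w.x + w.width and w.y <= py <= w.y + w.height:
            return w
    return None
```

A swipe would then be a series of such lookups (or a rect intersection test) between the touch-down and touch-up points. The hard part, as the rest of the thread shows, is producing the word boxes in the first place.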

PS. I have searched for 3rd-party libraries to allow this with no success (I found FastPDFKit, but that doesn't support the touch translation I'm after). I did also find a very nice C library called PDFEdit, but I'm unsure on how to get iOS's touch commands to work with it.

05-24-2011, 06:11 PM   #2   mokargas

After a hell of a lot of research, I found out I'm going to have to ditch CGPDF for the most part, write my own parser (or compile a static library from Poppler or another OSS PDF library) and manually position selectable UIKit representations for each PDF object.


It's a project in itself, and makes me appreciate apps like GoodReader and iAnnotate. I'll try to post code when I get around to it.

05-25-2011, 02:13 AM   #3   dany_dev

I'm really interested in this topic, please let us know how it goes.

05-25-2011, 08:36 AM   #4   mokargas
Poppler on iOS

Might need to start a new question for this.

Trying to cross-compile Poppler 0.16.5 for iOS using the following template (https://github.com/kstenerud/iOS-Universal-Framework). Having no luck at all, but then again, I've never tried compiling a library for iOS.

Does anyone know what the procedure is in regards to poppler? It seems to consist of several makefiles, not sure which ones to use.

05-25-2011, 09:12 AM   #5   dany_dev

What about this?
https://github.com/mobfarm/FastPdfKit

I haven't tried it, but it looks promising:

"Search with highlighted results;"

05-26-2011, 09:05 PM   #6   mokargas

Hey Dany, I evaluated FastPdfKit, but it doesn't meet my needs. Basically, by highlighting, I mean touch and swipe highlighting (like you'd do with the cursor on a desktop PDF editor). The source is, AFAIK, not accessible.

FastPDFKit doesn't allow working with the text directly as if it was a UITextField. It's also a tad expensive for what it provides.

Considering other formats now as well, such as ePub, which is basically archived HTML documents.

05-29-2011, 10:04 PM   #7   mokargas

Although ePub would have been a nice solution (it displays nicely in a web view, and selections can be read via JS), it doesn't support annotations yet.

Back on the PDF track we go.

06-20-2011, 04:37 PM   #8   katyaskr

Did you figure out how to compile poppler? I seem to be on the same track as you for translating touches on PDF documents and got stuck at the poppler library.
If you can post some code, that would be great. Thanks!

07-18-2011, 05:31 PM   #9   mokargas

Nope, could not get Poppler to compile on the iPhone. Wasted about a day on that. Currently I'm drawing directly on top of a CGPDF page and ignoring actual text selection for now. I did have some success using CGPDF's low-level functions, specifically using operator tables to extract text. I just could not seem to marry a visual selection (i.e. a CGRect) with an underlying text selection.
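
To illustrate why extracted text alone is not enough, here is a minimal sketch of what an operator-driven pass has to track to get positions as well. With CGPDF the operators would arrive through callbacks registered with `CGPDFOperatorTableSetCallback` and driven by `CGPDFScannerScan`; this sketch instead tokenizes a simplified content stream directly. It is an assumption-heavy toy: it handles only `BT`, `Tf`, `Td` and `Tj`, and ignores `Tm`, `TJ`, `T*`, nested parentheses in strings, font encodings and the page's transformation matrix.

```python
import re
from typing import List, Tuple

# Literal strings, names, numbers, then operators (order matters).
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|/[^\s()/]+|[-+]?\d*\.?\d+|[A-Za-z*]+")
NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def extract_runs(stream: str) -> List[Tuple[str, float, float, float]]:
    """Return (text, x, y, font_size) for each Tj in a simplified stream."""
    operands: list = []
    x = y = size = 0.0
    runs = []
    for tok in TOKEN.findall(stream):
        if tok.startswith("("):
            operands.append(tok[1:-1])       # string operand, parentheses stripped
        elif tok.startswith("/"):
            operands.append(tok)             # name operand, e.g. /F1
        elif NUMBER.fullmatch(tok):
            operands.append(float(tok))
        else:                                # an operator consumes the operands
            if tok == "BT":
                x = y = 0.0                  # a text object starts at the origin
            elif tok == "Tf":
                size = operands[-1]          # operands are: font name, size
            elif tok == "Td":
                x += operands[-2]            # offset from the start of the
                y += operands[-1]            # current line, not absolute
            elif tok == "Tj":
                # x is the line start; it is NOT advanced past shown text here.
                runs.append((operands[-1], x, y, size))
            operands = []
    return runs
```

For the stream `BT /F1 12 Tf 72 700 Td (Hello world) Tj 0 -14 Td (Second line) Tj ET` this yields one run starting at (72, 700) and one at (72, 686). Turning those line starts into per-word rects still needs glyph widths from the font, which is the piece that connects a CGRect to the text.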

Hoping iOS5 makes this all easier to do.

07-20-2011, 05:10 PM   #10   MattW

I've just written an iPad app that does exactly what you describe - iWillHighlight - I wrote my own PDF parser from scratch and, let me tell you, it's quite an undertaking. The PDF file format is immense and the document describing it takes some getting your head around (it's 700 pages, or so) and is not what you'd call self-explanatory when you first begin reading.

Because the PDF format is so old, layer upon layer of features have been added, and each PDF writer interprets and uses them slightly differently. I lost count of the number of times I thought I'd finished the parser, only to stumble across a PDF file that did something a little different and broke it. I'm sure I haven't caught all the obscure edge cases from 20 years ago!

What I did was use CGPDF for the rendering, and my parser extracts the text data and position information. I then draw an overlay over the top to display the highlighted words.
__________________

Highlight PDF text like no other app: iHighlight (now available for iPad and iPhone!)
-----
Create iPhone lists with no typing: Insta-List
-----
Make spelling fun, and create your own tests: iWillSpell
-----
A fast, elegant flashlight app: Insta-Light
-----


FourSixteen Productions

07-21-2011, 06:15 AM   #11   mokargas

Glad to hear you had success, Matt! That shows that it is at least possible. I'm slowly writing a parser using the operator tables as a project/torture, but it's more time-consuming and complicated than I expected. I'm also aware of how the spec has changed over time. What a nightmare!

What I need to do in the app no longer requires being able to snap to text lines, merely to draw on top, since many of the PDFs going into it are actually (and this is terribly stupid) scans of older documents from the '90s.

I hope your app sells well, because there's a hell of a lot of work involved.

07-21-2011, 01:35 PM   #12   MattW

Ah yes, those PDFs. Very annoying. I'm probably going to have to add a free-draw mode to my app as well in the future.

My app was born from hearing people complaining about how complicated and cumbersome it is to highlight text in iPad PDF apps. The concept phase involved me thinking 'why hasn't anybody done this before?', and at the end of development day 1 my thoughts were now, 'ahhhh! that's why nobody's done this before.'

It was a frustrating, but ultimately very rewarding project. The most frustrating part was when the text positions spat out of my parser didn't, in some cases, match what was being rendered. I couldn't blame the data because CGPDF was rendering it correctly...

Let me know if you have any problems with the operator tables, and I'll try to help you out.

05-06-2012, 12:45 PM   #13   dsun

Hi Matt,

First of all, I think you're really pro for having successfully done this already. I'm still struggling with drawing a CGRect on top of each glyph. Could you please answer a few questions for me?

1) How do you keep track of the parameters, such as a glyph's bounds and positions, when you parse a PDF?

2) Do you draw a highlight by redrawing the entire page of a PDF (i.e. call drawLayer:inContext: of the PDF content view and draw a different colored layer directly in that view), or do you have a separate transparent view (that sits on top of your PDF content view) on which you draw the highlight?

3) After you draw a highlight, do you save it to the PDF file, or have some other way to persist it?

Thanks,
Derek

05-06-2012, 09:58 PM   #14   MattW

Quote:
Originally Posted by dsun
1) How do you keep track of the parameters, such as a glyph's bounds and positions, when you parse a PDF?
You have to build up a list of where each word starts and ends. Every PDF contains each font used within it, and each font is described to you in the file. The font information tells you everything you need to know about it, but the pertinent bit for what you're trying to do is the size of each glyph in the font.

Then, in the actual text of the document, the beginning of each line of text has a position set (and this can be done in several different ways - absolute position, next line down from where the current one started, and so on). You'll then get the letters that follow that position and, referencing the information you gleaned from the font, you can figure out the width and height of each word by adding the letter widths together.
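
A minimal sketch of the "add the letter widths together" step, under stated assumptions: the font is a simple (single-byte) font whose dictionary carries a `/FirstChar` value and a `/Widths` array in thousandths of a text-space unit, each character's code equals its `ord()` value, and character spacing, word spacing and horizontal scaling are all at their defaults. Real files often break these assumptions (custom encodings, composite fonts), so treat the function and its parameter names as illustrative only.

```python
from typing import Sequence


def run_width(text: str, font_size: float, first_char: int,
              widths: Sequence[float], missing_width: float = 0.0) -> float:
    """Sum glyph advances for `text`, in points at the given font size."""
    total = 0.0
    for ch in text:
        index = ord(ch) - first_char          # /Widths is indexed from /FirstChar
        if 0 <= index < len(widths):
            glyph_width = widths[index]
        else:
            glyph_width = missing_width       # code outside the /Widths range
        total += glyph_width / 1000.0 * font_size
    return total
```

With the line's start position from the positioning operator, a word's rect is then roughly `(start_x + width_of_preceding_text, start_y, run_width(word, ...), font_size)`, where using the font size as the height is itself an approximation.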

Another thing to watch for is some PDF files have spaces between each word, others don't and they supply you a position for each letter within the word instead.
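
For files that position each letter individually, one possible heuristic (an assumption on my part, not necessarily what the poster's parser does) is to start a new word whenever the horizontal gap between consecutive glyphs exceeds some fraction of the font size. The sketch below handles a single line of left-to-right glyphs; the `gap_factor` default is a guess that would need tuning per document.

```python
from typing import List, Tuple

Glyph = Tuple[str, float, float]   # (character, x position, advance width)
Word = Tuple[str, float, float]    # (text, start x, end x)


def group_words(glyphs: List[Glyph], font_size: float = 12.0,
                gap_factor: float = 0.3) -> List[Word]:
    """Group positioned glyphs on one line into words."""
    words: List[Word] = []
    current = ""
    start = end = 0.0
    for ch, x, width in glyphs:
        is_space = ch.isspace()
        # Close the current word on an explicit space or a wide enough gap.
        if current and (is_space or x - end > gap_factor * font_size):
            words.append((current, start, end))
            current = ""
        if is_space:
            continue
        if not current:
            start = x
        current += ch
        end = x + width
    if current:
        words.append((current, start, end))
    return words
```

The same function copes with both styles mentioned above: explicit space characters end a word directly, and files without spaces are split on the gaps.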

Quote:
2) Do you draw a highlight by redrawing the entire page of a PDF (i.e. call drawLayer:inContext: of the PDF content view and draw a different colored layer directly in that view), or do you have a separate transparent view (that sits on top of your PDF content view) on which you draw the highlight?
NO! The rendering of a PDF page takes (relatively speaking) forever. It'll be the slowest part of your app. Just have a transparent overlay on top of the page and do your highlight rendering there.
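
The overlay only needs each highlighted word's rect converted from PDF page space into the overlay's own coordinates. A sketch of that conversion, assuming the overlay exactly covers the PDF view, the page is drawn with a uniform `scale` at an `offset` inside it, and the page is not rotated (all parameter names are hypothetical):

```python
from typing import Tuple


def page_rect_to_view(x: float, y: float, width: float, height: float,
                      scale: float, offset_x: float, offset_y: float,
                      page_height: float) -> Tuple[float, float, float, float]:
    """Convert a page-space rect (origin bottom-left) to a view rect (origin top-left)."""
    view_x = offset_x + x * scale
    # The rect's top edge in page space becomes its y origin in view space.
    view_y = offset_y + (page_height - (y + height)) * scale
    return view_x, view_y, width * scale, height * scale
```

In UIKit this would typically feed a transparent view's `drawRect:`, so that changing a highlight redraws only the overlay and never the page underneath.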

Quote:
3) After you draw a highlight, do you save it to the PDF file, or have some other way to persist it?
I don't touch the PDF file at all. I have a companion file that stores the highlight data - this has its pros and cons. The nice thing is that you are in total control of your file format (unlike the nightmare that is the PDF file format!) and therefore you can keep it simple and make it fast. The downside is that people can't just e-mail their highlighted PDF file to others unless you write an 'export to PDF' function. I've found the number of people requesting this functionality to be quite small, and the simplicity of the companion-file approach trumps writing highlights into the PDF itself. My app allows people to e-mail or print the actual highlighted text (which is presumably the part of the document they're interested in in the first place!), and that seems to be fine for 99.9% of users.
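
The companion file's actual format isn't described in the thread, so the following is purely a hypothetical illustration of the idea: a JSON sidecar stored next to the PDF, holding one record per highlight. The `.highlights.json` suffix, the `version` field and the record keys are all invented for this sketch.

```python
import json
from pathlib import Path
from typing import Dict, List


def sidecar_path(pdf_path: str) -> Path:
    """Hypothetical naming scheme: 'doc.pdf' -> 'doc.pdf.highlights.json'."""
    return Path(str(pdf_path) + ".highlights.json")


def save_highlights(pdf_path: str, highlights: List[Dict]) -> None:
    """Write all highlights for a PDF; the PDF itself is never modified."""
    payload = {"version": 1, "highlights": highlights}
    sidecar_path(pdf_path).write_text(json.dumps(payload, indent=2))


def load_highlights(pdf_path: str) -> List[Dict]:
    """Return stored highlights, or an empty list if none were saved."""
    path = sidecar_path(pdf_path)
    if not path.exists():
        return []
    return json.loads(path.read_text())["highlights"]
```

A record might look like `{"page": 3, "text": "some words", "rects": [[72.0, 700.0, 30.0, 12.0]]}`. Storing the text alongside the rects is what makes an "e-mail the highlighted text" feature cheap, since nothing has to be re-parsed.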

Hope this helps!

05-07-2012, 11:26 AM   #15   dsun

Thanks so much, Matt! I'll give it a try now.