Are you working with PDFs or Word documents in your code? If so, how are you doing it? Are you happy with that technique? So far, I've been delighted with Syncfusion's Essential Studio; it has done exactly what I needed, and it was much less expensive than its competition.

J.Ja
Another Way
steven@... 11th Nov 2009
I needed to extract text out of PDF files about 3 years ago. I used an open source project named XPDF. I first shelled out a call to XPDF for the given file. It created a text file corresponding to the PDF file, closely following the format of the original.

This only works on *searchable* PDFs (text behind the image).
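
For anyone wanting to try the shell-out approach described above, here is a minimal sketch in Python. It assumes the Xpdf command-line tool in question is pdftotext, run with its -layout option to keep the text close to the original page layout; the post does not say which Xpdf tool or options were actually used. The function names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Build the argument list for Xpdf's pdftotext tool (assumed tool)."""
    command = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to preserve the physical layout of the page.
        command.append("-layout")
    command += [str(pdf_path), str(txt_path)]
    return command


def extract_text(pdf_path):
    """Shell out to pdftotext and return the text it wrote next to the PDF."""
    pdf_path = Path(pdf_path)
    txt_path = pdf_path.with_suffix(".txt")
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
    return txt_path.read_text(encoding="utf-8", errors="replace")
```

Calling extract_text("report.pdf") would leave a report.txt beside the PDF and return its contents. As noted above, a scanned PDF with no text layer will come back empty or nearly so.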
This looks quite worthwhile. I've often needed to extract text from PDFs but have been unable to.
I hate to sound like a noob, but can you give a step-by-step of how you got the script to actually extract the text with the programs you mentioned?
Although I am picking up bits of programming and scripting, I still have much to learn. Thanks.
Contributr
In my application, I run a wide variety of searches using Bing (see my article from a few weeks ago about using Bing to search). From the search results, I collect the URLs of the items I want to download and add them to a List object. Once all of the searches are done, I get the unique URLs with listOfUrls.Distinct() and iterate through the distinct list, passing each URL to the code here. The code here gives me back plain text, so my application knows nothing about the source itself other than the URL, which is not relevant to the application. It then performs a good deal of analysis on the retrieved text.

Hope that makes sense!

J.Ja
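
The flow described above (collect URLs, keep the distinct ones, hand each to the text extractor) can be sketched roughly as follows. This is Python rather than the .NET code the application uses, with dict.fromkeys standing in for listOfUrls.Distinct(). The fetch_text parameter is a hypothetical stand-in for "the code here" that turns a URL into plain text; it is not a real API.

```python
def distinct_urls(urls):
    """Drop duplicate URLs while keeping first-seen order."""
    # dict keys are unique and keep insertion order.
    return list(dict.fromkeys(urls))


def collect_text(urls, fetch_text):
    """Run the extractor once per distinct URL.

    fetch_text is a hypothetical callable: URL in, plain text out.
    The caller never learns what kind of document was behind the URL.
    """
    return {url: fetch_text(url) for url in distinct_urls(urls)}
```

The analysis step would then work on the returned text only, which keeps it independent of whether the source was a PDF, a Word document, or something else.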
We create the PDFs in the first place; it's just a way of presenting content, not storing or transporting it.

If I ever got into a position where I needed to do that, I'd have to say my design was crap.

So not for me.

We create PDFs with the iTextSharp libraries. Nearly every paid-for component suite we've tried for that has been bloated, flaky, and far too often not a good enough fit for the price.

