
Standards for PDFs on CD-ROMs

DonM

Moderator
Marquis
Ok, here's a real kettle of worms, and I really do want this information, so I'm going to open it up.

I want to keep this away from specific items and focus on standards and software. There are a lot of software packages and tools that can be used to create PDFs. But I don't have a lot of money (hey, my son went off to college this year, and so I'm working on a fourth mortgage and a fifth ulcer, ok?)...

What are reasonable expectations for what a PDF should have? I want cheap and easy, something Don could do at home while begging his wife for forgiveness for buying all that Traveller stuff over the years, so that now she can't buy those new 4E books (I married a gamer, but she plays the wrong game).

The obvious things to me are: clean OCR, clean images and small size. I'd like bookmarks, but if I had to decide between free and bookmarks, free wins.

For the Winter War convention, I've used Word 2003 printing to PDF995. Of course PDF995 can't easily do bookmarks. Word 2007 does do bookmarks, but the PDF size is bigger and the image is not as clean as PDF995's (like Gates couldn't afford good image code?). But for generating images (or cleaning bad images up), or taking images to OCR, I have no experience.

This isn't really for the CT CD-ROM project. Marc's handling that end of things. However, I realize I'm in an area where I lack knowledge, and am hoping that some of the more industrious members of our forums might enlighten me...
 
Until someone more knowledgeable comes along, I can offer that 200 DPI resolution is reported to be the minimum for OCR software to work.

Just my 2 cents.
 

Well there are some massively skilled computer users on this BBS (Mickazoid, et al) but you might be forced to google for Forums that deal with OCRing and creating PDFs. I would also try the Adobe BBS and see if there are forums there that can point you in the right direction. You can also purchase OEM software, depending on how you feel on the subject. The prices are generally much more reasonable.

Back for the GURPS Vehicle Builder testplay (GURPS Vehicles was turned into a computer program by the geniuses over there) I did the scanning from several books to include various weapons. It was a small part and they liked it enough to get me a playtest copy of GVB. I also built one of the robots in the manual.

Things I learned: (I'm a home-scanner ;))

Font is important. Both size and type. I used GURPS Ultra Tech 3e & other 3rd edition books and my OCR/Scanner, which wasn't bad considering it was free, had a tough time with the GURPS font in those books. I'd say it was 75% correct after a scan, and believe me 25% error is quite a bit.

I used the same software for scanning other RPG books, and it was like day and night. 90% to 95% correctness, because the font was simply different (easier for the program to read). Despite the drop in errors, I'm still picking errors out of that 5% (scanned in 2003-4). Get ready to edit and re-edit your work.

I can program in a script language. Generally what I did was keep track of the repetitive "search and replace" work I had to do. For example, in some cases the OCR saw "100" as "1%"; I'd log that, write it into my scripts, and after any scan have the script find every "1%" and change it to "100". You'll find many other things the OCR will mistakenly create.
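For anyone who wants to try the same approach, here is a minimal sketch of such a correction script in Python. The poster didn't say which script language was used, so the function name and the log layout here are assumptions, and the only entry in the log is the "1%" to "100" example from the post. It only replaces a misread that stands on its own, so a figure like "21%" is left alone.

```python
import re

# Hypothetical correction log: each key is a misread the OCR produced,
# each value is what was actually printed on the page.
# Add new entries as you spot them while proofreading.
CORRECTIONS = {
    "1%": "100",
}


def apply_corrections(text, corrections=CORRECTIONS):
    """Replace every logged OCR misread in text with its correction."""
    for wrong, right in corrections.items():
        # Only match the misread when it is not glued to other letters
        # or digits, so "21%" is not turned into "2100".
        pattern = r"(?<!\w)" + re.escape(wrong) + r"(?!\w)"
        # A function replacement avoids backslash surprises in `right`.
        text = re.sub(pattern, lambda match, r=right: r, text)
    return text


if __name__ == "__main__":
    scanned = "Range 1% yards, malfunction chance 21%"
    print(apply_corrections(scanned))
```

A blanket replace like this will also change any genuine standalone "1%" in the book, so it is still worth proofreading the result against the page.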

If you have a problem font, see about making xerox copies and enlarging the pages, then scan the copies. Make sure your copies are nice and dark and your xerox is also "clean": no smudges and so on.

Obviously, clean your scanner too :D.

Next was scanning tables. GURPS has lots of tables for its weapons. OCR can get confused easily, especially by whitespace. Even though the layout looks logical to the human eye, too much whitespace can send the OCR off in a different direction. Underline all rows in a table. As soon as I did this the page scanned correctly, from left to right, across the page, since there was a nice dark line to follow.

What will be tough is scanning pages that have many columns, sidebars and inserts (basically anything that's not simply single paragraphs that cross the entire page). Get ready to scan, study the output and then process the data.

The Windows FAX viewer (comes free with XP) has a built-in OCR processor in it. The free scanner tool (also with XP) allows me to scan text into a TIFF file. I then pump this thru Windows FAX viewer for the actual text. Scanning an entire page might not be a good idea. If your software can draw a box around a section of text, scanning just that section might be easier, depending on how much time you have. Get ready to experiment.

However, all of the above DOES NOT produce a PDF. It just takes the scanned images and produces text. A PDF not only has the image of the page and its font, but is also linked to the text behind that. That I don't know how to do. It probably requires Adobe Professional or another program that does the equivalent. They're not cheap, hence a lot of people go for OEM software. I'm not advocating that BTW, I've never purchased any myself, but one look at the $$ and you'll understand. As soon as you see what goes into producing a searchable PDF you also see why they charge $$. ;)


OCR works best between 150 and 300 DPI; some packages can use finer, but many can't. A few OCR programs can't handle above 200DPI.

150 DPI is the minimum acceptable printing resolution.
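To put those DPI figures in perspective, here is a rough sketch of how big a scanned page gets in pixels. The 8.5 x 11 inch page size is my assumption (US letter); substitute the real dimensions of the book you're scanning.

```python
def page_pixels(width_in, height_in, dpi):
    """Return (width, height) in pixels for a page scanned at the given DPI."""
    return (round(width_in * dpi), round(height_in * dpi))


if __name__ == "__main__":
    # Assumed page size: US letter, 8.5 x 11 inches.
    for dpi in (150, 200, 300):
        width, height = page_pixels(8.5, 11, dpi)
        print(f"{dpi} DPI: {width} x {height} pixels")
```

Doubling the DPI quadruples the pixel count, which is why the higher settings make for much bigger files.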
 
Yes, font is so important when scanning......

As far as making a PDF, OpenOffice is free and makes PDFs from any of its modules: documents, spreadsheets, presentations.
 
Are you talking about dealing with scans, or with creating new from scratch PDFs? Two completely different topics, as new PDFs don't have the OCR issue.

There are a plethora of free tools for making documents and PDFs from those documents.
 
whartung: Both... I'm in a topic I know very little about, so I'm trying to find out more.
 
OpenOffice is free and creates PDF files. Your scanner should have come with OCR software. OpenOffice has some OCR software built into it, I believe, so go get it and roll with it.
 