Topic: Differences in MB per format

Offline LR827

  • TS Addict
  • *****
  • Posts: 1840
  • Let's take care of each other
    • View Profile
    • http://www.deardrroth.com/
« on: February 11, 2013, 08:54:52 AM »
When I scan a page from a book and save it in jpeg format, it weighs 1.7 MB; in jp2 (jpeg 2000) format 1.8 MB; in PDF format about 8 MB; and in BMP format 25 MB.

Why the huge difference? I understand -- as I learned at TS some years ago -- that BMP format is most compatible with pasting into a Word doc -- but why is it so big?

Thanks!

Offline Xairbusdriver

  • Administrator
  • TS Addict
  • *****
  • Posts: 26388
  • 27" iMac (mid-17), Big Sur, Mac mini, Catalina
    • View Profile
    • Mid-South Weather
« Reply #1 on: February 11, 2013, 12:32:39 PM »
Never used BMP (never used Word, for that matter!), so I have no experience with it. However, my guess is that it's similar to the "PICT" format used in early Macs: it simply stores every pixel of the image, uncompressed. PDF is a page-description format (originally Adobe's, now an ISO standard) that holds text with font and styling information as well as page layout and images, so there is a lot of extra data besides just an image. JPEG (both varieties) is normally a 'lossy' format: some image data is discarded each time the image is re-encoded and saved (even if it is not edited), although simply copying the file loses nothing. So, depending on what the image is, a JPEG can be much smaller than the original.

Scanning, of course, usually results in an image of some format (jpeg, tiff, png, etc.) so the "text" is no longer really text, it's simply an image of the text. If you want actual, editable text (usually without much, if any formatting) you'll need OCR functions to convert the character images back into real ASCII characters. That may or may not create a different sized file from what the scanner produced. Remember, a scanned file size is very dependent on the resolution used.
THERE ARE TWO TYPES OF COUNTRIES
Those that use metric = #1 Measurement system
And the United States = The Banana system
CAUTION! Childhood vaccinations cause adults!

Offline LR827

« Reply #2 on: February 11, 2013, 04:49:32 PM »
Thanks, Jim. This is a text of my husband's that was published about 10 yrs ago. We don't have a digital document. We do own the copyright, though. I'm just taking pictures of the pages so that we can re-publish it without re-typing the whole doc. Thanks again!

Offline tacit

  • TS Addict
  • *****
  • Posts: 1628
    • View Profile
    • http://www.xeromag.com/
« Reply #3 on: February 13, 2013, 02:43:01 PM »
A BMP is not compressed. The scan of the page is 25 MB.

A JPEG is compressed. JPEG uses "lossy" compression--it deliberately discards some image detail in order to make the file smaller on disk. JPEG was designed for photographs, where a small loss of detail is hard to see and file size matters a lot, such as on the Web. It is a poor choice for scanned text or line art, where its artifacts show up around sharp edges, and for anything you plan to edit and re-save repeatedly.

Different programs will save the same image as JPEG at different sizes because the JPEG standard is designed to let programs determine how much image quality to discard to save space. Larger amounts of quality degradation mean smaller files.

PDF files can also use JPEG compression for the images they contain. By default, most programs that create them try to preserve as much quality as they can.

If you are attempting to scan printed pages of nothing but words (no pictures) in order to reprint them, there are some things you can do to keep quality. Do not scan them as color or grayscale; scan them as bitmap (1-bit black and white, which scanner software often calls "line art" or "text" mode). Make the scans high resolution--at least 600 dpi, and preferably 1200. Do not save them as JPEG; use TIFF. (You can save a TIFF with LZW compression, which makes the file smaller without sacrificing quality.)

The images scanned this way will look worse on your computer screen, but much better in print.
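
To illustrate what "smaller without sacrificing quality" means, here is a minimal Python sketch. It is not TIFF and it is not LZW: Python's standard library has no LZW codec, so zlib (Deflate, a different lossless method) stands in for it. The page is a made-up 1-bit pattern, far more regular than a real scan, and the names synthetic_page, WIDTH_BYTES and ROWS are hypothetical.

```python
import zlib

# Hypothetical 1-bit page: 2400 pixels per row packed 8 to a byte, 3000 rows.
WIDTH_BYTES = 300
ROWS = 3000


def synthetic_page():
    """Build a fake page: blank rows plus bands of repeating 'text' rows.

    Assumption: a 1 bit means white and a 0 bit means black.
    """
    white_row = bytes([0xFF]) * WIDTH_BYTES
    text_row = bytes([0xFF, 0x0F, 0xF0, 0xFF, 0x3C]) * (WIDTH_BYTES // 5)
    rows = []
    for y in range(ROWS):
        # 12 rows of "text" followed by 28 blank rows, repeated down the page.
        rows.append(text_row if y % 40 < 12 else white_row)
    return b"".join(rows)


raw = synthetic_page()
packed = zlib.compress(raw, 9)      # lossless compression
restored = zlib.decompress(packed)  # should give back every byte

print("uncompressed bytes:", len(raw))
print("compressed bytes:  ", len(packed))
print("identical after round trip:", restored == raw)
```

The round trip returns exactly the same bytes, which is the difference from JPEG. A real scanned page has noise and irregular letter shapes, so it will not shrink as dramatically as this artificial pattern.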
« Last Edit: February 13, 2013, 02:45:05 PM by tacit »
A whole lot about me: www.xeromag.com/franklin.html

Offline LR827

« Reply #4 on: February 16, 2013, 07:21:56 AM »
Thanks, tacit. My concern was for the eventual size of the document and how unwieldy it would be to manage. I recalled, as I mentioned, that I had learned here -- from you, I think -- that Word "liked" BMP images best, so I started using that until I saw the size. And that was only at 300 dpi. I imagine it will even be a significant multiple of that at 600 or 1200. I have not gotten back to it yet because we are visiting relatives in another state right now, but I will try that when I get back. I have used extra care in removing the binding of the book so that each page is perfectly flat, so the quality of the image is very important.

Thanks again,
Lorraine

Offline Xairbusdriver

« Reply #5 on: February 16, 2013, 10:58:33 AM »
Try TIFF; it has several compression options and will be more usable across different systems. Try the same page at both 300 and 600 dpi. Print each scan on as many printers as you can use and see if you (or whoever you can get to help you with the printing) can see any difference. Saving file space, so more people will download the files, may end up more important than differences in image quality that may be invisible to the normal eye. Of course, if this project is for the Library of Congress, you may want to use 2400 dpi and no compression!

Offline tacit

« Reply #6 on: February 18, 2013, 05:49:55 PM »
The pages you're scanning--are they just text, or do they contain color pictures?

If you are scanning 8x10 pages at 300 dpi in RGB color, that works out to about 20 MB each (2400 x 3000 pixels at 3 bytes per pixel is roughly 21.6 million bytes). If the pages are just text, scanning in RGB color is not the right thing to do.

By way of comparison, 1200 dpi bitmaps (1 bit per pixel) of the same pages will be only about 13 MB each uncompressed, and will print much, much better. If you save them as LZW compressed TIFF files, they'll likely be half that size.
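
The arithmetic behind these figures can be sketched in a few lines of Python. This is only an estimate: it ignores file headers and row padding, it assumes 24 bits per pixel for RGB and 1 bit per pixel for bitmap, and raw_scan_bytes is a hypothetical helper name. The 8.5 x 11 case is an assumption about the page size behind the 25 MB BMP mentioned earlier in the thread.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed pixel data in bytes (no headers, no row padding)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


MIB = 1024 * 1024  # binary megabyte, which many programs display as "MB"

rgb_300 = raw_scan_bytes(8, 10, 300, 24)            # 8x10, 300 dpi, RGB
mono_1200 = raw_scan_bytes(8, 10, 1200, 1)          # 8x10, 1200 dpi, 1-bit
letter_rgb_300 = raw_scan_bytes(8.5, 11, 300, 24)   # 8.5x11, 300 dpi, RGB

for label, size in [
    ("8x10 @ 300 dpi RGB", rgb_300),
    ("8x10 @ 1200 dpi 1-bit", mono_1200),
    ("8.5x11 @ 300 dpi RGB", letter_rgb_300),
]:
    print(f"{label}: {size:,} bytes ({size / MIB:.1f} MiB)")
```

The 1200 dpi bitmap has 16 times as many pixels as the 300 dpi scan, but each pixel takes 1 bit instead of 24, so the raw data ends up smaller, not larger.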

Offline LR827

« Reply #7 on: February 20, 2013, 07:23:41 AM »
Thanks, tacit. I noticed that I could scan in Color, Black & White, or Text. I chose Text and set it for 1200 dpi. It automatically changed the saving option to "tiff." Since the pages are 6 x 9, I changed the page size from the default 8.5 x 11 to 6 x 9. I scanned 2 pages with those settings and they came out at 570 KB and 620 KB! Tiny! I don't understand how that works, but they printed out very nicely.

Thanks again,
Lorraine
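
A rough check of those two file sizes, as a Python sketch. It assumes the scanner's Text mode stores 1 bit per pixel and that a kilobyte here means 1000 bytes; neither is confirmed in the thread. raw_mono_bytes is a hypothetical helper.

```python
def raw_mono_bytes(width_in, height_in, dpi):
    """Uncompressed size of a 1-bit scan in bytes (8 pixels per byte)."""
    return round(width_in * dpi) * round(height_in * dpi) // 8


raw = raw_mono_bytes(6, 9, 1200)  # 6 x 9 inch page at 1200 dpi
print(f"uncompressed 1-bit page: {raw:,} bytes")

for saved_kb in (570, 620):
    ratio = raw / (saved_kb * 1000)
    print(f"{saved_kb} KB file is about {ratio:.1f} times smaller than the raw pixels")
```

Under those assumptions the saved files are roughly 16 to 17 times smaller than the raw pixel data, which suggests the scanner software applied lossless compression when it wrote the TIFF. A page of text is mostly long runs of white, and that kind of data compresses very well.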

Offline gunug

  • TS Addict
  • *****
  • Posts: 6710
  • TS Palindrome
    • View Profile
« Reply #8 on: February 20, 2013, 10:42:34 AM »
Thanks to all from me; I wanted to know what people were doing about this kind of thing but hadn't had a chance to ask! I'm going to save this thread as a PDF and put it in a folder of reference material I've been saving!
"If there really is no beer in heaven then maybe at least the
computers will work all of the time!"