• Welcome to the new COTI server. We've moved the Citizens to a new server. Please let us know in the COTI Website issue forum if you find any problems.

Compressing the UWP for 8-bit machines

OK someone talk me off the ledge.... I've started researching compression algorithms. LZ77, Arithmetic Coding, LZ4...

It's only worthwhile if you plan to stream the content in en masse.

Then you need the actual memory for the decompression algorithm.

Decompression is typically much cheaper, CPU wise, than compression, so you have that going for you.

Do you actually intend to run this (or expect it to be run) on an actual C-64? From what I've see so far it sounds more like you're running on simulators, and that they're not necessarily running at native speed either.

If you do not intend to actually run (or expect to be run) on a C-64, then feel free to add disks and disk drives. I know you can run at least 2 drives, can C-64 support 4 drives?

Didn't the later models have a higher density drive? I know the Ataris did.

Those are "real cheap" ways to vastly improve storage.
 
Do you actually intend to run this (or expect it to be run) on an actual C-64? [...]

If you do not intend to actually run (or expect to be run) on a C-64, then feel free to add disks and disk drives. [...]

You're right; I lack the hardware anymore, so I run VICE and choose the diskette format. The best for my purposes is the image of the 3.5" 800k disks.

And I can have four drives going at the same time.

And I'm running at 200% speed, which helps and is not too fast to keep up with.

But I've got a big data problem: SEARCH TIME and FILES PER DISK.

(1) If I put data in subsectors, search time is acceptable, but I can only have about 290 files per diskette (unless I mess with the diskette image a bit). 290 subsectors is 18 sectors. Four disks of 18 sectors each is 72 sectors, or only about half of Charted Space.

(2) If I put data in sectors, there is PLENTY of space for FOUR HUNDRED sectors. I mean just THINK ABOUT THAT. 100 sectors on one disk image. Fantastic!

...but search time is slow. SO searching for jump destinations based on your jump number is mostly wasted time with a sector. If we stick to a subsector, though, our time is better.



HOW MANY SECTORS IS ENOUGH?

That's the real question. For my purposes, what's the right number of sectors per diskette? At what point is it reasonable to swap disks (or change drives)?

If I had to be honest with myself, I'd say that 18 sectors is enough: for the sake of Hop drive, it's ten parsecs per rating; given that one sector is 32 x 40 parsecs, a one sector buffer is sufficient to support Hop. Such a buffer around four core sectors results in sixteen sectors.

So subsectors would work with D81 images.



AND THERE IS AN OPTION

Disk images typically reside on one track only, which is why there's a file limit: once the track is filled up, there's no room for more file references. But actually that's not true: you can fake out a disk controller into thinking that the directory is continued on another track. So -- AS LONG AS THE DISK ISN'T VALIDATED -- more files can be referenced than normal.

There's enough disk space for dozens of sectors. So this is tempting.




ANOTHER WAY TO DO IT

Hm. I just had an IDEA.

Commodore allows REL files, which rely on fixed-size records of data for quick access. A REL record can be up to 254 bytes.

So let's consolidate into 4x5 Clusters, or 20 hexes per Cluster. Typically, half of those are empty, so 254 bytes should be plenty. Ten systems at maximum name length would be 240 bytes, and fourteen systems at average name length would be 238 bytes, so I think we're safe unless my data generation scripts tell me otherwise.

A sector has 8x8 = 64 Clusters. At 254 bytes each, that is 64 disk blocks (16k).


There's one more option with REL files: I go totally wasteful, and make one entry per hex, allocating the full 24 bytes for each hex, including empty ones. This effectively doubles the size of the REL file to 124 blocks (32k).

But a D81 could still hold 25 sectors like that.

More than enough.

OK. I'm sold. I think this is better than sequential subsector files.

And... I could store information in those otherwise empty hexes, I suppose. I don't know what kind of information, but surely I could store something.
 
Last edited:
Aaaand... Robject is in Mad Computer Scientist Mode...

If the app isn't able to be used effectively in real time on real hardware, then it's failed one of the strongly implied goals from post 1 already...
 
USING "RELATIVE" FILES

REL files hold fixed-length records. Fixed-length is inefficient, but REL is fast. Bitfields reduced my record size; now REL will give me the tradeoff of space for speed. I'll take that tradeoff.

But first, there's one more benefit. REL files are accessed by number, right? So what I'll do is create one record for every hex in the sector -- that's 1280 records. Wasteful I know, but here are the benefits:

(1) The record number IS THE HEX LOCATION. Hex math maps directly to the record location. That is FAST.

(2) It also means the hex number is NO LONGER part of the record. It's the record NUMBER. My bitfield data just got one byte shorter!

So this is a win-win. The more I think about it, the more I like it.


THE HEX RECORD

Re-engineered so that no byte is zero, and compressed so tightly it's fusing.

Code:
(1) sp:3 (X=1, E=2, D=3, C=4, B=5, A=6), TL:5 (0-31).
(2) atm:2 (0=breathable, 1=taint, 2=exotic, 3=corrosive), pop:3 (0=transient (Ba=none), others=7+value), travel zone:2 (1=amber, 2=red), belt:1 (0=no).
(3) LL:3 (1=low, 2-4=moderate, 5-7=high), bases:3 (-, A, B, D, M, N, S, W), orbit:2 (1=inner, 2=HZ, 3=outer).
(4) trade code map byte 1 : Ag As Ba De Fl Hi Ic In ** 0=PRESENT, 1=NOT
(5) trade code map byte 2 : Lo Na Ni Po Ri Sa Va Wa ** 0=PRESENT, 1=NOT
(6) imp:3 (-3 to +5), star color:3 (1=OBA, 2-6=FGKML), giant star: 1 (0=no), gg:1 (0=no).
(7-23) Mainworld name

23 bytes per record x 1280 records = 29k = appx 118 blocks on disk (including index). The 1581 image can hold up to 26 sectors of this sort. That's more than enough. Verdict: the inefficiency is worth the benefits.

I can even put two bytes back into the record. I'll think about that.

I could replace the world name with a pointer to the world name string loaded from another file...
 
Last edited:
I have to say - this is an interesting experiment. My early days of programming were all fixed-length records (COBOL and implied decimal to save that 1 byte!) and dealing with phone switch records, all were all fixed length (and mixed ASCII and binary, yay AT&T).

Most developers these days are lazy and don't understand half of what they are doing due to the framework (we use Ruby on Rails at work; personally I hate it due to its very strong (and wrong I think at times) opinions and the "magic" stuff it does, but it is what it is).

This old-school programming, having to worry about a byte here & there, and figuring the record ID = hex number: its the good stuff. But I don't miss it too much :)

Besides, 640K should be enough for anyone!
 
Oh my. Oh. My.

Catching up on this thread has brought back some vivid memories of doing exactly the same thing on a C64 with a 1541 drive back in 1990. I'm fairly sure that I even made similar design decisions re: packing trade codes as bits.

Sadly, I don't think that I'll be able to lay my hands on my programs; if they still exist, they'll be on a 28 year old floppy that's been in my parents' attic all that time.

robject, I honestly don't know whether to view your current endeavour with admiration or pity. :)
 
I glanced at Vice, I honestly don't know anything about the C64 -- have you considered using a "FileSystemDevice"? That, apparently, lets you mount a directory on your host file system? Then your disk space woes are gone. Still have data volume issues (how much data) in terms of processing, but raw storage is nor longer a problem. Nor is things like disk latency, not compared to 1541 at least (using your REL example).
 
Thanks, guys. I appreciate your pity. This madness will pass. Until then...

I've refined my plan.

First, the bitfields will shrink to SIX bytes. I'll do this by swapping out the trade code flags for a one-byte INDEX into a table of trade code sets. Should be sufficient. We'll see. I could do the same thing for Starport + Base.

Code:
(1) Subsector hex row: 4, Subsector hex col: 3, GG presence: 1 (0=no)
(2) Starport + Base: 4 (1=A, 2=A+N, 3=A+S, 4=A+NS, 5=A+D, 6=A+W, 7=B, 8=B+N, 9=B+S, 10=B+NS, 11=C, 12=C+S, 13=D, 14=D+S, 15=E), atm:2 (0=breathable, 1=taint, 2=exotic, 3=corrosive), travel zone:2 (1=amber, 2=red).
(3) TL:5, LL:3 (1-7; multiply value by 2).
(4) pop:3 (0=transient (Ba=none), others=6+value), orbit:2 (1=inner, 2=HZ, 3=outer), imp:3 (-3 to +5).
(5) trade codes index byte.  1-255.
(6) star color:3 (1=OBA, 2-6=FGKML), giant star: 1 (0=no), belt:1 (0=no).
(7+) Mainworld name = pairs of Z-text bytes (which includes End Of String mark).

I'm also compressing the mainworld names using a scheme based on Z-Machine text encoding. This shrinks the "average" length of a world name by 30%.

Thus the average for my painfully-compressed UWP is FOURTEEN bytes, and individually ranges from 10 to 18 bytes.

One REL file holds all the data for one sector; its records are on the order of 200 bytes each. Three such records contain data for a subector; there will be 16 of these, so there will be 48 records in the REL file dedicated to them. Including one record for metadata, the structure of a REL sector file would be:

[Record 1 = metadata]
[Records 2-49 = subsectors A-P.]

Total size for such a REL file is about 40 blocks, or 10k. It's a compromise between data access speed and efficient size.


Z-Machine Text Encoding

Code:
Z-char 0123456789abcdef0123456789abcdef
       --------------------------------
     x ABCDEFGHIJKLMNOPQRSTUVWXYZ '  -y
     y 0123456789                    -x

x = toggle to the 'x' character set. Also used to pad end of string if necessary.
y = toggle to the 'y' character set. Also used to pad end of string if necessary.

and

Code:
   Text in memory consists of a sequence of 2-byte words.
   Each word is divided into three 5-bit 'Z-characters',
   plus 1 bit left over, arranged as:

   --first byte-----   --second byte------
   0 1 2 3 4   5 6 7   0 1   2 3 4 5 6   7
   --first--   --second---   --third--  END

   Bit 7 tags the end-of-string.
 
Last edited:
ALTERNATIVE #1: REU

First alternative is a cartridge image that contains all of Charted Space "in memory", and uses a small program that does bank switching to get access to that memory in chunks.

REU carts for the C64 range form 512K up to 16 megabytes (!) in size. 4mb is plenty for storing all 80,000 worlds in Charted Space.


ALTERNATIVE #2: TCP/IP

This means I use a modem (ok an emulated modem) to talk to the interwebs to get sector data. Probably, this means talking to a small gateway program that translates the calls and transmits queries to TravellerMap, and vice versa. And... probably has a cache as well so it's not always hitting the site.
 
Last edited:
You can't just connect to an arbitrarily large disk drive, like the file system option?

Defeats the purpose of writing it for an old computer.

If you want a universal VM, go with inform. (It's got more implementations than Java!) The IVM won't do complex graphics, tho'. So you have to do text-driven quasi-graphics. But it does do universal text handling on everything from an Apple II (no +, no e, no c, no GS) and Osborne 1, through the most modern hardware. And most places in between. (Including PalmOS 2+, Newton OS 1.1 and up, iOS, CP/M, VMS, almost every flavor of unix and linux, Symbian, Blackberry, and a half-dozen proprietary phone OS's)

Robject's strongly implied goals include it being able to run on real C64 hardware, albeit potentially slowly. Connecting an arbitrarily large drive is not an option on real C64 hardware.
 
ALTERNATIVE #1: REU

First alternative is a cartridge image that contains all of Charted Space "in memory", and uses a small program that does bank switching to get access to that memory in chunks.

REU carts for the C64 range form 512K up to 16 megabytes (!) in size. 4mb is plenty for storing all 80,000 worlds in Charted Space.


ALTERNATIVE #2: TCP/IP

This means I use a modem (ok an emulated modem) to talk to the interwebs to get sector data. Probably, this means talking to a small gateway program that translates the calls and transmits queries to TravellerMap, and vice versa. And... probably has a cache as well so it's not always hitting the site.

Defeats the purpose of writing it for an old computer.

Really?

Seems to me that he's tossed the original C64 under the bus a long time ago. Both from a raw performance perspective (since he seemed to have no intention of running his simulator at native speeds) along with his other expansions of storage technologies.

Surprised he hasn't upped the ante to a C-128 or some other exotic banked RAM expansion yet.

As far as I know, in the past folks were able to attach hard drives to their C64, and once that door's open, well, capacity is pretty much unlimited. Plus there are several modern peripherals that allow attachment of flash drives, etc. to a C64.
 
As far as I know, in the past folks were able to attach hard drives to their C64, and once that door's open, well, capacity is pretty much unlimited. Plus there are several modern peripherals that allow attachment of flash drives, etc. to a C64.

The usual file systems for C64 don't have directory trees and have a limited number of files. The machinery of the third party adapter presents the HD or the SD card to the machine as a 1541c drive, usually with an external push button to cycle through the stored disk images.

Most early machines were similar. One of the complaints about CP/M was the limited file names... 8+3... just like it's bastard child MS-Dos.

And while there are Hard Drive cards for, say, Apple //e and IIgs, they don't work well at all for partitions above 10 MB. (I've a IIgs with a 180 MB HD... I can only access 4 partitions unless I physically disconnect the floppy drives and/or turn off {in the system bios-equivalent} access to the peripheral ports - Serial, Parallel, Joystick, and the ones above 10 MB cannot be booted from. Arranging the partitions for usefulness took a lot of juggling.)

As for an REU cart - carts were intended to allow bank-switching. It's a standard practice in C64 carts (and C128, Vic20, some Atari 2600, some Atari 400/800, as well as a few nintendo carts). The C64 memory is actually overloaded - there's 20 K of ROM without a cart in, and 64 K of RAM, but only 64 K of address space... 3 bits of byte $0001 are used to control bank switches for areas A000-BFFF, D000-DFFF and E000-FFFF. (total, 5 banks). In fact, one loses RAM by using a cart... but the cart can be ram or rom, and can itself monitor for a specific poke to bank switch.

Heck...
$0801-$9FFF Default BASIC RAM area (38911 bytes).
$8000-$9FFF Optional cartridge ROM (8192 bytes). RAM if no cartridge in.
$A000-$BFFF Basic ROM or Machine Code accessible RAM
$D000-$DFFF Character ROM or I/O or RAM
$E000-$FFFF Kernal ROM or Machine Code accessible RAM

They got bank switching going on in routine use.
Given the 4K blocks switchable, if you lose one byte for switch control, one could theoretically make a 128 KB - 16 B bank-switching cart, showing 8 KB per window.

In fact, someone has done a manually switchable 128 KB breadboard project.
 
Seems to me that he's tossed the original C64 under the bus a long time ago. Both from a raw performance perspective (since he seemed to have no intention of running his simulator at native speeds) along with his other expansions of storage technologies.

Surprised he hasn't upped the ante to a C-128 or some other exotic banked RAM expansion yet.

As far as I know, in the past folks were able to attach hard drives to their C64, and once that door's open, well, capacity is pretty much unlimited. Plus there are several modern peripherals that allow attachment of flash drives, etc. to a C64.

LOL. Well I do what I can. And yeah, the REU is exactly an exotic banked RAM expansion, so there's my ante.

And yeah, I know VICE accesses the native file system as if it were one gigantic disk. But I don't want to go there. Not sure why... actually I know why: it's not as easy to hand to someone else and say "run this". An REU image is still unitary -- similar enough to a disk image that I could post it to Facebook and say "run this".

Granted, the file system is just one unzip removed from that, but still.

LONGER-RANGED GOAL

I'll up the ante one more: a clever Australian PhD is using his students to engineer a <thing> that can run a Commodore 65 ROM -- which was an 8-bit prototype that never made it out the door, except as a dozen working and semi-working prototypes out there in the big world. (A working prototype sells for EUR 81,000.00 in Germany for instance... or is that 81.000,00?) He essentially has his students working on a very beefy but 8-bit machine using a legacy OS with modern hardware.

So my long-range goal is to port all this nonsense to his MEGA65 when it's closer to ready. http://mega65.org/
 
Last edited:
Since I'm targeting the software for CommodoreServer(.com), I am therefore going to have to use SEQ files on 174k D64 diskette images.

The average UWP is 12.6 bytes, so the average subsector is 504 bytes (2 blocks), and the average sector is 8064 bytes (32 blocks).

Data is stored in quadrant-sectors at 8 blocks per quadrant (so 80 quadrants/20 sectors would take up 640 blocks). The filename format makes it easy for the program to find neighboring quadrants.

Row/Col Sector name Subsector letters.

"Row/Col" are the important numbers, because they uniquely identify the quadrant.

Code:
"15/21 SPIN ABCD"


Data is read in via:
GET#CH, B0$, B1$, B2$, ... B6$ : rem 7 bytes
INPUT#CH, NA$ : rem name, eol terminated

Strings are used as single-character buffers, and the operation chr$(x) is used to turn a byte value x into a string with one character in it. (Going the other way, ASC(x$) returns the value of the first character in x$.)

The Official Structure:

Code:
Byte 1:
   subsector col: 4 bits (two subsectors across)
   Starport + Base: 4 bits
Byte 2: 
   subsector row: 5 bits (two subsectors deep)
   Travel zone: 2 bits (1=amber, 2=red)
   GG presence: 1 bit
Byte 3: 
   TL: 5 bits
   Atmosphere: 2 bits (1=taint, 2=exotic, 3=corrosive)
   Belt presence: 1 bit
Byte 4:
   LL: 3 bits (1-7; multiply value by 2)
   Pop: 3 bits (0=transient (Ba=none), others=6+value)
   Orbit: 2 bits (1=inner, 2=HZ, 3=outer)
Byte 5:
   Trade Codes
Byte 6: 
   Trade Codes
Byte 7: 
   Star color: 3 bits (1=OBA, 2-6=FGKML)
   Giant star: 1 bit
   Importance: 3 bits (-3 to +5)
Bytes 8...n: Mainworld name, in pairs of Z-text bytes

Starport + Base: (1=A, 2=A+N, 3=A+S, 4=A+NS, 5=A+D, 6=A+W, 7=B, 8=B+N, 9=B+S, 10=B+NS, 11=C, 12=C+S, 13=D, 14=D+S, 15=E)

Z-text shrinks world name by 30%. So average length shrinks from 8 characters to 5.6.
 
Last edited:
Back
Top