Compressing the UWP for 8-bit machines

only needs 1 byte per code as is.
Also, C64 BASIC (I somehow doubt Rob's using ASM) doesn't handle Int32; it is limited to sInt16, with no provision for uInt16. Strings are a 1-byte length plus the content bytes in PETSCII. Floats are a 1-byte exponent and a 4-byte mantissa.
Floats are very wasteful of memory, especially given 38 KB of functional memory. One is better off packing fields into an int: two 5-bit fields as (0-31) + (0-31)*32 (10 of 16 bits), or three 4-bit fields as (0-15) + (0-15)*16 + (0-15)*256 (12 of 16 bits). Both fit into a 2-byte memory word, as opposed to a 5-byte float; the 5-byte float is 1 byte longer than 2 ints, for the same data in the 5-bit wordspace.
Strings are somewhat better, but typecasting is absent, and bitwise operations are a pain, so...
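
The multiply-and-add packing described above can be sketched in a few lines. This is Python for illustration only (the project itself is C64 BASIC), and the helper names `pack_fields` and `unpack_fields` are made up here:

```python
def pack_fields(values, base):
    """Pack small fields into one integer: v0 + v1*base + v2*base^2 ..."""
    word = 0
    for v in reversed(values):
        assert 0 <= v < base, "field out of range for this base"
        word = word * base + v
    return word


def unpack_fields(word, base, count):
    """Recover the fields with repeated divide-and-remainder."""
    out = []
    for _ in range(count):
        word, v = divmod(word, base)
        out.append(v)
    return out


two_5bit = pack_fields([31, 31], 32)       # largest two-field value
three_4bit = pack_fields([1, 2, 3], 16)    # 1 + 2*16 + 3*256
print(two_5bit, three_4bit, unpack_fields(three_4bit, 16, 3))
```

Both results stay below 32768, so they would fit in a signed 16-bit integer without touching the sign bit.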

6502/6510 stuff. Ugh.
 
For now, I'm using this scheme:

Code:
DATA 29A88CA8RCS  G-IF5REGINA

00-01: 29 = subsector hex location (0310).
02-05: A88C = Starport, Atm, Pop, TL.
06-07: A8 = Bases and Importance.
08-12: RCS = Remarks (Rich, Capital, Satellite).
13-15: G-I = GG present but no Belt; no travel advisory; Imperium
16-17: F5 = Primary star.
18+: Name.

The entire set of data for one system is encoded in one string, averaging 25 bytes in length, with an average of 3.25 blocks per subsector.
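
For anyone who wants to poke at the string layout off the C64, here is a rough Python sketch that slices a record at the offsets listed above. The function name `parse_record` and the dictionary keys are invented for illustration; only the offsets come from the post.

```python
def parse_record(rec: str) -> dict:
    """Split one fixed-width world record into its fields.

    Offsets follow the 00-01 / 02-05 / ... layout listed above.
    The key names are illustrative, not part of the original scheme.
    """
    return {
        "hex": rec[0:2],                # subsector hex location code
        "uwp": rec[2:6],                # Starport, Atm, Pop, TL
        "bases_imp": rec[6:8],          # Bases and Importance
        "remarks": rec[8:13].rstrip(),  # space-padded remarks
        "flags": rec[13:16],            # GG/Belt, travel advisory, allegiance
        "primary": rec[16:18],          # primary star
        "name": rec[18:],               # name runs to the end of the string
    }


world = parse_record("29A88CA8RCS  G-IF5REGINA")
print(world["name"], world["uwp"], world["remarks"])
```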
 
(One very nice feature of C64List is I can inline assembly with BASIC, label its entry points, and it will shoehorn it into memory right after the BASIC program.)

I use C000+ for sprite data... and I reserve the tape buffer for position data.

But you've just given me an idea: I could use C000-CFFF to store a compressed subsector. That's what you were hinting at, weren't you?

Close - it's a good spot for an UNcompressed but trimmed subsector, as machine-code-accessed uInt8s and sInt8s...
80 hexes, ~20 bytes each... 1600 bytes... You can actually put 2 of them in that memory block, with some spare (because that's 3.2 KB of the 4 KB).

We don't need a uInt16 for the hex - but we do need at least 12 bits to store it as a natural number, or 9 bits if it is stored as XXXXXXXWWWWHHHHH as part of a uInt16; so we can add in some bitwise data...

For a whole sector index, with 2x uInt16's...
World Symbol Types: 4... need 2 bits.
Belt
Dry
Wet
No world.

So...XXXXXSSWWWWHHHHH
let's see what we can cram in for drawing.
2 bits for Trade zone (Green, Yellow, Red, unrated)
so... XXXTTSSWWWWHHHHH
1 bit for GG present
1 bit for Navy
1 bit for sCout
CNGTTSSWWWWHHHHH
and then 16 for the memory address of the actual data for the hex.

Enough for a quadrant, but not a whole sector. Load by the quarter sector...
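
A quick Python sketch of the CNGTTSSWWWWHHHHH index word, reading the letters as most significant bit first. The numeric codes for symbol and zone (0-3 in the order they are listed above) are an assumption, as are the function names.

```python
def pack_index(scout, navy, gg, zone, symbol, w, h):
    """Pack one hex into the 16-bit layout CNGTTSSWWWWHHHHH (MSB first).

    zone and symbol are 2-bit codes; numbering them 0-3 in the order
    listed in the post is an assumption of this sketch.
    """
    assert 0 <= w < 16 and 0 <= h < 32
    assert 0 <= zone < 4 and 0 <= symbol < 4
    return (
        (int(scout) << 15) | (int(navy) << 14) | (int(gg) << 13)
        | (zone << 11) | (symbol << 9) | (w << 5) | h
    )


def unpack_index(word):
    """Inverse of pack_index: shift each field down, then mask it."""
    return {
        "scout": bool((word >> 15) & 1),
        "navy": bool((word >> 14) & 1),
        "gg": bool((word >> 13) & 1),
        "zone": (word >> 11) & 3,
        "symbol": (word >> 9) & 3,
        "w": (word >> 5) & 15,
        "h": word & 31,
    }


word = pack_index(True, False, True, zone=1, symbol=2, w=3, h=10)
print(hex(word), unpack_index(word))
```

With 4 bits of W and 5 bits of H the word addresses 16 x 32 positions, which is why it covers a quadrant but not a 32 x 40 sector.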
 
What's Our Character Inventory?

Mashing data into shorter bitlengths is a common thing. And the UWP doesn't use a large character inventory. Excluding the braces, what do we see?

A-Z and Space
0-9
The dash (optional)

That's it, really. 38 characters.
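
Taking the 38-symbol inventory at face value, each character needs 6 bits (32 < 38 <= 64), so four characters fit in three bytes. A minimal Python sketch of that idea follows; the symbol ordering in `ALPHABET` and the function names are assumptions made for illustration.

```python
# 38 symbols: space, A-Z, 0-9, dash. The ordering is an arbitrary choice.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-"


def pack6(name: str) -> bytes:
    """Pack a name at 6 bits per character, zero-padding the last byte."""
    bits, nbits = 0, 0
    out = bytearray()
    for ch in name:
        bits = (bits << 6) | ALPHABET.index(ch)
        nbits += 6
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
        bits &= (1 << nbits) - 1          # keep only the unwritten bits
    if nbits:
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack6(data: bytes, length: int) -> str:
    """Unpack `length` characters; the length must be stored separately."""
    bits = int.from_bytes(data, "big")
    total = len(data) * 8
    return "".join(
        ALPHABET[(bits >> (total - 6 * (i + 1))) & 0x3F]
        for i in range(length)
    )


packed = pack6("REGINA")
print(len(packed), unpack6(packed, 6))
```

Six characters shrink from 6 bytes to 5 here; any character outside the inventory makes `ALPHABET.index` raise, which is exactly the limitation of a small inventory.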
- ' . are not optional, e.g. "St. Sebastien", "Tinea-Fabre".

Capitalisation may be non-obvious, so both common and capital letters are probably required, e.g. "McBain" vs "Mccandliss", "T'vira" vs "L'Engle".

Are you sure only a-z are required? Most languages are not as letter-challenged as English, and there are certainly worlds named in other languages. What do you do with, say, 'ö', 'ø', 'ü', 'ç'? At a quick search I could find the non-canon polities of "Dienbach Grüpen" and "Dienbach Grÿpen" and the worlds of "Ferré" and "Chuño" in the TravellerMap.

! seems to be used as a letter, e.g. "!!rrarii".

# seems to be used, e.g. "Th'X #1138" (non-canon?)
 
Thinking some more... looking at T5•10, Bk2 pdf page 176...

Needed codes: Ag As Ba De Fl Hi Ic In Lo Na Ni Po Ri Va

Needed data for generating trade data: SP, Pop, TL, Importance (for mail routes...), hex location, presence of GG, presence of Water

Add orbit number, size, atmosphere, and star type(s) for dealing with time in/out. ON is up to 20, so 5 b. Stellar type is –OBAFGKML, which is 9 values... so !@#$ 4 bits. Size is 0-7 (if no type, 0 is none; if type L, result is BD) for 3 b. Decimal is 4 bits (but if you used 3 b and add 1, you get 1-8... and only those counting or crosschecking will notice). So that's an optional 12 b.

14 TC into one uInt16 with 2 b to spare.
We can cram SP (–ABCDEX?) into 3 bits (ignoring spaceport codes as those are never mainworlds), size into 4 b (0-15), and TL into 5 b (0-31).

Hydro can be compressed into
None (=0) • Dry (=1-3) • Normal (4-9) • Water (A)
for a nifty short 2 bits.
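
The hydro bucketing above is small enough to show directly. In this Python sketch the bucket numbers 0-3 are an assumption (the post only names the buckets), and `hydro_class` is a made-up name.

```python
def hydro_class(hydro_digit: str) -> int:
    """Map a UWP hydrographics digit ('0'-'9', 'A') to a 2-bit bucket.

    Bucket numbering (0=None, 1=Dry, 2=Normal, 3=Water) is assumed.
    """
    value = int(hydro_digit, 16)  # 'A' -> 10
    if value == 0:
        return 0  # None
    if value <= 3:
        return 1  # Dry (1-3)
    if value <= 9:
        return 2  # Normal (4-9)
    return 3      # Water (A)


print([hydro_class(d) for d in "0123456789A"])
```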

Atmo can also be compressed... taint doesn't matter for landing. 3 b instead of 4...
0 None/Trace
1 VThin
2 Thin
3 Standard
4 Dense
5 Exotic
6 Corrosive
7 Insidious

Or even further, to 2 b:
0 None
1 unbreathable
2 Breathable
3 Hazardous

Still, I'd prefer to keep the standard 4 b entry.

LL should be encoded for "hassle-factors" - that's another 5 b. (And also for access to wild water for frontier refuelling.)

Primary needs to be set to 31, rather than 0, because 0 is for contact binary...

Let's see... word at a time...
14 trade codes + Travel Zone = 16 b
SP (3) Pop (4) Imp (4) TL (5) = 16 b
Orb (5) Hex#L (6 b 0-40) Hex#R (5b 0-32) = 16 b
Atmo (4) GG? Belt? NSMR (6) Hydro (2) Size (4) = 16 b
Star Color (4) Star size (3) Star decimal (4), Star Orbit # (5b) = 16 b
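
As a sketch of the first word in that list (14 trade codes plus Travel Zone), here is one way the bitmap could be built in Python. The bit order (first code in bit 0, zone in the top two bits) and the function names are assumptions; the code list is the "Needed codes" line above.

```python
TRADE_CODES = ["Ag", "As", "Ba", "De", "Fl", "Hi", "Ic",
               "In", "Lo", "Na", "Ni", "Po", "Ri", "Va"]


def pack_trade_word(codes, zone=0):
    """Set one bit per trade code (bits 0-13); zone goes in bits 14-15."""
    assert 0 <= zone < 4
    word = 0
    for bit, code in enumerate(TRADE_CODES):
        if code in codes:
            word |= 1 << bit
    return word | (zone << 14)


def unpack_trade_word(word):
    """Return (list of trade codes, zone) from a packed word."""
    codes = [c for bit, c in enumerate(TRADE_CODES) if word & (1 << bit)]
    return codes, word >> 14


w = pack_trade_word({"Ri", "Hi"}, zone=1)
print(w, unpack_trade_word(w))
```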
 
I concur in principle... but PETSCII doesn't support accents anyway. It also has two modes of display... one of which has no lowercase letters.

Oh, and just for fun... it doesn't quite line up with ASCII, either, in the lower 32... and it's based upon an earlier ASCII standard than most of the rest of the computing world. Plus additional control codes in the 0x80-0x9F range.

Oh, and 0xC0-0xDF? Keyboard return codes, which are displayed as 0x60-0x7F, but can be stored/used. Fun, no?
 
I concur in principle... but PETSCII doesn't support accents anyway.
E.g. 'ö' is an accented o in some languages (English), umlauted o in some languages (German), and a separate letter in some languages (Swedish). It is in general not correct to replace 'ö' with 'o', e.g. schön and schon are completely different words in German.

Internationalised strings are a pain...
 
I'm willing to get slightly lossy with the data...

Atmo can also be compressed... taint doesn't matter for landing. 3 b instead of 4...

I agree with 2 or 3 bits. My take is:

0 breathable unless Va flag is set.
1 tainted
2 exotic
3 corrosive/insidious (=hazardous)

But yours also works.

0 None
1 unbreathable
2 Breathable
3 Hazardous

LL should be encoded for "hassle-factors" - that's another 5 b. (And also for access to wild water for frontier refuelling.)

You're right, unfortunately. But I bet LL could be truncated. Divide by 2 for example.

Importance can be shortened to 3 bits: add 4 and force range in [0,7]. It works well enough.
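
Both lossy tricks are one-liners. A Python sketch follows, with made-up function names; note that after the clamp, anything at +4 or above decodes back as +3, which is the "slightly lossy" part.

```python
def encode_importance(imp: int) -> int:
    """Add 4 and force the result into [0, 7] so it fits in 3 bits."""
    return max(0, min(7, imp + 4))


def decode_importance(code: int) -> int:
    """Undo the +4 offset; clamped values cannot be recovered."""
    return code - 4


def encode_law(law: int) -> int:
    """Halve the law level; odd levels round down."""
    return law // 2


for imp in (-5, -4, 0, 3, 4, 5):
    code = encode_importance(imp)
    print(imp, "->", code, "->", decode_importance(code))
```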
 
So here's where I'm at. If I use strings, I can fit a world record into about 25 bytes on average: 17 bytes of data, and the world name for the rest.

If I use binary, I can fit the data into 8 bytes, but I need to write my own routines to read in the world name (string length + characters), and the whole record shrinks to 17 bytes on average.

The tradeoff is handling time vs memory use.
Strings faster, binary smaller.
 
Okay, I've got a handle on reading and writing binary data in Commodore BASIC. Strings are used as single-character buffers, and the operation chr$(x) is used to turn a byte value x into a string with one character in it.

Going the other way, ASC(x$) returns the value of the first character in x$.

So, that's how I'm going to store binary data. I'll need to write a custom routine that reads a fixed number of bytes for the binary data (a loop of GET# into an array perhaps), then reads the world name as a plain string (INPUT#).

Or this:
GET#CH, B0$, B1$, B2$, B3$, ... B7$ : rem 8 bytes
INPUT#CH, NA$ : rem name, eol terminated
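
To sanity-check files off the C64, the same record shape (8 data bytes, then a name ended by a carriage return) can be walked in Python. This is only a sketch: the name `read_records` is made up, and it assumes names are stored as unshifted PETSCII capitals, digits, spaces and dashes, which share their codes with ASCII.

```python
def read_records(data: bytes):
    """Yield (fields, name) pairs from 8 data bytes + CR-terminated name."""
    pos = 0
    while pos < len(data):
        fields = data[pos:pos + 8]
        # Search for the CR only after the 8 data bytes, because a data
        # byte may itself happen to have the value 13.
        end = data.index(b"\r", pos + 8)
        name = data[pos + 8:end].decode("ascii")
        yield fields, name
        pos = end + 1


# Made-up sample data, purely to exercise the reader.
sample = bytes(range(8)) + b"REGINA\r" + bytes([13] * 8) + b"EFATE\r"
for fields, name in read_records(sample):
    print(list(fields), name)
```

One Commodore BASIC wrinkle worth keeping in mind on the BASIC side: GET# hands back an empty string for a zero byte, and ASC of an empty string raises ?ILLEGAL QUANTITY ERROR, so the usual guard is ASC(B$+CHR$(0)).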

If I keep the data in SEQ files (or even a tape image), I can still have variable-length records. Even though searches are slower, the disk space savings are huge. And there's the added benefit that searching adjacent subsectors is faster when the files are smaller.
 
Feeling proper respect for Elite-C64 now?

Respect deserved.

Note that their data is procedurally generated, on demand, from a set random seed -- a far easier undertaking -- and the universe is smaller: 8 galaxies x 256 systems per galaxy = only 2,048 systems, or about 5 sectors.
 
Respect deserved.

Now, I always thought their data was generated from a set random seed -- a far easier undertaking -- except perhaps for the system names (wasn't each system named?)

Also, the universe is smaller: 8 galaxies x 256 systems per galaxy = only 2,048 systems, or about 5 sectors. A 5.25" Commodore floppy could fit 80 bytes of information for each system.

The name list was fixed in code. It's always the same for a given galaxy.
 
BITFIELD DATA

Code:
(0x00) hex-row:3 (1-8), hex-col:4 (1-10), gg:1 (0=no).
(0x01) sp:3 (X=0, E=1, D=2, C=3, B=4, A=5), TL:5 (0-31).
(0x02) atm:2 (0=breathable, 1=taint, 2=exotic, 3=corrosive), pop:4 (0-15), travel zone:2 (1=amber, 2=red).
(0x03) trade code map byte 1 : Ag As Ba De Fl Hi Ic In 
(0x04) trade code map byte 2 : Lo Na Ni Po Ri Sa Va Wa
(0x05) imp:3 (-4 to +4), orbit:5 (0-31)
(0x06) LL:4 (0-15), bases:3 (-, A, B, D, M, N, S, W), belt:1 (0=no).
(0x07) star color:3 (OBAFGKML), star size: 3 (x, I, II, III, IV, V, VI, D), star decimal: 2 (0=0, 1=2, 2=5, 3=8).

The byte data is built (by Perl) in this manner, for each world:

Code:
Byte 0 = Hex row (the left two characters as an int)
+ Hex col (the right two characters as an int), left-shifted by 3 bits
+ G? 128:0

Byte 1 = starport mapped as above
+ TL left-shifted by 3 bits

Byte 2 = atmosphere mapped as above
+ POP, left shifted 2 bits
+ zone mapped as above, left shifted 6 bits

Byte 3 = eight trade codes shifted in
Byte 4 = eight more trade codes shifted in

Byte 5 = importance+4, truncated to 0-7
+ orbit, left shifted 3 bits

Byte 6 = LL
+ bases mapped as above, left shifted 4 bits
+ B? 128:0

Byte 7 = star color mapped as above
+ star size mapped as above, left shifted 3 bits
+ star decimal mapped down to a 2-bit index, left shifted 6 bits.

World name is converted to all caps, otherwise not messed with.
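
The real builder is Perl and isn't shown, so the following Python is only a sketch of the byte recipe above, with several assumptions spelled out: row and column are taken as zero-based (a 3-bit field holds 0-7, not 1-8), atmosphere, zone, star size and star decimal arrive as already-mapped indexes, and the two trade-code bytes arrive prebuilt. The sample values at the bottom are illustrative, not a real world record.

```python
STARPORT = "XEDCBA"      # index = stored value (X=0 ... A=5)
BASES = "-ABDMNSW"       # index = stored value
STAR_COLOR = "OBAFGKML"  # index = stored value


def pack_world(row, col, gg, sp, tl, atm, pop, zone, trade1, trade2,
               imp, orbit, ll, base, belt, color, size, dec_idx) -> bytes:
    """Build the 8 data bytes following the Byte 0..Byte 7 recipe."""
    b0 = row + (col << 3) + (128 if gg else 0)   # row/col assumed 0-based
    b1 = STARPORT.index(sp) + (tl << 3)
    b2 = atm + (pop << 2) + (zone << 6)
    b5 = max(0, min(7, imp + 4)) + (orbit << 3)  # importance clamped to 0-7
    b6 = ll + (BASES.index(base) << 4) + (128 if belt else 0)
    b7 = STAR_COLOR.index(color) + (size << 3) + (dec_idx << 6)
    # bytes() raises ValueError if any field overflowed past 255.
    return bytes([b0, b1, b2, trade1, trade2, b5, b6, b7])


record = pack_world(row=2, col=9, gg=True, sp="A", tl=12, atm=0, pop=8,
                    zone=0, trade1=0, trade2=0b00010000, imp=4, orbit=3,
                    ll=9, base="A", belt=False, color="F", size=5, dec_idx=2)
print(list(record))
```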

After reading eight bytes from disk, I'll decode them.
 
THE RESULT

...is even better than I expected. 12 of the 16 subsectors in the Spinward Marches are only 2 blocks long; that is, 508 bytes or less. Subsectors H, N, and P are 3 blocks. Average: 2.25 blocks per subsector.

Now I have to write a little BASIC that decodes these files correctly.
 
Sweet.

This is such a weird project, but it's a cool one.

Gonna end up writing any of these bit-packing routines in Assembly? Does the emulator support that?
 
LOL. The whole effort is weird. Hysterical maybe. But let's just call it "retro" so I don't feel like I'm actually wasting valuable time. :rofl:

Assembly, yes, some of this will end up there. C64List allows inlining assembly, interleaved with BASIC (!), and the VICE emulator is a true emulator that actually runs on ROM dumps of the system chips. I could run one of Jim Butterfield's machine language monitors on it. VICE is a thing of beauty and danger.

I think the control loop will always be BASIC though, which relegates assembly to graphics and tight loops (like bit unpacking perhaps?). Anyway, BASIC first, then optimize.

I just want to fly around Charted Space and interact with it... on the (emulated) Commodore 64...
 
In order to extract my data, I wrote a very tidy little routine that does a shift and a bitmask, in that order. So for each field I need a shift and a bitmask.

Code:
(0x00) hex-row:3 (1-8), hex-col:4 (1-10), gg:1 (0=no).
Masks: 7, 15, 1
Divisors: 1, 8, 128

Code:
(0x01) sp:3 (X=0, E=1, D=2, C=3, B=4, A=5), TL:5 (0-31).
Masks: 7, 31
Divisors: 1, 8

Code:
(0x02) atm:2 (0=breathable, 1=taint, 2=exotic, 3=corrosive), pop:4 (0-15), travel zone:2 (1=amber, 2=red).
Masks: 3, 15, 3
Divisors: 1, 4, 64

Code:
(0x03) trade code map byte 1 : Ag As Ba De Fl Hi Ic In 
Masks: 1, 1, 1, 1, 1, 1, 1, 1
Divisors: 1, 2, 4, 8, 16, 32, 64, 128

Code:
(0x04) trade code map byte 2 : Lo Na Ni Po Ri Sa Va Wa
Masks: 1, 1, 1, 1, 1, 1, 1, 1
Divisors: 1, 2, 4, 8, 16, 32, 64, 128

Code:
(0x05) imp:3 (-4 to +4), orbit:5 (0-31)
Masks: 7, 31
Divisors: 1, 8

Code:
(0x06) bases:3 (-, A, B, D, M, N, S, W), LL:4 (0-15), belt:1 (0=no).
Masks: 7, 15, 1
Divisors: 1, 8, 128

Code:
(0x07) star color:3 (OBAFGKML), star size: 3 (x, I, II, III, IV, V, VI, D), star decimal: 2 (0=0, 1=2, 2=5, 3=8).
Masks: 7, 7, 3
Divisors: 1, 8, 64
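
The divide-then-mask step can be mirrored in Python to check the tables. Integer division by the divisor plays the role of the shift, which is roughly what a BASIC expression like INT(B/D) AND M computes. The names `extract`, `decode_byte` and `LAYOUT` are made up, and only a few of the bytes are included.

```python
def extract(byte: int, divisor: int, mask: int) -> int:
    """Shift right by integer division, then mask off the field."""
    return (byte // divisor) & mask


# (field name, divisor, mask) per byte, copied from the tables above.
LAYOUT = {
    0: [("row", 1, 7), ("col", 8, 15), ("gg", 128, 1)],
    1: [("sp", 1, 7), ("tl", 8, 31)],
    2: [("atm", 1, 3), ("pop", 4, 15), ("zone", 64, 3)],
    5: [("imp", 1, 7), ("orbit", 8, 31)],
    7: [("color", 1, 7), ("size", 8, 7), ("decimal", 64, 3)],
}


def decode_byte(index: int, value: int) -> dict:
    """Decode every field of one byte using its divisor/mask pairs."""
    return {name: extract(value, d, m) for name, d, m in LAYOUT[index]}


print(decode_byte(0, 202))  # a byte with row=2, col=9, gg=1 packed in
print(decode_byte(1, 101))  # starport code 5 (A), TL 12
```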
 
ATTACHED: a D81 image with four sectors of data on it: the Marches, Deneb, Gvurrdon, and Tuglikki. They only take a small fraction of the available disk space, and I'll want to put dozens of sectors on the disk once the program's working.

I updated the bitfields some so that I could store a sector in each file, rather than subsectors.
 

Attachments

  • Charted Space.d81.zip
    7.9 KB
OK someone talk me off the ledge.... I've started researching compression algorithms. LZ77, Arithmetic Coding, LZ4...
 