• Welcome to the new COTI server. We've moved the Citizens to a new server. Please let us know in the COTI Website issue forum if you find any problems.

UWP data for 8-bit game

robject

SOC-14 10K
Admin Award
Marquis
So I've got this trader game running on an 8-bit architecture.

And I want to store Charted Space on disk. The file system is FAT32.

How would you do it?

* * *

At least theoretically, I could store each steenking UWP as its own file. All 80,000 of them (appx).

I mean, that's the simplest solution, right?
  • 100 directories to break up the fileset based on the first two digits of the ring/ray coords.
  • Each directory would have on the order of 800 UWPs.
  • The filename has the ring/ray coords of a single UWP.
  • Finding its neighbors is a snap. It's some math and then some directory searching.
  • I can use much of the UWP as-is.
  • Reading in a dozen tiny files is fast.
  • Ideal file size is just under 256 bytes -- plenty of room for UWP plus metadata.

For example, Regina could be stored in a file named 53/281-5099 (or whatever).
Its neighbors would be 53/280-5099, 53/282-5099, 53/281-5098, 53/281-5100. And so on. Very easy to use coordinates to determine filenames.

(I'm not consulting the map so my numbers are off, but you get the drift.)

File Content

A UWP often takes up 135 bytes. It's parsable, but it can be improved. Also, there's room for another 100 bytes.

Here's what the player might want to know:

  • The sector: SPIN
  • The hex: 1910
  • World name: Regina
  • UWP: A788899-C
  • Trade remarks: Ri Pa Ph An Cp (Amindii)2 Varg0 Asla0 Sa
  • Importance: 4
  • Nobles: BcCeF
  • Bases: NS
  • Zone: -
  • PBG: 703
  • Worlds in the system: 8
  • Allegiance: ImDd
  • Stars: F7 V BD M3 V

Pad it out to simplify parsing.

Code:
SPIN 1910 REGINA              A788899-C NS RI AN CP SA       AMI VAR ASL 4BCF-703 8 IMDD F7 V BD   M3 V

Lots of room left.
 
Last edited:
Curious what 8-bit system you're using that uses a FAT32 file system.

Remember, the key criteria to designing any persistence format is: "What kind of questions will I be asking of this data?" instead of "what data do I have to store".

Two completely different points of view.

Contrived case, you have 1M patient records with demographic data, current conditions, medical history. A HUGE amount of data.

But if all you want is folks names, gender, and addresses, you can toss out all of that medical history. Not only do you not have to figure out how to represent, you don't have to store it at all.

How the "master" data is stored is moot, since the primary role for it is "process this information for more efficient querying for XXX problem".

You don't have to make all of the decisions up front.

So.

I don't know what a "ring/ray" coord is. You're not talking hex numbers (which duplicate across sectors). You're also not mentioning sectors (or subsectors) at all.

How does your program use this information? Apparently it's going to query by "ring/ray" coordinate, using the directory entries as a simple index to walk the tree to 800 systems level.

If that works for you, then the format is decided.

Given that, however, it is straightforward to create simple indexes of your planets.

For example you can create one by system name.

Simply create an entry with the system name (say, 20 characters, truncated or padded with spaces), plus the coordinate (ring/ray?) to the actual system (however long that is, but make it fixed length).

Then, simply sort it by name. Since all of the rows are the same length (including a line separator if you care), a simple binary search will locate anything you want very quickly. Being sorted by name, you can walk forwards and backwards by name. it's not a B-Tree, but it's close enough since the data is static.

As long as you can get the length of the file, the search is easy to write, even in something like BASIC. It's not even recursive.

So, don't underestimate the value of having extra indexes against your raw data set even if it's stored as directories.
 
Are you doing a new Galactic? What OS are you using

I wrote a db to store travellermap into Microsoft Access back in 2011-2015. Don McKinney and Mr. Miller hit me up for queries and stuff and I used it to T5 the (DGP) Rebellion and 1248 data for travellermap. It also made some of the initial population counts in the wiki. I no longer have it as I lost the drive it was on and its backup. Also been lazy to rewrite due to my vision issues. I can lend you my observations what I found in creating its final form.

If the program is doing speculation trading and you are able to examine each buy/sell combination then 80,000 files might not be great. Consider:
Too many files in one directory may cause a slow down due to the FAT trying to access so much at once. In addition fragmentation may slow the system down. Finally, in older Access you have file locking and record locking limits. Since you are mentioning 8-bit games, in the old days of DOS there were often file handling limits. I suggest each file contain a sector or subsector worth UWP to reduce that potential overhead and performance problems.

SOARN (Save only as really needed :rofl:) Depending on the ruleset (looks like you are using T5) data that can be auto-generated (like many trade codes) does not need to be stored. As, Va and others that are determined by other data fields and need not be stored. Others like Capitals (Cp) or minority sophonts (the travellermap codes) which are assigned and not necessarily auto-generated will have to be saved. At runtime generate the necessary codes as needed. Since you mention 8 bit you would have to determine the resources used to calculate the trade codes and such vs just storing it in files.

The same with Ring/Ray coordinates. I calculated them from Reference using the human readable values (Sector X=0,Y=0,Hex 0140 is 10000, 0) and wrote an algorithm to calculate it from using the three. The same with the filenames Since I was running NTFS/Win 7 (later 10) my file naming was:
Sector[+-]nnn,[+-]nnn][blank][Common Name].txt
I was not limited to 8.3 DOS naming convention. I could use the plus minus and comma as part of the filename

I wrote an ETL program to auto-download the travellermap data, import to my db and write out a renamed files

That is why I ask if this was an existing program and what 8-bit system you are using? I cannot provide better observation without knowing more.

Also if it is available, where could I buy it? It will prevent eyestrain.
 
You have already established a requirement.

80,000 UWP at 135 bytes ~ 10,800,000 bytes ~ 10800KB ~ 10.8MB

Can your 8-bit game/system handle this much data all at once or does it only load in a few records or files at a time?
 
Hi, Nathan and Whartung -- thanks for the clarifying questions.

First, file name. I think Whartung's suggestion is a nice one -- to include the world name with the filename for easy reference. I think that's doable.

Whartung said:
What kind of questions will I be asking of this data?

In fact I have this data currently packed into one 400KB map file, a combination of convenience data and bit-packing.

Well actually let me SHOW YOU what the current incarnation does.

https://www.commanderx16.com/emulator/x16emu.html?manifest=/emulator/102-traveller-trader-wip/

The map's format is an array of fixed-size 64-byte binary records. Here's the structure:

LOCATION
These internal "hex grid" bytes are used to figure out where we are and where to look for neighbors.
Note the limitations of using 8-bit locations: the map can hold, at most, 256 x 256 hexes. In reality, it's engineered to only hold 64 rows, so the actual limit is 256 columns x 64 rows of uwps.

And, 0,0 is Spinward Marches 0101. So the map can't have Gvurrdon sector.

It currently has the Marches and Deneb.

0x00 column
0x01 row

PRETTY FORMAT
This text is for display purposes. It's worth it due to low main RAM and high banked RAM.

0x02 - 0x05 sector abbreviation (e.g. "SPIN")
0x06 space
0x07-0x0a hex location in ASCII (e.g. "1910")
0x0b space
0x0c-0x1a mainworld name in ASCII (e.g. "Regina ")
0x1b space
0x1c starport character
0x1d-0x22 SAHPGL
0x23 dash
0x24 TL (ASCII)
0x25 space
0x26 zone (ASCII)
0x27 allegiance (one ASCII char only)
0x28 zero byte

COMPRESSED DATA
This is for computational work and for less-common data. The base code is used in the subsector display, but not in the textual representation, which is almost too long already.

0x29 bases char (one ASCII char only)
0x2a Belts + GGs char (one ASCII char only)
0x2b-0x2c trade code flags (16 bitfields)
0x2d Siz + Atm (4 bits each)
0x2e Hyd + Pop (4 bits each)
0x2f Gov + Law (4 bits each)
0x30 TL (5 bits)
0x31-0x39 nine zeros ("reserved", probably for market info)
0x3a-0x3f compressed stellar data


As you can see, there is convenience string data here, as well as compressed data.
 
Last edited:
So the X16 is a retro machine using the WD65C02, built on the VIC-20's architecture, with an enhanced Commodore KERNAL, DOS wedge, and built-in monitor. It has an 8K RAM bank window at 0xa000, and a 16K ROM bank window at 0xc000.

The standard banked RAM size is 512K. Socketed DIPs allow expansion to 2MB.

But, the system only addresses 64K at a time, and the system RAM is therefore appx 39K.

So: small programs with large datasets are OK.

CC65 has a compile target and library support for it. So I write in C, because that's what I'm most comfortable in.
 
The file system, meanwhile, is on an SD card formatted FAT32. So I expect to use a card with "lots" of space.

That means directories can have up to 65K file entries at 8.3, or fewer with longer filenames.

I'm just assuming they're going to be longer, so I expect to be able to fit 8,000 files per directory.

Assuming I have 100 directories, and Charted Space has 80,000 UWPs, then that suggests 800 UWPs per directory.


Say a mainworld name is 15 characters long, and the ring/ray is 5 + separator + 4 = 10 characters. That's 25 characters... and I will probably truncate for the sake of FAT32's fillename quirks. Still plenty of space for several thousand UWPs in a subdirectory. If I stick with 800, then I "should be" fine.
 
Ring/Ray is an Imperiocentric "galactic" notation. When used responsibly, it provides a perfectly fine hex-grid notation for the galactic band of space roughly 9600 to 10,400 parsecs away from the galactic centerpoint.

Good enough for now.

Reference/Core is 10000 ring/ray 0.
Regina/Regina is 9930 ring/ray 62723.
 
So the binary structure is useful; however, now I'm reading files into char arrays instead of pre-loading a map directly into banked RAM. I'm backtracking to a character-based format.

LOCATION
Location is in the filename itself. This is convenient.

DISPLAY
I can theoretically take up to 80 characters...

0x00-0x04. Sector abbr + space.
0x05-0x09. Hex num + space.
0x0a-0x1d. Mainworld name + space.
0x1e-0x27. UWP + space.
0x28-0x29. Bases chars.
0x2a-0x2b. Zone + space.
0x2c-0x2e. Belts + GGs + space.
0x2f-0x32. Allegiance.
0x33-0x34. # worlds + space.

REMARK FLAGS
A compression without actual compression is just a pile of ASCII 0s and 1s.
0x35-0x4f. Remark flags.

SOPHONT FLAGS
Less elegant is a pile of sophont flags.
0x50-0x5f. Sophont flags.

STELLAR DATA
This is slightly compressed into 3 entries of 3 characters each.
0x60-0x68. Stellar data.
 
Well looks like a clever little machine.

It's hard to imagine how this will perform. On the one hand you have a fast ("fast") I/O bus (you're not slaved to that awful C64 serial bus). On the other, it's 8Mhz. So, that's important also.

But you also have 512K of RAM banked in 8K chunks.

So that's important too.

Finally, the bulk of your data is static, so it's "cheap" to jump through a bunch of hoops to pull things off in pre-processing.

You still haven't really said how you want to query the data.

Were it me, I would put the meat of the program in the 40K of working RAM.

If anyone wants to browse every sector, they're in for a world of hurt. Those are expensive operations and a lot of data.

I would use structures on disk that can simply be read() in to memory without any marshalling. In your case, disk is cheap, RAM is OK, but CPU is limited. Don't spend operational CPU on converting data. Spend all of that up front pre-processing the data. Simply I wouldn't bother to compress the disk data if you intend to uncompress it after you load it. Theres no real point, you're saving the wrong things here.

Then I'd just use the banked 512K of RAM as a monster disk cache.

Sorted, binary searched data works well in contrast to an actual index, since you only need to process the actual data your interested in rather than pages of indexed nodes. The disk cache kind of handles the rest automatically.

Again if someone wants to browse every sector, thats a cache smashing operation. Tough luck. But if the bulk of your work is "local" then it should all fit in the disk cache, so it's less of an issue constantly "reading" it over and over.

You could put more operational data in the banked memory, but it's going to be a bit of a pain to manage in a general way and lots of hoop jumping compared to something more monolithic like a disk cache. You can't just follow pointers, for example.

If you use the disk cache, it may be worth your while to just store everything in a large file, indexed how you like than a bunch of little files. That way your cache can work as you bounce around the big file just off of block offsets for any data. No need to key in file awareness.

Or you can find a disk driver that leverages the RAM as cache and ignore it completely, just assume the cache is in place, read and write whatever files however you like hoping the cache will save your bacon. If you write it yourself, perhaps make it a point that you don't have any structures that cross the 8K boundary of the page RAM. That makes it a bit easier, locate the page, copy the structure. No page hopping shenanigans.

This is in stark contrast to using it as a RAM Disk, as a RAM Disk has to be populated. You don't want to do that. Better as a pull through cache, however it's done.
 
Currently, I just paw through the banked RAM to build up the subsector view. It's efficient.

Were it me, I would put the meat of the program in the 40K of working RAM.

Yep.

If anyone wants to browse every sector, they're in for a world of hurt. Those are expensive operations and a lot of data.

Yep -- not gonna do that. I pull neighboring data only. That sort of math and memory reads is quick.

There's travellermap for people who want it.

I would use structures on disk that can simply be read() in to memory without any marshalling. In your case, disk is cheap, RAM is OK, but CPU is limited. Don't spend operational CPU on converting data. Spend all of that up front pre-processing the data.

Yep! Good!

Then I'd just use the banked 512K of RAM as a monster disk cache.

True dat. The annoying thing is that I/O is not going to be as fast as the emulator. And.... actually I'm not quite sure how fast I/O's going to be. Probably "fast enough", whatever that means.

You could put more operational data in the banked memory, but it's going to be a bit of a pain to manage in a general way and lots of hoop jumping compared to something more monolithic like a disk cache. You can't just follow pointers, for example.

I've actually done this in another program. And yeah, it's a little hairy, but all that banked RAM is really tempting.
 
And... in line with the "don't work more than you have to" angle:

I think the file read is byte-based, not necessarily character-based. So, I could read directly into a structure. So I can have bitfields and binary data where appropriate.
 
True dat. The annoying thing is that I/O is not going to be as fast as the emulator. And.... actually I'm not quite sure how fast I/O's going to be. Probably "fast enough", whatever that means.

With the flash drive, it should be "CPU Bound". Seek is not an issue, it's all about block moves from the flash drive.

That said, do you know if they're using SIO for the drive? And, if so, is there a peripheral chip driving that as well (or the FPGA)? Bit banged SIO is kind of awful on legacy CPUs. It's just very cheap to implement. But if they're using a intervening controller chip that can drive the SIO at speed faster than the CPU and bus, then you won't notice it.

And... in line with the "don't work more than you have to" angle:

I think the file read is byte-based, not necessarily character-based. So, I could read directly into a structure. So I can have bitfields and binary data where appropriate.

Yes, indeed. You write conversion programs to convert the data in to you internal memory structure and write it out. There's no reason to make the data "human readable" or anything else. Nobody cares that the data isn't portable (the primary complaint about using this technique for serialization). Just make sure any pointers written are zeroed out and properly reloaded if necessary.
 
Neato :coffeesip:
*Nathan takes off the grognard hat.*
*Nathan puts on the programmer hat.*

Now I see what you are up to and potential expansion. It seems you want access to the universe the OTU worlds, but the will only access at most a grid of 256 x 64 UWP at time, yes?
Q.Do you plan on altering UWP data files on the fly in within the app or outside the app or not at at all? For example you might wish to tweak the data based on your ship's actions, say storing you ship's reputation for that world.

A. Umm Keep it with the ship?

A.1. If you don't plan on alterations of the UWP, there is no need to keep the data in the files human readable. Compress those records as much as possible. While the filenames and data being human readable is neat, it is functionally not needed once you are no longer debugging This also allows you to further "compress" the filename at the same.
Example: Going back to 8.3 the naming could be [XXXX][YYYY].dat where XXXX and YYYY represent ring/ray in Hex format so your "Universe Size" 164 by 164 hexes or about 1638.4 by 2048 Sectors

Regina/Regina is 9930 ring/ray 62723 is 26CA F503 in hex so the Filename might be 26CAF503.DAT
Reference/Core is 10000 ring/ray 0 is 2710 0000 in hex so the Filename might be 27100000.DAT

This of course makes it useless to view directly, but then once you are set up importing your data, why do you need to go into the files directly ever again? Are you not going to view the data via the program anyways?
The potential problems are Can you compute those values in your 8 bits and how to structure the directories?

Possible Solution: One More (just one more) file the "what directory do I find xxxxyyyy.dat " file. You mentioned how you have a practical limitation of loading 256 columns x 64 rows of uwps (records). Create a file named IDXDIR.dat or some such.

Each Record has five columns of data
X1X1 X2X2 Y1Y1 Y2Y2 [DIR IN 8]

X1X1 Low Ring Range (hex)
X2X2 High Ring Range (hex)
Y1Y1 Low Ray Range (hex)
Y2Y2 High Ring Range (hex)
[DIR IN 8] = directory aka Sector in 8 Characters or less
So Core Sector (Containing Reference/Core 0140) would read as:
26E9 2710 0000 001F CORE

This way you could store your files by Sector. Some Directories will be larger than others because they contain more worlds but you seem to be ok with 800 files in one directory. IIRC no single sector has more than 500 or so mainworlds anyway.

Hopefully you know your current centerpoint. Write code to look in directories found in IDXDIR.dat where rings + or minus 127 of my current ring and +-32 of my current ray. Write some offset if your range crosses spinward or trailing of 0 ray .
A 256 x 64 grid means you will make the search of 6x2 (12) sectors/directories at a time and encounter at most 6000 filenames (12 x 500 based on my estimation). Most likely less though.
 
Last edited:
Well looks like a clever little machine.

I would use structures on disk that can simply be read() in to memory without any marshalling. In your case, disk is cheap, RAM is OK, but CPU is limited. Don't spend operational CPU on converting data. Spend all of that up front pre-processing the data. Simply I wouldn't bother to compress the disk data if you intend to uncompress it after you load it. Theres no real point, you're saving the wrong things here.

OOPS:rofl:

I started my previous response before your post. Forget compressing the records/files.

Still whartung, what do you think of the filename/directory search though?
 
OOPS:rofl:

I started my previous response before your post. Forget compressing the records/files.

Still whartung, what do you think of the filename/directory search though?

Well, the singular advantage of it is that it's a "free" index in to the data.

You can certainly duplicate data and make multiple indexes if you like (for example, you can do a "name" index by make directories out of the first couple of letters).

But, in the end, the file system is going to sequentially scan the directory (which is why you'd like to not stuff 100,000 files in a single directory, but break them up).

However, if you go with the "use the banked memory as a disk cache" scheme, then it's better to do you own indexing, and shove everything in to one big file. Now, if you can get a "disk driver" that uses the banks as a cache for arbitrary files and directories, then, that's a bonus, you get both for free ("free index + caching"). But you have the overhead of opening and closing files (not completely arduous, but not free to be sure).

Human readability is a non-requirement. Nobody is going to look at the files, that's what the program is for.

The "everything in a big file" is not trivial, but it's not really difficult. You ARE writing a "mini-filesystem", but it's very constrained to your use case, and you don't have to write to it. (For heavens sake, don't write to the big file!)

Binary Search indexes are easy and cheap. With some consideration for the 8K ram pages, I think the caching would be straightforward with some expense in the creating of the master file (but you have all day to do that).
 
This is getting me imaginative again. Im going back to travellermap to see if I can (re)create an/my ETL. I see Joshua has made api improvements. lots of them...
 
This is groovy, guys, thank you for humoring me.

CPU I/O Bit-Banging

So the X16 does not have special I/O handling. It's going to bit-bang, even with the SD card. The FPGA is there because at the time there were no video chip options. But in choosing it, they also got a PSG and the SD I/O, so it's a threefer.

Ah -- there is an I2C chip since prototype #2 (https://www.youtube.com/watch?v=T5vjnBYGp2w). But, I'm not up on its speed.

I think you're right when you say CPU is the bottleneck, and amusing to say, the CPU is what made the project get anywhere at all in the first place, so I can't grouse about not using a 65C816.

Going Nuts With the Filename

Something else I can do, is abuse the filename. Since it can be quite long, I can actually encode the UWP display line as part of the filename. So for searches, I can apply a search mask, but I can also use the filename to hold more information.

It's not free -- a 24 character filename takes up (I believe) 44 bytes -- but it saves having to change directories and do file reads.

Consider this use case: I'm calculating nearby worlds and showing a local map. So I do a directory listing...

...and by definition I already have to filter the filenames based on their location. So I'm already parsing filenames.

Might as well shove basic "visible" data on that filename.

Code:
RR-RRRR-HH-BBZG-SSAHPGL-T-AAAA-NNNNNNNNNNNNNNN

RR-RRRR: Ring/Ray data (6 chars)
HH: hex number, encoded (2 chars)
BB: base presence (2 chars)
Z: zone (1 char)
G: GG presence (1 char)
SSAHPGL-T: UWP (9 chars)
AAAA: Allegiance (4 chars)
N*: world name (say 12 chars average)

Maybe 42 chars average, so 53 bytes per entry. Thus a subdirectory can hold around 1200 entries. Doable.
 
Back
Top