UWP data for 8-bit game

robject · Aug 12, 2021

A previous "solution" was to group worlds by subsector. After all, a subsector is relatively small in size (so fast to load), plus it usually holds the worlds you want. And it decreases the number of files down to 2,000 (there are about 2,000 subsectors in Charted Space)... very manageable.

In the worst case scenario, your ship is in the corner of a subsector, and you'd have to load three neighboring subsectors. That's still not bad.

But then the math becomes less simple. You calculate subsector based on current hex. So whenever you load data, you first have to calculate the filename -- and, probably the neighboring filenames as well. And you get to decide which other subsectors to load, if any.

robject · Aug 12, 2021

I'm liking the RAM-cache solution. Here's why.

(1) The cache IS the map. This bypasses access speed worries: access is about as fast as it will ever be.

(2) Load it as you go. This is a huge win. You scoop up a small pile of nearby worlds and put them in their place in the RAM-map.

(3) Decoupled from file storage. This is also a huge win. This allows file storage, without coupling the map logic with the filesystem. This kind of layer of indirection nearly epitomizes what Computer Science is all about.

(4) It's still a grid. Map calculations stay easy.

(5) The cached data can MOVE. Once you've entered jump, I can block-shift the data in the cache so that the Origin is the new location. Jump takes time by the game rules, so that can mask the operation.

whartung · Aug 12, 2021

The primary benefit of using (abusing) file names is that you get to sequentially scan the directory entries using "hand written" machine code, rather than your higher level C code.

However, since we're talking C over machine code, while, especially, on the 6502, machine code can certainly be faster than compiled C code, you're not looking at a dramatic difference than if you were using something like BASIC.

So, the performance boost will be marginal, me thinks, particularly if you compare it to binary searched indexes, even in C.

You still need some kind of master list of subsectors. You don't want to have 2000 files in a directory. Every time you try to load a file, the scanner will be searching through 1000 (on average) entries to find the file by filename.

Mind, there's no code to write. So that's a benefit as well.

Its really hard without the actual machine to make any guesses here, and we're certainly sinning with the whole pre-optimization thing.

The primary benefit of SIO and I2C interfaces is that they have low pin counts. 2 or 3 I think for SIO, 2 I think for I2C.

The problem is that if the clock is driven by the CPU, we'll make a swag that the inner loop takes 4 instructions at 3 cycles each to shift the data byte out the primary data pin. That 12 cycles times 8, 96 cycles per byte. On an 8MHz processor, that's 87K bytes per second.

Whereas if it's a parallel interface, you can be bouncing between 1 and 2MHz.

The serial interface speeds can be quite fast, that's why it can be helpful to have auxiliary hardware that can drive the serial lines at full speed, independent of the CPU, and then just present an 8-Bit interface to the CPU where the bytes are likely there faster than the CPU can fetch them.

But, more pins, more hardware, the antithesis of the primary drive to using serial interfaces in the first place.

87K Bytes per second is a far cry from the 1000 bytes per second of a legacy C64 floppy drive, but, it's not "fast" either. Not that there's anything you can do about it. It's not your hardware. It would have been nice if they added some stuff to the FPGA perhaps to handle this. But using the FPGA as a black box video chip is fine as well.

IF you're just loading local subsector data occasionally, then I/O won't be that biggest hurdle to overcome. As you mentioned, you likely only need to have 4 subsectors in memory at a time, at best 200 systems. But at 100 bytes per system (system meta data plus any attached game data), that's 20,000 bytes (3 banked pages) of data (!!). There went half of your working RAM.

But that's why I advocate the cache idea. Simply, you don't "store" 200 systems in working RAM. You just "read" them, all the time (save immediate local processing).

So if you draw a map, you read the list of systems, start iterating across it, if the system is on your map, you read the system, display the information, and that's it. When you're done drawing the map, you have "no" systems in working memory. The list is "gone", the systems are "gone", all you have are their vestiges displayed on the screen.

Since you're not animating, I'm betting this will be "fast enough", plus you should be able to test this on the simulator (assuming it's running at an "honest" 8Mhz), because the actual I/O will be low (everything is cached), and you'll see "real speeds" of banking and reading from cache.

You can see how you can easily have 4 full sectors of data floating around in the cache, more than enough for any real nearby operations, but no (little) impact on your working memory.

There's only 64 8K pages in the cache, so the management of the cache should have low impact on your working memory (i.e. a table 64 entries long).

By no means is this free, but it's a good use of the banked RAM and keeps your working set down for more "real" data.

You'll need a way to uniquely identify the systems, your coordinate system may be fine if it's "galactic" in scope. There are several duplicate system names. "Depot" is particularly popular, for example.

robject · Aug 12, 2021

whartung said:
IF you're just loading local subsector data occasionally, then I/O won't be that biggest hurdle to overcome. As you mentioned, you likely only need to have 4 subsectors in memory at a time, at best 200 systems. But at 100 bytes per system (system meta data plus any attached game data), that's 20,000 bytes (3 banked pages) of data (!!). There went half of your working RAM.

I'm with you. That's why the current implementation just shoves everything into banked RAM, and I use a few small routines to make sure the correct bank is selected (125 UWPs per bank).

There's room in there for more than 6,000 UWPs.

But that's why I advocate the cache idea. Simply, you don't "store" 200 systems in working RAM. You just "read" them, all the time (save immediate local processing).

[...]

Since you're not animating, I'm betting this will be "fast enough", plus you should be able to test this on the simulator (assuming it's running at an "honest" 8Mhz), because the actual I/O will be low (everything is cached), and you'll see "real speeds" of banking and reading from cache.

Sadly, it's not true to I/O speeds. The forum discusses and wonders what the true throughput will be, but whatever it is, it's slower.

But I get your drift and I think there's some takeaways.

robject · Aug 12, 2021

Well. It looks like the subsector is STILL the ideal unit of file I/O.

2016 - 2017
I was targeting the Commodore 64 at this time.

So the way I did this in 2016 is to label the subsectors along grid lines:

Code:

SS-20-20
SS-20-21
SS-20-22
...etc...

SS-20-20 was Cronor subsector. That gave me a few sectors spinward and a few sectors coreward of the Marches if I wanted to add them.

2018
I was targeting the Commodore 64.

In 2018 I tried variable length records, and encoded the world names in Z-text... neither of which were worth the headache.

2019
Still on the C64.
By 2019 I was back to subsectors, but added a bit of reference data:

Code:

SS-20-20 SPIN A

Helpful for debugging, plus now I don't need a header in the file itself.

Then the next step in late 2019 was to widen my scope a bit.

Code:

500-502 SPIN C

These files were lightly encoded and newline-delimited; readline() is easy to write, so this is a decent idea for storage.

Code:

04a646930-dba imefate
05b56789c-a  eimalell
11bac6773-9  cimyres
12c652998-7  aimmenorb
14b439598-d  aimuakye
15e676126-7  cimwhanga
16e888765-2 acimknorbes
17d893614-5  cimforboldn
18c776977-7  fnaruie
19c799663-9s cimjenghe
22a100103-dnabimpixie
23a8b3531-dsabimboughene
28c200423-7s aimhefry
29a788899-cakbimregina
34b584879-bs aimferi
...etc...

2020
In 2020 I moved to the Commander X16, which made some things significantly better:

(1) file names could be longer than 16 chars
(2) that lovely RAM bank
(3) the SD card is large enough to store all of Charted Space
(4) the SD card is large enough to allow more than 24 characters per UWP line

Last year was when I moved to the monolithic all-banked-RAM map, with 64 bytes per hex... every hex... including empty ones. Inefficient but fast running. The problem there is that I'm limited to about 2 sectors, and I'm almost positive the map load will be S L O W.

robject · Aug 12, 2021

You still need some kind of master list of subsectors. You don't want to have 2000 files in a directory. Every time you try to load a file, the scanner will be searching through 1000 (on average) entries to find the file by filename.

This is a really interesting problem.

How about a quadtree division using subdirectories? Or is that insane?

robject · Aug 12, 2021

whartung said:
However, if you go with the "use the banked memory as a disk cache" scheme, then it's better to do you own indexing, and shove everything in to one big file.

I hadn't considered a file that's bigger than banked RAM... but... the compiler supports fseek().

Huh.

So...

You're saying...

I can fopen() a megagargantuan file...

And I know I want the world data at 9930 ring/ray 62340...

And I have a hot little function that tells me that's at offset 7459324.

So I do the fseek() and read the data.

(And, fseek() can be based on the CURRENT position too, allowing jumps greater than 2^31)

Oh. Great. Kugganzir.

robject · Aug 12, 2021

Harrrrrgh!

Well fseek() isn't supported on any of the Commodore hardware targets on CC65.

That means, unless I am magically able to write the assembly language to support it, I'm going to have to fall back on multiple directories of subsector files.

whartung · Aug 12, 2021

robject said:
Harrrrrgh!

Well fseek() isn't supported on any of the Commodore hardware targets on CC65.

That means, unless I am magically able to write the assembly language to support it, I'm going to have to fall back on multiple directories of subsector files.

Well, thats lame.

It took some digging, but this is what I came up with.

First, the disk drive is an autonomous unit in the C64 architecture. The C64 itself, and the underlying KERNAL, doesn't really know squat about disk drives.

What it does know is how to send command to things, and one of those things happens to be something akin to a disk drive.

The 1541 (along with other Commodore drives) supports what's called a "relative" file.

When you open a relative file, you must specify a record size, which can not be larger than 254 bytes (I think). From there you may have more than 65535 records in the file.

That limits a single file to ~16MB (not an issue for the 1541...).

What I would hope that the CC65 libraries have is some wait to call the C64 KERNAL (as they call their internal system). If you can talk to the KERNAL, you can talk to the disk drive. If you can talk to the disk drive, you can make your own "fseek".

Perhaps one reason they don't support fseek is fseek is arbitrary, any point in the file. Relative files don't work that way, they work off of fixed record lengths.

Mind finding this out was like pulling hens teeth. Since it's not about sprite and sounds and such, nobody gives a rip about things like relative files.

The best reference I could find is a 1541 reference manual, since it talks about programming the 1541 disk drive. Since the C64, again, doesn't know anything about disk drives, none of the C64 texts really talk about them.

Heres a link to the manual I found: https://www.mocagh.org/cbm/c1541II-manual.pdf

Do not confuse relative files with block access. Block access is reading sectors on a floppy, and that's not what you want to do. How that would be supported under FAT32 is anyones guess. But feel free to ask the Commander x16 folks.

Mind, you still need to write your own disk cache. The C64 won't do it because...it doesn't know anything about disk drives! So there's no "disk driver" to hack to add a cache too.

Now, perhaps, you can install an internal, virtual "peripheral" that you can talk too via KERNAL which could act as a proxy. You call your internal device, it calls the disk drive. I don't know if you really benefit much here, honestly, over rolling your own "mega file" format and caching the blocks yourself.

I love this line from the manual above:

Some well-written relative file programs are able to find and read the record of one desired person out of a thousand in under 15 seconds, a feat no sequential file program could match.

Wow! 15 seconds! Oh boy...

Ideally you won't have that problem.

So, in summary, it's possible, but it'll be some work to be sure.

EDIT: So, yea, you shouldn't need machine language to do this, CC65 should be able to call the internal ROM routines directly, but you're going to have to hunt down some samples. This stuff is pretty arcane.

robject · Aug 13, 2021

Wayne, you really did your homework!

I did some file I/O on the C64 back in the day... but not much. Yes, I know about REL files: it's essentially a poor man's database, kinda.

There is another way on IEC devices, and that's to read raw sectors straight off the disk. Compute the offset, read in a 256 byte block.

But thankfully, we don't have to care about the 1541, nor the IEC serial bus protocol, nor the sad loading times due to the serial bit-banging thereof, nor the small disk sizes, etc. Suffice to say IEC support is for REAL navel-gazers.

The SD card is not on the IEC bus. It really is the best choice.

* * *

I think the math is not hard. If Regina is 62739 Ring / Ray 9930, then its subsector index is computable. One possible solution:

Code:

int ssy = ring / 10   ; 6273
int ssx = ray / 8     ; 1241
filename is something like sprintf( "%04x-%04x SPIN C or whatever", ssy, ssx )

Meanwhile, Regina's location in the subsector is a mod.

Code:

int sly = ring % 10   ; 9
int slx = ray % 8     ; 2

int offset = (sly * 10 + slx) * record_size;

That is, 0310.

SLY is within one jump of the southern border, so we load that subsector, too.

whartung · Aug 13, 2021

robject said:
Wayne, you really did your homework!

I did some file I/O on the C64 back in the day... but not much. Yes, I know about REL files: it's essentially a poor man's database, kinda.

It's not as flexible as what MS-BASIC does on other systems, but its workable. They jumped through a bunch of hoops to not send a 4 byte offset.

There is another way on IEC devices, and that's to read raw sectors straight off the disk. Compute the offset, read in a 256 byte block.

Yea, but that skips over the file system, which isn't really a good idea for the SD card.

But thankfully, we don't have to care about the 1541, nor the IEC serial bus protocol, nor the sad loading times due to the serial bit-banging thereof, nor the small disk sizes, etc. Suffice to say IEC support is for REAL navel-gazers.

The SD card is not on the IEC bus. It really is the best choice.

That's fine, but how is it exposed to the developer? I didn't see anything obvious in the Commander X16 documentation. I assumed that the SD card was essentially "compatible" with how the 1541 worked, just pointing to a different device (say, 9 instead of 8), but still just sending it strings of commands.

I would like to hope that you might be able to, indeed, make an fseek style call on to a file on the SD card.

I honestly find the C64 quite foreign. I'm much more used to the Atari, but it had a better device abstraction layer than the C64 has. I think the C64 core is based upon the earlier Commodore computers that interfaced with the IEEE-488 bus, which is essentially a serial protocol. In contrast to something like a CPU bus, which gives you address lines and data lines and control signals. The -488 is quite simple and lets you send message packets to the individual devices on the bus. They converted it to a serial bus for the C64, but semantically its quite similar.

Code:
I think the math is not hard. If Regina is 62739 Ring / Ray 9930, then its subsector index is computable. One possible solution:

Code:

int ssy = ring / 10 ; 6273 int ssx = ray / 8 ; 1241 filename is something like sprintf( "%04x-%04x SPIN C or whatever", ssy, ssx )

Meanwhile, Regina's location in the subsector is a mod.

Code:

int sly = ring % 10 ; 9 int slx = ray % 8 ; 2 int offset = (sly * 10 + slx) * record_size;

That is, 0310.

SLY is within one jump of the southern border, so we load that subsector, too.

This is certainly the right track.

It's too bad you don't have real hardware, to see what the I/O is really like. I would just assume it'll be "fast enough" until you get your hands on it, and then you can address it again then.

I played enough Wizardry back in the day to just embrace the disk light every time I hit the keyboard.

robject · Aug 13, 2021

So yes, the KERNAL calls are patched to work with the SD card; however, it's patched, so we're not on the old IEEE 488 / IEC serial port.

I've just been informed that the CBM DOS "seek" function is available for use, so even though there's no fseek(), I could wrap one around DOS. And then have a truly gigantic single file solution.

More later.

***

Which Ataris? The old 800, or the later ST series? I hear the STs were amazing.

Nathan Brazil · Aug 14, 2021

More about the ring/ray:

If you are making this work just around Spinward Marches, this should not be a factor.
But if you are designing it across all Imperial Sectors, you must address crossing Ray 0. Ray 0 corresponds to Hex Column 01 of Core (Sector 0,0) and sectors directly rimward and coreward. Computing trailing from Regina (Ray 62721), Ray 62848 corresponds to Hex Column 32 of Dagudashaag (Sector -1,0) and Sectors rimward and coreward.

I bring this up now because if file naming conventions or "reading a cache" or selecting records based on "current position" might require some sort thought on offset. If you compute needing some systems in rays higher 62848 say crossing from ray 62846 (column 30) in Vland Jump 4 to trailing Lishun (column 2), you want systems with rays up to 62848 AND Rays 0 and 1 not from rays 62846 to 62850. The same if you cross in the other direction not 1 to -3 but 1 0 then loop from 62848 and down to 62846.

Just a thought on waking up...

whartung · Aug 15, 2021

Ok, so I've written up a banked backed file cache implementation in C.

I've got it here.

https://www.dropbox.com/s/zp3dfynu3cq9bym/Cache64x.zip?dl=0

Now, that's an XCode project, but the C code is there. There's no Makefile. It should be straightforward to incorporate into anything, it's just 2 files: cache64.c and cache64.h

Obviously, I have not compiled this with CC65. I do hope that CC65 has code for 32 bit values. I assume it does, but I don't know.

I use uint8_t (unsigned byte), uint16_t (unsigned 16b int), uint32_t (unsigned 32b int) types, not exclusively, but where I feel it's important to be specific on the data sizes.

The architecture is pretty simple, and cheap. For 64 banks, the overhead is 4 bytes per bank, plus 5 for house keeping, 261 bytes.

You tie the cache to a file, and you can allocated as many banks to that file as you like. Each cache has a starting bank, and a bank count. So you could work with several files if you like, or just one big one.

First, we have the concept of a block. A block is a chunk of the master file that's the size of the bank. Each block has an id, which is simply the offset within the file divided by the block size. This lets us use 16 bit block ids. With an 8K block/bank size, and 65K different block ids, that's 512MB. "Should be big enough for you, old man. What's the cargo?"

We have a list of blocks, sorted by block_id, which tells us which blocks of the file are in the cache, and which bank they're associate with.

Then, we have an LRU (Least Recently Used) list. The top of that list is the most recently used bank, the bottom is the least.

The read routine crassly assumes that the amount of data to read is LESS THAN a block size, but it DOES span blocks, so there's no concern with alignment. It could readily be change to read data up to 16 bits in length. I just don't.

You read data, just like a normal read.

Code:

uint16_t cache64_read(cache64 *cache, uint32_t offset, uint16_t size, void *buffer);

So, if you had a structure, say star_system, and you wanted to read the 100th star_system from the file, you could do something like:

Code:

start_system my_system;
uint16_t size = cache64_read(cache, sizeof(star_system) * 100, sizeof(star_system), &my_system);

The size returns is just the size you pass in. It's not really set up to return partial results, it pretty much assumes you know where the data is, how much there is, etc. Reading past the EOF is kind of "undefined". So, "don't do that".

Given the offset to read, we discern the block_id, look it up in the block list, and if it's not there, we check if the list is full. If not, we'll just insert it into the list.

Whenever we use a block, we grab it's bank and move it to the top of the LRU list.

If the list is full, then we take the bank at the bottom of the LRU list, search block list for it, and remove the block. Right now, when we remove a block from the list, nothing happens. But if you wanted to make the cache write-through, you could do this here by dirtying the blocks and flushing them back to disk when they're about to be removed from the list.

The cache structure relies on dynamic memory, but could easily be made static.

Also, the cache structure has a "banks" array which is used to simulate the "banking" system. Obviously, the needs to be removed for the real system.

There is no error handling.

It currently relies on stdio routines: fopen, fread, fseek. Those will have to be replaced by the x16/CC65 equivalents, but should be straightforward. It assumed, again, the ability to use a 32bit offset in to an arbitrary point in the SD file, and that CC65 can handle that.

I haven't written C in decades, but this seems ok to me.

robject · Aug 16, 2021

Nathan Brazil said:
More about the ring/ray:

...But if you are designing it across all Imperial Sectors, you must address crossing Ray 0...

You're right, of course. I haven't gotten that far.

I can think of three ways to do it though:

(1) Always modulo the number of rays, i.e. 642836 or whatever.
(2) Round up and modulo 2^16.
(3) Translate the map, so that Charted Space is in "the middle", and then I can be lazy and never modulo, assuming we'll never be on the other side of the galaxy.

I'll probably do #3.

aramis · Aug 17, 2021

Remember: signed int_16 =[-2^15,...,(2^15-1)]. And, canonically, 0 is Core.

UWP data for 8-bit game

robject

SOC-14 10K

robject

SOC-14 10K

whartung

SOC-14 5K

robject

SOC-14 10K

robject

SOC-14 10K

robject

SOC-14 10K

robject

SOC-14 10K

robject

SOC-14 10K

whartung

SOC-14 5K

robject

SOC-14 10K

whartung

SOC-14 5K

robject

SOC-14 10K

Nathan Brazil

SOC-14 1K

whartung

SOC-14 5K

robject

SOC-14 10K

aramis

Administrator

Similar threads