JSON Sector Data Service

robject · Jun 12, 2014

I want to discuss a sector data model with Thalassogen, Draconian, and Joshua Bell.

I am updating my website, and I find that my server-side code is inexorably, slowly moving away from HTML generation and instead serving JSON data, which my webpages then render however they like.

So I am thinking about "how would I do this in a way that is convenient for me BUT also useful for others"?

My current format is verbose and detailed. There are many ways to organize this data; this is just one. I'm wondering if y'all would use it if it were in a different (better?) structure:

http://eaglestone.pocketempires.com/...pl?asJSON=true

The reason I am thinking about this is because my strengths are in server-side data crunching and production, but not persistence, and not presentation. Presumably, others are better at presentation (e.g. Travellermap!!!!) and other services I am currently forgetting. So, I would want others to be able to consume my data. Some level of interoperability seems therefore to be at least a good idea to think about.

I know we've been over this before. But talking about it is therapeutic to me.

Code:

{
  'meta': {
    'generated': 'Thu Jun 12 15:10:45 2014',
    'prng': 'burtle',
    'density': 'Standard',
    'sectorUID': 'mw-orion-4991',
    'hash': 'MD5',
    'tlCap': 21,
    'systems': 619,
    'civilization': 'Civilized'
  },

  '2632': {
    'population': 0,
    'importance extension': {
      'importance': -1,
      'expected ship traffic': 0
    },
    '_stellar': 'A6 II 10:A3 V G1 V',
    '_trade codes': 'Wa Co Sa',
    'pop mult': 0,
    'primary': {
      'spectral': 'A',
      'position': 'primary',
      '_uwp': '  A6 II',
      'decimal': 6,
      'spectral flux': 2,
      'size': 'II',
      'size flux': 3
    },
    'near companion': {
      'spectral': 'G',
      'orbit': '',
      'decimal': 1,
      'size': 'V'
    },
    '_uwp': '2632 Angindsan       B89A002-0    Wa Co Sa                003',
    'law level': 2,
    'starport': 'B',
    'hz variance': '+1',
    'size': 8,
    'hz': 9,
    '_extensions': '{-1} (C04-3) [1151]',
    'orbit': 4,
    'atmosphere': 9,
    'cultural extension': {
      'acceptance': 1,
      'symbols': 1,
      'strangeness': 5,
      'homogeneity': 1
    },
    'gas giants': 3,
    'mainworld type': 'Far Satellite',
    'government': 0,
    'belt data': [],
    '_bases': '  ',
    'economic extension': {
      'resources': 12,
      'efficiency': -3,
      'infrastructure': 4,
      'labor': 0,
      'ru': -144
    },
    'tech level': 0,
    'planetoid belts': 0,
    'tas': '',
    'name': 'Angindsan',
    'gas giant data': [
      {
        'number': 1,
        'orbit': '14.21',
        'satellites': 3,
        '_uwp': 'YLGG000-0:3 ',
        'size': 27,
        'code': 'LGG',
        'ringed': ''
      },
      {
        'number': 2,
        'orbit': '11.71',
        'satellites': 5,
        '_uwp': 'YLGG000-0:5 ',
        'size': 28,
        'code': 'LGG',
        'ringed': ''
      },
      {
        'number': 3,
        'orbit': '12.81',
        'satellites': 3,
        '_uwp': 'YLGG000-0:3 ',
        'size': 26,
        'code': 'LGG',
        'ringed': ''
      }
    ],
    'near star': {
      'spectral': 'A',
      'orbit': 10,
      '_uwp': '10:A3 V    hz7',
      'decimal': 3,
      'size': 'V',
      'hz': 7
    },
    'hex': '2632',
    'hydrographics': 10
  },

  ...(and so on)...

}

A sector using my verbose format above takes up a bit over 1 megabyte, and is a bit over 55,000 lines long.

That's a FAR CRY from a 30k SEC file.

Draconian · Jun 13, 2014

I'm not a computer guy, so that may be over my head.

I've generated a few sectors on Thalassogen's SectorMaker utility and they are datafiles of about 1 MB too. T5 generates mainworlds and an average of about 9 more secondary worlds per stellar-occupied hex. So the memory size may be well-nigh unavoidable. The generation is subject to any future T5 WorldMaker (?) errata (if you add the correction, they are corrigenda).

robject · Jun 13, 2014

See, I've thought about Thalassogen's code, too. He doesn't write Perl. BUT that's no reason I couldn't potentially call his service, get a passel of JSON in return, and use it in some way. Vice versa too, of course. Those of us who are better at providing services therefore don't have to write client code, and those who are good at client code can mix 'n match existing services. Seems like a good idea to me.

inexorabletash · Jun 13, 2014

Just a quick note to remember to set CORS headers so pages can "mash up" the data across sites. Basic tutorial on how to do it:

http://enable-cors.org/server.html

Then JS code in a web page served from example.com can make XHR requests to a JSON service on your server.
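A minimal sketch of the server side, in Python rather than the Perl the services in this thread are written in. The app name `sector_app` and the sample payload are made up for illustration; the only part that matters is the `Access-Control-Allow-Origin` response header.

```python
import json

# Hypothetical payload; a real service would return generated sector data.
SAMPLE = {"hex": "2632", "name": "Angindsan", "starport": "B"}


def sector_app(environ, start_response):
    """WSGI app that serves SAMPLE as JSON and allows cross-origin reads."""
    body = json.dumps(SAMPLE).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        # The CORS header: "*" lets a page from any origin read the response.
        ("Access-Control-Allow-Origin", "*"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, sector_app).serve_forever()` will serve it. Using "*" is the simplest choice for public, read-only data; a specific origin can be named instead if the data should only be readable from one site.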

inexorabletash · Jun 13, 2014

Also, you need to quote strings in "double quotes" in JSON data. http://www.json.org/

tjoneslo · Jun 14, 2014

robject said:
A sector using my verbose format above takes up a bit over 1 megabyte, and is a bit over 55,000 lines long.

That's a FAR CRY from a 30k SEC file.

This is similar to my experience a few years ago, doing a similar effort with XML. My two responses to this are:

A megabyte? You're worried about a megabyte in this era, where disk space starts at hundreds of gigabytes?

JSON, like XML, compresses wonderfully. I was compressing my ~1 MB of XML sector data into 15 KB. That is smaller than the original SEC file, but with more data. The reason this is relevant for you is that something like 98% of browsers understand data in "gzip" format. Compression is a recommended practice when sending a large group of small files, or when sending a large blob (say a megabyte of JSON). I don't remember (and may never have learned in the first place) the details of how this works, but it will be worth your time if you are worried about bandwidth.
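A rough way to check the compression claim on your own data, sketched in Python. The sample sector here is made-up, repetitive filler (not real T5 output), so the exact ratio will differ from a real sector file. On the wire, the browser advertises support with an `Accept-Encoding: gzip` request header and the server marks the compressed body with `Content-Encoding: gzip`; most web servers can be configured to do this without touching the service code.

```python
import gzip
import json


def build_sample_sector(n_systems=600):
    """Build a repetitive, sector-like dict (made-up values for illustration)."""
    sector = {}
    for i in range(n_systems):
        hex_id = "%02d%02d" % (i % 32 + 1, i // 32 + 1)
        sector[hex_id] = {
            "hex": hex_id,
            "starport": "B",
            "size": 8,
            "atmosphere": 9,
            "hydrographics": 10,
            "population": 0,
            "gas giants": 3,
        }
    return sector


def gzip_json(data):
    """Serialize to pretty-printed JSON, then gzip the bytes."""
    raw = json.dumps(data, indent=2).encode("utf-8")
    return raw, gzip.compress(raw)


raw, packed = gzip_json(build_sample_sector())
print("raw bytes:", len(raw), "gzipped bytes:", len(packed))
```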

robject · Jun 14, 2014

inexorabletash said:
Also, you need to quote strings in "double quotes" in JSON data. http://www.json.org/

Yes, we've been working with the JSON spec at work really a lot lately, and even though the official rules don't permit single quotes, it appears as though they're taken for granted willy-nilly. I don't know what the JS community thinks of this; I'm a PON* and YAML guy myself.

* Perl Object Notation, invented whenever Perl decided it needed a way to store references to data structures. It's very close to JSON. Of course the C community would naturally tend to gravitate toward curly braces for maps and square braces for arrays, wouldn't they?

inexorabletash · Jun 14, 2014

robject said:
Yes, we've been working with the JSON spec at work really a lot lately, and even though the official rules don't permit single quotes, it appears as though they're taken for granted willy-nilly. I don't know what the JS community thinks of this; I'm a PON* and YAML guy myself.

If you're using a conforming parser - like the one that's built into JS for the last umpteen years, instead of just eval()ing your code - it will fail.

Code:

JSON.parse("{'a': 1}") // throws SyntaxError: Unexpected token '

aramis · Jun 14, 2014

robject said:
* Perl Object Notation, invented whenever Perl decided it needed a way to store references to data structures. It's very close to JSON. Of course the C community would naturally tend to gravitate toward curly braces for maps and square braces for arrays, wouldn't they?

Square brackets for arrays, absolutely - that's standard C/C++; the Curly Braces are used in C/C++ for defining code chunks almost exclusively - if something requires a single function as target object, but you want to do multiple things, wrap them in curly braces, and they become an unnamed function object.

robject · Jun 14, 2014

inexorabletash said:
If you're using a conforming parser - like the one that's built into JS for the last umpteen years, instead of just eval()ing your code - it will fail.

Code:

JSON.parse("{'a': 1}") // throws SyntaxError: Unexpected token '

Another problem I've seen is that some JSON appears to use "bare string" keys, and even I know that's a no-no.

Hmm. Looks like the JSON I return for my world-building stuff is double-quoted, and yet both it and this sysgen program use the same module for outputting the data.

Very interesting... I wonder what's going on...

AHA! My world-builder code has this directive to the dumper:

$Data::Dumper::Useqq = 1;

Duh. Apparently I knew about this when I wrote it. Funny how easy it is to forget things.
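For comparison, a small sketch of the same check in Python, assuming the data is emitted by a real JSON encoder rather than a general-purpose data dumper. A JSON encoder always double-quotes keys and strings, and a conforming parser rejects both single-quoted and bare keys. The helper name `is_valid_json` is made up for this example.

```python
import json

world = {"hex": "2632", "name": "Angindsan", "law level": 2}

# A JSON encoder emits double-quoted keys and strings.
text = json.dumps(world)


def is_valid_json(candidate):
    """Return True if a conforming parser accepts the text."""
    try:
        json.loads(candidate)
        return True
    except json.JSONDecodeError:
        return False


print(text)
print(is_valid_json(text))          # encoder output
print(is_valid_json("{'a': 1}"))    # single quotes
print(is_valid_json("{a: 1}"))      # bare key
```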

robject · Jun 16, 2014

Full-steam ahead, then.

Services

So far, I have only one service published. Well, maybe two. What I WILL offer is a number of smaller-grained services for targeted work:

(1) generation of a random UWP (including UWP descriptions and trade code gen).
(2) generation of a "skeleton" of a sector -- topology only.
(3) identification of the habitable zone (AU and orbital track) for a given star.
(4) given the UWP, generation of non-UWP data for a world (tilt, eccentricity, density, albedo, greenhouse, temp...).
(5) whatever else I can think of.

In other words, all of the stuff I currently do, but offered as a set of small services for those who only need one part.

Perl Modules

This weekend I was reorganizing my Perl T5 world-builder library. I have a lot of code, and I didn't like it all piled into two modules. So I'm fixing that. So far, I have:

UWP.pm - manipulation of the UWP, of course.

Star.pm - generation and calculations based on stellar data. This includes calculating the orbital tracks and habitable zones.

Next to-do is:

World.pm - all non-UWP data management will go here: tilt, eccentricity, density, albedo, greenhouse effect, and temperature, plus other bric-a-brac. I'll move it out of the current mega-module it's in.

Sector.pm - orchestrative module for generating a sector's worth of the above, handling the density of the sector, adding various small rift types ("terrain features"), and so on. Which gives me a separate idea for Marc...

robject · Jun 17, 2014

JSON services currently available:

http://eaglestone.pocketempires.com/survey/t5-tools/uwp.pl - builds a random UWP, including descriptive text.

http://eaglestone.pocketempires.com/survey/t5-tools/star.pl - builds a random stellar configuration.

http://eaglestone.pocketempires.com/survey/t5-tools/world.pl [unfinished] builds out most of a world using world-building code.

charvolant · Oct 8, 2014

I'm a bit late coming to this party but my suggestion would be to split up the data somewhat, give individual URIs to things like worlds, systems etc. and then serve things incrementally, using HATEOAS to allow people to navigate to more detailed information. But then I would, wouldn't I?

In this sort of model, looking up a subsector at http://eaglestone.pocketempires.com/survey/sector/1/subsector/2 then becomes something like

Code:

{
 "_2020": {
    "name": "Desca",
    "uwp": "A65A878-A",
    "trade": "Wa Ph Co",
    "href": "http://eaglestone.pocketempires.com/survey/sector/1/subsector/2/system/2020/planet/1"
  },
 ...
}

What you would then find at http://eaglestone.pocketempires.com/survey/sector/1/subsector/2/system/2020/planet/1 would be something like

Code:

{
    "hex": 2020,
    "name": "Desca",
    "starport": "A",
    "siz": 6,
    "atm": 5,
    "hyd": 10,
    "pop": 8,
    "gov": 7,
    "law": 8,
    "TL": 10,
    "trade": "Wa Ph Co",
    "position": 1,
    "system": {
      "name": "Willthisdo",
      "stellar": "G7 V G1 VI K5",
      "href": "http://eaglestone.pocketempires.com/survey/sector/1/subsector/2/system/2020"
    }
}

With the system at http://eaglestone.pocketempires.com/survey/sector/1/subsector/2/system/2020 containing stellar information and ... well you get the idea.

At each point, there's enough information in the way of names, etc. for people to get an idea of what's at the other end of the link.
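A sketch of how a server might assemble the short subsector listing with links, in Python. The base URL and the function names `planet_href` and `subsector_summary` are hypothetical; the path layout just follows the example URLs in this post.

```python
# Hypothetical base URL; the real service's paths may differ.
BASE = "http://example.invalid/survey"


def planet_href(sector, subsector, hex_id, position=1):
    """Build the link a client would follow for the full world record."""
    return "%s/sector/%d/subsector/%d/system/%s/planet/%d" % (
        BASE, sector, subsector, hex_id, position)


def subsector_summary(sector, subsector, worlds):
    """Return the short per-world entries, each with an href to more detail."""
    summary = {}
    for world in worlds:
        summary["_" + world["hex"]] = {
            "name": world["name"],
            "uwp": world["uwp"],
            "trade": world["trade"],
            "href": planet_href(sector, subsector, world["hex"]),
        }
    return summary


listing = subsector_summary(1, 2, [
    {"hex": "2020", "name": "Desca", "uwp": "A65A878-A", "trade": "Wa Ph Co"},
])
print(listing["_2020"]["href"])
```

The client never has to construct a URL itself; it only follows the `href` values it is given, which is what lets the server reorganize its paths later.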

The tricky bit is that I suspect that you don't want to have to store this information, just generate it on the fly. My suggestion there would be to use the identifying URIs as the seed to a random number generator; hash the URI using a hash function with a decent amount of uniformity. That way, when people come back, they'll get the same thing. (This could get difficult if you want to go to a new version of the generator with extended data, since you'll need to generate random numbers in the same sequence as previous versions.)
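One way this could look, sketched in Python. SHA-256 is my choice for the "hash function with a decent amount of uniformity", not something specified here, and `seed_from_uri` is a made-up name. The seed is simply the leading bits of the digest.

```python
import hashlib


def seed_from_uri(uri, bits=32):
    """Derive a repeatable integer seed from a URI.

    Takes the top `bits` bits of the SHA-256 digest of the UTF-8 encoded URI.
    """
    digest = hashlib.sha256(uri.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


uri = "http://eaglestone.pocketempires.com/survey/sector/1/subsector/2/system/2020"
print(seed_from_uri(uri))       # same URI, same seed, on every run
print(seed_from_uri(uri, 64))   # a wider seed if the generator can use it
```

The versioning problem mentioned above still applies: the seed is stable, but the generated data only stays stable if the generator draws its random numbers in the same order as before.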

tjoneslo · Oct 11, 2014

charvolant said:
The tricky bit is that I suspect that you don't want to have to store this information, just generate it on the fly. My suggestion there would be to use the identifying URIs as the seed to a random number generator; hash the URI using a hash function with a decent amount of uniformity. That way, when people come back, they'll get the same thing. (This could get difficult if you want to go to a new version of the generator with extended data, since you'll need to generate random numbers in the same sequence as previous versions.)

The challenge with this scheme is finding out the PRNG seed for the existing data, which is akin to breaking a code. Brute force only unless you design your own (i.e. broken) PRNG.

While storing a seed to generate a set of data seems like it's a neat idea, the reality is it substitutes storage space for compute time, and for this size of data really isn't worth it.

robject · Dec 15, 2014

tjoneslo said:
The challenge with this scheme is finding out the PRNG seed for the existing data, which is akin to breaking a code. Brute force only unless you design your own (i.e broken) PRNG.

Existing data is not under consideration - it must be stored. And edits to data must also be stored. Random generation does not replace storage. It is a potential convenience only -- and a potentially very convenient convenience at that.

The PRNG is the heart of the matter.

That's why I recommend, and use, a modified version of Jenkins' tiny fast non-cryptographic PRNG. Tested on Perl, Objective-C, Java, and JavaScript, with identical results. Feel free to try it with C, Python, etc (it's short).

Java

Code:

public class Burtle
{
   private int a, b, c, d;

   public int rot( int x, int k )
   {
      return ((x<<k)|(x>>(32-k)));
   }

   public Burtle( int seed )
   {
      a = 0xf1ea5eed;
      b = c = d = seed;
      for (int i=0; i<20; ++i)
      {
         this.randval();
      } 
   }

   public int randval()
   {
      int e;
      e = a - rot(b, 27);
      a = b ^ rot(c, 17);
      b = c + d;
      c = d + e;
      d = e + a;
      return d;
   }

   public static void main( String[] args )
   {
      Burtle b = new Burtle(8063);
      for (int i=0; i<20; ++i)
      {
         System.out.printf( "%08x\n", b.randval() );
      }
   }
}

Objective-C

Code:

//
//  Burtle.m
//  Burtle
//
//  Created by Robert Eaglestone on 12/8/12.
//  Copyright (c) 2012 __MyCompanyName__. All rights reserved.
//

#import "Burtle.h"

@implementation Burtle

long a, b, c, d;

#define rot(x,k) (((x)<<(k))|((x)>>(32-(k))))

- (long) randval
{
    long e;
    e = a - rot(b, 27);
    a = b ^ rot(c, 17);
    b = c + d;
    c = d + e;
    d = e + a;
    return d;
}

- (id)init: (long) seed
{
    int i;
    self = [super init];
    if (self) 
    {
        // Initialization code here.
        a = 0xf1ea5eed, b = c = d = seed;
        for (i=0; i<20; ++i)
        {
            [self randval];
        }
    }
    
    return self;
}

@end

JavaScript (thanks to Joshua Bell)

Code:

(function(global) {

  function uint32(x) { return x >>> 0; }

  function Burtle(seed) {
    if (!(this instanceof Burtle)) { return new Burtle(seed); }

    seed = uint32(seed);
    this.a = 0xf1ea5eed;
    this.b = this.c = this.d = seed;
    for (var i = 0; i < 20; ++i) {
      this.randval();
    }

    return this;
  }

  function rot(x, k) { return (x << k) | (x >> (32 - k)); }
  Burtle.prototype.randval = function() {
    var e = uint32(this.a - rot(this.b, 27));
    this.a = this.b ^ rot(this.c, 17);
    this.b = uint32(this.c + this.d);
    this.c = uint32(this.d + e);
    this.d = uint32(e + this.a);
    return this.d;
  };

  global.Burtle = Burtle;
}(self));

Perl (with a method to generate a seed from a string)

Code:

package Jenkins2rSmallPRNG;
require Exporter;
@ISA = qw(Exporter);
@EXPORT_OK = qw(srand srandByString randval rand1d rand2d flux);

use bigint;

my ($a, $b, $c, $d);

sub rot { (($_[0])<<($_[1])) | (($_[0])>>(32-($_[1]))) }

sub randval
{
   my $e = $a - rot( $b, 27 );
      $a = $b ^ rot( $c, 17 );
      $b = $c + $d;
      $c = $d + $e;
      $d = $e + $a;
   
   return $d;
}

sub init
{
   $a = 0xf1ea5eed;
   $b = $c = $d = shift; # seed value
   randval() for 0..19;
}

# ------------------------------------------------------------------

sub rand1d
{
   return abs randval() % 6;
}

sub rand2d
{
   my $num = abs randval();
   return ($num % 6), ($num/6) % 6;
}

sub flux
{
   my $num = abs randval();
   return ($num % 6) - (($num/6) % 6);
}

sub srand
{
   init(shift);
}

sub srandByString
{
   my $string = shift;
   my $val = 0;
   foreach my $char (split '', $string)
   {
      $val += (ord $char);
      $val <<= 8;
      #print STDERR "seed=$val\n";
   }
   init( $val );
}

1; # return a true value as all packages should

robject · Dec 15, 2014

I use the PRNG as follows:

* a rand1d() function: return abs randval() % 6;

* a rand2d() function, which returns a pair, taking advantage of the width of the random 32-bit signed integer:

Code:

   number = abs randval();
   return (number % 6), (number/6) % 6;

(If a die is approximately 3 bits, then a 32-bit value should hold as much as 10 or even 11 die rolls. But that sort of economy is really completely unnecessary for this application. Really I probably shouldn't even use one random int for two dice.)

* a flux() function, which works in much the same way as rand2d():

Code:

   number = abs randval();
   return (number % 6) - ((number/6) % 6);
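The same idea generalized, as a Python sketch: peel base-6 digits off one random integer to get several dice. The names `unpack_dice` and `flux_from` are made up for this example. As in the functions above, each die comes out as 0..5 rather than 1..6.

```python
def unpack_dice(value, count):
    """Split a non-negative integer into `count` base-6 digits, each 0..5."""
    dice = []
    for _ in range(count):
        value, face = divmod(value, 6)
        dice.append(face)
    return dice


def flux_from(value):
    """First die minus second die, both taken from the same value."""
    first, second = unpack_dice(value, 2)
    return first - second


print(unpack_dice(33, 2))   # [3, 5]
print(flux_from(33))        # -2

# How many full dice fit: 6**11 < 2**31 < 6**12, and 6**12 < 2**32 < 6**13.
print(6 ** 11 < 2 ** 31 < 6 ** 12, 6 ** 12 < 2 ** 32 < 6 ** 13)
```

So after taking the absolute value of a signed 32-bit number there is room for 11 full dice, which agrees with the estimate in this post.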

The trickiest is a function that generates a seed value from a string. I should be using something like MD5 or SHA, and keeping the first 32 bits. Instead, I'm just adding and shifting. Perl avoids the overflow, but other languages are not so forgiving, so a better way is needed:

Code:

   val = 0;
   foreach char in string
   {
      val += ord(char);
      val <<= 8;
   }
   init( val );
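To make the overflow problem concrete, here is the add-and-shift loop as a Python sketch. Python integers are unbounded, like Perl under bigint, so the optional `width` argument (my addition, not part of the original code) emulates a fixed-width language by masking after each step.

```python
def seed_add_shift(text, width=None):
    """Add each character code, then shift left 8 bits.

    With width=None the value grows without limit. With a width, the value
    is truncated to that many bits after every step, as a fixed-width
    integer type would do.
    """
    val = 0
    for ch in text:
        val += ord(ch)
        val <<= 8
        if width is not None:
            val &= (1 << width) - 1
    return val


name = "mw-orion-4991"
print(seed_add_shift(name).bit_length())                       # 111 bits
print(seed_add_shift(name, 32) == seed_add_shift("991", 32))   # True
```

With 32-bit truncation only the last three characters survive the shifting, so any two names ending in "991" get the same seed. That is the collision a real hash, or big integers, avoids.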

robject · Dec 15, 2014

The Downside

The downside to using a fixed-width hashing algorithm is that our data space tends to get pretty long. If we can uniquely NAME a sector of space, then the resulting seed value should also be unique -- length of seed be darned.

Consider the bit depth of a system-presence-only sector map: i.e. it's a 40 x 32 matrix of "yes" and "no", so there are 2^1280 possible sector configurations. 1280 bits, in other words. But with a 32 bit seed, we can only model a tiny fraction of them all.

If we kept it at the subsector-level, then we'd "only" need 80 bits.

Now consider the bit depth of a star-plus-world-presence-only star system map: maybe 5 bits for the orbit slots, but maybe 10 bits per star for up to 8 stars (not including exclusions and limitations in the rules which reduce bit count), and you've got around 85 bits.

Consider the bit depth of a UWP: starport (4 bits) + size (5 bits) + atm (4 bits) + hyd (4 bits) + pop (4 bits) + gov (5 bits) + law (4 bits) + TL (5 bits) = 35 bits. Now add PBG (maybe another 10 bits) for 45 bits. Now discard the edge cases thrown out by the rules... and you have something less than 45 bits. How much less, I don't know. See the problem?
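The arithmetic above, spelled out as a short Python sketch. The field widths are the estimates from this post, not values from the rules.

```python
# Presence-only maps: one yes/no bit per hex.
sector_bits = 40 * 32       # full sector grid
subsector_bits = 8 * 10     # one subsector

# UWP field widths in bits, as estimated above.
uwp_fields = {
    "starport": 4, "size": 5, "atm": 4, "hyd": 4,
    "pop": 4, "gov": 5, "law": 4, "TL": 5,
}
uwp_bits = sum(uwp_fields.values())

# A 32-bit seed reaches 2**32 of the 2**1280 possible sector maps,
# i.e. a fraction of 2**(32 - 1280).
seed_bits = 32
reachable_exponent = seed_bits - sector_bits

print(sector_bits, subsector_bits, uwp_bits, reachable_exponent)
# prints: 1280 80 35 -1248
```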

So forget MD5, SHA1, even SHA512. In other words, I'm keeping my fake hashing algorithm for now, because it is only limited by the size of the numbers used. If I can use Big Integers, the problem is solved.

robject · Dec 15, 2014

Here's how it's done in Perl. Here's the module with the dice roller functions removed.

It's a lot slower than the other algorithms (e.g. MD5), but it's open-ended. If I get bored I might just split the string into chunks and SHA1 each segment. Maybe. (OK, I did it).

Code:

package Burtle;

require Exporter;
@ISA = qw(Exporter);
@EXPORT_OK = qw(srand srandByString randval rand1d rand2d flux);

use bigint try => 'GMP';
use Digest::SHA qw(sha1_hex);

my ($a, $b, $c, $d);

sub rot { (($_[0])<<($_[1])) | (($_[0])>>(32-($_[1]))) }

sub randval
{
   my $e = $a - rot( $b, 27 );
      $a = $b ^ rot( $c, 17 );
      $b = $c + $d;
      $c = $d + $e;
      $d = $e + $a;
   
   return $d;
}

sub init
{
   $a = 0xf1ea5eed;
   $b = $c = $d = shift; # seed value
   randval() for 0..19;
}

# ------------------------------------------------------------------

# (rand1d(), rand2d(), and flux() go here)

sub srand
{
   init(shift);
}

sub fakehash
{
   my $string = shift;
   my $chunksize = 4; # 4 alphanum is roughly 24 bits
   my @chunks = ( $string =~ /.{1,$chunksize}/gs );
   my $val = 0; # reset on every call so the same string always gives the same seed
   foreach my $word (@chunks)
   {
      my $hashcode = hex Digest::SHA::sha1_hex( $word );
      $val += $hashcode;
      $val <<= 24;
   }
   return $val;
}

sub srandByString
{
   my $string = shift;
   my $val = fakehash( $string );
   init( $val );
   return $val;
}

1; # return a true value as all packages should

tjoneslo · Dec 15, 2014

Here is the Python version, but I'm not convinced that it returns the same set of random numbers your implementations do given the same seed.

Code:

class BurtleRandom(object):

    def __init__(self, seed):
        # State is [a, b, c, d], matching the other ports.
        self.random_seed = [0xf1ea5eed, int(seed), int(seed), int(seed)]
        # Warm up with 20 rounds, as the other ports do when seeding.
        for _ in range(20):
            self.value()

    def rot (self, x, k):
        # Unsigned 32-bit rotate left.
        return ((x << k) & 0xffffffff) | (x >> (32 - k))

    def value(self):
        extra = self.random_seed[0] - self.rot(self.random_seed[1], 27)
        self.random_seed[0] = self.random_seed[1] ^ self.rot(self.random_seed[2], 17)
        self.random_seed[1] = (self.random_seed[2] + self.random_seed[3]) & 0xffffffff
        self.random_seed[2] = (self.random_seed[3] + extra) & 0xffffffff
        self.random_seed[3] = (extra + self.random_seed[0]) & 0xffffffff
        return self.random_seed[3]

    def rand1d(self):
        return abs(self.value()) % 6

    def rand2d (self):
        seed = abs(self.value())
        # // keeps this an integer division under Python 3 as well.
        return (seed % 6, (seed // 6) % 6)

    def flux(self):
        seed = abs(self.value())
        return (seed % 6) - ((seed // 6) % 6)

robject · Dec 16, 2014

Does the code use 32 bit signed integers? 0xffffffff makes me think they're forced to be so, but... well, let's try it and see.

You may be right; my Perl code doesn't enforce the 32 bit boundary, so will probably vary. But otherwise, if the sign and bit widths are the same, the results should be the same.
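Here is a sketch of a Python port that tries to pin down both the sign and the bit width, by mirroring the Java and JavaScript listings above step by step. The class name `Burtle32` and the helpers are made up, and I have not run it side by side with those listings, so treat the match as something to verify. The point to notice is that `>>` in Java and JavaScript works on a signed 32-bit value and copies the sign bit, so `rot` there is not a pure bit rotation; this port imitates that instead of doing an unsigned rotate.

```python
MASK32 = 0xFFFFFFFF


def to_int32(x):
    """Reinterpret the low 32 bits of x as a signed 32-bit integer."""
    x &= MASK32
    return x - 0x100000000 if x & 0x80000000 else x


def rot(x, k):
    """Mirror (x << k) | (x >> (32 - k)) on a signed 32-bit int.

    The right shift is sign-propagating, as in the Java and JavaScript code.
    The result is returned as an unsigned 32-bit value.
    """
    x &= MASK32
    return ((x << k) | (to_int32(x) >> (32 - k))) & MASK32


class Burtle32:
    def __init__(self, seed):
        self.a = 0xF1EA5EED
        self.b = self.c = self.d = seed & MASK32
        for _ in range(20):          # same 20 warm-up rounds as the other ports
            self.randval()

    def randval(self):
        e = (self.a - rot(self.b, 27)) & MASK32
        self.a = (self.b ^ rot(self.c, 17)) & MASK32
        self.b = (self.c + self.d) & MASK32
        self.c = (self.d + e) & MASK32
        self.d = (e + self.a) & MASK32
        return self.d                # unsigned, 0 .. 2**32 - 1


gen = Burtle32(8063)                 # same seed as the Java main() above
for _ in range(3):
    print("%08x" % gen.randval())
```

If the first values printed by the Java `main()` differ from these, the difference is most likely in how one side handles the sign bit in `rot`.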

I would suggest you aim to duplicate Joshua Bell's experiment here, which does it with JavaScript: http://calormen.com/tmp/prng/burtle.html

The "nice" thing about using that as a target is that we get to use 64-bit IEEE754 floating point numbers i.e. C doubles. I should ask Joshua to open up his implementation to using those without forcing them to 32 bit values.
