
Traveller XML star database

tjoneslo

Following the calls for an XML format for storing Traveller star data, I modified/updated/fixed the AstroML files originally written by Mark A. Preston.

I believe I've managed to get this to the point where I can represent all of the data generated by the CT book 6 or MT extended system generation.

An example XML file of part of the Regina subsector, including the expanded Regina system.

A complete reference set for all reference values.

Finally, the XML Schema document.

Note: I decided to follow Mark's lead and use XML Schema rather than writing a full XML DTD.

Comments and criticisms are welcome.
 
I've been working on my own XML format, which (strangely... heh) looks very similar - I guess since the format is largely data-field-driven, that makes sense.

I've represented some things differently to how you've represented them. I'll take a more complete look at your work and try to provide some contrasts/criticisms.

FMI: Why is a schema an improvement on a DTD?
 
Originally posted by kaladorn:
I've represented some things differently to how you've represented them. I'll take a more complete look at your work and try to provide some contrasts/criticisms.
Thanks, I'd love to see yours, to see what I'm missing or what design decisions you made.

FMI: Why is a schema an improvement on a DTD?
A question which starts flame wars. XML Schema is not better than a DTD; it's a different way of doing the same thing. The DTD is more flexible, but the data we're trying to represent does not necessarily require the full flexibility of a DTD. Schema simplifies some things, making it easier to write. Plus it has the advantage of being written in XML, not another markup language altogether.

Your call as to whether the reasons I gave are sufficient to use one over the other. Or offer me a list of reasons to use a DTD...
 
What is XML?

In a nutshell, it is a markup language. If you know about HTML, you could think of it as a subset of XML (that is to say, XML could define HTML syntax). SGML is a related markup language (XML sprang from that root). These are used for marking up documents (document formatting). They are also ways to organize data/information, which I guess is the real point.

XML goals:

The design goals for XML are:

1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.

These give you some idea what they were shooting for.

When I'm done with my version of TravXML (an XML format for system/universe representation, etc. I think I even picked the same name as tjoneslo), I'll present it here.

As to the DTD:

A DTD *is* an XML doc. (not that such a statement implies much in the way of constraint!)
DTDs can be tricky to get right, but once you have one, it is fairly constraining (which is the point!). I'm not sure how a schema really works, but I guess I can Google my way to an answer.

It seems to me your argument is that the schema is easier to write and less work than the DTD. I could imagine that is true.
 
Originally posted by kaladorn:
As to the DTD:
I'm not sure how a schema really works, but I guess I can Google my way to an answer.

It seems to me your argument is that the schema is easier to write and less work than the DTD. I could imagine that is true
My reference document:
http://www.w3.org/TR/xmlschema-0/

After reading through both DTDs and Schemas, I found Schema to be easier to read, write and understand. I'm new to both of these, and there may be limitations to Schema I'm not aware of.
 
I briefly reviewed your page. Nice start. Some good work there. Two points:

1. I think you underscope the project (or must limit it to within only a subsegment of the OTU) if you figure 15,000 rows. I'd guess with 11,000 systems, each having an average of say 10 or 12 bodies (picked out of the air, but how many are in our system? I'd guess 40 or so with moons etc) and that in the Imperium alone, the scope is much much larger. Add in the Extents, the Sol Confed, the Hivers, the 2000 Worlds, the Aslan, the Zhos and all the minor powers/polities, and independents, and you probably have a *far* higher count of systems. I don't know if anyone has ever done a full count, but I'd call it 'big'.

2. Where in your XML format (looking at the Regina example) do I indicate the survey date? (That is, 'the 1065 second survey'.) That should be a field in one of the outermost wrappers of a sector. Or, just to cover non-Imperial cases, there should be an optional override field for this on each body....
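To make the "sector-level field with a per-body override" idea concrete, here is a rough Python sketch. The element and attribute names (`sector`, `body`, `surveyDate`), the second body and its 1100 date are all made up for illustration; none of them come from the posted schema or example file.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: names and the 1100 override are illustrative only.
SAMPLE = """
<sector name="Spinward Marches" surveyDate="1065">
  <system name="Regina">
    <body name="Regina"/>
    <body name="Outer Belt" surveyDate="1100"/>
  </system>
</sector>
""".strip()


def survey_dates(xml_text):
    """Map each body name to its own surveyDate, else the sector default."""
    sector = ET.fromstring(xml_text)
    default = sector.get("surveyDate")
    return {
        body.get("name"): body.get("surveyDate", default)
        for body in sector.iter("body")
    }


print(survey_dates(SAMPLE))
# {'Regina': '1065', 'Outer Belt': '1100'}
```

A body with no attribute of its own simply inherits the sector's date, so Imperial data stays compact and only the exceptions need the extra field.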

Just my first thoughts.
 
Originally posted by kaladorn:
I briefly reviewed your page. Nice start. Some good work there. Two points:

1. I think you underscope the project (or must limit it to within only a subsegment of the OTU) if you figure 15,000 rows.
Anthony's map site gives a grand total of 28,621 systems in charted space. The average number of bodies in a system is 4 or 5, so I'm probably off by an order of magnitude. But most surveyors are not going to have fully generated 66 sectors' worth of data.

2. Where in your XML format (looking at the Regina example) do I indicate the survey date?
Good thought. I'll add a survey date. I was concentrating upon getting the base book 6 data in first.
 
4 or 5 bodies?

You usually have
- primary
- mainworld
- inner planets
- outer planets
- maybe a gas giant
- maybe a belt
(or two or three)
- possibly a companion star

I'm a wee bit curious about the 4 to 5 figure. That seems low.

You are right that not everyone will have these mapped. But if we can build the tool to handle the limit case, then others become trivial.

BTW, I too like the use of XML and SQL. (Though Oracle SQL is a little special, IIRC from my time with 8i and 9i).
 
Originally posted by kaladorn:
4 or 5 bodies?
Statistical analysis of the Book 6 tables: Roll 2D for number of orbits (avg. 7). Type M stars (55%) apply -4 DM (makes the average 3 for those stars). Plus 1/3 of the systems have one or more empty orbits. I left out the moons, but that adds 5 per gas giant and 0.5 per terrestrial body.

It's not to say you can't come up with a Terra or Regina system; they are just on the large side and shouldn't be used as an average.
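For anyone who wants to check the arithmetic, here is a small Python sketch of the figures quoted above. It uses only the numbers in this post (2D for orbits, 55% type M stars, -4 DM for those). As simplifying assumptions it does not clamp negative results to zero and it ignores empty orbits and moons, so treat it as a back-of-envelope check and not the full Book 6 procedure.

```python
from fractions import Fraction
from itertools import product


def mean_2d(dm=0):
    """Exact mean of a 2D roll (two six-sided dice) plus a flat DM."""
    rolls = [a + b + dm for a, b in product(range(1, 7), repeat=2)]
    return Fraction(sum(rolls), len(rolls))


M_SHARE = Fraction(55, 100)   # share of type M stars quoted in the post

mean_other = mean_2d()        # no DM
mean_m = mean_2d(dm=-4)       # type M stars, -4 DM (not clamped at zero)
overall = M_SHARE * mean_m + (1 - M_SHARE) * mean_other

print(float(mean_other), float(mean_m), float(overall))
# 7.0 3.0 4.8
```

The weighted mean of 4.8 orbits is where the "4 or 5" comes from, before empty orbits pull it down and moons push it back up.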
 
Depends on your stats. One Terra will skew quite a few other systems.

I'm just suggesting that if you include moons, it might be more like 7 or 9 on average.
 
Hello.
Just for your info, I'm currently entering all the sectors I have into an Excel spreadsheet, and so far I've done 44 sectors (only the system UWP) out of 96 sectors, and the number of systems is 18,009 (basically the four rows north including Core).
Also, most systems are binaries, aren't they?
That's a lot of planetary bodies.
Bye.
 
Originally posted by kaladorn:


<snip>

2. Where in your XML format (looking at the Regina example) do I indicate the survey date? (That is, 'the 1065 second survey'.) That should be a field in one of the outermost wrappers of a sector. Or, just to cover non-Imperial cases, there should be an optional override field for this on each body....

<snip>

This is of supreme importance from a database point of view.

I think each body in each system needs, minimum, a First Survey Date, First Survey Allegiance, and First Survey Level; and Last Survey Date, Last Survey Allegiance, and Last Survey Level.

Ideally, the database would store the complete history of Survey Dates, the nation doing the Survey, the organization doing the Survey, the level of the Survey, the full body stats documented by the Surveyors at that time, and the true full body stats at that time (the difference between what the GM knows *is*, and what the Surveyors actually discover). This would allow for queries that show the history of Surveys, as well as the differences in how accurate the various surveys were. The ebb and flow of all facets is recorded across time: population, governments, allegiances, tech levels, etc. This would amount to a radical shift in database structure, where the date entered determines the data retrieved. All eras of the Traveller Universe could be stored in a single database, and queries in that database could span all data. The benefits of this are tremendous. Of course, the data is no longer *stored* in UWP format, but that is of no importance. A query could generate an original-style UWP line from the data anytime anyone wanted to view it in that format.
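As a rough illustration of "the date entered determines the data retrieved", here is a Python sketch. The `Survey` record, its field names, the year-500 survey and all the stat values are invented for the example, and the digit table only covers values 0 to 17.

```python
from dataclasses import dataclass

EHEX = "0123456789ABCDEFGH"  # digit codes for values 0..17 only


@dataclass(frozen=True)
class Survey:
    """One survey of one body; every field name here is hypothetical."""
    year: int
    surveyor: str
    starport: str
    size: int
    atmosphere: int
    hydrographics: int
    population: int
    government: int
    law: int
    tech: int


def as_of(history, year):
    """Latest survey completed on or before `year`, or None if there is none."""
    known = [s for s in history if s.year <= year]
    return max(known, key=lambda s: s.year) if known else None


def uwp(s):
    """Render a stored survey as an original-style UWP string."""
    stats = (s.size, s.atmosphere, s.hydrographics,
             s.population, s.government, s.law)
    return f"{s.starport}{''.join(EHEX[v] for v in stats)}-{EHEX[s.tech]}"


# Made-up history for a single body.
history = [
    Survey(500, "Imperial", "B", 7, 8, 8, 7, 6, 5, 9),
    Survey(1065, "Imperial", "A", 7, 8, 8, 8, 9, 9, 10),
]

print(uwp(as_of(history, 700)))    # B788765-9
print(uwp(as_of(history, 1100)))   # A788899-A
print(as_of(history, 100))         # None
```

The UWP line becomes a view over the stored history instead of the storage format, which is the point being argued here.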

Now, I've only read preliminary XML documentation. But for the most part, XML data appears to be stored in a flat file with no indexing. Honestly, the amount of data that could accumulate for a single body in a single system containing the ideal (my ideal, anyway) date and survey data would be pretty big, and when multiplied by the 10-40 bodies per system (Galactic 2.4 and HE 1.04 seem to generate a lot of bodies per system) and the number of systems . . . across all those dates (potentially all dates, but more likely just the famous Survey times) . . . that's a pretty big flat file. It would *seem*, on the face of it, that it would be pretty slow. Especially when it came to updating data in the database.

Is there a way to index the various parts of an XML file? Any of the normal advantages of relational database systems? Like a database manager and query optimizer?

EDIT>
I went off and did some research; apparently XSLT keys can help with this . . .
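An XSLT key is essentially a named lookup table built over the document. The same idea can be sketched in Python with a plain dictionary, assuming the file fits in memory. The `system` element, its attributes and the sample data below are hypothetical, not taken from the posted schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data, for illustration only.
SAMPLE = """
<sector>
  <system name="Alpha" hex="0101" allegiance="Im"/>
  <system name="Beta" hex="0102" allegiance="Na"/>
  <system name="Gamma" hex="0203" allegiance="Im"/>
</sector>
""".strip()


def build_index(root, tag, attr):
    """Group every <tag> element under root by the value of one attribute."""
    index = defaultdict(list)
    for el in root.iter(tag):
        index[el.get(attr)].append(el)
    return index


root = ET.fromstring(SAMPLE)
by_allegiance = build_index(root, "system", "allegiance")

print([el.get("name") for el in by_allegiance["Im"]])
# ['Alpha', 'Gamma']
```

Building the index costs one pass over the file; after that each lookup is a dictionary access instead of a scan.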
 
I'll look into adding a history element. I would have more than just the allegiance in the history; population, government, Law Level, Tech Level all can vary widely over time.

As for the flat XML file being large, this is true. But performance depends upon what you want to do with the data. Applications dealing with XML files handle the large ones in one of two ways: either they read them into an in-memory database where queries and other lookups can be optimized, or they make a single pass through them, reading one tag at a time and writing the relevant ones to another file. In either case, the structure of the XML file is somewhat irrelevant.

If you are planning on doing the kind of database query manipulation you seem to be implying, with a dozen complete surveys of charted space to plot the Imperial population growth and write scientific papers on the spread of humaniti, XML may not be the best storage medium. But then most gamers don't get into that level of analysis.
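The single-pass approach can be sketched in Python with the standard library's `iterparse`, which hands back elements as their closing tags are read. The `system` element, its `population` attribute and the sample data are assumptions for illustration, not the posted schema.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; a real file would be opened from disk instead.
SAMPLE = b"""<sector>
  <system name="Alpha" population="8"/>
  <system name="Beta" population="3"/>
  <system name="Gamma" population="9"/>
</sector>"""


def stream_names(source, tag="system", min_pop=0):
    """Yield names of matching elements in one pass over the file."""
    for _event, el in ET.iterparse(source, events=("end",)):
        if el.tag == tag:
            if int(el.get("population", "0")) >= min_pop:
                yield el.get("name")
            el.clear()  # drop this element's attributes and children once handled


print(list(stream_names(io.BytesIO(SAMPLE), min_pop=8)))
# ['Alpha', 'Gamma']
```

Nothing here builds an index, so memory use stays low, but every query costs a full read of the file.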
 
Originally posted by tjoneslo:
I'll look into adding a history element. I would have more than just the allegiance in the history; population, government, Law Level, Tech Level all can vary widely over time.
Right, I mentioned that all stats of the body be recorded, both what the surveyor finds, and what is actually true.

Big surveys can take a while, so the duration of the Survey would have to be included, but the date of record, IMO, would be the date the Survey was completed.

Originally posted by tjoneslo:

As for the flat XML file being large, this is true. But performance depends upon what you want to do with the data. Applications dealing with XML files handle the large ones in one of two ways: either they read them into an in-memory database where queries and other lookups can be optimized, or they make a single pass through them, reading one tag at a time and writing the relevant ones to another file. In either case, the structure of the XML file is somewhat irrelevant.

If you are planning on doing the kind of database query manipulation you seem to be implying, with a dozen complete surveys of charted space to plot the Imperial population growth and write scientific papers on the spread of humaniti, XML may not be the best storage medium. But then most gamers don't get into that level of analysis.
Yeah. Too bad, that. However, if we just include the minimums I mentioned, First values and Last values per body, that alone will allow for considerable advantages. The UWP line would only be extended (beyond page printing limits, it's true, but given why it's being included, this info won't be printed too often), rather than having to create a 1st-3rd Normal Form database.
 