
Legacy SEC line parser

robject

I took Joshua Bell's examples of aberrant SEC lines and made a regex that parses them all successfully.

WARNING: Joshua's examples are not comprehensive. However, they are interesting.
WARNING: I had to require the allegiance code to end in a non-digit, so allegiance codes that end in a digit will not be recognized as allegiances. This is the main problem with parsing legacy data: an allegiance code can look like a spectral code.
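Here is a small sketch that makes the ambiguity concrete. It is not part of the original script. The spectral-type pattern (a class letter from OBAFGKM followed by one digit) is a simplifying assumption. Real stellar fields also carry luminosity classes such as V or III and dwarf codes such as D. The allegiance test mirrors the `(.\D)` group in the parser's regex. Note that A5 appears in the allegiance column of the Iyaaahai sample line, yet it looks exactly like an A-class spectral type.

```perl
use strict;
use warnings;

# Assumed, simplified spectral type: class letter plus one decimal digit.
sub looks_like_spectral {
    my ($token) = @_;
    return $token =~ /^[OBAFGKM]\d$/ ? 1 : 0;
}

# Same heuristic as the (.\D) group in the regex: any character
# followed by a non-digit.
sub looks_like_allegiance {
    my ($token) = @_;
    return $token =~ /^.\D$/ ? 1 : 0;
}

for my $token (qw(Zh J- F3 A5)) {
    printf "%-3s spectral=%d allegiance=%d\n",
        $token, looks_like_spectral($token), looks_like_allegiance($token);
}
```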

Here's the essential regex:

my ($name, $hex, $uwp, $bases, $codes, $z, $pbg, $al, $star)
= /^(.*?)\.?\s*(\d{4})\s(.......-.)\s\s?(\w?)\s*(.*?)\s*([ARBUF]?)\s{0,2}(\d{3})\s*(.\D)?\s*(.*)$/;
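For readability, the essential regex above can be written with Perl's /x modifier so that each capture group carries a comment. The pattern itself is unchanged. The group descriptions are my reading of it rather than anything stated in the thread. The sample line is copied from the data further down.

```perl
use strict;
use warnings;

# Same pattern as above, laid out with /x (literal whitespace is ignored,
# so every space in the original is written as \s).
my $sec_line = qr{
    ^
    (.*?) \.? \s*          # 1: name (may be empty), optional trailing period
    (\d{4}) \s             # 2: four-digit hex location
    (.......-.)            # 3: UWP: seven characters, hyphen, tech level
    \s \s?
    (\w?) \s*              # 4: bases (zero or one word character)
    (.*?) \s*              # 5: trade codes (lazy)
    ([ARBUF]?) \s{0,2}     # 6: travel zone, if any
    (\d{3}) \s*            # 7: PBG
    (.\D)? \s*             # 8: allegiance; second character must be a non-digit
    (.*)                   # 9: stellar data (rest of the line)
    $
}x;

my $line = 'Rudzaghz      0139 D261610-2    Ni                 314 Va     G3V M3D';
my ($name, $hex, $uwp, $bases, $codes, $z, $pbg, $al, $star) = $line =~ $sec_line;

print join('|', map { defined $_ ? $_ : '' }
                $name, $hex, $uwp, $bases, $codes, $z, $pbg, $al, $star), "\n";
```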


Here is the entire Perl code, for context.

Code:
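use strict;
use warnings;

while (my $line = <DATA>)
{
   chomp $line;                  # avoid printing a doubled newline
   next unless $line =~ /\S/;    # skip blank lines

   print "$line\n";

   # Apply the essential regex; report any line it cannot match.
   my ($name, $hex, $uwp, $bases, $codes, $z, $pbg, $al, $star)
   = $line =~ /^(.*?)\.?\s*(\d{4})\s(.......-.)\s\s?(\w?)\s*(.*?)\s*([ARBUF]?)\s{0,2}(\d{3})\s*(.\D)?\s*(.*)$/
      or do { print "(no match)\n\n"; next };

   $al //= '';                   # the allegiance group is optional and may be undef

   print "name          : $name\n"
       , "hex           : $hex\n"
       , "uwp           : $uwp\n"
       , "bases         : $bases\n"
       , "codes         : $codes\n"
       , "zone          : $z\n"
       , "pbg           : $pbg\n"
       , "alleg         : $al\n"
       , "stellar       : $star\n";

   print "\n";
}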

__DATA__
Iashplie      0701 B658452-C  Z Ni                 917 Zh     K9V M0D
Prianaf       1520 XA99000-0    Ba Lo Ni        F  003 Zh     F3V M7D
0103               0103 X348002-0   Lo Ni             020 Na F7 V
Eahyo              0131 X433731-0   Na Po             400 As M7 V M9 D
                     0639 X440000-0   Ba Lo Ni Po De         013 --
Nnurukgr             0705 E637900-5   Hi In                  213 AW
Kue Urzue          0207 C6568AB-5                   R 100 J- F2 V
Zimigsika          0209 A77399B-C J Hi In Cp          412 J- F0 V
.             3238 X610000-0    Ba Lo Ni           012 --     F4V
Rudzaghz      0139 D261610-2    Ni                 314 Va     G3V M3D
G-1           1715 A301735-9  N Ic Na Va           224 Na
G-2           1718 E7A0437-4    De Ni              604 Na
Dujj't'kzo                0101 E331000-0    Ba Ni Po                        703 Ia M3 III M4 V
Hastitan                  0104 E460343-4    De Lo Ni Po                     103 Na M2 V M4 V
Iyaaahai      2609 C7C05L6-9    De Ni              905 A5 M7 III M7 D
Eaaileishryaor2610 BAD57M6-E    Fl                 404 As M3 V
Ihkasya            1020 B547100-6   Ni                603 Na
Ftihahe.           1023 D5A07M8-C   De                403 As
 
Yeah, this is why I've given up on SEC. When I got started, I had a very liberal parser and lots of special cases to clean things up afterwards (including I/1 and O/0 mix-ups). Now that I have a corpus I'm happy with, I tell new submitters to fix their data instead, since every time I tweaked the parser something else would break. :(
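The after-the-fact cleanup mentioned here can look something like this hypothetical helper, which repairs letter-for-digit typos in a UWP. It is my own sketch, not the poster's code. It assumes UWPs use Traveller's extended hex, which skips the letters I and O, so any I or O in a UWP is treated as a mistyped 1 or 0.

```perl
use strict;
use warnings;

# Hypothetical cleanup: I and O are never valid extended-hex digits,
# so map them (either case) to 1 and 0. Apply this to the UWP field only;
# names and trade codes legitimately contain these letters.
sub fix_uwp_typos {
    my ($uwp) = @_;
    (my $fixed = $uwp) =~ tr/IOio/1010/;
    return $fixed;
}

print fix_uwp_typos('B6O8452-C'), "\n";   # B608452-C
print fix_uwp_typos('XA99OOO-O'), "\n";   # XA99000-0
```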

Watch out for G as a travel zone (also used by thalassogen's generator), and for belt and gas-giant counts (the B and G in PBG) greater than 9, courtesy of some generators.
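A more tolerant set of field patterns might look like the sketch below. Both choices in it are my assumptions and are not confirmed in the thread. First, the generators in question write a G zone as a single letter. Second, they write belt or gas-giant counts above 9 as extended-hex letters (A = 10, B = 11, and so on, skipping I and O).

```perl
use strict;
use warnings;

# Zone letters from the original regex, widened to accept G.
my $zone_re = qr/^[ARBUFG]?$/;

# PBG: population multiplier (decimal), then belts and gas giants,
# which may be extended-hex letters (assumption).
my $pbg_re = qr/^(\d)([0-9A-HJ-NP-Z])([0-9A-HJ-NP-Z])$/;

# Convert one extended-hex digit (0-9, A-H, J-N, P-Z) to its value.
sub ehex_value {
    my ($c) = @_;
    my $digits = '0123456789ABCDEFGHJKLMNPQRSTUVWXYZ';
    my $pos = index($digits, uc $c);
    die "not an extended-hex digit: $c\n" if $pos < 0;
    return $pos;
}

# Returns (pop multiplier, belts, gas giants), or an empty list on failure.
sub parse_pbg {
    my ($pbg) = @_;
    my @parts = $pbg =~ $pbg_re or return;
    return map { ehex_value($_) } @parts;
}

print join(',', parse_pbg('9A3')), "\n";    # 9,10,3
print(('G' =~ $zone_re) ? "zone ok\n" : "zone rejected\n");
```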

I've never run into files with stellar data but a missing allegiance code, so filtering out digits hasn't been necessary (and doing so would break all the V# and A# polities). There are sometimes other columns in there, or wacky allegiances like JP/J-, though.
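If the allegiance code really is always present whenever stellar data is, the text after the PBG can simply be split on whitespace. The first token is the allegiance and the rest is stellar data. That avoids the non-digit filter, so a code like the A5 on the Iyaaahai line survives intact. This is a sketch under that assumption only. It will misparse a line that has stellar data but no allegiance, and it does not handle extra columns.

```perl
use strict;
use warnings;

# Split the text following the PBG into (allegiance, stellar data),
# assuming the allegiance column is never empty when stars are present.
sub split_tail {
    my ($tail) = @_;
    $tail =~ s/^\s+|\s+$//g;                      # trim both ends
    my ($alleg, $stellar) = split /\s+/, $tail, 2;
    return ($alleg // '', $stellar // '');
}

my ($al, $star) = split_tail(' A5 M7 III M7 D');
print "alleg=$al stellar=$star\n";               # alleg=A5 stellar=M7 III M7 D
```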
 
Quote: There are sometimes other columns in there, or wacky allegiances like JP/J-, though.

J- : Julian Protectorate
Jp : Pirbarish Starlane (part of Julian Protectorate) (Assuming JP = Jp)
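A lookup table is one way to consume such codes. The sketch below holds only the two codes listed above. It folds case on the assumption that JP and Jp are the same code, and it splits a slash-separated compound code such as JP/J- into its parts. The table contents and the case-folding rule are illustrative assumptions, not a published standard.

```perl
use strict;
use warnings;

# Illustrative table with only the codes from this thread.
# Keys are upper-cased so that JP and Jp find the same entry.
my %ALLEGIANCE = (
    'J-' => 'Julian Protectorate',
    'JP' => 'Pirbarish Starlane (part of Julian Protectorate)',
);

# Expand a possibly compound code (e.g. "JP/J-") into names.
# Unknown parts come back unchanged with a trailing question mark.
sub allegiance_names {
    my ($code) = @_;
    my @names;
    for my $part (split m{/}, $code) {
        push @names, $ALLEGIANCE{uc $part} // "$part?";
    }
    return @names;
}

print join('; ', allegiance_names('JP/J-')), "\n";
```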

Years ago I found an amalgamated list of allegiance codes on the HIWG CD and used it in my Traveller Universe program (version 1.x). You can find it here.
 
Just to be clear for posterity, I was sharing an example of a 5-character, two-part allegiance code found in the wild, which makes consuming arbitrary data tricky.


(And yes, I've only seen it used for the Protectorate. Thanks, Hemdian!)
 
@Joshua: You're right.

Tash's Corollary: No sense in catering to bad data. Set the standard.
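In the spirit of that corollary, a strict front end can reject any line that breaks a fixed set of rules and send the complaints back to the submitter, rather than guessing. The rules below are a deliberately narrow, hypothetical sketch, not an official SEC specification. They require a non-empty name, a four-digit location, a UWP with a starport of A-E or X followed by extended-hex digits, a three-digit decimal PBG and a two-character allegiance. By design, this would also reject the extended PBG values mentioned earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical strict rules. Returns a list of complaints;
# an empty list means the line is accepted.
sub validate_fields {
    my (%args) = @_;
    my %f;
    $f{$_} = $args{$_} // '' for qw(name hex uwp pbg al);

    my @errors;
    push @errors, 'missing name'               unless length $f{name};
    push @errors, "bad hex location '$f{hex}'" unless $f{hex} =~ /^\d{4}$/;
    # Starport A-E or X, six extended-hex digits (no I or O), hyphen, tech level.
    push @errors, "bad UWP '$f{uwp}'"
        unless $f{uwp} =~ /^[A-EX][0-9A-HJ-NP-Z]{6}-[0-9A-HJ-NP-Z]$/;
    push @errors, "bad PBG '$f{pbg}'"          unless $f{pbg} =~ /^\d{3}$/;
    push @errors, "bad allegiance '$f{al}'"    unless $f{al} =~ /^\S{2}$/;
    return @errors;
}

my @problems = validate_fields(name => 'Zimigsika', hex => '0209',
                               uwp  => 'A77399B-C', pbg => '412', al => 'J-');
print(@problems ? join("\n", @problems) . "\n" : "accepted\n");
```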
 