
Legacy SEC line parser

robject

I took Joshua Bell's examples of aberrant SEC lines, and made a regex that will parse them all successfully.

WARNING: Joshua's examples are not comprehensive. However, they are interesting.
WARNING: I had to require that the allegiance code end in a non-digit, so allegiance codes that end in a digit (such as A5) are rejected. This is the main problem with parsing legacy data: an allegiance code can look just like the start of a spectral code.

Here's the essential regex:

my ($name, $hex, $uwp, $bases, $codes, $z, $pbg, $al, $star)
= /^(.*?)\.?\s*(\d{4})\s(.......-.)\s\s?(\w?)\s*(.*?)\s*([ARBUF]?)\s{0,2}(\d{3})\s*(.\D)?\s*(.*)$/;
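
The second warning is easiest to see by running the regex on two of Joshua's lines, one with the allegiance As and one with A5. This is only a minimal sketch: parse_sec is a hypothetical wrapper name introduced for illustration, and it returns just the two fields of interest.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the regex above to one line and returns
# the allegiance and stellar fields as a hash reference.
sub parse_sec {
    my ($line) = @_;
    my ($name, $hex, $uwp, $bases, $codes, $z, $pbg, $al, $star)
        = $line =~ /^(.*?)\.?\s*(\d{4})\s(.......-.)\s\s?(\w?)\s*(.*?)\s*([ARBUF]?)\s{0,2}(\d{3})\s*(.\D)?\s*(.*)$/;
    return unless defined $hex;    # the line did not match at all
    # The allegiance group is optional, so it can be undef after a match.
    return +{ alleg => defined $al ? $al : '', stellar => $star };
}

my $ok  = parse_sec('Eaaileishryaor2610 BAD57M6-E    Fl                 404 As M3 V');
my $bad = parse_sec('Iyaaahai      2609 C7C05L6-9    De Ni              905 A5 M7 III M7 D');

print "alleg=[$ok->{alleg}] stellar=[$ok->{stellar}]\n";    # alleg=[As] stellar=[M3 V]
print "alleg=[$bad->{alleg}] stellar=[$bad->{stellar}]\n";  # alleg=[] stellar=[A5 M7 III M7 D]
```

The A5 line still matches, but the allegiance comes back empty and A5 is swallowed into the stellar field, which is the failure mode described in the warning.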


Here is the entire Perl code, for context.

Code:
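use strict;
use warnings;

while (<DATA>)
{
   chomp;   # drop the newline so each input line is echoed only once
   print "$_\n";
   my ($name, $hex, $uwp, $bases, $codes, $z, $pbg, $al, $star)
   = /^(.*?)\.?\s*(\d{4})\s(.......-.)\s\s?(\w?)\s*(.*?)\s*([ARBUF]?)\s{0,2}(\d{3})\s*(.\D)?\s*(.*)$/;

   # Skip lines the regex cannot parse instead of printing undefined fields.
   next unless defined $hex;

   # The allegiance group is optional, so it may be undefined after a match.
   $al = '' unless defined $al;

   print "name          : $name\n"
       , "hex           : $hex\n"
       , "uwp           : $uwp\n"
       , "bases         : $bases\n"
       , "codes         : $codes\n"
       , "zone          : $z\n"
       , "pbg           : $pbg\n"
       , "alleg         : $al\n"
       , "stellar       : $star\n";

   print "\n";
}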

__DATA__
Iashplie      0701 B658452-C  Z Ni                 917 Zh     K9V M0D
Prianaf       1520 XA99000-0    Ba Lo Ni        F  003 Zh     F3V M7D
0103               0103 X348002-0   Lo Ni             020 Na F7 V
Eahyo              0131 X433731-0   Na Po             400 As M7 V M9 D
                     0639 X440000-0   Ba Lo Ni Po De         013 --
Nnurukgr             0705 E637900-5   Hi In                  213 AW
Kue Urzue          0207 C6568AB-5                   R 100 J- F2 V
Zimigsika          0209 A77399B-C J Hi In Cp          412 J- F0 V
.             3238 X610000-0    Ba Lo Ni           012 --     F4V
Rudzaghz      0139 D261610-2    Ni                 314 Va     G3V M3D
G-1           1715 A301735-9  N Ic Na Va           224 Na
G-2           1718 E7A0437-4    De Ni              604 Na
Dujj't'kzo                0101 E331000-0    Ba Ni Po                        703 Ia M3 III M4 V
Hastitan                  0104 E460343-4    De Lo Ni Po                     103 Na M2 V M4 V
Iyaaahai      2609 C7C05L6-9    De Ni              905 A5 M7 III M7 D
Eaaileishryaor2610 BAD57M6-E    Fl                 404 As M3 V
Ihkasya            1020 B547100-6   Ni                603 Na
Ftihahe.           1023 D5A07M8-C   De                403 As
 
Yeah, this is why I've given up on SEC. When I got started I had a very liberal parser and lots of special cases to clean things up afterwards (including I/1 and O/0). Now that I have a corpus I'm happy with I tell new submitters to fix things, since every time I tweaked the parser something else would break. :(
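
For anyone wondering what an I/1 and O/0 cleanup can look like, here is a minimal sketch. It is an assumption on my part, not the actual code from that old parser, and fix_digit_typos is a made-up name. It relies on the fact that Traveller's extended hex skips the letters I and O, so inside a UWP (or a hex number) they can only be typos for 1 and 0.

```perl
use strict;
use warnings;

# Hypothetical cleanup helper: replace the letters I and O with the
# digits 1 and 0 in a field that should only hold extended-hex digits.
sub fix_digit_typos {
    my ($field) = @_;
    (my $fixed = $field) =~ tr/IO/10/;   # I -> 1, O -> 0
    return $fixed;
}

print fix_digit_typos('A3O1735-9'), "\n";   # A301735-9
print fix_digit_typos('0I03'), "\n";        # 0103
```

This should only be applied to the UWP, hex, and PBG fields after the line has been split; running it over names or trade codes would corrupt them.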

Watch out for G as a travel zone (also in use in thalassogen's generator). Also, the B and G digits in PBG may be greater than 9, courtesy of some generators.
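
As a rough sketch of what looser field checks might look like: the two patterns below are assumptions, not anything taken from an existing parser. The zone pattern is the [ARBUF] class from the regex at the top of the thread plus G, and the PBG pattern assumes that values above 9 are written as a single extended-hex letter. zone_ok and pbg_ok are hypothetical names.

```perl
use strict;
use warnings;

# Assumed field patterns, looser than the regex at the top of the thread:
#   zone: the original [ARBUF] class plus G (an empty zone is allowed)
#   PBG : a population digit followed by two extended-hex characters
my $ZONE = qr/^[ARBUFG]?$/;
my $PBG  = qr/^\d[0-9A-Z]{2}$/;

sub zone_ok { my ($z) = @_; return $z =~ $ZONE ? 1 : 0 }
sub pbg_ok  { my ($p) = @_; return $p =~ $PBG  ? 1 : 0 }

printf "%-3s zone ok: %d\n", $_, zone_ok($_) for 'G', 'R', 'X';
printf "%-3s pbg  ok: %d\n", $_, pbg_ok($_)  for '917', '9A2', '91';
```

These check fields that have already been split out. Dropping the looser PBG pattern straight into the big regex is riskier, because a letter-bearing PBG is easier to confuse with trade codes.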

I've never run into files with a missing allegiance code but stellar data, so filtering out numbers hasn't been necessary (and there are all the V# and A# polities). There are sometimes other columns in there, or wacky allegiances like JP/J-, though.
 
There are sometimes other columns in there, or wacky allegiances like JP/J-, though.

J- : Julian Protectorate
Jp : Pirbarish Starlane (part of Julian Protectorate) (Assuming JP = Jp)

Years ago I found an amalgamated list on the HIWG CD and used it in my Traveller Universe program (version 1.x). You can find it here.
 
Just to be clear for posterity, I was sharing an example of a 5-character, 2-part allegiance code found in the wild, which makes consuming arbitrary data tricky.


(And yes, I've only seen it used for the Protectorate. Thanks Hemdian!)
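
To illustrate why that shape is awkward, here is a minimal sketch of an allegiance matcher that accepts both forms. The pattern is an assumption based only on the codes mentioned in this thread (two-character codes such as As, J-, -- and A5, plus an optional second part after a slash), and split_alleg is a hypothetical name.

```perl
use strict;
use warnings;

# Assumed shape: each part is two characters, the second of which may be
# a digit (V#, A#) or a dash (J-, --); an optional "/XX" adds a second part.
my $PART  = qr/[A-Za-z-][A-Za-z0-9-]/;
my $ALLEG = qr/^($PART)(?:\/($PART))?$/;

# Returns the one or two parts of a code, or an empty list if it does not fit.
sub split_alleg {
    my ($code) = @_;
    my @parts = $code =~ $ALLEG or return;
    return grep { defined } @parts;
}

print join(' + ', split_alleg($_)), "\n" for 'JP/J-', 'A5', '--';
```

Accepting A5 here is exactly what makes it collide with spectral types when the allegiance column is missing, so a matcher like this only helps once the column has been isolated some other way.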
 
@Joshua: You're right.

Tash's Corollary: No sense in catering to bad data. Set the standard.
 