I took Joshua Bell's examples of aberrant SEC lines and wrote a regex that parses them all successfully.
WARNING: Joshua's examples are not comprehensive. However, they are interesting.
WARNING: I had to require that the allegiance code end in a non-digit; hence, allegiance codes that end in a digit are rejected. This is the main problem with parsing legacy data: an allegiance code can look like a spectral code.
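To make that trade-off concrete, here is a minimal sketch that runs only the tail of the regex (PBG, allegiance, stellar data) against two tails taken from the sample lines. The helper name `split_tail` is made up for this illustration and is not part of the script.

```perl
use strict;
use warnings;

# Tail of the full regex only: PBG, optional allegiance, stellar data.
# split_tail is a hypothetical helper, not part of the original script.
sub split_tail {
    my ($text) = @_;
    return $text =~ /(\d{3})\s*(.\D)?\s*(.*)$/;
}

for my $tail ('917 Zh K9V M0D', '905 A5 M7 III M7 D') {
    my ($pbg, $al, $star) = split_tail($tail);
    $al //= '(none)';    # the optional group is undef when it is skipped
    print "pbg=$pbg alleg=$al stellar=$star\n";
}
```

For the first tail the allegiance comes out as `Zh`. For the second, `A5` (which appears to be an allegiance) ends in a digit, so the optional group is skipped and `A5` ends up at the front of the stellar field.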
Here's the essential regex:
my ($name, $hex, $uwp, $bases, $codes, $z, $pbg, $al, $star)
= /^(.*?)\.?\s*(\d{4})\s(.......-.)\s\s?(\w?)\s*(.*?)\s*([ARBUF]?)\s{0,2}(\d{3})\s*(.\D)?\s*(.*)$/;
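Since the one-line form is hard to read, here is the same pattern laid out with the `x` modifier and one comment per field. The field descriptions are my reading of the capture variable names, and `$sec_re` is just a name chosen for this sketch.

```perl
use strict;
use warnings;

# Same pattern as the one-liner, spread out with the x modifier.
my $sec_re = qr/
    ^ (.*?) \.? \s*       # name (lazy) and an optional trailing dot
    (\d{4}) \s            # hex: four digits
    (.......-.)           # UWP: seven characters, a dash, one more
    \s \s?                # one or two whitespace characters
    (\w?) \s*             # bases: at most one word character
    (.*?) \s*             # trade codes and remarks (lazy)
    ([ARBUF]?) \s{0,2}    # travel zone, if present
    (\d{3}) \s*           # PBG: three digits
    (.\D)? \s*            # allegiance: two characters, the second a non-digit
    (.*) $                # stellar data: the rest of the line
/x;

my @fields = 'Iashplie 0701 B658452-C Z Ni 917 Zh K9V M0D' =~ $sec_re;
print join('|', @fields), "\n";
```

The two lazy groups (name and trade codes) are what let the pattern cope with missing or run-together columns: each one grows only as far as needed for the fixed-shape fields after it (hex, PBG) to match.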
Here is the entire Perl code, for context.
Code:
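use strict;
use warnings;

while (my $line = <DATA>)
{
    chomp $line;    # drop the newline so the echo below does not double it
    print "$line\n";
    my ($name, $hex, $uwp, $bases, $codes, $z, $pbg, $al, $star)
        = $line =~ /^(.*?)\.?\s*(\d{4})\s(.......-.)\s\s?(\w?)\s*(.*?)\s*([ARBUF]?)\s{0,2}(\d{3})\s*(.\D)?\s*(.*)$/;
    # Skip lines the regex cannot parse instead of printing undefined fields.
    unless (defined $hex) { warn "no match: $line\n"; next; }
    $al //= '';    # allegiance is the only optional group, so it alone can be undef
    print "name : $name\n"
        , "hex : $hex\n"
        , "uwp : $uwp\n"
        , "bases : $bases\n"
        , "codes : $codes\n"
        , "zone : $z\n"
        , "pbg : $pbg\n"
        , "alleg : $al\n"
        , "stellar : $star\n";
    print "\n";
}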
__DATA__
Iashplie 0701 B658452-C Z Ni 917 Zh K9V M0D
Prianaf 1520 XA99000-0 Ba Lo Ni F 003 Zh F3V M7D
0103 0103 X348002-0 Lo Ni 020 Na F7 V
Eahyo 0131 X433731-0 Na Po 400 As M7 V M9 D
0639 X440000-0 Ba Lo Ni Po De 013 --
Nnurukgr 0705 E637900-5 Hi In 213 AW
Kue Urzue 0207 C6568AB-5 R 100 J- F2 V
Zimigsika 0209 A77399B-C J Hi In Cp 412 J- F0 V
. 3238 X610000-0 Ba Lo Ni 012 -- F4V
Rudzaghz 0139 D261610-2 Ni 314 Va G3V M3D
G-1 1715 A301735-9 N Ic Na Va 224 Na
G-2 1718 E7A0437-4 De Ni 604 Na
Dujj't'kzo 0101 E331000-0 Ba Ni Po 703 Ia M3 III M4 V
Hastitan 0104 E460343-4 De Lo Ni Po 103 Na M2 V M4 V
Iyaaahai 2609 C7C05L6-9 De Ni 905 A5 M7 III M7 D
Eaaileishryaor2610 BAD57M6-E Fl 404 As M3 V
Ihkasya 1020 B547100-6 Ni 603 Na
Ftihahe. 1023 D5A07M8-C De 403 As
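One possible way around the allegiance/spectral ambiguity mentioned in the warnings is to decide on whole tokens after the PBG rather than inside the regex. The following is only a heuristic sketch, not part of the script above: `split_allegiance` is a hypothetical helper, and the luminosity-class pattern is an assumption about how the stellar data is written. The first four inputs are tails from the sample lines; the last one, `F7 V` with no allegiance in front of it, is a made-up case.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the text that follows the PBG and returns
# (allegiance, stellar). Assumption: a two-character first token is an
# allegiance unless the token after it is a luminosity class (Ia, III, V, D),
# in which case the first token is treated as a spectral type such as "F7".
sub split_allegiance {
    my ($text) = @_;
    my @tok = split ' ', $text;
    my $al  = '';
    if (@tok && length($tok[0]) == 2
        && (@tok == 1 || $tok[1] !~ /^(?:[IV]+[ab]?|D)$/)) {
        $al = shift @tok;
    }
    return ($al, join(' ', @tok));
}

for my $text ('Zh K9V M0D', 'A5 M7 III M7 D', '-- F4V', 'Na', 'F7 V') {
    my ($al, $star) = split_allegiance($text);
    print "[$text] alleg=[$al] stellar=[$star]\n";
}
```

With this rule `A5` is kept as an allegiance because the next token, `M7`, is not a luminosity class, while `F7 V` is left entirely in the stellar field. It still guesses wrong for a lone spectral type with no luminosity class and no allegiance, so it narrows the problem rather than solving it.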