Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Sandbox

Comments on The 50 substrings that validate any string of Roman numerals [FINALIZED]

Post

The 50 substrings that validate any string of Roman numerals [FINALIZED]

+2
−0

Now posted: The 50 substrings that validate any string of Roman numerals


Given a string of Roman numerals, decide whether it forms a valid Roman number. If not, output the substring that proves this, from the list of 50 strings described below.

Relevant fact

This challenge is based around the following fact:

A string of Roman numerals is a valid Roman number if and only if it contains none of the following 50 strings as a substring:

"CCCC", "CCD", "CCM", "CDC", "CMC", "CMD", "CMM", "DCD", "DCM", "DD", "DM", "IC", "ID", "IIII", "IIV", "IIX", "IL", "IM", "IVI", "IXC", "IXI", "IXL", "IXV", "IXX", "LC", "LD", "LL", "LM", "LXC", "LXL", "MMMM", "VC", "VD", "VIV", "VIX", "VL", "VM", "VV", "VX", "XCC", "XCD", "XCL", "XCM", "XCX", "XD", "XLX", "XM", "XXC", "XXL", "XXXX"

This works for any length string, if a valid Roman number is defined as follows:

Valid Roman numbers[1]

  • Each numeral appears no more than 3 times consecutively.
  • Each of V (5), L (50), and D (500) appears no more than once consecutively.
  • A Roman number is constructed by concatenating the strings representing its thousands, hundreds, tens, and units components.
Decimal Thousands Hundreds Tens Units
1 M C X I
2 MM CC XX II
3 MMM CCC XXX III
4 CD XL IV
5 D L V
6 DC LX VI
7 DCC LXX VII
8 DCCC LXXX VIII
9 CM XC IX

So, for example, 2345 would be represented as the concatenation of MM (for 2000), CCC (for 300), XL (for 40), and V (for 5), or MMCCCXLV.

This defines a unique correct representation for each number from 1 to 3999 (4000 and above are not representable without breaking the first two rules). The 50 substrings method described above will identify each of these 3999 strings as valid, and all other strings of Roman numerals as invalid.

Input

  • A string containing only Roman numerals (I, V, X, L, C, D, M).
  • The string will have length at least 1.
  • The string will have length at most 15 (this is the length of the longest valid string of Roman numerals).

Output

  • If the input is a valid Roman number, output a consistent value indicating this.
    • Consistent means that the value must be the same for all valid Roman numbers.
    • The output for a valid Roman number must not be one of the strings from the list of 50.
  • If the input is not a valid Roman number, output exactly 1 string from the list of 50.
    • The output in this case must be a substring of the input.
    • If the input has 2 or more of the strings from the list of 50 as substrings, you may choose any 1 of them to be the output, but you must choose only 1 of them (you must not output 2 or more).

Examples

A valid Roman number

The input MCMXCVI is the unique correct representation of 1996. It contains none of the 50 strings.

A string that is not a valid Roman number

Although the input MMXDIII might be suspected of representing 2493, it is not the unique correct representation of this number (which is MMCDXCIII). Note that it has XD as a substring, identifying it as invalid. The only correct output is therefore XD.

An invalid string with more than 1 potential output

The input MMCCMDXXV has 2 substrings that make it invalid, so either CCM or CMD would be correct outputs. It would not be correct to output both of these, or to output their overlap CCMD, as this is not one of the 50 strings.

Test cases

Test cases are in the format INPUT : VALID, OUTPUTS. Note that only one of the valid outputs can be chosen - outputting 2 or more is incorrect.

I : VALID
V : VALID
X : VALID
L : VALID
C : VALID
D : VALID
M : VALID
II : VALID
VV : VV
XX : VALID
LL : LL
CC : VALID
DD : DD
MM : VALID
III : VALID
VII : VALID
IVI : IVI
IIV : IIV
CCI : VALID
CCV : VALID
CCX : VALID
CCL : VALID
CCC : VALID
CCD : CCD
CCM : CCM
IIII : IIII
MLDI : LD
MXXC : XXC
DCIIX : IIX
MCXXXX : XXXX
MCCCCXVI : CCCC
MMLXCVII : LXC
MMMCMXCIX : VALID
MMMDCCCLXXXVIII : VALID
MMCCCXLV : VALID
MCMXCVI : VALID
MMXDIII : XD
MMCDXCIII : VALID
MMCCMDXXV : CCM, CMD
XXX : VALID
CLLX : LL
DXXDMMV : DM, XD
CCDDDIMDD : DD, IM, CCD
VLCXIVXMCVXLC : VX, VL, LC, XM
DVLIILVCXVXVMLI : VX, VL, IL, VM, VC
VVDLMIVILXXDX : VV, VD, IL, XD, LM, IVI
DMXXCMILVCMLLMV : DM, IL, LL, VC, LM, XXC, XCM
CMXDVLCCDDLXLXC : DD, VL, LC, XD, XLX, LXC, CCD, LXL
XDIXCLLMVVLCMCM : VV, VL, LC, XD, LL, LM, IXC, CMC, XCL
IIXXCDVVLMILVDD : DD, VV, VL, VD, IL, LM, XXC, IXX, IIX, XCD
DDMIIXXCMCCDCMM : DD, DM, CMC, CMM, XXC, IXX, DCM, CDC, XCM, CCD, IIX

Scoring

This is a code golf challenge. Your score is the number of bytes in your code. Lowest score for each language wins.

Explanations are optional, but I'm more likely to upvote answers that have one.


  1. This is a common modern set of rules, described as Standard form on Wikipedia. It does not reflect all usages during history, but will be the basis of this challenge, since otherwise the 50 substrings approach does not work. ↩︎

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.

1 comment thread

Challenge type (3 comments)
Challenge type
trichoplax‭ wrote 9 months ago · edited 9 months ago

As written, the challenge is simply to validate a string of Roman numerals. I suspect that in most languages the golfiest way will be regex, and the list of 50 substrings will not be used. As such, they are irrelevant background info about Roman numerals, rather than being part of the challenge. I'm considering changing the requirements to make the challenge about the substrings.

One possible challenge type is Kolmogorov complexity: the challenge is to output the 50 strings in any order.

An alternative could be to output any set of strings that has this property, even if it is a set of more than 50 strings (in case there is a larger set that is golfier). However, I have no reason to believe that there would be alternative sets. The set of 50 strings is minimal - would there be reason to expect a larger set to be both valid and golfier in some languages? In languages where there is no golfier set, this would reduce to the Kolmogorov complexity challenge.

trichoplax‭ wrote 9 months ago

Another possible challenge:

Given a string of Roman numerals, output either "valid" or a string from the list of 50 that proves it invalid. This may involve the Kolmogorov complexity challenge as a subchallenge. Is this good or bad?

trichoplax‭ wrote 9 months ago

I have now edited to change the challenge to match the previous comment. Is there any problem with this? Anything to improve?