The 50 substrings that validate any string of Roman numerals

+8

−0

Given a string of Roman numerals, decide whether it forms a valid Roman number. If not, output the substring that proves this, from the list of 50 strings described below.

Relevant fact

This challenge is based around the following fact:

A string of Roman numerals is a valid Roman number if and only if it contains none of the following 50 strings as a substring:

"CCCC", "CCD", "CCM", "CDC", "CMC", "CMD", "CMM", "DCD", "DCM", "DD", "DM", "IC", "ID", "IIII", "IIV", "IIX", "IL", "IM", "IVI", "IXC", "IXI", "IXL", "IXV", "IXX", "LC", "LD", "LL", "LM", "LXC", "LXL", "MMMM", "VC", "VD", "VIV", "VIX", "VL", "VM", "VV", "VX", "XCC", "XCD", "XCL", "XCM", "XCX", "XD", "XLX", "XM", "XXC", "XXL", "XXXX"

This works for any length string, if a valid Roman number is defined as follows:

Valid Roman numbers^[1]

Each numeral appears no more than 3 times consecutively.
Each of V (5), L (50), and D (500) appears no more than once consecutively.
A Roman number is constructed by concatenating the strings representing its thousands, hundreds, tens, and units components.

Decimal	Thousands	Hundreds	Tens	Units
1	M	C	X	I
2	MM	CC	XX	II
3	MMM	CCC	XXX	III
4		CD	XL	IV
5		D	L	V
6		DC	LX	VI
7		DCC	LXX	VII
8		DCCC	LXXX	VIII
9		CM	XC	IX

So, for example, 2345 would be represented as the concatenation of MM (for 2000), CCC (for 300), XL (for 40), and V (for 5), or MMCCCXLV.

This defines a unique correct representation for each number from 1 to 3999 (4000 and above are not representable without breaking the first two rules). The 50 substrings method described above will identify each of these 3999 strings as valid, and all other strings of Roman numerals as invalid.

Input

A string containing only Roman numerals (I, V, X, L, C, D, M).
The string will have length at least 1.
The string will have length at most 15 (this is the length of the longest valid string of Roman numerals).
You may choose to take input in lower case instead, provided that you also use lower case in your output for an input that is not a valid Roman number.

Output

If the input is a valid Roman number, output a consistent value indicating this.
- Consistent means that the value must be the same for all valid Roman numbers.
- The output for a valid Roman number must not be one of the strings from the list of 50.
- Since the consistent value can be anything (and specifically does not need to contain Roman numerals), there is no requirement for it to be upper or lower case.
If the input is not a valid Roman number, output exactly 1 string from the list of 50.
- The output in this case must be a substring of the input.
- If the input has 2 or more of the strings from the list of 50 as substrings, you may choose any 1 of them to be the output, but you must choose only 1 of them (you must not output 2 or more).
- If you chose to take input in lower case, then this output string must also be in lower case.

Examples

A valid Roman number

The input MCMXCVI is the unique correct representation of 1996. It contains none of the 50 strings.

A string that is not a valid Roman number

Although the input MMXDIII might be suspected of representing 2493, it is not the unique correct representation of this number (which is MMCDXCIII). Note that it has XD as a substring, identifying it as invalid. The only correct output is therefore XD.

An invalid string with more than 1 potential output

The input MMCCMDXXV has 2 substrings that make it invalid, so either CCM or CMD would be correct outputs. It would not be correct to output both of these, or to output their overlap CCMD, as this is not one of the 50 strings.

Test cases

Test cases are in the format INPUT : VALID, OUTPUTS. Note that only one of the valid outputs can be chosen - outputting 2 or more is incorrect.

The output "VALID" is just an example - for an input that is a valid Roman number you may choose to output any consistent value distinct from the 50 strings.

Upper case test cases

These reflect the case used in the rest of the challenge wording, although there is no requirement to use upper case for this challenge.

I : VALID
V : VALID
X : VALID
L : VALID
C : VALID
D : VALID
M : VALID
II : VALID
VV : VV
XX : VALID
LL : LL
CC : VALID
DD : DD
MM : VALID
III : VALID
VII : VALID
IVI : IVI
IIV : IIV
CCI : VALID
CCV : VALID
CCX : VALID
CCL : VALID
CCC : VALID
CCD : CCD
CCM : CCM
IIII : IIII
MLDI : LD
MXXC : XXC
DCIIX : IIX
MCXXXX : XXXX
MCCCCXVI : CCCC
MMLXCVII : LXC
MMMCMXCIX : VALID
MMMDCCCLXXXVIII : VALID
MMCCCXLV : VALID
MCMXCVI : VALID
MMXDIII : XD
MMCDXCIII : VALID
MMCCMDXXV : CCM, CMD
XXX : VALID
CLLX : LL
DXXDMMV : DM, XD
CCDDDIMDD : DD, IM, CCD
VLCXIVXMCVXLC : VX, VL, LC, XM
DVLIILVCXVXVMLI : VX, VL, IL, VM, VC
VVDLMIVILXXDX : VV, VD, IL, XD, LM, IVI
DMXXCMILVCMLLMV : DM, IL, LL, VC, LM, XXC, XCM
CMXDVLCCDDLXLXC : DD, VL, LC, XD, XLX, LXC, CCD, LXL
XDIXCLLMVVLCMCM : VV, VL, LC, XD, LL, LM, IXC, CMC, XCL
IIXXCDVVLMILVDD : DD, VV, VL, VD, IL, LM, XXC, IXX, IIX, XCD
DDMIIXXCMCCDCMM : DD, DM, CMC, CMM, XXC, IXX, DCM, CDC, XCM, CCD, IIX

Lower case test cases

These are the same test cases in lower case, in case you can benefit from taking lower case input.

Note that if you take lower case input then you must also give lower case output where it is one of the 50 strings.

i : valid
v : valid
x : valid
l : valid
c : valid
d : valid
m : valid
ii : valid
vv : vv
xx : valid
ll : ll
cc : valid
dd : dd
mm : valid
iii : valid
vii : valid
ivi : ivi
iiv : iiv
cci : valid
ccv : valid
ccx : valid
ccl : valid
ccc : valid
ccd : ccd
ccm : ccm
iiii : iiii
mldi : ld
mxxc : xxc
dciix : iix
mcxxxx : xxxx
mccccxvi : cccc
mmlxcvii : lxc
mmmcmxcix : valid
mmmdccclxxxviii : valid
mmcccxlv : valid
mcmxcvi : valid
mmxdiii : xd
mmcdxciii : valid
mmccmdxxv : ccm, cmd
xxx : valid
cllx : ll
dxxdmmv : dm, xd
ccdddimdd : dd, im, ccd
vlcxivxmcvxlc : vx, vl, lc, xm
dvliilvcxvxvmli : vx, vl, il, vm, vc
vvdlmivilxxdx : vv, vd, il, xd, lm, ivi
dmxxcmilvcmllmv : dm, il, ll, vc, lm, xxc, xcm
cmxdvlccddlxlxc : dd, vl, lc, xd, xlx, lxc, ccd, lxl
xdixcllmvvlcmcm : vv, vl, lc, xd, ll, lm, ixc, cmc, xcl
iixxcdvvlmilvdd : dd, vv, vl, vd, il, lm, xxc, ixx, iix, xcd
ddmiixxcmccdcmm : dd, dm, cmc, cmm, xxc, ixx, dcm, cdc, xcm, ccd, iix

Scoring

This is a code golf challenge. Your score is the number of bytes in your code. Lowest score for each language wins.

Explanations are optional, but I'm more likely to upvote answers that have one.

This is a common modern set of rules, described as Standard form on Wikipedia. It does not reflect all usages during history, but will be the basis of this challenge, since otherwise the 50 substrings approach does not work. ↩︎

posted about 1 year ago

CC BY-SA 4.0

8mo ago

trichoplax‭

1215 reputation 98 23 156 502

Raw

Markdown

History

is a duplicate

This question has been asked before and has already been answered. It should be marked as a duplicate.

Please enter the URL of the proposed duplicate in the details field below.

not constructive

This question cannot be answered in a way that is helpful to anyone. It's not possible to learn something from possible answers, except for the solution for the specific problem of the asker.

+2

−0

[Python], 220 bytes Works w …

12mo ago

+2

−0

Python, 344 bytes 534 bytes …

11mo ago

+1

−0

[Bash], 205 bytes for s …

11mo ago

+1

−0

Rust, 258 254 bytes - Saved …

12mo ago

+1

−0

[Haskell], 271 bytes …

1y ago

+0

−0

Google Sheets, 155 bytes `` …

1y ago

+0

−0

JavaScript, 149 bytes ``` …

1y ago

1 comment thread

Casing (2 comments)

You are accessing this answer with a direct link, so it's being shown above all other answers regardless of its score. You can return to the normal view.

+0

−0

JavaScript, 149 bytes

t=>t.match(/CCCC|CC[DM]|CM[CDM]|DC[DM]|DD|DM|I[CDLM]|IIII|II[VX]|IVI|IX[CILVX]|L[CDLM]|LX[CL]|MMMM|V[CDLMVX]|VI[VX]|XCC[DLMX]|XD|XLX|XM|XX[CL]|XXXX/)

Simple regex. Returns null on valid strings.

Try it online!

posted about 1 year ago

CC BY 4.0

1y ago

--Hyde‭

1 reputation 0 2 0 2

Copy Link

Raw

Markdown

History

2 comment threads

Bug (2 comments)

Leaderboard (5 comments)

+2

−0

Python, 344 bytes

534 bytes:

This is not golfed in any way. ~~But it's the current winner!~~

def validate_roman_numeral(candidate):
    invalid_strings = ["CCCC", "CCD", "CCM", "CDC", "CMC", "CMD", "CMM", "DCD", "DCM", "DD", "DM", "IC", "ID", "IIII", "IIV", "IIX", "IL", "IM", "IVI", "IXC", "IXI", "IXL", "IXV", "IXX", "LC", "LD", "LL", "LM", "LXC", "LXL", "MMMM", "VC", "VD", "VIV", "VIX", "VL", "VM", "VV", "VX", "XCC", "XCD", "XCL", "XCM", "XCX", "XD", "XLX", "XM", "XXC", "XXL", "XXXX"]
    for invalid_string in invalid_strings:
        if invalid_string in candidate:
            return invalid_string
    return "VALID"

379 bytes: Here's a minimal effort golf of the above:

def v(c):
    i=['DD','DM','IC','ID','IL','IM','LC','LD','LL','LM','VC','VD','VL','VM','VV','VX','XD','XM','CCD','CCM','CDC','CMC','CMD','CMM','DCD','DCM','IIV','IIX','IVI','IXC','IXI','IXL','IXV','IXX','LXC','LXL','VIV','VIX','XCC','XCD','XCL','XCM','XCX','XLX','XXC','XXL','CCCC','IIII','MMMM','XXXX']
    for s in i:
        if s in c:
            return s
    return "VALID"

344 bytes: A little more effort:

def v(c):
 o="VALID"
 for s in 'DD','DM','IC','ID','IL','IM','LC','LD','LL','LM','VC','VD','VL','VM','VV','VX','XD','XM','CCD','CCM','CDC','CMC','CMD','CMM','DCD','DCM','IIV','IIX','IVI','IXC','IXI','IXL','IXV','IXX','LXC','LXL','VIV','VIX','XCC','XCD','XCL','XCM','XCX','XLX','XXC','XXL','C'*4,'I'*4,'M'*4,'X'*4:
  if s in c:
   o=s
 return o

posted about 1 year ago

CC BY-SA 4.0

11mo ago

daviewales‭

21 reputation 0 1 2 3

Copy Link

Raw

Markdown

History

1 comment thread

Output for valid inputs is flexible (2 comments)

+2

−0

Python, 220 bytes

Works with Python 3.8 or newer.

n=" IVXLCDM"
x=("".join([n[(d:=ord(c)-32)//8]+n[d-8*(d//8)]for c in'&PW!H.!@/$HF$@G"H6"@7"03#P?%N%O%U%]%^%_&N&O!*!+!1!=!9!<!:!;$=$<"*"+#M#N#L#O#K#C#=#<%MH))\'_X;;']))
r=lambda c:max(s if s in c else"A"for s in x.split())

Try it online!

There are 8 kind of characters, " IVXLCDM". 2 characters are 6 bits that can be easily mapped to printable characters (after code point 32). This string is much shorter (half) then the "DD DM IC ..." string itself. Even if I add the extraction it is still somewhat shorter.

I just regenerate the " IVXLCDM" characters from this string.

So it is the same as the code below just with some decoding from 6-bit-string.

238 bytes

If the r= part is not needed then 2 bytes less

r=lambda c:max(s if s in c else"A"for s in'DD DM IC ID IL IM LC LD LL LM VC VD VL VM VV VX XD XM CCD CCM CDC CMC CMD CMM DCD DCM IIV IIX IVI IXC IXI IXL IXV IXX LXC LXL VIV VIX XCC XCD XCL XCM XCX XLX XXC XXL CCCC IIII MMMM XXXX'.split())

Try it online!

posted about 1 year ago

CC BY-SA 4.0

12mo ago by trichoplax‭

Arpad Horvath‭

191 reputation 0 10 19 41

Copy Link

Raw

Markdown

History

1 comment thread

Specific python version (6 comments)

+1

−0

Bash, 205 bytes

for s in C{C{CC,D,M},DC,MC} {CM,DC,D}{D,M} I{C,D,I{II,V,X},L,M,VI,X{C,I,L,V,X}} L{C,D,L,M,XC,XL} MMMM V{C,D,IV,IX,L,M,V,X} X{C{C,D,L,M,X},D,LX,M,X{C,L,XX}}
do echo $1|grep -q $s&&echo $s&&exit
done
echo T

The golfing is done mostly in the list of 50 strings. Other than that, the code is pretty straightforward, with the only additional golfing being the removal of unnecessary whitespace and use of a single-letter variable name.

Try it online!

posted 11 months ago

CC BY-SA 4.0

celtschk‭

1504 reputation 28 50 190 74

Copy Link

Raw

Markdown

History

0 comment threads

+1

−0

Haskell, 271 bytes

import Data.Char
import Data.List
n=" IVXLCDM"
r x=maximum[if(s`isInfixOf`x)then s else"A"|s<-words$concat$map(\x->(n!!div x 8):[(n!!(x-8*(div x 8)))])$map(\x->ord x-32)"&PW!H.!@/$HF$@G\"H6\"@7\"03#P?%N%O%U%]%^%_&N&O!*!+!1!=!9!<!:!;$=$<\"*\"+#M#N#L#O#K#C#=#<%MH))\'_X;;"]

Try it online!

posted about 1 year ago

CC BY-SA 4.0

Arpad Horvath‭

191 reputation 0 10 19 41

Copy Link

Raw

Markdown

History

0 comment threads

+1

−0

Rust, 258 254 bytes

Saved four bytes thanks to trichoplax.

fn v(s:&str)->&str{for b in "CCCC
CCD
CCM
CDC
CMC
CMD
CMM
DCD
DCM
DD
DM
IC
ID
IIII
IIV
IIX
IL
IM
IVI
IXC
IXI
IXL
IXV
IXX
LC
LD
LL
LM
LXC
LXL
MMMM
VC
VD
VIV
VIX
VL
VM
VV
VX
XCC
XCD
XCL
XCM
XCX
XD
XLX
XM
XXC
XXL
XXXX".lines(){if s.contains(b){return b}}""}

I tried using a closure instead of a function, but I got lifetime errors which, by the time they're fixed, probably wouldn't make the code any shorter.

Try it online!

posted 12 months ago

CC BY-SA 4.0

12mo ago

Sylvester‭

131 reputation 4 2 18 18

Copy Link

Raw

Markdown

History

3 comment threads

Golfing the output (2 comments)

Typo (2 comments)

A sometimes golfier `split` (1 comment)

+0

−0

Google Sheets, 155 bytes

=regexextract(A1,"CCCC|CC[DM]|CM[CDM]|DC[DM]|DD|DM|I[CDLM]|IIII|II[VX]|IVI|IX[CILVX]|L[CDLM]|LX[CL]|MMMM|V[CDLMVX]|VI[VX]|XCC[DLMX]|XD|XLX|XM|XX[CL]|XXXX")

The same regex as in my JavaScript answer.

Returns an error on valid strings.

posted about 1 year ago

CC BY 4.0

--Hyde‭

1 reputation 0 2 0 2

Copy Link

Raw

Markdown

History

1 comment thread

Incorrect output for input "XCCD" (1 comment)

Communities

The 50 substrings that validate any string of Roman numerals

Relevant fact

Valid Roman numbers^[1]

Input

Output

Examples

A valid Roman number

A string that is not a valid Roman number

An invalid string with more than 1 potential output

Test cases

Upper case test cases

Lower case test cases

Scoring

1 comment thread

7 answers

JavaScript, 149 bytes

2 comment threads

Python, 344 bytes

1 comment thread

Python, 220 bytes

238 bytes

1 comment thread

Bash, 205 bytes

0 comment threads

Haskell, 271 bytes

0 comment threads

Rust, 258 254 bytes

3 comment threads

Google Sheets, 155 bytes

1 comment thread

Communities

The 50 substrings that validate any string of Roman numerals

Relevant fact

Valid Roman numbers[1]

Input

Output

Examples

A valid Roman number

A string that is not a valid Roman number

An invalid string with more than 1 potential output

Test cases

Upper case test cases

Lower case test cases

Scoring

1 comment thread

7 answers

JavaScript, 149 bytes

2 comment threads

Python, 344 bytes

1 comment thread

Python, 220 bytes

238 bytes

1 comment thread

Bash, 205 bytes

0 comment threads

Haskell, 271 bytes

0 comment threads

Rust, 258 254 bytes

3 comment threads

Google Sheets, 155 bytes

1 comment thread

Valid Roman numbers^[1]