This sounds like the attempts at natural language programming, sort of?
Kind of. I'm importing a long set of rules for a tabletop game and it would be impractical to implement
all of them manually.
So here's what I did:
First I grabbed all the inputs and normalized away minor differences in phrasing that were easy to remove and could trip up the algorithm. I split each input into words and punctuation; this is partly to speed up the next step. I then applied a longest common subsequence algorithm to each pair of inputs, which gives me a cheap measure of how similar the two inputs are. All inputs that are closer to each other than an empirically chosen threshold are grouped together and excluded from comparisons in later loops (this is to speed up the process, since LCS is somewhat expensive and I'm doing it ~100M times).
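In code, the grouping step looks roughly like this (a simplified sketch; the tokenizer, the similarity normalization, and the 0.8 threshold are illustrative stand-ins, not the exact ones I used):

    import re

    def tokenize(text):
        # split into words and punctuation so trivial whitespace differences don't matter
        return re.findall(r"\w+|[^\w\s]", text.lower())

    def lcs_len(a, b):
        # length of the longest common subsequence, O(len(a)*len(b)) time, O(len(b)) space
        prev = [0] * (len(b) + 1)
        for x in a:
            cur = [0]
            for j, y in enumerate(b, 1):
                cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
            prev = cur
        return prev[-1]

    def similarity(a, b):
        # fraction of the longer input covered by the LCS (one possible normalization)
        return lcs_len(a, b) / max(len(a), len(b))

    def group_inputs(texts, threshold=0.8):
        # greedy grouping: the first ungrouped input seeds a group, anything similar
        # enough joins it and is skipped in later passes
        tokens = [tokenize(t) for t in texts]
        groups, assigned = [], set()
        for i in range(len(texts)):
            if i in assigned:
                continue
            bucket = [i]
            assigned.add(i)
            for j in range(i + 1, len(texts)):
                if j not in assigned and similarity(tokens[i], tokens[j]) >= threshold:
                    bucket.append(j)
                    assigned.add(j)
            groups.append(bucket)
        return groups

With ~16,100 inputs that's on the order of 16,100²/2 ≈ 130M pairwise comparisons in the worst case, which is why skipping already-grouped inputs matters.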
Using this algorithm, in the first group I get the merged string
Flip a coin. If heads, the other player takes {1|2|3|4} extra damage.
plus some other similar strings:
Flip a coin. If heads, the other player takes {1|2|3|4} extra damage {and can't attack the next turn}.
Flip a coin. If heads, the other player takes {1|2|3|4} extra damage{, if tails you can't attack in your next turn}.
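For illustration, here's roughly how a pairwise merge like that can be built on top of the same LCS machinery: shared tokens pass through, differing runs become {x|y}, and a run present on only one side becomes {x}. This is a simplified sketch of the idea, not the exact code I used; folding it across a whole group and collapsing alternatives like {1|2|3|4} needs more bookkeeping, and the token join here is crude about spacing and apostrophes.

    import re

    def tokenize(text):
        return re.findall(r"\w+|[^\w\s]", text)

    def lcs_pairs(a, b):
        # DP table plus backtracking to recover the matching token positions
        n, m = len(a), len(b)
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n):
            for j in range(m):
                dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] else max(dp[i][j + 1], dp[i + 1][j])
        i, j, pairs = n, m, []
        while i > 0 and j > 0:
            if a[i - 1] == b[j - 1]:
                pairs.append((i - 1, j - 1)); i -= 1; j -= 1
            elif dp[i - 1][j] >= dp[i][j - 1]:
                i -= 1
            else:
                j -= 1
        return pairs[::-1]

    def merge_pair(a, b):
        # shared tokens pass through; differing runs become {x|y} or {x}
        out, pi, pj = [], 0, 0
        for i, j in lcs_pairs(a, b) + [(len(a), len(b))]:
            gap_a, gap_b = " ".join(a[pi:i]), " ".join(b[pj:j])
            if gap_a or gap_b:
                out.append("{%s|%s}" % (gap_a, gap_b) if gap_a and gap_b else "{%s}" % (gap_a or gap_b))
            if i < len(a):
                out.append(a[i])
            pi, pj = i + 1, j + 1
        return " ".join(out)

    s1 = "Flip a coin. If heads, the other player takes 1 extra damage."
    s2 = "Flip a coin. If heads, the other player takes 2 extra damage and can't attack the next turn."
    print(merge_pair(tokenize(s1), tokenize(s2)))
    # -> Flip a coin . If heads , the other player takes {1|2} extra damage {and can ' t attack the next turn} .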
I managed to reduce the problem from ~16100 cases to ~1300 cases that can be handled automatically (some are probably too small to be worth handling this way) and ~1300 cases that must be handled specially. It's still a lot of work, but it's like an order of magnitude smaller.