Accepted answer

Redefine whitespace; exclude newline from it.

override val whiteSpace = """[ \t]+""".r

I'm not sure if this is considered good practice. Please have a look at this thread for further discussion and inspiration: Scala parser combinators and newline-delimited text
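To make the effect of the override concrete, here is a minimal sketch. The grammar (lines of lowercase words) and the names LineWordsParser, word, line, lines and parseLines are made up for illustration and are not taken from the question; only whiteSpace, rep, rep1 and parseAll come from RegexParsers.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical grammar: each line is a run of lowercase words ended by a newline.
object LineWordsParser extends RegexParsers {
  // Only spaces and tabs are skipped, so "\n" stays visible to the grammar.
  override val whiteSpace = """[ \t]+""".r

  def word: Parser[String] = """[a-z]+""".r

  // Because newline is no longer whitespace, it can be matched as a literal.
  def line: Parser[List[String]] = rep1(word) <~ "\n"

  def lines: Parser[List[List[String]]] = rep(line)

  // Convenience wrapper: Some(result) on success, None on any parse failure.
  def parseLines(input: String): Option[List[List[String]]] =
    parseAll(lines, input) match {
      case Success(result, _) => Some(result)
      case _                  => None
    }
}
```

With the default whiteSpace the literal "\n" in line would never match, because the newline would already have been skipped before the literal is tried.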

EDIT: further refinement based on input from OP; see also the comments made earlier.

In this particular DSL, some statements (the declarations) are terminated by newline, while other statements consider newline as whitespace, merely separating tokens and to be ignored by the parser.

This inconsistent interpretation of newline may be too complicated for a simple regex. So in this case, instead of overriding the variable val whiteSpace, override the method def handleWhiteSpace; here you can programmatically determine what is to be considered as whitespace. The easiest approach seems to be to define a global modifiable variable (var foo: Boolean) that is toggled on and off by the tokenizer/parser, based on the type of statement that is being parsed. Your implementation of handleWhiteSpace can then use this variable to adjust its behaviour accordingly.

The new implementation of handleWhiteSpace can be a copy of the original handleWhiteSpace, where the unmodifiable whiteSpace is replaced by an expression that dynamically switches between two regular expressions (one matching all whitespace including newline, the other excluding newline), depending on the value of your global variable. If possible, you may want to make better use of inheritance and call super.handleWhiteSpace in either one of these cases.
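A rough sketch of that idea follows, assuming a made-up grammar in which declarations are newline-terminated lists of ?variables and assignments may span several lines. The flag newlineIsSignificant, the helper newlineTerminated and all grammar rules are hypothetical names chosen for this example; handleWhiteSpace, Parser { ... } and parseAll are the library's own. Instead of copying the original method body, it uses a second regex while the flag is set and falls back to super.handleWhiteSpace otherwise.

```scala
import scala.util.parsing.combinator.RegexParsers

object ToggledWhitespaceParser extends RegexParsers {
  // Hypothetical flag: true while a newline-terminated declaration is being parsed.
  private var newlineIsSignificant = false

  // Whitespace that excludes newline, used only while the flag is set.
  private val inlineSpace = """[ \t]+""".r

  override protected def handleWhiteSpace(source: CharSequence, offset: Int): Int =
    if (newlineIsSignificant)
      inlineSpace.findPrefixOf(source.subSequence(offset, source.length())) match {
        case Some(skipped) => offset + skipped.length
        case None          => offset
      }
    else super.handleWhiteSpace(source, offset) // default: newline is whitespace

  // Runs p with the flag set and restores the previous value afterwards,
  // so a failed or backtracked declaration cannot leave the flag switched on.
  def newlineTerminated[T](p: => Parser[T]): Parser[T] = Parser { in =>
    val saved = newlineIsSignificant
    newlineIsSignificant = true
    try p(in) finally newlineIsSignificant = saved
  }

  def variable: Parser[String] = """\?[a-z]+""".r
  def word: Parser[String] = """[a-z]+""".r

  // Newline ends a declaration ...
  def declaration: Parser[List[String]] = newlineTerminated(rep1(variable) <~ "\n")

  // ... but is plain whitespace inside an assignment.
  def assignment: Parser[List[String]] =
    word ~ ("=" ~> word) ^^ { case left ~ right => List(left, right) }

  def document: Parser[List[List[String]]] = rep(declaration | assignment)

  def parseDocument(input: String): Option[List[List[String]]] =
    parseAll(document, input) match {
      case Success(result, _) => Some(result)
      case _                  => None
    }
}
```

Keep in mind that the mutable flag makes the parser object unsafe to share between threads, which is one reason to prefer the simpler alternatives where they apply.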


I would strongly suggest keeping the syntax of your language so that newlines are not significant.

In my experience, making whitespace significant either leads to confusion on the part of users who don't know that it is, or to much more complex specification and processing of the syntax.

I think it's rarely worth the trouble. In particular, with the Scala RegexParsers it is non-trivial to ignore whitespace in most places but make it significant in others (and be sure that you did it correctly).

Two suggestions for syntax variation:

a) add semicolons or some other terminator on the end of declarations, or

b) add commas or some other separator between the multiple varnames in a single declaration.

The modifications to your parser would be trivial and then you can move onto more interesting issues :-)
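As an illustration of suggestion (a), a parser for semicolon-terminated declarations could look roughly like this. The Declaration case class, the token regexes and the parser names are assumptions made for the sketch, not the OP's actual definitions.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical result type: variable names plus an optional type name.
final case class Declaration(names: List[String], typeName: Option[String])

object TerminatedDeclarations extends RegexParsers {
  // The default whiteSpace is kept, so newlines are skipped like any other whitespace.
  def variable: Parser[String] = """\?[a-z]+""".r
  def typeName: Parser[String] = """[a-z]+""".r

  // Variation (a): every declaration ends with ";".
  def declaration: Parser[Declaration] =
    rep1(variable) ~ opt("-" ~> typeName) <~ ";" ^^ {
      case names ~ tpe => Declaration(names, tpe)
    }

  def declarations: Parser[List[Declaration]] = rep(declaration)

  def parseDeclarations(input: String): Option[List[Declaration]] =
    parseAll(declarations, input) match {
      case Success(result, _) => Some(result)
      case _                  => None
    }
}
```

Since the terminator carries the structure, a declaration may now be laid out on one line or several without changing the result.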


Here's an alternative approach for you to consider.

Write a simple preprocessor that appends a separator token (something you make up yourself) to every line that starts with a question mark. Now your parser can ignore all newlines, because every declaration is terminated by an explicit token.
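Such a preprocessor can be a few lines of plain Scala. In this sketch the terminator ";" and the names DeclarationPreprocessor and markDeclarations are placeholders; it also assumes "\n" line endings and tolerates leading indentation before the question mark.

```scala
object DeclarationPreprocessor {
  // Hypothetical terminator; pick any token that cannot occur in the DSL itself.
  val Terminator = ";"

  // Appends the terminator to every line whose first non-blank character is '?'.
  // All other lines are passed through unchanged.
  def markDeclarations(source: String): String =
    source
      .split("\n", -1) // limit -1 keeps trailing empty lines
      .map(line => if (line.trim.startsWith("?")) line + " " + Terminator else line)
      .mkString("\n")
}
```

The output can then be fed to a parser that keeps the default whiteSpace and expects the terminator at the end of each declaration.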


Since the format you are trying to parse treats newline as a significant token, you will have to take newlines into account while parsing.
On the bright side, this is easily done and not so different from your original code. Try making the parser line-aware by adding a typed_variables_line function, then define the document as a list of these lines. You also want to allow empty lines; I have added a rule for this as well.

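override val whiteSpace = """[ \t]+""".r

// A document is a sequence of declaration lines and empty lines.
def typed_list_variables: Parser[List[LiftedTerm]] = rep(typed_variables_line | empty_line) ^^ {
    case list => list.flatten
}

// One line: variables, an optional "- type", then an optional newline (the last line may lack one).
def typed_variables_line: Parser[List[LiftedTerm]] = rep1(variable) ~ opt("-" ~> primitive_type) <~ opt("\n") ^^ {
    case vars ~ None => vars.map(varName => LiftedTerm(varName, ObjectType))
    case vars ~ Some(primTypeName) => vars.map(varName => LiftedTerm(varName, TermType(primTypeName)))
}

// An empty line contributes no terms.
def empty_line: Parser[List[LiftedTerm]] = "\n" ^^ {
    case nothing => List()
}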
