Sunday, September 20, 2009

Quotes and punctuation: What do programming languages have to teach us about natural languages?

It annoys me when programmers don't use proper grammar. Often when I correct a programmer's grammar, they respond with something along the lines of, "I'm a math person, not a verbal person." To that my response is, "If that's true, you're not a very good programmer." For the record, most of these people are good programmers, because they are verbal people. Programming itself requires good math and verbal skills; there's a reason that so many programming gurus recommend Strunk and White's The Elements of Style. Programming is a form of writing. If a good programmer doesn't use good grammar, it's not because they aren't capable; it's because they aren't trying.

As the preceding rant shows, I take grammar very seriously. Admittedly, I have a bit more of a burden in this area than the average programmer; I'm developing my own programming language and would love to work professionally in programming language design, so grammars are obviously an area of interest for me.

Part of my research involves finding ways to make programming languages imitate natural language. The logical, mathematical aspects of programming languages are still important, because the ideas that a programmer is recording for compilation are inherently mathematical. But humans speak and communicate most easily in natural languages, so it only makes sense that we would want programming languages to imitate natural languages to some extent.

Sometimes, however, my brain reverses the process a bit, and I start thinking about how natural languages could learn something from programming languages.

One example is my thinking on quotation marks. Take a look at this bit of dialogue:
"Are you feeling better?" she asked. "I saw what happened to you. I was with you on the beach all that long night but could not reach you."
"My fugue," I said. "You took it."

This demonstrates a few issues with quotation marks. To highlight the issues, let's extract the dialogue into the form of an IM chat log:
Anna: Are you feeling better? I saw what happened to you. I was with you on the beach all that long night but could not reach you.
William: My fugue. You took it.

This quote is taken from The Empire of Ice Cream, an excellent short story by Jeffrey Ford. Ford is an American author, so his use of quotations follows the American convention of placing sentence-ending punctuation inside the quotation marks. If he were a British author, the sentence-ending punctuation would be outside the quotation marks, according to British convention. There's some debate about which convention is better, but most people will say that you can use either as long as you're consistent or that you should use the convention of whatever organization is publishing your writing. While the latter recommendation is pragmatic, the former could use a little work; neither convention makes sense.

As a programmer, I think of open and close quotes as changes in scope. Punctuation inside the quotation marks is in a scope belonging to the quoted person. The text in the IM version of our conversation contains only the text in the quoted scope (Anna and William's scope). Anything outside the quotes is in the author's scope (Ford's scope).
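To make the scope idea concrete, here is a minimal Python sketch that treats every double quote as a scope toggle. The function name `split_scopes` and the scope labels "author" and "quoted" are made up for illustration, and the sketch assumes straight double quotes with no nesting.

```python
def split_scopes(text):
    """Split text into (scope, content) pairs, switching scope at each double quote."""
    segments = []
    scope = "author"      # text starts in the author's scope
    current = []
    for ch in text:
        if ch == '"':
            # A quote mark closes the current scope and opens the other one.
            if current:
                segments.append((scope, "".join(current)))
                current = []
            scope = "quoted" if scope == "author" else "author"
        else:
            current.append(ch)
    if current:
        segments.append((scope, "".join(current)))
    return segments


line = '"My fugue," I said. "You took it."'
for scope, content in split_scopes(line):
    print(f"{scope}: {content.strip()}")
# quoted: My fugue,
# author: I said.
# quoted: You took it.
```

Everything labeled "quoted" is what would appear in the IM log; everything labeled "author" belongs to Ford.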

This should also apply to punctuation. Punctuation inside the quotes should only make sense in Anna and William's scope, and punctuation outside the quotes should only make sense in Ford's scope. From this we can see that the American convention has two problems:
"Are you feeling better?" she asked. "I saw what happened to you. I was with you on the beach all that long night but could not reach you."[1]
"My fugue,[2]" I said. "You took it."

  1. There is no indication that Ford's sentence has ended.

  2. William's sentence ended here, but a comma doesn't indicate an end of sentence (look at the IM version of the conversation).

The British convention is worse (in this case):
"Are you feeling better?"[1] she asked. "I saw what happened to you. I was with you on the beach all that long night but could not reach you".[2]
"My fugue",[3] I said. "You took it".

  1. Putting the question mark inside the quotation is inconsistent with the way periods are handled.

  2. There is no indication that Anna's sentence ended.

  3. In addition to having no indication that William's sentence ended, the comma here indicates that Ford's sentence has continued. This is inconsistent with the lack of such indication after the question mark.

If, instead, we follow the idea of punctuating both the quoted sentence and the author's sentence, we get:
"Are you feeling better?" she asked. "I saw what happened to you. I was with you on the beach all that long night but could not reach you.".
"My fugue." I said. "You took it.".

The period-quote-period looks ugly, but it leaves no ambiguity about whose sentence has ended. One might make an argument for placing commas after '"Are you feeling better?"' and '"My fugue."', but I tend toward using less punctuation if (and only if) the punctuation doesn't reduce ambiguity.
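The rule of punctuating both sentences can be sketched as a small Python function. This is only an illustration under my own assumptions: the name `render_sentence`, the (scope, text) pair representation, and the default terminator are all hypothetical. Quoted text carries its own punctuation inside the quotation marks, and the author's sentence gets its own terminator outside them.

```python
def render_sentence(parts, terminator="."):
    """Render one author sentence from (scope, text) pairs.

    Quoted parts keep their own punctuation inside the quotation marks;
    the author's terminator is always appended outside them.
    """
    pieces = []
    for scope, text in parts:
        pieces.append(f'"{text}"' if scope == "quoted" else text)
    return " ".join(pieces) + terminator


# Two author sentences: the second consists of nothing but a quotation.
william = [
    [("quoted", "My fugue."), ("author", "I said")],
    [("quoted", "You took it.")],
]
print(" ".join(render_sentence(sentence) for sentence in william))
# "My fugue." I said. "You took it.".
```

Because the two scopes are punctuated independently, the ugly period-quote-period falls out of the rule on its own: the first period ends William's sentence and the second ends Ford's.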

This is just the kind of nonsense that keeps me up at night. Are there any other places where natural language could learn from the less ambiguous grammars of programming languages? Let me know in the comments.
