The accurate formulation of boolean expressions is a notorious problem in programming languages and database query tools. This paper studies the ways that untrained users naturally express and interpret queries, revealing some of the underlying reasons why this task is so difficult. Among the study '
Introduction

We are applying human-computer interaction techniques to the design of new programming language features. Design decisions are resolved by looking to prior research for warnings about potential problems, suggestions for design alternatives, and guidance in selecting among various potential solutions. When necessary, we perform new user studies to investigate questions that are not fully addressed by prior research.

We are currently using this method to design a primarily textual programming system for children to use in creating interactive simulations and games. One of the challenges of this effort is to craft the features of the system to address the problems observed in the prior research. While this is straightforward in some cases, it is quite difficult in others.

One of the more notorious problem areas in programming languages is the accurate specification of boolean expressions [1]. In addition to programming languages, this same problem also appears in the task of formulating queries for common end-user activities such as web searching, library catalog searching, and other database retrieval tasks [2]. Despite the great difficulty that users have demonstrated with using the boolean operators AND, OR, and NOT to construct these expressions, no universally better alternatives have been discovered, so most programming languages continue to rely on them, including many visual and forms-based languages (e.g., [3,4]). Early web search engines also used these operators, although many have turned to less expressive query languages (for example, the plus and minus unary operators for inclusion and exclusion). Newsweek reports that even with these simplifications, most web users are dissatisfied with search engines, and less than 6% manage to use these operators in their searches [5]. The problems with boolean queries are exemplified in studies of non-programmers writing solutions to programming problems in their own words [6,7].
For example, in these studies it was very common for participants to use the word AND where the word OR is the correct boolean operator. Instead of saying something like "count the cars with license plates from Georgia or Louisiana" they would say "count the cars with license plates from Georgia and Louisiana." The latter version refers to an empty set of license plates when interpreted according to boolean logic, but in English it is usually interpreted to mean the union of the two states' license plates.1 It was also noted that the words OR and NOT rarely appeared, suggesting that boolean expressions are not a natural way to formulate these statements. The participants often used other words and sentence structures to specify their queries accurately. For example, rathe...
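
To make the license plate example above concrete, the following minimal Python sketch contrasts the literal boolean reading of the AND phrasing with the union that English speakers usually intend. The sample data and the function names are hypothetical illustrations and are not taken from the cited studies.

```python
# Hypothetical sample data: each car is represented only by the
# state on its license plate.
cars = ["Georgia", "Louisiana", "Texas", "Georgia", "Ohio"]


def count_literal_and(plates):
    """Literal boolean reading of 'from Georgia and Louisiana'.

    A single plate cannot be from both states at once, so this
    condition is never true and the count is always zero.
    """
    return sum(
        1 for state in plates
        if state == "Georgia" and state == "Louisiana"
    )


def count_intended_or(plates):
    """Reading most English speakers intend: the union of both states."""
    return sum(
        1 for state in plates
        if state == "Georgia" or state == "Louisiana"
    )


print(count_literal_and(cars))  # 0
print(count_intended_or(cars))  # 3
```

The two functions differ only in the operator, which illustrates why a system that interprets a natural-language "and" as boolean AND can silently return an empty result.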