The impact of vocabulary normalization

Binkley, David; Lawrie, Dawn

doi:10.1002/smr.1710

Cited by 9 publications

(4 citation statements)

References 38 publications

“…Considerable attention has been devoted to the issue of name lengths, and in particular to whether abbreviations cause a disadvantage relative to using full words [5], [15], [16], [17], [19], [20]. Another favorite topic is controlling the vocabulary used in variable names, and how this reduces ambiguity [4], [7], [9], [10]. Feitelson et al investigated how variables names are selected [11]. One of their results was that naming is very variable, with little chance that different developers would select exactly the same name for the same variable.…”

Section: Related Work (mentioning)

confidence: 99%

Does Code Structure Affect Comprehension? On Using and Naming Intermediate Variables

Cates, Yunik, Feitelson (2021). Preprint.

Intermediate variables can be used to break complex expressions into more manageable smaller expressions, which may be easier to understand. But it is unclear when and whether this actually helps. We conducted an experiment in which subjects read 6 mathematical functions and were supposed to give them meaningful names. 113 subjects participated, of which 58% had 3 or more years of programming work experience. Each function had 3 versions: using a compound expression, using intermediate variables with meaningless names, or using intermediate variables with meaningful names. The results were that in only one case there was a significant difference between the two extreme versions, in favor of the one with intermediate variables with meaningful names. This case was the function that was the hardest to understand to begin with. In two additional cases using intermediate variables with meaningless names appears to have caused a slight decrease in understanding. In all other cases the code structure did not make much of a difference. As it is hard to anticipate what others will find difficult to understand, the conclusion is that using intermediate variables is generally desirable. However, this recommendation hinges on giving them good names.

“…Here, the attempted hit [35] or vocabulary match [54] is performed, by matching the phrase within the query to the same phrase occurring within the text of the document at least once. Note that various measurement techniques suggested in the literature are not made use of, for example: inverse document frequency (idf) and the product of term frequency (tf) and idf (tf*idf) [55][56][57], cosine similarity [10,58] and others. These measurement techniques were not used because the results needed to be non-weighted (based on pure 'hits', where tf = 1) and uninfluenced by these and other mathematical techniques.…”

Section: The IRS (mentioning)

confidence: 99%

The hybridised indexing method for research-based information retrieval

Fitzgerald, Harpe, Uys (2021). Journal of Information Science.

An information retrieval system (IRS) is used to retrieve documents based on an information need. The IRS makes relevance judgements by attempting to match a query to a document. As IRS capabilities are indexing design dependent, the hybrid indexing method (IRS-H) is introduced. The objectives of this article are to examine IRS-H (as an alternative indexing method that performs exact phrase matching) and IRS-I, regarding retrieval usefulness, identification of relevant documents, and the quality of rejecting irrelevant documents by conducting three experiments and by analysing the related data. Three experiments took place where a collection of 100 research documents and 75 queries were presented to: (1) five participants answering a questionnaire, (2) IRS-I to generate data and (3) IRS-H to generate data. The data generated during the experiments were statistically analysed using the performance measurements of Precision, Recall and Specificity, and one-tailed Student’s t-tests. The results reveal that IRS-H (1) increased the retrieval of relevant documents, (2) reduced incorrect identification of relevant documents and (3) increased the quality of rejecting irrelevant documents. The research found that the hybrid indexing method, using a small closed document collection of a hundred documents, produced the required outputs and that it may be used as an alternative IRS indexing method.

“…Closer to the bias that we analyze is the work of Dit et al and Binkley et al on the impact of various kinds of preprocessing operations of textual data on the effectiveness of feature location and traceability recovery from source code. Both papers compared the impact on feature location algorithms of using different approaches for splitting identifiers.…”

Section: Related Work (mentioning)

confidence: 99%

Error leakage and wasted time: sensitivity and effort analysis of a requirements consistency checking process

Hayes, Antoniol, et al. (2016). J Software Evolu Process.

Several techniques are used by requirements engineering practitioners to address difficult problems such as specifying precise requirements while using inherently ambiguous natural language text and ensuring the consistency of requirements. Often, these problems are addressed by building processes/tools that combine multiple techniques where the output from 1 technique becomes the input to the next. While powerful, these techniques are not without problems. Inherent errors in each technique may leak into the subsequent step of the process. We model and study 1 such process, for checking the consistency of temporal requirements, and assess error leakage and wasted time. We perform an analysis of the input factors of our model to determine the effect that sources of uncertainty may have on the final accuracy of the consistency checking process. Convinced that error leakage exists and negatively impacts the results of the overall consistency checking process, we perform a second simulation to assess its impact on the analysts' efforts to check requirements consistency. We show that analyst's effort varies depending on the precision and recall of the subprocesses and that the number and capability of analysts affect their effort. We share insights gained and discuss applicability to other processes built of piped techniques.
