How and to what extent does BERT encode syntactically-sensitive hierarchical information or positionally-sensitive linear information? Recent work has shown that contextual representations like BERT perform well on tasks that require sensitivity to linguistic structure. We present here two studies which aim to provide a better understanding of the nature of BERT's representations. The first of these focuses on the identification of structurally-defined elements using diagnostic classifiers, while the second explores BERT's representation of subject-verb agreement and anaphor-antecedent dependencies through a quantitative assessment of self-attention vectors. In both cases, we find that BERT encodes positional information about word tokens well on its lower layers, but switches to a hierarchically-oriented encoding on higher layers. We conclude then that BERT's representations do indeed model linguistically relevant aspects of hierarchical structure, though they do not appear to show the sharp sensitivity to hierarchical structure that is found in human processing of reflexive anaphora. The code is available at https://github.com/yongjie-lin/bert-opensesame.
In Phrase Structure Composition and Syntactic Dependencies, Robert Frank explores an approach to syntactic theory that weds the Tree Adjoining Grammar (TAG) formalism with the minimalist framework. TAG has been extensively studied both for its mathematical properties and for its usefulness in computational linguistics applications. Frank shows that incorporating TAG's formally restrictive operations for structure building considerably simplifies the model of grammatical competence, particularly in the components concerned with syntactic movement and locality. The empirical advantages of the resulting model, illustrated with extensive case studies of subject-raising constructions and wh-questions, point toward a conception of grammar that is sharply limited in its computational power.
Objective: The authors' objective was to quantitatively assess angiogenesis or neovascularity within node-negative colon cancers and to determine if increased angiogenesis correlated with higher recurrence and lower survival rates.
Summary Background Data: Neovascularization promotes rapid tumor growth by facilitating nutrient and metabolite exchange. Recent work with breast and non-small cell lung cancers has shown that low angiogenic activity imparts a lower risk of recurrence and metastasis. Although adjuvant therapy is beneficial for patients with node-positive colon cancers, no such benefit has been demonstrated for patients with node-negative lesions. Nevertheless, up to 30% of this latter group will experience recurrence. We sought to identify a subset of patients with node-negative colon cancers at high risk for recurrence who might benefit from such therapy.
Methods: One hundred five node-negative colon cancers were immunostained for endothelial cell factor VIII-related antigen. Blood vessels within three microscopic fields at 100X magnification were counted, the mean calculated, and an angiogenesis score assigned. A subjective angiogenesis grade (1-4) was assigned after each slide was surveyed in its entirety. Score and grade were then assessed with respect to cancer recurrence and patient survival.
Results: Mean patient age was 71 years (range, 41-90 years) and mean tumor size, 5.6 cm (range, 2-12 cm). Mean follow-up was 6.5 years; mean angiogenesis score, 27.9 (range, 4-50); and mean grade, 2.0 (range, 1-4). Patients living 5 years had significantly lower angiogenesis scores than did nonsurvivors (22.8 vs. 43.2, p = 0.0004). Each 10-vessel increase in score imparted a 2.0-fold greater hazard of death and a 2.7-fold greater hazard of recurrence. The probability of surviving 5 years is estimated by:

$$P(\text{survival}) = \frac{e^{2.6290 - 0.0976\,\mathrm{A.S.}}}{1 + e^{2.6290 - 0.0976\,\mathrm{A.S.}}}$$

and the probability of recurrence is estimated by:

$$P(\text{recurrence}) = \frac{e^{-3.5527 + 0.08556\,\mathrm{A.S.}}}{1 + e^{-3.5527 + 0.08556\,\mathrm{A.S.}}}$$

where A.S. is the angiogenesis score.
Conclusions: Angiogenesis within colon cancer is an important predictor of tumor behavior and may identify patients at higher risk for recurrence and early death.
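The two estimates quoted in the Results can be evaluated numerically. The sketch below is illustrative only: it takes the coefficients as printed in the abstract (2.6290 and 0.0976 for survival, -3.5527 and 0.08556 for recurrence) and assumes both formulas have the standard logistic form e^z / (1 + e^z), since the printed denominators are garbled. The function names are hypothetical and do not come from the paper.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic function, e^z / (1 + e^z), written as 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def p_survival_5yr(angiogenesis_score: float) -> float:
    # Coefficients as printed in the abstract; logistic form is an assumption.
    return logistic(2.6290 - 0.0976 * angiogenesis_score)


def p_recurrence(angiogenesis_score: float) -> float:
    # Coefficients as printed in the abstract; logistic form is an assumption.
    return logistic(-3.5527 + 0.08556 * angiogenesis_score)


if __name__ == "__main__":
    # Scores spanning the reported range (4-50), plus the reported mean (27.9).
    for score in (4.0, 27.9, 50.0):
        print(
            f"A.S. = {score:5.1f}  "
            f"P(survival) = {p_survival_5yr(score):.3f}  "
            f"P(recurrence) = {p_recurrence(score):.3f}"
        )
```

Under these assumptions the estimated survival probability falls and the estimated recurrence probability rises as the angiogenesis score increases, which matches the direction of the effect reported in the abstract.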
Learners that are exposed to the same training data might generalize differently due to differing inductive biases. In neural network models, inductive biases could in theory arise from any aspect of the model architecture. We investigate which architectural factors affect the generalization behavior of neural sequence-to-sequence models trained on two syntactic tasks, English question formation and English tense reinflection. For both tasks, the training set is consistent with a generalization based on hierarchical structure and a generalization based on linear order. All architectural factors that we investigated qualitatively affected how models generalized, including factors with no clear connection to hierarchical structure. For example, LSTMs and GRUs displayed qualitatively different inductive biases. However, the only factor that consistently contributed a hierarchical bias across tasks was the use of a tree-structured model rather than a model with sequential recurrence, suggesting that human-like syntactic generalization requires architectural syntactic structure.