Although preceding work in computational argument quality (AQ) mostly focuses on assessing overall AQ, researchers agree that writers would benefit from feedback targeting individual dimensions of argumentation theory. However, a large-scale theory-based corpus and corresponding computational models are missing. We fill this gap by conducting an extensive analysis covering three diverse domains of online argumentative writing and presenting GAQCorpus: the first large-scale English multi-domain (community Q&A forums, debate forums, review forums) corpus annotated with theory-based AQ scores. We then propose the first computational approaches to theory-based assessment, which can serve as strong baselines for future work. We demonstrate the feasibility of large-scale AQ annotation, show that exploiting relations between dimensions yields performance improvements, and explore the synergies between theory-based prediction and practical AQ assessment.

RQ1: Can we develop a large-scale theory-based AQ corpus? We conduct an extensive annotation study with trained linguists and crowd workers on 5,295 arguments from three domains to create the Grammarly Argument Quality Corpus (GAQCorpus), the first large-scale multi-domain English corpus annotated with theory-based AQ scores.

RQ2: Can we develop computational models that perform theory-based AQ assessment across varying domains? Based on GAQCorpus, we are the first to propose computational approaches to theory-based AQ assessment and show that it is possible to develop models for this task. Our models can serve as strong baselines for future research and enable the field to investigate follow-up research questions.

RQ3: Can the interrelations between the different AQ dimensions be exploited in a computational setup? Inspired by the hierarchical structure of the taxonomy, we explore whether the relationships between dimensions can be computationally exploited. In addition to simple single-task learning approaches, we study the effect of jointly predicting AQ dimensions in two variants (flat vs. hierarchical) and find that combining the training signals of all four aspects benefits theory-based AQ assessment (see the sketch following these research questions).

RQ4: Does the corpus support training a single unified model for multi-domain evaluation? When enough data from a single domain is available, training on in-domain data is typically preferred over multi-domain training. However, larger amounts of data are especially useful for the complex model architectures currently prominent in NLP (e.g., BERT (Devlin et al., 2019) and GPT-2 (Radford et al., 2019)). We study these two opposing effects on GAQCorpus and show that our corpus supports training a single unified model across all three domains, with improved performance on the individual domains.

RQ5: Can we empirically substantiate the idea that theory-based and practical AQ assessment can learn from each other? Wachsmuth et al. (2017a) suggest that the practical and the theory-based views can learn from each other, but so far this has only been tested manually. Employing our mode...
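To make the flat joint-prediction variant from RQ3 concrete, the following is a minimal sketch, not the authors' implementation: it assumes a shared BERT encoder from the Hugging Face transformers library with one linear regression head per AQ dimension, trained by summing per-dimension losses. The class name FlatMultiTaskAQ, the bert-base-uncased checkpoint, and the dummy gold scores are illustrative assumptions; only the dimension names come from the taxonomy discussed above.

```python
# Minimal sketch of flat multi-task AQ regression: a shared encoder with one
# regression head per theory-based dimension plus overall quality. All names
# and hyperparameters are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

DIMENSIONS = ["cogency", "effectiveness", "reasonableness", "overall"]

class FlatMultiTaskAQ(nn.Module):
    def __init__(self, encoder_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One linear regression head per AQ dimension, sharing the encoder.
        self.heads = nn.ModuleDict({d: nn.Linear(hidden, 1) for d in DIMENSIONS})

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token representation
        return {d: head(cls).squeeze(-1) for d, head in self.heads.items()}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = FlatMultiTaskAQ()
batch = tokenizer(["Arguments should be scored on several quality dimensions."],
                  return_tensors="pt", padding=True, truncation=True)
preds = model(batch["input_ids"], batch["attention_mask"])

# "Combining the training signals of all four aspects": sum per-head MSE
# losses against gold scores (dummy targets here, for illustration only).
gold = {d: torch.tensor([3.0]) for d in DIMENSIONS}
loss = sum(nn.functional.mse_loss(preds[d], gold[d]) for d in DIMENSIONS)
loss.backward()
```

A hierarchical variant, as the taxonomy suggests, could instead condition the overall-quality head on the three sub-dimension predictions; the flat version above simply lets all heads share the encoder's gradients.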