domains indexed by Google News. It contains 31 million documents with an average length of 793 BPE tokens. Like C4, it excludes examples with duplicate URLs. News dumps from December 2016 through March 2019 were used as training data, and articles published in April 2019 (from the April 2019 dump) were used for evaluation.

OpenWebText2 (OWT2). OWT2 is an enhanced version of the original OpenWebTextCorpus that adds content in multiple languages, document metadata, multiple dataset versions, and open-source replication code. It covers all Reddit submissions from 2005 up to April 2020.

PubMed Central (PMC). PMC is a free full-text archive of biomedical and life sciences journal literature maintained by the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM), and it is updated daily. In addition to full-text articles, it contains corrections, retractions, and expressions of concern, as well as file lists with metadata for the articles in each dataset. The portion of PMC available through open registration on Amazon Web Services (AWS) comprises the PMC Open Access Subset and the Author Manuscript Dataset. The PMC Open Access Subset includes all articles and preprints in PMC with a machine-readable Creative Commons license that allows reuse. The Author Manuscript Dataset includes accepted author manuscripts collected in PMC under a funder policy and made available in machine-readable formats for text mining.

ArXiv. ArXiv is a repository of 1.7 million articles with metadata such as titles, authors, categories, abstracts, and full-text PDFs. It provides open access to academic articles covering a wide range of fields, from the many branches of physics to computer science and everything in between, including mathematics, statistics, electrical engineering, quantitative biology, and economics, which is helpful for potential downstream research applications. In addition, the fact that articles are written in LaTeX also benefits the study of language models.

Colossal Clean Crawled Corpus (C4). C4 is a colossal, cleaned version of Common Crawl's web crawl corpus. It is based on the Common Crawl dataset and was used to train the T5 text-to-text Transformer models. The cleaned English version of C4 has 364,868,901 training examples and 364,608 validation examples; the uncleaned English version has 1,063,805,324 training examples and 1,065,029 validation examples; the realnewslike version has 13,799,838 training examples and 13,863 validation examples; and the webtextlike version has 4,500,788 training examples and 4,493 validation examples.

Wiki-40B. Wikipedia (Wiki-40B) is a cleaned-up text collection covering more than 40 Wikipedia language editions of pages corresponding to entities. The dataset is split into train/validation/test sets for each language: the training set has 2,926,536 examples, the validation set has 163,597 examples, and the test set has 162,274 examples. Wiki-40B is cleaned with a page filter that removes disambiguation, redirect, deleted, and non-entity pages.

CLUECorpus2020. CLUECorpus2020 ...