Di Pang scite author profile

Di Pang

5 Publications

49 Citation Statements Received

200 Citation Statements Given

Affiliations

West Virginia University

Publications

A novel single-pulse search approach to detection of dispersed radio pulses using clustering and supervised machine learning

Pang, Goševa-Popstojanova, Devine, et al. 2018

We present a novel two-stage approach which combines unsupervised and supervised machine learning to automatically identify and classify single pulses in radio pulsar search data. In the first stage, we identify astrophysical pulse candidates in the data, which were derived from the Pulsar Arecibo L-Band Feed Array (PALFA) survey and contain 47,042 independent beams, as trial single-pulse event groups (SPEGs) by clustering single-pulse events and merging clusters that fall within the expected dispersion measure (DM) and time span of astrophysical pulses. We also present a new peak scoring algorithm to identify astrophysical peaks in S/N versus DM curves. Furthermore, we group SPEGs detected at a consistent DM because they were likely emitted by the same source. In the second stage, we create a fully labelled benchmark data set by selecting a subset of data with SPEGs identified (using stage 1 procedures), their features extracted and individual SPEGs manually labelled, and then train classifiers using supervised machine learning. Next, using the best trained classifier, we automatically classify unlabelled SPEGs identified in the full data set. To aid the examination of dim SPEGs, we develop an algorithm that searches for an underlying periodicity among grouped SPEGs. The results showed that RandomForest with SMOTE treatment was the best learner, with a recall of 95.6% and a false positive rate of 2.0%. In total, besides all 60 known pulsars from the benchmark data set, the model found 32 additional (i.e., not included in the benchmark data set) known pulsars, and several potential discoveries.
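The abstract above reports classifier quality as recall and false positive rate. As a minimal sketch of how those two quantities are conventionally computed from confusion-matrix counts, the Python below may help; the function name and the counts in the usage lines are hypothetical illustrations and are not taken from the paper.

```python
from typing import Tuple


def recall_and_fpr(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Return (recall, false positive rate) from confusion-matrix counts.

    recall = TP / (TP + FN): fraction of true pulsar candidates recovered.
    FPR    = FP / (FP + TN): fraction of non-pulsar candidates wrongly flagged.
    """
    if min(tp, fn, fp, tn) < 0:
        raise ValueError("counts must be non-negative")
    positives = tp + fn
    negatives = fp + tn
    # Guard against empty classes instead of dividing by zero.
    recall = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return recall, fpr


# Hypothetical, made-up counts for illustration only (not the paper's data).
recall, fpr = recall_and_fpr(tp=90, fn=10, fp=20, tn=980)
print(f"recall={recall:.3f}, false positive rate={fpr:.3f}")
```

With heavily imbalanced classes, a small false positive rate can still correspond to many false alarms in absolute terms, which is one reason both figures are usually reported together.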

Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy

Devine, Goševa-Popstojanova, Pang 2018

Data collection for scientific applications is increasing exponentially and is forecasted to soon reach peta- and exabyte scales. Applications which process and analyze scientific data must be scalable and focus on execution performance to keep pace. In the field of radio astronomy, in addition to increasingly large datasets, tasks such as the identification of transient radio signals from extrasolar sources are computationally expensive. We present a scalable approach to radio pulsar detection written in Scala that parallelizes candidate identification to take advantage of in-memory task processing using Apache Spark on a YARN distributed system. Furthermore, we introduce a novel automated multiclass supervised machine learning technique that we combine with feature selection to reduce the time required for candidate classification. Experimental testing on a Beowulf cluster with 15 data nodes shows that the parallel implementation of the identification algorithm offers a speedup of up to 5X over a similar multithreaded implementation. Further, we show that the combination of automated multiclass classification and feature selection speeds up the execution performance of the RandomForest machine learning algorithm by an average of 54% with less than a 2% average reduction in the algorithm's ability to correctly classify pulsars. The generalizability of these results is demonstrated by using two real-world radio astronomy data sets.

Panoramic Sea-Sky-Line Detection Based on Improved Active Contour Model

Su, Wu, Pang 2016, Acta Optica Sinica (光学学报)

Detection of Radio Pulsars in Single-pulse Searches Within and Across Surveys

Pang, Goševa-Popstojanova, McLaughlin 2020, PASP

Pulsar detection using machine learning is a challenging problem as it involves extreme class imbalance and strong prioritization of high Recall. This paper is focused on automatic detection of astrophysical pulses in single-pulse searches, both within and across surveys. We use the output from the first stage of our previously developed two-stage Single-Pulse Event Group IDentification approach and focus on the second stage (i.e., classification of pulse candidates). Specifically, for the first time in time-domain single-pulse searches we (1) use boosting and deep learning algorithms for within-survey classification and (2) investigate cross-survey classification by using two transfer learning methods, trAdaBoost (instance-based) and fine-tuning (parameter-based). Our experimental results are based on two benchmark data sets, Green Bank Telescope Drift-scan (GBTDrift) and Pulsar Arecibo L-band Feed Array (PALFA)-extended, created from the GBTDrift survey and the PALFA survey. The main findings include: (1) Due to the emphasis on high Recall, the F4 measure is a more appropriate performance indicator than the balanced F1 measure. (2) For the GBTDrift benchmark, AdaBoost outperformed the other models with 98.5% Recall and F4 = 0.942. For the PALFA-extended benchmark, RandomForest performed the best, with 91.6% Recall and F4 = 0.890. (3) Model performance degraded significantly when the models were used for pulsar classification across surveys. (4) Transfer learning improved cross-survey classification significantly when training data in the target data set were limited. When the GBTDrift benchmark was used as the target data set, fine-tuned SPEGnet models had the highest F4 measure, while trAdaBoost models had the highest F4 measure when the PALFA-extended benchmark was used as the target data set. (5) Pulse classification was affected not only by the between-class imbalance (i.e., pulsar versus non-pulsar), but also by the within-class imbalance, which was more prominent in the case of the PALFA-extended benchmark due to the lack of labeled low signal-to-noise ratio pulsars.
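The abstract above prefers the F4 measure to the balanced F1 measure because of its emphasis on Recall. A minimal sketch of the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), shows why: with beta = 4, Recall is weighted far more heavily than Precision. The function name and the precision and recall values below are hypothetical illustrations, not figures from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta > 1 weights recall more than precision."""
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if beta <= 0:
        raise ValueError("beta must be positive")
    b2 = beta * beta
    denom = b2 * precision + recall
    # Both inputs zero: define the score as 0 rather than dividing by zero.
    if denom == 0.0:
        return 0.0
    return (1.0 + b2) * precision * recall / denom


# Hypothetical values for illustration only (not taken from the paper):
# a high-recall, low-precision classifier.
p, r = 0.5, 0.9
print(f"F1 = {f_beta(p, r, beta=1):.3f}")
print(f"F4 = {f_beta(p, r, beta=4):.3f}")
```

For this made-up high-recall, low-precision case, F4 comes out well above F1, which matches the abstract's point that F4 rewards a model for recovering pulsars even at the cost of extra false positives.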

Identification and Classification of Radio Pulsar Signals Using Machine Learning

Pang
