Advanced data analysis tools and techniques are important for semiconductor companies to gain competitive advantage. In particular, yield prediction tools, which fully utilize production data, help to improve operational efficiency and reduce production costs. This paper introduces a novel and scalable framework for semiconductor manufacturing Final Test (FT) yield prediction leveraging machine learning techniques. This framework is able to predict FT yield at wafer fabrication stage, so that FT low yield problems can be caught at an earlier production stage compared to past studies. Our work presents a robust solution to automatically handle both numerical and categorical production related data without prior knowledge of the low yield root cause. Gaussian Mixture Models, One Hot Encoder and Label Encoder techniques are adopted for data pre-processing. To improve model performance for both binary and multiclass classification, model selection and model ensemble using the F1-macro method is demonstrated. The framework has been applied to three mass production products with different wafer technologies and manufacturing flows. All of them achieved high F1-macro test score indicative of the robustness of our framework.INDEX TERMS Semiconductor manufacturing, smart manufacturing, yield prediction, final test, Gaussian mixture models, clustering, ensemble methods.
This paper presents a strategy for efficiently selecting in formative data from large corpora of untranscribed speech. Confidence-based selection methods (Le., selecting utter ances we are least confident about) have been a popular approach, though they only look at the top hypothesis when selecting utterances and tend to select outliers, therefore, not always improving overall recognition accuracy. Alternatively, we propose a method for selecting data looking at compet ing hypothesis by computing entropy of N-best hypothesis decoded by the baseline acoustic model. In addition we ad dress the issue of outliers by calculating how representative a specific utterance is to all other un selected utterances via a tf-idf score. Experiments show that N-best entropy based selection (%relative 5.8 in 400-hour corpus) outperformed other conventional selection strategies; confidence based and lattice entropy based, and that tf-idfbased representativeness improved the model further (%relative 6.2). A comparison with random selection is also presented. Finally model size impact is discussed.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.