An Oversampling Method Based on Dynamic Time Warping


Name

Sheng-Po Yang (楊盛博)

Department

Graduate Institute of Industrial Management

Thesis title

An Oversampling Method Based on Dynamic Time Warping
(基於動態時間校正的過抽樣方法)



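This electronic thesis is authorized for immediate open access.
Once open, the electronic full text is licensed only for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid breaking the law.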

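Abstract (Chinese)

Anomaly detection means finding a problem at the moment it occurs or in advance. It is a common problem in real-world data analysis, for example in credit fraud and medical diagnosis. In manufacturing processes, maintenance personnel often encounter machine failures, damaged parts, and broken consumables, which cause defects or interrupt the process; the maintenance or replacement of equipment and consumables should therefore not wait until after a problem has occurred. Typical anomaly detection data contain only a very small number of anomalous records alongside a large number of normal ones, which makes the characteristics of anomalous data hard to distinguish. Common approaches when sampling such data include adjusting the number of records drawn, feature selection, and computing distance-based similarity between records. In previous research, distance-based sampling methods have not considered similarity between time-series segments of different lengths. In this study we therefore propose a sampling method for time-series data that uses DTW to compare segments of different lengths.
To measure the similarity of time series we use dynamic time warping (DTW): the smaller the DTW distance between two time series, the more similar they are. Unlike Euclidean distance, DTW can compare series of different lengths, so when the anomalies are known, the sample length can be relaxed or restricted according to the corresponding performance. In the experiments of this thesis, we first define a period of time before an anomaly occurs as the original anomalous data, and then use DTW to take the data most similar to it as samples. This lets us consider not only the time points just before the anomaly but also anomalous segments hidden in windows of different lengths.
The data used in this thesis come from one process of a semiconductor company. Because the time at which a consumable breaks during the process is hard to predict, the accuracy of anomaly detection is relatively low; this study aims to improve time-series anomaly detection in such situations with the proposed sampling method. We compare the proposed sampling method, an oversampling method that samples at random positions, and a sampling method based on Euclidean distance, using two classification models, LSTM and SVC. The experimental results show that the proposed sampling method outperforms the other two methods in these models.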

Abstract (English)

Anomaly detection means finding a problem at the moment it occurs or in advance. It is a common problem in real-world data analysis, for example in credit fraud and medical diagnosis. In manufacturing, maintenance personnel often encounter machine failures, damaged parts, and broken consumables, which result in defects or interruptions of the process. Maintenance or replacement of equipment and consumables should not wait until after a problem has occurred. Typical anomaly detection data contain only a very small amount of abnormal data and a large amount of normal data, which makes it difficult to distinguish the characteristics of abnormal data. Sampling is a common way to address this problem, by adjusting the number of samples drawn, by feature selection, or by measuring distance-based similarity. In past research, sampling methods based on the distance between sequences did not consider the similarity of sequences of different lengths. In this study, we therefore propose a time-series sampling method that uses DTW as the distance measure.
To measure the similarity of time series, we use dynamic time warping (DTW). The smaller the DTW distance between two time series, the more similar they are. Unlike Euclidean distance, DTW can compare series of different lengths, so when the anomalies are known, the sample length can be relaxed or restricted according to the corresponding performance. In the experiments in this thesis, we first define the period before an anomaly occurs as the original abnormal data and then use DTW to select the most similar data as samples. In this way we consider not only the data immediately before the anomaly but can also find anomalous fragments hidden in windows of different lengths.
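
The DTW-based selection step described in the preceding paragraph can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `dtw_distance` and `most_similar_windows`, the absolute-difference local cost, the candidate window lengths, and the toy series are all invented for the example.

```python
from math import inf


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with an absolute-difference local cost (assumed)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the DTW distance between the prefixes a[:i] and b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def most_similar_windows(series, reference, lengths, k):
    """Return the k windows (distance, start, length) closest to `reference`.

    Windows of every length in `lengths` are considered, which is possible
    because DTW does not require the two sequences to have equal length.
    """
    scored = []
    for length in lengths:
        for start in range(len(series) - length + 1):
            window = series[start:start + length]
            scored.append((dtw_distance(reference, window), start, length))
    scored.sort()
    return scored[:k]


# Toy data (hypothetical): `reference` stands for the window just before a
# known anomaly; `series` contains a slower, stretched copy of the same ramp.
reference = [0, 1, 2, 3]
series = [0, 0, 0, 1, 1, 2, 2, 3, 3, 0, 0, 5, 0, 0]

top = most_similar_windows(series, reference, lengths=[3, 4, 6, 8], k=3)
for dist, start, length in top:
    print(f"start={start} length={length} dtw={dist}")
```

In this toy case the closest windows are longer than the reference, which a point-by-point Euclidean comparison could not even evaluate. How the selected windows are then added to the minority class is not specified on this page and is left out of the sketch.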

The data used in this thesis come from one process of a semiconductor company. Because it is difficult to predict when consumables will be damaged during the process, the recall of anomaly detection is relatively low. We aim to improve performance with the proposed sampling method, and we compare it with oversampling at random locations and with sampling based on Euclidean distance, using two classification models, LSTM and SVC. The experimental results show that the proposed sampling method performs better than the other two methods in these models.
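
The abstract reports recall, and the table of contents lists a comparison "by a given specificity" (section 4.4.2). The exact procedure is not given on this page, so the following is only one plausible reading: fix a minimum specificity on the normal class, pick the lowest score threshold that meets it, and report the recall of the anomaly class at that threshold. The function name `recall_at_specificity`, the score convention (higher means more anomalous), and the toy scores are assumptions.

```python
def recall_at_specificity(scores, labels, target_specificity):
    """Recall of the anomaly class (label 1) at the lowest threshold whose
    specificity on the normal class (label 0) meets `target_specificity`.

    A sample is predicted anomalous when its score is strictly above the
    threshold (an assumed convention).
    """
    if not 0.0 < target_specificity <= 1.0:
        raise ValueError("target_specificity must be in (0, 1]")
    normal = sorted(s for s, y in zip(scores, labels) if y == 0)
    anomalous = [s for s, y in zip(scores, labels) if y == 1]
    if not normal or not anomalous:
        raise ValueError("need at least one sample of each class")

    # Candidate thresholds are the normal scores themselves, in ascending
    # order; the largest one always yields specificity 1.0, so the loop
    # is guaranteed to break.
    for threshold in normal:
        true_negatives = sum(1 for s in normal if s <= threshold)
        if true_negatives / len(normal) >= target_specificity:
            break

    true_positives = sum(1 for s in anomalous if s > threshold)
    return threshold, true_positives / len(anomalous)


# Toy classifier scores (hypothetical): five normal and three anomalous samples.
scores = [0.1, 0.2, 0.3, 0.4, 0.8, 0.35, 0.9, 0.7]
labels = [0, 0, 0, 0, 0, 1, 1, 1]

threshold, recall = recall_at_specificity(scores, labels, target_specificity=0.8)
print(f"threshold={threshold} recall={recall:.3f}")
```

Holding specificity fixed puts the sampling methods on the same false-alarm budget, so differences in recall can be attributed to the training samples rather than to the choice of threshold.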

Keywords (Chinese)

★ Dynamic time warping (動態時間校正)
★ Anomaly detection (異常偵測)
★ Time series (時間序列)
★ Oversampling (過抽樣)

Keywords (English)

★ DTW
★ anomaly detection
★ time-series
★ oversampling

Table of contents

Chinese Abstract
Abstract
Contents
The Contents of Figures
The Contents of Tables
1. Introduction
1.1 Motivation
1.2 Research objective
2. Literature review
2.1 Anomaly Detection
2.2 Time-series
2.3 Classification
2.4 Oversampling
2.5 Dynamic Time Warping
2.6 Measurement
3. Methodology
3.1 Data Processing
3.2 Similarity Measurement
3.3 Oversampling
3.4 Evaluation
4. Numerical Experiment
4.1 Data Processing
4.2 Similarity Measurement
4.3 Oversampling
4.4 Evaluation
4.4.1 Compare oversampling methods by using only an anomaly testing data
4.4.2 Compare oversampling methods by a given specificity
4.4.3 Compare SVC and LSTM in oversampling method
5. Conclusion
Reference


Advisor

Fu-Shiang Tseng (曾富祥)

Approval date

2022-8-13
