Data mining is an (66) research field in databases and artificial intelligence. In this paper, data mining techniques are introduced broadly, including their background, applications and classification. The principal techniques used in data mining are also surveyed, including rule induction, decision (67), artificial (68) network, genetic algorithm, fuzzy technique, rough set and visualization technique. Association rule mining, classification rule mining, outlier mining and clustering methods are discussed in detail. The research achievements in association rules, the shortcomings of association rule measure standards and its (69), and the evaluation methods of classification rules are presented. Existing outlier mining approaches are introduced, including the statistics-based approach, the distance-based approach, the deviation-based data detection method, the rule-based approach and the multi-strategy method. Finally, the applications of data mining to scientific research, financial investment, marketing, insurance, the manufacturing industry and communication network management are introduced. The application (70) of data mining are described.
A.intractable
B.emerging
C.easy
D.scabrous
A、Clustering belongs to supervised learning.
B、Principles of clustering include maximizing intra-class similarity and minimizing interclass similarity.
C、Outlier analysis can be useful in fraud detection and rare events analysis.
D、Outlier means a data object that does not comply with the general behavior of the data.
Assignment 6 - Outlier mining
You are required to use outlier mining methods to detect the outliers in the given data set. On a section of a city road, several cameras recorded the license plates of passing vehicles from 2017-06-09 to 2017-06-12, together with the date and time at which each vehicle passed the start point and the finish point. The travel time is calculated afterwards. The time serial is a transformed form of the start time. Each instance therefore contains 8 attributes: serial number, license plate number, date and time of passing the start/end point, time serial and travel time. There are 4977 instances in total. You need to finish the following tasks.
(1) Use a statistics-based approach to detect the outliers of travel time. Calculate the mean and the variance of travel time. Write out the confidence interval. Plot a scatter diagram with the time serial on the X-axis and the travel time on the Y-axis, and mark the outliers you have recognized.
(2) Use a distance-based approach to detect the outliers of travel time. An object o in data set D is a DB(r,π) outlier, with parameters r and π, if the fraction of the objects in D that lie at a distance less than r from o is less than π. Let r vary from 0.1 to 0.3 in steps of 0.1 and π vary from 30 to 90 in steps of 30, and find the outliers and the number of outliers. You can use the Euclidean distance.
(3) Use a density-based approach to detect the outliers of travel time. For different values of k, the number of neighbors (from 3 to 400 in steps of 5), calculate the LOF of each data point. Use 2.0 as the LOF threshold: an object is labeled an outlier if its LOF exceeds 2.0. First, plot a line chart with the k value on the X-axis and the number of outliers on the Y-axis. Second, using k=350 and the Euclidean distance, calculate the LOF of each data point and give the top 4 outliers.
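Tasks (1) and (2) of the assignment can be sketched roughly as follows. This is a minimal illustration in plain Python, not a reference solution. The function names `sigma_outliers` and `db_outliers`, the `z` multiplier used for the interval, and the choice to pass π as a fraction between 0 and 1 and to exclude the object itself from the count are all assumptions made here. Travel time is treated as one-dimensional, so the Euclidean distance reduces to an absolute difference.

```python
from statistics import mean, pstdev


def sigma_outliers(values, z=3.0):
    """Return indices of values outside mean +/- z * standard deviation.

    The multiplier z is an assumption; the assignment only asks for
    "the confidence interval" without fixing its width.
    """
    mu = mean(values)
    sd = pstdev(values)  # population standard deviation
    low, high = mu - z * sd, mu + z * sd
    return [i for i, v in enumerate(values) if v < low or v > high]


def db_outliers(values, r, pi):
    """Return indices of DB(r, pi) outliers in 1-D data.

    A point is flagged when the fraction of the other points lying at a
    distance less than r from it is below pi (pi given as a fraction).
    """
    n = len(values)
    flagged = []
    for i, o in enumerate(values):
        close = sum(
            1 for j, v in enumerate(values) if j != i and abs(v - o) < r
        )
        if close / (n - 1) < pi:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Made-up travel times for illustration only, not the assignment data.
    travel_times = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 30.0]
    print(sigma_outliers(travel_times, z=2.0))
    print(db_outliers(travel_times, r=0.5, pi=0.3))
```

The pairwise loop in `db_outliers` is quadratic in the number of instances, which is acceptable for a few thousand rows but would call for an index structure or a vectorized library on larger data.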
A、mean
B、median
C、mode
D、none of the above
A、Traffic incident detection
B、Credit card fraud detection
C、Network intrusion detection
D、Medical analysis
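In an n-dimensional space, the best method for detecting outliers is ( )
A. Plotting a normal probability plot
B. Plotting a box plot
C. Mahalanobis distance
D. Plotting a scatter plot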
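To make the Mahalanobis distance option concrete, here is a small sketch in plain Python that computes the distance of every row from the sample mean, using the sample covariance matrix. The helper names `solve` and `mahalanobis_distances` and the toy data are assumptions for illustration; in practice a numerical library would normally be used.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    d = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(d):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            f = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * d
    for i in range(d - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, d))
        x[i] = (m[i][d] - s) / m[i][i]
    return x


def mahalanobis_distances(rows):
    """Return the Mahalanobis distance of each row from the sample mean."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - mu[k] for k in range(d)] for r in rows]
    # Sample covariance matrix (divides by n - 1).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # d(x)^2 = (x - mu)^T S^{-1} (x - mu), computed via a linear solve.
    return [
        sum(x * y for x, y in zip(c, solve(cov, c))) ** 0.5 for c in centered
    ]


if __name__ == "__main__":
    # Toy data: four points along the diagonal, two points off it.
    rows = [(2, 2), (-2, -2), (3, 3), (-3, -3), (1, -1), (-1, 1)]
    for row, dist in zip(rows, mahalanobis_distances(rows)):
        print(row, round(dist, 3))
```

In this toy data the points (1, -1) and (-1, 1) are the closest to the mean in Euclidean terms, yet they receive the largest Mahalanobis distance because they run against the correlation of the other points. This is the property that makes the measure useful for multivariate outlier detection.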
A、k should be at least 10 to remove unwanted statistical fluctuations.
B、Picking k between 10 and 20 appears to work well in general.
C、Pick the upper bound value for k as the maximum of “close by” objects that can potentially be global outliers.
D、Pick the upper bound value for k as the maximum of “close by” objects that can potentially be local outliers.