Most Common Types of Data Mining
Data mining is most useful for identifying patterns in data and deriving business insights from those patterns. Data miners use a variety of techniques for this work, each suited to producing a different kind of result. Here are five common data mining techniques.
Classification Analysis
With this technique, data points are assigned to predefined groups, or classes, based on a specific question or problem to address. For instance, if a consumer packaged goods company wants to optimize its coupon discount strategy for a specific product, it might review inventory levels, sales data, coupon redemption rates, and consumer behavioral data to classify customers by how likely they are to respond to a discount, and then use those classes to make the best decision possible.
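To make the idea concrete, here is a minimal Python sketch of one simple classifier, a nearest-centroid model. The customer features, class labels, and numbers are hypothetical and invented for illustration; a real coupon program would use richer data and usually a more capable model, such as a decision tree or logistic regression.

```python
from math import dist

# Hypothetical training data:
# (coupons redeemed last year, store visits per month) -> predefined class label
TRAINING = [
    ((8, 6), "likely_redeemer"),
    ((10, 5), "likely_redeemer"),
    ((7, 7), "likely_redeemer"),
    ((0, 2), "unlikely_redeemer"),
    ((1, 1), "unlikely_redeemer"),
    ((2, 3), "unlikely_redeemer"),
]


def centroids(rows):
    """Average the feature vectors of each class to get one centroid per class."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def classify(features, model):
    """Assign the predefined class whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda label: dist(features, model[label]))


model = centroids(TRAINING)
print(classify((6, 4), model))  # likely_redeemer
print(classify((1, 2), model))  # unlikely_redeemer
```

Every new customer lands in one of the classes defined up front, which is the key difference from clustering, where the groups are not known in advance.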
Association Rule Learning
This technique uncovers relationships between data points: it is used to determine whether a specific action or variable tends to occur together with other actions or variables (e.g., business travelers’ room choices and dining habits). A hotelier might use association rule insights to offer room upgrades or food and beverage promotions to attract additional business travelers.
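A minimal Python sketch of the idea scores simple one-item-to-one-item rules with three standard measures: support (how often the items occur together), confidence (how often the right-hand item appears when the left-hand item does), and lift (confidence relative to the right-hand item's baseline frequency). The stays, item names, and thresholds are hypothetical, and the brute-force pass over item pairs is a simplification; algorithms such as Apriori or FP-Growth prune the search on larger data.

```python
from itertools import combinations

# Hypothetical hotel stays by business travelers; each set lists the guest's choices.
STAYS = [
    {"king_room", "room_service", "late_checkout"},
    {"king_room", "room_service"},
    {"double_room", "restaurant_dinner"},
    {"king_room", "room_service", "restaurant_dinner"},
    {"double_room", "late_checkout"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rules(transactions, min_support=0.4, min_confidence=0.7):
    """Yield (antecedent, consequent, support, confidence, lift) for item pairs."""
    items = sorted(set().union(*transactions))
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare to be worth acting on
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                lift = confidence / support({rhs}, transactions)
                yield lhs, rhs, pair_support, confidence, lift


for lhs, rhs, s, c, l in rules(STAYS):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

A lift above 1 suggests the two choices occur together more often than chance alone would predict, which is the kind of signal a hotelier could turn into a bundled offer.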
Anomaly or Outlier Detection
In addition to searching for patterns, data mining seeks to uncover unusual data within a set. Anomaly detection is the process of finding data that does not conform to the expected pattern. This process can help detect fraud and help retailers investigate sudden spikes or declines in the sales of certain products.
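One common way to flag unusual values is the modified z-score, which measures each value's distance from the median in units of the median absolute deviation (MAD). It is sketched here in Python on hypothetical daily sales figures; the 0.6745 scaling constant and the 3.5 cutoff are conventional choices, not values from this article. The median is used instead of the mean because a single extreme spike would inflate the mean and standard deviation enough to partly hide itself.

```python
from statistics import median

# Hypothetical daily unit sales of one product over two weeks.
DAILY_SALES = [52, 48, 50, 47, 53, 49, 51, 50, 140, 46, 52, 12, 49, 51]


def outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # more than half the values are identical; the score is undefined
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


print(outliers(DAILY_SALES))  # [(8, 140), (11, 12)]
```

Both a spike (day 8) and a sharp decline (day 11) are flagged, while ordinary day-to-day variation is not.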
Clustering Analysis
Clustering looks for similarities within a data set, separating data points that share common traits into subsets. This is similar to classification in that it groups data points, but in clustering the groups are not defined in advance; they emerge from the data itself. Clustering is useful for discovering structure within a data set, such as segmenting customers by purchase behavior, need state, life stage, or likely preferences in marketing communication.
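The following minimal Python sketch implements k-means (Lloyd's algorithm), one widely used clustering method, on hypothetical customer data. No labels are supplied; the algorithm finds the segments on its own. The customer figures are invented, the deterministic farthest-point seeding is a simplification chosen so the output is reproducible, and in practice features on different scales (order counts versus dollars here) would usually be standardized first.

```python
from math import dist

# Hypothetical customers: (orders per year, average order value in dollars)
CUSTOMERS = [
    (2, 20), (3, 25), (2, 22),
    (20, 30), (22, 28), (19, 35),
    (5, 150), (4, 160), (6, 140),
]


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain k-means; returns a cluster index for each point."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


labels = kmeans(CUSTOMERS, k=3)
for customer, label in zip(CUSTOMERS, labels):
    print(label, customer)
```

With this data the algorithm separates occasional low spenders, frequent moderate spenders, and infrequent big-ticket buyers. The cluster numbers themselves are arbitrary; naming each segment is still an analyst's job.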
Regression Analysis
Regression analysis is about understanding which factors within a data set are most important, which can be ignored, and how these factors interact. With this technique, data miners can test theories such as “when a lot of snow is predicted, more bread and milk will be sold before the storm.” While this seems obvious enough, a number of variables must be verified and quantified before the store manager can be sure enough stock is available. For example, how much is “a lot” of snow? How much is “more milk and bread”? Which types of weather forecasts tend to prompt consumer action, and how many days before the storm will consumers start buying? What is the relationship between inches of snow, units of bread, and units of milk?
Through regression analysis, specific inventory levels of milk and bread (in units or cases) can be recommended for specific levels of forecasted snow (in inches), at specific points in time (days before the storm). In this way, regression analysis helps maximize sales, minimize out-of-stock instances, and avoid overstocking, which leads to product spoilage after the storm.
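The snow example can be sketched in Python with ordinary least squares, fitting one straight line per product. The sales history is hypothetical and deliberately tidy, and `recommend_stock` is an illustrative helper rather than any standard function. A fuller model would also include days before the storm (and likely other factors) as additional predictors, which calls for multiple regression rather than the single-variable fit shown here.

```python
from statistics import fmean

# Hypothetical history: forecast snowfall (inches) and units sold before each storm.
SNOW_INCHES = [0, 2, 4, 6, 8, 10]
BREAD_UNITS = [102, 118, 140, 160, 178, 202]
MILK_UNITS = [81, 91, 104, 116, 127, 141]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope  # the fitted line passes through (mean x, mean y)


def recommend_stock(forecast_inches, model):
    """Hypothetical helper: predicted units for a given snow forecast, rounded."""
    intercept, slope = model
    return round(intercept + slope * forecast_inches)


bread_model = fit_line(SNOW_INCHES, BREAD_UNITS)
milk_model = fit_line(SNOW_INCHES, MILK_UNITS)
print(bread_model)  # (100.0, 10.0)
print(recommend_stock(7, bread_model), recommend_stock(7, milk_model))  # 170 122
```

Here the slope answers "how much more?" directly: about 10 extra units of bread and 6 of milk per forecast inch of snow, under these made-up figures.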