Comp578    Susan Portugal    Fall 2008

Assignment 1    September 4, 2008

"Suppose that you are employed as a data mining consultant for an Internet search engine company.  Describe how data mining can help the company by giving specific examples of how techniques, such as clustering, classification, association rule mining, and anomaly detection can be applied."

Data mining can help a company in many ways, particularly an Internet search engine company. The search engine can give users more options for finding exactly what they are searching for in a large database of information.


Clustering:

Clustering is a useful way to organize information into groups of closely related items, so that items within a group are more similar to each other than to items in other groups. The user can input a list of variables that describe the main idea being searched, and clustering can match that list with the group of topics that contain the same or similar variables.

This method of data mining can be applied to many different search applications. One example is a search through databases or websites containing drug interaction information. If a patient is taking drugs X, Y, and Z (the variables), clustering can group this combination with previously observed cases that involved the same or similar drugs. From that group, the user can find out whether the combination has been safe in previous observations and/or what other drugs have been taken together with these drugs. If drugs X, Y, Z, and A are a good combination for treating a certain symptom, then the doctor and/or pharmacist can recommend the combination to the patient.
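The drug-combination example can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a clinical tool: each patient record is assumed to be simply the set of drugs one patient took, records are grouped by Jaccard similarity with single-link merging (a swapped-in choice, since the assignment does not name a similarity measure or clustering algorithm), and the drugs that appear in the query's cluster but not in the query are reported. The records, the `cluster` and `co_prescribed` helpers, and the 0.5 threshold are all hypothetical.

```python
# Hypothetical patient records: each record is the set of drugs one patient took.
RECORDS = [
    {"X", "Y", "Z"},
    {"X", "Y", "Z", "A"},
    {"X", "Y", "A"},
    {"B", "C"},
    {"B", "C", "D"},
]


def jaccard(a, b):
    """Similarity of two drug sets: shared drugs divided by all drugs."""
    return len(a & b) / len(a | b)


def cluster(records, threshold=0.5):
    """Single-link clustering: two records share a cluster when a chain of
    pairwise similarities >= threshold connects them."""
    parent = list(range(len(records)))  # union-find forest, one node per record

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if jaccard(records[i], records[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def co_prescribed(query, records, threshold=0.5):
    """Drugs found in the cluster of the record most similar to the query,
    excluding the drugs already in the query."""
    best = max(range(len(records)), key=lambda i: jaccard(query, records[i]))
    for group in cluster(records, threshold):
        if best in group:
            drugs = set().union(*(records[i] for i in group))
            return sorted(drugs - query)
    return []


print(cluster(RECORDS))                          # [[0, 1, 2], [3, 4]]
print(co_prescribed({"X", "Y", "Z"}, RECORDS))   # ['A']
```

Here the X, Y, Z records form one cluster and the B, C records another, so a query for X, Y, and Z surfaces drug A as the drug most often seen alongside them. Whether that combination is actually safe would still depend on outcome data that these sets do not contain.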


Classification:

Classification assigns each object to one of a set of predefined, discrete classes based on the values of its variables. In the pharmacology search engine from the previous example, drugs X, Y, and Z may all belong to the class of drugs that help lower blood pressure. Each variable can be a drug that is used in conjunction with other variables (drugs). Not all drugs are prescribed by a doctor: some are sold over the counter, while others are vitamins or even nutritional components found in foods. Using classification, these particular variables may be stored as binary (discrete) variables, set to 1 when present and 0 when absent, which helps the search algorithm place each item in the right class.
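A minimal sketch of classifying items described by binary variables, assuming a small set of labeled examples already exists. It uses a k-nearest-neighbor vote over Hamming distance, which is only one of many possible classifiers (the assignment does not name one). The attribute names, class labels, and training rows are hypothetical and are not real pharmacological data.

```python
from collections import Counter

# Hypothetical binary attributes, in this order (1 = present, 0 = absent):
FEATURES = ("prescription", "over_the_counter", "vitamin", "lowers_blood_pressure")

# Made-up labeled examples: (attribute vector, class label).
TRAINING = [
    ((1, 0, 0, 1), "blood pressure"),
    ((1, 0, 0, 1), "blood pressure"),
    ((0, 1, 0, 1), "blood pressure"),
    ((0, 1, 0, 0), "other"),
    ((0, 0, 1, 0), "other"),
]


def hamming(a, b):
    """Number of attributes on which two binary vectors disagree."""
    return sum(x != y for x, y in zip(a, b))


def classify(item, training=TRAINING, k=3):
    """Assign `item` to the majority class among its k nearest training examples."""
    if len(item) != len(FEATURES):
        raise ValueError("item must have one 0/1 value per feature")
    nearest = sorted(training, key=lambda ex: hamming(item, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(classify((1, 0, 0, 1)))  # blood pressure
print(classify((0, 0, 1, 0)))  # other
```

With k = 3 and two classes, the vote can never tie, which keeps the sketch simple; a real system would choose k and the distance measure by testing on held-out data.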

Association Rule Mining:

This method of data mining is used to discover patterns that describe strongly associated items in the data, such as words or phrases that frequently appear together in the same search. This type of data mining can take a string of words, i.e., a sentence or short phrase, and compare it to searches that have been performed in the past.

Association rule mining can be applied to many point-of-interest searches. If a user searches for Pismo Beach, CA, then other associated phrases can be suggested to help the user find more information about that point of interest. The database can keep a log of all previous searches performed by other users. For example, if earlier users searched for the famous cinnamon rolls made in Pismo Beach, CA with the query “Pismo Beach Cinnamon Rolls”, then when the next user types just a portion of that phrase, “Pismo Beach, CA,” the search engine can suggest that cinnamon rolls are made at the same point of interest.
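The search-log example can be sketched by counting how often terms occur together. This is a minimal brute-force sketch that only looks at one-term rules of the form A → B, rather than a full algorithm such as Apriori. It assumes each logged query has already been normalized into a set of lowercase terms; the log contents, the `rules` and `suggest` helpers, and the support and confidence thresholds are hypothetical.

```python
from itertools import permutations

# Hypothetical search log: each query reduced to a set of normalized terms.
LOG = [
    {"pismo beach", "cinnamon rolls"},
    {"pismo beach", "cinnamon rolls"},
    {"pismo beach", "cinnamon rolls", "bakery"},
    {"pismo beach", "hotels"},
    {"santa cruz", "hotels"},
]


def rules(log, min_support=0.3, min_confidence=0.5):
    """Return one-term rules (A, B, support, confidence) where
    support = fraction of queries containing both A and B, and
    confidence = queries with A and B / queries with A."""
    n = len(log)
    items = set().union(*log)
    found = []
    for a, b in permutations(sorted(items), 2):
        count_a = sum(1 for q in log if a in q)
        count_ab = sum(1 for q in log if a in q and b in q)
        support = count_ab / n
        confidence = count_ab / count_a  # count_a > 0 because a came from the log
        if support >= min_support and confidence >= min_confidence:
            found.append((a, b, support, confidence))
    return found


def suggest(term, log):
    """Terms that a mined rule says tend to follow `term`."""
    return [b for a, b, _, _ in rules(log) if a == term]


print(suggest("pismo beach", LOG))  # ['cinnamon rolls']
```

The rule "pismo beach → cinnamon rolls" passes because 3 of the 5 queries contain both terms (support 0.6) and 3 of the 4 queries mentioning Pismo Beach also mention cinnamon rolls (confidence 0.75). The single query about a bakery does not pass the support threshold, which is why a rule needs repeated searches rather than one.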

Anomaly Detection:

Anomaly detection identifies inputs (or events) whose characteristics are very dissimilar from the other variables or events contained in the database. This is a helpful tool to ensure that only pertinent information is included in search results.

Anomaly detection can be useful in the first pharmacology example to ensure that the only information relayed to the user is about related drugs, rather than drugs associated with treating unrelated symptoms.
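A minimal proximity-based sketch of this filtering step, assuming each search result has already been reduced to a set of descriptive terms. A result whose average Jaccard similarity to the other results falls below a threshold is flagged as an anomaly. The result names, their terms, the `anomalies` helper, and the 0.1 threshold are hypothetical; real systems use many other anomaly scores.

```python
# Hypothetical search results for a blood-pressure query, reduced to term sets.
RESULTS = {
    "drug X": {"blood pressure", "hypertension", "heart"},
    "drug Y": {"blood pressure", "hypertension", "kidney"},
    "drug Z": {"blood pressure", "heart", "diuretic"},
    "drug W": {"skin", "rash"},
}


def jaccard(a, b):
    """Shared terms divided by all terms."""
    return len(a & b) / len(a | b)


def anomalies(results, threshold=0.1):
    """Flag results whose average similarity to every other result is below threshold."""
    flagged = []
    for name, terms in results.items():
        others = [t for other, t in results.items() if other != name]
        avg = sum(jaccard(terms, t) for t in others) / len(others)
        if avg < threshold:
            flagged.append(name)
    return flagged


outliers = anomalies(RESULTS)
relevant = [name for name in RESULTS if name not in outliers]
print(outliers)   # ['drug W']
print(relevant)   # ['drug X', 'drug Y', 'drug Z']
```

Drug W shares no terms with the other results, so its average similarity is 0 and it is removed, while drugs X, Y, and Z stay in the results because they overlap on blood-pressure terms.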


Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. Introduction to Data Mining. Addison Wesley, 2005. ISBN 0321321367.