Data Mining

Status
Not open for further replies.

lakhanin

What kind of degree or learning path do I have to look into to get into the field of data mining?
 
These days, educational centers and even online programs can offer a custom "Data Mining Degree". The course structure would include classes like mathematical statistics, basic computer science, and multiple courses on databases. There are classes on data mining itself, I'm sure, but aside from those, these are the other classes you will be focusing on.
 
What field is strictly data mining? I only know of fields that apply data mining as a tool; bioinformatics, for example, uses a lot of data mining tools, and so does security.
 
chanyeehon said:
data mining = database

The practice of data mining has nothing to do with databases. Your understanding of data mining is very superficial. People use data mining in practice to take sets of arbitrary data and learn something from it. This goes into the realm of machine learning.

Example: I could take a representative sample of data at a hospital. Let's say I run 10 tests on every patient and I have 100 patients. These tests determine whether they have cancer or not. But being a computer scientist, I have absolutely no clue what this data really means. Therefore, I can run the data through a machine learning algorithm, analyze it at the mathematical level, and draw conclusions about the cancer from it.
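As a rough sketch of that hospital example: the data below is entirely synthetic (100 made-up patients, 10 made-up test results each, with the invented assumption that positive cases score higher on average), and the nearest-centroid classifier is just one simple stand-in for "a machine learning algorithm". The names `make_patients`, `train` and `predict` are hypothetical.

```python
import random

def make_patients(n=100, n_tests=10, seed=0):
    """Synthetic stand-in for hospital data: n patients, n_tests results each."""
    rng = random.Random(seed)
    patients = []
    for i in range(n):
        label = i % 2  # 1 = cancer, 0 = no cancer (made-up labels)
        # Invented assumption: positive cases score higher on average.
        mean = 1.0 if label else 0.0
        tests = [rng.gauss(mean, 1.0) for _ in range(n_tests)]
        patients.append((tests, label))
    return patients

def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train(patients):
    """Learn one average test profile per label."""
    by_label = {0: [], 1: []}
    for tests, label in patients:
        by_label[label].append(tests)
    return {label: centroid(rows) for label, rows in by_label.items()}

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(model, tests):
    """Pick the label whose average profile is closest to this patient."""
    return min(model, key=lambda label: squared_distance(model[label], tests))

patients = make_patients()
train_set, test_set = patients[:80], patients[80:]
model = train(train_set)
accuracy = sum(predict(model, t) == y for t, y in test_set) / len(test_set)
print(f"held-out accuracy: {accuracy:.2f}")
```

Note that the code never needs to know what the 10 tests mean medically; it only works with the numbers, which is the point of the example.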

Hope this helps.
 
^Incorrect. Data mining does have something to do with databases.

Wikipedia states that "Data mining (DM), also known as Knowledge-Discovery in Databases (KDD) or Knowledge-Discovery and Data Mining (KDD), is the process of automatically searching large volumes of data for patterns" as in "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" [1] and "the science of extracting useful information from large data sets or databases" [2].

"Data mining involves the process of analyzing data to show patterns or relationships; sorting through large amounts of data; and picking out pieces of relative information or patterns that occur e.g., picking out statistical information from some data."

"Generally, data mining (also called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cuts costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases."

They then offer a simple retail example:

"A simple example of data mining is its use in a retail sales department. If a store tracks the purchases of a customer and notices that a customer buys a lot of silk shirts, the data mining system will make a correlation between that customer and silk shirts. The sales department will look at that information and may begin direct mail marketing of silk shirts to that customer, or it may alternatively attempt to get the customer to buy a wider range of products. In this case, the data mining system used by the retail store discovered new information about the customer that was previously unknown to the company."
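A minimal sketch of that retail idea, assuming a purchase log of (customer, item) pairs. The customers, items, the `favourite_items` helper and the `min_count` threshold are all made up for illustration; a real system would be far more involved.

```python
from collections import Counter, defaultdict

# Hypothetical purchase log: one (customer, item) pair per sale.
purchases = [
    ("alice", "silk shirt"), ("alice", "silk shirt"), ("alice", "socks"),
    ("alice", "silk shirt"), ("bob", "jeans"), ("bob", "silk shirt"),
]

def favourite_items(purchases, min_count=3):
    """Map each customer to an item they bought at least min_count times."""
    counts = defaultdict(Counter)
    for customer, item in purchases:
        counts[customer][item] += 1
    result = {}
    for customer, items in counts.items():
        item, n = items.most_common(1)[0]  # most frequently bought item
        if n >= min_count:
            result[customer] = item
    return result

print(favourite_items(purchases))  # {'alice': 'silk shirt'}
```

The sales department could then use the resulting mapping to decide whom to send the silk shirt mailing to.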

Hope Wikipedia's definitions and example simplify things a bit for the layperson.

1. W. Frawley, G. Piatetsky-Shapiro, C. Matheus: Knowledge Discovery in Databases: An Overview. AI Magazine, Fall 1992, pp. 213-228.

2. D. Hand, H. Mannila, P. Smyth: Principles of Data Mining. MIT Press, Cambridge, MA, 2001. ISBN 0-262-08290-X
 
Uhmm? Yes, you can get the data anywhere: from databases, or you can create it yourself. That has nothing to do with data mining, does it? The guy said data mining = database; obviously your understanding of data mining is very superficial as well?
 
Data mining foremost has to do with pattern recognition, whether the patterns are ones you know about beforehand or unknown ones. Searching for trends, for example.

So the results of data mining can be used for machine learning, for example, as mentioned before.

Obviously, the organization of the data in the underlying database is important. So knowledge of databases, distributed computing, and signal processing is always very useful to have on top of the traditional "artificial intelligence" related topics such as classification, pattern recognition, statistical processing of data, and various other topics related to statistics.
 
Chankama said:
Data mining foremost has to do with pattern recognition, whether the patterns are ones you know about beforehand or unknown ones. Searching for trends, for example.

So the results of data mining can be used for machine learning, for example, as mentioned before.

Obviously, the organization of the data in the underlying database is important. So knowledge of databases, distributed computing, and signal processing is always very useful to have on top of the traditional "artificial intelligence" related topics such as classification, pattern recognition, statistical processing of data, and various other topics related to statistics.

Pattern recognition is an implication of machine learning. The simplest machine learning/pattern recognition algorithm, and one that is sometimes useless on multidimensional data, is the k-means algorithm.
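For anyone who has not seen k-means, here is a minimal sketch of the usual version (Lloyd's algorithm) in plain Python. The sample points and the choice of starting centers are made up for illustration, and the `kmeans` function is my own naming, not from any library.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means (Lloyd's algorithm) on tuples of floats."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, clusters

# Made-up 2-D data: two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, clusters = kmeans(points, centers=[points[0], points[3]])
print(centers)
print(clusters)
```

The result depends on the starting centers; a poor start can leave the algorithm stuck in a bad grouping, which is one reason it can disappoint on harder data.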

If the data is simply being classified by statistics then the actual structure or organization of the data will not make a bit of difference. But of course this is one of the things you could analyze.

One of the biggest guys in statistical algorithms over the years has been the hidden Markov model. It is used in very specialized cases of building a machine to classify objects, but it has proven to be one of the best tools.
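To give a flavor of how a hidden Markov model scores data, here is a small sketch of the forward algorithm, which computes the probability of an observation sequence under a given model. The two-state healthy/sick model and all of its probabilities are invented toy numbers. For classification, one common approach is to fit one such model per class and pick the class whose model gives the sequence the highest probability.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Return P(observations) under the HMM via the forward algorithm."""
    # alpha[s] = P(observations so far, current hidden state is s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())

# Toy model with invented numbers: hidden health state, observed symptom.
states = ("healthy", "sick")
symbols = ("normal", "fever")
start_p = {"healthy": 0.6, "sick": 0.4}
trans_p = {
    "healthy": {"healthy": 0.7, "sick": 0.3},
    "sick": {"healthy": 0.4, "sick": 0.6},
}
emit_p = {
    "healthy": {"normal": 0.9, "fever": 0.1},
    "sick": {"normal": 0.3, "fever": 0.7},
}

likelihood = forward(["normal", "fever", "fever"], states, start_p, trans_p, emit_p)
print(f"P(normal, fever, fever) = {likelihood:.4f}")
```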
 
raross said:
Pattern recognition is an implication of machine learning. The simplest machine learning/pattern recognition algorithm, and one that is sometimes useless on multidimensional data, is the k-means algorithm.

Pattern recognition is more of a tool that helps machine learning, IMO. The topic of machine learning also has to deal with computational complexity and other "computer"-related learning issues, such as ensuring the algorithms are efficient and whatnot.

Pattern recognition can be completely abstracted from the actual "machine" so to speak.

raross said:
If the data is simply being classified by statistics then the actual structure or organization of the data will not make a bit of difference. But of course this is one of the things you could analyze.

Whatever do you mean? The structure of the data is quite important sometimes. For example, with Independent Component Analysis, there is a whole area that deals with the time structure of the input signal and how to incorporate that info. The underlying "organization" could very well be what you are actually trying to discover - through statistics.

raross said:
One of the biggest guys in statistical algorithms over the years has been the hidden Markov model.

Read about it, looked at the theory, never used it in my research though. I suppose someone doing their research on classification should know the subject inside out.
 