Data Mining

raross · Jul 29, 2006

What kind of research are you doing?

Chankama · Jul 29, 2006

Signal processing man. Signal denoising and feature extraction mainly. I am working full-time now though at a company.

Do you do active research?

raross · Jul 29, 2006

Very nice. I work as a research assistant for a university doing gene finding and DNA Computing research.

Chankama · Jul 29, 2006

Excellent field. So I suppose you are also doing a Masters or a PhD at the same time?

I wanted to do some work in that area as well. Work on data mining to determine which genes are responsible for various human conditions and such. My biology knowledge is pretty low to even start going into that area. Perhaps, sometime later in the future.


elitesoldier · Jul 29, 2006

raross said:
Very nice. I work as a research assistant for a university doing gene finding and DNA Computing research.

Which university?

raross · Jul 29, 2006

Chankama said:
Excellent field. So I suppose you are also doing a Masters or a PhD at the same time?

I wanted to do some work in that area as well. Work on data mining to determine which genes are responsible for various human conditions and such. My biology knowledge is pretty low to even start going into that area. Perhaps, sometime later in the future.

I will be soon; I am currently still an undergrad. My knowledge of biology is also very limited. Machine learning allows you to draw conclusions about biology without knowing anything about it. It annoys the **** out of the biologists. But I have not worked in machine learning in a while; we are mostly working on coding theory and different coding sets: bifix, cfc, etc. I am currently doing undergrad research at CCU.

chanyeehon · Jul 30, 2006

Hi raross. I am actually new to this. However, I do believe databases cover data mining. Data mining (DM), also known as Knowledge Discovery in Databases (KDD) or Knowledge Discovery and Data Mining (KDD), is the process of automatically searching large volumes of data for patterns.

I am going to study this in my 3rd year at university:

Knowledge and Database Management Plan

Knowledge and Database Management takes its practices from the past and forecasts the future with analysis and processes to improve business operations.
The objective of this stream is to produce graduates who are capable of:
• Promoting the importance of knowledge management in organizations to support corporate goals
• Identifying opportunities which produce knowledge in organizations
• Deploying skills related to the collection, transmission, analysis, sharing and reuse of the knowledge
• Employing technology appropriately to support knowledge management
• Establishing enterprise-wide procedures to support knowledge as a corporate asset

Examples of Entry Level Jobs
Data Warehouse Analyst, IT Consultant
End User Support, Data Integrity Analyst/Tester
Data Centre Operator, Database Administrator
Project Planner, Decision-Support Systems Specialist

Aspirin · Aug 1, 2006

This thread is getting better. To say that data mining has nothing to do with databases (which contain the data that data mining analyzes) is arguing too narrow a scope, imho. I'm not saying that anyone who argues that position is wrong about specializing in the discipline of data mining; what I am saying is that I don't think it has to be argued from the position that they are completely separate.

Take a data-driven DSS, which allows users to extract and analyze useful information from large databases by using statistical or other analytical tools to find hidden patterns and relationships and to infer rules. "This way of analyzing data is also known today as data mining or Knowledge discovery in databases or data warehouses. The actual DSS possesses a DSS database and a DSS software system" (1).

You can have data without intelligence, but how can you have intelligence without data? Without the database, the DSS software system is useless. Hence my argument, whose scope I expand to include everything material to data and databases: these are intrinsically tied to data mining theory and not as easily separated from it as I see some arguing.
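As a rough illustration of what "finding hidden patterns and inferring rules" can look like, here is a minimal sketch of pairwise association rules scored by support and confidence. It is not taken from the cited book; the transactions, the function name `pair_rules`, and both thresholds are made up for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions; in a DSS these would come from its database.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    # How often each single item appears.
    item_counts = Counter(item for t in transactions for item in t)
    # How often each unordered pair of items appears together.
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        # Try the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)

rules = pair_rules(transactions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

The rule "butter -> bread" comes out with confidence 1.0 here because every toy transaction containing butter also contains bread.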

In the realm of biology, since it is brought up, DNA sequencing must have something to sequence right? A random example if you will: Relying on DNA sequence information rather than physical characteristics for comparisons, evolutionary biologists discovered repeatable evolution for Anolis lizards, ranid frogs, cichlid and stickleback fish, river dolphins, mangabey monkeys, and island plants. And in 2003 scientists from UC Berkeley studying tropical salamanders discovered another example of repeatable evolution showing that the phenomenon of repeatable evolution can be properly viewed as a characteristic feature of the biological realm.

All well and good. (Note: for you creationists this isn't really a blow to creationism because the widespread occurrence of “repeatable evolution” strikes a blow at chance, the essence of the evolutionary process. However, the same data fits old earth creation models beautifully. The repeated occurrence of unrelated organisms possessing trait combinations needed for survival in a particular ecological niche points to “repeated creation” rather than to evolution. As one scientist from CalTech told me, "It's not surprising that a single Creator would reuse the same good design more than once to bring into existence organisms perfectly suited for their environment.")

But again, not to get too far off track here except to assert that what I am arguing (which is an increased scope) applies both to computer databases and to the natural world.

You can divide them but at some point you have to put them back together again for one of them (and I'm not talking about the data/database) to work.

--------------

(1) Aronson, J. E., Liang, T., & Turban, E. (2004). Decision support systems and intelligent systems (7th ed.). New York: Prentice Hall.

raross · Aug 1, 2006

No, it is simple. Data mining has nothing to do with databases because you can practice it without using a database. You guys are being very superficial when you talk about data mining. You have to look at what you're actually doing; it does not matter where you get your data.

However, when you practice data mining, you could also use a database, but they are separate, as data mining really has nothing to do with a database.

As for the DNA business, the genetic code at this point has no structure. So I would say you're assuming too much about the evolutionary business. I think what you meant is that these organisms may share 50% of the same genes etc. From a molecular standpoint, the creationists have already been blown out of the water. We share 99.99% of the same genes as a chimpanzee, and now they are cracking down on this evolutionary business in genes. Evolution started very early, as the genetic code has evolved several times before creating the life we know today.
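A minimal sketch of that point: the same analysis function can run on values from a plain list or from a database table and give the same answer. The function `most_common_value` is a toy stand-in for a real mining algorithm, and the table name `bases` and the sample values are made up for the example.

```python
import sqlite3
from collections import Counter

def most_common_value(records):
    """Return (value, count) for the most frequent value in any iterable."""
    return Counter(records).most_common(1)[0]

# Source 1: a plain in-memory list (no database involved).
from_list = ["A", "C", "A", "G", "A", "T"]

# Source 2: the same values stored in a throwaway in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bases (symbol TEXT)")
conn.executemany("INSERT INTO bases VALUES (?)", [(s,) for s in from_list])
from_db = [row[0] for row in conn.execute("SELECT symbol FROM bases")]
conn.close()

# The analysis step never sees where the data came from.
result_list = most_common_value(from_list)
result_db = most_common_value(from_db)
print(result_list, result_db)
```

Either way the function still needs data to work on; only the storage layer is interchangeable.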

Aspirin · Aug 2, 2006

No, it is simple. Data mining has nothing to do with databases because you can practice it without using a database.

> That's true (I read that data mining was designed to find patterns in data across enterprise databases, most of them structured and some unstructured); however, you still need data.

You guys are being very superficial when you talk about data mining. You have to look at what you're actually doing; it does not matter where you get your data.

>Good point. Again you need data.

However, when you practice data mining, you could also use a database, but they are separate, as data mining really has nothing to do with a database.

>Data mining needs data to produce information.

As for the DNA business, the genetic code at this point has no structure. So I would say you're assuming too much about the evolutionary business.

> Sure it does. That's why members of a species reproduce members of the same species, for example.

I think what you meant is that these organisms may share 50% of the same genes etc. From a molecular standpoint, the creationists have already been blown out of the water. We share 99.99% of the same genes as a chimpanzee, and now they are cracking down on this evolutionary business in genes.

> The "99 percent genetic similarity" has been enshrined as a cultural icon and used as "proof" of evolution. Presumably, the 99 percent sequence overlap for proteins and DNA proves that humans and chimps arose from a common ancestor some time in the relatively recent past. According to this view, the small genetic differences arose after the human and chimpanzee lineages split as a consequence of mutational changes within each species' genetic material.

Studies that reveal a 99 percent genetic similarity between humans and chimpanzees have stacked the deck in a way that guarantees a high degree of likeness. Until recently, evolutionary biologists have looked for only a single type of difference between human and chimpanzee DNA sequences, namely substitutions of one nucleotide for another. When researchers expand the comparison to include differences that involve insertions and deletions (called indels), marked dissimilarities between human and chimpanzee genomes become evident.

Another study that used this type of approach found a much more limited genetic similarity when a 1,870,955-base-pair segment of the chimpanzee genome was compared with the corresponding human genome region. When only substitutions were considered, the sequence similarity proved about 98.6 percent. Including indels in the comparison dropped the similarity to 86.7 percent.
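To show how one alignment can yield two different similarity figures, here is a minimal sketch that computes percent identity with and without gapped columns. The two sequences are a made-up 20-column alignment, not real genomic data, and the function `percent_identity` is hypothetical. Published studies also differ on whether an indel counts per base or per event, which this sketch does not attempt to model (it counts per base).

```python
def percent_identity(seq_a, seq_b, count_indels):
    """Percent identity of two pre-aligned sequences ('-' marks a gap)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        has_gap = a == "-" or b == "-"
        if has_gap and not count_indels:
            continue  # substitution-only view skips gapped columns
        compared += 1
        if a == b and not has_gap:
            matches += 1
    return 100.0 * matches / compared

# Hypothetical alignment: one substitution and one 2-base deletion.
human = "ACGTACGTACGTACGTACGT"
chimp = "ACGTACCTAC--ACGTACGT"

subs_only = percent_identity(human, chimp, count_indels=False)
with_indels = percent_identity(human, chimp, count_indels=True)
print(f"substitutions only: {subs_only:.1f}%")
print(f"including indels:   {with_indels:.1f}%")
```

In the toy alignment the substitution-only view compares 18 columns and the indel-inclusive view compares all 20, so the second figure is lower even though the underlying sequences are unchanged.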

In the spring of 2004 the International Chimpanzee Chromosome 22 Consortium affirmed this initial observation when they generated a detailed sequence of chimpanzee chromosome 22 and compared it to human chromosome 21. They discovered a 1.44 percent sequence difference when they lined up the two chromosomes and made a base-by-base comparison.

But they also discovered 68,000 indels in the two sequences, with some indels up to 54,000 nucleotides in length. Another study achieved similar results. This work compared a 1.8-million-base-pair region of human chromosome 7 with the corresponding region in the genomes of several vertebrates. Only a third of the differences between humans and chimpanzees involved substitutions.

Indels accounted for roughly two-thirds of the sequence differences between these two primates. Of these indels, about one-half were greater than 100 base pairs long. As for mitochondrial DNA, a 91.1 percent sequence similarity was seen when the entire sequence was compared, not a 99 percent similarity. This factor promises to be significant because mitochondria play a role in energy metabolism. Several neurodegenerative and muscular degenerative diseases actually stem from mutations in mitochondrial DNA.

Honestly, I could write a book on this issue, but the bottom line is that although human and chimpanzee genomes display great similarity, that similarity has been magnified to some extent by research methodology. Researchers are starting to uncover significant differences. Results of large-scale comparisons must be considered preliminary, as it's not yet clear what the genetic differences mean in terms of anatomical and behavioral characteristics. However, greater clarity will likely come as research progresses. Already the newly recognized genetic differences between humans and chimpanzees complicate the picture for biologists who view the high degree of genetic similarity between humans and chimpanzees as proof of shared ancestry. If 99 percent genetic similarity represents a close evolutionary connection, what does the more recently measured 86.7 percent genetic similarity mean? And in this discussion small is significant, my friend, as several recent studies demonstrate that even subtle genetic differences can manifest themselves dramatically in terms of an organism's anatomy, physiology, and behavior.

Sure, researchers are only beginning to gain knowledge of gene expression patterns in humans and the great apes. Yet these initial studies already indicate that anatomical, physiological, and behavioral differences between humans and chimpanzees (as well as the other great apes) result much more from differences in gene expression than from DNA sequence disparities. In many instances, it's not the genes present that are important but the way they function. What does it mean to be 98 percent chimpanzee? In terms of evolution: essentially nothing.

The most comprehensive genetic comparisons indicate that humans and chimpanzees share a genetic similarity closer to about 85 percent than to 99 percent. From an evolutionary perspective, if a 99 percent genetic similarity reflects a close evolutionary connection, then an 85 percent genetic similarity distances humans from chimpanzees.

Other recent studies demonstrate that even small genetic differences (such as the presence or absence of a single gene or an altered gene structure) translate into significant biological differences. These help explain why humans stand apart from the great apes. Additionally, studies now show that gene expression patterns in the human and chimpanzee brains (and other tissues and organs) also differ. In other words, the difference between human biology and behavior and chimp biology and behavior likely depends to a large extent on the difference in gene usage, not the types of genes present. These discoveries are just the iceberg's tip. Based on the trend line, future work will likely identify other important genetic differences between humans and chimpanzees. Recent studies, for example, identified recombination hot spots in the human genome that are absent from the chimpanzee's genetic makeup. Differences in recombination affect mutation rates and biological variation within a species. Another recent study determined that the human genome has 200 times more copies of a class of noncoding DNA (referred to as Alu Yb8) than does the chimpanzee genome. I would like to get into this further but room doesn't allow.

Such discoveries may not necessarily invalidate human evolution, but they do make evolutionary explanations for human origins less plausible and more difficult to accept. From an evolutionary perspective, the scientific data indicate that substantial genetic change must have occurred within an exceptionally short time frame (5 to 6 million years at most). More problematic are the growing number of instances in which small differences in human genes produce the just-right biological effects necessary to account for profound biological and behavioral differences between humans and chimpanzees.

The complexity and the intricacy of biological systems, especially those in the brain, underscore the improbability that random mutations could bring about the exacting changes in gene structure necessary to support new biological functions, particularly when structure-altering mutations to single genes more often result in devastating diseases and disabilities. The same is true for changes in gene expression. As indicated by the data, differences in gene usage play an important role in generating the differences between humans and chimpanzees. The intricacy of gene-to-gene interactions and the biological effects manifested when gene expression is altered make it difficult to envision how coordinated and extensive changes in gene expression could occur to generate the anatomical and physiological characteristics that define humanity.

Changes in gene expression are frequently harmful and play a role in the etiology of many diseases. Each new discovery coming from genetic comparisons between humans and chimpanzees seems to weaken the case of evolution.

I'm not saying evolution doesn't occur. Of course microevolution occurs. I'm just pointing out the facts here with a naturalistic macroevolutionary model in the arena of research dealing with chimpanzee and human similarities.

Evolution started very early as the genetic code has evolved several times before creating the life we know today.

> Well that's a whole other discussion and there's not enough room here to properly discuss that with you. New post! Lol..
