Does anyone know...

Yeah Emily. We were slightly off.

I can see the "steps" of your calculation. However, I don't think it's right to do that. Not the numbers, but the approach.

Please describe the problem in more detail including the form of the data you have.

Also, you typically don't want to mix the training set and the testing set. You should build your "model" beforehand, and then use it to classify the "test data". Here, you seem to be mixing the two: you are generating the boundaries from some data and then trying to classify that same data. It looks as if you are making the decisions "relative" to each other, which shouldn't be the case.

Perhaps it's because I am still not sure about the form of your data (or because I am too sleepy). :(

Also, the link I gave is only a very brief flavor of hypothesis testing. I don't recommend it as an "intro". If you want, I can try to find a better link for you.
 
Chankama said:
I can see the "steps" of your calculation. However, I don't think it's right to do that. Not the numbers, but the approach.
I agree... it's kind of an approximate way to make the criterion for "similar" more rigid. But then again, all I'm looking to do is separate the earthquake from the noise, because once I get an approximation with a statistical method I'm planning to use a neural network to refine it. I guess it should still be a mathematically valid way of approximating it, though...

The problem with mixing the training set and the testing set is that the data is analyzed in real-time, so there is no model, really. The algorithm has to detect that a signal is an earthquake without seeing anything that comes after that signal. (Is that what you mean by mixing the training set and the testing set?)

The data are basically just velocities measured by a sensor in the ground. So an unusually high velocity means an earthquake, but the sensors will also pick up some very, very tiny velocities, which is the noise.
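
Just to make that concrete, here's a toy sketch in C++ of the kind of check I have in mind - flag a sample when it sits far outside the recent noise level. The window length and the 5-sigma threshold are placeholder numbers I made up, not values I've settled on:

Code:
#include <cmath>
#include <cstddef>
#include <deque>
#include <iostream>
#include <vector>

// Flag samples that deviate from the recent "noise" level by more
// than k standard deviations.  Window length and k are placeholders.
std::vector<std::size_t> flagArrivals(const std::vector<double>& v,
                                      std::size_t window = 200,
                                      double k = 5.0)
{
    std::vector<std::size_t> hits;
    std::deque<double> recent;
    double sum = 0.0, sumSq = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (recent.size() == window) {
            double mean = sum / window;
            double var = sumSq / window - mean * mean;
            double sd = std::sqrt(var > 0.0 ? var : 0.0);
            if (std::fabs(v[i] - mean) > k * sd)
                hits.push_back(i);          // unusually large velocity
        }
        recent.push_back(v[i]);             // slide the window forward
        sum += v[i];
        sumSq += v[i] * v[i];
        if (recent.size() > window) {
            double old = recent.front();
            recent.pop_front();
            sum -= old;
            sumSq -= old * old;
        }
    }
    return hits;
}

int main()
{
    std::vector<double> v(1000, 0.001);     // fake tiny background noise
    v[600] = 0.5;                           // fake first arrival
    for (std::size_t i : flagArrivals(v))
        std::cout << "possible arrival at sample " << i << '\n';
}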

Don't worry about finding another hypothesis testing link - I have some statistics textbooks that will have it. Thanks for your help! :)
 
just a thought:

have you ever gone outside to look at the trees, the sky, the earth's creatures and just said, "Science is cool and all, but it leads nowhere, so I'll just enjoy my life!"?
No? Well, I have. It's a lot more fun. Wanna know why all big scientists believe in religion?
 
Hey Emily. A few things.

Emily said:
But then again, all I'm looking to do is separate the earthquake from the noise,

sensors will also pick up some very, very tiny velocities, which is the noise.

What do you mean by "noise", exactly? Typically, people reserve the word "noise" for the component of the signal that is "useless" to them, i.e. that doesn't contain any useful information.

Emily said:
because once I get an approximation with a statistical method I'm planning to use a neural network to refine it. I guess it should still be a mathematically valid way of approximating it, though...

Why not start with neural networks to begin with? Depending on the type of neural network, they can train/classify pretty fast. Back-prop NNs are slow to train, but PNNs (Probabilistic Neural Networks) are very fast. What program are you using to create the networks? A Matlab variant or C++?
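
To give you a feel for why PNNs are so fast: "training" is basically just storing the labelled samples, and classification is a sum of Gaussian kernels per class. A bare-bones sketch in C++ (the smoothing width sigma and the two-class setup are placeholder choices of mine, not the only way to set it up):

Code:
#include <cmath>
#include <cstddef>
#include <vector>

struct Sample {
    std::vector<double> features;
    int label;                          // 0 = noise, 1 = earthquake
};

// PNN classification: one Gaussian kernel per stored training sample,
// averaged per class; the class with the larger response wins.
int pnnClassify(const std::vector<Sample>& train,
                const std::vector<double>& x, double sigma = 0.1)
{
    double score[2] = {0.0, 0.0};
    int count[2] = {0, 0};
    for (const Sample& s : train) {
        double d2 = 0.0;                // squared distance to this sample
        for (std::size_t j = 0; j < x.size(); ++j) {
            double diff = x[j] - s.features[j];
            d2 += diff * diff;
        }
        score[s.label] += std::exp(-d2 / (2.0 * sigma * sigma));
        ++count[s.label];
    }
    double f0 = count[0] ? score[0] / count[0] : 0.0;
    double f1 = count[1] ? score[1] / count[1] : 0.0;
    return f1 > f0 ? 1 : 0;
}

The only real knob is sigma, which controls how far each training point's influence spreads.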

But if you want to use the standard deviation as a discriminatory feature, then I think hypothesis testing is worth looking at. In fact, you can use the output of the hypothesis test (0 or 1) as an input to the neural network. This way, you can also consider "other" features like the amplitude, the strength of various frequencies, etc. So the "standard deviation/hypothesis testing" component can be "one" of many feature extractors feeding your NN.
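
As a sketch of what that 0/1 feature could look like: under the null hypothesis that a window is pure noise with known variance sigma0^2, the statistic (n-1)s^2/sigma0^2 follows a chi-square distribution with n-1 degrees of freedom, so you output 1 when it exceeds the critical value. The window size of 31 and the 1% significance level below are arbitrary choices on my part:

Code:
#include <cstddef>
#include <numeric>
#include <vector>

// 1 if the window's variance is significantly above the baseline noise
// variance sigma0sq, else 0.  Under H0 (window is just noise),
// (n-1)*s^2/sigma0^2 ~ chi-square with n-1 degrees of freedom.
int varianceTestFeature(const std::vector<double>& window, double sigma0sq)
{
    const std::size_t n = window.size();    // assumed to be 31 here
    double mean = std::accumulate(window.begin(), window.end(), 0.0) / n;
    double ss = 0.0;
    for (double x : window)
        ss += (x - mean) * (x - mean);
    double s2 = ss / (n - 1);               // sample variance
    double stat = (n - 1) * s2 / sigma0sq;  // test statistic
    const double critical = 50.89;          // chi-square 0.99 quantile, 30 dof
    return stat > critical ? 1 : 0;
}

That 0/1 then sits alongside amplitude, frequency content, etc. in the feature vector you hand to the NN.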

Emily said:
The problem with mixing the training set and the testing set is that the data is analyzed in real-time, so there is no model, really. The algorithm has to detect that a signal is an earthquake without seeing anything that comes after that signal. (Is that what you mean by mixing the training set and the testing set?)

No. What I meant was that you need "some" data and their classification (earthquake/not earthquake) by an actual human expert to first "build" the model. If you don't have that, it becomes a clustering problem. Clustering is, roughly, trying to "separate" unlabelled data. Try to get someone to do the classifications for "some" of the data, if you don't have that already.

"GIven" these classifications, your algorithm or yourself can figure out the necessary model that can automatically classify "future" data. With a neural network, it has to "train". This is where the "human expert" comes in.

I mean, if I gave you a time series right now, how would you know whether or not it came from an earthquake? Because you have a model in your head from "your" experience. You need to give the same ability to the computer. Similarly, if you gave the same time series to a highly intelligent child, he wouldn't have a clue what it is, as he has no prior experience with the data.

But, he "might" be able to separate the data into 2 different classes and say, "well, these signals look different from these other ones" - this is clusterring. It's useful when you don't have any labelled data to train with.
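
If you do end up in that situation, even something as crude as 2-means clustering on a single feature will split the data into two groups. A throwaway C++ sketch (the feature - say, per-window standard deviation - and the initialization are purely illustrative):

Code:
#include <cmath>
#include <cstddef>
#include <vector>

// Crude 1-D 2-means clustering: split unlabelled feature values into
// two groups without any expert labels.  Returns 0/1 per value.
std::vector<int> twoMeans(const std::vector<double>& x, int iters = 50)
{
    double c0 = x.front(), c1 = x.back();   // naive initial centers
    std::vector<int> label(x.size(), 0);
    for (int it = 0; it < iters; ++it) {
        double sum0 = 0.0, sum1 = 0.0;
        int n0 = 0, n1 = 0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            label[i] = std::fabs(x[i] - c0) <= std::fabs(x[i] - c1) ? 0 : 1;
            if (label[i] == 0) { sum0 += x[i]; ++n0; }
            else               { sum1 += x[i]; ++n1; }
        }
        if (n0) c0 = sum0 / n0;             // recompute cluster centers
        if (n1) c1 = sum1 / n1;
    }
    return label;
}

Whichever cluster ends up with the larger center would be your "candidate earthquake" group.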
 
 
Chankama said:
What do you mean by "noise", exactly? Typically, people reserve the word "noise" for the component of the signal that is "useless" to them, i.e. that doesn't contain any useful information.
The noise is all the data before the earthquake... I guess when you're just considering the earthquake it would be useless.

Chankama said:
Why not start with neural networks to begin with? Depending on the type of neural network, they can train/classify pretty fast. Back-prop NNs are slow to train, but PNNs (Probabilistic Neural Networks) are very fast. What program are you using to create the networks? A Matlab variant or C++?
I think time isn't really a huge consideration, as long as the algorithm runs in polynomial time - the problem with existing methods is accuracy rather than speed. But then again, I guess it is also a question of efficiency: it would be highly inefficient to feed every single data point into the network, so the statistics act as a filter for what to consider further as the first arrival. (C++, by the way.)

Chankama said:
But if you want to use the standard deviation as a discriminatory feature, then I think hypothesis testing is worth looking at. In fact, you can use the output of the hypothesis test (0 or 1) as an input to the neural network. This way, you can also consider "other" features like the amplitude, the strength of various frequencies, etc. So the "standard deviation/hypothesis testing" component can be "one" of many feature extractors feeding your NN.

This looks like what I was thinking of doing... using the statistical dissimilarity as part of determining whether a signal is unusual enough to be an earthquake.

Chankama said:
No. What I meant was that you need "some" data and their classification (earthquake/not earthquake) by an actual human expert to first "build" the model. If you don't have that, it becomes a clustering problem. Clustering is, roughly, trying to "separate" unlabelled data. Try to get someone to do the classifications for "some" of the data, if you don't have that already.
Hmm... the human classifications are used to train the NN, but building a mathematical model out of the human classifications is, I think, a very complex task - if it were simple, detecting the first arrival wouldn't be a problem. So I think clustering has to be used to detect possible first arrivals, and those are then fed into the NN in place of a mathematical model built on the human classifications.
 
Xula said:
Just curious...when/where/how would this be used? I can't think of a real-world application...maybe I'm just tired...but...:rolleyes:
I think that is the case for everything I learn at school.
 
Emily said:
I think time isn't really a huge consideration, as long as the algorithm runs in polynomial time - the problem with existing methods is accuracy rather than speed.

You gotta be careful when you talk about O(.) approximations. Saying "polynomial time" is useful when you think about scalability and such - after all, you don't want something exponential when you have no idea of the size of your input vector.

Neural networks should typically run in polynomial time. But that doesn't mean all NNs are the same. On paper it's nice to say, "yeah, this NN runs in O(n^2)". But when you actually get down to it, the difference between 5 minutes and 6 hours is pretty big.

And probabilistic networks aren't necessarily sacrificing accuracy just because they train faster. They actually have other good properties over back-propagation NNs as well. Everything depends on the problem at hand, of course.

Emily said:
But then again, I guess it is also a question of efficiency: it would be highly inefficient to feed every single data point into the network, so the statistics act as a filter for what to consider further as the first arrival. (C++, by the way.)

Yeah. People usually never put the raw data into the NN as is. Again, it depends on the measurements/observations, but a time series is just a long list of points; we'd like to reduce the dimensionality first and then give the resulting features to the NN.
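
Something along these lines in C++ - the particular features here (mean, standard deviation, peak amplitude) are just examples I picked; in practice you'd add frequency-domain features too:

Code:
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Reduce a raw window of velocity samples to a small feature vector
// before handing it to the NN: e.g. 512 raw points -> 3 features.
std::vector<double> extractFeatures(const std::vector<double>& window)
{
    const std::size_t n = window.size();
    double mean = std::accumulate(window.begin(), window.end(), 0.0) / n;
    double ss = 0.0, peak = 0.0;
    for (double x : window) {
        ss += (x - mean) * (x - mean);       // accumulate variance
        peak = std::max(peak, std::fabs(x)); // track peak amplitude
    }
    return { mean, std::sqrt(ss / n), peak };
}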

Emily said:
Hmm... the human classifications are used to train the NN, but building a mathematical model out of the human classifications is, I think, a very complex task - if it were simple, detecting the first arrival wouldn't be a problem. So I think clustering has to be used to detect possible first arrivals, and those are then fed into the NN in place of a mathematical model built on the human classifications.

When I said "build a model" I didn't mean the human should do it himself. Given the expert classifications, you can run a suitable algorithm to fit the parameters for you. You just need to give it good features.
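
Just so "fit the parameters" isn't hand-wavy: even the simplest supervised learner does exactly this. For instance, a bare-bones perceptron in C++ (standing in for a full NN here; the learning rate and epoch count are arbitrary picks of mine):

Code:
#include <cstddef>
#include <vector>

struct Labelled {
    std::vector<double> x;   // feature vector
    int y;                   // +1 = earthquake, -1 = noise (expert label)
};

// Minimal perceptron: given expert-labelled examples, the algorithm
// fits the weights itself.  Returns dim weights plus a bias term.
std::vector<double> trainPerceptron(const std::vector<Labelled>& data,
                                    std::size_t dim,
                                    double lr = 0.1, int epochs = 100)
{
    std::vector<double> w(dim + 1, 0.0);    // w[dim] is the bias
    for (int e = 0; e < epochs; ++e)
        for (const Labelled& d : data) {
            double s = w[dim];
            for (std::size_t j = 0; j < dim; ++j)
                s += w[j] * d.x[j];
            if ((s >= 0 ? 1 : -1) != d.y) { // misclassified: update
                for (std::size_t j = 0; j < dim; ++j)
                    w[j] += lr * d.y * d.x[j];
                w[dim] += lr * d.y;
            }
        }
    return w;
}

The expert labels only show up as the y's; the weights come out of the loop, not out of the human's head.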

Tell me how it turns out :)
 
So when you work out the difference between the SD and the data set, you are essentially creating another data set. Is it the SD of this second data set you want?
 