the raw code

Status
Not open for further replies.

threadmark

In Runtime
Messages
134
Location
somewhere in this general direction.
As the title suggests, I want to begin a discussion on the baseline: code processing. This topic covers everything from variable arrays to character conversion, logic, and advanced mathematics: anything that computes code in a desired format. I request that contributed posts adhere to the following guidelines.
1. Try to contribute to this thread from experience, not Googled results.
2. Do not refer to applications when posting methods relevant to code processing.
3. Try to explain the mathematical methods chosen for the desired process.
4. Do not discuss password cracking or code breaking.
5. Popular code topics include Google and Yahoo search engine methods and crawling techniques.
To start the computation, I will describe one nifty way to reference code without storing a line of code twice.
To start with, let's say I have 1,000,000 lines of HTML code. Contained in this code is advertising, such as repeated articles and pages. I need to process every line, storing only code that hasn't been processed before, in one sweep taking no more than 10 minutes. Using mathematics and variables, I have created a way. Before I give my answer away, I want someone to try for the same outcome with their own techniques. Equipment must not exceed what your local computer store sells, and I will allow more than one computer even though it's not required. Tm
 
I feel like I'm being given a school assignment - someone else's school assignment. I remember graduating college, so it definitely can't be mine.
 
"using mathematics and variables"? WOAH NO WAY! :p What else did you think everyone was going to use, pixie dust and sweet dreams?

So basically you want a routine (in what language?) that reads a text file with 1,000,000 lines of HTML code that stores new lines of code only once, and it must complete in 10 minutes or less? Lol, that's like a 5-line script.

read xyz.txt as array
for each x in array {
    if x is not in newarray then append x to end of newarray
}
 
I tried that, but as the array populates with more text, the lookups take too long. But of course you would have known that if you had tried it. My method is to convert each line to a serial key of about 5 digits, so instead of looking up, let's say, an 80-character line, I only need to match a 5-digit serial. Each line of code is converted the same way, minus tabs and spaces, and the lookup takes no time at all. You don't use an array, though: you use one variable holding the serials separated by brackets, then do a single lookup of the serial number. If the serial equivalent of the current line is not contained in the serial variable, save the line to the text variable.
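For anyone who wants to try the idea, here is a rough sketch in Python (the thread never names a language, so that is an assumption). It is not the exact method described above: it swaps in a SHA-1 digest for the 5-digit serial and a set for the bracket-separated variable, because a 5-digit key only has 100,000 possible values and would collide heavily across 1,000,000 lines. The names `line_key` and `unique_lines` are made up for illustration.

```python
import hashlib


def line_key(line: str) -> str:
    """Return a fixed-length key for a line, ignoring tabs and spaces.

    Assumption: SHA-1 stands in for the poster's 5-digit serial.
    """
    stripped = line.rstrip("\r\n").replace("\t", "").replace(" ", "")
    return hashlib.sha1(stripped.encode("utf-8")).hexdigest()


def unique_lines(lines):
    """Keep the first occurrence of each line, comparing by key only."""
    seen = set()   # keys of lines already stored
    kept = []      # the lines themselves, each stored once
    for line in lines:
        key = line_key(line)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept
```

Calling `unique_lines(open("xyz.txt", encoding="utf-8"))` would run it over the file from the earlier post in a single pass. Each lookup compares a short fixed-length key instead of scanning every stored line, which is where the speedup over the array version comes from.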
 
Of course, given the possible length of the lines, reading the full line into the array or variable would be an intensive task. Clever solution, hashing the lines.
And if you just use a single variable, that's still kinda an array I'd think. Not technically, but for all intents and purposes (it still holds a list of values, just not in the same format as an actual array).
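To make that comparison concrete, a small sketch in Python (again an assumed language, with made-up helper names and made-up 5-digit keys). The bracket-separated string and a set answer the same "have I seen this key?" question. The difference is that the substring check has to scan the string, while the set does a hashed lookup.

```python
def add_key(store: str, key: str) -> str:
    """Append a key to the single bracket-separated variable."""
    return store + "[" + key + "]"


def has_key(store: str, key: str) -> bool:
    """Substring lookup; the brackets stop partial keys from matching."""
    return "[" + key + "]" in store


store = ""
keys = set()
for k in ["00042", "12345"]:   # hypothetical serials
    store = add_key(store, k)  # store is now a flat string of keys
    keys.add(k)                # the set holds the same values
```

Here `has_key(store, "12345")` and `"12345" in keys` both come out true, and a partial key like `"2345"` is rejected by both, so the single variable really is doing the job of a list of values.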
 