Programming Challenge - Advanced Compression System


LightDark

Greetings everyone, and thank you in advance for reading this thread. A little background on this challenge: I've been developing the "template" for how this program should work and operate over the last two years. As I am NOT a programmer, by any means, I will leave it up to the community at large to use the template I'm including in this thread in any way they desire. A working model of this template would have very large implications for the way data is stored and retrieved across all disciplines.

On to the challenge!

Ok, first you must understand the basic principle behind this system.

To do that, you must convert all your thinking into Binary for a brief period.

Ok, now that we're all in the right headspace, I will explain the very heart of the situation.

Let's say you had a big pile of decimal numbers, no pattern, no organization at all. Let's say we're talking about the first 8192 digits of PI. Now I'm not going to post them all here, as that would be redundant and they are easily found elsewhere, but I will show an example with the first 32 digits.

1415 9265 3589 7932 3846 2643 3832 7950

Now, while you may be able to find patterns of some sort in this small sample, consider the millions more digits that follow as I explain the rest.

Now let's say I applied a 10-digit binary KEY to that sample.

9 8 7 6 5 4 3 2 1 0 would be the value places for each digit of the KEY.
0 0 0 0 0 0 0 0 0 0 would turn all the decimal digits in the sample to a binary 0
0 0 0 0 0 0 0 0 0 1 would turn all the decimal 0's in the sample to a binary 1, the rest to 0's
0 0 0 0 0 0 0 0 1 0 would turn all the decimal 1's in the sample into a binary 1, the rest to 0's

Etc., so on and so forth. Now, excluding 0 0 0 0 0 0 0 0 0 0 and 1 1 1 1 1 1 1 1 1 1 (they would turn any sample into all 0's or all 1's, which wouldn't be very useful), we could create 1022 new combinations of the sample. If we apply that to the first 8192 digits of PI, we have created 1022 samples of 1024 bytes each, or nearly 1 MB minus the 2048 bytes for the two omitted KEYs.
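To make the key idea concrete, here is a rough Python sketch of applying one key to the 32-digit sample above (the names are mine and purely illustrative; treat it as a sketch of the idea, not the finished program):

Code:
# Rough sketch of applying one 10-bit key to a run of decimal digits.
# Bit d of the key (d = 0..9, counted from the right) decides whether
# decimal digit d becomes a binary 1 or a binary 0.

SAMPLE = "14159265358979323846264338327950"   # the 32 digits shown above

def apply_key(digits, key):
    return "".join("1" if (key >> int(d)) & 1 else "0" for d in digits)

print(apply_key(SAMPLE, 0b0000000001))   # decimal 0's -> 1, everything else -> 0
print(apply_key(SAMPLE, 0b0000000010))   # decimal 1's -> 1, everything else -> 0
print(apply_key(SAMPLE, 0b0000000011))   # decimal 0's and 1's -> 1, the rest -> 0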

Now, let's say we increased our possible variables to 40 MILLION digits of PI, taking 1022 samples for every 8192 digits, shifting over one digit, and repeating until the very end.

This would mean that you would have (39,991,808 x 1022) "frames" of 1 KB each.

39,991,808 x 1022 frames of 1 KB = nearly 39 TB of data that would have to be stored.
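Spelling that arithmetic out as a quick check (the variable names are just mine):

Code:
# Back-of-the-envelope check of the frame count and the storage figure.
DIGITS      = 40_000_000     # digits of PI in the database
WINDOW      = 8_192          # digits per sample
KEYS        = 1_022          # 1024 keys minus the all-0 and all-1 keys
FRAME_BYTES = 1_024          # 8192 bits = 1 KB per frame

positions = DIGITS - WINDOW                    # shifting one digit at a time
frames    = positions * KEYS                   # ~40.9 billion frames
print(frames)
print(frames * FRAME_BYTES / 1024**4, "TiB")   # ~38 TiB if every frame were stored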

Just the tip of the iceberg. Using the "frames", the goal is to create an algorithm that can correctly build 1 KB of data from multiple frames stored in the system, using a best-match method.

The result will be a system that can create 1 KB of data using only the 10-bit key, a 40-million-digit database of PI (20 MB stored on an average user's computer/cell/DVD player, etc.), and the location within the database of where to go and how to build. NOW, the secret to this system is that once the system has determined how to build a given 1 KB, it will never need to determine how to do it again; it will use that code to build that segment every time. So in essence, you will never need to store the original 1 KB ever again, just the code to build it from the system.
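I can't write the real algorithm myself, but to make the "best match" idea a little more concrete, here is a rough Python guess at one build step. The frames dictionary, its (offset, key) addressing, and the idea of recording the leftover differences as "modifiers" are my own illustration, not the finished system:

Code:
# Guess at a "best match" build step: pick the stored frame with the fewest
# differing bits from the 1 KB block we want, then keep a list of the
# positions that still have to be corrected.
def hamming(a: bytes, b: bytes) -> int:
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def best_match(target: bytes, frames: dict) -> tuple:
    """frames maps an address like (offset, key) to a 1024-byte frame."""
    addr = min(frames, key=lambda k: hamming(target, frames[k]))
    # "Modifiers": byte positions where the chosen frame still disagrees.
    fixes = [(i, t) for i, (t, f) in enumerate(zip(target, frames[addr])) if t != f]
    return addr, fixes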

The trick is, since we're limiting the system to only 40 million possible combinations, the addresses used to build the data will only be a few bytes in length, stored in hexadecimal.

10 bits for the key + an offset of up to 2625A00 in HEX (7 hex digits x 4 bits = 28 bits)

38 bits + modifiers = 48 bits max per address, meaning that since 1 KB of data has 8192 bits, you could use up to 8192/48 = 170 different FRAMES to build 1 KB. The odds of being able to build 1 KB of data from 170 different frames are astronomically good; in fact, I'd put it at a 99.9999% probability that, along with special build instructions per KB, you can make any combination you wish.
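Written out in Python, the bit budget looks like this (just a check of the numbers above):

Code:
# The bit budget per frame reference, written out.
KEY_BITS    = 10                       # the 10-bit key
OFFSET_BITS = 28                       # 7 hex digits (up to 0x2625A00) x 4 bits
ADDRESS     = KEY_BITS + OFFSET_BITS   # 38 bits
WITH_MODS   = 48                       # rounded up to leave room for modifiers

print(0x2625A00)            # 40000000 - the top of the offset range
print(8192 // WITH_MODS)    # 170 - frame references that fit inside one KB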

The biggest part of this... and you'll probably all need to brace yourselves if you've read this far. Once you have all the build codes for each KB of data stored in a file, you can then use a compression program such as WinZip or gzip to make a zip file, and then repeat the entire process all over again.

So a little math:

Take a 1 GB file, with each KB in the file stored using ~100 frames. That's 4800 bits per 8192, or roughly 50%. The codes are then compressed to about 101% capacity, or 52% of the original size, then halved again, and now we're down to roughly 250 MB of codes, and so on. I estimate that if the system is perfected, it will be possible to compress any file to no less than 6 KB (assuming the compression doesn't happen on a home computer, but at a facility designed specifically to build KBs of data for this type of compression), and then reverse the process on a home computer using a very small database comprising only the first 40 million digits of PI, the code to decompress any zip "midway" points, and the ability to connect to the main facility so users can compress files they already own into this format.
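Taking the ~50%-per-pass figure above at face value, the number of passes needed to go from 1 GB down to the claimed 6 KB floor works out roughly like this (this is only the claim's own arithmetic, not a demonstration that each pass actually achieves 50%):

Code:
# Repeated passes at the claimed ~50% per pass.
size   = 1 * 1024**3         # 1 GB in bytes
passes = 0
while size > 6 * 1024:       # the claimed 6 KB floor
    size //= 2               # one re-representation + zip pass, per the estimate
    passes += 1
print(passes, size)          # 18 passes, ending around 4 KB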

Thank you very much if you decided to read all of this. Any questions or comments, feel free to post.
 
I have to say that I don't fully understand. Are you saying that we should use a 10-bit set of binary switches to represent the 10 decimal values? If so, then that is not really compression at all. Or maybe you are suggesting a scheme where a 10-bit code is used to represent a known 1 KB value, and decompression would involve looking up the bit code and converting it to the 1 KB value? Unfortunately, this has the downfall that you could only use your 10-bit code to look up 1024 distinct 1 KB values, while there are 2^8192 distinct values that 1 KB can hold. Maybe you should provide some pseudocode showing how this would work.
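To put rough numbers on that objection, here is a quick Python check comparing the number of distinct 1 KB values with what fixed-width codes can reach (the 48-bit and 170-reference figures are taken from the first post):

Code:
# Quick sanity check on the counting problem: how many distinct 1 KB values
# exist versus how many things a fixed-width code can point at.
values_in_1kb = 2 ** 8192           # every possible 1024-byte block
codes_10_bit  = 2 ** 10             # 1,024 distinct lookups
codes_48_bit  = 2 ** 48             # ~2.8e14 distinct lookups
print(values_in_1kb > codes_10_bit)            # True, by an enormous margin
print(values_in_1kb > codes_48_bit ** 170)     # True: even 170 x 48-bit refs fall short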
 
The program itself isn't very complicated to understand; the application of the program may be a real mind tease, though.

The program works like this:

Step 1:

Load 40 million digits of PI from a file. (4 bits per digit = 160 million bits = 20 MB)
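As an illustration of the 4-bits-per-digit figure (the function name is mine, not part of the template):

Code:
# Illustrative only: packing decimal digits at 4 bits each (two per byte),
# which is how 40 million digits come out to roughly 20 MB.
def pack_digits(digits: str) -> bytes:
    if len(digits) % 2:
        digits += "0"                          # pad to an even count
    return bytes((int(a) << 4) | int(b) for a, b in zip(digits[::2], digits[1::2]))

packed = pack_digits("14159265358979323846264338327950")
print(len(packed), "bytes")                    # 16 bytes for 32 digits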


Step 2:

Take the first 8192 digits of the load, and apply 1024 keys to it

Key format (a 1 turns that number, wherever it is found in the sample, into a binary 1; a 0 turns it into a binary 0):

98 7654 3210
00 0000 0001 (turns 1-9 digits into binary 0's, 0 digits into binary 1's)
00 0000 0010 (turns 0, 2-9 digits into binary 0's, 1 digits into binary 1's)
00 0000 0011 (turns 2-9 digits into binary 0's, 0-1 digits into binary 1's)
00 0000 0100 (turns 0-1, 3-9 digits into binary 0's, 2 digits into binary 1's)

etc

This will effectively create 8192 bits of binary data per sample taken from the 40 million digits of PI.

1024 x 40,000,000 = 40,960,000,000 combinations; the all-0 and all-1 keys just turn every digit of a sample into 0's or 1's, so those combinations will be scrapped.

This means we now have roughly 40,960,000,000 templates that we can use to build any combination of 1 KB of data we wish.

Since the sample size (8192 digits) is constant, the only information needed to address any of the "building" templates is the location within the 40 million digits at which to start sampling, plus the 10-bit key that transforms the sample into the binary state of the template.

00 00 00 00 - 02 62 5A 00

would be the range of possible addresses into PI (28 + 10 = 38 bits per address).
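Putting the two halves of an address together, a rough Python sketch of recovering a template from its address (pi_digits and the function name are my own placeholders):

Code:
# Sketch of turning one address (offset into PI + 10-bit key) back into its
# 8192-bit template. pi_digits stands in for the 40-million-digit string
# loaded in Step 1.
def frame_from_address(pi_digits: str, offset: int, key: int) -> str:
    window = pi_digits[offset:offset + 8192]
    return "".join("1" if (key >> int(d)) & 1 else "0" for d in window)

# e.g. frame_from_address(pi_digits, 0x0000000, 0b0000000010) would give the
# template for the very first window under key 2.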

Using these addresses and the patterns of data they create, the outcome of any combination is at most 8192 bits.

8192/64 = 128 (we'll double the size of the address to make room for modifiers that will help the program build the correct segment of data). That means we can use up to 128 different building blocks to make a tailored 1 KB of data.

Now, that's not to say that building the data would require all 128 possible templates, but even if it did, and even if the entropy of the original data is greater than 7 bits per byte, the odds are that the templates used to build that 1 KB segment will have much lower entropy, as they do not directly relate to the original input being matched.

Given that there are 2^8192 possibilities for 1 KB, it is best to remember that only about 5% of those possibilities make any truly useful data for the average computer user, and a majority of that data is already heavily compressed (music, video, pictures) and of extremely high entropy.

In conclusion to this post (and hopefully more questions will come), I will make the assumption that if this program is made viable through extensive research, it may be possible to compress any type of already highly compressed data through re-representation and the high availability of purely random strings of data (PI in this case) that can be converted freely from base 10 to base 2 and manipulated in such a way as to reconstruct viable data, with no heavy mathematical calculations, from a seemingly "blank space".
 