Difference between binary and text files

Hi all,

Hope somebody can help.

This is what I've heard: if you send an XML document, it can go through the firewall without any problem because it is a pure text file, but a binary file cannot.

What I'm trying to understand is that whether you send a text file or a binary file, it has to go through a medium such as a wire, which means it travels through the medium as pulses, i.e. some kind of binary format either way.

So what's the difference between a text file and a binary file?

Thanks.
 
There's enough in that question to go on and on for hours. :) To put it simply, think of text files as ASCII files: files made up of the standard numbers, letters and symbols (the values 0 through 127, i.e. the 7-bit ASCII set). They are saved in a format that people can read with any simple text editor like Notepad or your favorite simple editor for C/C++/Perl/et cetera. Binary files are files that also contain byte values above 127 and look like garbage when opened in a simple editor. This is really a programming question and not a networking question.
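
If you want to see the difference for yourself, here's a rough little sketch in Python (the language and file names are just my choice for illustration): it writes one file that holds only ordinary ASCII characters and one that holds bytes above 127, then shows which one sticks to the 0-127 range.

# A "text" file contains only byte values 0-127; a "binary" file also uses values above 127.
text_bytes = "Hello, world!".encode("ascii")                  # every byte is <= 127
binary_bytes = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0x00])    # several bytes > 127

with open("textfile.txt", "wb") as f:
    f.write(text_bytes)
with open("binfile.bin", "wb") as f:
    f.write(binary_bytes)

for name in ("textfile.txt", "binfile.bin"):
    data = open(name, "rb").read()
    print(name, list(data), "all ASCII?", all(b <= 127 for b in data))

Open binfile.bin in Notepad afterwards and you'll get an idea of the "garbage" I mean.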

If this confused you, sorry. This is an extremely tough question for me to answer through text. It's a complicated answer and you won't get it unless you read up on bits versus bytes, ASCII, a little hex and binary.

What you are stating is absolutely right. If I create a file that contains "XML code" (to use your example), I would be able to send it to someone in an email system that has a binary file restriction but no text file restriction. This is because each character in XML sits in the lower half of the byte values (0 to 127, which is 128 values counting the zero). It contains no characters that fall into the upper half (128 to 255).

If you would like to have an EASY answer that completely ignores ALL logic and history behind the answer then think of text files as formats like .php, .txt, .pl, .html, .cc, .c, .bat, .cpp, et cetera. Binary file formats are file formats like .exe, .wav, .gif, .mp3, et cetera.

The reason it's so hard to explain is because of the way I try to illustrate it, with the following example:
Open up Notepad and type "d", then save it as textfile.txt. This is an ASCII file that is 8 bits (1 byte) in size, right? But the computer reads binary, right? So the "d" is actually read as "0110 0100". People get confused and think that textfile.txt is a binary file because it is saved onto the computer as 1s and 0s... and they are correct that it is saved to the drive in a "binary" computer format... but what the file actually contains is what we are talking about. The computer saves it as a series of electronic differences that we, as humans, like to call 1s and 0s so that we can interpret it more easily. It's hardly saving little 1s and 0s though, just positive and negative charges on a magnet. I usually need to draw on a board to explain it to people. LOL Wow, I'm getting too deep and should just stop.
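
If you want to check that on your own machine, here's a tiny sketch in Python (just my way of illustrating it; textfile.txt is the same example file) that writes the single character "d" and then shows the byte that actually landed on the drive:

# Write the single character "d", exactly like typing it in Notepad and saving.
with open("textfile.txt", "w") as f:
    f.write("d")

# Read the raw byte back and show it as a number and as bits.
data = open("textfile.txt", "rb").read()
print(len(data), "byte(s)")          # 1
print(data[0])                       # 100 in decimal
print(format(data[0], "08b"))        # 01100100 -- the "0110 0100" above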

If you COMPLETELY ignore the idea that a computer saves any file to its drive as "binary", then it's easier to explain, because that idea will just confuse you when you are trying to understand the difference between what a file contains and how it's saved.
 
Thank you very much for the explanation. Let me write what I understand now. So, simply, text files do not contain instructions. That's why it is called a text file: it contains only the 0 to 127 representations of the English alphabet and the control characters.

Now I understand what you are saying and that it works that way with computers. I'm just trying to understand how it really works.

OK. The conventional ASCII set is stored in 8-bit bytes but uses only 128 values for its characters and control characters. But if you are using the full 8-bit set, that means there are 256 combinations of patterns that can be represented.
For example, if you take 0000 0111, that's the character that gives a beep. What I did was write that pattern into a text document and save it with a .exe extension, and it actually works. That means within the 8-bit limit you can give instructions to the computer. I understand that for the 0000 0111 pattern there is no glyph to represent it; that's why it is called a binary file.
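
Here is a little sketch of what I mean by that pattern (Python, purely to illustrate; I'm assuming the terminal still honours the bell character, and this doesn't reproduce the .exe trick, it just shows the character itself):

import sys

BEL = 0b00000111               # decimal 7, the ASCII BEL control character
print(2 ** 8)                  # 256 -- all the patterns 8 bits can hold
print(chr(BEL).isprintable())  # False: there is no glyph for BEL

# Writing the raw character to the terminal *may* beep, depending on the terminal.
sys.stdout.write(chr(BEL))
sys.stdout.flush()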

So you can represent characters and instructions within the 8-bit limit, which surely means you can create a program within the 8-bit limit. And at the same time, for 0 to 127 there are glyphs to represent. So how does the computer differentiate whether a file is text or binary? Bingo, I just got it at the end now: the computer checks every 8 bits (every byte) of a file to see whether it falls between 0 and 127, which is text, and if it does there is no threat. Is that right?

Thank you very much again.
 
Exactly. Once you have the most significant bit (the leading bit) set, then you are into the upper half of the 256 values (in which case you are using non-ASCII, or a mix).
So,
0000 0000 -> 0111 1111 = ASCII and/or Text
1000 0000 -> 1111 1111 = Binary

Since you seem to understand it pretty well, I'll go a little further... it may not only be text, which is why using the term "text file" is sometimes irritating to me :) An "ASCII file" is more accurate, because of the first 33 code points like (leading bit omitted) 000 0000 NUL, 000 0001 SOH, 000 0010 STX and 000 0011 ETX... and so on, you get the idea :) hehe. But yeah, since ASCII's inception way back in the day, computers have been checking for characters or symbols that fall in the upper half of the 256 values. That distinguishes a "binary" or what I like to call a "control" file (using the term control in the scientific sense).
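
To make that concrete, here's a simplified sketch of that leading-bit check in Python (my choice of language, and the file names are hypothetical; real tools use fancier heuristics than this, but this is the basic rule we've been talking about):

def looks_like_ascii_text(path):
    # True when every byte has its leading (most significant) bit clear,
    # i.e. every value falls in 0000 0000 .. 0111 1111 (0 to 127).
    with open(path, "rb") as f:
        data = f.read()
    return all(byte < 0x80 for byte in data)    # 0x80 == 1000 0000

# Hypothetical files, just for illustration:
# looks_like_ascii_text("notes.txt")    # expect True for a plain ASCII file
# looks_like_ascii_text("song.mp3")     # expect False for a binary format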

It's been a while since I've been able to answer a question or just talk about how computers, in general, interpret data bit by bit. Most certainly my favorite topic ... especially when it comes to reading bits across a network in raw format and then a few minutes later finding out which laptop on a network of 50 nodes has Limewire running ... lol :) Sounds geeky, yea... I'm interested in learning "why" you want to know this. I find it very interesting that someone even cares to learn this stuff anymore. :)
 
I think I found the right man to ask all the questions that bug me most of the time, then :laughing: If I become a pain in the back, just tell me to stop.

So are we still using the ASCII character set, or UTF-8? How can you change the default character set a computer is using? Where is the character set stored in the computer, and what's the process of interpreting bits into characters?

Say you've got a binary file and you run it. Can I write what I understand?
When you run it, the OS loads the file into memory and starts executing it from the beginning of the file to the end. So, like a character set, is there a specific set of instructions for a specific computer?

If there is, where is it in the computer? I know it's a big area to explain like this. If you can direct me to some references, that would be fantastic.

Thank you.
 
Hehe, sorry about that. I've been pretty busy at work for the past few days. It's one of those weeks during which everyone seems to go on holiday, and all the people left are either asking too many questions or are the people that just "float" at work. I call the slackers "floaters" because I'm not really sure what they do... I'm not so sure that they do either?!? hehehe. They are more than likely "Functional Analysts"... which irritates me. They are here because the "Business Analysts" and the "Technical People" apparently can't talk to each other. So, every time the "Business" needs something, the "Functional Analysts" step in to "translate" to the technical people how something should "function" in a program... hehehe. Ever see a movie called Office Space? The dude that had "people skills" and basically did nothing... hehe, yeah, that's a Functional Analyst all over. They really just get in the way... :( In my opinion, a functional analyst knows just enough about technology to be dangerous in front of a computer in any fashion other than reading and replying to email... It's pointless. Anyway, so yeah, I've been busy with them unfortunately.

As to your questions: yeah, UTF-8 has superimposed itself over the top of ASCII and then added more. Basically, the first 128 characters in UTF-8 are plain ASCII. Unicode (and its UTF-8 encoding) was ultimately needed for two major reasons. These may or may not be the biggest reasons according to some book, but they are to me:

1) Writing a program, or just a file, that works cross-platform in English but also in any other language.

2) A lot of character sets and other computer functions (like command-line functions) started to bump heads. Basically, as ASCII was extended to form larger sets, certain characters were also being used for OS navigation.

That's why, in my opinion, Unicode will never go away. That said, ASCII will never go away either, since ASCII is ultimately part of Unicode (so is the Chinese character set, the Western European character set, et cetera).

Unicode is machine-code driven (like ASCII was), so you really can't tell a computer not to read Unicode, other than by writing a program in something other than Unicode. In UTF-8, the leading byte of a multi-byte character (say it's 1100 0000) tells the program how many bytes make up that character, counting itself. So in that example, 2 bytes in total are used to translate the bits into one character. This is how it gets to over 1 million characters rather than just 256 :)
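
Here's a little sketch of that leading-byte rule in Python (again, just my way of illustrating it): the number of leading 1 bits in the first byte tells you how many bytes the whole character takes.

def utf8_length_from_lead_byte(lead):
    # How many bytes a UTF-8 character occupies, judged from its leading byte.
    if lead < 0b10000000:
        return 1        # 0xxxxxxx: plain ASCII, one byte
    if lead < 0b11000000:
        raise ValueError("10xxxxxx is a continuation byte, not a leading byte")
    if lead < 0b11100000:
        return 2        # 110xxxxx, like the 1100 0000 example above
    if lead < 0b11110000:
        return 3        # 1110xxxx
    return 4            # 11110xxx

encoded = "é".encode("utf-8")                    # a two-byte character
print([format(b, "08b") for b in encoded])       # ['11000011', '10101001']
print(utf8_length_from_lead_byte(encoded[0]))    # 2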

As for writing in binary so that a computer will understand it, you more or less have to use a program these days that accepts assembly language. The reason is that, unless you build it all from scratch, a computer is already built to declare and interpret things for you. So, unless you really want to build it again (just like re-inventing the wheel :) ), you would just use a program made for coding in binary (or some sort of assembler). In the Unix world... there are plenty of them!
 