Why are so many internet protocols text-based?

From what I have found, a very large number of protocols that travel over the internet are “text-based” rather than binary. These include, but are not limited to, HTTP, SMTP, FTP (I think this one is entirely text-based?), WHOIS, and IRC.

In fact, some of these protocols jump through some hoops whenever they want to transmit binary data.

Is there a reason behind this? Text-based protocols obviously have a bit of an overhead as they require sending more data to transmit the same amount of information (see example below). What benefits outweigh this?


By text-based, I mean most of the characters used in the protocol are between 0x20 (space) and 0x7E (~), with the occasional “special character” used for very specific purposes, such as newline, NUL, ETX, and EOT. This is opposed to transmitting raw, binary data over the connection.

For instance, transmitting the integer 123456 as text involves sending the six-byte string 123456 (represented in hex as 31 32 33 34 35 36), whereas the same value as a 32-bit binary integer takes only four bytes, 0x0001E240 (which, as you can see, “contains” the special null character).
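The two encodings from the example above can be compared directly. This is a minimal sketch in Python, using `struct` to produce the 32-bit big-endian binary form:

```python
import struct

n = 123456

# Text encoding: one ASCII byte per decimal digit (6 bytes here).
text = str(n).encode("ascii")
print(text.hex())  # 313233343536

# Binary encoding: a fixed-width 32-bit big-endian integer (4 bytes).
binary = struct.pack(">I", n)
print(binary.hex())  # 0001e240 -- note the embedded 0x00 byte
```

The text form costs two extra bytes here (and grows with the number of digits), while the binary form is fixed-width but contains a byte outside the printable range.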

Answer

When the world was younger, and computers weren’t all glorified PCs, word sizes varied (a DEC 2020 we had around here had 36-bit words), and the format of binary data was a contentious issue (big endian vs. little endian, and even weirder orders of bits were reasonably common). There was little consensus on character size/encoding (ASCII and EBCDIC were the main contenders; our DEC had 5/6/7/8 bits-per-character encodings). ARPAnet (the Internet predecessor) was designed to connect machines of any description. The common denominator was (and still is) text. You could be reasonably certain that 7-bit encoded text wouldn’t get mangled by the underlying means of shipping data around (until quite recently, sending email in some 8-bit encoding carried a guarantee that the recipient would get mutilated messages; serial lines were normally configured as 7 bits with one parity bit).
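The “7-bit text survives, binary doesn’t” point can be illustrated concretely. A 7-bit-clean link (or a serial line configured for 7 data bits) can be modeled as stripping the top bit of every byte; this is a hypothetical model for illustration, not any particular piece of network hardware:

```python
# Model a 7-bit-clean transport: the high bit of every byte is lost.
def strip_high_bit(data: bytes) -> bytes:
    return bytes(b & 0x7F for b in data)

text = b"Hello, ARPAnet"  # pure ASCII: every byte is below 0x80
binary = bytes([0x00, 0x01, 0xE2, 0x40, 0xFF, 0x80])  # arbitrary binary data

print(strip_high_bit(text) == text)      # True: ASCII text passes intact
print(strip_high_bit(binary) == binary)  # False: 0xFF and 0x80 get mangled
```

Any protocol restricted to 7-bit ASCII rides through such a link unchanged, which is exactly why the early protocols stuck to text.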

If you rummage around in e.g. the telnet or FTP protocol descriptions (the first Internet protocols; the network idea then was to connect remotely to a “supercomputer” and shuffle files to and fro), you see that the connection includes negotiating lots of details we now take as uniform.

Yes, binary would be (a bit) more efficient. But machines and memories (and also networks) have grown enormously, so the bit-scrimping of yore is a thing of the past (mostly). And nobody in their right mind will suggest ripping out all existing protocols to replace them with binary ones. Besides, text protocols offer a very useful debugging technique. Today I never install the telnet server (it’s better to use the encrypted SSH protocol for remote connections), but I keep the telnet client handy to “talk” to some errant server to figure out snags. Today you’d probably use netcat or ncat for futzing around…
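The “talk to an errant server by hand” technique works precisely because the protocol is readable text on a socket. A minimal sketch, using a toy line-based server on localhost (the `HELO`/`220`/`250` exchange is an SMTP-flavored stand-in for illustration, not a real daemon); the client side is exactly what you would type into telnet or netcat:

```python
import socket
import threading

def serve(listener: socket.socket) -> None:
    """Toy line-based server: greets, then answers one command."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(b"220 ready\r\n")
        line = conn.makefile("rb").readline()
        if line.strip().upper().startswith(b"HELO"):
            conn.sendall(b"250 hello\r\n")
        else:
            conn.sendall(b"500 unrecognized\r\n")

listener = socket.create_server(("127.0.0.1", 0))  # OS picks a free port
threading.Thread(target=serve, args=(listener,), daemon=True).start()

# The "client" just reads and writes plain text lines on the socket.
with socket.create_connection(listener.getsockname()) as s:
    f = s.makefile("rb")
    greeting = f.readline().decode().strip()
    s.sendall(b"HELO example\r\n")
    reply = f.readline().decode().strip()

print(greeting)  # 220 ready
print(reply)     # 250 hello
```

With a binary protocol you would need a purpose-built client to do even this much; with a text protocol, a keyboard is enough.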

Attribution
Source: Link, Question Author: IQAndreas, Answer Author: vonbrand
