------------------------------------------------
         "Introduction to Endianness"
------------------------------------------------
  C/O :: arp of DynamicHell Development Team
------------------------------------------------
  http://dynamichell.org | irc.dynamichell.org
------------------------------------------------


Endianness is an issue that every low-level programmer will come across at
some point.  It describes the ordering of data and, more specifically in
computing, the arrangement of data in memory.  There are various types of
endianness: big-endian, little-endian and middle-endian.  It can refer to bit
ordering or, more commonly, to byte ordering.

Thankfully not all computers are 80x86 based.  There are various other
architectures--PowerPC, SPARC, Motorola 68000, MIPS and ARM to name a few--
and they all have their own ideas on how to manipulate and store data (if
they didn't, there'd be little point in having multiple architectures).  Each
has its advantages and disadvantages, though that topic is beyond the scope
of this introduction.

Programming problems generally arise from the interaction of two separate
computers (of differing architectures) that do not use a common byte ordering
system.


Big-endian
==========

Let us take a look at how a big-endian system stores its data in memory.

Given the 32-bit hexadecimal integer value 0x1917F4C0, a big-endian system
will store the data as follows:

Low addr|___________________| High addr
        | 19 | 17 | F4 | C0 |
        |-------------------|

As you can see, the most significant byte (0x19) is stored first, at the
lowest address, followed by the remaining three bytes in decreasing order of
significance.
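
The layout in the diagram can be produced on any host, whatever its own byte
order, by writing the bytes out with shifts instead of relying on how the CPU
stores an integer.  The following is only a minimal sketch; the name
store_be32 is hypothetical and is not part of any standard library.

```c
#include <stdint.h>

/* Hypothetical helper (not a standard function): write a 32-bit value
 * into buf in big-endian order, i.e. most significant byte at the
 * lowest address.  Shifts operate on the value, not on memory, so the
 * result does not depend on the host's own byte order. */
void store_be32(uint8_t buf[4], uint32_t value)
{
    buf[0] = (uint8_t)(value >> 24); /* 0x19 for 0x1917F4C0 */
    buf[1] = (uint8_t)(value >> 16); /* 0x17 */
    buf[2] = (uint8_t)(value >> 8);  /* 0xF4 */
    buf[3] = (uint8_t)(value);       /* 0xC0 */
}
```

Calling store_be32(buf, 0x1917F4C0) fills buf with 19 17 F4 C0, from low
address to high address.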

Let us take a closer look at the data:


                  19           17           F4           C0
              0001  1001   0001  0111   1111  0100   1100  0000

The important thing to notice here is that byte ordering and bit ordering are
separate matters: although these architectures use big-endian byte ordering,
they generally do not use big-endian bit ordering (some may, though it is
very uncommon).


Little-endian
=============

Now for little-endian byte ordering.  For simplicity we will use the same
32-bit integer, 0x1917F4C0.  Little-endian machines will store the data as
follows:


Low addr|___________________| High addr
        | C0 | F4 | 17 | 19 |
        |-------------------|


Let us look closely at the data:

                  C0           F4           17           19
              1100  0000   1111  0100   0001  0111   0001  1001

Again, rather confusingly, although the bytes are stored starting from the
least significant byte, the bit endianness does not follow the pattern the
byte ordering takes.  It is important to distinguish between byte-ordering
endianness and bit-ordering endianness.  The latter issue is much less
common, though worthy of note.  Visually, little-endian byte ordering paired
with big-endian bit ordering makes sense.  This, however, is rarely the case.
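
To make the little-endian layout concrete, the sketch below does two things:
it reads four bytes as a little-endian value using shifts, and it inspects
how the host itself lays out 0x1917F4C0 in memory.  Both function names are
hypothetical, and the host check only tells little-endian apart from
everything else (it does not separate big-endian from middle-endian).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: interpret four bytes as a little-endian 32-bit
 * value (least significant byte at the lowest address). */
uint32_t load_le32(const uint8_t buf[4])
{
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}

/* Hypothetical helper: returns 1 if this host stores 0x1917F4C0 as
 * C0 F4 17 19 (little-endian), 0 otherwise. */
int host_is_little_endian(void)
{
    const uint8_t expected[4] = { 0xC0, 0xF4, 0x17, 0x19 };
    uint32_t value = 0x1917F4C0u;
    uint8_t bytes[4];

    memcpy(bytes, &value, sizeof value); /* copy the in-memory bytes */
    return memcmp(bytes, expected, sizeof bytes) == 0;
}
```

load_le32 gives the same answer on every host, whereas the result of
host_is_little_endian depends on the machine it runs on.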


Middle-endian
=============

Middle-endian machines are much less common than big-endian and little-endian
machines.  However, they do exist, and their behaviour differs between
architectures.  Middle-endian machines are characterised by a mixture of
little-endian and big-endian byte ordering depending on the data size, so
the programmer must take much more care.
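
As one concrete illustration, which is not taken from the text above: the
PDP-11 is the usual textbook example, storing a 32-bit value as two 16-bit
words with the most significant word first, while each word is itself
little-endian.  The sketch below assumes exactly that layout; store_pdp32 is
a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper, assuming the PDP-11 style layout: the high
 * 16-bit word comes first, and each word is stored little-endian. */
void store_pdp32(uint8_t buf[4], uint32_t value)
{
    uint16_t high = (uint16_t)(value >> 16);     /* 0x1917 for 0x1917F4C0 */
    uint16_t low  = (uint16_t)(value & 0xFFFFu); /* 0xF4C0 */

    buf[0] = (uint8_t)(high & 0xFFu); /* 0x17 */
    buf[1] = (uint8_t)(high >> 8);    /* 0x19 */
    buf[2] = (uint8_t)(low & 0xFFu);  /* 0xC0 */
    buf[3] = (uint8_t)(low >> 8);     /* 0xF4 */
}
```

Under that assumption 0x1917F4C0 ends up in memory as 17 19 C0 F4, which
matches neither the big-endian nor the little-endian layout.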


Problem and solution
====================

As stated earlier, one problem arises when two computers of differing
architectures and byte-ordering systems try to communicate with each other.
This is only a problem with data larger than one octet (provided both
machines have a common bit endianness).

One example is struct sockaddr_in--used to describe an Internet socket
address--which requires the member sin_port to be in network (big-endian)
byte order.  On little-endian machines this value must be converted from
host byte order in order to assign the port as expected.

For example:

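#include <arpa/inet.h>   /* htons()            */
#include <netinet/in.h>  /* struct sockaddr_in */

struct sockaddr_in client;

client.sin_port = 23;        /* Wrong on little-endian machines!  */
client.sin_port = htons(23); /* The solution: network byte order. */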
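
A slightly fuller sketch of the same idea follows.  It fills in an IPv4
address structure with every multi-byte field converted to network byte
order.  The function name fill_ipv4_any is hypothetical, and POSIX socket
headers are assumed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: prepare addr for the given port on any local
 * IPv4 address.  The port is passed in host byte order and stored in
 * network byte order. */
void fill_ipv4_any(struct sockaddr_in *addr, uint16_t port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);              /* host -> network */
    addr->sin_addr.s_addr = htonl(INADDR_ANY); /* host -> network */
}
```

After fill_ipv4_any(&client, 23), the two bytes of client.sin_port in memory
are 00 17 on every host, and ntohs(client.sin_port) gives back 23.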

Luckily there are other functions which can aid the programmer in similar
situations:

/*

#include <arpa/inet.h>

(some older systems declare these functions in <netinet/in.h> instead)

	uint32_t htonl(uint32_t hostlong);

	Converts a 32-bit unsigned integer from host byte order to
	(big-endian) network byte order.


	uint16_t htons(uint16_t hostshort);

	Converts a 16-bit unsigned integer from host byte order to
	(big-endian) network byte order.


	uint32_t ntohl(uint32_t netlong);

	Converts a 32-bit unsigned integer from (big-endian) network byte
	order to host byte order.


	uint16_t ntohs(uint16_t netshort);

	Converts a 16-bit unsigned integer from (big-endian) network byte
	order to host byte order.


	Host byte order is whatever the local machine uses.  On a
	little-endian host these functions swap the bytes; on a big-endian
	host they return their argument unchanged.
*/
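
To show how these functions are typically paired, here is a minimal sketch
that writes a 32-bit value into a byte buffer in network byte order and reads
it back.  The names put_u32_net and get_u32_net are hypothetical; only htonl
and ntohl come from the system headers.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store host_value in out[0..3] in network
 * (big-endian) byte order. */
void put_u32_net(uint8_t *out, uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(out, &net, sizeof net);
}

/* Hypothetical helper: read a network byte order value from in[0..3]
 * and return it in host byte order. */
uint32_t get_u32_net(const uint8_t *in)
{
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);
}
```

With this pairing the buffer holds 19 17 F4 C0 for the value 0x1917F4C0 on
both little-endian and big-endian hosts, so the receiver gets the same number
back regardless of its own byte order.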



Conclusion
==========
Endianness can refer to byte ordering as well as bit ordering, though byte
ordering is much more common and is generally what is meant by the term
endianness.  Little-endian bit ordering is very common amongst architectures,
unlike big-endian bit ordering, which is rare.  Endianness must always be
considered when working on projects that need to be portable and to
communicate with other machines of potentially differing endianness.



Copyright (c) 2006.  Alastair Poole.

Verbatim copying and distribution of this entire article are permitted
worldwide, without royalty, in any medium, provided this notice, and the
copyright notice, are preserved.
