Saturday, May 17, 2008

History of Computer Programming, Part III: Second Generation Languages

The fact is, most people don't enjoy thinking and writing in binary. Plus, it's verbose and complex as all get out--especially when you're trying to move beyond basic counting into an array of functions. Who wants to memorize the hundreds of machine codes corresponding to every instruction? Not me.

Enter 2nd Generation Computer Programming--aka assembly languages. After WWII, as folks were playing around with circuit board computing logic (sometimes aware, sometimes unaware of Shannon's work), they realized that you could set up a family of processors to enable programmer-friendly coding. These machines could take a set of human-friendly instructions (such as "ADD" or "TOTAL") and translate--assemble--them into machine level code. Sure, this adds a step and slows the process down, but this cost is nicely outweighed by the mnemonic joy of programming with human words.

But let's not get ahead of ourselves; there's only so much you can do with 2nd gen computer languages. The language scope is directly defined by the physical shape of the processing hardware you're writing for. Thus we call assembly language a "low level" programming language: it provides minimal (a single step of) abstraction from machine level instructions, mapping your instruction words into binary in a one-to-one manner.
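That one-to-one mapping is easy to sketch in code. The mnemonics and opcodes below are invented for illustration--no real instruction set is this small--but the translation step is exactly what an assembler does:

```python
# Toy assembler: map invented mnemonics to invented 4-bit opcodes.
OPCODES = {"LOAD": 0b0001, "ADD": 0b0010, "STORE": 0b0011, "HALT": 0b1111}

def assemble(lines):
    """Translate 'MNEMONIC operand' lines into (opcode, operand) pairs."""
    program = []
    for line in lines:
        parts = line.split()
        op = OPCODES[parts[0]]                       # one word -> one opcode
        operand = int(parts[1]) if len(parts) > 1 else 0
        program.append((op, operand))
    return program

machine_code = assemble(["LOAD 7", "ADD 5", "STORE 0", "HALT"])
```

Each human word becomes exactly one machine instruction--no more, no less. That's the whole (and the only) abstraction a 2nd gen language buys you.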

Some Resources
http://en.wikipedia.org/wiki/First-generation_programming_language

Tuesday, April 22, 2008

History of Computer Programming - Part 2: Machine Level Languages

In Z Beginning...

As we saw, the beginning of the industrial age brought with it the kind of complexity that would reduce human calculators to tears. We're just not made to do complex counting and equations, you know? Well, that's just what Konrad Zuse of Germany thought: Why should we have to do equations that machines can do? So he went and built the Z1, a first-generation computer featuring mechanical memory, arithmetic units and a binary-based language system (in input, processing, and output).

Basically, the Z series machines processed input via punched tape. As long as the user submitted valid arithmetic in binary form, the computer could handle it. As such, these machines are considered the first freely programmable computers based on binary floating point numbers and a binary switching system.

This "first generation" method of interacting with machines is called coding in a "machine level" language, since the input format exactly matches the computer's processing format.

A Bit of Theory
If you're going to write a master's thesis, try to make it as groundbreaking as Claude Elwood Shannon's A Symbolic Analysis of Relay and Switching Circuits (MIT, 1937). This paper made Shannon pretty much the father of electronic digital computation. In the first part, Shannon uses concepts from Boolean algebra and binary arithmetic to demonstrate how to simplify the arrangement of electromechanical relays used in telephone routing switches. Then he proceeds to show the real value of the relationship between binary math and these electromagnetic structures: look! Since we can represent Boolean algebra functions with the layout of electronic circuits, we can use arrangements of electronic relays to solve complex math problems!
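Shannon's punchline can be sketched in miniature: model the relay gates as Boolean functions, then wire them into a circuit that does arithmetic. The half adder below is a standard textbook circuit (not taken from Shannon's paper), but it shows the move from "circuits as logic" to "circuits as math":

```python
# Relays as Boolean functions...
def AND(a, b):
    return a & b

def XOR(a, b):
    return a ^ b

# ...wired into a circuit that does binary arithmetic.
def half_adder(a, b):
    """Add two one-bit inputs; return (sum_bit, carry_bit)."""
    return XOR(a, b), AND(a, b)
```

Chain enough of these together and the relay bank is no longer just routing phone calls--it's computing.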

Connecting the Dots
So far, both Zuse and Shannon represent hardware innovators, and as such, their machines only work when you speak in "their" language: the language of 1's and 0's (or left's and right's, or on's and off's, etc.). This language is effective and fast--but it is also hard to write and debug. Good thing there have been some software innovations to match and complement these hardware-based achievements; otherwise I'd be writing and you'd be reading only 1's and 0's!

Resources
Zuse's son documents his father's career

Wikipedia on Shannon
Wikipedia on the History of Computers

Monday, April 21, 2008

History of Computer Programming, Part 1

Humans have been using machines to get things done for a long time. However, they haven't been able to program those buggers to do those things automatically until relatively recently.

Though I haven't done a tremendous amount of reading on the subject, it appears most computer historians agree the first legit programmed machine was Joseph Marie Jacquard's textile loom, which would weave according to templates defined by a series of punched cards.

Somewhat later, Herman Hollerith struck upon the real value of punched cards for basic computing in order to solve a major national problem. See, by 1880 the United States was growing at such a rate that the constitutional mandate to record and process a national census every ten years was approaching the line of impossibility. Since the results had to be counted and processed by hand, the interval between census and result was threatening to exceed the ten-year limit! Quite the problem.

Well one day, Hollerith is sitting on the train, watching the conductors issue tickets. He sees how they are recording basic information about each traveler by hole-punching the ticket. Then, all of a sudden, "punch!" went the conductor's machine and "ding!" went Hollerith's brain: ah ha!

Hollerith realized that, given the proper material and the right electronic apparatus, the information recorded on a punch card could be "read" and processed by a machine. This would work well with counting operations especially. Hmmm, what major national crisis needed some quick counting?

So Hollerith constructed a machine for counting people and their basic attributes. An insulated card would be placed over a series of mercury pools corresponding to all the possible punch holes. Then a series of spring-loaded wires (placed similarly) would be brought down over the card. Add some electricity, and bam!--each hole completes an electrical circuit, and each closed circuit advances a specific counter.
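The mechanism is simple enough to simulate: every hole closes one circuit, and every closed circuit bumps one counter. The card fields below are invented for illustration (a real census card had a fixed grid of hole positions):

```python
from collections import Counter

def tabulate(cards):
    """Toy Hollerith tabulator: each card is a set of punched holes."""
    counters = Counter()
    for card in cards:
        for hole in card:          # each hole = one closed circuit...
            counters[hole] += 1    # ...which advances one counter
    return counters

# Two hypothetical census cards run through the machine.
census = tabulate([{"male", "age_30s"}, {"female", "age_30s"}])
```

That's the entire trick: counting by circuit closure, thousands of cards an hour.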

So began Hollerith's corporate counting endeavors, which were folded into the Computing-Tabulating-Recording Company (CTR) and which, in 1924, became IBM (International Business Machines Corporation).

What does Hollerith's counting machine have to do with computer languages? Well, as we'll see in the next article, Hollerith built a machine that counted according to a very, very basic machine-level language, one that "spoke in" the exact literal language of the machine's processes themselves.

Programming Track Beginnings

Web development requires more than a simple knowledge of networking facts and protocols--it calls for a knowledge of software development. That's why I'm kicking off a branch of research into understanding the origin and development of computer programming!

Here are some initial questions to get this going:

1) What do I need to understand about computer hardware in order to have the concept of software and software development cease to be a black box?
2) Why did software development emerge when and where it did?
3) How do we begin to make sense of the generation and development of programming languages?
4) What are the different kinds of programming languages?
5) How do we understand the relationship between programming languages and the web?
6) What are other good questions to ask?

Here are some initial resources to get this going:
  • A visual time-line representing the history of programming languages to the end of 2007
  • A companion link-list to descriptions of all the languages in the time-line

Wednesday, April 16, 2008

A Few Gaps Filled

So far we've just been watching the train whiz by--trying to take as much in as possible. Time to review the tape just a bit.

Transport Layers

TCP isn't the only transport-layer protocol; that honor is shared with UDP (User Datagram Protocol). Both contain information about source and destination ports; they differ in how they handle packet sequencing. Where TCP ensures packets are processed in the order sent, UDP just hands them to the application in the order received--no guarantees, just pure raw speed. UDP works well when you're sending/receiving a single packet or when you're streaming audio or video for recreational use.
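Here's a minimal sketch of that fire-and-forget style using Python's standard socket module, with both ends in one process over localhost (addresses and the message are just placeholders):

```python
import socket

# Receiver: bind a UDP socket; port 0 lets the OS pick a free port.
recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.bind(("127.0.0.1", 0))
port = recv.getsockname()[1]

# Sender: no handshake, no connection--just aim a datagram at (address, port).
send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send.sendto(b"hello", ("127.0.0.1", port))

# The datagram arrives as-is; nothing guarantees ordering or delivery.
data, addr = recv.recvfrom(1024)
send.close()
recv.close()
```

Compare that to TCP, where you'd have to connect before sending a single byte.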

Ports

Remember the mail analogy? Well, if the IP address is like the address to your building, a port number is your name. All incoming mail has your name on it because all the mail you send out has your name on the top left, so those catalog companies know whom to send stuff to.

The concept of port is borrowed from the physical ports on your machine--those interfaces that allow you to connect machine to machine with various wires and such. But port numbers are simply that, numbers that are used to identify the sending and receiving nodes of internet applications.

Port numbers range from 0 to 65,535, since they are represented by 16 bits. Numbers 0-1023 are reserved for common internet applications (HTTP, FTP, etc.).

Servers generally receive HTTP requests on port number 80 (standard just to say "port 80"). Client applications, on the other hand, are handed a new, unique ephemeral port for each connection. Think of it as an insanely fast-moving deli line at your local grocery store: every connection gets its own number.
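You can watch the OS hand out one of those deli numbers with Python's standard socket module--binding to port 0 is the conventional way of asking for a free ephemeral port:

```python
import socket

# Ask for "whatever port is free": bind to 0 and the OS assigns one.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 0))
port = s.getsockname()[1]   # the ephemeral port the OS picked
s.close()
```

Run it twice and you'll almost certainly get two different numbers--each connection takes the next ticket.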

The Data Link Layer

What's under IP? Good question: it's the data link layer. Let's not get into it much here, just know that every physical device on your network is assigned a Media Access Control (MAC) address to identify itself on a network. The MAC address is usually burned into the machine during manufacturing, like a fingerprint or something. OK, there it is, now you know.

Sockets

A "socket" is a software abstraction used to "bind" the connection between two applications into a persistent channel. For example, it is the deal that is sealed after TCP's three-way handshake: client reaches out to shake the hand of server, server reaches back to shake the hand of client, client and server shake the shake and establish a socket connection that enables a working and productive relationship. How nice.
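Here's a minimal sketch of that handshake-then-socket sequence with Python's standard socket module, both ends running in one process over localhost (the message and the uppercase "service" are placeholders):

```python
import socket
import threading

# Server end: listen on an OS-assigned port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

def serve():
    conn, _ = server.accept()                 # handshake completes here
    conn.sendall(conn.recv(1024).upper())     # use the duplex channel
    conn.close()

threading.Thread(target=serve).start()

# Client end: connect() performs the client side of the handshake.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"hi server")
reply = client.recv(1024)
client.close()
server.close()
```

Once `accept()` and `connect()` return, the deal is sealed: both sides hold a socket and can read and write until one of them hangs up.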


Resources
From java.sun.com
RandomGuy James again
Socket stuff from java.sun.com
About.com on Sockets

Monday, April 14, 2008

HTTP!

HTTP is the protocol which enables your web applications to interact with resources located anywhere on the internet. On the most basic level, the protocol takes the form of request/response. Here, a client program establishes a connection with a server and begins making requests. Likewise the server, once a connection has been established, makes a series of responses.

Upon closer inspection, we see that this response/request interaction is defined by one of seven core request methods available to the client. These methods are as follows:
  • GET
  • POST
  • PUT
  • DELETE
  • HEAD
  • OPTIONS
  • TRACE

Each of these tokens defines the action the client wants the server to take on the resource specified in the URI (Uniform Resource Identifier, which covers both URLs (Locators) and URNs (Names)--ok, they're also known as WWW addresses).

Think of the request protocol like a grumpy caveman: He points at a given resource (via URI) and makes his demand: "GET!" then "PUT!" then "DELETE!" You get the picture.

The first four methods in the list might be seen as "active use" methods, since they define active interaction with a given resource. The last three are mostly reserved for "test use": probing the attributes of a given server or communication state. We're not going to worry about the test-use methods for now.

The Four Active Methods (Careful, I made up the phrase "active method")

GET is the most common of the HTTP methods, and it pretty much works as advertised: it requests that the server fetch and send over a given entity (head+body) specified in the URI. Every server is required to respond to GET. GET is designed to be a "safe" operation since it does not change the resource and it does not perform any legally binding operations (such as seal an agreement)--it merely supposes retrieval of information.

POST asks that an enclosed entity be worked into the structure of the identified resource. This action enables network-based annotations, online posts, form submissions and mediated database updates. As you may imagine, it is considered an "unsafe" operation and servers are not required to support POST requests.

PUT asks that a given resource be stored under the requested URI. This could be an altogether new resource or it could replace the existing resource. Whereas the POST method uses a URI to specify a handling resource which contains the processing logic to work the new resource into the existing body, PUT uses the URI to refer to the resource it is supplying. (unsafe/not required)

DELETE asks that a server delete the resource identified in a URI. (unsafe/not required)
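To make the methods concrete, here's what the caveman actually shouts on the wire: the raw request bytes, built by hand. The host and paths below are placeholders, and a real client would send and parse more headers than this:

```python
def request(method, path, host, body=b""):
    """Build the raw bytes of a bare-bones HTTP/1.1 request."""
    head = (f"{method} {path} HTTP/1.1\r\n"       # the request line: METHOD URI VERSION
            f"Host: {host}\r\n"                    # required header in HTTP/1.1
            f"Content-Length: {len(body)}\r\n"
            f"\r\n").encode()                      # blank line separates head from body
    return head + body

get_req = request("GET", "/index.html", "example.com")
put_req = request("PUT", "/notes/1", "example.com", b"new note")
```

The method is just the first word of the first line--everything the server needs to know about the client's intent, pointed at the URI that follows it.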


Resources:

Hypertext Transfer Protocol -- HTTP/1.1 RFC 2068
W3C Article on Methods, URIs and Safe/Unsafe Interactions

Monday, March 31, 2008

TCP/IP Layers

In the last TCP/IP overview, we focused on getting an absolute basic understanding of the way information is transmitted from and to machines connected to the internet. Now we're going to look at some of the higher layers of network protocol. If we stay with the basic Mail analogy, we're now moving from the [disassembling, packing, addressing, shipping, routing, receiving, unpacking and assembling] set of standards to standards which apply to the nature of our packaged objects.

The only problem with this model of understanding is that there really isn't any "wrapping" going on in the various levels of protocol encapsulation. IP and TCP headers are more like stamps and stickers than the package material itself. In the world of data transmission, there is no package itself except the data itself, sent as a burst of bytes, a sequence of zeros and ones.

Anyway, right now we're not so much interested in the TCP/IP stamps (headers) as we are in the application-level data that forms the tail of our packet of transmitted bytes.

Like TCP/IP, these higher level layers in our data-stream are protocols, standards by which computers communicate and interpret what's going in and out of them via the network(s). Let's just run through the most interesting of these higher layers and see if we have any energy to talk about them individually. Probably not...
  • HTTP: Hyper Text Transfer Protocol; this is the obvious standard protocol governing the dynamics of transmission across the world wide web
  • HTTPS: Secure HTTP
  • SSL - Secure Sockets Layer: takes advantage of encryption technologies to ensure safe transmission of data across a network
  • SMTP - Simple Mail Transfer Protocol: enables you to send email
  • MIME - Multi-purpose Internet Mail Extensions: Originally developed for multimedia email attachments, now standardizes the transmission of multimedia anywhere on WWW
  • POP - Post Office Protocol: Used for downloading mail onto your machine
  • IMAP - Internet Message Access Protocol: Used to store and retrieve email
  • FTP - File Transfer Protocol: Enables file transfer from computer to computer
Like I imagined, I don't care to elaborate on each of these protocols at this point in time. Each one may well take up a summary article at a later date. Sounds good to me.

The main thing is to realize that these protocols are executed and then parsed by higher level applications like web browsers, mail clients and ftp clients (on the client side) and the developed framework on the server-side. They form all the access and interface points that make up the fabric of web technologies and UI.

Resources
W3Schools

Internet Basics

The following is an adaptation of an overview of the internet I wrote for a class; it may be somewhat helpful.


WHAT IS THE INTERNET?

The Internet is an ever-growing network of "computers of all stripes--mainframes, minicomputers, powerful servers, the desktop PC, and any number of mobile devices" (Battelle, 6-7)--connected via physical and wireless infrastructures. The Internet supports the World Wide Web, a vast browser-accessible system of inter-linked websites.


A BRIEF HISTORY

The earliest forms of wide-area networks and the Internet emerged in the late 1960s; the first connection on this early network (the ARPANET) linked research facilities at UCLA and the Stanford Research Institute. The first email application was developed in 1972 and quickly became the most popular Internet application. Throughout the 1970s and '80s, the Internet quickly grew as an "open architecture network environment" (ISOC REF) where computers and networks of computers could be arbitrarily added by virtually anyone at any time.


In 1989, Tim Berners-Lee invented a program called WorldWideWeb which created documents in hypertext to be stored on a server and accessed on the Internet through a web browser (which renders hypertext into electronic "pages"). Here, each web page could be linked to any other web page by means of a hyperlink (typically rendered as blue underlined text). The phrase "World Wide Web" quickly became synonymous with the growing system of linked hypertext pages that began to emerge in the early 1990's.


In 1994, Netscape released its 1.0 version of the first highly popular web browser, Netscape Navigator. In 1995, Microsoft launched its first version of Internet Explorer. The ensuing "browser war" between Microsoft and Netscape effectively brought the Internet into real public consciousness for the first time. Before long, millions of people were signing up with an internet service provider (or ISP) to gain access to this "new" phenomenon of the Internet's World Wide Web.


The massive and ongoing expansion of the Web quickly created a need for online applications that could "search" its contents for specific kinds of pages. The first search engine designed for the Web was the WWW Wanderer, created by Matthew Gray (at MIT) in 1993. By 1995, as the public began pouring onto the Web, several search engines became popular; these included AltaVista, Lycos and Excite. Currently, Google clearly dominates the world of search, controlling 51% of the global search market (Yahoo and MSN are the closest competitors, at 24% and 13% respectively).


During the past ten years, the World Wide Web has been bolstered by the development of many new web applications which allow individuals to upload, share and store information through and on the web. These applications include:

Blogging applications such as Blogger and WordPress

Instant Messaging (or IM) applications such as AIM and Google Talk

Social networking applications such as MySpace

Music-downloading applications such as iTunes and Napster

Video-sharing applications such as YouTube

Finally, what might the future of the Internet look like? A few things seem reasonably certain. First, the technology that makes up the Internet will continue to drive toward universality: free wireless access will become standard for metropolitan areas around the globe, every electronic device will be tied into the Web, and new innovations will give us devices and device systems which will weave ever more deeply into the fabric of our daily lives. Second, the Web will continue to become the global marketplace. The search engine business is already pointed in that direction. For example, Google "would like to provide a platform that mediates supply and demand for pretty much the entire world economy" (Battelle, 247). Finally, it appears that the need for the "personal computer" will diminish as connectivity and remote storage capabilities rapidly increase in scope. PCs will be replaced by extremely cheap (if not free) connection consoles. In sum, the World Wide Web will increasingly be everywhere, virtually connecting everyone to everyone and everything.

The Very Base: TCP/IP

The internet is basically the grand physical structure that connects all sorts of devices together into a vast network of networks; TCP/IP is the magic that makes the fact of computer-to-computer connection possible. Trying to imagine internet-based communication without TCP/IP is like trying to imagine sending a Wii to your grandmother in California without a single piece of information on the postal package.

TCP/IP stands for Transmission Control Protocol / Internet Protocol. In the internet protocol suite, TCP/IP represents the standards adopted by those smart internet engineers to get computers sending, routing and assembling data across an incredibly complex network in a reliable and consistent way. Think of it as the computer equivalent of the rules governing the physical transmission of goods via the USPS (you have to have a stamp, you have to have an address in a certain form, packages have to be a certain size, etc), except computers don't have minds, so the protocols exist as a standard set of applications and interfaces...

The Address: IP
IP is the most basic internet protocol (a "network protocol"), and it enables your computer to locate, send to and receive from other machines on the internet. How is this done, exactly? Well, from the transmission side, an IP header is the last thing added to a packet (see TCP below), and this header includes the most basic addressing information necessary for data transmission across a network: the destination address of the machine to which you want to send data, and the address of your machine. Unlike TCP, IP does not check if a real connection exists between machines, and it does not include support for breaking down and serializing/ordering data packets. IP simply supports the most basic components of data navigation across a network. IP data is primarily interpreted by the IP router (see below).

Address verification/disassembly and assembly: TCP
The higher level transmission issues are handled by TCP, a "transport protocol". TCP includes baked-in support for establishing a real connection between applications on different machines (a three way handshake that initiates a "full duplex" communication between computers until it is terminated by either party). It also handles the complexities involved with breaking large data sets into small packets of data and sending them across a variegated network where the packets will likely not be received in the order sent. Thus, the TCP header will include information pertaining to the real connection between machines as well as information about individual packets. The TCP information is processed by applications on your computer and on the web server.

TCP also includes information about the sending and receiving ports involved in the transmission. More on ports at some other point perhaps.

Packet Routing: IP Routers
Like the USPS with its processing centers, you're going to need some kind of third party involved in handling the packets sent from one machine to another; and that's the job of an IP Router (or network of routers) to which packages are actually sent and from which packages are finally received. Again like the USPS, routers identify sending and receiving machines by means of addresses. IP addresses consist of four numbers, separated by periods, each ranging from 0 to 255 (since each number must be represented in a single computer byte). On the world wide web (to be covered in another article), IP addresses are represented by domain names which are registered and actively translated into IP addresses by a Domain Name Server (DNS) process.
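That "four numbers, each one byte" claim is easy to verify with Python's standard socket module, which can pack a dotted-quad address into its raw wire form:

```python
import socket

# An IPv4 address is literally four bytes; the dotted-quad notation is
# just a human-readable spelling of them.
packed = socket.inet_aton("192.168.0.1")   # pack to raw network bytes
numbers = list(packed)                     # back to the four 0-255 fields
```

Four bytes in, four numbers out--which is exactly why no field in an IP address can ever exceed 255.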

That's the absolute basics of the whole thing. Next time we'll look at the nature of the things you can put "in" (or on) those packages which is defined by further protocols.

Resources
The W3Schools Resource
A guy who seems to know a lot about this