
Socket Programming Assignment 4 Http Web Proxy Server

COS-461 Assignments: HTTP Proxy


Assignment 2: HTTP Proxy

Overview

In this assignment, you will implement a simple web proxy that passes requests and data between a web client and a web server. This will give you a chance to get to know one of the most popular application protocols on the Internet, the Hypertext Transfer Protocol (HTTP) version 1.0, and give you an introduction to the Berkeley sockets API. When you're done with the assignment, you should be able to configure your web browser to use your personal proxy server as a web proxy.

Introduction: The Hypertext Transfer Protocol

The Hypertext Transfer Protocol (HTTP) is the protocol used for communication on the web. That is, it is the protocol that defines how your web browser requests resources from a web server and how the server responds. For simplicity, in this assignment we will be dealing only with version 1.0 of the HTTP protocol, defined in detail in RFC 1945. You should read through this RFC and refer back to it when deciding on the behavior of your proxy.

HTTP communications happen in the form of transactions; a transaction consists of a client sending a request to a server and then reading the response. Request and response messages share a common basic format:

  • An initial line (a request or response line, as defined below)
  • Zero or more header lines
  • A blank line (CRLF)
  • An optional message body.


For most common HTTP transactions, the protocol boils down to a relatively simple series of steps (important sections of RFC 1945 are in parentheses):

  1. A client creates a connection to the server.
  2. The client issues a request by sending a line of text to the server. This request line consists of a HTTP method (most often GET, but POST, PUT, and others are possible), a request URI (like a URL), and the protocol version that the client wants to use (HTTP/1.0). The message body of the initial request is typically empty. (5.1-5.2, 8.1-8.3, 10, D.1)
  3. The server sends a response message, with its initial line consisting of a status line, indicating if the request was successful. The status line consists of the HTTP version (HTTP/1.0), a response status code (a numerical value that indicates whether or not the request was completed successfully), and a reason phrase, an English-language message describing the status code. Just as with the request message, there can be as many or as few header fields in the response as the server wants to return. Following the CRLF field separator, the message body contains the data requested by the client in the event of a successful request. (6.1-6.2, 9.1-9.5, 10)
  4. Once the server has returned the response to the client, it closes the connection.

It's fairly easy to see this process in action without using a web browser. From a Unix prompt, type:

    telnet www.yahoo.com 80

This opens a TCP connection to the server at www.yahoo.com listening on port 80, the default HTTP port. You should see something like this:

    Trying 69.147.125.65...
    Connected to any-fp.wa1.b.yahoo.com.
    Escape character is '^]'.

Then type the following:

    GET / HTTP/1.0

and hit enter twice. You should see something like the following:

    HTTP/1.0 200 OK
    Date: Tue, 16 Feb 2010 19:21:24 GMT
    (More HTTP headers...)
    Content-Type: text/html; charset=utf-8

    <html><head>
    <title>Yahoo!</title>
    (More HTML follows)

There may be some additional pieces of header information as well: cookies being set, instructions to the browser or proxy on caching behavior, etc. What you are seeing is exactly what your web browser sees when it goes to the Yahoo home page: the HTTP status line, the header fields, and finally the HTTP message body, consisting of the HTML that your browser interprets to create a web page. You may notice that the server responds with HTTP 1.1 even though you requested 1.0. Some web servers refuse to serve HTTP 1.0 content.

HTTP Proxies

Ordinarily, HTTP is a client-server protocol. The client (usually your web browser) communicates directly with the server (the web server software). However, in some circumstances it may be useful to introduce an intermediate entity called a proxy. Conceptually, the proxy sits between the client and the server. In the simplest case, instead of sending requests directly to the server the client sends all its requests to the proxy. The proxy then opens a connection to the server, and passes on the client's request. The proxy receives the reply from the server, and then sends that reply back to the client. Notice that the proxy is essentially acting like both a HTTP client (to the remote server) and a HTTP server (to the initial client).

Why use a proxy? There are a few possible reasons:

  • Performance: By saving a copy of the pages that it fetches, a proxy can reduce the need to create connections to remote servers. This can reduce the overall delay involved in retrieving a page, particularly if a server is remote or under heavy load.
  • Content Filtering and Transformation: While in the simplest case the proxy merely fetches a resource without inspecting it, nothing limits a proxy to blindly fetching and serving files. The proxy can inspect the requested URL and selectively block access to certain domains, reformat web pages (for instance, by stripping out images to make a page easier to display on a handheld or other limited-resource client), or perform other transformations and filtering.
  • Privacy: Normally, web servers log all incoming requests for resources. This information typically includes at least the IP address of the client, the browser or other client program that they are using (called the User-Agent), the date and time, and the requested file. If a client does not wish to have this personally identifiable information recorded, routing HTTP requests through a proxy is one solution. All requests coming from clients using the same proxy appear to come from the IP address and User-Agent of the proxy itself, rather than the individual clients. If a number of clients use the same proxy (say, an entire business or university), it becomes much harder to link a particular HTTP transaction to a single computer or individual.

Links:

  • RFC 1945 The Hypertext Transfer Protocol, version 1.0

Assignment Details

The Basics

Your first task is to build a basic web proxy capable of accepting HTTP requests, forwarding requests to remote (origin) servers, and returning response data to a client. The proxy does NOT need to handle concurrent requests, i.e. no need for threaded, forked, or event-based non-blocking operation. Rather, the proxy should handle requests sequentially. You will only be responsible for implementing the GET method. All other request methods received by the proxy should elicit a "Not Implemented" (501) error (see RFC 1945 section 9.5 - Server Error).

This assignment can be completed in either C or C++. It should compile and run (using g++) without errors or warnings from the FC 010 cluster, producing a binary called that takes as its first argument a port to listen from. Don't use a hard-coded port number.
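As a minimal sketch of reading the listening port from the command line rather than hard-coding it, the helper below validates a port string with strtol. The name parse_port and the choice to reject anything outside 1..65535 are assumptions made for illustration, not requirements stated by the assignment.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns the port in 1..65535,
 * or -1 if text is not a valid decimal port number. */
int parse_port(const char *text)
{
    char *end;
    long value;

    assert(text != NULL);
    errno = 0;
    value = strtol(text, &end, 10);
    /* Reject empty input, trailing junk, and overflow. */
    if (errno != 0 || end == text || *end != '\0')
        return -1;
    if (value < 1 || value > 65535)
        return -1;
    return (int)value;
}
```

In main, this would typically be called as parse_port(argv[1]) after checking that argc is at least 2.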

You shouldn't assume that your server will be running on a particular IP address, or that clients will be coming from a pre-determined IP.

Listening

When your proxy starts, the first thing that it will need to do is establish a socket connection that it can use to listen for incoming connections. Your proxy should listen on the port specified from the command line and wait for incoming client connections. Once a client has connected, the proxy should read data from the client and then check for a properly-formatted HTTP request. Specifically, the proxy should ensure that the request contains a valid request line:

    <METHOD> <URL or PATH> <HTTP VERSION>

And a Host header, if the specified resource is a PATH:

    Host: <HOSTNAME>

All other headers just need to be properly formatted:

    <HEADER NAME>: <HEADER VALUE>

An invalid request from the client should be answered with an appropriate error code, i.e. "Bad Request" (400) or "Not Implemented" (501) for valid HTTP methods other than GET. See the note on network programming for more guidelines on how to handle real world clients and semi-valid requests.
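One possible way to classify a request line is sketched below. The function name check_request_line and the buffer sizes are hypothetical. The list of methods treated as valid but unimplemented (HEAD, POST, PUT, DELETE, LINK, UNLINK) comes from RFC 1945 sections 8 and D.1; answering any other method token with 400 is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 0 for a well-formed GET request line,
 * otherwise the HTTP status code the proxy should answer with. */
int check_request_line(const char *line)
{
    static const char *other_methods[] = {
        "HEAD", "POST", "PUT", "DELETE", "LINK", "UNLINK"
    };
    char method[16], target[2048], version[16], extra[2];
    size_t i;

    assert(line != NULL);
    /* Exactly three whitespace-separated tokens are expected. */
    if (sscanf(line, "%15s %2047s %15s %1s",
               method, target, version, extra) != 3)
        return 400;
    if (strncmp(version, "HTTP/", 5) != 0)
        return 400;
    if (strcmp(method, "GET") == 0)
        return 0;
    /* Valid HTTP/1.0 methods that this proxy does not implement. */
    for (i = 0; i < sizeof other_methods / sizeof other_methods[0]; i++) {
        if (strcmp(method, other_methods[i]) == 0)
            return 501;
    }
    return 400;
}
```

Header validation (the NAME: VALUE check) is left out to keep the sketch short.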

Parsing the URL

Once the proxy sees a valid HTTP request, it will need to parse the requested URL. The proxy needs at most three pieces of information: the requested host and port, and the requested path. See the manual page for more info. You will need to parse the URL (absolute or relative) specified in the request line. Note that a request with a relative URL does not include a hostname in its request line, so it must carry a Host header in addition to the standard request line; otherwise the proxy will not know where to retrieve the original resource. If the hostname, indicated in either the absolute URL or in the Host header, does not have a port specified, use the default HTTP port 80.
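The sketch below shows one way to split an absolute URL into host, port, and path, defaulting the port to 80. The struct target, the function parse_absolute_url, and the buffer sizes are hypothetical names and limits chosen for illustration; the relative-URL case (host taken from the Host header) is not covered here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct target {            /* hypothetical container for the parsed parts */
    char host[256];
    int  port;
    char path[1024];
};

/* Splits "http://host[:port][/path]" into its parts.
 * Returns 0 on success, -1 on malformed input. */
int parse_absolute_url(const char *url, struct target *out)
{
    static const char scheme[] = "http://";
    const char *authority, *path;
    size_t host_len;

    assert(url != NULL && out != NULL);
    if (strncmp(url, scheme, sizeof scheme - 1) != 0)
        return -1;
    authority = url + sizeof scheme - 1;
    path = authority + strcspn(authority, "/");   /* start of path, or end */

    host_len = strcspn(authority, ":/");
    if (host_len == 0 || host_len >= sizeof out->host)
        return -1;
    memcpy(out->host, authority, host_len);
    out->host[host_len] = '\0';

    out->port = 80;                               /* default HTTP port */
    if (authority[host_len] == ':') {
        const char *digits = authority + host_len + 1;
        char *end;
        long port = strtol(digits, &end, 10);
        if (end == digits || end != path || port < 1 || port > 65535)
            return -1;
        out->port = (int)port;
    }

    if (*path == '\0')
        path = "/";                               /* empty path means root */
    if (strlen(path) >= sizeof out->path)
        return -1;
    strcpy(out->path, path);
    return 0;
}
```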

Getting Data from the Remote Server

Once the proxy has parsed the URL, it can make a connection to the requested host (using the appropriate remote port, or the default of 80 if none is specified) and send the HTTP request for the appropriate resource. The proxy should always send the request in the relative URL + Host header format regardless of how the request was received from the client:

Accept from client:

    GET http://www.princeton.edu/ HTTP/1.0

or

    GET / HTTP/1.0
    Host: www.princeton.edu

Send to remote server:

    GET / HTTP/1.0
    Host: www.princeton.edu
    (Additional client specified headers, if any...)
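A minimal sketch of assembling the outgoing request in the relative URL + Host header format follows. The function build_origin_request and its parameters are hypothetical; extra_headers is assumed to be a string of already well-formed header lines, each ending in CRLF (or an empty string). Appending the port to the Host value when it is not 80 is a choice of this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Writes the request line, Host header, any extra headers, and the
 * terminating blank line into buf. Returns the length written,
 * or -1 if buf is too small. */
int build_origin_request(char *buf, size_t size, const char *host, int port,
                         const char *path, const char *extra_headers)
{
    int n;

    assert(buf != NULL && host != NULL && path != NULL && extra_headers != NULL);
    if (port == 80)
        n = snprintf(buf, size, "GET %s HTTP/1.0\r\nHost: %s\r\n%s\r\n",
                     path, host, extra_headers);
    else
        n = snprintf(buf, size, "GET %s HTTP/1.0\r\nHost: %s:%d\r\n%s\r\n",
                     path, host, port, extra_headers);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

For the example above, calling it with host "www.princeton.edu", port 80, path "/", and no extra headers yields the three-line request shown under "Send to remote server".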

Returning Data to the Client

After the response from the remote server is received, the proxy should send the response message (as-is) to the client via the appropriate socket. Once the transaction is complete, the proxy should close the connection to the client. Note: the proxy should terminate the connection to the remote server once the response has been fully received. For HTTP 1.0, the remote server will terminate the connection once the transaction is complete.
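Because an HTTP/1.0 server signals the end of its response by closing the connection, relaying can be sketched as "copy until end-of-file". The helpers write_all and relay_until_eof below are hypothetical names; the 4096-byte buffer size is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Writes all len bytes, retrying after short writes. Returns 0 or -1. */
int write_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copies bytes from src to dst until src reports end-of-file.
 * Returns the total number of bytes relayed, or -1 on error. */
long relay_until_eof(int src, int dst)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(src, buf, sizeof buf);
        if (n == 0)
            return total;          /* peer closed: response complete */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(dst, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
}
```

In the proxy, src would be the socket connected to the remote server and dst the client socket; both should be closed once the call returns.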

Testing Your Proxy

Run your client with the following command:

, where is the port number that the proxy should listen on. As a basic test of functionality, try requesting a page using telnet:

    telnet localhost <port>
    Trying 127.0.0.1...
    Connected to localhost.localdomain (127.0.0.1).
    Escape character is '^]'.
    GET http://www.google.com/ HTTP/1.0

If your proxy is working correctly, the headers and HTML of the Google homepage should be displayed on your terminal screen. Notice here that we request the absolute URL (http://www.google.com/) instead of just the relative URL (/). Again, your proxy should support both of these formats from the client, and only send the relative URL format along with a Host header. A good sanity check of proxy behavior would be to compare the HTTP response (headers and body) obtained via your proxy with the response from a direct telnet connection to the remote server.

For a slightly more complex test, you can configure your web browser to use your proxy server as its web proxy. See the section below for details.

Configuring a Web Browser to Use a Proxy

A Caveat

If you write a single-threaded proxy server, you will probably see some problems when you use your proxy with a standard web browser. Because a web browser like Firefox or IE issues multiple HTTP requests for each URL you request (for instance, to download images and other embedded content), a single-threaded proxy will likely miss some requests, resulting in missing images or other minor errors. That's OK. You are not required to use threading in this assignment. As long as your proxy works correctly for a simple HTML document (like, for instance, this assignment page) and follows the RFC, you can still receive all the points for this assignment.

Firefox

Version 3.x:

  1. Select Tools->Options (or Edit->Preferences) from the menu.
  2. Click on the 'Advanced' icon in the Options dialog.
  3. Select the 'Network' tab, and click on 'Settings' in the 'Connections' area.
  4. Select 'Manual Proxy Configuration' from the options available. In the boxes, enter the hostname and port where proxy program is running.

Earlier Versions:

  1. Upgrade your browser. You're vulnerable to security threats.

To stop using the proxy server, select 'No Proxy' in the connection settings dialog.

Configuring Firefox to use HTTP/1.0

Because Firefox defaults to using HTTP/1.1 and your proxy speaks HTTP/1.0, there are a couple of minor changes that need to be made to Firefox's configuration. Fortunately, Firefox is smart enough to know when it is connecting through a proxy, and has a few special configuration keys that can be used to tweak the browser's behavior.

  1. Type 'about:config' in the address bar.
  2. In the search/filter bar, type 'network.http.proxy'
  3. You should see three keys: , , and .
  4. Set to false. Set to 1.0. Make sure that is set to false.

Internet Explorer

Take a look at this page for complete instructions on enabling a proxy for various versions of Internet Explorer.

You should also do the following to make Internet Explorer work in a HTTP 1.0 compatible mode with your proxy:

  1. Under Internet Options, select the 'Advanced' tab.
  2. Scroll down to HTTP 1.1 Settings. Uncheck 'Use HTTP 1.1 through proxy connections'.

Socket Programming

In order to build your proxy you will need to learn and become comfortable programming sockets. The Berkeley sockets library is the standard method of creating network systems on Unix. There are a number of functions that you will need to use for this assignment:

  • Parsing addresses:
    • inet_addr
      Convert a dotted quad IP address (such as 36.56.0.150) into a 32-bit address.
    • gethostbyname
      Convert a hostname (such as argus.stanford.edu) into a 32-bit address.
    • getservbyname
      Find the port number associated with a particular service, such as FTP.
  • Setting up a connection:
    • socket
      Get a descriptor to a socket of the given type
    • connect
      Connect to a peer on a given socket
    • getsockname
      Get the local address of a socket
  • Creating a server socket:
    • bind
      Assign an address to a socket
    • listen
      Tell a socket to listen for incoming connections
    • accept
      Accept an incoming connection
  • Communicating over the connection:
    • read/write
      Read and write data to a socket descriptor
    • htons, htonl / ntohs, ntohl
      Convert between host and network byte orders (and vice versa) for 16 and 32-bit values

You can find the details of these functions in the Unix man pages (most of them are in section 2) and in the Stevens Unix Network Programming book, particularly chapters 3 and 4. Other sections you may want to browse include the client-server example system in chapter 5 (you will need to write both client and server code for this assignment) and the name and address conversion functions in chapter 9.
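To show how the server-side calls fit together, here is a minimal sketch of creating a listening socket with socket, bind, and listen, plus a small getsockname helper. The names open_listener and listener_port, the backlog of 16, and the use of setsockopt with SO_REUSEADDR (which is not in the list above) are assumptions of this sketch.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a TCP socket listening on the given port on all local
 * addresses. Returns the descriptor, or -1 on failure. */
int open_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int yes = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    /* Allows quick restarts while old connections linger in TIME_WAIT. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* no fixed server IP */
    addr.sin_port = htons(port);                /* network byte order */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns the local port a socket is bound to, or 0 on failure. */
unsigned short listener_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    assert(fd >= 0);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}
```

The proxy would then call accept on the returned descriptor in a loop, handling one client per iteration.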


Grading

You should submit your completed proxy by the date posted on the course website to Blackboard. You will need to submit a tarball file containing the following:

  • All of the source code for your proxy
  • A Makefile that builds your proxy
  • A README file describing your code and the design decisions that you made.

Your tarball should be named where is your username. The sample Makefile in the skeleton zip file we provide will make this tarball for you with the command.

Your proxy will be graded out of ten points, with the following criteria:

  1. When running on your assignment, it should compile without errors or warnings on the FC 010 cluster machines and produce a binary named . The first command line argument should be the port that the proxy will listen from.
  2. Your proxy should run silently- any status messages or diagnostic output should be off by default.
  3. You can complete the assignment in either C or C++.
  4. Your proxy should work with both Firefox and Internet Explorer.
  5. We'll first check that your proxy works correctly with a small number of major web pages, using the same script that we've given you to test your proxy. If your proxy passes all of these 'public' tests, you will get 6 of the possible points.
  6. We'll then check a number of additional URLs and transactions that you will not know in advance. If your proxy passes all of these tests, you get 2 additional points. These tests will check the overall robustness of your proxy, and how you handle certain edge cases. This may include sending your proxy incorrectly formed HTTP requests, large transfers, etc.
  7. Well written (good abstraction, error checking, readability) and well commented code will get 2 additional points, for a total of 10.
  8. The first student to submit a proxy that scores a perfect 10 will win a prize!

There will also be some sort of prize for the best extension to the proxy. Adding an extension will not change your grade. Take a look below for some hints about possible extensions that you can add to the proxy.

As mentioned above, you are not required to implement a multi-threaded proxy for this assignment. If you write a single-threaded proxy, you may see errors when using it with a standard web browser, but that's OK. As long as your proxy works correctly for single HTTP transactions (for instance, try telnetting to the port the proxy is running from and requesting a single HTML document) you can still receive all the possible points for this assignment.

A Note on Network Programming

Writing code that will interact with other programs on the Internet is a little different than just writing something for your own use. The general guideline often given for network programs is: be lenient about what you accept, but strict about what you send. That is, even if a client doesn't do exactly the right thing, you should make a best effort to process their request if it is possible to easily figure out their intent. On the other hand, you should ensure that anything that you send out conforms to the published protocols as closely as possible. If an incoming request has a single field out of whack (such as sending you a request using HTTP 0.9 or 1.1), uses non-standard line terminators (some clients only send \r instead of the standard \r\n), or does something you don't quite expect with HTTP headers, you should still handle the request rather than dropping the request, i.e. clean up the request by removing/replacing the offending fields with the appropriate values before sending it off. Pay attention to parts of the RFC that specify areas where not all clients may conform exactly to what you expect. We'll be looking for this kind of interoperability in both the second round of tests that we run and in the style portion of your grade.
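As one illustration of being lenient about input and strict about output, the sketch below rewrites bare \r or bare \n line terminators as the standard \r\n before a request is forwarded. The function name normalize_line_endings is hypothetical, and treating every bare \r or \n as a line terminator is an assumption that suits request headers, not arbitrary message bodies.

```c
#include <assert.h>
#include <stddef.h>

/* Copies the NUL-terminated string src into dst, rewriting bare "\r"
 * or bare "\n" terminators as "\r\n". Returns the length written,
 * or -1 if dst (of capacity size) is too small. */
long normalize_line_endings(const char *src, char *dst, size_t size)
{
    size_t i, o = 0;

    assert(src != NULL && dst != NULL && size > 0);
    for (i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        if (c == '\r' || c == '\n') {
            if (c == '\r' && src[i + 1] == '\n')
                i++;                       /* already CRLF: consume both */
            if (o + 2 >= size)
                return -1;                 /* keep room for the NUL */
            dst[o++] = '\r';
            dst[o++] = '\n';
        } else {
            if (o + 1 >= size)
                return -1;
            dst[o++] = c;
        }
    }
    dst[o] = '\0';
    return (long)o;
}
```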

When in doubt, try to follow the behavior specified in RFC 1945. Also, check the FAQ for more specific guidelines.


Last updated: Thu Oct 10 11:00:01 -0400 2013

COS-461 Assignments: HTTP Proxy


Assignment 2: HTTP Proxy

Overview

In this assignment, you will implement a web proxy that passes requests and data between multiple web clients and web servers, concurrently. This will give you a chance to get to know one of the most popular application protocols on the Internet -- the Hypertext Transfer Protocol (HTTP) -- and give you an introduction to the Berkeley sockets API. When you're done with the assignment, you should be able to configure Firefox to use your personal proxy server as a web proxy.

Introduction: The Hypertext Transfer Protocol

The Hypertext Transfer Protocol (HTTP) is the protocol used for communication on the web: it defines how your web browser requests resources from a web server and how the server responds. For simplicity, in this assignment, we will be dealing only with version 1.0 of the HTTP protocol, defined in detail in RFC 1945. You may refer to that RFC while completing this assignment, but our instructions should be self-contained.

HTTP communications happen in the form of transactions; a transaction consists of a client sending a request to a server and then reading the response. Request and response messages share a common basic format:

  • An initial line (a request or response line, as defined below)
  • Zero or more header lines
  • A blank line (CRLF)
  • An optional message body.


The initial line and header lines are each followed by a "carriage-return line-feed" (\r\n) signifying the end-of-line.
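Since the header block ends with a blank line, a receiver can tell that the full set of headers has arrived by looking for two consecutive CRLF pairs. A minimal sketch, with the hypothetical function name header_block_length:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the number of bytes up to and including the blank line that
 * ends the header block, or 0 if the blank line has not arrived yet. */
size_t header_block_length(const char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    }
    return 0;
}
```

A read loop can call this after each read and keep reading while it returns 0.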

For most common HTTP transactions, the protocol boils down to a relatively simple series of steps (important sections of RFC 1945 are in parentheses):
  1. A client creates a connection to the server.
  2. The client issues a request by sending a line of text to the server. This request line consists of a HTTP method (most often GET, but POST, PUT, and others are possible), a request URI (like a URL), and the protocol version that the client wants to use (HTTP/1.0). The request line is followed by one or more header lines. The message body of the initial request is typically empty. (5.1-5.2, 8.1-8.3, 10, D.1)
  3. The server sends a response message, with its initial line consisting of a status line, indicating if the request was successful. The status line consists of the HTTP version (HTTP/1.0), a response status code (a numerical value that indicates whether or not the request was completed successfully), and a reason phrase, an English-language message describing the status code. Just as with the request message, there can be as many or as few header fields in the response as the server wants to return. Following the CRLF field separator, the message body contains the data requested by the client in the event of a successful request. (6.1-6.2, 9.1-9.5, 10)
  4. Once the server has returned the response to the client, it closes the connection.

It's fairly easy to see this process in action without using a web browser. From a Unix prompt, type:

    telnet www.yahoo.com 80

This opens a TCP connection to the server at www.yahoo.com listening on port 80 (the default HTTP port). You should see something like this:

    Trying 69.147.125.65...
    Connected to any-fp.wa1.b.yahoo.com.
    Escape character is '^]'.

Then type the following:

    GET / HTTP/1.0

and hit enter twice. You should see something like the following:

    HTTP/1.0 200 OK
    Date: Tue, 16 Feb 2010 19:21:24 GMT
    (More HTTP headers...)
    Content-Type: text/html; charset=utf-8

    <html><head>
    <title>Yahoo!</title>
    (More HTML follows)

There may be some additional pieces of header information as well: cookies being set, instructions to the browser or proxy on caching behavior, etc. What you are seeing is exactly what your web browser sees when it goes to the Yahoo home page: the HTTP status line, the header fields, and finally the HTTP message body, consisting of the HTML that your browser interprets to create a web page. You may notice that the server responds with HTTP 1.1 even though you requested 1.0. Some web servers refuse to serve HTTP 1.0 content.

HTTP Proxies

Ordinarily, HTTP is a client-server protocol. The client (usually your web browser) communicates directly with the server (the web server software). However, in some circumstances it may be useful to introduce an intermediate entity called a proxy. Conceptually, the proxy sits between the client and the server. In the simplest case, instead of sending requests directly to the server the client sends all its requests to the proxy. The proxy then opens a connection to the server, and passes on the client's request. The proxy receives the reply from the server, and then sends that reply back to the client. Notice that the proxy is essentially acting like both a HTTP client (to the remote server) and a HTTP server (to the initial client).

Why use a proxy? There are a few possible reasons:

  • Performance: By saving a copy of the pages that it fetches, a proxy can reduce the need to create connections to remote servers. This can reduce the overall delay involved in retrieving a page, particularly if a server is remote or under heavy load.
  • Content Filtering and Transformation: While in the simplest case the proxy merely fetches a resource without inspecting it, nothing limits a proxy to blindly fetching and serving files. The proxy can inspect the requested URL and selectively block access to certain domains, reformat web pages (for instance, by stripping out images to make a page easier to display on a handheld or other limited-resource client), or perform other transformations and filtering.
  • Privacy: Normally, web servers log all incoming requests for resources. This information typically includes at least the IP address of the client, the browser or other client program that they are using (called the User-Agent), the date and time, and the requested file. If a client does not wish to have this personally identifiable information recorded, routing HTTP requests through a proxy is one solution. All requests coming from clients using the same proxy appear to come from the IP address and User-Agent of the proxy itself, rather than the individual clients. If a number of clients use the same proxy (say, an entire business or university), it becomes much harder to link a particular HTTP transaction to a single computer or individual.

References:

  • RFC 1945 The Hypertext Transfer Protocol, version 1.0

Assignment Details

The Basics

Your task is to build a web proxy capable of accepting HTTP requests, forwarding requests to remote (origin) servers, and returning response data to a client. The proxy MUST handle concurrent requests by forking a process for each new client request using the fork() system call. You will only be responsible for implementing the GET method. All other request methods received by the proxy should elicit a "Not Implemented" (501) error (see RFC 1945 section 9.5 - Server Error).

This assignment can be completed in either C or C++. It should compile and run (using g++) without errors or warnings from the FC 010 cluster, producing a binary called that takes as its first argument a port to listen from. Don't use a hard-coded port number.

You shouldn't assume that your server will be running on a particular IP address, or that clients will be coming from a pre-determined IP.

Listening

When your proxy starts, the first thing that it will need to do is establish a socket that it can use to listen for incoming connections. Your proxy should listen on the port specified from the command line and wait for incoming client connections. Each new client request is accepted, and a new process is spawned using fork() to handle the request. To avoid overwhelming your server, you should not create more than a reasonable number of child processes (for this experiment, use at most 20). Once that limit is reached, your server should wait until one of its ongoing child processes exits before forking a new one to handle the new request.
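The process limit described here can be sketched as a counter that is decremented whenever waitpid reaps a child. Everything named below (active_children, wait_for_free_slot, spawn_handler, demo_handler) is hypothetical; demo_handler merely stands in for the real per-client request handling.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 20

int active_children = 0;    /* children forked but not yet reaped */

/* Reaps finished children without blocking, then blocks only while
 * the number of live children is at the limit. */
void wait_for_free_slot(void)
{
    int status;

    while (active_children > 0 && waitpid(-1, &status, WNOHANG) > 0)
        active_children--;
    while (active_children >= MAX_CHILDREN) {
        if (waitpid(-1, &status, 0) > 0)
            active_children--;
        else if (errno != EINTR)
            active_children = 0;    /* ECHILD: nothing left to wait for */
    }
}

/* Runs handler(arg) in a forked child. Returns the child's pid in the
 * parent, or -1 if fork failed. */
pid_t spawn_handler(void (*handler)(int), int arg)
{
    pid_t pid;

    assert(handler != NULL);
    wait_for_free_slot();
    pid = fork();
    if (pid == 0) {
        handler(arg);               /* child: serve one client */
        _exit(0);
    }
    if (pid > 0)
        active_children++;
    return pid;
}

/* Placeholder for real request handling on a client descriptor. */
void demo_handler(int client_fd)
{
    (void)client_fd;
}
```

In the accept loop, the parent would call spawn_handler with the accepted descriptor and then close its own copy of that descriptor.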

Once a client has connected, the proxy should read data from the client and then check for a properly-formatted HTTP request -- but don't worry, we have provided you with libraries that parse the HTTP request lines and headers. Specifically, you will use our libraries to ensure that the proxy receives a request that contains a valid request line:

    <METHOD> <URL> <HTTP VERSION>

All other headers just need to be properly formatted:

    <HEADER NAME>: <HEADER VALUE>

In this assignment, client requests to the proxy must be in their absolute URI form (see RFC 1945, Section 5.1.2), e.g.,

    GET http://www.cs.princeton.edu/index.html HTTP/1.0

Your browser will send an absolute URI if properly configured to explicitly use a proxy (as opposed to the transparent on-path proxies that some ISPs deploy, unbeknownst to their users). On the other hand, your proxy should issue requests to the web server with properly specified relative URLs, e.g.,

    GET /index.html HTTP/1.0
    Host: www.cs.princeton.edu

An invalid request from the client should be answered with an appropriate error code, i.e. "Bad Request" (400) or "Not Implemented" (501) for valid HTTP methods other than GET. Similarly, if headers are not properly formatted for parsing, your proxy should also generate a type-400 message.

Parsing Library

We have provided a parsing library to do string parsing on the header of the request. This library is in in the skeleton code. The library can parse the request into a structure called which has fields for things like the host name (domain name) and the port. It also parses the custom headers into a set of ParsedHeader structs which each contain a key for the header field name and value corresponding to the value to which the header is set. You can search for headers by the key or header field name and modify them. The library can also recompile the headers into a string given the information in the structs.

More details as well as an example of how to use the library is included in the library file, . This library can also be used to verify that the headers are in the correct format since the parsing functions return error codes if this is not the case.

Parsing the URL

Once the proxy receives a valid HTTP request, it will need to parse the requested URL. The proxy needs at least three pieces of information: the requested host, port, and path. See the manual page for more info. You will need to parse the absolute URL specified in the given request line. You can use the parsing library to help you. If the hostname indicated in the absolute URL does not have a port specified, you should use the default HTTP port 80.

Getting Data from the Remote Server

Once the proxy has parsed the URL, it can make a connection to the requested host (using the appropriate remote port, or the default of 80 if none is specified) and send the HTTP request for the appropriate resource. The proxy should always send the request in the relative URL + Host header format regardless of how the request was received from the client:

Accept from client:

    GET http://www.princeton.edu/ HTTP/1.0

Send to remote server:

    GET / HTTP/1.0
    Host: www.princeton.edu
    Connection: close
    (Additional client specified headers, if any...)

Note that we always send HTTP/1.0 flags and a Connection: close header to the server, so that it will close the connection after its response is fully transmitted, as opposed to keeping open a persistent connection (as we learned in Recitation 2). So while you should pass the client headers you receive on to the server, you should make sure you replace any Connection header received from the client with one specifying close, as shown. To add new headers or modify existing ones, use the HTTP Request Parsing Library we provide.

Returning Data to the Client

After the response from the remote server is received, the proxy should send the response message as-is to the client via the appropriate socket. To be strict, the proxy would be required to ensure a Connection: close header is present in the server's response to let the client decide if it should close its end of the connection after receiving the response. However, checking this is not required in this assignment for the following reasons. First, a well-behaved server would respond with Connection: close anyway, given that we ensure that we sent the server a close token. Second, we configure Firefox to always send Connection: close by setting keepalive to false. Finally, we wanted to simplify the assignment so you wouldn't have to parse the server response.

The following summarizes how status replies should be sent from the proxy to the client:

  1. For any error your proxy should return the status 500 'Internal Error'. This means for any request method other than GET, your proxy should return the status 500 'Internal Error' rather than 501 'Not Implemented'. Likewise, for any invalid, incorrectly formed headers or requests, your proxy should return the status 500 'Internal Error' rather than 400 'Bad Request' to the client. For any error that your proxy has in processing a request such as failed memory allocation or missing files, your proxy should also return the status 500 'Internal Error'. (This is what is done by default in this case.)
  2. Your proxy should simply forward status replies from the remote server to the client. This means most 1xx, 2xx, 3xx, 4xx, and 5xx status replies should go directly from the remote server to the client through your proxy. Most often this should be the status 200 'OK'. However, it may also be the status 404 'Not Found' from the remote server. (While you are debugging, make sure you are getting valid 404 status replies from the remote server and not the result of poorly forwarded requests from your proxy.)
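The first rule above could look something like the sketch below. The status text follows the assignment's wording; the function names are hypothetical, the reply carries no headers or body, and a short write is treated as an error for brevity.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Status line as worded in the assignment, followed by the blank line. */
static const char ERROR_500[] = "HTTP/1.0 500 Internal Error\r\n\r\n";

/* Returns 1 if the request line uses the GET method, 0 otherwise. */
int is_get_request(const char *request_line)
{
    assert(request_line != NULL);
    return strncmp(request_line, "GET ", 4) == 0;
}

/*
 * Sends the 500 reply to the client. Returns 0 on success, -1 if the
 * write failed or was short.
 */
int send_internal_error(int client_fd)
{
    size_t len = sizeof ERROR_500 - 1;
    ssize_t n = write(client_fd, ERROR_500, len);
    return (n == (ssize_t)len) ? 0 : -1;
}
```

A proxy would call send_internal_error whenever is_get_request returns 0, when parsing fails, or when an internal operation such as memory allocation fails, and then close the client socket.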

Testing Your Proxy

Run your proxy from the command line, passing as its first argument the port number that the proxy should listen on. As a basic test of functionality, try requesting a page using telnet:

telnet localhost <port>
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
GET http://www.google.com/ HTTP/1.0

If your proxy is working correctly, the headers and HTML of the Google homepage should be displayed on your terminal screen. Notice here that we request the absolute URL (http://www.google.com/) instead of just the relative URL (/). A good sanity check of proxy behavior is to compare the HTTP response (headers and body) obtained via your proxy with the response from a direct telnet connection to the remote server. Additionally, try requesting a page using telnet concurrently from two different shells.

For a slightly more complex test, you can configure Firefox to use your proxy server as its web proxy as follows:

  1. Go to the 'Edit' menu.
  2. Select 'Preferences'. Select 'Advanced' and then select 'Network'.
  3. Under 'Connection', select 'Settings...'.
  4. Select 'Manual Proxy Configuration'. If you are using localhost, remove the default 'No Proxy for: localhost 127.0.0.1'. Enter the hostname and port where your proxy program is running.
  5. Save your changes by selecting 'OK' in the connection tab and then select 'Close' in the preferences tab.

Socket and Multi-Process Programming

You can find details for the Berkeley sockets library in the Unix man pages (most of them are in section 2) and in the Stevens Unix Network Programming book, particularly chapters 3 and 4. Other sections you may want to browse include the client-server example system in chapter 5 (you will need to write both client and server code for this assignment) and the name and address conversion functions in chapter 9. To review the Socket Programming tutorial, refer to the first precept slides and to the reference solution for Assignment 0, which was given during precept 2 and posted on Piazza.
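As a rough orientation for the server side, the usual Berkeley sockets sequence for a listening TCP socket is socket, bind, listen, followed by accept in a loop. The sketch below covers the setup part only; open_listen_socket and its backlog parameter are illustrative choices, not names from the course materials.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Creates a TCP socket listening on the given port on all interfaces.
 * Returns the listening descriptor, or -1 on failure.
 */
int open_listen_socket(unsigned short port, int backlog)
{
    assert(backlog > 0);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Lets the proxy be restarted quickly on the same port. */
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);       /* network byte order */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The caller would then loop on accept(fd, NULL, NULL) to obtain one connected descriptor per client. The client side toward the remote server follows the mirror sequence: resolve the host name, create a socket, and connect.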

In addition to the Berkeley sockets library, there are some functions you will need to use for creating and managing multiple processes: fork and waitpid. These will be reviewed in precept 3.
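One common way to combine fork and waitpid in a server is to hand each accepted connection to a child process and to reap finished children without blocking. The sketch below assumes that pattern; handle_connection is a placeholder for the real per-connection work, and all function names are hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder for the real work: parse the request, forward it, relay. */
static int handle_connection(int conn_fd)
{
    static const char reply[] = "HTTP/1.0 500 Internal Error\r\n\r\n";
    ssize_t n = write(conn_fd, reply, sizeof reply - 1);
    return n == (ssize_t)(sizeof reply - 1) ? 0 : 1;
}

/*
 * Forks a child to serve conn_fd. The parent closes its copy of the
 * descriptor and returns the child's pid, or -1 if fork failed.
 */
pid_t serve_in_child(int conn_fd)
{
    assert(conn_fd >= 0);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: serve this one connection, then exit without
         * running the parent's atexit handlers. */
        int rc = handle_connection(conn_fd);
        close(conn_fd);
        _exit(rc);
    }
    close(conn_fd);   /* parent keeps no reference to the connection */
    return pid;
}

/* Reaps every child that has already exited; never blocks. */
int reap_finished_children(void)
{
    int reaped = 0;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    return reaped;
}
```

In a full proxy the child would also close its inherited copy of the listening socket, and the parent would call reap_finished_children after each accept so that exited children do not linger as zombies.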

Grading

You should submit your completed proxy by the date posted on the course website to CS Dropbox. You will need to submit a tarball file containing the following:

  • All of the source code for your proxy
  • A Makefile that builds your proxy
  • A README file describing your code and the design decisions that you made.

Your tarball should be named . The sample Makefile in the skeleton zip file we provide will make this tarball for you with the command.

Your proxy will be graded out of twenty points, with the following criteria:

  1. When running make on your assignment, it should compile without errors or warnings on the course VM and produce a binary named . The first command line argument should be the port that the proxy will listen on.
  2. Your proxy should run silently: any status messages or diagnostic output should be off by default.
  3. You can complete the assignment in either C or C++.
  4. Your proxy should work with Firefox.
  5. We'll first check that your proxy works correctly with a small number of major web pages, using the same scripts that we've given you to test your proxy. If your proxy passes all of these 'public' tests, you will get 10 of the possible points.
  6. We'll then check a number of additional URLs and transactions that you will not know in advance. If your proxy passes all of these tests, you get 5 additional points. These tests will check the overall robustness of your proxy, and how you handle certain edge cases. This may include sending your proxy incorrectly formed HTTP requests, large transfers, etc.
  7. You'll get 2 additional points for writing a reasonable README.
  8. Well-written code will get 3 additional points -- for readability, error checking, and comments -- for a total of 20 points.

Check the FAQ for more specific guidelines.


Last updated: Tue May 13 09:10:26 -0400 2014
