PV222 Security Architectures
Lecture 2: Web Security
2nd April 2009

Lecture Overview

What is the web?
The web components: HTTP and HTML
HTTP, state and cookies
Web server hazards
SSL/TLS
Further information

The World Wide Web

The World Wide Web (or just the web) is essentially a means of providing access to data across the Internet in a way that hides most of the complexity. In technical terms it does not do much more than the simple file transfer protocol FTP. However, the combination of transparency and hyperlinks enables the construction of the enormously complex web we have today.

Web browsers and servers

Two key elements of the web are the browsers and servers. The web browser is a program running on a PC that provides a means of viewing information provided by web servers connected to the Internet.

What is Web Security?

Garfinkel and Spafford (in Web Security, Privacy & Commerce) define web security as:
1. "Securing the web server and the data that is on it."
2. "Securing information that travels between the web server and the user."
3. "Securing the end user's computer and other devices that people use to access the Internet."

The protocol

The web protocol, i.e. the set of rules by which data is transferred between web browsers and web servers, is called HTTP, for HyperText Transfer Protocol. This is a very simple "request/reply" protocol running over TCP (the Transmission Control Protocol). Requests are directed from a web browser to a resource at a specific address.

Addresses (URIs and URLs)

URIs (Uniform Resource Identifiers) are a means of identifying network resources. A URI is either a URL (Uniform Resource Locator) or a URN (Uniform Resource Name).
URL syntax is defined in RFCs 1738 and 1808. A URL looks like: http://<host>:<port>/<path>?<searchpart> where <host> is an Internet host name or IP address.

The language

When a web server receives a request, it responds with information (a "web page") in a language called HTML (HyperText Markup Language). An HTML file is essentially a text file containing a series of "markup tags" instructing the recipient how to display the text. A tag may also include a URI for a different web page, and the browser will display this as a hyperlink.

Web standards

Some of the most fundamental web-related specifications are IETF RFCs (Requests for Comments). W3C (World Wide Web Consortium) is a forum that develops and publishes web specifications.

HTTP overview

There are two main versions of HTTP: version 1.0 (HTTP/1.0, defined in RFC 1945) and version 1.1 (HTTP/1.1, defined in RFC 2616). HTTP is an application-level protocol. The fundamental unit of HTTP communication is a message (a structured sequence of bytes).

HTTP requests/responses

HTTP is a request/response protocol; that is, a user agent (typically a web browser on a PC) sends a request, and a remote server sends a response to that request. The request consists of a request method, a URI, and a protocol version number, followed by a MIME-like message containing request modifiers (parameters), client information, and (possibly) content of some kind.

HTTP responses

A server response consists of: a status line, including the protocol version number and a success/error code; and a MIME-like message, containing server information, content meta-information (headers), and content. The content will typically be written in HTML.
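The request and response layouts described above can be illustrated with a small sketch. It builds a request as text and splits a hand-written response into its parts; no network connection is made. The host name www.example.com, the header values, and the helper names build_request and parse_response are assumptions chosen for illustration, and the parser handles only the simple case shown.

```python
def build_request(method, uri, headers, version="HTTP/1.1"):
    """Request line (method, URI, version), then header lines, then a blank line."""
    lines = [f"{method} {uri} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response(raw):
    """Split a response into protocol version, status code, headers and content."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, int(code), headers, body


# A request as a browser might send it (illustrative values only).
request = build_request(
    "GET", "/index.html",
    {"Host": "www.example.com", "User-Agent": "sketch/0.1"},
)

# A hand-written response: status line, headers, blank line, HTML content.
response = (
    "HTTP/1.1 200 OK\r\n"
    "Server: example\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><head><title>Hi</title></head><body><p>Hello</p></body></html>"
)
version, code, headers, body = parse_response(response)
```

Here the status line carries the version and the success/error code, the header lines carry the server information and content meta-information, and everything after the blank line is the content.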

HTML

The latest versions of HTML are HTML 4.01 and XHTML 1.0. HTML 4.01 is a W3C Recommendation from 1999. (HTML 2.0 was published as RFC 1866.) HTML 5 is currently in the W3C Working Draft phase of publication. XHTML is a reformulation of HTML in XML 1.0 (the latest version was published by W3C in August 2002).

HTML syntax

An HTML document is divided into: a head section (between <head> and </head>) and a body (between <body> and </body>). The title appears in the head (along with other information about the document), and the content appears in the body. The body will typically contain paragraphs, marked up with <p> ... </p>.

HTML and SGML

SGML (Standard Generalized Markup Language) was published as international standard ISO 8879 in 1986. SGML is a system for defining markup languages. Authors mark up their documents by representing structural, presentational, and semantic information alongside content. HTML is one example of a markup language.

SGML use

A markup language defined in SGML is called an SGML application. An SGML application is characterised by: an SGML declaration that specifies which characters and delimiters may appear; a document type definition (DTD) that defines the syntax of markup constructs; a specification that describes the semantics of the markup; and document instances containing data (content) and markup.

XML

The Extensible Markup Language (XML) is a subset of SGML. Its goal is to enable generic SGML to be served, received, and processed on the web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

Using HTML

Writing in HTML is simple. The easiest way is to use a tool which automatically produces the HTML syntax (adds the correct tags). However, because HTML is essentially plain text plus tags, directly editing simple HTML pages is very straightforward (particularly if you have a few examples to work from).

HTTP is stateless

The HTTP protocol does not require the server to maintain any protocol state. That is, the server does not keep any information to enable consecutive requests from a single user agent to be linked. Hence HTTP on its own does not support "sessions", e.g. as might be required to support e-commerce.

Cookies

HTTP Cookies are a simple means of enabling browser sessions with a server. The idea is that the server sends back state information in its response header, in the form of a Cookie. The Cookie is then resubmitted with the next request to the same server. A Cookie might, for example, specify the current contents of your shopping basket.

Cookie contents

A cookie header (Set-Cookie, in a response) contains: attribute, the data payload; domain scope, enables sharing of cookies by web hosts within the specified domain name; path scope, limits the URI paths for which the cookie should be sent back; expiration, the expiry date of the Cookie; SSL flag, if set the Cookie should only be sent back via an HTTPS (HTTP over SSL) connection.

Cookies and privacy

Whilst Cookies are an invaluable tool for e-commerce and other uses of the web, they also constitute a privacy threat. Clearly, a server can use Cookies to track individual user PCs (even if the server cannot automatically discover the owner of a particular PC). We look at one way this tracking can pose a threat.

Tracking cookies

Web-based advertising agencies, e.g. DoubleClick, Focalink, Globaltrack, and ADSmart, put advertisements on web sites. These web pages contain an <img> tag, pointing to a URL on the advertising agency's server. When a web browser sees this tag, it contacts the agency server to retrieve the graphic. The first time the graphic is downloaded, the user's browser will receive an agency cookie containing a random ID.

Every time the browser visits a site containing the agency's advertisements, it sends the cookie (the random ID) along with the URL of the page that is being read (using the referer field) to the agency.
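The mechanism just described can be modelled in a few lines. This is a toy sketch rather than how any real agency's server works: the class AdAgency, its method serve_advert, and the example page URLs are all hypothetical, and the "request" is reduced to the two values that matter here, the cookie and the referer.

```python
import secrets


class AdAgency:
    """Toy model of an advertising agency's image server (hypothetical)."""

    def __init__(self):
        # random ID -> list of pages on which this browser fetched an advert
        self.history = {}

    def serve_advert(self, cookie, referer):
        """Handle one request for the advert graphic.

        cookie  -- the ID sent back by the browser, or None on the first visit
        referer -- URL of the page that embedded the advert
        Returns the ID that the response would set as a cookie.
        """
        if cookie is None or cookie not in self.history:
            cookie = secrets.token_hex(8)  # fresh random ID for a new browser
            self.history[cookie] = []
        self.history[cookie].append(referer)
        return cookie


agency = AdAgency()
browser_cookie = None  # the browser has no agency cookie yet
for page in [
    "http://news.example/sport",
    "http://shop.example/basket",
    "http://travel.example/",
]:
    # Each page embeds the agency's advert, so the browser contacts the agency,
    # sending its cookie (if any) and the referring page.
    browser_cookie = agency.serve_advert(browser_cookie, page)

profile = agency.history[browser_cookie]  # every page visited, under one ID
```

Although the three pages belong to unrelated sites, all three visits end up recorded against the same random ID.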
This enables the agency to track a single user's behaviour across multiple web sites.

Countermeasures

Software can be used to detect tracking cookies and eliminate them (and, in some cases, even prevent them being loaded). Sources of software include: www.spybot.info (for Spybot Search and Destroy), and www.lavasoftusa.com (for Ad-Aware 6.0).

Referer field

One of the fields in the header of an HTTP request message is the Referer field. This allows the client to specify, for the server's benefit, the address (URI) of the resource from which the URI of this request was obtained. In most browsers, when you look at a new page, the browser will send the URL of the current page in the referer field. Under the HTTP definitions, this is meant to be an option for the user, but Garfinkel and Spafford report that they have never seen a browser where it is optional.

OWASP Top Ten I

The Open Web Application Security Project (OWASP) is an open community dedicated to improving the security of web applications. The OWASP Top Ten is a project to collate information on the most critical web application security flaws.

OWASP Top Ten II

1. Unvalidated Input
2. Broken Access Control
3. Broken Authentication and Session Management
4. Cross Site Scripting
5. Buffer Overflow
6. Injection Flaws
7. Improper Error Handling
8. Insecure Storage
9. Application Denial of Service
10. Insecure Configuration Management

Unvalidated Input

Unvalidated Input covers attack types such as: cross site scripting; buffer overflows; format string attacks; SQL injection. One way to protect the web server is to filter out malicious input, but this has the problem that there are a large number of ways of encoding information.
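As a rough illustration of the encoding problem, the sketch below shows a filter that looks for one known-bad pattern and is defeated by URL-encoding the same payload. For contrast it also includes a check against a strict expected format. The function names, the payload, and the username rule (3 to 16 letters, digits or underscores) are assumptions made for this example only.

```python
import re
from urllib.parse import unquote


def naive_filter(value):
    """Blacklist approach: accept anything that does not contain '<script'."""
    return "<script" not in value.lower()


# Hypothetical rule for a username field: 3-16 letters, digits or underscores.
USERNAME = re.compile(r"[A-Za-z0-9_]{3,16}")


def strict_check(value):
    """Allow-list approach: accept only values matching the expected format."""
    return USERNAME.fullmatch(value) is not None


plain = "<script>alert(1)</script>"
encoded = "%3Cscript%3Ealert(1)%3C/script%3E"  # same payload, URL-encoded

blocked_plain = not naive_filter(plain)       # the filter catches the plain form
missed_encoded = naive_filter(encoded)        # but lets the encoded form through
same_payload = unquote(encoded) == plain      # which decodes to the same script

rejected = not strict_check(plain) and not strict_check(encoded)
accepted = strict_check("alice_01")
```

The filter has to anticipate every possible encoding, whereas the format check only has to describe what a legitimate value looks like.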
Other applications use only client-side mechanisms to validate input, but these are easily bypassed. The best way to defend against these types of attacks is to check against a strict format that specifies what will be allowed. Validate against a "positive" specification: data type; allowed character set; minimum and maximum length; ...

Cross Site Scripting I

Cross Site Scripting (XSS) occurs when an attacker uses a web application to send malicious code to a different end user. It can occur anywhere a web application uses input from a user in the output it generates without validating it. The victim's browser has no way of knowing that the script should not be trusted, and will execute it. XSS attacks generally fall into two categories: stored and reflected.

Cross Site Scripting II

Stored attacks are those where the injected code is permanently stored on the target servers, such as in a: database; message forum; visitor log; ... Reflected attacks are those where the injected code is reflected off the web server, such as in an error message, search result, etc. They are delivered to the victim via another route, such as in an email message, or on some other web server. When a user is tricked into clicking on a malicious link or submitting a specially crafted form, the injected code travels to the vulnerable web server, which reflects the attack back to the user's browser. The browser then executes the code because it came from a "trusted" server.

Cross Site Scripting III

XSS can cause a variety of problems for the end user. The most severe XSS attacks involve disclosure of the user's session cookie, allowing an attacker to hijack the user's session and take over the account.
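To make the reflected case concrete, the sketch below shows a hypothetical search page that copies the user's query into its output. The first version reflects the query unchanged, so a script in the query reaches the browser as a script. The second version escapes the query with Python's html.escape before output. Output escaping is not discussed in this lecture; it is included here only as a commonly used complement to input validation, and the function names and payload are assumptions.

```python
import html


def search_page_unsafe(query):
    """Reflects the query into the page exactly as received."""
    return f"<p>No results for {query}</p>"


def search_page_escaped(query):
    """Escapes HTML metacharacters so the query is displayed as text, not markup."""
    return f"<p>No results for {html.escape(query)}</p>"


# A query an attacker might place in a crafted link (illustrative payload only).
payload = "<script>send(document.cookie)</script>"

unsafe_page = search_page_unsafe(payload)    # contains a live <script> element
escaped_page = search_page_escaped(payload)  # contains &lt;script&gt; as plain text
```

In the unsafe page the browser would run the script with access to the site's cookies; in the escaped page the same characters are merely shown to the user.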
Other attacks include: disclosure of end user files; installing a Trojan horse; modifying presentation of content. The best method of protection is to ensure that web applications validate all input against a rigorous specification.

Web server scripting

Most web browsers have the capability to interpret scripts embedded in the web pages downloaded from a web server. Such scripts may be written in a variety of scripting languages and are run by the client's browser. In the past most browsers were installed with the capability to run scripts enabled by default.

Impact of scripting attacks

Users may unintentionally execute scripts written by an attacker when they follow untrusted links in web pages, mail messages, or newsgroup postings. Users may also unknowingly execute malicious scripts when viewing dynamically generated pages based on content provided by other users.

Scripting attack simple example

An attacker might post a message such as: Hello message board. This is a message. <SCRIPT>malicious code</SCRIPT> This is the end of my message. to an Internet discussion group. When a victim with scripts enabled in their browser reads this message, the malicious code may be executed unexpectedly. Scripting tags that can be embedded in this way include