Web Application Security

Information on this page is taken from The Basics of Web Application Security by Cairns and Somerfield.

Reject Unexpected Form Input

On our scale of trust, data coming from the user's browser, whether we are providing the form or not, and regardless of whether the connection it HTTPS-protected, is effectively zero. In order to ensure the integrity of incoming data, validation needs to be handled on the server.

Input Validation

Input validation is the process of ensuring input data is consistent with application expectations. Data that falls outside of an expected set of values can cause our application to yield unexpected results, for example violating business logic, triggering faults, and even allowing an attacker to take control of resources or the application itself.

Input validation functionality is built in to most modern frameworks and, when absent, can also be found in external libraries that enable the developer to put multiple constraints to be applied as rules on a per field basis.

Whitelisting

Input validation is more effective for inputs that can be restricted to a small set. Numeric types can typically be restricted to values within a specific range. For example, it doesn’t make sense for a user to request to transfer a negative amount of money or to add several thousand items to their shopping cart. This strategy of limiting input to known acceptable types is known as positive validation or whitelisting.

Ideally, we reject input if it fails validation. If you must provide user feedback, you are best served with a canned response that doesn't echo back untrusted user data, for example "You must choose email or text." Reject the web content before it gets deeper into application logic to minimize ways to mishandle untrusted data or, even better, use your web framework to whitelist input.

Blacklisting

It might be tempting to try filtering the <script> tag to thward this attack. Rejecting input that contains known dangerous values is a strategy referred to as negative validation or blacklisting. The trouble with this approach is that the number of possible bad inputs is extremely large. If you must blacklist, reference OWASP's XSS Filter Evasion Cheat Sheet.

Resist the temptation to filter out invalid input. This is a practice commonly called "sanitization". It is essentially a blacklist that removes undesirable input rather than rejecting it. Like other blacklists, it is hard to get right and provides the attacker with more opportunities to evade it.

Encode HTML Output

In addition to limiting data coming into an application, we application developers need to pay close attention to the data as it comes out. Risk becomes particularly high when rending content from an untrusted source. We should reject input that falls outside the bounds of the contract, but what do we do when we need to accept input containing characters that has the potential to change our code, like a single quote of an open bracket? This is where output encoding comes in.

Output Encoding

Output encoding is converting outgoing data to a final output format. You need a different codec depending on how the outgoing data is going to be consumed. Avoid nested rendering contexts as much as possible because that adds complexity to your codec.

Most modern web frameworks have mechanisms for rendering content safely and escaping reserved characters.

Store your data in raw form and encode at rendering time. This allows you to re-encode at a later time.

Bind Parameters for Database Queries

Whether you are writing SQL against a relational database, using an object-relational mapping framework, or querying a NoSQL database, you probably need to worry about how input data is used within your queries.

Parameter Binding

Parameter binding provides a means of separating executable code, such as SQL, from content, transparently handling content encoding and escaping. Any full-featured data access layer will have the ability to bind variables and defer implementation to the underlying protocol.

Protect Data in Transit

When using an ordinary HTTP connection, users are exposed to many risks arising from the fact data is transmitted in plaintext. An attacker capable of intercepting network traffic anywhere between a user's browser and a server can eavesdrop or even tamper with the data completely undetected in a man-in-the-middle attack. There is no limit to what the attacker can do, including stealing the user's session or their personal information, injecting malicious code that will be executed by the browser in the context of the website, or altering data the user is sending to the server. We can protect against many of these risks with HTTPS.

The HTTPS protocol uses the Transport Layer Security (TLS) protocol to secure communications. When configured and used correctly, it provides protection against eavesdropping and tampering, along with a reasonable guarantee that a website is the one we intend to be using.

Use HTTPS for everything and use HSTS to enforce it. You will need a certificate from a trusted certificate authority if you plan to trust normal we browsers. Also, be sure to set the "secure" flag in cookies.

It is dangerous to put sensitive data inside of a URL. Doing so presents a risk if the URL is cached in browser history, not to mention if it is recorded in logs on the server side.

Hash and Salt Your Users' Passwords

The most obvious way to write password-authentication is to store username and password in a table and do look ups against it. Don't ever do this. Either store credentials safely or don't store them at all.

You never want to store the password itself, but rather store a hash of the password. A cryptographic hashing algorithm is a one-way transformation from an input to an output from which the original input is, for all practical purposes, impossible to recover. You aren't storing the password at all, but rather this hash. In order to validate a user's password, you just apply the same hash algorithm to the password text they send, and, if they match, you know the password is valid.

A salt is some extra data that is added to the password before it is hashed so that two instances of a given password do not have the same hash value. The salt doesn't require any special protection and can live alongside the hash. A salt should be globally unique per user (e.g. you can randomly generate a UUID).

The best widely-available algorithms are now considered to be scrypt and bcrypt. Don't use SHA-1 or MD5.

Be careful not to set password size limits that are too small, or character set limits that are too narrow.

Authenticate Users Safely

Authentication confirms that a user is who they claim to be. Authorization defines whether a user is allowed to do something. Session management makes it possible to relate requests made by a particular user. Keep these three separate in your software to reduce complexity (and therefore risk).

Regardless of which method you choose, it is always wise to try to find an existing, mature framework that provides the capabilities you need.

Authentication Options

Although authenticating using a username and a password works well for many systems, it isn’t our only option. We can rely on external service providers where users may already have accounts to identify them. We can also authenticate users using a variety of different factors: something you know, such as a password or a PIN, something you have, such as your mobile phone or a key fob, and something you are, such as your fingerprints. Depending on your needs, some of these options may be worth considering, while others are helpful when we want to add an extra layer of protection.

One option that offers a convenience for many users is to allow them to log in using their existing account on popular services such as Facebook, Google, and Twitter, using a service called Single Sign-On (SSO). SSO allows users to log in to different systems using a single identity managed by an identity provider. For example, when visiting a website you may see a button that says “Sign in with Twitter” as an authentication option. To achieve this, SSO relies on the external service to manage logging the user in and to confirm their identity. The user never provides any credentials to our site.

SSO can significantly reduce the amount of time it takes to sign up for a site and eliminates the need for users to remember yet another username and password. However, some users may prefer to keep their use of our site private and not connect it to their identity elsewhere. Others may not have an existing account with the external providers we support. It is always preferable to allow users to register by manually entering their information as well.

With Two-Factor Authentication (2FA), a second, different factor of authentication is required to confirm the identity of a user.

Reauthenticate For Important Actions

Authentication isn’t only important when logging in. We can also use it to provide additional protection when users perform sensitive actions such as changing their password or transferring money. This can help limit the exposure in the event a user’s account is compromised. For example, some online merchants require you to re-enter details from your credit card when making a purchase to a newly-added shipping address. It is also helpful to require users to re-enter their passwords when updating their personal information.

Conceal Whether Users Exist

When a user makes a mistake entering their username or password, we might see a website respond with a message like this: The user ID is unknown. Revealing whether a user exists can help an attacker enumerate accounts on our system to mount further attacks against them or, depending on the nature of the site, revealing the user has an account may compromise their privacy. A better, more generic, response might be: Incorrect user ID or password.

Preventing Brute Force Attacks

A good starting point that will slow an attacker down is to lock users out temporarily after a number of failed login attempts. This can help reduce the risk of an account being compromised, but it can also have the unintended effect of allowing an attacker to cause a denial-of-service condition by abusing it to lock users out. Using short lockouts of between 10 to 60 seconds can be an effective deterrent without imposing the same availability risks.

Another popular option is to use CAPTCHAs, which attempt to deter automated attacks by presenting a challenge that a human can solve but a computer can not.

Layering these options has been used as an effective strategy on sites that see frequent brute force attacks. After two login failures occur for an account, a CAPTCHA might be presented to the user.

Protect User Sessions

Session management relates user data across requests. As always, it is preferable to use an existing, mature framework to handle session management for you and tune it for your needs rather than trying to implement it yourself from scratch.

Sessions are typically created by setting a session identifier inside a cookie that will be sent by a user's browser in subsequent requests. The security of these identifiers depend on them being unpredictable, unique, and confidential

Don't Expose Session Identifiers

Store session identifiers in cookies, not URLs.

When cookies are used for sessions, we should take some simple precautions to make sure they are not unintentionally exposed. There are four attributes that are important to understand for this purpose: Domain, Path, HttpOnly, and Secure.

Domain restricts the scope of a cookie to a particular domain and its subdomains, and Path further restricts the scope to a path and its subpaths. The default for Domain will only permit a cookie to be sent to the originating domain and its subdomains, and the default for Path will restrict a cookie to the path of the resource where the cookie was set and its subpaths.

The other two attributes, Secure and HttpOnly, control how the cookie is used. The Secure flag indicates that the browser should only send the cookie when using HTTPS. The HttpOnly flag instructs the browser that the cookie should not be accessible through JavaScript or other client side scripts, which helps prevent it being stolen by malicious code.

Managing the Session Lifecycle

We should always create a new session when a user authenticates or elevates their privilege level. Also, we should only create session identifiers ourselves and ignore identifiers that aren’t valid.

The longer a session is active, the greater the chance an attacker might be able to get their hands on it. To reduce that risk and keep our session table clean, we can impose timeouts on sessions that are left inactive for some amount of time. The duration of time depends on your risk tolerance. On our captioned cat pictures site, it might only be necessary to do this after a month or even longer. A bank, on the other hand, might have a strict policy of timing out sessions after 10 minutes of inactivity as a security precaution.

When a user does log out, we must instruct the browser to destroy their session cookie by indicating that it expired at a date in the past.

Authorize Actions

Authorization is the process of enforcing what is and is not permitted. Authorization is generally expressed as permission to take a particular action against a particular resource, where a resource is a page, a file on the files system, a REST resource, or even the entire system.

Authorize on the Server

Among the most critical mistakes a programmer can make is hiding capabilities rather than explicitly enforcing authorization on the server. For example, it is not sufficient to simply hide the "delete user" button from users that are not administrators. The request coming from the user cannot be trusted, so the server code must perform the authorization of the delete.

Further, the client should never pass authorization information to the server. Rather the client should only be allowed to pass temporary identity information, such as session ids, that have been previously generated on the server, and are unguessable (see above for session management practices). Again, the server should not trust anything from the client as far as identity, permissions, or roles, that it cannot explicitly validate.

Deny by Default

Your authorization mechanism should always deny actions by default unless they are explicitly allowed. Similarly, if you have some actions that require authorization and others that do not, it is much safer to deny by default and override any actions that don't require a permission. In both cases, providing a safe default limits the damage that can occur if you neglect to specify the permissions for a particular action.

Authorize Actions on Resources

Code should authorize against specific resources such as files, profiles, or REST endpoints. Generally speaking, you will encounter two different kinds of authorization requirements: global permissions and resource-level permissions. You can think of global permission as having an implicit system resource.

Use Policy to Authorize Behavior

Fundamentally, the entire process from identification through execution of an action could be summarized as follows:

An anonymous actor becomes a known principal through authentication
Policy determines whether an action can be taken by that principal against a resource.
Assuming the policy allows the action, the action is executed.

A policy contains the logic that answers the question of whether an action is or is not allowed, but the way it makes that assessments varies broadly based on the needs of the application. Below we'll look at two common approaches. These are not the only two approaches.

RBAC

Probably the most common variant of authorization is role-based access control (RBAC). As the name implies, users are assigned roles and roles are assigned permissions. Users inherit the permission for any roles they have been assigned. Actions are validated for permissions.

Consider RBAC when:

Permissions are relatively static
Roles in your policies actually map reasonably to roles within your domain, rather than feeling like contrived aggregations of permissions
There isn't a terribly large number of permutations of permission, and therefore roles that will have to be maintained
You have no compelling reason to use one of the other options.

ABAC

If your application has more advanced needs than you can reasonably implement with RBAC, you may want to look at attribute-based access control (ABAC). Attribute-based access control can be thought of as a generalization of RBAC that extends to any attribute of the user, the environment in which the user exists, or the resource being accessed. ABAC policies can be defined in code, markup, of a DSL.

Consider ABAC when:

Permissions are highly dynamic and simply changing user roles is going to be a significant maintenance headache
The profile attributes on which permissions depend are already maintained for other purposes, such as managing an employee's HR profile
Access control is sufficiently sensitive that control flows need to vary based on temporal attributes such as whether it's during the normal working hours of your employees
You wish to have centralized policy with very fine-grained permissions, managed independently of your application code.