
Sanitization

We learned about filter functions in the previous lesson. Filter functions are not limited to validating data; they can also sanitize it.

Validation vs. Sanitization

Sanitization and validation are two essential processes in handling data, especially user-generated input. They serve different purposes but are complementary in ensuring data integrity and security.

Sanitization is the process of cleaning and filtering input to ensure it's free of unwanted or potentially harmful characters. It involves removing or altering these characters to make the data safe to use. For example, if you're expecting a plain text string, sanitization might include stripping out HTML tags to prevent potential cross-site scripting (XSS) attacks.

Validation, on the other hand, is the process of checking whether the data meets specific criteria or rules. It doesn't alter the data but rather verifies that it's in the correct format or within acceptable limits. For example, validating an email address ensures that it conforms to the standard email format.

Differences

  • Alteration: Sanitization may change the data by removing or altering parts of it, whereas validation only checks the data without modifying it.
  • Purpose: Sanitization is focused on making data safe, while validation is focused on ensuring data is correct and conforms to expected formats or constraints.
  • Outcome: If sanitization finds unwanted characters, it removes or alters them; if validation finds non-conforming data, it typically returns an error or false.
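The differences above can be seen by running the same input through a validation filter and a sanitization filter. This is a minimal sketch; the input string and variable names are made up for illustration.

```php
<?php
$input = '42abc';

// Validation only checks the value: it returns false because "42abc" is not an integer.
$validated = filter_var($input, FILTER_VALIDATE_INT);

// Sanitization alters the value: everything except digits and plus/minus signs is removed.
$sanitized = filter_var($input, FILTER_SANITIZE_NUMBER_INT);

var_dump($validated); // bool(false)
var_dump($sanitized); // string(2) "42"
```

Note that the sanitized result is still a string. Sanitization cleans the value but does not convert it to an integer.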

Both sanitization and validation are vital for secure and reliable data handling. Used together, they control the quality and safety of the data an application processes.

Sanitization Flags

Sanitization flags can be passed to any of the filter functions, such as filter_var(). The following flags are available:

  • FILTER_SANITIZE_FULL_SPECIAL_CHARS: Equivalent to calling htmlspecialchars() with the ENT_QUOTES option enabled.
  • FILTER_SANITIZE_ENCODED: URL-encodes a string.
  • FILTER_SANITIZE_EMAIL: Removes all characters except letters, digits, and !#$%&'*+-/=?^_.|~@[].
  • FILTER_SANITIZE_NUMBER_INT: Removes all characters except digits and plus and minus signs.
  • FILTER_SANITIZE_NUMBER_FLOAT: Removes all characters except digits and plus and minus signs. A decimal separator is kept only when the FILTER_FLAG_ALLOW_FRACTION flag is passed.
  • FILTER_SANITIZE_STRING: Strips tags and optionally strips or encodes special characters. Deprecated as of PHP 8.1.
  • FILTER_SANITIZE_URL: Removes all characters except letters, digits, and $-_.+!*'(),{}\^~[]<>#%";:/?@&=.
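As a rough illustration of how a few of these flags behave, the sketch below passes sample strings through filter_var(). The input values and variable names are invented for the example.

```php
<?php
// Keep only digits and plus/minus signs.
$int = filter_var('abc-123xyz', FILTER_SANITIZE_NUMBER_INT);

// Without an extra flag, the decimal point is stripped as well.
$floatNoFlag = filter_var('$1,299.99', FILTER_SANITIZE_NUMBER_FLOAT);

// FILTER_FLAG_ALLOW_FRACTION keeps the decimal point.
$float = filter_var('$1,299.99', FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION);

// Characters that are not allowed in a URL, such as spaces, are removed.
$url = filter_var('https://example.com/pa th?q=1', FILTER_SANITIZE_URL);

// HTML special characters are encoded rather than removed.
$html = filter_var('<b>Hi</b>', FILTER_SANITIZE_FULL_SPECIAL_CHARS);

echo $int, "\n";         // -123
echo $floatNoFlag, "\n"; // 129999
echo $float, "\n";       // 1299.99
echo $url, "\n";         // https://example.com/path?q=1
echo $html, "\n";        // &lt;b&gt;Hi&lt;/b&gt;
```

The float example shows why the result of a sanitization filter should be checked: without FILTER_FLAG_ALLOW_FRACTION, a price of 1299.99 silently becomes 129999.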

Using a Sanitize Flag

Let's look at a simple example. To sanitize an email address, pass the FILTER_SANITIZE_EMAIL filter to the filter_var() function. This filter removes all characters that are not allowed in an email address.

Here's an example:

$email = "john.doe@example.<com>";
 
// Apply the FILTER_SANITIZE_EMAIL filter
$sanitizedEmail = filter_var($email, FILTER_SANITIZE_EMAIL);
 
echo $sanitizedEmail;
// Output will be "john.doe@example.com"

In this example, the FILTER_SANITIZE_EMAIL filter removes only the disallowed characters, < and >, and leaves every other character in place. Sanitization does not guarantee that the result is a well-formed email address, so user input should still be validated after it has been sanitized.
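Removing disallowed characters does not by itself prove that an address is well formed, so one common pattern is to sanitize first and validate second. The sketch below assumes a hypothetical helper named cleanEmail(); it is not part of PHP.

```php
<?php
// cleanEmail() is a hypothetical helper written for this example.
// It returns the cleaned address, or null if the result is still not a valid email.
function cleanEmail(string $raw): ?string
{
    // Step 1: strip characters that are not allowed in an email address.
    $sanitized = filter_var(trim($raw), FILTER_SANITIZE_EMAIL);

    // Step 2: check that what remains has a valid email format.
    $valid = filter_var($sanitized, FILTER_VALIDATE_EMAIL);

    return $valid === false ? null : $valid;
}

var_dump(cleanEmail('john.doe@example.<com>')); // string(20) "john.doe@example.com"
var_dump(cleanEmail('not an email'));           // NULL
```

Whether to repair input like this or reject it outright is a design choice. Silently fixing an address can turn a typo into a different, valid address, so some applications may prefer to show an error instead.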

Consider regular expressions

While PHP's filter functions offer a convenient way to sanitize values, they may not be suitable for all use cases or might not cover every specific requirement. Their behavior is predefined, and while they work well for general purposes, there may be situations where more nuanced control over the sanitization process is needed.

In such cases, regular expressions can be a more reliable and flexible tool. By using regular expressions, developers can define exactly what characters or patterns are allowed or disallowed, tailoring the sanitization process to the precise needs of the application.
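As a small sketch of this approach, the example below uses preg_replace() to keep only an explicit set of allowed characters. The sanitizeUsername() helper and the allowed character set are assumptions chosen for illustration.

```php
<?php
// sanitizeUsername() is a hypothetical helper; the allowed characters
// (ASCII letters, digits, underscore, hyphen) are an assumption for this example.
function sanitizeUsername(string $raw): string
{
    // Remove every character that is not in the allowed set.
    // preg_replace() returns null on error, so fall back to an empty string.
    return preg_replace('/[^A-Za-z0-9_-]/', '', $raw) ?? '';
}

echo sanitizeUsername('j<o>hn_doe!! 42'); // john_doe42
```

Listing the characters to keep, rather than the ones to remove, means anything unexpected is dropped by default.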

Key Takeaways

  • PHP's filter functions provide a built-in and convenient way to sanitize various types of data, such as URLs, email addresses, strings, and numbers.
  • While effective for many use cases, filter functions might not offer the level of customization or control needed for specialized requirements.
  • Depending on the context, filter functions may not always produce the expected results. Thorough testing is essential to ensure they meet specific needs.
  • For more complex or nuanced sanitization needs, developers might consider using regular expressions or other methods to achieve the desired level of control and reliability.
