
Concept: Escaping

Web pages often display user-submitted data. For example, take the ice cream example we've been working with over the past few lessons.

$flavor = $_GET['flavor'] ?? 'vanilla';
echo "My favorite ice cream flavor is {$flavor}.";

Rendering raw user input can allow an attacker to inject a malicious script into our page. To avoid this issue, we can perform what's known as "escaping."

What does escaping mean?

Escaping data in PHP means converting certain characters into a format that can be safely rendered or interpreted by a system, such as a database or a web browser. This is often necessary when handling user input that might include characters that have special meaning in the context where the data will be used.

In a web context, user input might contain HTML or JavaScript code. Escaping this data ensures that it's treated as plain text and not executed as code in the user's browser, which protects against cross-site scripting (XSS) attacks. The same idea applies elsewhere: characters with special meanings in HTML, XML, or SQL must be escaped so that they are interpreted as literal text and not as code or control characters.

HTML Reserved Characters

In HTML, certain characters are reserved because they have special meanings within the language's syntax. These characters need to be escaped if you want to display them as literal characters on a web page. The main reserved characters in HTML include:

  • Less-Than Sign (<): Used to start HTML tags.
  • Greater-Than Sign (>): Used to end HTML tags.
  • Ampersand (&): Used to introduce HTML entities.
  • Double Quote ("): Used to delimit attribute values within HTML tags.
  • Single Quote ('): Also used to delimit attribute values within HTML tags.

HTML Entities

Let's say we attempted to visit the following URL:

http://www.example.com/page.php?flavor=<script>alert(1)</script>

Visiting this page causes a popup to appear. A popup is harmless on its own, but the same technique can run any script an attacker chooses. If you look closely at the URL, it contains the following: <script>alert(1)</script>

This portion of the URL is causing the issue. It's because we're rendering the content from the query parameter onto the page, which causes this portion of the URL to be interpreted as valid HTML.

For this reason, it's recommended to escape any data you plan on outputting to the page. We can do so by converting the HTML reserved characters into their HTML entity counterpart.

HTML entities are character sequences used in HTML to represent reserved characters, non-printable characters, and other characters that may need special representation. They begin with an ampersand (&) and end with a semicolon (;). They're useful for displaying characters that have special meanings in HTML, ensuring correct rendering, and representing characters that might be difficult or impossible to type directly.

Here's a table of HTML reserved characters and their entities.

Character    HTML Entity
<            &lt;
>            &gt;
&            &amp;
"            &quot;
'            &#39;

These entities allow you to include the above characters in your HTML documents without them being interpreted by the browser as control characters. They are essential for writing correct and secure HTML, especially when dealing with user-generated content that might include these characters.

If we were to escape the value:

<script>alert(1)</script>

It would produce the following result:

&lt;script&gt;alert(1)&lt;/script&gt;
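
To make the mapping concrete, here is a minimal hand-rolled sketch that applies the entities from the table with strtr(). The function name escape_html_manually is hypothetical and exists only for illustration; it is not how PHP's built-in escaping is implemented, and real code should prefer the built-in function.

```php
<?php
// Hypothetical helper (not a PHP built-in): escapes the five reserved
// characters using a fixed replacement map.
function escape_html_manually(string $text): string
{
    // strtr() with an array never re-scans text it has already replaced,
    // so the "&" inside a freshly produced entity is not escaped again.
    return strtr($text, [
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&#39;',
    ]);
}

echo escape_html_manually('<script>alert(1)</script>'), PHP_EOL;
// Prints: &lt;script&gt;alert(1)&lt;/script&gt;
```

A single-pass map is used here because chaining separate replacements in the wrong order would escape the ampersands of entities that were just created.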

We could use a regular expression to replace these values, but luckily, we don't have to.

Escaping HTML Reserved Characters

PHP provides the htmlspecialchars() function, which converts special characters to HTML entities. This escapes HTML reserved characters so that they are displayed as literal text on a webpage rather than being interpreted as HTML code.

Here's the function definition:

htmlspecialchars(
  string $string,
  int $flags = ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401,
  ?string $encoding = null,
  bool $double_encode = true
): string

Parameters

  • string $string: The input string to be converted. Special characters are converted to HTML entities.
  • int $flags (optional): A bitmask of one or more of the following flags combined using the bitwise OR (|) operator. These flags determine how to handle quotes and what document type to use.
  • ?string $encoding (optional): Defines the character encoding. If omitted or null, the default charset from the PHP configuration is used.
  • bool $double_encode (optional): If set to true (the default), the function encodes existing HTML entities in the string again. If set to false, it leaves them as-is.
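
The effect of the last parameter is easiest to see on input that already contains an entity. The following is a small sketch; the sample string is made up for illustration, and the flags and encoding are passed explicitly only so the fourth argument can be reached.

```php
<?php
// Sample input that already contains one entity (&amp;) plus raw "<" and ">".
$text = 'Fish &amp; chips <b>';

// Default ($double_encode = true): the "&" of the existing entity is escaped again.
$encodedTwice = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// $double_encode = false: existing entities are left untouched.
$encodedOnce = htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false);

echo $encodedTwice, PHP_EOL; // Fish &amp;amp; chips &lt;b&gt;
echo $encodedOnce, PHP_EOL;  // Fish &amp; chips &lt;b&gt;
```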

We can modify the behavior of the htmlspecialchars() function by configuring the second argument. Here's a list of valid flags.

Flag            Description
ENT_COMPAT      Converts only double quotes.
ENT_QUOTES      Converts both double and single quotes.
ENT_NOQUOTES    Leaves both double and single quotes unconverted.
ENT_HTML401     Handles the string as HTML 4.01.
ENT_XHTML       Handles the string as XHTML.
ENT_HTML5       Handles the string as HTML5.
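
As a quick comparison, the sketch below runs the same sample string (invented for this example) through several flag combinations. The encoding is passed explicitly as 'UTF-8' here as an assumption about the page's charset.

```php
<?php
// Sample input containing both quote styles, an ampersand, and a tag.
$text = 'She said "hi" & it\'s <b>';

$compat   = htmlspecialchars($text, ENT_COMPAT, 'UTF-8');
$quotes   = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
$noQuotes = htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8');
$html5    = htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $compat, PHP_EOL;   // She said &quot;hi&quot; &amp; it's &lt;b&gt;
echo $quotes, PHP_EOL;   // She said &quot;hi&quot; &amp; it&#039;s &lt;b&gt;
echo $noQuotes, PHP_EOL; // She said "hi" &amp; it's &lt;b&gt;
echo $html5, PHP_EOL;    // She said &quot;hi&quot; &amp; it&apos;s &lt;b&gt;
```

Note that the document-type flag changes how the single quote is written: &#039; for HTML 4.01 and &apos; for HTML5.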

Here's how we would update our example to use this function.

$flavor = htmlspecialchars($_GET['flavor'] ?? 'vanilla');
echo "My favorite ice cream flavor is {$flavor}.";

As you can see, we pass the query parameter (or the 'vanilla' default) through the htmlspecialchars() function before storing it in the $flavor variable. Now, if a user attempts to inject a script into the page, it'll appear as regular text instead of the browser attempting to execute the code.
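
To see what the page actually emits for the malicious URL from earlier, the sketch below simulates the request by assigning to $_GET directly. That assignment is for illustration only; on a real page PHP fills $_GET from the query string.

```php
<?php
// Simulated request (illustration only): equivalent to visiting
// page.php?flavor=<script>alert(1)</script>
$_GET['flavor'] = '<script>alert(1)</script>';

$flavor = htmlspecialchars($_GET['flavor'] ?? 'vanilla');
$html = "My favorite ice cream flavor is {$flavor}.";

echo $html, PHP_EOL;
// My favorite ice cream flavor is &lt;script&gt;alert(1)&lt;/script&gt;.
```

The browser renders those entities as the visible text <script>alert(1)</script>, so no script element is created and no popup appears.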

The htmlspecialchars() function is an essential tool in PHP for escaping HTML content, ensuring that special characters are rendered safely on the webpage without risk of unexpected code execution or other vulnerabilities.

Key Takeaways

  • Escaping content helps in mitigating major security risks like Cross-Site Scripting (XSS).
  • PHP provides built-in functions such as htmlspecialchars() for escaping output; the right function depends on the context in which the data is used.
  • Escaping should be used in conjunction with validation to ensure that user input meets specific criteria before it's processed.
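
The last takeaway can be sketched as follows. The allow-list, its flavors, and the render_flavor helper are all hypothetical names invented for this example; the point is only that validation and escaping are separate steps that complement each other.

```php
<?php
// Hypothetical allow-list; the flavors are made up for this example.
const ALLOWED_FLAVORS = ['vanilla', 'chocolate', 'strawberry'];

// Hypothetical helper: validates the input first, then escapes it for HTML output.
function render_flavor(?string $input): string
{
    $flavor = $input ?? 'vanilla';

    // Validation: fall back to the default when the value is not on the list.
    if (!in_array($flavor, ALLOWED_FLAVORS, true)) {
        $flavor = 'vanilla';
    }

    // Escaping: still applied at output time, so the page stays safe
    // even if the allow-list later gains a value with special characters.
    $safe = htmlspecialchars($flavor, ENT_QUOTES, 'UTF-8');

    return "My favorite ice cream flavor is {$safe}.";
}

echo render_flavor('chocolate'), PHP_EOL;
// My favorite ice cream flavor is chocolate.
echo render_flavor('<script>alert(1)</script>'), PHP_EOL;
// My favorite ice cream flavor is vanilla.
```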
