How to Remove Script Tags from an HTML String in JavaScript

Are you battling pesky script tags in your HTML strings? Removing them matters for both the security and the correct rendering of user-generated content. In this article, we’ll explore five ways to remove script tags from HTML strings in JavaScript, ranging from a simple regex to DOM-based approaches and dedicated sanitizers. Whether you’re building a content management system or a comment section, or just need to sanitize user input, these techniques will help you produce safer, cleaner HTML output.

Understanding how to remove script tags from HTML strings is essential for:

  • Preventing cross-site scripting (XSS) attacks
  • Ensuring proper rendering of user-generated content
  • Maintaining the integrity and security of your web applications

Let’s dive into the methods, each offering a unique approach to script tag removal.

Method 1: Using Regular Expressions

The simplest approach uses a regular expression to match and remove script tags.

function removeScriptTags(html) {
    return html.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, '');
}

// Usage
const dirtyHtml = '<p>Hello</p><script>alert("XSS")</script><p>World</p>';
const cleanHtml = removeScriptTags(dirtyHtml);
console.log(cleanHtml); // Output: <p>Hello</p><p>World</p>

Pros:

  • Simple and easy to implement
  • Works for basic script tag removal

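Cons:

  • Misses valid variations, such as a closing tag written as </script > with whitespace before the >
  • Can be bypassed by nested payloads, and does nothing about other XSS vectors such as inline event handlers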
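
To make these limitations concrete, here is a small sketch that runs the same regex against three inputs. The function name removeScriptTagsOnce and the sample strings are made up for illustration; the regex itself is the one from Method 1.

```javascript
// Same regex as in Method 1, copied here so the snippet runs on its own.
function removeScriptTagsOnce(html) {
    return html.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, '');
}

// 1. Browsers accept whitespace before the closing ">", but the regex
//    only looks for the exact text "</script>".
const spacedClose = '<script>alert(1)</script >';
console.log(removeScriptTagsOnce(spacedClose) === spacedClose); // true: nothing was removed

// 2. Removing the inner tags in a single pass splices a new script tag together.
const nested = '<scr<script></script>ipt>alert(1)</scr<script></script>ipt>';
console.log(removeScriptTagsOnce(nested)); // <script>alert(1)</script>

// 3. Inline event handlers contain no script tag at all.
const handler = '<img src="x" onerror="alert(1)">';
console.log(removeScriptTagsOnce(handler) === handler); // true: nothing was removed
```

In all three cases the returned string can still run code once it is inserted into a page, which is why the regex is best kept for input you already control.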

Method 2: Using DOMParser

A more robust approach involves using the DOMParser to parse the HTML and then remove script nodes.

function removeScriptTags(html) {
    const doc = new DOMParser().parseFromString(html, 'text/html');
    const scripts = doc.getElementsByTagName('script');
    for (let i = scripts.length - 1; i >= 0; i--) {
        scripts[i].remove();
    }
    return doc.body.innerHTML;
}

// Usage
const dirtyHtml = '<p>Hello</p><script>alert("XSS")</script><p>World</p>';
const cleanHtml = removeScriptTags(dirtyHtml);
console.log(cleanHtml); // Output: <p>Hello</p><p>World</p>

Pros:

  • More reliable than regex for complex HTML structures
  • Handles nested and malformed script tags better

Cons:

  • Slightly more complex implementation
  • May be slower for very large HTML strings
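
Removing script elements does not touch inline event handlers such as onerror, which can also run code. As a rough sketch of how the DOMParser approach could be extended, the hypothetical function below (removeScriptsAndHandlers is not part of any library) also drops every attribute whose name starts with "on". It is still not a complete sanitizer: javascript: URLs, for example, are left alone.

```javascript
// Hypothetical helper: removes <script> elements and on* attributes.
function removeScriptsAndHandlers(html) {
    const doc = new DOMParser().parseFromString(html, 'text/html');

    // Remove every script element in the parsed document.
    doc.querySelectorAll('script').forEach(el => el.remove());

    // Remove inline event handlers (onclick, onerror, ...) from the rest.
    doc.querySelectorAll('*').forEach(el => {
        for (const attr of Array.from(el.attributes)) {
            if (attr.name.toLowerCase().startsWith('on')) {
                el.removeAttribute(attr.name);
            }
        }
    });

    return doc.body.innerHTML;
}

// Usage (needs a DOM, so this part only runs in a browser)
if (typeof DOMParser !== 'undefined') {
    const dirty = '<p>Hello</p><img src="x" onerror="alert(1)"><script>alert(2)</script>';
    console.log(removeScriptsAndHandlers(dirty)); // <p>Hello</p><img src="x">
}
```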

Method 3: Using createTreeWalker

For finer control over which nodes are visited and removed, we can use createTreeWalker to traverse the parsed DOM and collect script nodes.

function removeScriptTags(html) {
    const doc = new DOMParser().parseFromString(html, 'text/html');
    const walker = doc.createTreeWalker(doc.body, NodeFilter.SHOW_ELEMENT);

    const nodesToRemove = [];
    while (walker.nextNode()) {
        if (walker.currentNode.localName === 'script') { // localName also matches <script> inside <svg>
            nodesToRemove.push(walker.currentNode);
        }
    }

    nodesToRemove.forEach(node => node.remove());
    return doc.body.innerHTML;
}

// Usage
const dirtyHtml = '<p>Hello</p><script>alert("XSS")</script><p>World</p>';
const cleanHtml = removeScriptTags(dirtyHtml);
console.log(cleanHtml); // Output: <p>Hello</p><p>World</p>

Pros:

  • Very thorough, catches deeply nested script tags
  • Allows for more complex filtering logic

Cons:

  • More verbose than other methods
  • Might be overkill for simple HTML strings
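
As an illustration of the "more complex filtering logic" mentioned above, the sketch below passes a filter to createTreeWalker so the walker only stops on elements from a block list. The BLOCKED_TAGS set and the function name removeBlockedElements are assumptions chosen for this example, not a recommended policy.

```javascript
// Hypothetical block list for illustration; adjust it to your own policy.
const BLOCKED_TAGS = new Set(['script', 'iframe', 'object', 'embed']);

function removeBlockedElements(html) {
    const doc = new DOMParser().parseFromString(html, 'text/html');

    // The filter accepts blocked elements and skips everything else,
    // while still descending into the children of skipped nodes.
    const walker = doc.createTreeWalker(doc.body, NodeFilter.SHOW_ELEMENT, {
        acceptNode(node) {
            return BLOCKED_TAGS.has(node.localName)
                ? NodeFilter.FILTER_ACCEPT
                : NodeFilter.FILTER_SKIP;
        }
    });

    // Collect first, then remove, so the walk is not disturbed.
    const nodesToRemove = [];
    while (walker.nextNode()) {
        nodesToRemove.push(walker.currentNode);
    }
    nodesToRemove.forEach(node => node.remove());

    return doc.body.innerHTML;
}

// Usage (needs a DOM, so this part only runs in a browser)
if (typeof DOMParser !== 'undefined') {
    const dirty = '<p>Hello</p><iframe src="https://example.com"></iframe><script>alert(1)</script><p>World</p>';
    console.log(removeBlockedElements(dirty)); // <p>Hello</p><p>World</p>
}
```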

Method 4: Using a Sanitization Library (DOMPurify)

For production-grade sanitization, consider using a well-maintained library like DOMPurify.

// Make sure to include DOMPurify in your project (the version below is only an example; prefer the current release)
// <script src="https://cdnjs.cloudflare.com/ajax/libs/dompurify/2.3.3/purify.min.js"></script>

function removeScriptTags(html) {
    return DOMPurify.sanitize(html);
}

// Usage
const dirtyHtml = '<p>Hello</p><script>alert("XSS")</script><p>World</p>';
const cleanHtml = removeScriptTags(dirtyHtml);
console.log(cleanHtml); // Output: <p>Hello</p><p>World</p>

Pros:

  • Highly secure and well-maintained
  • Handles a wide range of XSS attack vectors

Cons:

  • Adds an external dependency to your project
  • May remove more than just script tags, depending on configuration

Method 5: Using the Sanitizer API (Experimental)

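For browsers that support it, the experimental Sanitizer API provides a native solution. The API has changed between drafts: the sanitizer.sanitize() method from early drafts was dropped, and current drafts expose sanitization through methods such as Element.setHTML(), so always feature-detect before relying on it.

function removeScriptTags(html) {
    const container = document.createElement('div');
    if (typeof container.setHTML === 'function') {
        // setHTML() parses the string and sanitizes it with the default
        // configuration, which drops script elements and other unsafe markup.
        container.setHTML(html);
        return container.innerHTML;
    }
    // Fallback to another method, here the regex from Method 1
    return html.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, '');
}

// Usage
const dirtyHtml = '<p>Hello</p><script>alert("XSS")</script><p>World</p>';
const cleanHtml = removeScriptTags(dirtyHtml);
console.log(cleanHtml); // Output: <p>Hello</p><p>World</p>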

Pros:

  • Native browser API, potentially more performant
  • Standardized approach to sanitization

Cons:

  • Limited browser support as of now
  • Requires fallback for unsupported browsers

Which Method Should You Use?

The choice depends on your specific needs:

  1. Use regex for quick and simple script tag removal in controlled environments.
  2. Opt for DOMParser or createTreeWalker for more reliable removal in complex HTML.
  3. Consider DOMPurify for production-grade sanitization.
  4. Explore the Sanitizer API for future-proof, native browser support.

For most scenarios, DOMParser (Method 2) with a regex fallback (Method 1) for environments that have no DOM provides a good balance of reliability and simplicity.
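
The combination described above could be sketched as follows. The three function names are hypothetical, and the feature check simply assumes that a global DOMParser means a browser-like environment.

```javascript
// Regex-based removal (Method 1), used only when no DOM parser is available.
function removeScriptTagsWithRegex(html) {
    return html.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, '');
}

// DOMParser-based removal (Method 2).
function removeScriptTagsWithDom(html) {
    const doc = new DOMParser().parseFromString(html, 'text/html');
    doc.querySelectorAll('script').forEach(script => script.remove());
    return doc.body.innerHTML;
}

// Prefer the DOM-based method and fall back to the regex elsewhere.
function removeScriptTagsWithFallback(html) {
    return typeof DOMParser !== 'undefined'
        ? removeScriptTagsWithDom(html)
        : removeScriptTagsWithRegex(html);
}

// Usage
const sample = '<p>Hello</p><script>alert("XSS")</script><p>World</p>';
console.log(removeScriptTagsWithFallback(sample)); // <p>Hello</p><p>World</p>
```

Keeping the two implementations as separate functions makes it easy to swap the fallback for a library such as DOMPurify later without touching the call sites.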
