
rm-ns

The namespace remover

Removing Namespaced Elements since 2010

Have you ever wanted to remove all nodes of a certain namespace from your XML documents? For example, you receive XML from an AJAX request that is full of needless stuff. Or you want to strip the SOAP envelope easily. Or you get an “HTML” file that was exported from Microsoft Word and you need to get rid of that silly <o:p/> stuff fast.

Then rm-ns is for you. It’s a combination detergent and Swiss Army Knife for exactly this purpose: removing stuff from unwanted namespaces. And the good news is: you get to choose your weapon. rm-ns is implemented in (almost exactly) the same way in three languages: XSLT, Python and JavaScript.

Example

Oh, I love examples! Assuming you use the XSLT version with Saxon, the code does the following:

Input document

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:x="http://example.com">
  <head>
    <title>My Beautiful Markup</title>
  </head>
  <body>
    <x:remove-me>
      <p>Help! I’m trapped in a hostile element!</p>
    </x:remove-me>
  </body>
</html>

Command line

$ saxon -s:input.xml -xsl:rm-ns.xsl -o:output.xml namespace='http://example.com'

The Output

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:x="http://example.com">
  <head>
    <title>My Beautiful Markup</title>
  </head>
  <body>
    
      <p>Help! I’m trapped in a hostile element!</p>
    
  </body>
</html>

If you had specified this instead:

$ saxon -s:input.xml -xsl:rm-ns.xsl -o:output.xml namespace='http://example.com' \
        '?copy_children=false()'

the output would look like this:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:x="http://example.com">
  <head>
    <title>My Beautiful Markup</title>
  </head>
  <body>
    

  </body>
</html>

Download the Code

You can find the complete source at GitHub. Specific downloads are available, too:

Known Issues

Copyright & License

Copyright © 2010 Manuel Strehl. All rights reserved.

The code is dual-licensed under an MIT-style license and the GNU General Public License, version 2. You are free to choose either of the two for your project. In most cases, you can just use the files, as long as the copyright statement stays intact.

Other Projects of Mine

Syntax highlighters are quite common nowadays. But none is tailored to the specific needs of XML documents. That’s where view-source takes over. The XSLT stylesheet takes any XML document and renders a beautiful, informative and functional HTML version of its source.

Unicodeinfo is a set of tools to access the data of the Unicode database. At the moment the collection consists of a toolset to convert the Unicode Dataset to an SQLite database, and a Python module with various useful methods to accompany Python’s own unicodedata library.

The Official README

                                     rm-ns

                                remove namespace

These files let you remove elements and attributes that are in a given
namespace from an XML document.

Suppose you have an XHTML document with math formulas in the MathML namespace.
For some reason, you want the formulas stripped out of the file. Nothing
simpler than that! Use the tools provided by this project and enjoy.

The package contains three different implementations that all work the same
way. You can use any of the three (XSLT, native JavaScript or native Python)
independently.

P a r a m e t e r s :
=====================

* namespace: The namespace whose elements and attributes are to be removed.

* copy_children: If true, the children of matching elements will be copied.
  Otherwise, they will be erased, too. Default is 'true'.

* remove_attributes: This parameter controls whether attributes are removed,
  too. Attributes are by definition in the empty namespace unless they are
  explicitly bound to a namespace, that is, carry a prefix. This means that if
  elements in the empty namespace are to be removed, most of the attributes of
  other elements will vanish as well. This parameter lets you avoid that
  problem. Default is 'true'.
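
The following is a minimal sketch of how the three parameters could interact,
written against Python's xml.dom.minidom. It is not the code shipped in this
package: the helper name strip_namespace and its exact semantics are
assumptions made for illustration, and the document element itself is left
alone. The last two calls show why remove_attributes matters when the empty
namespace is the target.

```python
from xml.dom.minidom import parseString


def strip_namespace(element, namespace, copy_children=True, remove_attributes=True):
    """Remove descendants (and optionally attributes) of `element` in `namespace`.

    Hypothetical helper for illustration; it is not the rm_ns API.
    Pass "" as `namespace` to target the empty namespace.
    """
    if remove_attributes:
        for attr in list(element.attributes.values()):
            # minidom reports "no namespace" as None, so normalise it to "".
            if (attr.namespaceURI or "") == namespace:
                element.removeAttributeNode(attr)

    for child in list(element.childNodes):
        if child.nodeType != child.ELEMENT_NODE:
            continue
        # Clean the subtree first, so hoisted grandchildren are already processed.
        strip_namespace(child, namespace, copy_children, remove_attributes)
        if (child.namespaceURI or "") == namespace:
            if copy_children:
                # Move the children up in front of the element being removed.
                for grandchild in list(child.childNodes):
                    element.insertBefore(grandchild, child)
            element.removeChild(child)


SOURCE = (
    '<body xmlns="http://www.w3.org/1999/xhtml" xmlns:x="http://example.com">'
    '<x:remove-me><p x:note="gone" class="kept">Help!</p></x:remove-me>'
    '</body>'
)

# copy_children=True: <x:remove-me> disappears, its <p> child survives.
kept = parseString(SOURCE)
strip_namespace(kept.documentElement, "http://example.com")

# copy_children=False: the whole subtree is erased.
erased = parseString(SOURCE)
strip_namespace(erased.documentElement, "http://example.com", copy_children=False)

# Targeting the empty namespace: the unprefixed attribute "bar" is in no
# namespace, so it is removed unless remove_attributes is switched off.
ATTRS = '<a xmlns:x="http://example.com" x:foo="1" bar="2"/>'
stripped = parseString(ATTRS)
strip_namespace(stripped.documentElement, "")
guarded = parseString(ATTRS)
strip_namespace(guarded.documentElement, "", remove_attributes=False)

print(kept.documentElement.toxml())
print(erased.documentElement.toxml())
print(stripped.documentElement.toxml())
print(guarded.documentElement.toxml())
```

Recursing before hoisting keeps the sketch short: by the time children are
moved up, they have already been cleaned, so no node is visited twice.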

B e h a v i o u r :
===================

* If the root element is bound to the namespace that should be removed, the
  result might not be well-formed XML (for example, several top-level elements
  or none at all). Therefore, in this case the XSLT version embeds the whole
  document in a new root element <root /> in the empty namespace and issues a
  warning message.
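
The wrapping step described above can be sketched in Python with
xml.dom.minidom. This is an illustration only, not the XSLT code from this
package: the helper name wrap_root_if_needed and the wording of the warning
are assumptions.

```python
import warnings
from xml.dom.minidom import getDOMImplementation, parseString


def wrap_root_if_needed(doc, namespace):
    """Return a document that is safe to strip `namespace` from.

    Hypothetical helper, not part of rm-ns: if the document element is itself
    in `namespace`, a copy of it is placed inside a new <root/> element that
    is in no namespace. Otherwise `doc` is returned unchanged.
    """
    old_root = doc.documentElement
    # minidom reports "no namespace" as None, so normalise it to "".
    if (old_root.namespaceURI or "") != namespace:
        return doc

    warnings.warn("root element is in the namespace to remove; wrapping it in <root/>")
    # Build a fresh document whose only element is <root/> in no namespace.
    wrapped = getDOMImplementation().createDocument(None, "root", None)
    # importNode makes a deep copy that belongs to the new document.
    wrapped.documentElement.appendChild(wrapped.importNode(old_root, True))
    return wrapped


doc = parseString('<x:a xmlns:x="http://example.com"><b/><c/></x:a>')
wrapped = wrap_root_if_needed(doc, "http://example.com")
print(wrapped.documentElement.toxml())
```

With the old root now an ordinary child of <root/>, removing it can leave
zero, one or many children behind and the document still has exactly one
top-level element.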

H o w   t o   D e p l o y :
===========================

XSLT version:
  a) in Firefox: Add the following lines to your XML file:

     <?xslt-param name="namespace" value="urn:my-unwanted-namespace"?>
     <?xml-stylesheet type="text/xsl" href="rm-ns.xsl"?>

     (Other browsers don't support <?xslt-param ?>; there you have to edit
     rm-ns.xsl itself.)

  b) via a command line XSLT processor:

     $ saxon -s:source.xml -xsl:rm-ns.xsl -o:out.xml \
       namespace='urn:my-unwanted-namespace'

     $ xalan -IN source.xml -XSL rm-ns.xsl -OUT out.xml -PARAM namespace \
       'urn:my-unwanted-namespace'

  c) inside PHP:

     <?php
     $xsl = new DOMDocument;
     $xsl->load('rm-ns.xsl');
     $proc = new XSLTProcessor;
     $proc->importStyleSheet($xsl);

     $xml = new DOMDocument;
     $xml->load('source.xml');
     $proc->setParameter('', 'namespace', 'urn:my-unwanted-namespace');
     $proc->transformToURI($xml, 'file:///tmp/output.xml');
     ?>

  d) in Python with libxml2 and libxslt bindings:

     #! /usr/bin/env python

     import libxml2, libxslt

     styledoc = libxml2.parseFile("rm-ns.xsl")
     style = libxslt.parseStylesheetDoc(styledoc)
     doc = libxml2.parseFile('source.xml')
     result = style.applyStylesheet(doc, {"namespace":
                                          "'urn:my-unwanted-namespace'"})

     out = open('output.xml', 'w')
     out.write(result.serialize())

     style.freeStylesheet()
     doc.freeDoc()
     result.freeDoc()
     out.close()


JS version:
  Add the following lines to your XML file:

  <script type="text/javascript" src="rm-ns.js"></script>
  <script type="text/javascript">remove_namespace("urn:my-unwanted-namespace");</script>

  Presto! Namespace removed.


Native Python version:

  >>> from rm_ns import remove_namespace
  >>> from xml.dom.minidom import parse
  >>> source = parse('source.xml')
  >>> remove_namespace("urn:my-unwanted-namespace", source)
  >>> out = open('output.xml', 'w')
  >>> out.write(source.toxml())
  >>> out.close()


L i c e n s e :
===============

The tools are published under an MIT-style license and the GPL v2. Choose
whichever you like.
        

That’s All Folks!