rm-ns
The namespace remover
Removing Namespaced Elements since 2010
Have you ever wanted to remove all nodes of a certain namespace from your XML documents?
For example, you receive XML from an AJAX request that is full of needless stuff. Or you want to easily remove the SOAP envelope. Or you get an “HTML” file that was exported from Microsoft Word, and you need to get rid of that silly <o:p/> stuff fast.
Then rm-ns is for you. It’s a combination detergent and Swiss Army Knife for exactly this purpose: removing stuff from unwanted namespaces. And the good news is: You have the choice of weapon. rm-ns is implemented in (quite exactly) the same way in three languages: XSLT, Python and Javascript.
Example
Oh, I love examples! Assuming that you use the XSLT version with Saxon, the code does the following:
Input document
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:x="http://example.com">
  <head>
    <title>My Beautiful Markup</title>
  </head>
  <body>
    <x:remove-me>
      <p>Help! I’m trapped in a hostile element!</p>
    </x:remove-me>
  </body>
</html>
Command line
$ saxon -s:input.xml -xsl:rm-ns.xsl -o:output.xml namespace='http://example.com'
The Output
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:x="http://example.com">
  <head>
    <title>My Beautiful Markup</title>
  </head>
  <body>
    <p>Help! I’m trapped in a hostile element!</p>
  </body>
</html>
If you had specified this instead:
$ saxon -s:input.xml -xsl:rm-ns.xsl -o:output.xml namespace='http://example.com' \
    ?copy_children=false()
the output would look like this:
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:x="http://example.com">
  <head>
    <title>My Beautiful Markup</title>
  </head>
  <body>
  </body>
</html>
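To make the difference between the two outputs concrete, here is a minimal Python sketch of the copy_children idea, built on the standard xml.dom.minidom module. It is not the project's own code: strip_namespace and demo are hypothetical names invented for this illustration, the input is a shortened version of the example document, and the sketch only looks at elements below the root.

```python
from xml.dom.minidom import parseString


def strip_namespace(node, namespace, copy_children=True):
    """Remove every element in `namespace` below `node` (hypothetical helper)."""
    # Iterate over a copy, because the child list is modified while we walk it.
    for child in list(node.childNodes):
        if child.nodeType != child.ELEMENT_NODE:
            continue
        # Handle descendants first, so nested matches are processed as well.
        strip_namespace(child, namespace, copy_children)
        if child.namespaceURI == namespace:
            if copy_children:
                # Move the children up, in front of the element being removed.
                for grandchild in list(child.childNodes):
                    node.insertBefore(grandchild, child)
            node.removeChild(child)


# Shortened version of the example input, without whitespace between tags.
SOURCE = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xmlns:x="http://example.com">'
    "<body><x:remove-me><p>Help!</p></x:remove-me></body></html>"
)


def demo(copy_children):
    """Parse SOURCE, strip the example namespace and return the serialized result."""
    doc = parseString(SOURCE)
    strip_namespace(doc.documentElement, "http://example.com", copy_children)
    return doc.documentElement.toxml()


print(demo(True))   # the <p> element survives, its wrapper is gone
print(demo(False))  # wrapper and <p> are both gone
```

Note that the xmlns:x declaration is still present in both results, which matches the first item under Known Issues.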
Download the Code
You can find the complete source at GitHub. Specific downloads are available, too:
- Download version 0.9 (zip) of rm-ns
- Clone the Git repository: git://github.com/Boldewyn/rm-ns.git
Known Issues
- At this time, the namespace axis is not yet respected, so namespace declarations themselves will most probably remain in the output.
- If you use namespaced values in attributes, for example XPath syntax, the attribute values are not changed automatically.
- The Javascript and Python versions do not yet support the removal of the root node. (That is, they will happily remove it, but fail if the resulting document is not well-formed.)
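Two of these issues can be checked for before or after a run. The following Python sketch, again based on xml.dom.minidom, tests whether the root element itself lives in the namespace and lists the namespace declarations that would be left behind. The helper names root_in_namespace and leftover_declarations are assumptions made for this example and are not part of rm-ns.

```python
from xml.dom import XMLNS_NAMESPACE
from xml.dom.minidom import parseString


def root_in_namespace(doc, namespace):
    """Return True if the document element itself is in `namespace`."""
    return doc.documentElement.namespaceURI == namespace


def leftover_declarations(element, namespace):
    """Collect the names of xmlns attributes that still declare `namespace`."""
    found = []
    attrs = element.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        # minidom exposes namespace declarations as attributes in the xmlns namespace.
        if attr.namespaceURI == XMLNS_NAMESPACE and attr.value == namespace:
            found.append(attr.name)
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            found.extend(leftover_declarations(child, namespace))
    return found


doc = parseString('<x:root xmlns:x="http://example.com"><child/></x:root>')
print(root_in_namespace(doc, "http://example.com"))
print(leftover_declarations(doc.documentElement, "http://example.com"))
```

If root_in_namespace returns True, one option is to wrap the document in a new root element before removing anything, which is what the README describes for the XSLT version.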
Copyright & License
Copyright © 2010 Manuel Strehl. All rights reserved.
The code is dual licensed under an MIT-style license and the GNU General Public License, version 2. You are free to choose either of the two for your project. In most cases, you can just use the files, as long as the copyright statement stays intact.
Other Projects of Mine
Syntax highlighters are quite common nowadays. But none is tailored to the specific needs of XML documents. That’s where view-source takes over. The XSLT stylesheet takes any XML document and renders a beautiful, informative and functional HTML version of its source.
Unicodeinfo is a set of tools to access the data of the Unicode database. At the moment the collection consists of a toolset to convert the Unicode Dataset to an SQLite database, and a Python module with various useful methods to accompany Python’s own unicodedata library.
The Official README
rm-ns remove namespace

These files provide possibilities to remove elements and attributes from an XML document, that are in a certain, defined namespace.

Consider you have an XHTML document with math formulas in the MathML namespace. For some reason, you want the formulas stripped out of the file. Nothing simpler than that! Use the tools provided by this project and enjoy.

The package contains three different implementations, that all work the same way. You can use any of the three XSLT, native Javascript or native Python implementation independently.

P a r a m e t e r s :
=====================

* namespace: The namespace whose elements and attributes are to be removed.
* copy_children: If true, the children of matching elements will be copied. Otherwise, they will be erased, too. Default is 'true'.
* remove_attributes: This parameter controls, if attributes are removed, too. Attributes are per definitionem in the empty namespace, as long as they are not explicitly, that is, with a prefix, bound to a namespace. That means, that if elements in the empty namespace should be removed, most of the attributes of other elements will vanish as well. This parameter allows to address this problem. Default is 'true'.

B e h a v i o u r :
===================

* If the root element is bound to the namespace, that should be removed, the result could be invalid XML. Therefore in the XSLT version in this case the whole document is embedded in a new root element <root /> in the empty namespace and a warning message is issued.

H o w   t o   D e p l o y :
===========================

XSLT version:

a) in Firefox: Add the following lines to your XML file:

    <?xslt-param name="namespace" value="urn:my-unwanted-namespace"?>
    <?xml-stylesheet type="text/xsl" href="rm-ns.xsl"?>

   (other browsers don't support <?xslt-param ?>, you have to touch rm-ns.xsl itself there.)

b) via a command line XSLT processor:

    $ saxon -s:source.xml -xsl:rm-ns.xsl -o:out.xml \
        namespace='urn:my-unwanted-namespace'
    $ xalan -IN source.xml -XSL rm-ns.xsl -OUT out.xml -PARAM namespace \
        'urn:my-unwanted-namespace'

c) inside PHP:

    <?php
    $xsl = new DOMDocument;
    $xsl->load('rm-ns.xsl');
    $proc = new XSLTProcessor;
    $proc->importStyleSheet($xsl);
    $xml = new DOMDocument;
    $xml->load('source.xml');
    $proc->setParameter('', 'namespace', 'urn:my-unwanted-namespace');
    $proc->transformToURI($xml, 'file:///tmp/output.xml');
    ?>

d) in Python with libxml2 and libxslt bindings:

    #! /usr/bin/env python
    import libxml2, libxslt
    styledoc = libxml2.parseFile("rm-ns.xsl")
    style = libxslt.parseStylesheetDoc(styledoc)
    doc = libxml2.parseFile('source.xml')
    result = style.applyStylesheet(doc, {"namespace": "'urn:my-unwanted-namespace'"})
    out = open('output.xml', 'w')
    out.write(result.serialize())
    style.freeStylesheet()
    doc.freeDoc()
    result.freeDoc()
    out.close()

JS version:

Add the following lines to your XML file:

    <script type="text/javascript" src="rm-ns.js"></script>
    <script type="text/javascript">remove_namespace("urn:my-unwanted-namespace");</script>

Presto! Namespace removed.

Native Python version:

    >>> from rm_ns import remove_namespace
    >>> from xml.dom.minidom import parse
    >>> source = parse('source.xml')
    >>> remove_namespace("urn:my-unwanted-namespace", source)
    >>> out = open('output.xml', 'w')
    >>> out.write(source.toxml())
    >>> out.close()

L i c e n s e :
===============

The tools are published under an MIT-style license and the GPL v2. Choose at your liking.
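The README's remark about remove_attributes rests on the rule that unprefixed attributes are in no namespace, even when their element has a default namespace. The short Python sketch below shows how that looks in xml.dom.minidom. The function name attribute_namespaces and the sample element are assumptions made for this illustration only.

```python
from xml.dom import XMLNS_NAMESPACE
from xml.dom.minidom import parseString


def attribute_namespaces(element):
    """Map each attribute name to its namespace URI, skipping xmlns declarations."""
    result = {}
    attrs = element.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.namespaceURI == XMLNS_NAMESPACE:
            continue  # namespace declarations are not ordinary attributes
        result[attr.name] = attr.namespaceURI
    return result


doc = parseString(
    '<p xmlns="http://www.w3.org/1999/xhtml" xmlns:x="http://example.com" '
    'class="note" x:flag="1"/>'
)
root = doc.documentElement
print(root.namespaceURI)           # the element picks up the default namespace
print(attribute_namespaces(root))  # "class" maps to None, "x:flag" to the x namespace
```

This is why removing the empty namespace with remove_attributes left at its default would also strip attributes such as class from elements in other namespaces.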
That’s All Folks!