Hello everyone, today I will tell you how I implemented the safe_html transform using the lxml library for Python. I ported safe_html away from its CMFDefault dependencies to lxml, and set up my new transform to be installed in place of the old safe_html transform. So whenever our add-on is installed, it uninstalls safe_html and installs our new transform. You may have a lot of questions about lxml, why we use it and so on, so let's explore all these things.
What is lxml?
The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API.
Why do we need to port the transform to lxml?
Earlier, the safe_html transform depended on CMFDefault, and we are all working to make Plone free of CMFDefault dependencies. Because of those dependencies the transform was slow, and the safe_html code base was old and needed to be updated, or rather replaced. As we have seen, lxml is fast, so we chose it for our transform.
How to implement our transform using lxml?
So far so good: we have decided what to use to remove the CMFDefault dependencies. The main question now is how to implement the new transform with lxml so that it behaves the same as the old safe_html transform. For that I had to dig into the lxml library and find the modules that are useful for our transform. I found that we have to use the Cleaner class of the lxml package. This class has several methods, such as "__init__" and "__call__". So I subclassed Cleaner in my HTMLParser class and overrode the "__call__" method according to the requirements of our transform.
I also created a new function named "fragment_fromstring()", which parses the HTML string and returns its top-level elements as a list. Here is the snippet for the function:
# Appears to be adapted from lxml.html.fragment_fromstring, so _strings,
# fragments_fromstring and etree are the names used inside lxml.html.
def fragment_fromstring(html, create_parent=False, parser=None, base_url=None, **kw):
    if not isinstance(html, _strings):
        raise TypeError('string required')
    accept_leading_text = bool(create_parent)
    elements = fragments_fromstring(html, parser=parser,
                                    no_leading_text=not accept_leading_text,
                                    base_url=base_url, **kw)
    if not elements:
        raise etree.ParserError('No elements found')
    # Collect every top-level element instead of rejecting input
    # that contains more than one.
    temp2 = []
    for result in elements:
        if result.tail and result.tail.strip():
            raise etree.ParserError('Element followed by text: %r' % result.tail)
        result.tail = None
        temp2.append(result)
    return temp2
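To see why a list-returning variant is handy, here is a small sketch using only the stock lxml.html functions (not my modified one). It shows that the stock single-fragment parser refuses input with two top-level elements, while the plural fragments_fromstring returns them as a list. The sample HTML string is made up for illustration.

```python
from lxml import etree
from lxml.html import fragment_fromstring, fragments_fromstring

html = "<p>first</p><p>second</p>"  # made-up sample input

# The stock single-fragment parser rejects more than one top-level element.
try:
    fragment_fromstring(html)
    stock_error = None
except etree.ParserError as exc:
    stock_error = str(exc)

# The plural variant returns every top-level element as a list.
elements = fragments_fromstring(html, no_leading_text=True)
tags = [el.tag for el in elements]
texts = [el.text for el in elements]
```

Here stock_error ends up holding the parser's complaint, and tags and texts describe the two paragraphs.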
After that I created the main class for our transform, named SafeHTML, and in that class I defined the preconfigured state of the transform, i.e. its initial nasty tags and valid tags.
The transform takes its data as a stream and gives its output as a stream too. We created a data object implementing the IDataStream interface.
The convert function then takes the data as input and does the required operations: if the user supplies nasty tags and valid tags, it filters the input HTML accordingly; if the user doesn't supply them, it uses the transform's default configuration.
After writing the transform I tested it with a lot of HTML inputs and checked the outputs. They were all as required. There we go: the test cases were passing and the safe_html transform we created was working perfectly. The last thing left was to register our transform and remove the old safe_html transform of PortalTransforms.
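The stream-in, stream-out contract can be sketched roughly like this. It is only an illustration and not the real add-on code: DataStream is a tiny stand-in I wrote for an IDataStream-style object, the cleaner hook is hypothetical, and the inputs/output values are my assumption of what a safe_html-like transform declares.

```python
class DataStream:
    """Minimal stand-in for an IDataStream-style object (assumption)."""

    def __init__(self):
        self._data = ""

    def setData(self, value):
        self._data = value

    def getData(self):
        return self._data


class SafeHTML:
    """Sketch of a transform: HTML string in, data stream out."""

    inputs = ("text/html",)       # assumed input mimetype
    output = "text/x-html-safe"   # assumed output mimetype

    def __init__(self, cleaner=None):
        # `cleaner` is a hypothetical hook: any callable taking and
        # returning a string. The default passes markup through unchanged.
        self.cleaner = cleaner or (lambda markup: markup)

    def name(self):
        return "exp_safe_html"

    def convert(self, orig, data, **kwargs):
        # Store the cleaned markup on the stream and hand the stream back.
        data.setData(self.cleaner(orig))
        return data


# Toy cleaner for the demo only; it is not real sanitisation.
def strip_blink(markup):
    return markup.replace("<blink>", "").replace("</blink>", "")


transform = SafeHTML(cleaner=strip_blink)
result = transform.convert("<p><blink>hi</blink></p>", DataStream())
```

The caller reads the cleaned markup back with result.getData().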
Register the new transform and remove the old safe_html transform on add-on installation
As of now the transform is ready and we have to integrate it with Plone. For that we have to modify the setuphandlers.py file, since that file holds the configuration that runs after our add-on is installed. It has a function named "post_install", so we will configure our transform and remove the old safe_html transform there, in the post-installation step of our add-on.
There are 2 things that have to be done on add-on installation:
1) The old safe_html of PortalTransforms has to be uninstalled/unregistered.
2) The new transform that we created above, named "exp_safe_html", has to be installed.
To uninstall the old transform we unregister it by name using the TransformEngine of PortalTransforms. We get the tool with "getToolByName(context, 'portal_transforms')", which gives us access to all the transforms in portal_transforms, and we just unregister the transform named safe_html. To confirm that, we log a message saying "safe_html transform un registered".
After unregistering the old safe_html it's time to register our new exp_safe_html transform. For that we use pkgutil to get the module that contains our new transform, and register it through the same tool, getToolByName(context, 'portal_transforms'). So using the TransformEngine of PortalTransforms we are able to register the new transform for our add-on, and we log a message on successful registration.
Finally, when I ran the test cases after implementing these things, I saw the logger message "UnRegistering the Safe_html" followed by "Registering exp_safe_html".
Yayaya!! I was finally able to register my new transform and unregister the old one.
I tried to explain the code as much as possible, but most of the work was coding, so it is better to read the code itself; it is quite impossible to describe every minute detail here. Hope you will understand.
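The two installation steps could look roughly like the sketch below. It is not the actual setuphandlers.py: in Plone the tool would come from getToolByName(context, 'portal_transforms'), and the method names objectIds, unregisterTransform and manage_addTransform are what I understand the portal_transforms tool to offer, so please check them against your PortalTransforms version. The module path, the logger name and the FakeTransformTool class are hypothetical and exist only so the sketch runs outside Plone. The pkgutil lookup is left out.

```python
import logging

logger = logging.getLogger("exp_safe_html.setuphandlers")  # hypothetical logger name

# Hypothetical dotted path; the real module location is not given here.
NEW_TRANSFORM_MODULE = "my.addon.transforms.exp_safe_html"


def swap_safe_html(transform_tool):
    """Unregister the old safe_html and register exp_safe_html.

    `transform_tool` is expected to behave like the portal_transforms tool.
    """
    if "safe_html" in transform_tool.objectIds():
        transform_tool.unregisterTransform("safe_html")
        logger.info("UnRegistering the Safe_html")
    if "exp_safe_html" not in transform_tool.objectIds():
        transform_tool.manage_addTransform("exp_safe_html", NEW_TRANSFORM_MODULE)
        logger.info("Registering exp_safe_html")


class FakeTransformTool:
    """Stand-in for the portal_transforms tool, for demonstration only."""

    def __init__(self, ids):
        self._ids = list(ids)

    def objectIds(self):
        return list(self._ids)

    def unregisterTransform(self, name):
        self._ids.remove(name)

    def manage_addTransform(self, id, module):
        self._ids.append(id)


tool = FakeTransformTool(["safe_html", "html_to_text"])
swap_safe_html(tool)
registered = tool.objectIds()
```

Checking objectIds() before each step keeps the function safe to run more than once, for example when the add-on is reinstalled.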
Cheers!!