html_escape_ascii

html_escape_ascii() is an efficient function for escaping 8-bit ASCII for use in XML or HTML.

It consumes input text in 4-byte chunks and uses a bitmask to copy safe chunks quickly, falling back to one byte at a time only for chunks containing characters that may require escaping.
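The chunked approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in html.c: the names word_is_safe, entity_for and escape_sketch are hypothetical, and so are the particular mask and the set of escaped characters. The mask relies on the fact that &, <, > and " all have byte values below 0x40, so a 4-byte word in which every byte is 0x40 or above cannot need escaping.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero when all four bytes are >= 0x40, i.e. each
 * byte has bit 6 or bit 7 set. Such bytes can never be & < > or ". */
static int word_is_safe(uint32_t w)
{
    /* w >> 1 moves bit 7 of each byte onto bit 6 of the same byte;
     * bits that cross a byte boundary land on bit 7 and are masked out. */
    return ((w | (w >> 1)) & 0x40404040u) == 0x40404040u;
}

/* Hypothetical helper: the entity for c, or NULL if c is copied as-is.
 * The set of escaped characters here is an assumption. */
static const char *entity_for(char c)
{
    switch (c) {
    case '&': return "&amp;";
    case '<': return "&lt;";
    case '>': return "&gt;";
    case '"': return "&quot;";
    default:  return NULL;
    }
}

/* Escapes len bytes of in into out and returns the escaped length.
 * out must have room for 6 * len + 1 bytes (worst case: all quotes). */
static size_t escape_sketch(const char *in, size_t len, char *out)
{
    size_t i = 0, o = 0;

    while (i < len) {
        size_t chunk = (len - i < 4) ? len - i : 4;

        if (chunk == 4) {
            uint32_t w;
            memcpy(&w, in + i, 4);          /* alignment-safe 4-byte load */
            if (word_is_safe(w)) {
                memcpy(out + o, in + i, 4); /* fast path: copy 4 bytes */
                i += 4;
                o += 4;
                continue;
            }
        }

        /* Slow path: examine this chunk one byte at a time. */
        for (size_t k = 0; k < chunk; k++, i++) {
            const char *rep = entity_for(in[i]);
            if (rep) {
                size_t n = strlen(rep);
                memcpy(out + o, rep, n);
                o += n;
            } else {
                out[o++] = in[i];
            }
        }
    }

    out[o] = '\0';
    return o;
}
```

This mask is deliberately conservative: spaces, digits and most punctuation are below 0x40, so words containing them take the slow path even when nothing needs escaping. A tighter test would raise the fast-path hit rate at the cost of more bit tricks.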

On a 2.2GHz K8 Opteron, built with gcc 4.2.1 at -O3, the test runs in 0.218 seconds, or about 240MB/s. Throughput varies with the input data, since the data determines how often the function can copy 4 bytes at a time.

Currently it probably only works on x86. Other architectures have stricter alignment requirements, so if you want to run it on SPARC, please send me patches to fix up the alignment.
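One common way to make a 4-byte read portable is to go through memcpy instead of dereferencing a cast pointer. The helper below is a hypothetical sketch of that idea, not a patch against html.c, and load_u32 is an invented name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 4 bytes starting at any address.
 * Dereferencing (const uint32_t *)p instead can fault on
 * strict-alignment machines when p is not 4-byte aligned. */
static uint32_t load_u32(const char *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w); /* valid for any alignment of p */
    return w;
}
```

Optimizing compilers typically reduce a fixed-size memcpy like this to a single load on architectures that allow unaligned access, so the portable form need not cost anything on x86.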

#include <stdio.h>
#include <stdlib.h> // free()
#include <string.h> // strlen()
#include "html.h"

int main(void) {
	char *data = "& some <text> which \"needs\" to be escaped.";

	// Allocate a buffer for escaping strings up to this length
	char *escaped = html_entities_ascii_buffer(strlen(data));

	html_entities_ascii(data, strlen(data), escaped);

	printf("%s\n", escaped); // => &amp; some &lt;text> which &quot;needs&quot; to be escaped.
	free(escaped);

	return 0;
}
			
Listing of /files/c/html/

Filename  Filesize  Last Modified
html.c    4.59k     2009-09-14 12:09
html.h    0.18k     2009-03-31 09:03