Original Post: 21st April 2009
Updated: 16th March 2010
The following VBScript / VBA function removes special (illegal in file names) characters from a string and returns a clean string. Illegal/special characters are replaced by a space by default, but can be set to replace specific characters with another string/character. The function can be used to remove characters not supported in filenames and directory names of the file system, SharePoint lists, etc.
A string is passed to the function which may or may not contain illegal characters declared in an array at the beginning of the function. If special/illegal characters exist in the string , they will be replaced by a space or a specified string. Once all illegal characters have been removed the function returns the cleaned string.
Replace Characters using Regular Expressions:
The following method uses the Regular Expression (RegEx) Object to remove or replace characters from the input string. This method is the most efficient and also allows for greater control over the operation.
Specify specific characters to remove or replace:
Function strClean (strtoclean) Dim objRegExp, outputStr Set objRegExp = New Regexp objRegExp.IgnoreCase = True objRegExp.Global = True objRegExp.Pattern = "[(?*"",\\<>&#~%{}+_.@:\/!;]+" outputStr = objRegExp.Replace(strtoclean, "-") objRegExp.Pattern = "\-+" outputStr = objRegExp.Replace(outputStr, "-") strClean = outputStr End Function
Remove all characters except specific characters, or a range of characters by allowing specific ranges (eg. a-z, A-Z, 0-9)
Function strClean (strtoclean) Dim objRegExp, outputStr Set objRegExp = New Regexp objRegExp.IgnoreCase = True objRegExp.Global = True objRegExp.Pattern = "((?![a-zA-Z0-9]).)+" outputStr = objRegExp.Replace(strtoclean, "-") objRegExp.Pattern = "\-+" outputStr = objRegExp.Replace(outputStr, "-") strClean = outputStr End Function
Replace Characters using the Replace() method:
This method is efficient and easy to use. Each character in the array is removed or replaced using the Replace() method, until all characters from the array have been removed from the string.
Specify specific characters to remove or replace:
Function strClean (strtoclean) tempstr = strtoclean charArray = Array("?", "/", "\", ":", "*", """", "<", ">", ",", "&", "#", "~", "%", "{", "}", "+", "_", ".") For Each tmpChar in charArray Select Case tmpChar Case "&" changeTo = " and " Case ":" changeTo = "-" Case Else changeTo = " " End Select tempstr = replace( tempstr, tmpChar, changeTo ) Next strClean = tempstr End Function
Replace Characters without using the replace method:
This method is slower than using the Replace method, but allows contextual tests based on surrounding characters of the current position in the input string. It removes the characters supplied in the Array by iterating through the input string character by character. It then removes or replaces a character if required by concatenating the left and right part of the input string surrounding the character being removed or replaced. As this method parses the input string character by character, additional tests can be performed if required based on characters surrounding the current position in the input string.
Specify specific characters to remove or replace:
Function strClean(strToClean) Dim inStringArray() ReDim inStringArray(len(strToClean) - 1) For iterator = 1 to Len(strToClean) currentChar = Mid(strToClean, iterator, 1) Select Case currentChar Case "?","/","\",":","*","""","<",">","","#","~","%","{","}","+","_","." inStringArray(iterator - 1) = "-" Case "&" inStringArray(iterator - 1) = " and " Case Else inStringArray(iterator - 1) = currentChar End Select Next strClean = Join(inStringArray, "") End Function
Remove all characters except specific characters, or a range of characters using ASCII values (eg. allow a-z, A-Z, 0-9)
Function strClean(strToClean) Dim inStringArray() ReDim inStringArray(len(strToClean) - 1) For iterator = 1 to Len(strToClean) currentChar = Mid(strToClean, iterator, 1) currentCharASCII = Asc(currentChar) if currentCharASCII >= 48 AND currentCharASCII <= 57 Then '0-9 inStringArray(iterator - 1) = currentChar Elseif currentCharASCII >= 65 AND currentCharASCII <= 90 Then 'A-Z inStringArray(iterator - 1) = currentChar Elseif currentCharASCII >= 97 AND currentCharASCII <= 122 Then 'a-z inStringArray(iterator - 1) = currentChar Elseif currentCharASCII = 32 OR currentCharASCII = 10 Then 'space or new line inStringArray(iterator - 1) = currentChar End If Next strClean = Join(inStringArray, "") End Function
I should also note that example code is provided above to specify a new string value to replace specific characters when found in the string being "cleaned". For example, this would allow you to replace an '&' character with the word " and ", with a space either side of the character being replaced for instances where the "&" character joins two words with no spaces.
ReplyDeleteYou could then perform a separate string replace operation to replace any double spaces if present in the string (" ") with a single space (" "). The InStr(stringToClean , " ") method could also be used similar to the function provided in this post, but the function would need to be configured to to allow for two characters to be replaced instead of just one.
Daniel
In addition to the above note, it would obviously be possible to incorporate this into the String Clean function provided in this post, to cater for double spaces on-the-fly. In many cases this would not be required and as a result, would be using/consuming processing and memory resources unnecessarily.
ReplyDeletePlease let me know your thoughts on whether or not the the function should incorporate checking for double spaces after replacing an "&" character with " and ", or any other illegal characters padded with spaces. I am happy to publish an updated version of the string clean method if requested.
Hi Daniel,
ReplyDeletei'm trying to adapt your code to replace specific characters supplied by my customer. These characters include, characters with Grave, Accut, curcumflex, Stroke, Macron etc...
do you think you can assist?
Thanks Daniel. Needed something quick and easy to pass into a containstable query.
ReplyDelete