Original Post: 21st April 2009
Updated: 16th March 2010
The following VBScript / VBA function removes special (illegal in file names) characters from a string and returns a clean string. Illegal/special characters are replaced by a space by default, but can be set to replace specific characters with another string/character. The function can be used to remove characters not supported in filenames and directory names of the file system, SharePoint lists, etc.
A string is passed to the function which may or may not contain illegal characters declared in an array at the beginning of the function. If special/illegal characters exist in the string , they will be replaced by a space or a specified string. Once all illegal characters have been removed the function returns the cleaned string.
Replace Characters using Regular Expressions:
The following method uses the Regular Expression (RegEx) Object to remove or replace characters from the input string. This method is the most efficient and also allows for greater control over the operation.
Specify specific characters to remove or replace:
Function strClean (strtoclean)
Dim objRegExp, outputStr
Set objRegExp = New Regexp
objRegExp.IgnoreCase = True
objRegExp.Global = True
objRegExp.Pattern = "[(?*"",\\<>&#~%{}+_.@:\/!;]+"
outputStr = objRegExp.Replace(strtoclean, "-")
objRegExp.Pattern = "\-+"
outputStr = objRegExp.Replace(outputStr, "-")
strClean = outputStr
End Function
Remove all characters except specific characters, or a range of characters by allowing specific ranges (eg. a-z, A-Z, 0-9)
Function strClean (strtoclean) Dim objRegExp, outputStr Set objRegExp = New Regexp objRegExp.IgnoreCase = True objRegExp.Global = True objRegExp.Pattern = "((?![a-zA-Z0-9]).)+" outputStr = objRegExp.Replace(strtoclean, "-") objRegExp.Pattern = "\-+" outputStr = objRegExp.Replace(outputStr, "-") strClean = outputStr End Function
Replace Characters using the Replace() method:
This method is efficient and easy to use. Each character in the array is removed or replaced using the Replace() method, until all characters from the array have been removed from the string.
Specify specific characters to remove or replace:
Function strClean (strtoclean)
tempstr = strtoclean
charArray = Array("?", "/", "\", ":", "*", """", "<", ">", ",", "&", "#", "~", "%", "{", "}", "+", "_", ".")
For Each tmpChar in charArray
Select Case tmpChar
Case "&"
changeTo = " and "
Case ":"
changeTo = "-"
Case Else
changeTo = " "
End Select
tempstr = replace( tempstr, tmpChar, changeTo )
Next
strClean = tempstr
End Function
Replace Characters without using the replace method:
This method is slower than using the Replace method, but allows contextual tests based on surrounding characters of the current position in the input string. It removes the characters supplied in the Array by iterating through the input string character by character. It then removes or replaces a character if required by concatenating the left and right part of the input string surrounding the character being removed or replaced. As this method parses the input string character by character, additional tests can be performed if required based on characters surrounding the current position in the input string.
Specify specific characters to remove or replace:
Function strClean(strToClean)
Dim inStringArray()
ReDim inStringArray(len(strToClean) - 1)
For iterator = 1 to Len(strToClean)
currentChar = Mid(strToClean, iterator, 1)
Select Case currentChar
Case "?","/","\",":","*","""","<",">","","#","~","%","{","}","+","_","."
inStringArray(iterator - 1) = "-"
Case "&"
inStringArray(iterator - 1) = " and "
Case Else
inStringArray(iterator - 1) = currentChar
End Select
Next
strClean = Join(inStringArray, "")
End Function
Remove all characters except specific characters, or a range of characters using ASCII values (eg. allow a-z, A-Z, 0-9)
Function strClean(strToClean) Dim inStringArray() ReDim inStringArray(len(strToClean) - 1) For iterator = 1 to Len(strToClean) currentChar = Mid(strToClean, iterator, 1) currentCharASCII = Asc(currentChar) if currentCharASCII >= 48 AND currentCharASCII <= 57 Then '0-9 inStringArray(iterator - 1) = currentChar Elseif currentCharASCII >= 65 AND currentCharASCII <= 90 Then 'A-Z inStringArray(iterator - 1) = currentChar Elseif currentCharASCII >= 97 AND currentCharASCII <= 122 Then 'a-z inStringArray(iterator - 1) = currentChar Elseif currentCharASCII = 32 OR currentCharASCII = 10 Then 'space or new line inStringArray(iterator - 1) = currentChar End If Next strClean = Join(inStringArray, "") End Function
2 comments: