VBScript String Clean Functions -3 Methods to Remove or Replace Characters in a String - Code-Tips.com - Web Development, Programming, SEO

Tuesday, April 21, 2009

VBScript String Clean Functions -3 Methods to Remove or Replace Characters in a String

VBScript String Clean Functions -3 Methods to Remove or Replace Characters in a String

Original Post: 21st April 2009
Updated: 16th March 2010

The following VBScript / VBA function removes special (illegal in file names) characters from a string and returns a clean string. Illegal/special characters are replaced by a space by default, but can be set to replace specific characters with another string/character. The function can be used to remove characters not supported in filenames and directory names of the file system, SharePoint lists, etc.

A string is passed to the function which may or may not contain illegal characters declared in an array at the beginning of the function. If special/illegal characters exist in the string , they will be replaced by a space or a specified string. Once all illegal characters have been removed the function returns the cleaned string.

Replace Characters using Regular Expressions:


The following method uses the Regular Expression (RegEx) Object to remove or replace characters from the input string. This method is the most efficient and also allows for greater control over the operation.

Specify specific characters to remove or replace:


Function strClean (strtoclean)
Dim objRegExp, outputStr
Set objRegExp = New Regexp

objRegExp.IgnoreCase = True
objRegExp.Global = True
objRegExp.Pattern = "[(?*"",\\<>&#~%{}+_.@:\/!;]+"
outputStr = objRegExp.Replace(strtoclean, "-")

objRegExp.Pattern = "\-+"
outputStr = objRegExp.Replace(outputStr, "-")

strClean = outputStr
End Function



 

Remove all characters except specific characters, or a range of characters by allowing specific ranges (eg. a-z, A-Z, 0-9)

Function strClean (strtoclean)
Dim objRegExp, outputStr
Set objRegExp = New Regexp

objRegExp.IgnoreCase = True
objRegExp.Global = True
objRegExp.Pattern = "((?![a-zA-Z0-9]).)+"
outputStr = objRegExp.Replace(strtoclean, "-")

objRegExp.Pattern = "\-+"
outputStr = objRegExp.Replace(outputStr, "-")

strClean = outputStr
End Function



 

Replace Characters using the Replace() method:
This method is efficient and easy to use. Each character in the array is removed or replaced using the Replace() method, until all characters from the array have been removed from the string.

 

Specify specific characters to remove or replace:

Function strClean (strtoclean)
tempstr = strtoclean
charArray = Array("?", "/", "\", ":", "*", """", "<", ">", ",", "&", "#", "~", "%", "{", "}", "+", "_", ".")

For Each tmpChar in charArray

Select Case tmpChar
Case "&"
changeTo = " and "
Case ":"
changeTo = "-"
Case Else
changeTo = " "
End Select

tempstr = replace( tempstr, tmpChar, changeTo )
Next

strClean = tempstr
End Function



 

Replace Characters without using the replace method:
This method is slower than using the Replace method, but allows contextual tests based on surrounding characters of the current position in the input string. It removes the characters supplied in the Array by iterating through the input string character by character. It then removes or replaces a character if required by concatenating the left and right part of the input string surrounding the character being removed or replaced. As this method parses the input string character by character, additional tests can be performed if required based on characters surrounding the current position in the input string.

 

Specify specific characters to remove or replace:

Function strClean(strToClean)
Dim inStringArray()
ReDim inStringArray(len(strToClean) - 1)

For iterator = 1 to Len(strToClean)
currentChar = Mid(strToClean, iterator, 1)

Select Case currentChar
Case "?","/","\",":","*","""","<",">","","#","~","%","{","}","+","_","."
inStringArray(iterator - 1) = "-"
Case "&"
inStringArray(iterator - 1) = " and "
Case Else
inStringArray(iterator - 1) = currentChar
End Select

Next

strClean = Join(inStringArray, "")
End Function



 

Remove all characters except specific characters, or a range of characters using ASCII values (eg. allow a-z, A-Z, 0-9)

Function strClean(strToClean)
Dim inStringArray()
ReDim inStringArray(len(strToClean) - 1)

For iterator = 1 to Len(strToClean)
currentChar = Mid(strToClean, iterator, 1)
currentCharASCII = Asc(currentChar)

if currentCharASCII >= 48 AND currentCharASCII <= 57 Then '0-9
inStringArray(iterator - 1) = currentChar
Elseif currentCharASCII >= 65 AND currentCharASCII <= 90 Then 'A-Z
inStringArray(iterator - 1) = currentChar
Elseif currentCharASCII >= 97 AND currentCharASCII <= 122 Then 'a-z
inStringArray(iterator - 1) = currentChar
Elseif currentCharASCII = 32 OR currentCharASCII = 10 Then 'space or new line
inStringArray(iterator - 1) = currentChar
End If

Next

strClean = Join(inStringArray, "")
End Function

4 comments:

  1. I should also note that example code is provided above to specify a new string value to replace specific characters when found in the string being "cleaned". For example, this would allow you to replace an '&' character with the word " and ", with a space either side of the character being replaced for instances where the "&" character joins two words with no spaces.

    You could then perform a separate string replace operation to replace any double spaces if present in the string (" ") with a single space (" "). The InStr(stringToClean , " ") method could also be used similar to the function provided in this post, but the function would need to be configured to to allow for two characters to be replaced instead of just one.

    Daniel

    ReplyDelete
  2. In addition to the above note, it would obviously be possible to incorporate this into the String Clean function provided in this post, to cater for double spaces on-the-fly. In many cases this would not be required and as a result, would be using/consuming processing and memory resources unnecessarily.

    Please let me know your thoughts on whether or not the the function should incorporate checking for double spaces after replacing an "&" character with " and ", or any other illegal characters padded with spaces. I am happy to publish an updated version of the string clean method if requested.

    ReplyDelete
  3. Hi Daniel,

    i'm trying to adapt your code to replace specific characters supplied by my customer. These characters include, characters with Grave, Accut, curcumflex, Stroke, Macron etc...

    do you think you can assist?

    ReplyDelete
  4. Thanks Daniel. Needed something quick and easy to pass into a containstable query.

    ReplyDelete