University of Maryland Libraries

VICTOR/VICTORWeb normalization rules

What follows is a comparison of the effects of punctuation, special characters and diacritics on searching in VICTOR vs. VICTORWeb. CARL normalization rules are contrasted with actual normalization in both versions of USM's catalogs. The author wishes to hear of any necessary additions or corrections to these normalization descriptions.

Normalization of:

"A", "An", "The" as initial words in title browse

Stopwords--neither VICTOR nor VICTORWeb observes any stopwords in name searches, except that VICTOR does not permit Boolean operator words to be searched as elements of name headings via name keyword (//n).

Also, VICTOR keyword (//w) searches containing stopwords fail.


CARL normalization rules
as recorded in B500 appendix E (12/97)

In general, the normalization rules that apply to a text string when it is indexed are the same ones which apply to a text string when it is entered by a user in PAC. For instance, the rules for each type of keyword search (Word, Name, and Subject Word) are the same. However, the rules for browsable searches (Name Browse, Subject Browse, Title Browse, Series Browse, and Control Number Browse) vary for each specific search.

Word (//W), Name (//N), and Subject Word (//SW/)

  • Diacritics and other non-printable characters are translated to printed equivalents, if available; otherwise they are removed.

  • All letters are shifted to uppercase.

  • Asterisks are retained if they are embedded in a word; otherwise, they are removed.

    M*A*S*H --> M*A*S*H

  • Commas are removed if they are embedded in a number; otherwise, they are replaced with spaces.

    	1,001    -->     1001
    	A,B,C    -->     A B C
    

  • Periods are retained if they are embedded in a number; otherwise, they are turned into spaces.

        6.0      -->     6.0
        A.L.A    -->     A L A
    

  • Hyphens are replaced with spaces.

    	stress-induced  -->      STRESS INDUCED
    	on-line         -->      ON LINE
    

  • Exclamation points and slashes are replaced with spaces.

         snap!crackle!pop  -->  SNAP CRACKLE POP
         snap/crackle/pop  -->  SNAP CRACKLE POP
    

  • Any other punctuation is removed.

        its'     -->     ITS
        it's     -->     ITS
        C++      -->     C
        etc.
    

  • The following stopwords are not indexed in the Word or Subject Word indexes:

    	A              AN
    	AND            BY
    	EDITED         FOR
    	IN             OF
    	ON             THE
    	TO             WITH
    

    There are no stopwords in the Name index.
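
The keyword rules above can be sketched in Python as follows. This is an illustrative approximation, not CARL's implementation: the function names are hypothetical, and Unicode decomposition stands in for CARL's diacritic translation table, which is not reproduced here (characters with no decomposition are simply dropped).

```python
import re
import unicodedata

# Stopword list copied from the rules above (Word and Subject Word indexes only).
STOPWORDS = {"A", "AN", "AND", "BY", "EDITED", "FOR",
             "IN", "OF", "ON", "THE", "TO", "WITH"}


def fold_to_ascii(text):
    """Assumed stand-in for 'translate diacritics, otherwise remove'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ch.isascii())


def normalize_keyword(text):
    """Apply the Word / Name / Subject Word punctuation and case rules."""
    s = fold_to_ascii(text).upper()
    s = re.sub(r"(?<=\d),(?=\d)", "", s)                  # comma inside a number: removed
    s = s.replace(",", " ")                               # any other comma: space
    s = re.sub(r"(?<!\d)\.|\.(?!\d)", " ", s)             # period not inside a number: space
    s = re.sub(r"[-!/]", " ", s)                          # hyphen, exclamation point, slash: space
    s = re.sub(r"(?<![A-Z0-9])\*|\*(?![A-Z0-9])", "", s)  # asterisk not embedded in a word: removed
    s = re.sub(r"[^A-Z0-9*.\s]", "", s)                   # any other punctuation: removed
    return " ".join(s.split())


def word_index_terms(text):
    """Terms that would reach the Word or Subject Word index (stopwords dropped)."""
    return [w for w in normalize_keyword(text).split() if w not in STOPWORDS]
```

Stopword removal is kept separate from `normalize_keyword` because the Name index has no stopwords; only the Word and Subject Word indexes would use `word_index_terms`.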

Name Browse (//NB/)

  • Diacritics and other non-printable characters are translated to printed equivalents, if available; otherwise, they are removed.

  • The first character is changed to uppercase while the remaining characters are changed to lowercase.

  • The first comma is retained while the others are replaced with spaces.

    Hemingway, Ernest --> Hemingway, ernest

  • Hyphens are retained if followed by a number; otherwise, they are replaced with spaces. Spaces around hyphens are removed.

        1899-1961        -->     1899-1961
        1899-            -->     1899
        Masters-Johnson  -->     Masters johnson
        1899 - 1961      -->     1899-1961
    

  • Apostrophes are retained.

    O'Neill --> O'neill

  • Any other punctuation is replaced with spaces.

        U.S.             -->     U s
        Masters/Johnson  -->     Masters johnson
    

  • Leading and trailing spaces are removed. Multiple spaces are collapsed into one space.
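
A minimal Python sketch of the Name Browse rules above, under the same caveats: `normalize_name_browse` is a hypothetical name, and Unicode decomposition is only an assumed substitute for CARL's diacritic handling.

```python
import re
import unicodedata


def fold_to_ascii(text):
    """Assumed stand-in for 'translate diacritics, otherwise remove'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ch.isascii())


def normalize_name_browse(text):
    s = fold_to_ascii(text)
    s = re.sub(r"\s*-\s*", "-", s)            # spaces around hyphens are removed
    s = re.sub(r"-(?!\d)", " ", s)            # hyphen not followed by a number: space
    head, comma, tail = s.partition(",")      # split at the first comma, if any
    s = head + comma + tail.replace(",", " ") # later commas: space
    s = re.sub(r"[^A-Za-z0-9,' -]", " ", s)   # other punctuation: space (apostrophes kept)
    s = " ".join(s.split())                   # trim and collapse spaces
    return s[:1].upper() + s[1:].lower()      # first character upper, rest lower
```

For example, `normalize_name_browse("1899 - 1961")` yields `1899-1961`, matching the hyphen example above.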

Subject Browse (//SB/)

  • Diacritics and other non-printable characters are translated to printed equivalents, if available; otherwise, they are removed.

  • The first character is changed to uppercase while the remaining characters are changed to lowercase.

  • Punctuation is retained/blanked/removed as described in the section about Word (//W). Exception: for tags 600, 610, and 611, punctuation is treated as described in the section about Name Browse (//NB/).

  • Leading and trailing spaces are removed. Multiple spaces are collapsed into one space.

Title Browse (//T)

  • Characters represented by the nonfiling indicator for tags 130, 240, 245, 730, and 740 are removed from the beginning of the title.

    _4^aThe moon by night -->Moon by night

  • Diacritics and other non-printable characters are translated to printed equivalents, if available; otherwise, they are removed.

  • The first character is changed to uppercase while the remaining characters are changed to lowercase.

  • Asterisks are removed if they occur at the beginning of the title; otherwise, they are retained.

  • Commas are removed if they are embedded in a number; otherwise, they are replaced with spaces.

    	1,001 dalmatians  -->  1001 dalmatians
    	Snap, crackle, pop --> Snap crackle pop
    

  • Periods are retained if they are embedded in a number; otherwise, they are replaced with spaces.

    	DOS 6.0  -->           Dos 6.0
    	A.L.A. directory -->   A l a directory
    

  • Hyphens are retained if they are embedded in a number; otherwise, they are replaced with spaces.

    	1994-95 catalog  -->     1994-95 catalog
    	On-line searching  -->   On line searching
    

  • Apostrophes are retained if they are embedded in a word; otherwise, they are removed.

    	It's alive  -->            It's alive
    	Its' annual report  -->    Its annual report
    	'Tis magic  -->            Tis magic
    

  • Slashes (/), colons (:), quotes ("), semicolons (;), question marks (?), left and right brackets ([]), underscores (_), grave accents (`), left and right braces ({}), vertical lines (|), and tildes (~) are replaced with spaces. Other punctuation is retained.

  • Leading and trailing spaces are removed. Multiple spaces are collapsed into one space.
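
The Title Browse rules above can be sketched in Python as shown below. The function name and the `nonfiling` parameter (the count taken from the MARC nonfiling indicator) are illustrative assumptions. So is the reading of "embedded in a word" as "has a letter or digit on both sides", and Unicode decomposition again only approximates CARL's diacritic handling.

```python
import re
import unicodedata


def fold_to_ascii(text):
    """Assumed stand-in for 'translate diacritics, otherwise remove'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ch.isascii())


def normalize_title_browse(title, nonfiling=0):
    s = fold_to_ascii(title[nonfiling:])           # drop nonfiling characters first
    s = re.sub(r"^\s*\*+", "", s)                  # asterisks at the start: removed
    s = re.sub(r"(?<=\d),(?=\d)", "", s)           # comma inside a number: removed
    s = s.replace(",", " ")                        # any other comma: space
    s = re.sub(r"(?<!\d)\.|\.(?!\d)", " ", s)      # period not inside a number: space
    s = re.sub(r"(?<!\d)-|-(?!\d)", " ", s)        # hyphen not inside a number: space
    s = re.sub(r"(?<![A-Za-z0-9])'|'(?![A-Za-z0-9])", "", s)  # non-embedded apostrophe: removed
    s = re.sub(r'[/:";?\[\]_`{}|~]', " ", s)       # listed punctuation: space; the rest is kept
    s = " ".join(s.split())                        # trim and collapse spaces
    return s[:1].upper() + s[1:].lower()           # first character upper, rest lower
```

For the nonfiling example above, `normalize_title_browse("The moon by night", nonfiling=4)` gives `Moon by night`.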

Series Browse (//S)

  • Diacritics and other non-printable characters are translated to printed equivalents, if available; otherwise, they are removed.

  • The first character is changed to uppercase while the remaining characters are changed to lowercase.

  • Punctuation is removed if embedded in a word; otherwise, it is replaced with spaces.

  • Leading and trailing spaces are removed. Multiple spaces are collapsed into one space.

  • Spaces are inserted between enumeration data and the series title to right-justify the enumeration in the six-character space following the series title.

        Collected works     3
        Collected works    10
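
As a rough Python illustration of the Series Browse rules, the sketch below pads the enumeration into a six-character field appended to the normalized title. The function name and the separate `enumeration` argument are assumptions; the source does not say how the enumeration is separated from the title or what happens when it exceeds six characters.

```python
import re
import unicodedata


def fold_to_ascii(text):
    """Assumed stand-in for 'translate diacritics, otherwise remove'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ch.isascii())


def normalize_series_browse(title, enumeration=""):
    s = fold_to_ascii(title)
    # Punctuation with a letter or digit on both sides is treated as embedded: removed.
    s = re.sub(r"(?<=[A-Za-z0-9])[^A-Za-z0-9\s](?=[A-Za-z0-9])", "", s)
    s = re.sub(r"[^A-Za-z0-9\s]", " ", s)      # any other punctuation: space
    s = " ".join(s.split())                    # trim and collapse spaces
    s = s[:1].upper() + s[1:].lower()          # first character upper, rest lower
    if enumeration:
        # Right-justify in a six-character field; longer values are appended unpadded.
        s += enumeration.rjust(6)
    return s
```

Right-justifying the enumeration makes a plain character-by-character sort place volume 3 before volume 10, which is presumably the point of the padding.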
      

© 1999 University of Maryland Libraries
Last Revised: May 23, 1999