cudf 25.04.00 documentation


CharacterNormalizer

Constructor

CharacterNormalizer(do_lower, special_tokens)

A normalizer object used to normalize input text. The class is defined in the cudf.core.character_normalizer module.

CharacterNormalizer.normalize(text)
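This page lists only the signatures. To illustrate what character normalization for tokenizing commonly involves, here is a plain-Python CPU sketch. It assumes BERT-style pre-tokenization rules: optional lower-casing with accent stripping, whitespace folded to a single space character, control and format characters dropped, punctuation padded with spaces, and special tokens passed through unchanged. The class name ReferenceCharacterNormalizer is hypothetical. This is not the cuDF implementation, and its output is not guaranteed to match CharacterNormalizer.normalize. The parameter names mirror the documented constructor, but here special_tokens is a plain Python list of strings and normalize takes a list of strings.

```python
from __future__ import annotations

import unicodedata


class ReferenceCharacterNormalizer:
    """Hypothetical CPU model of a BERT-style character normalizer (not cuDF code)."""

    def __init__(self, do_lower: bool, special_tokens: list[str] | None = None) -> None:
        self.do_lower = do_lower
        # Try longer special tokens first so overlapping tokens match greedily.
        self.special_tokens = sorted(special_tokens or [], key=len, reverse=True)

    def _normalize_chunk(self, chunk: str) -> str:
        if self.do_lower:
            # Lower-case, decompose (NFD), then drop combining marks (accents).
            chunk = unicodedata.normalize("NFD", chunk.lower())
            chunk = "".join(c for c in chunk if unicodedata.category(c) != "Mn")
        out = []
        for c in chunk:
            cat = unicodedata.category(c)
            if c.isspace():
                out.append(" ")  # tabs, newlines, etc. become a single space character
            elif cat in ("Cc", "Cf"):
                continue  # drop control and format characters
            elif cat.startswith("P"):
                out.append(f" {c} ")  # pad punctuation with spaces
            else:
                out.append(c)
        return "".join(out)

    def normalize(self, texts: list[str]) -> list[str]:
        results = []
        for text in texts:
            pieces, buf, i = [], [], 0
            while i < len(text):
                # Check whether a special token starts at position i.
                token = next((t for t in self.special_tokens if text.startswith(t, i)), None)
                if token is not None:
                    pieces.append(self._normalize_chunk("".join(buf)))
                    pieces.append(token)  # special tokens are copied through as-is
                    buf = []
                    i += len(token)
                else:
                    buf.append(text[i])
                    i += 1
            pieces.append(self._normalize_chunk("".join(buf)))
            results.append("".join(pieces))
        return results
```

For example, with do_lower=True and special tokens ["[CLS]", "[SEP]"], the sketch turns "[CLS] Hi.[SEP]" into "[CLS] hi . [SEP]". On a GPU, the documented call pattern is to construct CharacterNormalizer(do_lower, special_tokens) once and then call its normalize(text) method; the exact argument types cuDF expects are not stated on this page.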


© Copyright 2018-2025, NVIDIA Corporation.

Created using Sphinx 8.2.3.

Built with the PyData Sphinx Theme 0.16.1.