NLP
Tokenization
Tokenization is the process of splitting a string into segments (tokens) and storing them in a list.
Tokens are commonly split on whitespace, hyphens, or punctuation.
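A minimal sketch of such a tokenizer using the standard library (the function name and regex are illustrative, not from any particular course code):

```python
import re

def tokenize(text):
    # Split on runs of whitespace/hyphens, or on any single
    # punctuation character; drop the empty segments that result
    return [tok for tok in re.split(r"[\s\-]+|[^\w\s]", text) if tok]

print(tokenize("state-of-the-art NLP, tokenized!"))
# → ['state', 'of', 'the', 'art', 'NLP', 'tokenized']
```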
Bag of words representation
Counting the number of times a particular token appears gives the bag-of-words representation: it records how many times each word could be pulled out of the bag.
This approach discards information about word order.
A better approach is to use n-grams, which keep short runs of adjacent words.
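The two ideas above can be sketched with the standard library alone (the `ngrams` helper is illustrative):

```python
from collections import Counter

def ngrams(tokens, n):
    # All contiguous sequences of n tokens, joined into strings
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()

# Bag of words: per-token counts, word order is discarded
bag = Counter(tokens)
print(bag["the"])         # → 2

# Bigrams (n = 2) recover some local word-order information
print(ngrams(tokens, 2))  # → ['the cat', 'cat sat', 'sat on', 'on the', 'the mat']
```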

In this exercise, you'll complete the function definition combine_text_columns(). When finished, this function will convert all training text data in your DataFrame to a single string per row, which can then be passed to the vectorizer object and made into a bag-of-words with the .fit_transform() method.
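A possible shape for combine_text_columns(), assuming pandas; the `to_drop` parameter and the sample DataFrame are assumptions of this sketch, not the exercise's exact signature:

```python
import pandas as pd

def combine_text_columns(data_frame, to_drop=()):
    """Join all text columns into one space-separated string per row."""
    # Drop non-text columns (column names in to_drop are an assumption here)
    text_data = data_frame.drop(columns=list(to_drop), errors="ignore")
    # Replace missing values so they don't break the join
    text_data = text_data.fillna("")
    # Concatenate each row's text fields into a single string
    return text_data.apply(lambda row: " ".join(row.astype(str)), axis=1)

df = pd.DataFrame({"title": ["Math class", None],
                   "desc": ["algebra", "reading"]})
print(combine_text_columns(df).tolist())
# → ['Math class algebra', ' reading']
```

The resulting Series of strings is what a vectorizer's .fit_transform() expects as input.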