# Tabular data

When we convert variables to categorical, NaNs are encoded as -1. The embedding we are going to use cannot look up an index of -1, so we add 1 to all the category codes, which means missing values map to 0.
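A small pandas sketch of the encoding described above. The column values are made up; `cat.codes` is the standard pandas accessor, and the `+1` shift plus the extra embedding row follow the reasoning in the text rather than any specific fastai internals.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column with one missing value
s = pd.Series(["a", "b", np.nan, "a"]).astype("category")

# pandas encodes NaN as category code -1
codes = s.cat.codes.to_numpy()
print(codes)  # [ 0  1 -1  0]

# Shift every code by +1 so all indices are valid embedding rows;
# row 0 now stands for "missing"
emb_idx = codes.astype(np.int64) + 1
print(emb_idx)  # [1 2 0 1]

# The embedding table therefore needs one extra row: n_categories + 1
n_rows = len(s.cat.categories) + 1
emb = np.random.default_rng(0).normal(size=(n_rows, 3))
vectors = emb[emb_idx]  # one 3-dim vector per row of the data
```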

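Fill missing (`FillMissing`) works like `proc_df`: it fills in the missing values and adds a column flagging which values were missing, because the fact that something is missing can itself help predict our outcome.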

According to fastai, you can replace the NaN values with almost any number: if it turns out that missingness is important, the model can use the interaction between the `_na` indicator column and the original column to make predictions.
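A minimal pandas sketch of the fill-missing idea, assuming median filling (what `proc_df` did). `fill_missing` and the `distance` column are hypothetical names for illustration, not the fastai implementation. The filled value itself is somewhat arbitrary; the `_na` column is what carries the missingness signal. In practice the median should come from the training set and be reused for the validation and test sets.

```python
import numpy as np
import pandas as pd

def fill_missing(df, cont_names):
    """Hypothetical helper illustrating the idea behind FillMissing / proc_df:
    add a boolean <col>_na column marking missing rows, then replace the
    NaNs with the column median."""
    df = df.copy()
    for name in cont_names:
        missing = df[name].isna()
        if missing.any():
            df[name + "_na"] = missing
            df[name] = df[name].fillna(df[name].median())
    return df

df = pd.DataFrame({"distance": [100.0, np.nan, 300.0, np.nan]})
out = fill_missing(df, ["distance"])
print(out)
```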

You need to tell it which variables are categorical and which are continuous, and also which processing steps you want to use: `FillMissing`, `Categorify`, `Normalize`.

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LuOQgUH2uL-hVoCeG4Q%2F-LuOSkYeCW00UqyGZDHS%2Fimage.png?alt=media\&token=b96ec4be-b469-4e38-a8f1-31787f1ee928)
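To make the cat/cont split and the ordered processing steps concrete, here is a pandas sketch. The three procs are hypothetical, simplified stand-ins for `FillMissing`, `Categorify` and `Normalize`, not fastai's code; one assumption shown here is that the `_na` indicator created by fill-missing is added to the categorical list, which is why fill-missing runs before categorify.

```python
import numpy as np
import pandas as pd

# Hypothetical, simplified stand-ins for FillMissing, Categorify, Normalize.
# Each proc takes the frame plus the cat/cont name lists and returns the frame.

def fill_missing(df, cat_names, cont_names):
    for name in cont_names:
        missing = df[name].isna()
        if missing.any():
            df[name + "_na"] = missing
            cat_names.append(name + "_na")  # the indicator is itself categorical
            df[name] = df[name].fillna(df[name].median())
    return df

def categorify(df, cat_names, cont_names):
    for name in cat_names:
        df[name] = df[name].astype("category")
    return df

def normalize(df, cat_names, cont_names):
    for name in cont_names:
        df[name] = (df[name] - df[name].mean()) / df[name].std()
    return df

def process(df, cat_names, cont_names, procs):
    # Work on copies so the caller's frame and name lists are untouched
    df, cat_names, cont_names = df.copy(), list(cat_names), list(cont_names)
    for proc in procs:
        df = proc(df, cat_names, cont_names)
    return df, cat_names, cont_names

df = pd.DataFrame({
    "store_type": ["a", "b", "a", None],
    "distance": [100.0, np.nan, 300.0, 200.0],
})
out, cat_names, cont_names = process(
    df, cat_names=["store_type"], cont_names=["distance"],
    procs=[fill_missing, categorify, normalize],
)
print(cat_names)   # ['store_type', 'distance_na']
print(cont_names)  # ['distance']
```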

For example, we add the day of the month as a categorical variable: if behavior differs on the 1st, 15th and 30th of the month, the model creates an embedding matrix for this variable, and each day of the month can learn its own behavior.
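A short pandas sketch of turning day of month into a categorical variable. The dates are made up; fixing the categories to 1..31 is a choice made here so every possible day gets its own embedding row even if it is absent from the sample.

```python
import pandas as pd

# Hypothetical dates; in practice this would be the dataset's date column
df = pd.DataFrame({"Date": pd.to_datetime(
    ["2015-07-01", "2015-07-15", "2015-07-30", "2015-08-01"])})

# Day of month as a categorical with all 31 possible levels, so each day
# gets its own row in the embedding matrix
df["Day"] = pd.Categorical(df["Date"].dt.day, categories=list(range(1, 32)))

print(df["Day"].cat.codes.tolist())  # [0, 14, 29, 0]
```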

Think carefully about which variables belong in the categorical list and which in the continuous list.

If the cardinality (number of distinct values) is not too high, it is usually better to treat a variable as categorical.

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LuOQgUH2uL-hVoCeG4Q%2F-LuOTVgYWdL4WUbH1p0I%2Fimage.png?alt=media\&token=d573b111-e3bf-4548-aa96-f48babd0d5c5)
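One way to apply the cardinality rule of thumb mechanically is sketched below. `split_cat_cont` is a hypothetical helper and the `max_card=50` threshold is an arbitrary assumption; the right cut-off depends on the dataset, and the split still deserves a manual review.

```python
import numpy as np
import pandas as pd

def split_cat_cont(df, max_card=50):
    """Hypothetical helper: numeric columns with more than `max_card`
    distinct values are treated as continuous; everything else
    (strings, low-cardinality numbers) is treated as categorical."""
    cat_names, cont_names = [], []
    for name in df.columns:
        if pd.api.types.is_numeric_dtype(df[name]) and df[name].nunique() > max_card:
            cont_names.append(name)
        else:
            cat_names.append(name)
    return cat_names, cont_names

df = pd.DataFrame({
    "DayOfWeek": np.arange(100) % 7 + 1,          # 7 levels
    "store_type": ["a", "b", "c", "d"] * 25,      # 4 levels
    "distance": np.linspace(0.0, 1000.0, 100),    # 100 distinct values
})
cat_names, cont_names = split_cat_cont(df)
print(cat_names)   # ['DayOfWeek', 'store_type']
print(cont_names)  # ['distance']
```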

We have to tell fastai that the labels are a list of floats (`FloatList`) rather than a list of categories. This turns the task into a regression problem.

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LuOQgUH2uL-hVoCeG4Q%2F-LuOUKCytCd1a2Vms4by%2Fimage.png?alt=media\&token=d70363ea-79a0-4599-88e3-c0eda4136fcf)

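Take the maximum price, multiply it by 1.2, and take the log: that is the upper end of our `y_range` (we use the log because the model predicts the log of the price). The factor of 1.2 makes the range a little larger than the maximum, because the final sigmoid only approaches its bounds and would otherwise never quite reach the maximum.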
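A NumPy sketch of the `y_range` computation and the scaled sigmoid that uses it. The prices are made-up sample data, and the lower bound of 0 is an assumption (it works here because all prices are above 1, so their logs are positive).

```python
import numpy as np

prices = np.array([1200.0, 5400.0, 800.0, 10000.0])  # hypothetical training targets

# Upper bound: log of 1.2 x the max, so the largest log-target sits inside the range
max_log_y = np.log(prices.max() * 1.2)
y_range = (0.0, max_log_y)  # lower bound 0 is an assumption (all prices > 1)

def sigmoid_range(x, low, high):
    """Squash raw activations into (low, high) with a scaled sigmoid."""
    return low + (high - low) / (1.0 + np.exp(-x))

raw = np.array([-50.0, 0.0, 50.0])  # very negative, neutral, very positive activations
preds = sigmoid_range(raw, *y_range)
```

Without the 1.2 factor, predicting the largest training price would require an infinitely large activation, since the sigmoid only tends towards its upper bound.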

### Architecture

For tabular data the architecture is simple: a fully connected network.
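A NumPy forward pass showing the general shape of such a model: one embedding lookup per categorical variable, concatenated with the continuous inputs, then fully connected layers and a scaled sigmoid output. All sizes are hypothetical and the weights are random; fastai's actual tabular model also uses batch norm and dropout, which are left out here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: two categorical variables and three continuous ones
cardinalities = [8, 32]  # levels per categorical, incl. the +1 "missing" row
emb_dims = [4, 10]       # embedding width per categorical
n_cont = 3
hidden = 16

emb_tables = [rng.normal(size=(c, d)) * 0.01 for c, d in zip(cardinalities, emb_dims)]
in_dim = sum(emb_dims) + n_cont
W1 = rng.normal(size=(in_dim, hidden)) * 0.1
b1 = np.zeros(hidden)
W2 = rng.normal(size=(hidden, 1)) * 0.1
b2 = np.zeros(1)

def forward(x_cat, x_cont, y_range=(0.0, 1.0)):
    # One embedding lookup per categorical column, concatenated with continuous inputs
    embs = [table[x_cat[:, i]] for i, table in enumerate(emb_tables)]
    x = np.concatenate(embs + [x_cont], axis=1)
    h = np.maximum(x @ W1 + b1, 0.0)  # fully connected layer + ReLU
    out = h @ W2 + b2                 # final linear layer -> one output
    low, high = y_range
    return low + (high - low) / (1.0 + np.exp(-out))  # scaled sigmoid

x_cat = np.array([[1, 5], [0, 31]])
x_cont = rng.normal(size=(2, n_cont))
preds = forward(x_cat, x_cont, y_range=(0.0, 10.0))
print(preds.shape)  # (2, 1)
```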
