From the course: Introduction to Machine Learning with KNIME

Explore data: Scatterplot

- Okay, let's continue with the explore data task of the data understanding phase, and we'll start with the scatter plot node. I want to use the auto data for this one, so what I can do is, again, search for what I want. And you might think, why not just go through the folders? Well, you're going to see that you've got choices. Whenever it comes to data visualization in KNIME, you're going to want to favor the JavaScript options. That's really the direction they're going in for the most contemporary look in KNIME at the moment.

So, we'll drag that over. And why switch data sets? Well, I want to comment on that as well. When you do a scatter plot, you want to be looking at two scale variables, and this data set is going to have more scale variables to choose from. That's what's going to take some time when you do the explore data task of the data understanding phase: all the different pairs of variables are going to require different styles of visualization. So once more, level of measurement is always a critical thing.

So we'll configure this, and we've got to tell it which two variables we want. We can do miles per gallon as the Y, actually, and I can predict that with weight. Let's see how that looks. Execute and open views, and great, we've got a nice little scatter plot here. You're going to find that if you use the default scatter plot in KNIME, it's not as nice as this, so you get a nice, effective little scatter plot.

Okay, so something that we can do in KNIME, which is kind of fun, is that we can do more of these without leaving and coming back in. I can change the X to horsepower, for instance, and apply that. Notice that we now get a warning that there are a couple of missing data points for that variable. What we want to keep in mind, though, is that we want true scale variables. If we use what are really ordinal variables, the scatter plot is not going to be as effective, and we might want to consider a different chart type, so again, level of measurement is always, always important.

So, if we go to cylinders, for instance, this doesn't really work very well as a scatter plot, for a couple of reasons. We've got decimal places down at the bottom for the cylinders, and of course, you can't be in between cylinders, and we have dead space at seven, because there are no cars with seven cylinders. This just doesn't look right. So, you're always going to want to consider a different chart type when you have an ordinal and a scale. This cylinder experiment wasn't terribly successful. Be careful what variables you choose. You want those scale variables.
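The level-of-measurement check described here can also be sketched outside KNIME. The Python below is a rough illustration only, not part of the course or of any KNIME node. The function names, the `min_distinct` threshold of 10, and the sample values are all assumptions made up for the example. It drops rows with a missing value in either variable (the situation the horsepower warning points to), then treats a variable with only a handful of distinct values, such as cylinders, as a poor fit for a scatter plot.

```python
import math

def is_missing(value):
    """Treat None and NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def complete_pairs(xs, ys):
    """Keep only the (x, y) pairs where neither value is missing."""
    return [(x, y) for x, y in zip(xs, ys)
            if not is_missing(x) and not is_missing(y)]

def suggest_chart(xs, ys, min_distinct=10):
    """Suggest a chart type from the number of distinct values per variable.

    min_distinct is an assumed, arbitrary cutoff: a variable with fewer
    distinct values is treated as ordinal or categorical, not scale.
    """
    pairs = complete_pairs(xs, ys)
    distinct_x = len({x for x, _ in pairs})
    distinct_y = len({y for _, y in pairs})
    if distinct_x >= min_distinct and distinct_y >= min_distinct:
        return "scatter plot"
    return "box plot or bar chart"

# Hypothetical sample rows, invented for illustration only.
mpg = [18.0, 15.0, 16.0, 17.0, 24.0, 22.0, 21.0, 27.0, 26.0, 25.0, 14.0, 31.0]
weight = [3504, 3693, 3436, 3433, 2372, 2833, 2587, 2130, 1835, 2672, 4312, 1773]
horsepower = [130, 165, 150, None, 95, 97, 85, 88, None, 87, 215, 65]
cylinders = [8, 8, 8, 8, 4, 6, 6, 4, 4, 4, 8, 4]

for name, column in [("weight", weight), ("horsepower", horsepower),
                     ("cylinders", cylinders)]:
    print(name, "vs mpg:", suggest_chart(column, mpg))
```

A distinct-value count is only a crude stand-in for knowing what a variable actually measures, so the suggestion is a prompt to look more closely at the variable, not a rule.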
