Lab 9 - Data Mining
Objectives
- SQL or Data Mining?
- Decision Tree (example)
- Python Practice: Reading from files
SQL or Data Mining?
- Given records of hospital treatments, we need to find out
how many of them took more than 2 days.
- Given records of patients' check-ups, we need to predict
the month (Jan, Feb, etc.) in which each patient will come in for a
check-up during the next 12 months.
- Assuming that our predictions from the previous question are correct, we
need to find the month in which the hospital will perform the most
check-ups.
- We have micro-array expression data of various genes. We
need to determine which genes lead to a certain genetic condition.
- We have micro-array expression data of various genes. We
need to find the number of genes expressed above a specified threshold
t across all our features.
- We want to discover relationships between products sold
by an e-store.
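As a point of reference for the list above, a question that only counts or filters records that are already stored can usually be answered with a plain query. The sketch below shows this for the first question using Python's built-in sqlite3 module. The table name, column names, and sample rows are made up for illustration and are not part of the lab data.

```python
import sqlite3

# Hypothetical table: one row per treatment, with its length in days.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE treatments (patient TEXT, days INTEGER)")
conn.executemany(
    "INSERT INTO treatments VALUES (?, ?)",
    [("p1", 1), ("p2", 3), ("p3", 5), ("p4", 2)],  # made-up sample rows
)

# Counting the treatments that took more than 2 days is a single aggregate.
(long_stays,) = conn.execute(
    "SELECT COUNT(*) FROM treatments WHERE days > 2"
).fetchone()
print(long_stays)  # 2
```

Questions that ask for a prediction or for a previously unknown pattern cannot be written as a query like this, which is a useful test to apply to the remaining items.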
Decision Tree
Here are our training data:
Here is the decision tree:
Here are some test records:
[22 , high , no , fair , yes]
[45 , high , no , excellent , yes]
[32 , low , yes , excellent , yes]
How would our decision tree classify these records?
Python Practice: Reading from files
Download this Excel
file. It contains information about the average number of children per
woman in many different countries for the years 1989 and 2009.
Before we use data in any data mining or visualization
procedure, we usually want to correct it, remove bad records, or even
transform it into something new. For example, the dataset you downloaded
has some missing values. One way to cope with them is the following: if
only one number is missing for a country (i.e. we have no statistic for
either 1989 or 2009), give it the value of the other year. If both
are missing, do not include that country in the final dataset.
- First, open this file in Excel and save it as .csv (comma-separated
values).
- Write a Python program that does the preprocessing described
above.
- Write the result to a new file.
- Upload this file to Many-Eyes and
see what visualizations you can create to depict this information.
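The preprocessing rule described above can be sketched as follows. This is one possible approach, not the only solution. It assumes Python 3, a CSV with a header row followed by exactly three columns per row (country, 1989 value, 2009 value), and that missing values appear as empty cells. The function names and the file names in the usage note are hypothetical.

```python
import csv


def fill_missing(rows):
    """Yield cleaned [country, v1989, v2009] rows following the lab's rule."""
    for row in rows:
        if len(row) != 3:
            continue  # assumption: skip blank or malformed lines
        country, v1989, v2009 = (field.strip() for field in row)
        if not v1989 and not v2009:
            continue  # both years missing: leave the country out
        if not v1989:
            v1989 = v2009  # only 1989 missing: copy the 2009 value
        elif not v2009:
            v2009 = v1989  # only 2009 missing: copy the 1989 value
        yield [country, v1989, v2009]


def preprocess(in_path, out_path):
    """Read the raw CSV, clean it, and write the result to a new file."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))  # copy the header row unchanged
        writer.writerows(fill_missing(reader))
```

A call such as preprocess("children.csv", "children_clean.csv") would then produce the file to upload. Keeping the rule in its own function makes it easy to check on a few hand-written rows before running it on the real file.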