Wednesday, March 21, 2012

Excel 2007 Data Mining Add-in Advance Create Mining Model Question

Hi,

I am trying to model data in analysis services with the Advance Create Mining Model function in the excel addin. I am having trouble creating an association model that works like the Associate button above the Advanced button.

The format of my data is like this

OrderID Product

100 Bike

100 Helmet

100 Shoes

200 Helmet

200 basketball

200 Bat

300 Shoes

300 Socks

The associate button works perfectly since it asks me which column is the transaction id (orderid) and which column I am trying to predict (product). The advanced create mining model asks me to determine what the columns are...

OrderID=key Product=Input+Predict?

When I run the advance create mining model associate, I get a browser that gives me no rules and the support for only one item itemset (each product but no combination of products).

Does anyone know what I have to do to get it to work like the associate button?

The Associate task performs a trick in defining the mining model. For data like yours, the model is defined as:

(

OrderID TEXT KEY,

Product_Table TABLE PREDICT

(

Product TEXT KEY

)

)

(it contains a nested Product_Table column, which contains all the products for one given order ID).

After that, in training, the Excel table is joined with itself to generate the shape required for model, which is :

OrderID Products (table)

100 Bike

Helmet

Shoes

The Advanced Create Mining Model functionality does not allow modeling of nested tables. Consequently, what you get by running the Advanced task is a model like:

(

OrderID TEXT KEY,

Product TEXT DISCRETE PREDICT

)

which cannot find any interesting rules.

Here is an easy workaround for this problem:

- start by using the Associate task to create a model with the appropriate modeling (nested table)

- use Advanced\Add Model To Structure to add a new model in the same mining structure (this allows you to customize your model, set different parameters etc.) on top of the same data as the first model.

- using the Manage Models task, delete the original model and use the newly created one

Alternatively, you could use DMX statements to define and train the model. The DMX statements would have to be entered manually (choose Query\Advanced then Edit Query to get to manual editing of the DMX statement). Please let me know if you need details for this

|||

Hello Bogdan,

I am still having difficulty connecting to analysis services data sourse. I have tried the easy workaround and I have done all of the steps above. I am a little confused as to what i should do after i have deleted the old model.

I went to manage models->Process this mining model with new data->Selected the data i wanted from analysis services

The dialog box prompted me to specify mapping between structure columns and input columns.

Mining Columns Table Columns

OrderID OrderID

Product_Table Choices: (OrderID, Product) No Product_Table

I tried only having the OrderID in the table column (leaving the other choice blank) and I ended up with "Error too many attributes for counting correlations"

Using OrderID and Product in the table column gave me: Error (Data Mining): INSERT INTO error: The '[Product_Table],[Product]' nested table key column is not bound to an input rowset column.

Can you give me the details for the DMX statements?

Thanks,

Davy

|||

First -- creating a new model in the same mining structure will process the new model with the structure data. There is no need to re-process it with new data (the model is already processed). Once you delete the (sibling) model generated by the Association task, you have the model you just created and you can start using it.

Now, the dialog for Process Model with new Data, like most of the features in the Excel add-in, is designed mostly for the tabular data in Excel (i.e. not for nested tables). If you need to process the model with new data, the easiest way is to run the workaround again (use Association task to get the model and structure, then add your own model to the structure, using the algorithm and parameters of your choice).

Do you want to get DMX statements to do this task manually?

|||

Dear Bogdan,

I am interested in using the association algorithm on data that has not been imported into Excel. The association algorithm is the only algorithm that doesn't allow me to mine data from a nested table on a server. The other algorithm buttons (Classify,Estimate,Forecast,Cluster) allow me to mine data that hasn't been imported into excel. (Will future versions of the add-in include this option for the associate algorithm?)

Right now, I am importing 1MM transactions into Excel (Due to the 1MM row limit) and using the association algorithm button to mine/browse the data. I want to test my hypothesis from this 1MM transactions with 100MM new transactions from a nested table in analysis services to confirm my hypothesis (My company has millions of transactions daily).

Is there a DMX code where I can input:

The Connection

The Command Query ie.
Select Top 1000000 "OrderID","Product","DateID"
From "dwMBA"."dbo"."vOrderItems"

and get the browse window (Support,Rules,Network)

I am unfamiliar with DMX

Davy

|||

Now I understand.

One solution is to create the model once using BI Developer Studio (the development tool coming with Analysis Services). Then, the model can be re-processed at any time even from Excel. You can use Managed to Clean the structure and reprocess the structure with original data. Further more, in your BI Development Studio solution you could:

- create a data source view with a named query which returns only the transactions in the last 10 days

(e.g SELECT * FROM vOrderItems WHERE DateID > DATEADD(day, -10, GETDATE()) )

- define a mining structure on top of this named query. The mining structure will use the named query as both case and nested table

- build an Association Rules mining model

- periodically, use Manage from Excel add-ins to clear the mining structure data (unprocess), then process with original data (effectively using the most recent data) or Browse to inspect the model's rules

Another possible solution, using only Excel add-ins: use the Associate task to create a model with necessary flags and parameters. After that, clear the mining structure/model and reprocess it using data from the relational database (any number of rows, because the data will nto be copied to Excel).

Here is the step-by-step solution

1. Make sure on your Analysis Services server, in the current database, there is a data source object pointing to your relational table (the one containing the transactions)

Creating such a datasource only happens once and establishes a connection between Analysis Services and your database. You could create such a data source object using the BI Dev Studio. If your database is SQL Server 2005, you could also create such a datasource from Excel add-ins. Anyway, let me know if you need help on this.

2. Just as you started, fetch a small sample of data (number of rows does not matter at this point) and copy this data in Excel. The data must contain, based on your query, at least the OrderID (transaction identifier) and Product (transaction item) columns

3. Use Associate to create an Association Rules model on top of your sample data. Let's assume you create a model named ARM in a structure called ARS.

4. Once the model is deployed on the server, use Manage Models\Clear this mining structure on the ARS structure to unprocess it.

5. Click Query

6. Click Advanced ... (bottom left of the dialog)

7. Click Edit Query (Click Yes when a warning asks you if you want to continue)

8. Paste the query below, after modifying it to include your data source name

INSERT INTO MINING STRUCTURE [ARS]
(

[OrderID],
[Product_Table]
(
SKIP,
[Product]
)
)
SHAPE {
OpenQuery([Adventure Works DW], 'SELECT OrderID
FROM vOrderItems ORDER BY OrderID') }
APPEND
( {
OpenQuery([Adventure Works DW], 'SELECT OrderID, Product
FROM vOrderItems ORDER BY OrderID') }
RELATE
[OrderNumber] TO
[OrderNumber]
) AS
[Model_Table]

Note the ORDER BY fragments on the relational queries (required for the statement to work correctly) and the SKIP keyword which indicates the statement that OrderID from the second query is only used to shape the rowsets

9. Click Finish

10. Choose New Worksheet or Existing Worksheet -- it does not matter (this query does not return results)

11. After the execution completes, use Browse to check your patterns

|||Do you have an email I can private message you?|||Sure: bogdanc at microsoft dot comsql

No comments:

Post a Comment