
Supercharge Your GLMs with Scalable Feature

Created by Praveen Polu in Oracle Database, 8 Feb 2025

In the world of predictive analytics, there's a constant tug-of-war between data richness and model efficiency. We crave vast datasets, teeming with features that promise deeper insights. Yet, all too often, we find ourselves wrestling with "the curse of dimensionality," where extra information actually hinders our ability to build effective, reliable models. Oracle Data Mining (ODM) offers a potent solution to this challenge through its Generalized Linear Models (GLMs) coupled with scalable feature selection. In this article I will demonstrate how to build GLMs and select better features.

This article serves as your field guide to navigating the complex landscape of feature selection within Oracle Data Mining's GLM framework. We'll journey together through the following crucial topics:

  • Understanding the problem: the curse of dimensionality and how to address it;

  • How the PL/SQL interface that ODM provides keeps feature selection scalable;

  • Analyzing the models and their results in real-world terms;

  • Implementing the code and handling errors;

  • Deploying the model.

The Feature Flood: When Abundance Becomes a Burden

The modern data landscape is often characterized by an embarrassment of riches, but an abundance of data can itself become a problem. Every additional feature adds noise and complexity, inflating the model and increasing the risk that it performs badly.

A large number of features also means that more time is needed to build the models. This matters especially when the data is not clean and the work must be performed inside the database. Most importantly, relationships among features can become so complex that the resulting models are inappropriate and dominated by noise.

Taming the Chaos: GLM Feature Selection to the Rescue

Here is where Oracle's capabilities shine. By using ODM's robust, built-in feature selection methods, you can build models that are both accurate and efficient. You identify the candidate features, and ODM narrows them down. The following key steps turn raw data into high-quality models.

  1. Begin by cleaning the data so it is free of errors, and test for the kinds of errors you expect (a minimal data-quality check is sketched after this list). Run the model with different configurations to learn what works best, then take the appropriate actions to build it.

  2. Verify your database connection and session settings, and make sure they are set up correctly for mining.

  3. Create the settings tables and build your PL/SQL procedures. This is where the models are implemented, and it is what lets you create them consistently and with high quality.

  4. With everything set up, start testing: create test cases for the models and put them into use. This is where the data mining work really begins.
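As a minimal sketch of the data-quality check from step 1, and assuming the training table my_training_data with the case identifier customer_id and the target column target_variable used later in this article, you might count rows, duplicate case IDs, and missing target values before building anything:

-- Quick data-quality check on the training table
-- (my_training_data, customer_id and target_variable are the example names used below)
SELECT COUNT(*) AS total_rows,
       COUNT(*) - COUNT(DISTINCT customer_id) AS duplicate_case_ids,
       SUM(CASE WHEN target_variable IS NULL THEN 1 ELSE 0 END) AS missing_targets
FROM my_training_data;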

Putting Feature Selection to Work: A Hands-On Example

Let's walk through a practical example of building a GLM with feature selection. We'll use PL/SQL to define the model settings, build the model, and examine the selected features. First we create a settings table, populate it, and then use it to build the model against our own training data.

-- Step 1: Create the settings table

CREATE TABLE glm_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

-- Step 2: Configure the algorithm and feature selection

-- Use the GLM algorithm explicitly so the build does not fall back to the
-- default algorithm for the chosen mining function
INSERT INTO glm_settings (setting_name, setting_value)
VALUES (DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_GENERALIZED_LINEAR_MODEL);

-- Enable automatic data preparation (recommended)
INSERT INTO glm_settings (setting_name, setting_value)
VALUES (DBMS_DATA_MINING.PREP_AUTO, DBMS_DATA_MINING.PREP_AUTO_ON);

-- Enable feature selection (setting names/values as documented for GLM)
INSERT INTO glm_settings (setting_name, setting_value)
VALUES ('GLMS_FTR_SELECTION', 'GLMS_FTR_SELECTION_ENABLE');

-- Choose a feature selection criterion (e.g., RIC)
INSERT INTO glm_settings (setting_name, setting_value)
VALUES ('GLMS_FTR_SEL_CRIT', 'GLMS_FTR_SEL_RIC');

-- Optionally, cap the number of selected features
INSERT INTO glm_settings (setting_name, setting_value)
VALUES ('GLMS_MAX_FEATURES', '50');

-- Step 3: Enable feature pruning
INSERT INTO glm_settings (setting_name, setting_value)
VALUES ('GLMS_PRUNE_MODEL', 'GLMS_PRUNE_MODEL_ENABLE');

COMMIT;
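Before building, it is worth a quick check that every setting landed in the table as intended:

-- Review the settings that will drive model creation
SELECT setting_name, setting_value
FROM glm_settings
ORDER BY setting_name;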

-- Step 4: Create and build the GLM model

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'my_glm_model',
    mining_function     => DBMS_DATA_MINING.REGRESSION, -- or DBMS_DATA_MINING.CLASSIFICATION
    data_table_name     => 'my_training_data',
    case_id_column_name => 'customer_id',
    target_column_name  => 'target_variable',
    settings_table_name => 'glm_settings'
  );
END;
/
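To examine which features survived selection, one option (assuming a release where DBMS_DATA_MINING.GET_MODEL_DETAILS_GLM is still available; newer releases expose the same information through model detail views) is to list the retained attributes with their coefficients:

-- Attributes retained by feature selection, largest coefficients first
SELECT attribute_name, coefficient
FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_GLM('my_glm_model'))
ORDER BY ABS(coefficient) DESC;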

-- Step 5: Score new data with the PREDICTION operator

SELECT PREDICTION(my_glm_model USING *) AS predicted_value
FROM test;
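If the test table also carries the actual target values, a simple follow-up sketch (assuming the same target_variable column as in the training data) is to compute the mean absolute error of the predictions:

-- Mean absolute error of the regression predictions
SELECT AVG(ABS(target_variable - PREDICTION(my_glm_model USING *))) AS mae
FROM test;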

Navigating Errors and Building Success

During model creation, the output tells you whether the build succeeded. On success you will see a message such as:

Model "MY_GLM_MODEL" completed.

This means the model was built successfully.

A negative example, showing what you will see when something goes wrong, is:

ORA-20000: Mining: Invalid setting name
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 79
ORA-06512: at "SYS.DBMS_DATA_MINING", line 2921
ORA-06512: at line 2

When things come crashing down, look into these likely causes:

  • You used invalid parameters, such as a setting name or value that ODM does not recognize (the ORA-20000 above).

  • The connection or setup was wrong.

  • The test did not run correctly.

  • If you push the GLM too hard, for example with a very large build running in parallel, you may also see: ORA-12801: error signaled in parallel query server P000

By using these techniques you are prepared for the problems that tend to occur, and you can build a higher-quality model. A defensive pattern for re-running a build safely is sketched below.
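As a sketch of that defensive approach (the names are the ones from the example above; the drop-before-create pattern is a common convention, not something ODM requires), you can drop any earlier copy of the model before rebuilding and surface failures cleanly:

BEGIN
  -- Drop a previous copy of the model, ignoring the error if none exists
  BEGIN
    DBMS_DATA_MINING.DROP_MODEL('my_glm_model');
  EXCEPTION
    WHEN OTHERS THEN NULL;
  END;

  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'my_glm_model',
    mining_function     => DBMS_DATA_MINING.REGRESSION,
    data_table_name     => 'my_training_data',
    case_id_column_name => 'customer_id',
    target_column_name  => 'target_variable',
    settings_table_name => 'glm_settings'
  );
EXCEPTION
  WHEN OTHERS THEN
    -- Report the full error stack (e.g. the ORA-20000 invalid-setting case above)
    DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_ERROR_STACK);
    RAISE;
END;
/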

Additional Resources / References

If you have read this far, you are ready to dive into the power of Oracle Data Mining. Share it and explore new possibilities. What are you hoping to find with it?

This article is meant for educational and demonstration purposes and reflects one writer's experience. You should evaluate the best methods for your own use case.

Remember to consult the official documentation and experiment with various algorithms and settings to achieve the best results.
