
Supercharge Your GLMs with Scalable Feature Selection

Created by Praveen Polu in Oracle Database 8 Feb 2025

In the world of predictive analytics, there's a constant tug-of-war between data richness and model efficiency. We crave vast datasets, teeming with features that promise deeper insights. Yet, all too often, we find ourselves wrestling with "the curse of dimensionality," where extra information actually hinders our ability to build effective, reliable models. Oracle Data Mining (ODM) offers a potent solution to this challenge through its Generalized Linear Models (GLMs) coupled with scalable feature selection. Let's explore how to build better models by letting the algorithm select the features that matter.

This article serves as your field guide to navigating the complex landscape of feature selection within Oracle Data Mining's GLM framework. We'll journey together through the following crucial topics:

  • Understanding the problem and addressing the curse of dimensionality
  • How ODM's PL/SQL API makes feature selection scalable
  • Analyzing models and their results in the real world
  • Implementing code and error handling
  • Deployment strategies


The Feature Flood: When Abundance Becomes a Burden

The modern data landscape is often characterized by an embarrassment of riches. But more data is not automatically better: every extra feature adds noise and complexity, raising the risk of overfitting and poor performance. More features also mean longer build times, especially when the data is unclean and needs additional preprocessing, and tangled relationships between features can steer you toward ill-suited, noisy models.

Taming the Chaos: GLM Feature Selection to the Rescue

This is where Oracle's built-in feature selection shines. By letting the GLM algorithm identify the most relevant predictors during the build, you get leaner, more accurate, and more efficient models. Here are the key steps:

  1. Data Cleaning: Check the training data for missing values, duplicates, and inconsistent codings before modeling (a minimal profiling query is sketched just after this list).
  2. Model Configuration: Build models with different settings to find the configuration that performs best.
  3. Connection and Settings Testing: Verify the database connection and confirm the settings table contains valid values.
  4. Table Creation and PL/SQL Procedures: Create the settings table and call the DBMS_DATA_MINING PL/SQL API to build the model.
  5. Model Testing and Implementation: Score test data, evaluate the results, and put the model to work.
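
For the data-cleaning step, a simple profiling query is often enough to spot gaps before modeling. This is a minimal sketch: my_training_data and customer_id match the names used later in the article, while age and income are hypothetical column names standing in for your own predictors.

SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT customer_id) AS distinct_cases,   -- duplicates if lower than total_rows
       COUNT(*) - COUNT(age)    AS missing_age,          -- hypothetical column
       COUNT(*) - COUNT(income) AS missing_income        -- hypothetical column
FROM my_training_data;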

Putting Feature Selection to Work: A Hands-On Example

Let's walk through a practical example of building a GLM with feature selection. We'll use PL/SQL to define model settings, build the model, and examine the selected features.

Step-by-Step Implementation

  • Step 1: Create the Settings Table


CREATE TABLE glm_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

  • Step 2: Configure Feature Selection

Because the setting names and values are DBMS_DATA_MINING package constants, the inserts are wrapped in a PL/SQL block. The algorithm itself is also set here, since the GLMS_* feature selection settings only apply to a GLM:

BEGIN
  -- Request the GLM algorithm explicitly
  INSERT INTO glm_settings (setting_name, setting_value)
    VALUES (DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_GENERALIZED_LINEAR_MODEL);
  -- Enable automatic data preparation (recommended)
  INSERT INTO glm_settings (setting_name, setting_value)
    VALUES (DBMS_DATA_MINING.PREP_AUTO, DBMS_DATA_MINING.PREP_AUTO_ON);
  -- Enable feature selection
  INSERT INTO glm_settings (setting_name, setting_value)
    VALUES (DBMS_DATA_MINING.GLMS_FTR_SELECTION, DBMS_DATA_MINING.GLMS_FTR_SELECTION_ENABLE);
  -- Choose a feature selection criterion (e.g., RIC)
  INSERT INTO glm_settings (setting_name, setting_value)
    VALUES (DBMS_DATA_MINING.GLMS_FTR_SEL_CRIT, DBMS_DATA_MINING.GLMS_FTR_SEL_RIC);
  -- Optionally, cap the number of selected features
  INSERT INTO glm_settings (setting_name, setting_value)
    VALUES (DBMS_DATA_MINING.GLMS_MAX_FEATURES, '50');
  -- Enable feature pruning
  INSERT INTO glm_settings (setting_name, setting_value)
    VALUES (DBMS_DATA_MINING.GLMS_PRUNE_MODEL, DBMS_DATA_MINING.GLMS_PRUNE_MODEL_ENABLE);
  COMMIT;
END;
/
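
Before building, a quick sanity check of what actually landed in the settings table can save a failed run:

SELECT setting_name, setting_value
FROM glm_settings
ORDER BY setting_name;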


  • Step 3: Create and Build the GLM Model

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'my_glm_model',
    mining_function     => DBMS_DATA_MINING.REGRESSION, -- or DBMS_DATA_MINING.CLASSIFICATION
    data_table_name     => 'my_training_data',
    case_id_column_name => 'customer_id',
    target_column_name  => 'target_variable',
    settings_table_name => 'glm_settings'
  );
END;
/
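
Once the block completes, you can confirm the model exists and which algorithm was used by querying the USER_MINING_MODELS data dictionary view:

SELECT model_name, mining_function, algorithm, creation_date
FROM user_mining_models
WHERE model_name = 'MY_GLM_MODEL';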

  • Step 4: Test with Prediction


SELECT PREDICTION(my_glm_model USING *) FROM test;
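
To see which predictors survived feature selection, you can list the coefficients the model kept. This is a minimal sketch using the DBMS_DATA_MINING.GET_MODEL_DETAILS_GLM table function; only attributes retained by feature selection appear in its output:

SELECT attribute_name, coefficient
FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_GLM('my_glm_model'))
ORDER BY ABS(coefficient) DESC;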

Navigating Errors and Building Success

During model creation, you might encounter messages like:

  • Success message: "Model "MY_GLM_MODEL" completed." This indicates the model built successfully.
  • Error example:
    ORA-20000: Mining: Invalid setting name
    ORA-06512: at "SYS.DBMS_SYS_ERROR", line 79
    ORA-06512: at "SYS.DBMS_DATA_MINING", line 2921
    ORA-06512: at line 2


  • Solutions:
    • Check that every setting_name and setting_value in the settings table matches a valid DBMS_DATA_MINING constant.
    • Verify the database connection and that the settings and training tables are accessible.
    • Re-run the build after correcting the settings and confirm it completes.
  • Parallel execution error: ORA-12801: error signaled in parallel query server P000. This surfaces a failure inside a parallel worker, so check the accompanying message in the error stack for the root cause.
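
One failure mode not shown above is rebuilding a model that already exists. A hedged sketch of the usual workaround: drop any previous version first, and ignore the error if there is none.

BEGIN
  DBMS_DATA_MINING.DROP_MODEL('my_glm_model');
EXCEPTION
  WHEN OTHERS THEN
    NULL; -- no existing model to drop on the first run
END;
/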

By employing these techniques, you are prepared for various scenarios, ensuring the development of high-quality models.

Additional Resources

  • Oracle Data Mining Concepts Guide
  • Oracle Data Mining PL/SQL API Reference


If you've read this far, you're ready to harness the power of Data Mining! Share and explore new possibilities. What are you hoping to discover? This article is meant for educational and demonstration purposes and reflects one writer’s experience. It's crucial to seek the best methods for your use case. Remember to consult the official documentation and experiment with various algorithms and settings to achieve optimal results.

