Online Learning From Incomplete and Imbalanced Data Streams
An interesting online-learning paper that solves the whole problem with pure math. A nice introduction to the field.
Conclusion
- For incomplete features: divide x_t into x_t^v (vanished), x_t^c (common), and x_t^n (new), then update their weights w_t^v, w_t^c, w_t^n and the confidence p_t^c … (sketched in code below)
- For imbalanced data streams: use a dynamically adjusted parameter c_t to avoid bias toward the majority class
- For model sparsity: use L1-ball projection
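A minimal sketch of one update round under my reading of these notes; the dict-based feature bookkeeping and the function name `online_round` are illustrative assumptions, not the paper's code:

```python
def online_round(weights, x_t, y_t, C=1.0):
    """One update round: weights and x_t are dicts keyed by feature name,
    so vanished/common/new features fall out of simple set operations.
    Illustrative sketch, not the paper's algorithm."""
    common = weights.keys() & x_t.keys()        # x_t^c: shared features
    new = x_t.keys() - weights.keys()           # x_t^n: first-seen features
    # vanished = weights.keys() - x_t.keys()    # x_t^v: left untouched here

    # Predict on the common subspace, then take a PA-style step on a mistake.
    margin = y_t * sum(weights[f] * x_t[f] for f in common)
    loss = max(0.0, 1.0 - margin)               # hinge loss
    if loss > 0:
        norm_sq = sum(v * v for v in x_t.values()) or 1.0
        tau = min(C, loss / norm_sq)            # PA-I style step size
        for f in common:
            weights[f] += tau * y_t * x_t[f]
        for f in new:                           # initialize new features
            weights[f] = tau * y_t * x_t[f]
    return weights
```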
Pros and Cons
Pros:
- Handles the full task using basic mathematical tools
- Performs well in the experiments
- Has theoretical performance guarantees
Cons:
- Computation is complex; in very fast data-stream scenarios it may fail to keep up
- Hyperparameters can greatly affect the algorithm's performance; although the method has a theoretical performance bound, it still depends on C and c_t
- Only fits binary (-1/+1) classification
- Not compared with deep-learning methods
Definitions
Data stream $D = \{(x_t, y_t)\}_{t=1}^{T}$: $x_t$ is a vector of $d_t$ dimensions, and $y_t \in \{-1, +1\}$ is the true label of $x_t$. At each time step you can only observe one pair $(x_t, y_t)$.
For an incomplete data stream $D$, define $U_t$ to be the feature set that is carried by $x_t$.
Let $\mathcal{U}_t = \bigcup_{i=1}^{t} U_i$ be the universal feature set up to iteration $t$. For some $x_t$, $U_t \subsetneq \mathcal{U}_t$, giving the missing-feature ratio $1 - |U_t| / |\mathcal{U}_t|$.
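A tiny illustration of these definitions (the toy stream and feature names are made up):

```python
# U_t is the feature set carried by x_t; the universal set is the union
# of everything seen so far, and the missing ratio compares the two.
universal = set()
stream = [{"a": 1.0, "b": 2.0}, {"b": 0.5, "c": 3.0}, {"c": 1.5}]
for t, x_t in enumerate(stream, start=1):
    U_t = set(x_t)                  # features carried by x_t
    universal |= U_t                # universal feature set up to t
    missing_ratio = 1 - len(U_t) / len(universal)
    print(t, sorted(universal), round(missing_ratio, 2))
```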
For an imbalanced data stream, define $T_t^{maj}$ as the number of majority-class instances of $y$ and $T_t^{min}$ as the number of minority-class instances observed so far; $D$ has $T_t^{min} \ll T_t^{maj}$.
Solutions for Incomplete Feature Spaces
Optimization objective:
$$\min_{w}\ \ell\big(y_t,\ f(x_t)\big) + \lambda\, r(w), \qquad f(x_t) = w^{\top}\phi(x_t)$$
Here $f$ represents the learner, $w$ is the learner's weight, $\ell(\cdot)$ is the loss function, and $r(\cdot)$ is the regularization term; $\phi(x_t)$ is the latent representation of $x_t$.
Define the confidence $p_t^c$ on the common feature space $x_t^c$, with $p_{t,i}$ the informativeness of the $i$-th feature, calculated by the variance $\mathrm{Var}$ of that feature's observed values.
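The variance has to be maintained online; a minimal sketch using Welford's algorithm (the `RunningVariance` class is my own scaffolding; how the paper turns variances into the confidence vector $p_t^c$ is not reproduced here):

```python
class RunningVariance:
    """Welford's online variance, one tracker per feature."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Population variance of all values seen so far.
        return self.m2 / self.n if self.n > 1 else 0.0
```

Normalizing the per-feature variances so they sum to one would give one plausible informativeness vector.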
Use the hinge loss $\ell(w; (x_t, y_t)) = \max\{0,\ 1 - y_t\, w^{\top} x_t\}$.
Use the KKT conditions to solve the constrained problem.
To keep the model changing only slightly between rounds, use a soft margin:
$$w_{t+1} = \arg\min_{w}\ \tfrac{1}{2}\|w - w_t\|^2 + C\,\xi \quad \text{s.t.}\ \ \ell(w; (x_t, y_t)) \le \xi,\ \ \xi \ge 0$$
Here $\xi$ is the soft-margin slack variable and $C$ controls how aggressive each update is.
Define the Lagrangian function
$$\mathcal{L}(w, \xi, \tau, \lambda) = \tfrac{1}{2}\|w - w_t\|^2 + C\,\xi + \tau\big(\ell(w; (x_t, y_t)) - \xi\big) - \lambda\,\xi, \qquad \tau, \lambda \ge 0$$
Solve it to get the closed-form update:
$$w_{t+1} = w_t + \tau_t\, y_t\, x_t, \qquad \tau_t = \min\Big\{C,\ \frac{\ell_t}{\|x_t\|^2}\Big\}$$
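The omitted algebra is the standard PA-I derivation; a reconstruction of the steps (not copied from the paper):

```latex
% With an active loss, \ell(w) = 1 - y_t w^{\top} x_t. Take stationarity
% of the Lagrangian, then substitute back into the constraint:
\begin{align*}
\partial_w \mathcal{L} = w - w_t - \tau y_t x_t = 0
  \;&\Rightarrow\; w = w_t + \tau y_t x_t \\
\partial_\xi \mathcal{L} = C - \tau - \lambda = 0
  \;&\Rightarrow\; \tau \le C \quad (\text{since } \lambda \ge 0) \\
\text{substituting } w \text{ back}
  \;&\Rightarrow\; \tau_t = \min\Big\{C,\ \frac{\ell_t}{\|x_t\|^2}\Big\}
\end{align*}
```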
Solutions for Imbalanced Data Stream
Create a dynamic cost $c_t$ that grows with the ratio of majority- to minority-class instances observed so far; a scaling parameter controls how strongly $c_t$ reacts to the imbalance. Applying it to the loss function, we finally get the cost-sensitive loss $c_t\,\ell(w; (x_t, y_t))$, which weights minority-class mistakes more heavily.
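Since the exact formula for $c_t$ is not reproduced in these notes, here is a hedged sketch where the cost is the opposite-class share of the stream (an assumption, not the paper's $c_t$):

```python
# Hedged sketch of a dynamic cost maintained from running class counts.
counts = {+1: 0, -1: 0}

def dynamic_cost(y_t, scale=1.0):
    """Return a cost that is large when y_t belongs to the rarer class
    (degenerate at the very start of the stream, before counts fill in)."""
    counts[y_t] += 1
    total = counts[+1] + counts[-1]
    return scale * counts[-y_t] / total  # opposite-class share so far

# Usage: rescale the hinge loss before the PA-style update.
# loss = dynamic_cost(y_t) * max(0.0, 1.0 - y_t * margin)
```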
Solutions for Model Sparsity
Project the element-wise product of the weights and the uncertainty vector onto an L1 ball.
$\tilde{p}_t$ denotes the relative uncertainty vector of the universal feature space at the $t$-th iteration, which is composed of the informativeness of all the features that have been observed; $\delta$ is a regularization parameter giving the radius of the ball.
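The projection itself can be the standard Euclidean projection onto an L1 ball (Duchi et al., 2008); a sketch, assuming this is the routine the paper relies on:

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the L1 ball of the given radius
    (Duchi et al., 2008). Returns v unchanged if already inside the ball."""
    if np.abs(v).sum() <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]          # magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```

Calling something like `project_l1_ball(p_tilde * w, delta)` zeroes out small, low-informativeness weights, matching the sparsity effect described above.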
Concept Summary
1. Concept Drift
- Definition: Concept drift refers to the change in data distribution or the target variable over time.
- Explanation: In online learning, when data evolves, models trained on older data may become outdated. Models need to adapt to changing data distributions.
2. Online Learning
- Definition: Online learning is a machine learning paradigm where the model learns incrementally from incoming data points, rather than from a fixed dataset.
- Explanation: Suitable for large, continuous, and evolving data streams. The model updates continuously as new data arrives.
3. Imbalanced Dataset
- Definition: A dataset where the classes have significantly different numbers of samples.
- Explanation: In imbalanced datasets, models may favor the majority class, neglecting the minority class. Solutions include weighted loss functions and resampling techniques.
4. Hinge Loss
- Definition: A loss function used in classification tasks, especially in Support Vector Machines (SVM), that encourages correct classification with a margin.
- Formula: $\ell(y, f(x)) = \max\{0,\ 1 - y\,f(x)\}$
- Explanation: Ensures that samples are correctly classified and also that the distance to the decision boundary is maximized. It penalizes misclassifications and small margins.
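Worked values make the margin behavior concrete:

```latex
\ell(+1,\ 2.0)  = \max\{0,\ 1 - 2.0\} = 0    % correct and outside the margin
\ell(+1,\ 0.3)  = \max\{0,\ 1 - 0.3\} = 0.7  % correct but inside the margin
\ell(+1,\ -0.5) = \max\{0,\ 1 + 0.5\} = 1.5  % misclassified
```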
5. Lagrange Multipliers
- Definition: A method used to solve optimization problems with constraints by incorporating the constraints into the objective function.
- Formula: $\mathcal{L}(x, \lambda) = f(x) + \lambda\, g(x)$, for an objective $f$ and a constraint $g(x) = 0$
- Explanation: By introducing a new term (the Lagrange multiplier $\lambda$), it turns a constrained optimization problem into an unconstrained one, making it easier to solve.
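A minimal worked example: minimize $f(x, y) = x^2 + y^2$ subject to $x + y = 1$:

```latex
\mathcal{L}(x, y, \lambda) = x^2 + y^2 + \lambda\,(x + y - 1)
% Stationarity: 2x + \lambda = 0 and 2y + \lambda = 0, so x = y;
% the constraint x + y = 1 then gives x = y = 1/2 (with \lambda = -1).
```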
6. Hessian Matrix
- Definition: A matrix of second-order partial derivatives that describes the curvature of a function.
- Formula: $H_{ij} = \dfrac{\partial^2 f}{\partial x_i\, \partial x_j}$
- Explanation: Used to understand the shape of a function. A positive definite Hessian indicates a local minimum, while a negative definite Hessian indicates a local maximum.
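A quick instance: for $f(x, y) = x^2 + 3y^2$,

```latex
H = \begin{pmatrix} 2 & 0 \\ 0 & 6 \end{pmatrix}
% Both eigenvalues are positive, so H is positive definite and the
% stationary point (0, 0) is a (here global) minimum.
```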
7. G-Mean (Geometric Mean)
- Definition: A metric used to evaluate classification performance, particularly in imbalanced datasets. It is the geometric mean of sensitivity and specificity.
- Formula: $\text{G-Mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}}$
- Explanation: G-Mean balances performance on both positive and negative classes, making it ideal for imbalanced data where both types of classification errors are important.
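A quick computation showing why G-Mean is stricter than accuracy on imbalanced data: a classifier with sensitivity 0.90 and specificity 0.64 gets

```latex
\text{G-Mean} = \sqrt{0.90 \times 0.64} = \sqrt{0.576} \approx 0.76
% A degenerate classifier that predicts only the majority class can reach
% high accuracy, but its sensitivity is 0, so its G-Mean collapses to 0.
```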
8. Dynamic Cost
- Definition: A cost that is adjusted dynamically based on the distribution of classes in the data stream.
- Formula: not reproduced in these notes; $c_t$ is recomputed each round from the running majority/minority class counts (see Solutions for Imbalanced Data Stream above)
- Explanation: The dynamic cost is used to adjust the weight of different classes during learning. For imbalanced data, the cost for minority class samples is increased to ensure they receive adequate attention.
9. Feature Evolvable
- Definition: The concept that the feature space may evolve over time, with new features emerging or old features becoming irrelevant.
- Explanation: In real-world scenarios, feature distributions may change, requiring the model to adapt and select the most relevant features dynamically.