Wizards often face curses; so do witches, and even regular people, unless you don’t believe in that kind of magic, of course. For those of you who don’t - the mathematicians, scientists and realists - I present a different kind of curse, one backed by substantial evidence: the ‘Curse of Dimensionality’.
The curse of dimensionality (CoD henceforth) refers to a statistical phenomenon in which local smoothing methods begin to perform poorly as the number of variables in a multivariate dataset escalates.
Put simply, as the number of dimensions (features) in your dataset increases, methods that rely on local smoothing begin to perform noticeably worse.
Suppose you were working with a dataset containing just two dimensions: height and weight. As you can imagine, it would be fairly simple to analyse the variables. However, we often require multivariate datasets that contain a greater number of variables. Imagine a case where, in addition to weight and height, we were to add income, age, location, gender, and so on. Such datasets become difficult to visualise and comprehend as the relationships between variables become blurred. This is because, in high-dimensional spaces, data points become more spread out and any meaningful notion of ‘closeness’ is lost.
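To make that loss of ‘closeness’ concrete, here is a minimal sketch in Python (using NumPy and hypothetical, randomly generated points, so the exact numbers will vary): it measures how much farther a point’s farthest neighbour is than its nearest neighbour, a gap that all but vanishes as the number of dimensions grows.

```python
import numpy as np

rng = np.random.default_rng(0)

for n_dims in (2, 10, 100, 1000):
    points = rng.random((500, n_dims))   # 500 random points in the unit hypercube
    query = rng.random(n_dims)           # one reference point
    dists = np.linalg.norm(points - query, axis=1)
    # Relative contrast: how much farther the farthest neighbour is than the nearest.
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"{n_dims:>4} dimensions: relative contrast = {contrast:.2f}")
```

In two dimensions the nearest neighbour is dramatically closer than the farthest; in a thousand dimensions every point sits at roughly the same distance from every other, which is exactly why ‘local’ methods lose their footing.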
Since traditional methods struggle to stay accurate, high-dimensional data now demands extra caution. To gain a better understanding, we turn to CoD in machine learning, a relatively advanced framework, but one that grows in popularity daily as we enter an ever-evolving world of reliance on artificial intelligence.
Machine learning refers to the broad branch of artificial intelligence and computer science focused on using data and algorithms to imitate the way humans learn, gradually improving in accuracy. Machine learning models are built from datasets of differing dimensionality, and models trained on more dimensions often have more accurate information to offer, simply because the number of features in the dataset is comparatively higher. To reiterate, the model’s accuracy initially increases with the number of features. Past a threshold value, however, the accuracy begins to diminish. The model is force-fed too much information, and the AI becomes ‘incompetent’ at training on the information that matters: in higher dimensions the data points become almost equidistant from one another, leaving little beyond sample randomness and depleting any useful insight that training could extract.
If the model fails to generalise to new data, it won’t be able to perform its intended tasks, ultimately rendering the algorithm useless.
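As a rough illustration of that threshold effect, the hypothetical sketch below (scikit-learn on synthetic data, not a definitive benchmark) trains a k-nearest-neighbours classifier on a growing prefix of features: the genuinely informative columns first, then columns of pure noise. Cross-validated accuracy typically rises at first and then falls away.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Ten genuinely informative features, padded with 990 columns of pure noise.
X_info, y = make_classification(
    n_samples=300, n_features=10, n_informative=10,
    n_redundant=0, random_state=0,
)
X_all = np.hstack([X_info, rng.normal(size=(300, 990))])

# Train on a growing prefix of columns: the useful features first, then the noise.
for n_features in (2, 5, 10, 50, 200, 1000):
    score = cross_val_score(
        KNeighborsClassifier(), X_all[:, :n_features], y, cv=5
    ).mean()
    print(f"{n_features:>4} features: mean CV accuracy = {score:.2f}")
```

The exact figures depend on the random seed, but the shape of the curve is the point: more features help only while those features carry signal.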
Not all hope is lost, though. There remain a few possible solutions to our shared dilemma of the CoD. The remaining minutes of this article aim to briefly introduce and review one possible approach to easing CoD within machine learning.
Hughes Phenomenon
This concept suggests that as you increase the number of features used to train a classifier (a machine learning algorithm trained to assign input data to predefined classes), its overall performance tends to improve…initially.
The initial improvement occurs because a wider pool of information gives the classifier more opportunities to capture complex patterns in the data. The progression lasts until an optimum point is reached, beyond which additional features no longer enhance performance but begin to degrade it, as irrelevant information misleads the model entirely, or at least confuses it (a problem known as overfitting).
Overfitting arises when a model fails to generalise to unseen data, and so, within the Hughes Phenomenon, once the threshold value has been surpassed the model may begin to memorise its training data rather than ‘learn’ meaningful patterns from it.
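Here is a brief, hedged sketch of that memorisation (a synthetic scikit-learn example; the exact scores will vary): a decision tree left to grow freely over hundreds of irrelevant features can fit its training set perfectly while scoring far worse on data it has never seen.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# The label depends only on the first two columns; the other 500 columns are
# pure noise that an unconstrained tree is free to 'memorise'.
X = rng.normal(size=(200, 502))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(f"training accuracy: {tree.score(X_train, y_train):.2f}")  # typically a perfect 1.00
print(f"test accuracy:     {tree.score(X_test, y_test):.2f}")    # typically far lower
```

The gap between the two scores is the signature of overfitting: the model has memorised rather than learned.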
Whilst overfitting is a challenge in itself, a further remedy follows from the Hughes Phenomenon - the idea of dimensionality reduction.
Dimensionality reduction refers to the conversion of data from a high-dimensional setting to a low-dimensional one. The main intention is for the low-dimensional space to retain properties almost identical to those of the data in its natural dimensions. By reducing the number of features in the dataset whilst simultaneously retaining as much critical information as possible, the performance of the learning algorithm can be largely preserved.
There is no single right way to perform dimensionality reduction, as several techniques have been successfully developed. Although we won’t go into detail about them here, it may satisfy your curiosity to note some down to research in your own time. The following remain widely used dimensionality reduction techniques in machine learning: principal component analysis (PCA), singular value decomposition (SVD), and linear discriminant analysis (LDA).
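As a taste of the first technique on that list, here is a minimal PCA sketch using scikit-learn’s bundled handwritten-digits dataset; the share of variance retained is printed rather than assumed, since it depends on how many components you keep.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 1,797 handwritten digit images, each described by 64 pixel features.
X, y = load_digits(return_X_y=True)

# Project the 64-dimensional data down to its 10 leading principal components.
pca = PCA(n_components=10).fit(X)
X_reduced = pca.transform(X)

print(X.shape, "->", X_reduced.shape)  # (1797, 64) -> (1797, 10)
print(f"share of variance retained: {pca.explained_variance_ratio_.sum():.0%}")
```

Ten components are far easier for a downstream learner to cope with than sixty-four, yet they still capture most of the structure in the images.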
As the features of a dataset expand, generalisation of machine learning models becomes difficult and demands greater computational effort from the algorithm. The techniques briefly mentioned above, which fall under the umbrella of dimensionality reduction, enable us to reach as accurate an analysis as possible with the statistical tools at our disposal. It must be noted that the success of exploiting the Hughes Phenomenon, and of the corresponding dimensionality reduction, differs based on the type of machine learning model (classification versus regression) as well as the nature of the application. Nonetheless, the solutions discussed above bring us a step closer to comprehending the intricacy of the curse of dimensionality.