Classify four imagined objects with EEG signals

EEG signals contain information directly related to cognitive activity. This paper presents a method to classify the images a person imagines using the information provided by the EEG signals. The images relating to the objects ‘tree’, ‘house’, ‘plane’ and ‘dog’ have been reconstructed. We have used a convolutional neural network to obtain the reconstruction of the images and a genetic algorithm to find the parameters of this network. The results have been evaluated by means of a Chebychev metric to compare the images, which shows that the reconstruction is performed with a success of 57% over chance, with a classification accuracy of 60% and a kappa value of 0.40, demonstrating that the classification of five mental states, four of which come from visual imagery, is possible.

However, there are few studies related to visual imagery, and even fewer related to image reconstruction based on EEG signals [16]. Nonetheless, some studies of great importance should be pointed out, as for example the case in which the reconstruction of faces [17] is conducted based on EEG signals, or others that focus on classifying imagined objects [16,18,19,42]. In this context, visual imagery can be understood as the ability to generate images not from perception but from memory [20]. This capability has been studied using different systems such as functional magnetic resonance imaging, but the use of this cognitive capacity has not yet been extensively explored in the context of BCI systems.
On the other hand, the work by Rami et al. [42], which is one of the most recent works on the classification of visual imagination, was brought to our attention. That manuscript presents an extensive study that reaches some of the same conclusions as this work, but in different ways. The authors have also been able to show that it is possible to classify visual imagination, and that it could be used to expand the boundaries of BCI systems. In our work we have used convolutional networks together with genetic algorithms, and we have proposed a different form of classification based on the reconstruction of the images from the EEG signals.
However, in spite of this extensive research, the classification and consequently the reconstruction of visual imagery is not a task that has been solved, and not all studies have obtained positive results when classifying the visual imagery. For instance, in the work [16], the classification of visual perception and visual imagery is studied, but the classification problem is not resolved and the door is still open to continue studying the classification of visual imagery.
In this study we have reconstructed visual imagery using EEG signals and implementing Deep Learning (DL) techniques and genetic algorithms (GA) in order to create a functional BCI system able to help in future creativity-related applications, such as CAD systems. In Fig. 2 we can see a scheme of the pipeline used to achieve the described objective.

Overview
The objective of this work is to carry out a reconstruction similar to the one in the work [21], where a reconstruction of imagined or visually perceived images is performed. The main objective of that work is to use fMRI technology to record brain activity and to use deep learning techniques for the reconstruction of imagination and visual perception. In our case, the whole convolutional system is fed with the feature extraction described in the work [42]. See Fig. 1.
Although the work [21] demonstrates the possibility of reconstructing visual perception and imagination, from their results it was not clear if something similar could be carried out using EEG signals. This is why in this work we aim to demonstrate that it is possible to do the same through EEG signals and to be able to perform non-invasive BCI systems that can use the cognitive capacity of imagining images.
To carry out our objective, electrodes positioned over the occipital area have been used to record the EEG signal. Once recorded, we extracted the Power Spectral Density (PSD) from the signals, and we converted the resulting array into an image to be processed by a Convolutional Neural Network (CNN). This type of network has been widely used in image processing to find spatial relationships in the signals. However, when using neural networks, one is faced with the problem of not knowing a priori which network structure to use. For this purpose, we used the Genetic Algorithm (GA) technique to find a suitable configuration that allows the network to perform the task of reconstructing the images from the EEG signals in an efficient way. See Fig. 2.
Fig. 1 An overview of a generic deep image reconstruction pipeline, as described in [21]. The pixel values of the input image are optimized so that the Deep Neural Network (DNN) features of the image are similar to those decoded from fMRI activity. A deep generator network (DGN) is optionally combined with the DNN to produce natural-looking images, in which optimization is performed at the input space of the DGN. Reproduced with permission
Fig. 2 Schematic description of the study presented in the paper: After reading a word on screen, the user imagines the corresponding object while we record the corresponding EEG signal. This signal is processed by a Convolutional Neural Network optimised by a Genetic Algorithm, to reconstruct the object imagined by the user

Materials and methods
In this work we used the g.Nautilus device, developed by the g.Tec company. This device has eight wet electrodes, which we positioned following the international 10-10 system at the P3, P4, PO3, POz, PO4, PO7, PO8 and Oz positions, with the AFz location as reference. We chose these electrodes because they have given the best results when classifying visual imagery [18]. All electrodes had an impedance lower than 10 kΩ. The sampling rate of the device was 250 Hz and the resolution was 16 bits.
Four people took part in our study (three men and a woman). The average age was 43.5 ± 8.2 years. From a statistical point of view, four subjects do not provide a significant sample, but it is also well known that recording large numbers of people is not easy. In addition, we can observe that the results of the four subjects are very similar, which suggests that these results can probably be extended to more subjects. This is an initial work to find out whether it is possible to use visual imagery for the construction of BCI systems and whether convolutional neural networks are a good strategy for it. We believe that the results shown are important enough to be shared with the scientific community and that, together with other works, they will bring attention to the use of visual imagery for the construction of BCI systems and, in this way, extend the capabilities of pattern recognition.
The EEG data was registered in a room with dim lights and away from external noises. At the time of the registration, the person was alone, so no noises or movements could distract the subject from whom the EEG signals were registered. The subject stood in front of a 17" monitor, about 80 cm from it. We asked subjects to be as comfortable as possible and to avoid making any muscle movement, such as moving their feet or their hands, during the registration process.
After placing the g.Nautilus device and calibrating it to lower the impedances, we registered the EEG signals with the protocol shown in Fig. 3: during the first two seconds, a white cross on a black background appeared on the monitor. This indicated that the trial was going to start. Then, the word (in Spanish) describing the object to be imagined appeared for 3 seconds. Immediately after, the monitor turned black for 10 seconds, during which the subject had to imagine the object. This procedure was repeated 40 times for each object.
We used Matlab by MathWorks, the EEGLAB toolbox [22] and the Keras library [23] for signal processing and for programming the deep learning architectures. We used the OpenVibe [24] software to present the stimuli. We used the Keras library because it is an open-source deep learning library, it is user-friendly, and it is extensively employed in research. We used the OpenVibe software for the same reasons: it is open-source, it has a great number of features ready to use, and it has a graphical environment which allows quick and simple creation of prototypes and recording protocols for EEG signals.

Preprocessing
Once the signals are registered, they have to be pre-processed to eliminate corrupted data. First, we applied a notch filter between 48 and 52 Hz to eliminate the artifacts produced by the power line. Right after, we applied a sixth-order band-pass filter between 0.01 and 100 Hz (implemented on the g.Nautilus device itself). Once the registered signals were filtered, we partitioned them into segments of one second and applied a baseline removal and a fourth-order Butterworth filter between 1 and 40 Hz. We used a common average reference (CAR) filter to eliminate muscular artifacts [25]. Of these one-second segments, we only considered those from the last 10 seconds of the trial (the part where the subject imagines the object). Moreover, we omitted the first second of this part because it could still contain information about the visualisation of the word. This way we ensure that we are detecting exclusively visual imagery, without any interference from visual perception.
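The filtering and segmentation steps described above can be sketched in Python with SciPy. This is only an illustrative sketch, not the authors' Matlab/EEGLAB implementation: the notch filter order, the use of zero-phase filtering, the order of the operations, and the function names `preprocess`, `segment` and `imagery_segments` are all assumptions. The on-device 0.01 to 100 Hz filter and the baseline removal are not reproduced.

```python
import numpy as np
from scipy import signal

FS = 250  # sampling rate in Hz, as reported for the g.Nautilus device


def preprocess(eeg):
    """Filter an EEG array of shape (n_channels, n_samples)."""
    # Band-stop (notch) filter between 48 and 52 Hz for power-line noise.
    # The order N=2 is an assumption; the text does not state it.
    sos_notch = signal.butter(2, [48, 52], btype="bandstop", fs=FS, output="sos")
    x = signal.sosfiltfilt(sos_notch, eeg, axis=1)
    # Butterworth band-pass between 1 and 40 Hz designed with N=4.
    # Note: SciPy band-pass designs have an effective order of 2*N.
    sos_bp = signal.butter(4, [1, 40], btype="bandpass", fs=FS, output="sos")
    x = signal.sosfiltfilt(sos_bp, x, axis=1)
    # Common average reference: subtract the mean over channels per sample.
    return x - x.mean(axis=0, keepdims=True)


def segment(x, seconds=1):
    """Split (n_channels, n_samples) into (n_segments, n_channels, FS*seconds)."""
    step = FS * seconds
    n = x.shape[1] // step
    return x[:, : n * step].reshape(x.shape[0], n, step).transpose(1, 0, 2)


def imagery_segments(trial, imagery_seconds=10, skip_seconds=1):
    """Keep the last 10 s of a trial and drop its first one-second segment."""
    imagery = trial[:, -imagery_seconds * FS:]
    return segment(imagery)[skip_seconds:]
```

With a 15-second trial (2 s cross, 3 s word, 10 s imagery), `imagery_segments` yields nine one-second segments per trial.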
To obtain the features (a set of representative data from the signal), we extracted the spectral power between 8 and 12 Hz using Welch's method, which is widely used in BCI studies. The selected frequency range is the one where other studies using visual imagery have obtained their best results [16,18].
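As a rough illustration of this feature extraction, the following Python sketch uses `scipy.signal.welch` on a one-second segment. With 250 samples per segment the frequency resolution is 1 Hz, so the 8 to 12 Hz band contains five bins per channel, which agrees with the 5 frequencies × 8 channels reported later. The window length, the default Hann window and the function name `alpha_psd_features` are assumptions.

```python
import numpy as np
from scipy import signal

FS = 250  # sampling rate in Hz


def alpha_psd_features(segment):
    """Return PSD values in 8-12 Hz for a (n_channels, n_samples) segment.

    The result has shape (n_channels, 5) for one-second segments.
    """
    freqs, psd = signal.welch(segment, fs=FS, nperseg=FS, axis=-1)
    # Half-bin margins make the selection robust to floating-point rounding.
    band = (freqs >= 7.5) & (freqs <= 12.5)
    return psd[:, band]
```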
Once we have obtained the features of the signals, we classify them to get the probability vector. This vector indicates the probability that an EEG signal belongs to each of the five categories we have. Once we have this vector, we multiply each probability $P_i$ by its corresponding image $I_i$ to make a linear reconstruction of the image $I'$:
$$I' = \sum_{i} P_i \, I_i \qquad (1)$$
with the probabilities $P_i$ summing to 1, since they are the outputs of a softmax layer. As we can see, the more probable the image, the more weight it will carry in the final reconstruction.
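The probability-weighted reconstruction can be sketched with NumPy as follows. The function name `reconstruct` and the array layout (one reference image per class, stacked along the first axis) are assumptions made for illustration.

```python
import numpy as np


def reconstruct(probs, images):
    """Weighted sum of class images.

    probs:  (n_classes,) probability vector from the classifier.
    images: (n_classes, height, width) one reference image per class.
    """
    probs = np.asarray(probs, dtype=float)
    images = np.asarray(images, dtype=float)
    # Contract the class axis: sum_i probs[i] * images[i].
    return np.tensordot(probs, images, axes=1)
```

A one-hot probability vector returns the corresponding class image unchanged, while a uniform vector returns the pixel-wise mean of all class images.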

Convolutional neural networks
A Convolutional Neural Network (CNN) is a type of feedforward neural network that is widely used for recognising images [26][27][28][29], where its use has provided very good results. These networks have also been used to classify EEG signals with promising results [30][31][32]. In the study [33], CNN networks have also been used to reconstruct images based on visual perception. In this work we have converted the feature vector of the EEG signals, of dimension 40 (the frequency range consists of 5 frequencies and there are 8 channels), into a 5 × 8 matrix, and this matrix is the input of our CNN network (Fig. 4). By transforming the signals into images, we can represent the frequency information of each electrode and, in this way, find spatial patterns in the frequencies.
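A minimal NumPy sketch of this conversion is shown here. The ordering of the 40 values (frequency-major, so that each row holds one frequency and each column one channel) is an assumption, as is the trailing channel axis, which is the layout commonly expected by channels-last image models.

```python
import numpy as np

# Placeholder feature vector: 5 frequencies x 8 channels = 40 values.
features = np.arange(40, dtype=float)

# Reshape into a 5 x 8 "image": rows are frequencies, columns are channels
# (this ordering is assumed, the text does not specify it).
image = features.reshape(5, 8)

# Add batch and channel axes: (batch, height, width, channels).
batch = image[np.newaxis, :, :, np.newaxis]
```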
CNN networks are composed of a set of layers that are called convolutional because they apply the convolution operator to the input. A convolution is a linear operation that involves the multiplication of a set of weights by the input, where basically a filtering operation is done. The 2D convolution operator for an input x and a filter y is given by:
$$(x * y)[i, j] = \sum_{m} \sum_{n} x[i - m, j - n] \, y[m, n]$$
See Fig. 5.
Fig. 4 Transformation of the EEG signals into images for the input of the CNN networks. Here, three different trials are observed. On the x-axis there is the spectral power for the frequencies from 8 to 12 Hz, and on the y-axis there are the channels used
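To make the filtering operation concrete, here is a small NumPy sketch of a "valid" 2D convolution of an input x with a filter y, in which the filter is flipped and slid over the input. It is a didactic loop-based version, not how deep learning libraries implement it, and most of those libraries actually compute a cross-correlation (no flip), which is equivalent when the weights are learned. The function name `conv2d_valid` is an assumption.

```python
import numpy as np


def conv2d_valid(x, y):
    """'Valid' 2D convolution of input x with filter y (no padding)."""
    x = np.asarray(x, dtype=float)
    k = np.flip(np.asarray(y, dtype=float))  # flip the filter on both axes
    kh, kw = k.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the window under the flipped filter.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out
```

For a 5 × 8 input and a 3 × 3 filter, the output has shape 3 × 6.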
The number of filters that must be applied is not predefined, and it is not an easy value to find because, depending on the type of input image and how complex it may be, more or fewer convolutional layers will be needed. It is also important to determine the size of the filters to use.
Pooling is another important layer in CNN networks, which aims to summarise the characteristics of a given region and thus reduces the resolution of the input. Two pooling operators are usually used: maximum, which computes the maximum value of a given window, and average, which computes the average of the values of the window. Once we have applied the set of convolutional and pooling layers, we use a conventional neural network to perform the classification. In our work we have used one where the last layer is a softmax layer that returns a vector of probabilities, whose dimension coincides with the number of classes (5 in our case). The outputs of the softmax function are the probabilities that the network assigns to each class, that is, the probability that the input belongs to that class. See Eqs. 1 and 2.
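The two pooling operators and the softmax output can be illustrated with the following NumPy sketch. Non-overlapping square windows are assumed for pooling, and the function names `pool2d` and `softmax` are chosen here for illustration.

```python
import numpy as np


def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling over size x size windows of a 2D array."""
    x = np.asarray(x, dtype=float)
    h, w = x.shape[0] // size, x.shape[1] // size
    # Group pixels into (h, size, w, size) blocks; extra rows/columns are cut.
    blocks = x[:h * size, :w * size].reshape(h, size, w, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))


def softmax(z):
    """Turn a vector of scores into probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the maximum for numerical stability
    return e / e.sum()
```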
Once we have the model for each subject, we multiply each value of the probability vector by its corresponding image, as seen in Eq. 1. CNN networks were initially designed for image classification; therefore, their inputs are matrices. That is why we have had to adapt the vector obtained in the characterisation stage of the EEG signals into a matrix using a reshape. Our feature vector is formed by 40 elements (5 frequencies for each channel).
CNN networks contain many parameters. Depending on the parameters we use, the network will give a more or less correct answer. The parameters we have considered are the following: the size of the filters to use, the type of function for each layer, and the number of neurons for each layer. To select these parameters we have chosen a genetic algorithm, mainly because such algorithms have already been widely used in studies related to neural networks [34][35][36], providing good results. See the next Section.
Finally, it is worth mentioning that we have built a CNN network for each subject, mainly because finding a generalized pipeline for all subjects does not fall within the scope of this study, although it would be interesting to take into account in future related studies. This point will be discussed in the Discussion Section.

Genetic algorithm
Genetic algorithms are optimisation algorithms, usually used to perform searches. These algorithms are based on Charles Darwin's theory of evolution. According to this theory, the individuals that best adapt to a certain environment are those that are more likely to transmit their genetic information to the following generations and in this way transmit certain genetic traits that are good for a given environment.
Fig. 5 Schematic diagram of the convolutional layer. A 3 × 3 convolutional filter is applied to the input matrix
Based on this concept, genetic algorithms make it possible to search for a solution within a given search space. While these algorithms cannot ensure that we will find the best solution, they tend to converge towards a good solution. This type of search algorithm has certain advantages and disadvantages. As advantages we can find:
• They do not need knowledge about the problem to solve.
• They are highly parallel algorithms.
• They are easily implementable.
And, as disadvantages, we can find the following:
• They may never converge on an optimal solution.
• They can fall into local optima.
• They may need a lot of computational resources.
Chromosome definition: The fundamental concept on which this type of algorithm relies is the chromosome, which is an encoding of a possible solution to the problem. Chromosomes (individuals) will be evaluated, sorted, and part of their information will be transmitted to the next generation.
The main problem when using a GA is to represent the possible solutions as chromosomes, each chromosome encoding a different neural network. We have chosen to encode the topology of the network, where each chromosome represents the structure of a CNN network and is represented by a vector of integer numbers: the first position of the chromosome array indicates how many layers of neurons the structure will have, the numbers that follow indicate the number of neurons in each layer, and then come the activation functions for each layer. In this work, we used four different activation functions: elu, relu, tanh and sigmoid. For a more detailed explanation of each function, the interested reader can consult the Keras library documentation [23].
Population: The number of individuals that we create is a parameter that the user can select. Logically, the more individuals within a population, the larger the portion of the search space that is explored; with fewer individuals, the search is confined to a very narrow area, and this may cause convergence problems towards an optimal solution.
We create a random initial population, where the genes of the chromosomes can only take bounded values. For instance, a gene of a chromosome cannot have a negative value; therefore, some restrictions were imposed, such as that the number of neurons must be in the interval [1,100] and that the number of layers must be between 1 and 15. These are parameters that can be modified depending on the computing capabilities available, as an increase in the number of layers and neurons requires more computing power. Throughout the whole process of the genetic algorithm, we kept the population size constant at 15 chromosomes.
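The integer encoding and the bounded random initialisation can be sketched in Python as follows. This is a hypothetical layout that follows the description in the text (number of layers, then neurons per layer, then one activation index per layer); the helper names `random_chromosome`, `decode` and `random_population`, and the use of list indices to represent activation functions, are assumptions.

```python
import random

ACTIVATIONS = ["elu", "relu", "tanh", "sigmoid"]
MAX_LAYERS = 15    # number of layers between 1 and 15
MAX_NEURONS = 100  # neurons per layer in [1, 100]
POP_SIZE = 15      # constant population size


def random_chromosome(rng):
    """Build [n_layers, neurons..., activation indices...] within bounds."""
    n_layers = rng.randint(1, MAX_LAYERS)
    neurons = [rng.randint(1, MAX_NEURONS) for _ in range(n_layers)]
    activations = [rng.randrange(len(ACTIVATIONS)) for _ in range(n_layers)]
    return [n_layers] + neurons + activations


def decode(chromosome):
    """Return a list of (neurons, activation name) pairs, one per layer."""
    n = chromosome[0]
    return [(chromosome[1 + i], ACTIVATIONS[chromosome[1 + n + i]])
            for i in range(n)]


def random_population(seed=0):
    """Create POP_SIZE random chromosomes from a seeded generator."""
    rng = random.Random(seed)
    return [random_chromosome(rng) for _ in range(POP_SIZE)]
```

For example, `decode([2, 64, 10, 1, 3])` gives `[(64, "relu"), (10, "sigmoid")]`.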
Operators: In the genetic algorithms there are two types of operators to consider. A first operator is crossover and the second one is mutation.
Crossover: The purpose of this operator is to generate new chromosomes from two input chromosomes. Usually, the best chromosomes of a population are selected for the operation. The aim of this operator is to introduce changes in the population while trying to maintain the properties that make a given chromosome better. It is the most difficult operator to define, since neural networks are highly sensitive to changes; that is, adding a neuron or changing a single weight can drastically change the behaviour of the network. We have defined the crossover operator in the following way: given two chromosomes, our operator generates two different children. For the first child, we take from the first chromosome the part that encodes the number of layers and the neurons per layer, whereas the genes that encode the batch, the number of epochs, the loss function, and the optimiser are taken from the second chromosome. For more information about loss and optimiser functions, we refer the interested reader to [23]. For the second child the order is reversed. Figure 6 illustrates this process. To choose which chromosomes will be used for this operation, the roulette method is used, where each chromosome has a probability of being chosen in proportion to its fitness value: the higher the fitness value, the more likely it is to be chosen.
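A minimal Python sketch of roulette selection and of the two-child crossover described above is given here. The chromosome is represented as a dictionary with a hypothetical `"topology"` part and a hypothetical `"training"` part; these key names, the example field names, and the clipping of non-positive fitness values to a tiny positive weight are assumptions, not details from the text.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    # Fitness can be zero or negative (e.g. a negative kappa), so weights
    # are clipped to a tiny positive value; this is an assumption.
    weights = [max(f, 1e-9) for f in fitnesses]
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(parent_a, parent_b):
    """Swap the training genes between two parents to create two children."""
    child_1 = {"topology": list(parent_a["topology"]),
               "training": dict(parent_b["training"])}
    child_2 = {"topology": list(parent_b["topology"]),
               "training": dict(parent_a["training"])}
    return child_1, child_2
```

Because each child takes a complete topology from one parent, the offspring are always structurally valid networks.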
Mutation: The second operator is the mutation operator, which randomly modifies a gene within the chromosome. This operator allows jumps in the search space and thus helps to avoid local minima or maxima. The mutation rate used is 0.2, i.e., there is a twenty percent probability that a chromosome mutates some of its genes. In our representation we introduced a specific restriction when it comes to mutation: we do not allow changes in the number of layers of a neural network, since this would imply deeper structural changes or the disappearance of the chromosome because it would no longer be valid (a genetic incoherence).
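The mutation operator with its restriction can be sketched as follows, assuming the integer layout [n_layers, neurons..., activation indices...]. Mutating exactly one gene per mutated chromosome and the function name `mutate` are assumptions; the text only states that some genes are changed with probability 0.2 and that the number of layers is never modified.

```python
import random

MUTATION_RATE = 0.2
MAX_NEURONS = 100
N_ACTIVATIONS = 4  # elu, relu, tanh, sigmoid


def mutate(chromosome, rng, rate=MUTATION_RATE):
    """Return a copy of the chromosome, possibly with one gene changed."""
    child = list(chromosome)
    if rng.random() >= rate:
        return child  # no mutation for this chromosome
    n_layers = child[0]
    # Position 0 (the number of layers) is never mutated.
    pos = rng.randrange(1, len(child))
    if pos <= n_layers:
        child[pos] = rng.randint(1, MAX_NEURONS)   # neuron-count gene
    else:
        child[pos] = rng.randrange(N_ACTIVATIONS)  # activation gene
    return child
```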
Evaluation: To evaluate how good a chromosome is, we reconstruct the CNN network represented by the chromosome and we test it with a set of EEG data to classify five mental states (imagination of "Tree", "House", "Dog" and "Plane", as well as the relaxation state). As a result, we obtain the classification accuracy, which we multiply by Cohen's Kappa ($\kappa$):
$$\text{fitness} = \text{Accuracy} \cdot \kappa \qquad (4)$$
Fig. 6 Description of the crossover operation

Cohen's Kappa
Cohen's Kappa is a statistical indicator that ranges between -1 and 1 and that is used to assess the classification in various BCI studies [37,38]. We use it to reward those chromosomes that have not only a high accuracy value but also a representative one, which avoids rewarding a chromosome just because it is biased and tends to classify only a single category. The accuracy is defined as the percentage of correct classifications, and Kappa ($\kappa$) is defined as:
$$\kappa = \frac{p_0 - p_e}{1 - p_e}$$
where $p_0$ is the relative observed agreement and $p_e$ is the hypothetical probability of chance agreement. Values greater than 0 mean that the classifier can be considered better than chance. Landis and Koch [37] give an interpretation where a value between 0.0 ∼ 0.20 is considered slight, 0.21 ∼ 0.40 fair, 0.41 ∼ 0.60 moderate, 0.61 ∼ 0.80 substantial and 0.81 ∼ 1 almost perfect agreement.
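As an illustration, accuracy, Cohen's Kappa and the accuracy-times-Kappa fitness can be computed from a confusion matrix with the following NumPy sketch. The function names and the convention that rows are true classes and columns are predicted classes are assumptions; the degenerate case where the chance agreement equals 1 is not handled.

```python
import numpy as np


def accuracy_and_kappa(confusion):
    """Compute accuracy (p0) and Cohen's Kappa from a confusion matrix."""
    c = np.asarray(confusion, dtype=float)
    total = c.sum()
    p0 = np.trace(c) / total  # observed agreement = accuracy
    # Chance agreement from the row (true) and column (predicted) marginals.
    pe = np.sum(c.sum(axis=1) * c.sum(axis=0)) / total ** 2
    return p0, (p0 - pe) / (1.0 - pe)


def fitness(confusion):
    """Fitness of a chromosome: accuracy multiplied by Kappa."""
    acc, kappa = accuracy_and_kappa(confusion)
    return acc * kappa
```

A classifier that always predicts the same class on five balanced classes reaches 20% accuracy but a Kappa of 0, so its fitness is 0, which is the bias the Kappa term is meant to penalise.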
During the process of generation and evolution of chromosomes, we save the best one in a variable to create the model of the CNN network. Once we have got the model, we assess its efficacy using a set of test data which has not been previously used in the training stage. The obtained results can be seen in Table 1 and are explained in the next section.

Results and discussion
The results have been obtained offline, executing the genetic algorithm to find the best CNN network. To find the best convolutional network, the PSD is extracted in the frequency range described in the Preprocessing section (this forms part of the features used for our EEG signals) and 15 chromosomes are generated that encode different topologies of convolutional networks. Then, the fitness function (Eq. 4) is calculated for each chromosome. The network is trained with a training set and tested with another, disjoint test set, and the resulting value is the one that we use as fitness. Once the fitness value of each chromosome is calculated, we apply an elite selection for each generation, keeping the three best chromosomes, and we iterate again. The whole process is repeated until we reach 25 generations, and we keep the chromosome with the best fitness of the last generation.
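The overall search loop can be sketched in Python as shown here. This is a generic elitist genetic algorithm under stated assumptions, not the authors' code: how the non-elite part of each generation is filled (roulette selection, crossover, then mutation) is an assumption, and the toy fitness, crossover and mutation functions only stand in for training and evaluating a CNN.

```python
import random


def evolve(population, fitness_fn, crossover_fn, mutate_fn,
           generations=25, n_elite=3, seed=0):
    """Run an elitist GA and return the fittest chromosome at the end."""
    rng = random.Random(seed)
    population = [list(c) for c in population]
    size = len(population)
    for _ in range(generations):
        scores = [fitness_fn(c) for c in population]
        order = sorted(range(size), key=lambda i: scores[i], reverse=True)
        # Elitism: the best chromosomes pass unchanged to the next generation.
        next_gen = [list(population[i]) for i in order[:n_elite]]
        weights = [max(s, 1e-9) for s in scores]  # roulette weights
        while len(next_gen) < size:
            a, b = rng.choices(population, weights=weights, k=2)
            for child in crossover_fn(a, b):
                if len(next_gen) < size:
                    next_gen.append(mutate_fn(child, rng))
        population = next_gen
    return max(population, key=fitness_fn)


# Toy stand-ins (hypothetical): chromosomes are lists of small integers.
def toy_fitness(c):
    return 1.0 / (1.0 + abs(sum(c) - 42))


def toy_crossover(a, b):
    cut = len(a) // 2
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def toy_mutate(c, rng, rate=0.2):
    c = list(c)
    if rng.random() < rate:
        c[rng.randrange(len(c))] = rng.randint(0, 20)
    return c
```

Because the elite is copied unchanged, the best fitness in the population can never decrease from one generation to the next.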
The results of this process can be observed in Table 1, where the accuracy obtained and the Kappa value of the classification of the five mental states are reported.
In Fig. 7 we can see the topographic maps of user B, who is the one that obtains the best classification results. In the figure we can see the differences in the distribution of the frequency power, in the range of 5 to 12 Hz, observing differences between the different tasks in the PO3, PO8, POz and Oz electrodes.

From the topographic map, we can observe the most difficult tasks to identify are relaxation and "house", as very similar maps are obtained. The same happens with "plane" and "dog". It is useful to determine which objects should be used together and which ones should not, if our goal is to create BCI systems based on visual imagination. For this reason, we should use images that have different frequency distributions. Figure 8 shows the average of the evolution of the value of the fitness function in relation to the iteration (generation) of the genetic algorithm, as the algorithm aims to maximise the fitness function.
To assess the quality of the reconstructions of the system, we have decided to use a Chebychev metric [39,40], Equation 6, to compare the output image (reconstruction) with the real image, which represents the word the subject had to imagine:
$$d(x, y) = \max_{i} |x_i - y_i| \qquad (6)$$
where x and y are the vectors that represent the images. This metric can be derived from the Euclidean distance, but it has a slower processing time. Before applying the Chebychev metric, we have normalised the images to the [0, 1] range. If we obtain a value close to 0, it means the images are very similar. However, if the value is close to 1, it means the images are very different.
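The image comparison can be sketched with NumPy as follows. Min-max scaling is assumed for the normalisation to [0, 1], since the text does not say how the images were normalised, and the function names are chosen for illustration.

```python
import numpy as np


def normalise(img):
    """Scale an image to the [0, 1] range (min-max scaling, assumed)."""
    img = np.asarray(img, dtype=float)
    span = img.max() - img.min()
    if span == 0:
        return np.zeros_like(img)  # constant image: map everything to 0
    return (img - img.min()) / span


def chebychev_distance(a, b):
    """Largest absolute pixel difference between two normalised images."""
    x = normalise(a).ravel()
    y = normalise(b).ravel()
    return float(np.max(np.abs(x - y)))
```

Identical images give a distance of 0, while an image and its inverse give a distance of 1.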
In Table 2 we can see the results obtained when applying the metric, shown in Eq. 6, to the reconstructed images. In an ideal model, with infinite tests, we would have got a value of 1 for the random distance, but in practice, for a finite number of tests, this is not the case [41]. To be certain that we are reconstructing correctly, we have calculated the real value of a random selection. In order to do that, we have calculated 100 times the distances that would be obtained if these measurements were completely random, and we have averaged the result for all subjects. The obtained result is 0.70. We have taken this value as a reference to assess the system performance with each subject. As we can see, subject A obtains a result of 0.63, meaning the system does not work, since the metric is close to the random value. However, subject B obtains a result of 0.16, which is a value close to 0 and distant from 0.70. Therefore, the system reconstructs correctly with the data of this subject. The average of the four subjects is 0.30. This value is 57% below the chance level. Therefore, we can say that, in general, the system reconstructs images correctly. In the confusion matrix in Table 3, we can observe how the classifier behaves. As we can see, the matrix is practically diagonal, showing it has good predictive results, with an average of 60% success, although the class "plane" cannot be classified correctly because it is confused with the class "dog".
Fig. 9 Results of diverse image reconstructions
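The 57% figure is consistent with the relative reduction of the average distance with respect to the random reference. The following worked calculation is one reading of how it is obtained from the reported values (0.70 for chance, 0.30 on average, 0.63 for subject A and 0.16 for subject B); the text does not state the formula explicitly.

```latex
\[
\frac{d_{\text{chance}} - d_{\text{avg}}}{d_{\text{chance}}}
  = \frac{0.70 - 0.30}{0.70} \approx 0.57
\]
\[
\text{Subject A: } \frac{0.70 - 0.63}{0.70} = 0.10,
\qquad
\text{Subject B: } \frac{0.70 - 0.16}{0.70} \approx 0.77
\]
```

Under this reading, subject A is only 10% below chance, whereas subject B is about 77% below it.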
In Fig. 9 we can see some results of the system, where we can see some examples of good reconstruction, such as the case of the dog or the tree, and cases where the reconstruction is bad, as in the case of the house.
As we can see, the results obtained are satisfactory for a classification of five mental states, since the accuracy is high and the kappa value indicates that the results obtained are above chance. It should also be noted that, with these results, the utility of using convolutional networks for the classification and subsequent reconstruction of imagined images has been shown in practice, and that the genetic algorithm proves to be a very useful tool when looking for the best topology to train a neural network.

Conclusions
This work has investigated the possibility of reconstructing visual imagery from EEG signals using CNN networks, demonstrating that these networks are very useful as they have an adequate classification rate. Five different mental states have been used and an accuracy of 60% has been obtained, with a level of reconstruction that is above chance. This shows that this technique can be used in the future for imagination-based BCI systems. This work also provides an initial assessment of the possibility of classifying visual imagery and using this cognitive ability in BCI systems.