EXTENDING THE ROUGHNESS OF THE DATA VIA TRANSITIVE CLOSURES OF SIMILARITY INDEXES

One main assumption in the theory of rough sets applied to information tables is that elements exhibiting the same information are indiscernible (similar) and form blocks that can be understood as elementary granules of knowledge about the universe. We propose a variant of this concept that defines a measure of similarity between the elements of the universe, so that two objects can be considered indiscernible even though they do not share all their attribute values, because the knowledge is partial or uncertain. The set of similarities defines the matrix of a fuzzy relation that satisfies reflexivity and symmetry but not transitivity, so a partition of the universe is not attained. This problem can be solved by calculating its transitive closure, which ensures a partition for each level in the unit interval [0,1]. This procedure generalizes the theory of rough sets depending on the minimum level of similarity accepted. This new point of view increases the rough character of the data because it enlarges the sets of indiscernible objects. Finally, we apply our results to a fictitious example in order to highlight the differences and improvements between this methodology and the classical one.

Following the judgements and ideas of several authors in the social sciences, it is widely accepted that rough set theory has some advantages in economic forecasting research over traditional mathematical tools such as mathematical functions or statistical models. This theory does not need any external information because it works with the original data; it can analyze qualitative attributes; the information is given in the natural language of decision rules; and the results are easy to understand without requiring an interpretation of technical parameters, as is the case with credit scoring, utility functions or outranking relations.
Knowledge is usually described by qualitative valuations in an information table. This methodology has been extensively applied in economic fields. Interesting results have been reported in business failure prediction ([4], [5], [19], [20]), database marketing ([6], [11], [15]) and financial investment ([2]). To summarize, in models intended to predict bankruptcy ([9]), the elements (or objects) are the enterprises, the conditional attributes can be the Current Ratio and the Net Income, and the decision attribute is whether or not there is bankruptcy. In applications to sightseeing expenditures ([1]), the objects are guests and the attributes are characteristics of the hotel reservation (room type, number of nights stayed, payment method, computerized reservation system, and so on).
Knowledge and perception of the elements of the universe form the basis of the definition of a set in rough set theory. From this point of view, two different elements may be seen as the same, and hence be indiscernible, which happens when all their attribute values coincide. In other terms, the similarity between the objects is one. This is a consequence of the fact that objects are represented only by their attribute values. What we propose is to soften this condition by allowing elements with similarity smaller than one to be indiscernible. This new approach yields different partitions depending on the level of similarity accepted by the experts.

SIMILARITY INDEXES
In the methodology that we propose it is essential to determine how similar the objects are. The theory of rough sets includes this point of view in the sense that, if all the attribute values that define two objects are identical, then the similarity between them is one. Many ways of calculating similarity have been proposed, discussed, analyzed and used. Selecting an appropriate index of similarity is fundamental to achieving suitable results.
The literature on this topic suggests different kinds of similarity measures according to the type of scale used in measuring the data. Fuzzy data belong to the unit interval, so a model based on a metric space can be taken into account. Associating an m-dimensional vector with each object and calculating their similarity by means of a decreasing function $f(d)$ of their normalized distance $d$ seems a very logical procedure that provides a mathematical tool with appropriate geometrical properties.
On the other hand, for binary data the vectors associated with each object are sequences of 0s and 1s. Under these conditions a set-theoretical model based on the cardinality of the common and distinctive features is preferable. Some well-known similarity measures are the simple matching coefficient, the Jaccard coefficient and Rao's coefficient. In our context we deal with fuzzy data, which can be interpreted as a generalization of binary data. Several fuzzy similarity measures have been derived from those for binary data simply by generalizing the cardinality as the sum of the values of the membership function and making some crisp simplifications. To deal with the fuzzy character of the data, a definition of fuzzy similarity measure that is weaker than the metric space model is usually adopted in applications: $s: U \times U \to [0,1]$ is a fuzzy similarity measure if and only if $s(x, y) = s(y, x)$ for any pair of elements. The reflexive property ($s(x, x) = 1$) is strongly recommended and is used in almost all theoretical and applied papers. Fuzzy relations of this kind are very important in fuzzy clustering of fuzzy relational data, which is not surprising given the closeness between the concepts of clustering and similarity. We will refer to all of them as fuzzy similarity indexes ([10]). These indexes define a symmetric square similarity matrix with ones on the main diagonal, which is therefore the matrix of a fuzzy proximity relation.
For applications we have chosen the fuzzy simple matching coefficient because it is the generalization of the simple matching coefficient for binary data and is associated with the normalized Hamming distance (the normalized $L^1$-distance in functional analysis), so it fits both approaches described above. Moreover, it is very often used in fuzzy economic applications. For two objects $x = (x_1, \dots, x_m)$ and $y = (y_1, \dots, y_m)$, this index is defined as follows:
$$s(x, y) = 1 - \frac{1}{m} \sum_{k=1}^{m} |x_k - y_k|$$
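As an illustration, the following minimal sketch computes a similarity matrix with the fuzzy simple matching coefficient, taken here as one minus the normalized Hamming distance between two vectors of membership values. The function names and the three sample objects are hypothetical and serve only to show the calculation.

```python
from typing import List, Sequence


def fuzzy_simple_matching(x: Sequence[float], y: Sequence[float]) -> float:
    """Return 1 minus the normalized Hamming distance between two fuzzy vectors."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def similarity_matrix(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Build the square matrix of pairwise similarities between all objects."""
    return [[fuzzy_simple_matching(u, v) for v in rows] for u in rows]


# Hypothetical valuations of three objects on three attributes (not from the paper).
objects = [
    [1.0, 0.5, 0.0],
    [1.0, 0.5, 0.0],
    [0.5, 0.5, 0.0],
]
R = similarity_matrix(objects)
```

The resulting matrix is symmetric with ones on the main diagonal, as expected for a fuzzy proximity relation; the first two objects, which share all their values, have similarity one.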

ROUGH SETS DEPENDING ON A LEVEL OF SIMILARITY
One main assumption in the theory of rough sets applied to information tables is that elements exhibiting the same information are indiscernible (similar) and form blocks that can be understood as elementary granules of knowledge about the universe. In fact, from a mathematical point of view, two elements of the universe are related if their values for all attributes are the same. This relation is an equivalence relation called the indiscernibility relation. Equivalence classes of the indiscernibility relation are referred to as elementary sets.
Following this idea, we can interpret that, under certain conditions of imprecision in the language or in the collection of data, two elements can be regarded as similar if their similarity is greater than a certain value belonging to the unit interval $[0,1]$, and not only if their similarity is equal to one.

VALUATION OF THE INFORMATION TABLE
In order to find the similarities between the elements of the universe, we need to establish a valuation for all the entries of the information table. That means defining a function $\nu$ that assigns to each attribute value a number in $[0,1]$. Obviously, $\nu$ has to be a "monotone" function of the attribute values defined by an expert; namely, if the categorical values are ordered in an increasing sequence (for instance: no, very low, low, regular, high, very high and yes), then the numerical values have to be an increasing list of numbers between 0 and 1. This valuation depends on the experts, but since it is applied to the entire table, differences in criteria matter little. Substituting numbers between 0 and 1 for the attribute values defines a fuzzy subset for each element of the universe. All the values together define the matrix of a fuzzy relation between the universe and the set of attributes. From now on we identify elements with their fuzzy subsets defined by the valuation. Relating two elements whenever their similarity reaches a given level seems a logical procedure for our objective. Even though this crisp relation is reflexive and symmetric, it is unfortunately not transitive, so it is not an equivalence relation. Therefore we cannot work directly with the primary similarities between elements, because the elementary sets might not form a partition.
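A minimal sketch of such a valuation is given below. It assumes, purely for illustration, that the ordered categorical values are mapped to evenly spaced numbers in the unit interval; in practice the numbers are chosen by an expert, and any increasing assignment would satisfy the monotonicity requirement. The function names and the small table are hypothetical.

```python
from typing import Dict, List

# Ordered categorical scale taken from the text, from lowest to highest.
SCALE = ["no", "very low", "low", "regular", "high", "very high", "yes"]


def evenly_spaced_valuation(ordered_labels: List[str]) -> Dict[str, float]:
    """Map ordered labels to increasing numbers in [0, 1] (assumption: equal spacing)."""
    n = len(ordered_labels)
    if n < 2:
        raise ValueError("at least two labels are needed")
    return {label: k / (n - 1) for k, label in enumerate(ordered_labels)}


def valuate_table(table: List[List[str]], valuation: Dict[str, float]) -> List[List[float]]:
    """Replace every categorical entry of the information table by its number."""
    return [[valuation[value] for value in row] for row in table]


nu = evenly_spaced_valuation(SCALE)

# Hypothetical information table: two objects described by three attributes.
table = [
    ["yes", "low", "no"],
    ["high", "low", "no"],
]
fuzzy_rows = valuate_table(table, nu)
```

Each row of `fuzzy_rows` is the fuzzy subset associated with one element of the universe.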

Let $R = (r_{ij})$ be the similarity matrix defined by the selected similarity index $s$, so $r_{ij} = s(x_i, x_j)$. This relation is a fuzzy proximity relation on the universe of objects, but not a fuzzy equivalence relation, because it does not verify transitivity. In order to achieve a fuzzy equivalence relation we calculate its max-min transitive closure. This strategy depends on the selection of the fuzzy similarity index and allows finding a partition of the universe depending on the level of similarity considered. A very important theorem proves that the partition obtained from the transitive closure is the same as the one obtained with the hierarchical single linkage method and with the fuzzy connected components of the fuzzy graph defined by the matrix ([10]). The transitive closure of $R$ is defined by
$$\hat{R} = \bigcup_{k \ge 1} R^k, \qquad R^k = R^{k-1} \circ R, \qquad (R \circ S)_{ij} = \max_k \min\{ r_{ik}, s_{kj} \}.$$
Since $R$ is reflexive, $R \subseteq R^2 \subseteq \dots$, and it is easy to prove that $\hat{R} = R^m$, where $m$ is the minimum value that verifies $R^m = R^{m+1}$. From this point of view, for each level $\alpha \in [0,1]$, two elements $x_i$ and $x_j$ are indiscernible at level $\alpha$ if and only if $\hat{r}_{ij} \ge \alpha$, and the corresponding equivalence classes are the elementary sets at level $\alpha$. Notice that, in applications, an expert must define the minimum accepted level for considering two elements as similar, in order to select the appropriate partition. We will call this parameter the lower threshold of similarity.
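The max-min transitive closure can be sketched as follows: the relation is composed with itself under max-min composition until two successive powers coincide. This is a minimal illustration that assumes a reflexive relation given as a list of lists; the function names and the 3×3 matrix are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def max_min_compose(a: Matrix, b: Matrix) -> Matrix:
    """Max-min composition: entry (i, j) is the max over k of min(a[i][k], b[k][j])."""
    n = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(r: Matrix) -> Matrix:
    """Return the first power of r that equals the next one (r must be reflexive)."""
    n = len(r)
    if any(r[i][i] != 1.0 for i in range(n)):
        raise ValueError("this sketch assumes a reflexive relation")
    power = [row[:] for row in r]
    for _ in range(n):  # for a reflexive n x n relation the powers stabilize within n steps
        nxt = max_min_compose(power, r)
        if nxt == power:
            break
        power = nxt
    return power


# Hypothetical proximity relation: reflexive and symmetric, but not transitive,
# because r[0][2] = 0.2 is smaller than min(r[0][1], r[1][2]) = 0.6.
R = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.6],
    [0.2, 0.6, 1.0],
]
closure = transitive_closure(R)
```

In this small case the closure raises the similarity between the first and third elements from 0.2 to 0.6, which is the strength of the chain passing through the second element.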
To summarize, the first step is to assign to each entry of the information table a number in the unit interval. The second step is to select an index of similarity that allows calculating the similarities between the elements of the universe. These similarities define a fuzzy proximity relation. After that, we calculate its transitive closure, which provides a partition of the universe for each level α in the unit interval. Once the lower threshold of similarity is fixed, we obtain a unique partition (its elements are the equivalence classes). It is worth remarking that when the lower threshold of similarity is equal to one we obtain the same partition (or a less fine one) as with the usual theory of rough sets, which means that this new approach includes the classical one.
We could define in a similar way other concepts concerning the reduction of attributes: reduct at level α, reduct D-α-indispensable, attribute D-α-dispensable and attribute D-dispensable for an element of the universe. Future research will study these concepts in depth.
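The partition at a given threshold can also be obtained without computing the closure explicitly, since the classes of the transitive closure at level α coincide with the connected components of the graph that joins two elements whenever their similarity is at least α. The sketch below uses a simple union-find structure for this purpose; the function name, the sample matrix and the thresholds are hypothetical.

```python
from typing import Dict, List


def partition_at_level(r: List[List[float]], alpha: float) -> List[List[int]]:
    """Group indices into connected components of the graph with edges r[i][j] >= alpha."""
    n = len(r)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the representative, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if r[i][j] >= alpha:
                parent[find(i)] = find(j)

    blocks: Dict[int, List[int]] = {}
    for i in range(n):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())


# Hypothetical similarity matrix of three objects (not from the paper).
R = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.6],
    [0.2, 0.6, 1.0],
]
classical = partition_at_level(R, 1.0)  # only identical objects are merged
relaxed = partition_at_level(R, 0.8)    # objects 0 and 1 become indiscernible
coarse = partition_at_level(R, 0.6)     # all three objects fall into one class
```

Lowering the threshold can only merge classes, never split them, which is how the approach increases the roughness of the data.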

EXAMPLE
Finally, we apply our results to a fictitious example in order to highlight the differences and improvements between this new methodology and the usual one, that is, with and without the introduction of the similarities between the elements of the universe, the calculation of the transitive closure of the similarity matrix, the selection of the lower threshold of similarity, and the generalization of the main concepts of rough sets.
Suppose we are given some data about six economic subjects (companies, guests…), so $U = \{x_1, \dots, x_6\}$ and therefore: [...] which means flu, we obtain the lower approximation, upper approximation, boundary region and accuracy following definitions (4), (5), (6) and (8). B-lower approximation at threshold α: [...] Several methods are available to find the transitive closure; the best known is the power method, which consists of computing successive powers of the matrix that defines the fuzzy relation until they stabilize. Computing the powers of the previous matrix, we obtain the elementary sets at different levels, as defined in (3).
The values of the data are shown in the information table in Table 1, and the corresponding valuation in Table 2.
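To illustrate how the approximations depend on the partition, the following sketch applies the classical Pawlak definitions (lower approximation, upper approximation, boundary region and accuracy) to the elementary sets obtained at two thresholds. The partitions and the target set are hypothetical and do not reproduce Tables 1 and 2; the convention that an empty upper approximation has accuracy one is also an assumption.

```python
from typing import List, Set, Tuple


def approximations(
    blocks: List[Set[int]], target: Set[int]
) -> Tuple[Set[int], Set[int], Set[int], float]:
    """Return (lower, upper, boundary, accuracy) of target for a given partition."""
    lower: Set[int] = set()
    upper: Set[int] = set()
    for block in blocks:
        if block <= target:   # block entirely inside the target set
            lower |= block
        if block & target:    # block shares at least one element with the target set
            upper |= block
    boundary = upper - lower
    accuracy = len(lower) / len(upper) if upper else 1.0  # assumed convention
    return lower, upper, boundary, accuracy


# Hypothetical universe of six objects, indexed 0..5, and a hypothetical target set.
target = {0, 1, 3}

# Hypothetical elementary sets at a higher and at a lower threshold of similarity.
fine_blocks = [{0, 1}, {2}, {3, 4}, {5}]
coarse_blocks = [{0, 1, 2}, {3, 4}, {5}]

fine_result = approximations(fine_blocks, target)
coarse_result = approximations(coarse_blocks, target)
```

With the finer partition the lower approximation is {0, 1} and the accuracy is 0.5; with the coarser partition the lower approximation is empty and the accuracy drops to 0, which illustrates how a smaller threshold increases the rough character of the data.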