Symbolic Data Analysis Workshop 2025
Varaždin, 9 - 11 June 2025
The workshop will be organized in cooperation with the 27th International Scientific Symposium on Biometrics, BIOSTAT 2025.
About Symbolic Data Analysis
What is Symbolic Data Analysis?
Increasingly, datasets are so large they must be summarized in some fashion so that the resulting summary dataset is of a more manageable size, while still retaining as much knowledge inherent to the entire dataset as possible. One consequence of this situation is that the data may no longer be formatted as single values such as is the case for classical data, but rather may be represented by lists, intervals, distributions, and the like. These summarized data are examples of symbolic data. ... It quickly becomes clear that the range of methodologies available draws analogies with developments before 1900 that formed a foundation for the inferential statistics of the 1900s, methods largely limited to small (by comparison) datasets and classical data formats. The scarcity of available methodologies for symbolic data also becomes clear and so draws attention to an enormous need for the development of a vast catalog (so to speak) of new symbolic methodologies along with rigorous mathematical and statistical foundational work for these methods.
Billard, L., Diday, E. (2003). From the statistics of data to the statistics of knowledge. Journal of the American Statistical Association, 98 (462), 470–487. doi: 10.1198/016214503000242
Symbolic Data Analysis (SDA) provides a framework for the representation and analysis of data that comprehends inherent variability. While in Data Mining and classical Statistics the data to be analyzed usually presents one single value for each variable, that is no longer the case when the entities under analysis are not single elements, but groups gathered on the basis of some given criteria. Then, for each variable, variability inherent to each group should be taken into account. Also, when analysing concepts, such as botanic species, disease descriptions, car models, and so on, data entail intrinsic variability, which should be explicitly considered. To this purpose, new variable types have been introduced, whose realizations are not single real values or categories, but sets, intervals, or, more generally, distributions over a given domain. SDA provides methods for the (multivariate) analysis of such data, where the variability expressed in the data representation is taken into account, using various approaches.
Brito, P. (2014). Symbolic data analysis: Another look at the interaction of data mining and statistics. WIREs Data Mining and Knowledge Discovery, 4 (4), 281–295. doi: 10.1002/widm.1133
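As an illustration of the idea described above, the following minimal Python sketch shows how classical single-valued records might be aggregated into symbolic descriptions, one per group. The records, the function name to_symbolic and the dictionary keys are hypothetical, chosen only for this example; they are not taken from the cited papers or from any SDA software package.

```python
from collections import Counter, defaultdict

# Hypothetical classical data: one single value per variable and per individual.
# Each row is (group, petal_length_cm, colour).
records = [
    ("A", 1.4, "white"),
    ("A", 1.7, "white"),
    ("A", 1.5, "pink"),
    ("B", 4.5, "blue"),
    ("B", 5.1, "blue"),
]


def to_symbolic(rows):
    """Aggregate single-valued rows into one symbolic description per group."""
    groups = defaultdict(list)
    for group, length, colour in rows:
        groups[group].append((length, colour))

    symbolic = {}
    for group, members in groups.items():
        lengths = [m[0] for m in members]
        colours = [m[1] for m in members]
        counts = Counter(colours)
        symbolic[group] = {
            # Interval-valued variable: observed range within the group.
            "length_interval": (min(lengths), max(lengths)),
            # Set-valued variable: categories observed within the group.
            "colour_set": set(colours),
            # Distribution-valued variable: relative frequency of each category.
            "colour_distribution": {c: n / len(colours) for c, n in counts.items()},
        }
    return symbolic


symbolic_table = to_symbolic(records)
for group, description in symbolic_table.items():
    print(group, description)
```

Each group is now described by an interval, a set and a distribution instead of a list of single values, so the variability within the group remains part of the data. Using the minimum and maximum as interval bounds is only one possible choice; quantile-based bounds would be less sensitive to outliers.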