Abstract of publication number 11038

This paper presents a preliminary investigation of methods for extracting semantic views of text contents in the form of structured sets of words, going beyond standard statistical indexing. The aim is to build fuzzily weighted, structured images of the semantic contents of a text. A first step consists in identifying the different types of relations (is-a, part-of, related-to, synonymy, domain, glossary relations) that hold between the words of a text, using a general ontology such as WordNet. Taking advantage of these relations, different types of fuzzy clusters of words can then be built. Moreover, apart from its frequency of occurrence, the importance of a word may also be evaluated through an estimate of its specificity. A degree of "centrality" is also computed for each word in a cluster. The size of the clusters and the frequency, specificity, and centrality of their words are indications that make it possible to build a fuzzy set of sets of words that progressively "emerge" from a text as being representative of its contents. The ideas advocated in the paper and their potential usefulness are illustrated on a running example and on two experiments. A better representation of the semantic contents of texts is expected to help, in particular, to indicate to a potential reader what a text is about.
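
The pipeline outlined above can be illustrated with a minimal Python sketch. It is not the paper's method: the relation table `RELATIONS` and the depth table `DEPTH` are hypothetical toy stand-ins for an ontology lookup such as WordNet, specificity is assumed to be approximated by depth in a hierarchy, centrality is assumed to be the share of cluster members a word is directly related to, and the three indications are combined by a plain average, which is only one possible aggregation.

```python
from collections import Counter

# Hypothetical toy relations standing in for an ontology such as WordNet.
RELATIONS = {
    ("car", "vehicle"): "is-a",
    ("wheel", "car"): "part-of",
    ("car", "automobile"): "synonymy",
    ("bank", "money"): "related-to",
}

# Hypothetical hierarchy depths, used here as a proxy for specificity.
DEPTH = {"vehicle": 1, "car": 2, "automobile": 2, "wheel": 3, "bank": 2, "money": 1}


def related(a, b):
    """True if the toy table links the two words, in either direction."""
    return (a, b) in RELATIONS or (b, a) in RELATIONS


def clusters(words):
    """Group words into connected components of the relation graph."""
    remaining = set(words)
    result = []
    while remaining:
        seed = remaining.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            linked = {w for w in remaining if related(current, w)}
            remaining -= linked
            component |= linked
            frontier.extend(linked)
        result.append(component)
    return result


def fuzzy_cluster(component, counts):
    """Attach a membership degree in [0, 1] to each word of a cluster."""
    max_count = max(counts.values())
    max_depth = max(DEPTH.values())
    weights = {}
    for word in component:
        frequency = counts[word] / max_count
        specificity = DEPTH.get(word, 0) / max_depth
        others = component - {word}
        # Assumed centrality: share of the other members directly related to the word.
        centrality = sum(related(word, o) for o in others) / len(others) if others else 0.0
        # Assumed aggregation: arithmetic mean of the three indications.
        weights[word] = round((frequency + specificity + centrality) / 3, 3)
    return weights


tokens = "car wheel car vehicle automobile bank money car".split()
counts = Counter(tokens)
# Larger clusters are listed first, since cluster size is itself an indication.
for component in sorted(clusters(counts), key=len, reverse=True):
    print(fuzzy_cluster(component, counts))
```

In this toy text, "car" is frequent and linked to every other word of its cluster, so it receives a higher degree than "wheel", which is more specific but peripheral.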