Search Results

Now showing 1 - 7 of 7
  • Data mining process models: a roadmap for knowledge discovery
    Publication . Mendes, Armando B.; Cavique, Luís; Santos, Jorge M. A.
    Extracting knowledge from data is the major objective of any data analysis process, including those developed in fields such as statistics and quantitative methods, databases and data warehousing, and data mining. Of these disciplines, data mining is the most ambitious because it aims to analyse and extract knowledge from massive, often badly structured data with many specific objectives. It is also applied to relational database data, network data, text data, log file data, and data in many other forms. It is therefore no surprise that a myriad of applications and methodologies have been, and are being, developed and applied for data analysis functions; CRISP-DM (cross-industry standard process for data mining) and SEMMA (sample, explore, modify, model, assess) are two examples. The need for a roadmap is widely recognised in the field, and almost every software company has established its own process model.
  • A feature selection approach in the study of Azorean proverbs
    Publication . Cavique, Luís; Mendes, Armando B.; Funk, Matthias; Santos, Jorge M. A.
    A paremiological (proverb study) case is presented as part of a wider project based on data collected among the Azorean population. Given the considerable distance between the Azores islands, we hypothesise that there are significant differences in the proverbs from each island, thus permitting the identification of the native island of the interviewee based on his or her knowledge of proverbs. In this chapter, a feature selection algorithm that combines Rough Sets and the Logical Analysis of Data (LAD) is presented. The algorithm, named LAID (Logical Analysis of Inconsistent Data), deals with noisy data, and we believe that an important link was established between the two different schools with similar approaches. The algorithm was applied to a real-world dataset collected through thousands of interviews with Azoreans, involving an initial set of twenty-two thousand Portuguese proverbs.
  • Clique communities in social networks
    Publication . Cavique, Luís; Mendes, Armando B.; Santos, Jorge M. A.
    Given the large amount of data provided by the Web 2.0, there is a pressing need for new metrics to better understand network structure: how communities are organized and how they evolve over time. Complex network and graph mining metrics are essentially based on low-complexity computational procedures, such as the diameter of the graph, the clustering coefficient and the degree distribution of the nodes. The connected communities in social networks have essentially been studied in two contexts: global metrics, such as the clustering coefficient, and node groups, such as graph partitions and clique communities.
  • An algorithm to discover the k-clique cover in networks
    Publication . Cavique, Luís; Mendes, Armando B.; Santos, Jorge M. A.
    In social network analysis, a k-clique is a relaxed clique, i.e., a k-clique is a quasi-complete sub-graph. A k-clique in a graph is a sub-graph where the distance between any two vertices is no greater than k. A graph with a small number of vertices can be visualized easily. However, as the number of vertices and edges increases, the visualization becomes incomprehensible. In this paper, we propose a new graph mining approach based on k-cliques. The concept of relaxed clique is extended to the whole graph, to achieve a general view, by covering the network with k-cliques. The sequence of k-clique covers is presented, combining small world concepts with community structure components. Computational results and examples are presented.
  • Introduction to data envelopment analysis
    Publication . Santos, Jorge M. A.; Negas, Elsa Rosário; Cavique, Luís
    This chapter introduces the basics of data envelopment analysis techniques, with a short historical introduction and examples of the constant returns to scale (CRS) model and the variable returns to scale (VRS) model. The ratio models are linearized, and primal and dual models are presented for both orientations.
  • Multiplier adjustment in data envelopment analysis
    Publication . Santos, Jorge M. A.; Cavique, Luís; Mendes, Armando B.
    Weight restriction is a well-known technique in the DEA field. When such restrictions are applied, weights cluster around their new limits, making the evaluation dependent on the levels of those limits. This paper introduces a new approach to weight adjustment by Goal Programming techniques, avoiding the imposition of hard restrictions that can even lead to infeasibility. This method results in models that are more flexible.
  • An algorithm to condense social networks and identify brokers
    Publication . Cavique, Luís; Marques, Nuno C.; Santos, Jorge M. A.
    In social network analysis, the identification of communities and the discovery of brokers are important issues. Community detection typically uses partition techniques. In this work, the information extracted from social networks goes beyond cohesive groups, enabling the discovery of brokers that interact between communities. The partition is found using a set covering formulation, which allows the identification of actors that link two or more dense groups. Our algorithm returns the information needed to create a good visualization of large networks, using a condensed graph with the identification of the brokers.
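
The k-clique definition quoted in the listing above (a sub-graph in which the distance between any two vertices is no greater than k) can be illustrated with a short check. This is only a minimal sketch, not the authors' cover algorithm: it assumes an undirected graph stored as an adjacency dictionary, it assumes distances are measured along paths in the whole graph, and the names `bfs_distances` and `is_k_clique` are hypothetical helpers introduced here for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return shortest-path distances (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_k_clique(adj, vertices, k):
    """Check that every pair in `vertices` is at distance <= k.

    Assumption: distance is measured in the whole graph `adj`,
    not only inside the induced sub-graph.
    """
    vertices = list(vertices)
    for u in vertices:
        dist = bfs_distances(adj, u)
        for v in vertices:
            # Unreachable vertices count as infinitely far apart.
            if dist.get(v, float("inf")) > k:
                return False
    return True


# Hypothetical example: the path graph a - b - c - d.
path = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}

print(is_k_clique(path, {"a", "b", "c"}, 2))       # True: a and c are 2 apart
print(is_k_clique(path, {"a", "b", "c"}, 1))       # False: a and c are not adjacent
print(is_k_clique(path, {"a", "b", "c", "d"}, 3))  # True: a and d are 3 apart
```

With k = 1 the check reduces to the ordinary clique test, which is why larger values of k are described as relaxing the clique.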