Browsing by Author "Mendes, Armando B."
Now showing 1 - 10 of 18
- An algorithm to discover the k-clique cover in networks
  Publication. Cavique, Luís; Mendes, Armando B.; Santos, Jorge M. A.
  In social network analysis, a k-clique is a relaxed clique, i.e., a quasi-complete sub-graph. A k-clique in a graph is a sub-graph where the distance between any two vertices is no greater than k. A small number of vertices can easily be visualized in a graph; however, as the number of vertices and edges increases, the visualization becomes incomprehensible. In this paper, we propose a new graph mining approach based on k-cliques. The concept of the relaxed clique is extended to the whole graph, to achieve a general view, by covering the network with k-cliques. A sequence of k-clique covers is presented, combining small-world concepts with community structure components. Computational results and examples are presented.
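The k-clique definition above (every pair of vertices at distance at most k) can be checked directly with breadth-first search. A minimal sketch, assuming an adjacency-list graph; the function name and example graph are illustrative and not taken from the paper:

```python
from collections import deque

def is_k_clique(adj, nodes, k):
    """Check whether `nodes` form a k-clique in the graph `adj`:
    every pair of nodes must be at distance <= k (shortest paths may
    leave the node set, per the classic relaxed-clique definition)."""
    nodes = set(nodes)
    for source in nodes:
        # BFS over the whole graph from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if any(dist.get(t, float("inf")) > k for t in nodes):
            return False
    return True

# A 5-cycle: any two vertices are within distance 2, so the whole
# vertex set is a 2-clique, but it is not complete, so not a 1-clique.
cycle = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
print(is_k_clique(cycle, [0, 1, 2, 3, 4], 2))  # True
print(is_k_clique(cycle, [0, 1, 2, 3, 4], 1))  # False
```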
- A bi-objective feature selection algorithm for large omics datasets
  Publication. Cavique, Luís; Mendes, Armando B.; Martiniano, Hugo F. M. C.; Correia, Luís
  Feature selection is one of the most important concepts in data mining when dimensionality reduction is needed. The performance measures of feature selection encompass predictive accuracy and result comprehensibility. Consistency-based methods are a significant category of feature selection research that substantially improves the comprehensibility of the result using the parsimony principle. In this work, the bi-objective version of the algorithm Logical Analysis of Inconsistent Data is applied to large volumes of data. In order to deal with hundreds of thousands of attributes, heuristic decomposition uses parallel processing to solve a set covering problem and a cross-validation technique. The bi-objective solutions contain the number of reduced features and the accuracy. The algorithm is applied to omics datasets with genome-like characteristics of patients with rare diseases.
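The bi-objective solutions described above pair a reduced feature count with an accuracy. One common way to read such pairs, shown here as an illustrative sketch rather than the paper's actual procedure, is to keep only the non-dominated (Pareto-optimal) trade-offs:

```python
def pareto_front(solutions):
    """Keep the non-dominated (feature_count, accuracy) pairs: a solution
    dominates another if it uses no more features and reaches at least the
    same accuracy, being strictly better on at least one objective."""
    front = []
    for n_feat, acc in solutions:
        dominated = any(
            (m <= n_feat and a >= acc) and (m < n_feat or a > acc)
            for m, a in solutions
        )
        if not dominated:
            front.append((n_feat, acc))
    return sorted(front)

# Hypothetical (feature count, accuracy) candidates.
candidates = [(5, 0.81), (8, 0.90), (12, 0.90), (20, 0.93), (6, 0.79)]
print(pareto_front(candidates))  # [(5, 0.81), (8, 0.90), (20, 0.93)]
```

Here (12, 0.90) is dominated by (8, 0.90), and (6, 0.79) by (5, 0.81), so only the genuine trade-offs remain.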
- Big data in SATA Airline: finding new solutions for old problems
  Publication. Mendes, Armando B.; Guerra, Hélia; Gomes, Luís; Oliveira, Ângelo; Cavique, Luís
  With the rapid growth of operational data needed in airlines and the value that can be attributed to knowledge extracted from these data, airlines have already realized the importance of the technologies and methodologies associated with the concept of big data. In this article we present the case study of SATA Airlines. The operational and decision support systems are described, as well as the perspectives of using these new technologies to support knowledge creation and aid the solution of problems in this specific company. The proposed system provides a new operational environment.
- Bringing underused learning objects to the light: a multi-agent based approach
  Publication. Behr, André; Cascalho, José; Mendes, Armando B.; Guerra, Hélia; Cavique, Luís; Trigo, Paulo; Coelho, Helder; Vicari, Rosa
  The digital learning transformation extends traditional libraries to online repositories. Learning object repositories are employed to deliver several functionalities related to the learning object's lifecycle. However, these educational resources are usually not described effectively, lacking, for example, educational metadata and learning goals. This metadata incompleteness limits the quality of services such as search and recommendation, resulting in educational objects that do not play a proper role in teaching and learning environments. This work proposes to give all educational resources an active role, acting on the analysis generated from usage statistics. To achieve this goal, we created a multi-agent architecture that complements the common repository functionalities to improve learning and teaching experiences. We intend to use this architecture in a repository focused on ocean literacy learning objects. This paper presents some steps toward this goal by enabling the repository, when needed, to adapt itself.
- Business intelligence no suporte a decisões sobre comunicações: descrição de um caso
  Publication. Mendes, Armando B.; Alfaro, Paulo Jorge; Ferreira, Aires
  The project described aims to support investment decisions on communication infrastructures at Electricidade dos Açores (EDA), the company responsible for the generation, transmission, and sale of electricity in the Autonomous Region of the Azores. The immediate decision to be supported was whether inter-island communications should move to Voice over IP (VoIP) technologies, a service currently contracted on an outsourcing basis. A business intelligence project was set up, using Microsoft SQL Server OLAP technologies, to read and pre-process the very large CSV files, combine those data with existing databases, and present the results as multidimensional cubes. Data mining algorithms were later implemented as well, integrating the two techniques into the CRISP-DM methodology. By building several models it was possible not only to support the intended decision but also to identify inefficient and even fraudulent situations. The models were also made available to strategic and control decision makers, together with the whole reuse, maintenance, and feedback structure that supports the OLAP cubes and the data mining models.
  Abstract: This project addresses investment decisions on communication infrastructures at Electricidade dos Açores (EDA), the local electric company in the Azores Islands. The main decision was whether EDA communications should move to Voice over IP (VoIP) from the present telephone lines, outsourced to an external communications company. At the beginning, a business intelligence project was set up with the objective of getting data from the communications company and analyzing it in order to offer useful information to decision makers. The system uses Microsoft SQL Server technologies to establish an OnLine Analytical Processing (OLAP) application. It translates big CSV flat files into a ROLAP infrastructure and presents the results as multidimensional data cubes. Later, some data mining models were implemented, and both techniques were incorporated into the CRISP-DM process model. Different models identified several inefficient procedures and even fraud situations, as well as supporting the investment decision. These models, together with all the technology developed to gather data and to maintain and manage the OLAP cubes and data mining models, were made available to control and strategic decision makers.
- Clique communities in social networks
  Publication. Cavique, Luís; Mendes, Armando B.; Santos, Jorge M. A.
  Given the large amount of data provided by the Web 2.0, there is a pressing need for new metrics to better understand network structure: how communities are organized and how they evolve over time. Complex network and graph mining metrics are essentially based on low-complexity computational procedures such as the diameter of the graph, the clustering coefficient, and the degree distribution of the nodes. Connected communities in social networks have essentially been studied in two contexts: global metrics such as the clustering coefficient, and node groups such as graph partitions and clique communities.
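The low-complexity metrics the abstract names, such as the local clustering coefficient and the degree distribution, can be computed directly from an adjacency structure. A minimal sketch; the example graph and function names are illustrative:

```python
from collections import Counter

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))

def degree_distribution(adj):
    """Histogram mapping each degree to the number of nodes with it."""
    return Counter(len(nbrs) for nbrs in adj.values())

# A triangle (0-1-2) with a pendant vertex 3 attached to node 0.
g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(g, 0))  # 1 of 3 neighbour pairs linked -> 0.333...
print(degree_distribution(g))  # Counter({2: 2, 3: 1, 1: 1})
```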
- Data mining process models: a roadmap for knowledge discovery
  Publication. Mendes, Armando B.; Cavique, Luís; Santos, Jorge M. A.
  Extracting knowledge from data is the major objective of any data analysis process, including those developed in disciplines such as statistics and quantitative methods, databases and data warehouses, and data mining. Of these disciplines, data mining is the most ambitious, because it intends to analyse and extract knowledge from massive, often badly structured data with many specific objectives. It is also used for relational database data, network data, text data, log file data, and data in many other forms. It is therefore no surprise that a myriad of applications and methodologies have been and are being developed and applied for data analysis tasks, of which CRISP-DM (cross-industry standard process for data mining) and SEMMA (sample, explore, modify, model, assess) are two examples. The need for a roadmap is thus widely recognised in the field, and almost every software company has established its own process model.
- Data science maturity model: from raw data to Pearl's causality hierarchy
  Publication. Cavique, Luís; Pinheiro, Paulo; Mendes, Armando B.
  Data maturity models are an important and current topic, since they allow organizations to plan their medium- and long-term goals. However, most maturity models do not follow what is done in digital technologies regarding experimentation. Data science appears in the literature related to Business Intelligence (BI) and Business Analytics (BA). This work presents a new data science maturity model that combines previous ones with the emerging Business Experimentation (BE) and causality concepts. In this work, each level is identified with a specific function; for each level, the techniques are introduced and associated with meaningful wh-questions. We demonstrate the maturity model by presenting two case studies.
- Enhancing learning object repositories with ontologies
  Publication. Behr, André; Mendes, Armando B.; Cascalho, José; Rossi, Luiz; Vicari, Rosa; Trigo, Paulo; Novo, Paulo; Cavique, Luís; Guerra, Hélia
  In this paper, we present a review of the use of ontologies in learning object repository systems for search and suggestion purposes, considering their adoption for the seaThings project, which aims to promote ocean literacy. We also describe the use case of the Cognix system and the Agent-based Learning Objects (OBAA) metadata standard for learning objects, which is being implemented in a new learning object repository. This system includes concepts from artificial intelligence, such as agents and ontologies, that aim to improve search and thus make the system more responsive. This paper also suggests how an ontology can be implemented, using metadata in learning object repositories to provide relevant aspects such as interoperability, reuse, and searching.
- A feature selection algorithm based on heuristic decomposition
  Publication. Cavique, Luís; Mendes, Armando B.; Martiniano, Hugo F. M. C.
  Feature selection is one of the most important concepts in data mining when dimensionality reduction is needed. The performance measures of feature selection encompass predictive accuracy and result comprehensibility. Consistency-based feature selection is a significant category of feature selection research that substantially improves the comprehensibility of the result using the parsimony principle. In this work, the feature selection algorithm LAID (Logical Analysis of Inconsistent Data) is applied to large volumes of data. In order to deal with hundreds of thousands of attributes, a problem decomposition strategy associated with a set covering problem formulation is used. The algorithm is applied to artificial datasets with genome-like characteristics of patients with rare diseases.
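The set covering formulation mentioned above can be illustrated with a greedy heuristic: each attribute "covers" the pairs of differently-labelled rows it distinguishes, and attributes are picked until every such pair is covered. This is a toy sketch of the general consistency-based idea, not the LAID algorithm itself; all names and data are illustrative:

```python
from itertools import combinations

def greedy_feature_cover(rows, labels):
    """Greedy set-cover heuristic for consistency-based feature selection:
    choose attributes until every pair of rows with different labels is
    distinguished by at least one chosen attribute."""
    n_attrs = len(rows[0])
    # Elements to cover: index pairs of differently-labelled rows.
    pairs = [
        (i, j)
        for i, j in combinations(range(len(rows)), 2)
        if labels[i] != labels[j]
    ]
    # For each attribute, the set of pairs it distinguishes.
    covers = {
        a: {(i, j) for i, j in pairs if rows[i][a] != rows[j][a]}
        for a in range(n_attrs)
    }
    uncovered, chosen = set(pairs), []
    while uncovered:
        best = max(covers, key=lambda a: len(covers[a] & uncovered))
        if not covers[best] & uncovered:
            raise ValueError("inconsistent data: some pair is never distinguished")
        chosen.append(best)
        uncovered -= covers[best]
    return sorted(chosen)

# Tiny example: attribute 0 alone separates the two classes.
rows = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = [0, 0, 1, 1]
print(greedy_feature_cover(rows, labels))  # [0]
```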