Search Results

  • Data mining process models: a roadmap for knowledge discovery
    Publication . Mendes, Armando B.; Cavique, Luís; Santos, Jorge M. A.
    Extracting knowledge from data is the major objective of any data analysis process, including those developed in several sciences, such as statistics and quantitative methods, databases and data warehousing, and data mining. Of these disciplines, data mining is the most ambitious, because it aims to analyse and extract knowledge from massive, often badly structured data with many specific objectives. It is also used for relational database data, network data, text data, log file data, and data in many other forms. It is therefore no surprise that a myriad of applications and methodologies have been and are being developed and applied for data analysis functions, of which CRISP-DM (cross-industry standard process for data mining) and SEMMA (sample, explore, modify, model, assess) are two examples. The need for a roadmap is, therefore, widely recognised in the field, and almost every software company has established its own process model.
  • Bringing underused learning objects to the light: a multi-agent based approach
    Publication . Behr, André; Cascalho, José; Mendes, Armando B.; Guerra, Hélia; Cavique, Luís; Trigo, Paulo; Coelho, Helder; Vicari, Rosa
    The digital learning transformation extends traditional libraries to online repositories. Learning object repositories are employed to deliver several functionalities related to the learning object's lifecycle. However, these educational resources are usually not described effectively, lacking, for example, educational metadata and learning goals. This metadata incompleteness limits the quality of services such as search and recommendation, resulting in educational objects that do not have a proper role in teaching/learning environments. This work proposes to give an active role to all educational resources by acting on the analysis generated from usage statistics. To achieve this goal, we created a multi-agent architecture that complements the repository's common functionalities to improve learning and teaching experiences. We intend to use this architecture on a repository focused on ocean literacy learning objects. This paper presents some steps toward this goal by enhancing the repository, when needed, so that it can adapt itself.
  • A feature selection approach in the study of Azorean proverbs
    Publication . Cavique, Luís; Mendes, Armando B.; Funk, Matthias; Santos, Jorge M. A.
    A paremiological (study of proverbs) case is presented as part of a wider project based on data collected among the Azorean population. Given the considerable distance between the Azores islands, we hypothesise that there are significant differences in the proverbs from each island, which would permit identifying the native island of an interviewee based on his or her knowledge of proverbs. In this chapter, a feature selection algorithm that combines Rough Sets and the Logical Analysis of Data (LAD) is presented. The algorithm, named LAID (Logical Analysis of Inconsistent Data), deals with noisy data, and we believe that it establishes an important link between two different schools with similar approaches. The algorithm was applied to a real-world dataset collected through thousands of interviews with Azoreans, involving an initial set of twenty-two thousand Portuguese proverbs.
  • Regular sports services: dataset of demographic, frequency and service level agreement
    Publication . Pinheiro, Paulo; Cavique, Luís
    This article describes a dataset of the different services acquired by users during the period in which they are active in a sports facility, as well as their behavior in terms of how frequently they attend the facility and the type of classes they prefer to attend. Each observation in the dataset corresponds to one user, including the features of subscriptions and frequency. Data were collected between June 1st 2014 and October 31st 2019 from the database of an ERP solution operating in a sports facility in Lisbon, Portugal. From this database, it was possible to perform extraction, transformation and loading operations into the dataset. The dataset, which contains real data, can be useful for research in areas such as customer retention, machine learning, marketing, actionable knowledge and others. Although we present real data from users of a sports facility, the attributes that could identify the users were removed in order to comply with the GDPR legislation, making the data anonymized.
  • Ramex-Forum: a tool for displaying and analysing complex sequential patterns of financial products
    Publication . Tiple, Pedro; Cavique, Luís; Marques, Nuno C.
    Financial data provides valuable, up-to-date knowledge of the world economy. However, it is presented in extremely large data volumes, in diverse formats, and is constantly being updated at a high speed. The Ramex-Forum algorithm is oriented to guide financial experts in finding new and relevant information. We present a sensitivity analysis and new visualizations using an improved version of the Ramex-Forum algorithm. The proposed algorithm is applied to two case studies: the petroleum production chain and the risk analysis of European financial institutions. Different combinations of parameters and new ways to visualize data are used. Results highlight the importance of Ramex-Forum for analysing relevant relationships in price variations in financial markets.
  • Clique communities in social networks
    Publication . Cavique, Luís; Mendes, Armando B.; Santos, Jorge M. A.
    Given the large amount of data provided by Web 2.0, there is a pressing need for new metrics to better understand network structure: how its communities are organized and the way they evolve over time. Complex network and graph mining metrics are essentially based on low-complexity computational procedures, such as the diameter of the graph, the clustering coefficient and the degree distribution of the nodes. The connected communities in social networks have essentially been studied in two contexts: global metrics, like the clustering coefficient, and node groups, such as graph partitions and clique communities.
  • Big data e data science
    Publication . Cavique, Luís
    This article presents the basic concepts of Big Data and the new field to which it gave rise, Data Science. Within Data Science, the notion of reducing the dimensionality of data is discussed and exemplified.
  • Improving information system design: using UML and axiomatic design
    Publication . Cavique, Luís; Cavique, Mariana; Mendes, Armando; Cavique, Miguel
    A unified view of Information System (IS) design is essential for dealing with complexity. However, the literature proposes many denominations, depending on the layer, methodology, framework, or tool. This multitude of approaches does not allow a holistic view of the system. Besides, in Information Systems, the search for good design practices is still a relevant issue. A subset of essential Unified Modeling Language (UML) diagrams is chosen to create a broad view of the IS. The CRUD matrix is one of the preferred approaches to articulate the sub-systems of applications and data. Axiomatic Design (AD) provides rules for the improvement of the IS design. This work presents a method to create object-oriented elements based on the CRUD matrix aligned with the business strategy. An integrated student-based case study on logistics is provided. In the discussion, a new IS architect role is proposed, supported by the CRUD/AD method.
  • Data pre-processing and data generation in the student flow case study
    Publication . Cavique, Luís; Pombinho, Paulo; Tallón Ballesteros, Antonio J.; Correia, Luís
    Education covers a range of sectors from kindergarten to higher education. In the education system, each grade has three possible outcomes: dropout, retention, and passing to the next grade. In this work, we study data from the Department of Statistics of Education and Science (DGEEC) of the Education Ministry. DGEEC maintains those outcomes for each school year; therefore, this study seeks a longitudinal view based on student flow. The document reports the data pre-processing, a stochastic model based on the pre-processed data, and a data generation process that uses that model.
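
    The student flow idea in the last entry (three outcomes per grade feeding a stochastic model, which then drives data generation) can be illustrated with a minimal sketch. This is not the authors' model: the fixed probabilities, the function names `simulate_student` and `generate_cohort`, and the `max_years` cap are all assumptions made for illustration, and the numbers are not taken from DGEEC data.

```python
import random

# Hypothetical per-grade outcome probabilities (illustrative values only,
# not taken from DGEEC data). The same values are reused for every grade.
OUTCOME_PROBS = {"pass": 0.90, "retention": 0.07, "dropout": 0.03}


def simulate_student(n_grades, probs, rng, max_years=30):
    """Follow one student year by year.

    Returns a tuple (last_grade, status, years), where status is
    "completed", "dropout", or "enrolled" (hit the max_years cap).
    """
    grade, years = 1, 0
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    while grade <= n_grades and years < max_years:
        years += 1
        outcome = rng.choices(outcomes, weights=weights)[0]
        if outcome == "dropout":
            return grade, "dropout", years
        if outcome == "pass":
            grade += 1
        # "retention": the student repeats the same grade next year
    status = "completed" if grade > n_grades else "enrolled"
    return min(grade, n_grades), status, years


def generate_cohort(n_students, n_grades=12, probs=OUTCOME_PROBS, seed=0):
    """Generate synthetic student-flow records for a whole cohort."""
    rng = random.Random(seed)  # seeded so the generated data is reproducible
    return [simulate_student(n_grades, probs, rng) for _ in range(n_students)]


if __name__ == "__main__":
    cohort = generate_cohort(1000)
    completed = sum(1 for _, status, _ in cohort if status == "completed")
    print(f"{completed} of {len(cohort)} synthetic students completed all grades")
```

    A model fitted to real data would presumably use outcome probabilities that differ by grade and school year; a single fixed distribution is used here only to keep the sketch short.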