Search Results
Now showing 1 - 7 of 7
- Regular sports services: dataset of demographic, frequency and service level agreement. Pinheiro, Paulo; Cavique, Luís.
  This article describes a dataset of the different services acquired by users while they are active in a sports facility. It also covers their behavior in terms of how often they attend the facility and the type of classes they prefer. Each observation in the dataset corresponds to one user and includes subscription and attendance-frequency features. Data were collected between June 1st 2014 and October 31st 2019 from the database of an ERP solution operating in a sports facility in Lisbon, Portugal. The dataset was built from this database through extraction, transformation and loading (ETL) operations. Because it contains real data, the dataset can be useful for research in areas such as customer retention, machine learning, marketing and actionable knowledge. Although the data come from real users of a sports facility, attributes that could identify them were removed to comply with the GDPR, so the data are anonymized.
- Uplift modeling using the transformed outcome approach. Pinheiro, Paulo; Cavique, Luís.
  Churn, and how to deal with it, is an essential issue in the telecommunications sector. Within the scope of actionable knowledge, we argue that it is crucial to find effective personalized interventions that can reduce dropouts and, at the same time, make it possible to determine the causal effect of these interventions. We consider an intervention that encourages clients to opt for a longer-term contract in exchange for benefits. For this intervention, we used uplift modeling and the Transformed Outcome Approach as a machine learning-based technique for individual-level prediction. The result is a set of actionable profiles of persuadable customers that increase retention while striking the right balance with the campaign budget.
- Telco customer churn analysis: measuring the effect of different contracts. Pinheiro, Paulo; Cavique, Luís.
  Customer retention is nowadays a challenge that requires concrete and personalized actions. Traditional data mining studies have focused on predictive analytics, neglecting the business domain. This work presents an actionable knowledge discovery approach based on specific, actionable attributes and on measuring their effects. Matching and propensity score approaches are commonly used in healthcare to evaluate causality. In this analysis, matching is performed on the actionable attributes, and the causal effect is then quantified. This work concludes that having a yearly contract instead of a monthly contract affects churn by around 34%.
- A bi‐objective procedure to deliver actionable knowledge in sport services. Pinheiro, Paulo; Cavique, Luís.
  Increasing customer retention in gyms and health clubs is nowadays a challenge that requires concrete and personalized actions. Traditional data mining studies have focused essentially on predictive analytics, neglecting the business domain. This work presents an actionable knowledge discovery system built on a three-step pipeline: data collection, predictive modeling and retention interventions. In the first step, existing real data are extracted from the databases of the sports facilities and transformed. In the second step, predictive models are applied to identify the user profiles most susceptible to dropout, with actionable withdrawal rules based on actionable attributes. Finally, in the third step, the values of some actionable attributes are changed, based on the actionable knowledge obtained earlier, in order to increase retention. Scenarios are simulated with test and control groups, and the business utility and associated cost of each are measured. This work presents a bi-objective study to choose the most efficient scenarios.
- Large language model for querying databases in Portuguese. Figueiredo, Lourenço; Pinheiro, Paulo; Cavique, Luís; Marques, Nuno.
  This study introduces a system that helps non-expert users find information easily, without knowing database languages or asking technicians for help. It explores a specific domain: a subscription-based sports facility, which serves as an open-source version of a real case study. The available data are structured as a star schema so that they can be accessed through Portuguese natural language queries. A Large Language Model (LLM) generates SQL queries from the question and the provided star schema. To validate the correctness of the system, we created a dataset of 115 highly challenging questions drawn from real-world usage scenarios. Challenges found during testing include attribute value interpretation, out-of-scope questions and temporal interval adequacy. These challenges show that the star schema alone does not provide enough context for the LLM to generate accurate SQL queries. Addressing them with enhanced contextual information significantly improves query correctness, with validation results increasing from 57.76% to 88.79%. This study shows both the potential and the limitations of LLMs in generating SQL queries from Portuguese natural language questions.
- Data science maturity model: from raw data to Pearl’s causality hierarchy. Cavique, Luís; Pinheiro, Paulo; Mendes, Armando B.
  Data maturity models are an important and current topic because they allow organizations to plan their medium- and long-term goals. However, most maturity models do not follow the experimentation practices of digital technologies. In the literature, data science appears in relation to Business Intelligence (BI) and Business Analytics (BA). This work presents a new data science maturity model that combines previous models with the emerging concepts of Business Experimentation (BE) and causality. Each level of the model is identified with a specific function. For each level, the techniques are introduced and associated with meaningful wh-questions. We demonstrate the maturity model with two case studies.
- A machine learning framework for uplift modeling through customer segmentation. Pinheiro, Paulo; Cavique, Luís.
  In uplift modeling, the goal is to identify high-value customers by focusing on persuadable customers, those who make a purchase only if contacted. To achieve this, uplift modeling combines machine learning techniques with causal inference. This allows businesses to refine their customer targeting strategies and focus their efforts where they are most profitable. This study proposes a practical and reproducible two-phase procedure for identifying high-value customers. In the first phase, customers are segmented using decision trees, which offer a transparent, data-driven way to group individuals with similar characteristics. This segmentation lays the groundwork for a meaningful interpretation of customer behavior. In the second phase, uplift is calculated for each customer segment by comparing the outcomes of the treatment and control groups, which identifies the customer groups with the highest uplift. A real-world use case illustrates the value and applicability of the proposed method. To validate model performance, the procedure uses established metrics such as the Qini index and Cohen’s kappa, which provide insight into both the effectiveness and the reliability of the uplift estimates. The procedure is decoupled and relies on well-established libraries, fostering transparency and a clear understanding of the analytical process. A key contribution to uplift modeling and causal inference is the use of decision trees for stratification. This creates meaningful segments that can be evaluated through the average treatment effect. By integrating theory with practical implementation, this work offers a comprehensive framework for uplift modeling that bridges academic rigor and business usability.
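The "Regular sports services" entry describes turning ERP records into one observation per user and removing identifying attributes. The sketch below shows one way such a transformation step could look. It is only a minimal illustration, and several parts are assumptions: the field names (member_id, name, email, plan, class), the list of identifying attributes, and the derived features n_visits and favourite_class are all hypothetical, not the dataset's actual schema.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical ERP extracts: field names and values are illustrative only.
subscriptions = [
    {"member_id": 101, "name": "Ana", "email": "ana@example.com", "plan": "monthly", "start": date(2018, 1, 10)},
    {"member_id": 102, "name": "Rui", "email": "rui@example.com", "plan": "yearly", "start": date(2017, 5, 2)},
]
visits = [
    {"member_id": 101, "day": date(2019, 3, 1), "class": "yoga"},
    {"member_id": 101, "day": date(2019, 3, 8), "class": "yoga"},
    {"member_id": 101, "day": date(2019, 3, 9), "class": "pilates"},
    {"member_id": 102, "day": date(2019, 4, 2), "class": "spinning"},
]

# Direct identifiers dropped from the output (assumed list).
IDENTIFYING = {"member_id", "name", "email"}


def build_dataset(subscriptions, visits):
    """Transform raw records into one observation per user, without direct identifiers."""
    visits_by_member = defaultdict(list)
    for v in visits:
        visits_by_member[v["member_id"]].append(v)

    rows = []
    for surrogate, sub in enumerate(subscriptions, start=1):
        member_visits = visits_by_member[sub["member_id"]]
        class_counts = Counter(v["class"] for v in member_visits)
        # Keep every subscription field except the identifying ones.
        row = {k: val for k, val in sub.items() if k not in IDENTIFYING}
        row["user"] = surrogate  # sequential key replaces the real member id
        row["n_visits"] = len(member_visits)
        row["favourite_class"] = class_counts.most_common(1)[0][0] if class_counts else None
        rows.append(row)
    return rows


dataset = build_dataset(subscriptions, visits)
```

Each output row keeps the subscription features and adds frequency features, while the real member id is replaced by a sequential key.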
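The entry "Uplift modeling using the transformed outcome approach" relies on the transformed outcome Z = Y(W - p) / (p(1 - p)). Here Y is the outcome, W is 1 for treated customers and 0 for controls, and p is the probability of treatment. Under randomized assignment, the expected value of Z given the covariates equals the uplift. A regressor trained on Z can therefore estimate individual-level uplift. The sketch below only computes Z and checks that its mean reproduces the difference in outcome rates between the groups. The toy data and the use of the empirical treatment rate as p are assumptions for illustration.

```python
from statistics import mean


def transformed_outcome(y, w, p):
    """Return Z = Y * (W - p) / (p * (1 - p)) for each customer.

    y: binary outcome (hypothetical: 1 = retained), w: 1 if treated, 0 if control,
    p: probability of treatment, assumed constant as in a randomized campaign.
    """
    return [yi * (wi - p) / (p * (1 - p)) for yi, wi in zip(y, w)]


# Hypothetical toy data: 4 treated and 4 control customers.
y = [1, 1, 1, 0, 1, 0, 0, 0]
w = [1, 1, 1, 1, 0, 0, 0, 0]
p = sum(w) / len(w)  # empirical treatment rate

z = transformed_outcome(y, w, p)
# With p equal to the treated fraction, mean(Z) equals
# (treated outcome rate) - (control outcome rate): 0.75 - 0.25.
uplift = mean(z)
```

In practice, any regression model fitted on the covariates X with Z as the target would give per-customer uplift predictions. This sketch does not include that model.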
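The "Telco customer churn analysis" entry quantifies a contract effect after matching on actionable attributes. The sketch below is a hedged illustration of that idea. It uses exact matching, a simpler substitute for the matching and propensity score procedure the entry mentions. Customers are grouped into strata with identical covariate values, and strata without both treated and control customers are dropped. The per-stratum differences in churn rate are then averaged, weighted by the number of treated customers, which estimates the effect on the treated. The covariate names (internet, senior) and the data are hypothetical.

```python
from collections import defaultdict


def matched_effect(records, covariates, treatment, outcome):
    """Average effect on the treated via exact matching on the given covariates."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for r in records:
        key = tuple(r[c] for c in covariates)
        strata[key][r[treatment]].append(r[outcome])

    weighted_sum, total_treated = 0.0, 0
    for groups in strata.values():
        treated, control = groups[1], groups[0]
        if not treated or not control:
            continue  # no match available for this covariate profile
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        weighted_sum += diff * len(treated)
        total_treated += len(treated)
    return weighted_sum / total_treated if total_treated else float("nan")


# Hypothetical data: treatment "yearly" (1 = yearly contract, 0 = monthly), outcome "churn".
records = [
    {"internet": "fiber", "senior": 0, "yearly": 1, "churn": 0},
    {"internet": "fiber", "senior": 0, "yearly": 1, "churn": 0},
    {"internet": "fiber", "senior": 0, "yearly": 0, "churn": 1},
    {"internet": "fiber", "senior": 0, "yearly": 0, "churn": 0},
    {"internet": "dsl", "senior": 1, "yearly": 1, "churn": 0},
    {"internet": "dsl", "senior": 1, "yearly": 0, "churn": 1},
    {"internet": "dsl", "senior": 1, "yearly": 0, "churn": 1},
    {"internet": "dsl", "senior": 0, "yearly": 1, "churn": 1},  # unmatched, discarded
]
att = matched_effect(records, ["internet", "senior"], "yearly", "churn")
```

A negative value means the treatment (here, a yearly contract) is associated with lower churn among matched customers.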
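The bi-objective entry ("A bi‐objective procedure to deliver actionable knowledge in sport services") chooses efficient scenarios by weighing business utility against cost. A common way to make "efficient" precise is the Pareto front: the scenarios that no other scenario beats on both objectives at once. Whether the paper uses exactly this criterion is an assumption. The scenario names and numbers below are hypothetical.

```python
def pareto_front(scenarios):
    """Return the non-dominated scenarios.

    Each scenario is (name, utility, cost); higher utility and lower cost are better.
    A scenario is dominated if another is at least as good on both objectives
    and strictly better on at least one.
    """
    front = []
    for name, u, c in scenarios:
        dominated = any(
            u2 >= u and c2 <= c and (u2 > u or c2 < c)
            for _, u2, c2 in scenarios
        )
        if not dominated:
            front.append((name, u, c))
    return front


# Hypothetical simulated scenarios: (name, business utility, cost).
scenarios = [("A", 120, 50), ("B", 100, 30), ("C", 90, 40), ("D", 150, 90), ("E", 120, 60)]
front = pareto_front(scenarios)
```

Here C is dominated by B, and E by A, so A, B and D remain. A decision maker would then pick among them according to budget.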
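The entry "Large language model for querying databases in Portuguese" combines a star schema and extra context into the input for SQL generation and validates the generated queries. The sketch below illustrates two pieces of that workflow under stated assumptions: a prompt assembled from schema, context notes and the question, and a correctness check that executes the generated and reference queries and compares their rows. The schema, the context notes, the prompt wording and the execution-match criterion are all hypothetical. The LLM call itself is omitted rather than invented.

```python
import sqlite3
from collections import Counter

# Hypothetical star schema: one fact table and two dimensions (not the study's schema).
SCHEMA = """
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, gender TEXT, age_group TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_visit (customer_id INTEGER, date_id INTEGER, class_name TEXT);
"""

# Extra context of the kind the study found necessary, e.g. how values are coded (assumed).
CONTEXT = "dim_customer.gender is stored as 'M' or 'F'; dim_date.month is an integer from 1 to 12."


def build_prompt(question: str) -> str:
    """Assemble the text sent to the LLM: schema, extra context and the user question."""
    return (
        "SQLite star schema:\n" + SCHEMA
        + "\nNotes: " + CONTEXT
        + "\nWrite one SQL query that answers this question (in Portuguese):\n" + question
    )


def same_result(conn, generated_sql: str, reference_sql: str) -> bool:
    """Treat a generated query as correct if it returns the same rows as a reference query."""
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False  # SQL that fails to run counts as incorrect
    expected = conn.execute(reference_sql).fetchall()
    return Counter(got) == Counter(expected)  # order-insensitive comparison


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", [(1, "F", "18-25"), (2, "M", "26-35")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2019, 1), (2, 2019, 2)])
conn.executemany("INSERT INTO fact_visit VALUES (?, ?, ?)", [(1, 1, "yoga"), (1, 2, "pilates"), (2, 2, "yoga")])

# "How many visits did female customers make?"
prompt = build_prompt("Quantas visitas fizeram as clientes do sexo feminino?")
# The LLM call is omitted; suppose the model returned this query.
generated = ("SELECT COUNT(*) FROM fact_visit f "
             "JOIN dim_customer c ON f.customer_id = c.customer_id WHERE c.gender = 'F'")
reference = ("SELECT COUNT(*) FROM fact_visit WHERE customer_id IN "
             "(SELECT customer_id FROM dim_customer WHERE gender = 'F')")
print(same_result(conn, generated, reference))  # True
```

Comparing result sets instead of SQL text accepts different but equivalent formulations, such as the join and the subquery above.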
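The second phase of the procedure in "A machine learning framework for uplift modeling through customer segmentation" computes uplift per segment by comparing treatment and control outcomes. The sketch below assumes the segment labels are already available. For example, they could be the leaf indices returned by a fitted scikit-learn tree's `apply` method, but the labels and data here are hypothetical. For each segment, the function subtracts the control purchase rate from the treated purchase rate, skips segments lacking either group, and ranks the rest by uplift.

```python
from collections import defaultdict


def segment_uplift(segments, treatment, outcome):
    """Uplift per segment: treated outcome rate minus control outcome rate.

    segments: segment label per customer (e.g. a decision-tree leaf);
    treatment: 1 if contacted, else 0; outcome: 1 if the customer purchased.
    Returns (segment, uplift, size) tuples, highest uplift first.
    """
    # Per segment: [treated count, treated purchases, control count, control purchases]
    stats = defaultdict(lambda: [0, 0, 0, 0])
    for s, t, y in zip(segments, treatment, outcome):
        if t:
            stats[s][0] += 1
            stats[s][1] += y
        else:
            stats[s][2] += 1
            stats[s][3] += y

    result = []
    for s, (nt, yt, nc, yc) in stats.items():
        if nt and nc:  # both groups are needed to compare outcomes
            result.append((s, yt / nt - yc / nc, nt + nc))
    return sorted(result, key=lambda r: r[1], reverse=True)


# Hypothetical customers: leaf_3 has no control customers, so it is skipped.
segments = ["leaf_1"] * 4 + ["leaf_2"] * 4 + ["leaf_3"] * 2
treatment = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
outcome = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
ranking = segment_uplift(segments, treatment, outcome)
```

Under randomized assignment, each segment's difference in rates estimates its average treatment effect. The segments at the top of the ranking are the candidates for targeting.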
