Open Source Toolbox PICI: Peer Innovation Community Indicators

Expansion possibility: The toolbox can also be used to evaluate the forums of other peer communities. The required data can be loaded either from a static file or dynamically via web scraping, a database or another interface (API). A static file must follow the data structure specified in the online manual. For the three communities examined, dynamic connections have already been set up, so new forum data can be loaded automatically from the internet on a regular basis. Each new data source requires corresponding adjustments. If the forum data of several communities are structured in the same way, e.g. because they use the same forum software (such as Discourse), they can be collected with a single, comprehensive query. How to integrate new data sources into the toolbox is documented in the online manual.
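A minimal sketch of how a static export might be checked before it is loaded, using only the Python standard library. The column names `post_id`, `thread_id`, `author_id`, `created_at` and `text` are hypothetical placeholders: the authoritative structure is the one specified in the online manual, and this is not how PICI itself reads files.

```python
import csv
import io

# Hypothetical column names; the real required structure is defined in the PICI online manual.
REQUIRED_POST_COLUMNS = {"post_id", "thread_id", "author_id", "created_at", "text"}


def load_posts(csv_text):
    """Parse a static CSV export of forum posts and fail early if required columns are missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_POST_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


sample = """post_id,thread_id,author_id,created_at,text
1,10,alice,2019-03-01T10:00:00,How do I build the shredder?
2,10,bob,2019-03-01T12:30:00,Use the v4 blueprints.
"""
posts = load_posts(sample)
print(len(posts), posts[1]["author_id"])
```

Checking the structure up front means a malformed export fails with a readable message instead of producing silently wrong indicators later.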

Expansion possibility: Furthermore, additional metrics can be added to the toolbox by extending the indicator library. The procedure is explained in the example "Creating a custom indicator". Each new metric must be defined at its respective observation level. For example, additional text-based indicators could focus on certain keywords in the contributions or capture specific features of the contributors' response behaviour.
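To illustrate what a keyword-based indicator at the thread level might compute, here is a plain-Python sketch. It does not use the PICI indicator API (see the "Creating a custom indicator" example for that); the function name, the keyword list and the input format (a list of dicts with `thread_id` and `text`) are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical keyword list; a real indicator would be chosen to fit the research question.
INNOVATION_KEYWORDS = {"prototype", "improve", "modified", "design", "test"}
WORD_RE = re.compile(r"[a-z]+")


def keyword_share_per_thread(posts, keywords=INNOVATION_KEYWORDS):
    """Thread-level indicator: share of a thread's posts that mention at least one keyword."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for post in posts:
        words = set(WORD_RE.findall(post["text"].lower()))
        totals[post["thread_id"]] += 1
        if words & keywords:
            hits[post["thread_id"]] += 1
    # Threads without any keyword hit get 0.0 because hits is a defaultdict(int).
    return {tid: hits[tid] / totals[tid] for tid in totals}


posts = [
    {"thread_id": 10, "text": "I built a prototype of the extruder"},
    {"thread_id": 10, "text": "Nice!"},
    {"thread_id": 11, "text": "Where can I buy plastic?"},
]
print(keyword_share_per_thread(posts))
```

The metric is computed per thread because that is the observation level it describes; a post-level variant would return one value per post instead.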

Expansion possibility: The classification of forum content could also be based on entirely different criteria, depending on the research interest and application context. Other content-related criteria could be used for labelling as indications of innovation activities or innovation potential, and labelling could also be carried out at observation levels other than the thread; for example, individual forum posts could be evaluated. Furthermore, innovation aspects other than activities, such as innovation inputs and outcomes, could be the focus for identifying peer innovation.

The toolbox uses online forums as a data source for investigating peer communities. The forum data can be provided as a static file or loaded dynamically. Each forum post forms a data point. Basic principles: For further processing, the toolbox requires a data structure that records, for each individual forum post, the time of creation, the text content, a unique creator and the associated thread. The data model links the table of individual contributions (posts) with a table of contributors (authors) and a table with information on the discussion threads. These three tables form the basis for analysing online interactions in the peer communities. They are available as static files or are extracted dynamically from publicly accessible internet forums after adaptation to the respective data source.
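The three-table data model can be sketched with dataclasses and a join on the ID fields. The field names and the `join_posts` helper are hypothetical and only mirror the description above (creation time, text, unique creator, thread); they are not PICI's internal schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Author:
    author_id: str
    name: str


@dataclass
class Thread:
    thread_id: int
    title: str
    subforum: str


@dataclass
class Post:
    post_id: int
    thread_id: int      # foreign key -> Thread.thread_id
    author_id: str      # foreign key -> Author.author_id
    created_at: datetime
    text: str


def join_posts(posts, authors, threads):
    """Attach author and thread information to every post, like a relational join on the IDs."""
    author_by_id = {a.author_id: a for a in authors}
    thread_by_id = {t.thread_id: t for t in threads}
    return [
        {
            "post_id": p.post_id,
            "created_at": p.created_at,
            "author": author_by_id[p.author_id].name,
            "thread_title": thread_by_id[p.thread_id].title,
            "text": p.text,
        }
        for p in posts
    ]


authors = [Author("u1", "alice"), Author("u2", "bob")]
threads = [Thread(10, "Shredder v4 build", "Machines")]
posts = [
    Post(1, 10, "u1", datetime(2019, 3, 1, 10, 0), "How do I build the shredder?"),
    Post(2, 10, "u2", datetime(2019, 3, 1, 12, 30), "Use the v4 blueprints."),
]
rows = join_posts(posts, authors, threads)
```

Keeping authors and threads in separate tables avoids repeating their attributes on every post; the join reconstructs the flat view when an analysis needs it.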

Part of the data set must be labelled so that the supervised classification algorithm can be trained on it. For this part, the relevance of the forum content must be defined, because it serves as the ground truth against which the performance of the indicators is measured. The training data set can be created in different ways.

The toolbox calculates interaction networks from the forum data, considering both directed and undirected relationships between contributors. This analysis produces different graphs from which network metrics are calculated. Basic principles: Two different interaction networks are calculated from the forum data. The nodes of the networks are the users of the online forum; the edges represent their interactions. A distinction is made between contributors and commentators. The contributor network captures which forum users contribute to the same discussions: two contributors who both participate in a thread share an undirected relationship, whose strength increases with the number of threads they share. In contrast, the commenter network maps contributions to discussions initiated by others: there is a directed relationship between a commentator and the initial contributor of a thread. The strength of this relationship increases with the number of comments one contributor makes on the other's initial contributions. Figure: Network relationships between contributors and commentators, Source: Peer Innovation, 2022
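The two construction rules can be sketched as weighted edge counters in plain Python. Assumptions not stated in the text: the earliest post of a thread is taken as its initial contribution, edges point from commentator to initiator, and replies by the initiator in their own thread are not counted. The function name and input format are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_networks(posts):
    """Return (co-contributor edges, commenter edges) as weighted edge counters.

    posts: iterable of (thread_id, author_id, created_at) tuples.
    """
    by_thread = defaultdict(list)
    for thread_id, author, created_at in posts:
        by_thread[thread_id].append((created_at, author))

    contributor_edges = Counter()  # undirected: frozenset({a, b}) -> number of shared threads
    commenter_edges = Counter()    # directed: (commenter, initiator) -> number of comments
    for entries in by_thread.values():
        entries.sort()               # chronological order within the thread
        initiator = entries[0][1]    # assumption: earliest post = initial contribution
        participants = sorted({author for _, author in entries})
        for a, b in combinations(participants, 2):
            contributor_edges[frozenset((a, b))] += 1
        for _, author in entries[1:]:
            if author != initiator:  # assumption: replies in one's own thread are ignored
                commenter_edges[(author, initiator)] += 1
    return contributor_edges, commenter_edges


posts = [
    (1, "alice", 1), (1, "bob", 2), (1, "carol", 3), (1, "bob", 4),
    (2, "bob", 5), (2, "alice", 6),
]
contributors, commenters = build_networks(posts)
```

Here alice and bob share two threads, so their undirected edge has weight 2, while bob's two comments on alice's thread give the directed edge (bob, alice) weight 2.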

A comprehensive set of measurement methods for various indicators calculated from the forum data is already implemented in the toolbox, and the indicator library can easily be extended with further metrics. Basic principles: The indicators implemented in the toolbox are based on the literature analysis of community indicators in open innovation and user innovation research conducted in the research project (Pohlisch et al. 2021). Each metric is assigned to an observation level. When adding further metrics, it must be specified which level the new metric refers to. An overview of the current implementations can be found at https://phihes.github.io/pici/indicators/ and https://phihes.github.io/pici/reference/pici/metrics/.

Using supervised machine learning, the performance of the indicator metrics in classifying forum content is evaluated on the training dataset. In this way, suitable indicators and their best possible combination are selected for the automated detection of relevant content in the community forum. Basic principles: PICI uses various machine learning techniques to train classification models that combine the most suitable metrics for classifying forum content.
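As a deliberately simplified illustration of evaluating indicators against labelled data, the sketch below scores each metric on its own with a threshold rule ("value >= threshold means relevant") and ranks metrics by the best F1 score. This is not PICI's procedure, which trains machine learning models that combine metrics; the metric names and values are invented for the example.

```python
def f1_score(y_true, y_pred):
    """F1 score for boolean labels and predictions."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(values, labels):
    """Best (F1, threshold) achievable by the rule 'value >= threshold -> relevant'."""
    best = (0.0, None)
    for threshold in sorted(set(values)):
        preds = [v >= threshold for v in values]
        score = f1_score(labels, preds)
        if score > best[0]:
            best = (score, threshold)
    return best


def rank_metrics(metrics, labels):
    """metrics: name -> per-thread values aligned with labels (True = relevant thread)."""
    results = {name: best_threshold(values, labels) for name, values in metrics.items()}
    return sorted(results.items(), key=lambda item: item[1][0], reverse=True)


labels = [True, True, False, False, True]
metrics = {
    "keyword_share": [0.8, 0.6, 0.1, 0.0, 0.7],  # hypothetical metric values
    "num_posts": [3, 12, 15, 2, 4],
}
ranking = rank_metrics(metrics, labels)
```

In practice the threshold (or model) should be fitted on one part of the labelled data and evaluated on another, for example with cross-validation, so that the reported performance is not optimistic.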

The toolbox calculates key figures and network statistics from the forum data of the peer communities, such as the shares of occasional and core contributors or the network density. These values are used to characterise and compare the peer communities and their network structures. Basic principles: The forum data of each peer community can be evaluated statistically with the toolbox. The implemented metrics can be calculated and compared across peer communities to identify distinctive features of their interaction structures. In addition, depending on the research interest, average values can be calculated for any time period in order to investigate how the communities change over time.
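Two of the key figures mentioned above can be sketched in a few lines. The density follows the standard definition for an undirected graph (existing edges divided by the n(n-1)/2 possible edges). The cut-off for an "occasional" contributor (at most one post) is a hypothetical choice; the project's own definition may differ.

```python
from collections import Counter


def network_density(nodes, undirected_edges):
    """Density of an undirected graph: existing edges / possible edges n*(n-1)/2."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(undirected_edges) / (n * (n - 1) / 2)


def occasional_share(post_authors, max_posts=1):
    """Share of contributors with at most `max_posts` posts (hypothetical cut-off)."""
    counts = Counter(post_authors)
    occasional = sum(1 for c in counts.values() if c <= max_posts)
    return occasional / len(counts)


nodes = {"a", "b", "c", "d"}
edges = {frozenset(("a", "b")), frozenset(("a", "c")), frozenset(("b", "c"))}
density = network_density(nodes, edges)                       # 3 of 6 possible edges
share = occasional_share(["a", "a", "b", "c", "a", "d"])     # b, c, d posted once
```

Storing undirected edges as frozensets ensures that (a, b) and (b, a) are counted as the same edge.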

Combining the selected estimation models and indicators allows the entire forum content to be classified. In this way, the relationships between measurable characteristics and content relevance found in the labelled subset are transferred to the community level. Basic principles: In this step, the trained classification models are applied to the entire data set in order to classify all extracted forum content, so that the evaluation learned from the training data is applied automatically to all forum content. The results of the automated classification can in turn be evaluated and compared for individual communities, different time periods or sub-forums.
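The sketch below shows the shape of this step: a trained decision rule is applied to every thread, and the share of threads classified as relevant is aggregated by community or by year. A single metric threshold stands in for the trained model here, which is a simplification; the field names and community labels "A" and "B" are hypothetical.

```python
from collections import defaultdict


def classify_all(threads, metric, threshold):
    """Label every thread with a trained decision rule (here: a single metric threshold)."""
    return [dict(t, relevant=t[metric] >= threshold) for t in threads]


def share_relevant(classified, key):
    """Share of threads classified as relevant, grouped by `key` (e.g. community or year)."""
    totals = defaultdict(int)
    relevant = defaultdict(int)
    for t in classified:
        totals[t[key]] += 1
        relevant[t[key]] += t["relevant"]  # True counts as 1
    return {group: relevant[group] / totals[group] for group in totals}


threads = [
    {"community": "A", "year": 2017, "keyword_share": 0.7},
    {"community": "A", "year": 2018, "keyword_share": 0.2},
    {"community": "B", "year": 2017, "keyword_share": 0.9},
    {"community": "B", "year": 2018, "keyword_share": 0.65},
]
classified = classify_all(threads, "keyword_share", 0.6)
by_community = share_relevant(classified, "community")
by_year = share_relevant(classified, "year")
```

Because grouping is just a choice of key, the same classified data supports comparisons across communities, time periods or sub-forums.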

Example from the PeerInnovation project: For the research project, data was collected from the online forums of three peer communities to analyse their innovation activities: Precious Plastic, OpenEnergyMonitor and OpenStreetMap. All contributions in the communities' forums from 01/2017 to 12/2019 were scraped and analysed. In total, the data set comprises more than 200,000 contributions (in 20,000 threads) from 12,000 users.

Example from the PeerInnovation project: The research project examines online forum discussion threads as measurable manifestations of knowledge exchange in peer communities. The co-contributor networks map which community members exchange information with each other. The pooling of distributed knowledge and the recombination of experiences from different fields is considered a basic mechanism for the emergence of innovative ideas. The commenter network, on the other hand, maps directed knowledge flows within the peer community. Here, the focus is on who receives information and through whom. From this, conclusions can be drawn about the position of certain members and the significance of their contributions to the community.

Example from the PeerInnovation project: The research project aims to validate suitable indicators for innovation activities in peer communities. Metrics that the research literature associates with the occurrence of innovations (Pohlisch et al. 2021) are therefore calculated at the different observation levels. These metrics can then identify threads that contain indications of relevant activities, such as presenting, evaluating, implementing, modifying and improving innovative ideas.

Example from the PeerInnovation project: In the research project, the classification algorithm was trained with the manually labelled threads. In this way, the previously generated metrics could be validated in terms of how well they can predict the occurrence of innovation activities in discussion threads.

Example from the PeerInnovation project: In the research project, the trained models were used to estimate innovation activities of the peer communities in the forum data. The proportion of threads that are related to innovation activities or show a certain innovation potential could be determined automatically for the communities studied. However, the transferability of the estimation models between communities could not be examined more closely within the framework of the project.

Example from the PeerInnovation project: In the research project, three peer communities were examined and described with the help of the implemented metrics. To make the communities more comparable, the investigation was limited to the period from 2017 to 2019. Comparing the values revealed clear differences between the communities, in line with the qualitative results from interviews with community members. For example, the network densities of the peer communities Precious Plastic, OpenEnergyMonitor and OpenStreetMap could be compared (based on the undirected graphs). One finding was that the Precious Plastic community has a comparatively high network density. The core of the community (the largest sub-group in which all members have interacted with each other at least once) includes just under 10% of the participants. At the same time, there is also a large proportion of occasional users in the forum (more than 50%). More detailed examples as well as comparative tables can be found in the project's work report 3.
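In graph terms, a sub-group in which every member has interacted with every other member is a clique, so the core described above can be read as the maximum clique of the undirected contributor network (this reading is our interpretation). A minimal sketch using the Bron-Kerbosch algorithm with pivoting, on a hypothetical adjacency map:

```python
def maximum_clique(adjacency):
    """Largest set of nodes that are all pairwise connected (Bron-Kerbosch with pivoting).

    adjacency: dict mapping each node to the set of its neighbours (undirected, no self-loops).
    """
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:          # r is a maximal clique
            if len(r) > len(best):
                best = r
            return
        # Pivot on the node covering most candidates to skip redundant branches.
        pivot = max(p | x, key=lambda node: len(adjacency[node] & p))
        for node in list(p - adjacency[pivot]):
            expand(r | {node}, p & adjacency[node], x & adjacency[node])
            p = p - {node}
            x = x | {node}

    expand(set(), set(adjacency), set())
    return best


adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
core = maximum_clique(adjacency)
core_share = len(core) / len(adjacency)
```

If several cliques share the maximum size, this sketch returns whichever it finds first. The worst-case runtime is exponential; for real forum networks, a graph library such as networkx (`find_cliques`) is the more practical choice.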

Example from the PeerInnovation project: The research project investigated how peer communities drive sustainable innovation. For this purpose, it identified the processes that shed light on how the community further develops and disseminates sustainable technologies. The forum discussions contain evidence of activities through which community members contribute to the innovation process: not only sharing blueprints and designs, but also testing and evaluating technical solutions, and pointing out alternatives or ways to overcome difficulties in implementation. Trained on this dataset, the toolbox should therefore detect indications of these various activities. The occurrence of innovation activities and the innovation potential of each thread's content were assessed manually, both by the project team and by external evaluators via an online survey. For this purpose, rules were first defined (Heß & Gleu 2022), according to which the evaluators assigned labels to the threads. The evaluated threads were then used as training data for validating the suitability of the various metrics for identifying peer innovation. The quality of the training data can be assessed using inter-rater agreement metrics. Where labellers disagreed, a rule had to be applied to decide which label to use in the final training data.
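A common inter-rater agreement metric for two raters is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it and resolves disagreements by majority vote. The tie-breaking fallback is hypothetical, since the rule actually used in the project is not described here; for more than two raters, Fleiss' kappa or Krippendorff's alpha would be more suitable.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n**2
    if expected == 1:
        # Both raters used one identical category throughout; kappa is undefined,
        # and this sketch returns 1.0 because agreement is perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


def final_label(votes, tie_label):
    """Majority vote across raters; ties fall back to a fixed, hypothetical rule."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label
    return ranked[0][0]


rater_a = ["yes", "yes", "no", "no", "yes", "no"]
rater_b = ["yes", "no", "no", "no", "yes", "yes"]
kappa = cohens_kappa(rater_a, rater_b)  # observed 4/6, expected 1/2
```

A kappa of about 0.33, as in this toy example, would usually be read as only fair agreement, which would suggest refining the labelling rules before training.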

Expansion possibility: Other rules of network formation are possible. For example, the respective positions of the posts in the threads could be taken into account. Furthermore, contributors could be divided into established members and newcomers based on their previous history in order to examine community entry (e.g., Paxton et al. 2022). Extending the toolbox to other network types is possible, but requires modifying the code.
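One example of a position-based rule is to link each post's author to the author of the post directly before it in the thread, instead of to the thread's initiator. This "reply to the previous post" rule, the function name and the input format are hypothetical and are not part of the toolbox.

```python
from collections import Counter, defaultdict


def reply_chain_edges(posts):
    """Directed edges from each post's author to the author of the post directly before it.

    posts: iterable of (thread_id, author_id, position) tuples; position = index within the thread.
    """
    by_thread = defaultdict(list)
    for thread_id, author, position in posts:
        by_thread[thread_id].append((position, author))
    edges = Counter()
    for entries in by_thread.values():
        entries.sort()  # order posts by their position in the thread
        for (_, previous), (_, current) in zip(entries, entries[1:]):
            if current != previous:  # consecutive posts by the same author are not linked
                edges[(current, previous)] += 1
    return edges


posts = [(1, "alice", 0), (1, "bob", 1), (1, "carol", 2), (1, "bob", 3)]
edges = reply_chain_edges(posts)
```

Unlike the commenter network, where carol's post would point to the initiator alice, this rule links carol to bob, whose post she most likely responded to.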

Expansion possibility: The community comparisons can easily be extended by adding forum data from other communities or by defining alternative metrics. In addition, comparing average values across different time periods offers a further evaluation option that was not pursued in the research project. This would make it possible to examine the dynamic development of the communities or to evaluate the effect of certain events or interventions on them.

Expansion possibility: The toolbox could easily be used to make comparative estimates for different time periods or sub-forums of the same community. In the future, the robustness of the estimates should be tested when transferring the models to different communities in order to enable automated classification without prior labelling.
