Asynchronous Assistance: A Social Network Analysis of Influencing Peer Interactions in PeerWise

This mixed methods, investigative case study explored student patterns of use within the online PeerWise platform to identify the most influential activities and to build a model capable of predicting performance from these activities. PeerWise is designed to facilitate student peer-to-peer engagement through creating, answering and ranking multiple choice questions; this study sought to understand the relationship between student engagement in PeerWise and learning performance. To address the research question, various usage metrics were explored, visualised and modelled using social network analysis with Gephi, Tableau and Python. These findings were subsequently analysed in light of the qualitative survey data gathered. The most significant activity metrics were evaluated, producing rich data visualisations and identifying the activities that influenced academic performance in this study. The key qualitative and quantitative findings converged on answering questions as having the greatest positive impact on learner performance. Furthermore, from a quantitative perspective, Average Comment Length and Average Explanation Length correlated positively with superior academic performance. Qualitatively, the motivating nature of the PeerWise community also engaged learners. The key limitation of the size of the data set within this investigative case study suggests further research, with additional student cohorts as part of an action research paradigm, to broaden these findings.


Introduction
PeerWise is an online platform which facilitates student peer-to-peer learning through multiple choice questions. The platform has been operational for over a decade with a significant level of practitioner use, as well as educational research, built around it. The PeerWise creators and the PeerWise community have detailed the system itself and their experiences of incorporating it within their pedagogical approach in a range of peer-reviewed publications (see Dynan and Ryan, 2019).
Student centered learning is core to PeerWise; students create, answer, rate and discuss multiple choice questions (MCQs) related to their course content. MCQs can be powerful learning enablers when incorporated into a clear pedagogical approach that is supplemented with coherent feedback (Nicol, 2007). However, enhancing student learning through a peer-to-peer approach is dependent on the quality of the peer-generated content. All questions on the platform are multiple choice, so when a student posts a question, they must also supply a selection of wrong answers as well as an explanation for the correct answer. This process promotes deep engagement and learning, as contributing students must not only take course content into consideration but also reflect on plausible wrong answers (Denny et al., 2008a). Denny and co-workers (2009b) investigated the quality of questions provided by students on the PeerWise platform by having academics analyse a random selection of questions and comparing those judgments to the MCQ ratings provided by students. The key finding of this study was that student and course instructor ratings concurred when judging the quality of peer-generated questions. Student centered contribution continues after answering an MCQ, as students must then rate and comment on the question; a reflective process that encourages analytical, evaluative and critical thinking (Denny et al., 2008a). Kay and co-workers (2018) found that providing peer feedback was, at times, more beneficial than receiving peer feedback due to the increased thought process involved.
The impact of the various modes of active engagement within PeerWise on student learning has been an area of rich research since the platform was first developed. Based on qualitative analysis, Denny and co-workers (2008c) noted that students believed the most beneficial aspect of the system was answering questions. However, subsequent quantitative analysis portrayed a conflicting picture, showing no correlation between the number of questions answered and the student's exam performance. This study also investigated other PeerWise activities, with a weak correlation (34%) observed between a student's average comment length and their final exam performance in one of the four modules investigated.
At a broader level, Denny and colleagues (2008a) investigated student global activity within PeerWise and its relationship to final examination performance. The findings suggest that students who were most active within PeerWise performed statistically significantly better in their final examination than students who were less active. However, the number of active days and the comment character length were more indicative of an impact on learning attainment, as indicated by exam performance, than the number of questions and answers provided by students. Given the nature of pedagogical case-study research, a later study by the same authors contradicted this finding: the more questions a student answered, the greater their improvement in class rank (Denny et al., 2010).
More recently, Kay and co-workers (2019) investigated over 3,000 students, across six courses and three academic years in three UK universities, and found a significant positive correlation between student PeerWise activity and final course grade. In their study, Kay and colleagues classified activity as the summed score of four standardised values: number of questions created; number of questions answered; number of quality comments written; and number of quality comments received. This was in contrast to Denny and co-workers' categorisation of students by performance quartiles (Denny et al., 2008a).
Learning analytics has emerged as an important domain of educational research, particularly focusing on the area of student performance. Researchers have analysed available data to predict student success and to identify 'at risk' students (those likely to fail) to permit a timely intervention if required. In such learning analytics informed scenarios, the most common class variable is a binary classification depicting student success via a pass or fail format. Widely used predictive modelling algorithms in educational performance settings include Logistic Regression, followed by Naïve Bayes and Support Vector Machines (SVM), each providing varying degrees of accuracy (Caison, 2007; Macfadyen & Dawson, 2010; Palmer, 2013; Zhang et al., 2004). Lauria and co-workers (2012) developed predictive models using support vector machines (SVM; 82.6% accuracy) and logistic regression (LR; 87.7% accuracy) to distinguish between performing students and 'at risk' students via a binary classification process. Macfadyen and Dawson (2010) achieved an 81% accuracy model based upon a logistic regression algorithm predicting a binary class variable of pass or fail. Barber and Sharkey (2012) built several models, one of which implemented a Naïve Bayes algorithm achieving 85% accuracy, validated using 10-fold cross validation. Barber and Sharkey's research observed that including prior academic achievement contributed to better model performance, while attributes such as gender, age and financial aid proved insignificant. To diversify the input data into the predictive model, Romero and colleagues (2013) used data from student discussion forums to predict final grade performance in the form of a binary class of pass or fail and found the Naïve Bayes algorithm performed the best of twenty algorithms applied across the four datasets in their study.

Research Goal
This research sought to explore student usage patterns on the PeerWise platform and to investigate whether there is a link between these usage patterns and end of semester grade. This research aim was achieved through a mixed methods approach focusing on influential and disconnected members on PeerWise (quantitative) and the perceived use of a sample from the research population (qualitative). The study also investigated the accuracy of the developed machine learning models in predicting a student's final grade based on their PeerWise activity. Recommendations on the most efficient use of the PeerWise platform, to maximise student benefit, are made.

Sample and Data Collection
The PeerWise data used for this study was sourced over two academic years, 2017/18 & 2018/19. It comprised the activities of 914 students across four institutions; all students were in their 1st year of a four-year degree programme and taking their second 'Fundamentals of Chemistry' module. Academic grades were available for one institution only (Institute A), so analysis was conducted on PW courses from Institute A (n=194). Data cleaning reduced the dataset to 171 students for modelling. Participation in the PeerWise platform was not mandatory, but Institute A students were offered 4% of their overall module grade if they completed the task of asking four questions, answering four questions and commenting on four questions. The PeerWise student engagement data was categorised as distinct PeerWise activity types as listed in Table 1. Server log files containing 174,405 lines of data on user navigation throughout the PeerWise platform provided additional attributes including the number of distinct days a user was active on the platform, the total and average character length of the text attributes (i.e. comments, replies and explanations), and the ratio of correct answers recorded. Student grade data were also available and consisted of the student average semester 1 grade for all modules taken, their semester 1 chemistry grade and their semester 2 chemistry grade (i.e. the module associated with this PeerWise research). Table 2 lists all attributes used.
For the qualitative data collection, all Institute A students (n=194) were invited to voluntarily complete an online survey that explored their use and perceived benefits of PeerWise on their learning. The survey comprised open-ended questions and was completed by six students after the module had ended. The research received ethical approval from the Research Ethics and Integrity Committee of the Technological University Dublin.

Data Analysis
Social Network Analysis (SNA) was implemented using Gephi 0.9.2. Both Force Atlas and Fruchterman Reingold layouts were considered; the less uniform Force Atlas layout was preferred (Figure 2). It displayed the range of engagement by students on the PeerWise platform, with no obvious subgroups. Each node represents a student on the PeerWise course, and each edge represents a question and/or answering interaction between two students. The directional flow of information is from the question author (out-degree) to the question answerer (in-degree). Nodes are divided into two colours: Institute A students and others. The size of each node was determined by influence, an estimated average student rating calculated as (number of questions authored + average answers per question) * (average rating + 1) * (number of followers + 1). Additional SNA-derived attributes were added to the study dataset for predictive modelling, namely the degree, closeness and betweenness centrality measures. Degree centrality is an index of communication: the number of connections associated with a node. Closeness is associated with the independence of a point and is a measure of connectedness to other nodes based on shortest paths. Betweenness is an indication of the potential for control of a community and measures the extent to which a node lies on the shortest paths between other nodes (Freeman, 1978; Scott, 2000). Centrality measures were calculated with respect to students in the same academic year.
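The three centrality measures described above can be illustrated with a minimal Python sketch using the NetworkX library. The student identifiers and edges below are invented for illustration only; the study itself derived these measures via Gephi:

```python
import networkx as nx

# Hypothetical directed answer network: an edge runs from the question
# author (out-degree) to each student who answered that question (in-degree).
G = nx.DiGraph()
G.add_edges_from([
    ("s1", "s2"), ("s1", "s3"), ("s2", "s3"),
    ("s3", "s1"), ("s4", "s1"),
])

degree = nx.degree_centrality(G)            # normalised number of connections
closeness = nx.closeness_centrality(G)      # inverse average shortest-path distance
betweenness = nx.betweenness_centrality(G)  # share of shortest paths through a node

for node in sorted(G.nodes):
    print(node, round(degree[node], 3), round(closeness[node], 3),
          round(betweenness[node], 3))
```

In a graph like this, a node such as "s4" that only authors questions and never sits between other students' shortest paths receives a betweenness of zero, which is why betweenness is interpreted as potential for control of the community rather than raw activity.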
Predictive modelling algorithms Logistic Regression, Naïve Bayes and Support Vector Machines were trained using a binary class label split at the 70% grade mark, distinguishing high achieving students (first class honours, ≥70%, n=86) from others (<70%, n=85) based on the semester 2 module grade. Trained models were evaluated using overall model accuracy calculated from a confusion matrix generated via 10-fold cross validation, as implemented in Python.
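The evaluation procedure described above can be sketched with scikit-learn. This is an illustrative reconstruction only: the feature matrix and labels below are synthetic stand-ins, not the study's data, and the attribute names are assumed for demonstration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for the 171-student dataset: four hypothetical activity
# attributes (e.g. answers given, active days, comment/explanation lengths).
X = rng.normal(size=(171, 4))
# Binary class label: 1 = first class honours (>= 70%), 0 = otherwise.
y = (X[:, 0] + rng.normal(scale=0.5, size=171) > 0).astype(int)

results = {}
for name, model in [
    ("Logistic Regression", LogisticRegression(max_iter=1000)),
    ("Naive Bayes", GaussianNB()),
    ("SVM", SVC()),
]:
    # 10-fold cross validation; accuracy is averaged across the folds.
    scores = cross_val_score(make_pipeline(StandardScaler(), model), X, y, cv=10)
    results[name] = scores.mean()
    print(f"{name}: {results[name]:.3f}")
```

Averaging fold accuracies in this way is equivalent to summarising the per-fold confusion matrices, since each fold's accuracy is the trace of its confusion matrix over the fold size.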
Qualitative PeerWise survey data were evaluated as previously detailed by Dynan and Ryan (2019). In brief, these data were analysed through an inductive strategy where thematic analysis was used to identify, analyse and report different themes throughout the data set (Braun and Clarke, 2006). The actual coding process was heavily influenced by the approach outlined by Bree and Gallagher (2016). In order to ensure appropriate data validity and rigour, the raw survey responses were firstly open coded and subsequently axially coded. Finally, the codebook was reduced by merging same/similar codes to create the final codebook, which was consequently organised into themes. Triangulation was achieved through the methods of data collection, supplemented by researcher reflective diaries and the scholarly literature.

Findings / Results
PeerWise Activity
The study sought to determine whether there was a link between student PeerWise activity and their final grade. In this study, activity was defined as the number of answers given on the PeerWise course, the number of days since the last log into the platform, and both the average comment and average explanation character lengths. As Institute A students sat within a larger cohort of students from other institutes, it was important to isolate Institute A students. The number of answers logged per Institute A student across both PeerWise courses (17/18 and 18/19), with the size of the circles representing the number of active days on PeerWise per student, is depicted in Figure 3a; only students who answered 100 questions or fewer are depicted in Figure 3b. The overall trend line on both graphs indicates a positive relationship between the end of module semester grade and the number of answers provided on the PeerWise platform. A similar trend was observed when the number of distinct active days, the average comment length and the average explanation length were plotted against the semester 2 grade. Students identified as 'most active', defined as those who answered the most questions, achieved a good final grade. Ten of the top twelve achieved a grade above 70%, with most achieving within −2% and +8% of their semester 1 grade (Figure 4a). Of the ten least active students identified, six performed worse than their semester 1 grade, one did not complete the final exam, one achieved the same grade, one student was 1% higher in semester 2, and the tenth student achieved an 11% increase. This suggested that low PeerWise activity was related to a lower grade, although a larger sample size is needed to confirm this.

Figure 4. Students are labelled alphabetically (A-F). Part (a) denotes the twelve most active students by in-degree, part (b) denotes the ten most influential students, and part (c) denotes the ten least influential students.

Influential and Disconnected Members
The ten most influential and ten least influential (disconnected) students are denoted in Figures 4b and 4c. There is no clear pattern linking influential students with good grades or disconnected students with low grades. Influence, as measured in this study ((number of questions authored + average answers per question) * (average rating + 1) * (number of followers + 1)), showed little or no relationship with the student end of semester grade. Weighting of attributes using a chi-squared test gave SNA-generated attributes and PeerWise activity based attributes the highest weighting, as illustrated in Table 3. Betweenness had the highest weighting; however, it is worth noting this measure is calculated relative to a peer group.
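Chi-squared attribute weighting of this kind can be sketched with scikit-learn's `chi2` scorer. The attribute names and data below are hypothetical stand-ins for the study's attributes, and note that chi-squared scoring requires non-negative inputs, so signed measures such as centralities would need scaling first:

```python
import numpy as np
from sklearn.feature_selection import chi2

rng = np.random.default_rng(1)
# Hypothetical non-negative activity attributes for 171 students.
feature_names = ["answers", "betweenness_scaled", "avg_comment_len", "active_days"]
X = rng.integers(0, 100, size=(171, 4)).astype(float)
y = rng.integers(0, 2, size=171)  # 1 = first class honours, 0 = otherwise

# Score each attribute against the binary class label and rank by weight.
scores, p_values = chi2(X, y)
ranking = sorted(zip(feature_names, scores), key=lambda pair: -pair[1])
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The resulting ordered list plays the role of an attribute weighting table such as Table 3: higher chi-squared scores indicate attributes whose distribution differs more between the two grade classes.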

Predictive Modelling
The Logistic Regression, Naïve Bayes and Support Vector Machines (SVM) algorithms were modelled using Institute A students only. Initial models excluded centrality measures and semester 1 grades. Logistic Regression had an accuracy of 65.4%, followed by Naïve Bayes with 63.5% accuracy and SVM with 61.5% accuracy. Including measures of centrality increased the Logistic Regression accuracy marginally (67.3%), whereas introducing semester 1 grade attributes increased the accuracy to 82.3%. For this final Logistic Regression model, the most predictive PeerWise activity attributes were the number of questions answered, the average comment length and the average explanation length.

Qualitative Data Analysis
Following thematic analysis, four major themes emerged from the qualitative data set; understanding through content creation, usefulness of PeerWise as a revision tool, motivation to engage and the game-like features of PeerWise.

Discussion
This investigative case study examined whether PeerWise interaction was aligned with academic performance in a first year fundamentals of chemistry module, with prior academic performance measures serving as a benchmarking standard. Social network analysis and predictive modelling were employed, in conjunction with qualitative analysis, to examine both the learning analytics informed impact and the perceived learning impact of student PeerWise activity. PeerWise activity comprised answering/authoring questions, the average comment and explanation character lengths, and the number of days since the last log into the platform. Within the PeerWise dataset there were several different types of interactions; the most common interaction was one student answering another student's question. In this instance the student who authored the question was either imparting new information or reinforcing prior learned information to the student answering the question.
To follow the directional information flow and to enhance data visualization, and therefore comprehension within the social network analysis, appropriate optimization was required. For example, a number of options were considered for Node size including number of In-Degrees, Out-Degrees, Questions authored and number of Followers. However, only one measure can be represented via node size. In order to generate a more informative Node representation, four methods of combining activities were examined, each was considered a measure of influence. Previous studies indicated that the attributes most associated with influence were the number of questions posted and the number of followers a student has (Denny et al., 2008a). However, the number of answers received per posted question and the average rating of posted questions were also potentially valuable in determining the quality of a posted question and thus could be incorporated into the calculation to determine influence (Kay et al., 2019). All of these attributes were examined in this study (see Table 5), under four options. Option 1 summed the total number of questions authored, the average rating of those questions authored, and the total number of followers a student has. Option 2 was the same as option 1 with the addition of average answers per question. Option 3 placed greater emphasis on the number of followers a student has by multiplying the number of followers + 1 (to compensate for those with no followers) with the summed total of number of questions authored, the average answers per question and the average rating of those questions authored. Option 4 placed greater emphasis on both the number of followers and the average rating achieved by a student's questions. 
The option 4 calculation was (number of questions authored + average answers per question) * (average rating + 1) * (number of followers + 1), with this option avoiding multiplying by zero by adding 1 to both the average rating and the number of followers. While all the components of the calculations depict influence, option 4 produced a far greater range of values and the highest deviation, which resulted in a greater variety of node sizes in the SNA graph. Visually this was also more informative, and therefore option 4 was selected as the measure of influence for the study dataset. Using this approach, two of the top five most influential nodes also had the highest out-degree (i.e. questions answered).
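The option 4 calculation can be expressed as a short Python function. The parameter names and the example values here are illustrative only:

```python
def influence(questions_authored, avg_answers_per_question, avg_rating, followers):
    """Option 4 influence score as defined in the study.

    Adding 1 to both the average rating and the follower count prevents a
    student with no ratings or no followers from collapsing the whole
    product to zero.
    """
    return ((questions_authored + avg_answers_per_question)
            * (avg_rating + 1) * (followers + 1))

# Hypothetical students: an unrated, unfollowed author still gets a non-zero score.
print(influence(10, 3.0, 4.2, 5))  # active, well-rated, followed author
print(influence(2, 1.0, 0.0, 0))   # unrated, no followers: (2 + 1) * 1 * 1 = 3.0
```

Because the rating and follower terms enter multiplicatively, well-rated and well-followed authors are pulled far above the rest, which is consistent with the wide range of node sizes this option produced in the SNA graph.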
Three educational data modelling algorithms commonly used in the literature, Logistic Regression, Naïve Bayes and Support Vector Machines (SVM), were subsequently used to interrogate the processed dataset comprising the most important attributes pertaining to PeerWise activity (see Table 5), to ask whether activity and/or influence on the PeerWise platform was predictive of high performing students. Based on the findings of this investigative case study, PeerWise activity alone cannot be considered a reliable predictor of student final grade given the low model accuracies noted. The low model accuracy could be attributed to model underfitting due to the relatively small sample size.
In spite of the limitations of the model accuracy, the PeerWise attributes that were most predictive of academic performance were betweenness centrality, number of answers, in-degree centrality and average explanation length. These were also the attributes that were classified as 'activity' in this case study. There was also a positive correlation between a student's final grade and the Answers, Average Comment Length and Average Explanation Length attributes. However, these findings do not align with past research, in which no quantitative correlation was found between answering questions and improved exam performance (Denny et al., 2008c). From a qualitative perspective, students felt part of a community within PeerWise, specifically citing peer-based content creation as a common ground for personal development. Qualitatively, Denny and co-workers (2008c) similarly report that students believed answering questions was of value. A subsequent study by Denny and colleagues observed that the more questions a student answered, the greater the improvement in their class scores (Denny et al., 2010). Kay and co-workers (2019) also noted a significant positive relationship between performance in end of course exams and overall PeerWise activity (with activity defined as the number of questions authored, the number of questions answered, the number of quality comments generated and the number of quality comments received).
Focusing on creating questions, Denny and colleagues (2008b) noted that creating high quality questions involves a deep learning process which could be more beneficial in terms of understanding and long-term retrieval; however, there was no evidence in this study to suggest that authoring highly ranked questions related to improved grade performance. Indeed, Nicol (2007) suggests that students do not need to produce high quality MCQs and that the focus should be on the learning process rather than the output; this echoes the findings of the current research, whereby answering questions is more beneficial to students than authoring questions. However, to support a true student-centered community of learning, underpinned by accurate and appropriate MCQs, a certain level of academic guidance is required (Galloway & Burns, 2015), which in turn could empower deep student ownership and motivation as evidenced in the qualitative data: "[We] had to do independent research and study to create valid questions" (Student A, 17/18) and "[PeerWise] helps you take charge of your own learning and revision" (Student D, 17/18).

Conclusion
PeerWise is a popular, student centred platform that can be used to enable students to asynchronously engage and learn as part of an online learning community. Within PeerWise there are several potential interactions whereby students actively contribute to the learning space. This investigative case study sought to identify the most influential activities in PeerWise and to use them as elements of a predictive model for final assessment performance. Whilst the predictive models generated in this study were not sufficiently accurate to permit performance prediction, the most influential PeerWise activities were observed to be betweenness centrality, number of answers, in-degree centrality and average explanation length. At a practical level, the current SNA suggests that answering questions has more of an influence than authoring questions; an observation which was also supported by the attribute ranking list. To enhance the student experience within PeerWise, and to provide an initial MCQ benchmark standard, it may be beneficial if PeerWise courses were seeded by academics with questions so that students' initial activity is question answering. Subsequent academic support could focus on developing student question authoring and commenting skills so as to empower a deep student ownership of their asynchronously peer-assisted learning space.

Suggestions
Although the predictive models built in this investigative case study did not provide sufficiently high accuracy to predict a student's end of module grade from their PeerWise activity, there were positive relationships between certain activities and academic performance which warrant further exploration with a larger dataset. One of the biggest challenges in this research lay in the relatively small sample size; the limited training data available to the predictive models resulted in poorly defined decision boundaries. However, future predictive modelling should benefit from the attribute ranking list developed in this study.

Limitations
The primary limitation of this research was the size of the data set; the quality of a predictive model is related to the quality and size of the dataset used. As an investigative case-study, the findings reported are context specific, and may not be generalizable to other settings and other uses of PeerWise.