Using predictive technology to foster constructive conversations

Nextdoor’s purpose is to cultivate a kinder world where everyone has a neighborhood they can rely on. We want to give neighbors ways to connect and be kind to each other, online and in real life. One of the biggest levers we have for cultivating more neighborly interactions is building strategic nudges throughout the product that encourage kinder conversations.

Today, we use a number of mechanisms to encourage kindness on the platform, including pop-up reminders that slow neighbors down before responding negatively. Over the past few years, we’ve used machine learning models to identify uncivil and contentious content.

Nextdoor’s definition of harmful and hurtful content is anything uncivil, fraudulent, unsafe, or unwelcoming, including personal attacks, misinformation, and discrimination. In partnership with key experts and academics, we identified moments where adding friction on the platform could help, and implemented those findings with our machine learning technology. Our goal: to encourage neighbors to have more mindful conversations. What if we could be proactive and intervene before a conversation sparks more abusive responses? Oftentimes unkind comments beget more unkind comments: 90% of abusive comments appear in a thread with another abusive comment, and 50% of abusive comments appear in a thread with 13+ other abusive comments.* By preventing some of these comments before they happen, we can avoid the resulting negative feedback loops.

Nextdoor’s thread model was built to identify potentially contentious conversations and points where intervention might prevent abusive content. The Kindness Reminder, introduced in 2019, and the Anti-Racism Notification, launched in 2021, automatically detect offensive or racist language in written comments and encourage the author to edit before publishing. The new Constructive Conversations Reminder uses predictive technology to anticipate when a comment thread may become heated before a neighbor contributes. Below, we share details on how we built the model powering these intervention tools.

What is a thread?

Nextdoor conversation threads occur inside a post. Once a neighbor creates a post, other neighbors can comment on the post or respond to each other’s comments.

Neighbors see the comments ordered sequentially and can reply to existing comments. Tagging is also often used to clarify that a comment is replying to an earlier comment further back in the conversation.

The multiple dimensions of a conversation create some complexity in how we define each comment’s parent node and traverse the parent nodes to recreate the conversation thread. Based on our analysis, we found that the most predictive representation considers all three relationships: mentions or tags, the reply-to-comment hierarchy, and sequential order.
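As an illustration, here is a minimal Python sketch of how a comment’s parent might be resolved under these three relationships; the field and function names (`tagged_user`, `reply_to_id`, `resolve_parent`) are hypothetical, not Nextdoor’s actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Comment:
    id: int
    author: str
    created_at: int             # sequential position within the post
    reply_to_id: Optional[int]  # explicit reply-to-comment hierarchy
    tagged_user: Optional[str]  # @-mention of an earlier commenter

def resolve_parent(comment, earlier_comments):
    """Pick a parent using mentions, then reply hierarchy, then recency."""
    # 1. A tag points back at the tagged user's most recent comment.
    if comment.tagged_user:
        for prev in reversed(earlier_comments):
            if prev.author == comment.tagged_user:
                return prev
    # 2. An explicit reply takes the replied-to comment as the parent.
    if comment.reply_to_id is not None:
        by_id = {c.id: c for c in earlier_comments}
        if comment.reply_to_id in by_id:
            return by_id[comment.reply_to_id]
    # 3. Otherwise fall back to the immediately preceding comment.
    return earlier_comments[-1] if earlier_comments else None

def thread_for(comment, earlier_comments):
    """Walk parent links backwards to recreate the conversation thread."""
    thread = [comment]
    parent = resolve_parent(comment, earlier_comments)
    while parent is not None:
        thread.append(parent)
        earlier = [c for c in earlier_comments if c.created_at < parent.created_at]
        parent = resolve_parent(parent, earlier)
    return list(reversed(thread))
```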

Data

Labels

How do we identify when a conversation is becoming contentious? On Nextdoor, neighbors can report content, and volunteer community moderators help review and remove content based on our Community Guidelines. For some cases, such as misinformation and discrimination, these reports are sent directly to our trained Neighborhood Operations staff to review. For the purposes of this model, we decided to use reporting rather than removal as the signal, because we wanted a tool that detects early signs of conversations going off the rails. Regardless of the moderator’s decision to keep or remove a comment, we can assume that if a conversation triggers a report, it warrants intervention to prevent potentially contentious responses from being created in that thread. Therefore, for each thread, we chose the creation of a subsequent reported comment as the positive label.
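A minimal sketch of this labeling rule, assuming a flat list of comment records with illustrative fields (`thread_id`, `comment_id`, `created_at`, `was_reported`):

```python
from collections import defaultdict

def label_comments(comments):
    """Label a comment positive if any later comment in its thread was reported."""
    by_thread = defaultdict(list)
    for c in comments:
        by_thread[c["thread_id"]].append(c)

    labels = {}
    for thread in by_thread.values():
        thread.sort(key=lambda c: c["created_at"])
        reported_later = False
        # Walk the thread backwards so each comment sees whether any
        # subsequent comment triggered a report.
        for c in reversed(thread):
            labels[c["comment_id"]] = int(reported_later)
            reported_later = reported_later or c["was_reported"]
    return labels
```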

Sampling

Our training data was sampled across multiple months. Less than 1% of comments get reported, and reported comments tend to cluster together, so this is an imbalanced dataset.* To improve the model, we needed to oversample the positive labels, but in a way that still yields a representative distribution of comments from different threads. Comments, especially reported comments, tend to cluster around larger, trending threads, and we knew that sampling randomly might not give us data representative of the many different types of conversations on Nextdoor. We took two steps to create a balanced representation of comments from different threads with positive and negative labels: first we sampled by post, and then, within the comment thread for each post, we sampled both negative and positive labels.
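Roughly, the two-step scheme looks like the following sketch; the per-thread cap and the data layout are assumptions for illustration:

```python
import random

def sample_training_data(posts, n_posts, per_thread):
    """Two-step sampling: first by post, then balanced within each thread.

    `posts` maps post_id -> list of (comment, label) pairs (illustrative).
    """
    sampled = []
    for post_id in random.sample(list(posts), min(n_posts, len(posts))):
        thread = posts[post_id]
        positives = [x for x in thread if x[1] == 1]
        negatives = [x for x in thread if x[1] == 0]
        # Oversample positives while capping any single thread's
        # contribution, so large trending threads don't dominate.
        sampled += random.sample(positives, min(per_thread, len(positives)))
        sampled += random.sample(negatives, min(per_thread, len(negatives)))
    return sampled
```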

Model architecture

Features

The key feature in our model is the comment text. Other features, such as the number of reports on a neighbor’s previous content and comment creation velocity, add signal, but we found that most of the AUC gains come from picking the appropriate thread structure and generating text embeddings from that structure. In future iterations of the model, we aim to make these other features available to the model.

Embedding selection

The embedding, which creates a vector representation of the text, is an important component of the features. We considered two different technologies for the embeddings, fastText and BERT, and compared their pros and cons.

One of the advantages of BERT is that we can leverage pretrained multilingual aligned embeddings, which allow a task model trained on U.S. data alone to perform in other languages and countries. Although the BERT model has higher latency than fastText, we ultimately decided it was worth the trade-off because the new features and applications we’ll run off the model can run asynchronously.

In our system architecture, we tolerate the higher latency at inference time by pre-generating and storing the embedding features, caching scores to be consumed later, and delaying downstream tasks that depend on the embeddings.
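A simplified sketch of that asynchronous flow, with in-memory stores standing in for whatever feature store and cache a production system would use; all names here are illustrative:

```python
import asyncio
from collections import defaultdict

# Illustrative in-memory stores; production would use a feature store / cache service.
thread_embeddings = defaultdict(list)  # thread_id -> [embedding, ...]
thread_scores = {}                     # thread_id -> cached risk score

async def on_comment_created(comment, embed_fn, score_fn):
    # Pre-generate and store the embedding as soon as the comment is created...
    thread_embeddings[comment["thread_id"]].append(embed_fn(comment["text"]))
    # ...and refresh the thread score off the critical path, so the higher
    # BERT latency never blocks the posting flow.
    asyncio.create_task(refresh_score(comment["thread_id"], score_fn))

async def refresh_score(thread_id, score_fn):
    # Downstream tools (reminders, notification suppression) later read
    # this cached score instead of calling the model synchronously.
    thread_scores[thread_id] = score_fn(thread_embeddings[thread_id])
```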

The classifier itself is a simple dense neural layer built on top of the concatenated embeddings.
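A minimal PyTorch sketch of such a classifier; the embedding dimension, thread length, and class name are illustrative, not our production configuration:

```python
import torch
import torch.nn as nn

class ThreadClassifier(nn.Module):
    """A single dense layer over the concatenated comment embeddings."""

    def __init__(self, embedding_dim=384, max_comments=8):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(embedding_dim * max_comments, 1),
            nn.Sigmoid(),  # probability that a future comment gets reported
        )

    def forward(self, comment_embeddings):
        # comment_embeddings: (batch, max_comments, embedding_dim),
        # padded/truncated to a fixed thread length, then flattened.
        flat = comment_embeddings.flatten(start_dim=1)
        return self.head(flat)
```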

We built the embedding features using the SBERT API and experimented with various fine-tuning approaches to improve performance. We will discuss our exploration of multilingual embeddings in future blog posts.
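For reference, generating embeddings through the sentence-transformers SBERT API looks roughly like this; the checkpoint name is an example multilingual model, not necessarily the one we ship:

```python
from sentence_transformers import SentenceTransformer

# Example multilingual SBERT checkpoint (an assumption for illustration).
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

comments = [
    "Thanks for organizing the cleanup!",
    "You clearly have no idea what you're talking about.",
]
embeddings = encoder.encode(comments)  # numpy array, shape (2, 384)
```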

Model performance

This model was tasked with predicting whether a future comment on a thread will be abusive, which is difficult given that no features of the target comment are available. Despite this challenge, the model achieved a relatively high AUC of over 0.83, and was able to achieve double-digit precision and recall at certain thresholds.

Examining sample comments from threads alongside the model’s predicted abuse-risk levels, we can see the model is generally able to identify when comments become more contentious. There is an overall limit on the model’s precision, due to the low incidence of reporting and the challenge of predicting an unknown future comment. Not all abusive content gets reported, as reporting is driven primarily by the norms of the community and neighbors’ awareness of the reporting feature. As a result, we may see a higher false positive rate (unreported comments that the model labels as highly contentious). A human review of a random sample of this group suggests that these false positives are often similar in contentiousness to the true positives.

Internationalization

Once we validated the model in the U.S., the next step was internationalization. The BERT model we selected includes multilingual-aligned embeddings, built using a teacher-student approach. This aligns embeddings with similar meanings in different languages to the same vector space: for example, “Hello neighbors!” and “Hola vecinos!” map to nearby vectors.
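A quick way to see this alignment, again using an example multilingual SBERT checkpoint (not necessarily ours):

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

en = encoder.encode("Hello neighbors!", convert_to_tensor=True)
es = encoder.encode("Hola vecinos!", convert_to_tensor=True)

# With aligned embeddings, translations land near each other in the
# shared vector space, so the cosine similarity is high.
print(util.cos_sim(en, es))
```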

We found through both offline validation and online A/B testing that even in the U.S., tasks trained on multilingual-aligned embeddings can perform just as well as those trained on English-only embeddings.

We also tested the U.S. model on data from other languages and countries: as Nextdoor continues to expand, we want our models to be available in each market. We evaluated the U.S.-trained model on multiple countries in Europe where we have a relatively higher adoption rate.

In all countries, with the exception of the Netherlands, we found that the AUC was quite close to U.S. levels. Even in the Netherlands, the performance provided enough signal to test on our products. We did notice a slightly lower precision and recall rate overall in Europe, which could be due to lower reporting rates as compared to the U.S.

Below are some examples of text from international threads that were flagged as potentially turning contentious (modified to protect privacy). As you can see, despite cultural and language differences, the model was for the most part still able to pick up on contentious conversations.

  • “I don’t share your faith in Brexit …”
  • “Clearly ignorant of science and facts and harming their own business.”
  • “Your own comment contradicts itself…”
  • “La vérité fâche…” (the truth hurts)
  • “je draait om mijn vraag heen” (you’re avoiding my question)
  • “io ti ho scritto esattamente ciò che avevo scritto in quei commenti che dici che ho cancellato…” (I wrote you exactly what I had written in those comments that you say I deleted)

Ultimately, the performance won’t match that of models trained directly on international data, but the current performance provides sufficient signal for some of our moderation intervention tools. One caveat of this analysis is that we only evaluated on Western countries where we had adequate data, so it is unclear whether these results will translate to non-Western cultures and languages.

The ability to predict abusive threads in other languages means that even in countries with sparse data for training, we can transfer what we’ve learned from U.S. data for intervention signals. This will allow us to expand our moderation tools across the globe.

Impact

We found that this signal, when accurately predicted, can be leveraged by a variety of product tools to decrease uncivil content:

  1. Comment notification suppression: If a conversation is going awry, suppress notifications on the triggering comment
  2. Constructive Conversations Reminder: Prompt neighbors to take an empathetic stance when they are about to comment in a contentious thread
  3. Prompt author to close discussion: Remind post authors that they have the ability to close the discussion

Once we deployed the model, we were able to start testing some of the intervention methods mentioned above. So far, our results have demonstrated that the model can perform quite well: comment notification suppression has been rolled out, and the Constructive Conversations Reminder has begun rolling out to neighbors in the U.S.

We hope to continue to leverage these findings to expand our toolbox and ensure Nextdoor is a kind and welcoming platform for all neighbors. This work wouldn’t have been possible without the help of Sugin Lou, Karthik Jayasurya, the CoreML team, and the Moderation Team engineers who built the products the model powers. We continue to partner with leading academics and experts in the fields of social psychology, equality, and civic engagement on our Neighborhood Vitality Advisory Board. Learn more about Nextdoor’s product and policy initiatives to foster a holistically inclusive platform on the Nextdoor Blog.

If you are passionate about solving problems that empower local communities and encourage civic engagement, please check our careers page and come join us!

*Source: Nextdoor internal data, based on Q3 2021 data, primarily from the U.S.
