Quantifying social organization and political polarization in online platforms

Mass selection into groups of like-minded individuals may be fragmenting and polarizing online society, particularly with respect to partisan differences1–4. However, our ability to measure the social makeup of online communities and, in turn, to understand the social organization of online platforms is limited by the pseudonymous, unstructured and large-scale nature of digital discussion. Here we develop a neural-embedding methodology to quantify the positioning of online communities along social dimensions by leveraging large-scale patterns of aggregate behaviour. Applying our methodology to 5.1 billion comments made in 10,000 communities over 14 years on Reddit, we measure how the macroscale community structure is organized with respect to age, gender and US political partisanship. Examining political content, we find that Reddit underwent a significant polarization event around the 2016 US presidential election. Contrary to conventional wisdom, however, individual-level polarization is rare; the system-level shift in 2016 was disproportionately driven by the arrival of new users. Political polarization on Reddit is unrelated to previous activity on the platform and is instead temporally aligned with external events. We also observe a stark ideological asymmetry, with the sharp increase in polarization in 2016 being entirely attributable to changes in right-wing activity. This methodology is broadly applicable to the study of online interaction, and our findings have implications for the design of online platforms, understanding the social contexts of online behaviour, and quantifying the dynamics and mechanisms of online polarization.

A new method quantifies the social makeup of online communities, and applying it to 14 years of commenting patterns on Reddit shows increased polarization in 2016, driven by new users to the platform.

  • Isaac Waller (ORCID: orcid.org/0000-0003-4283-2502), affiliation 1 &
  • Ashton Anderson (ORCID: orcid.org/0000-0003-3089-6883), affiliation 1

Nature volume 600, pages 264–268 (2021)

  • Computer science

  • Interdisciplinary studies

  • Sociology

  • Fig. 1: Quantifying social dimensions on Reddit.
  • Fig. 2: Macroscale social organization of Reddit communities.
  • Fig. 3: Distribution of political activity on Reddit.
  • Fig. 4: Political polarization of new and existing users.
  • Fig. 5: Ideological asymmetry in online polarization.

Data availability

All data are available from the pushshift.io Reddit archive28 at http://files.pushshift.io/reddit/. Source data are provided with this paper. Reddit community embedding, social dimension vectors and community scores are available at https://github.com/CSSLab/social-dimensions.

Code availability

All code is available at https://github.com/CSSLab/social-dimensions. Analyses were performed with Python v3.7, pandas v1.3.3 and Spark v3.0.

References

  1. Sunstein, C. #Republic: Divided Democracy in the Age of Social Media (Princeton Univ. Press, 2018).
  2. Iyengar, S. & Hahn, K. S. Red media, blue media: evidence of ideological selectivity in media use. J. Commun. 59, 19–39 (2009).
  3. van Alstyne, M. & Brynjolfsson, E. Electronic communities: global villages or cyberbalkanization? In Proc. International Conference on Information Systems 5 https://aisel.aisnet.org/icis1996/5 (1996).
  4. van Dijck, J. The Culture of Connectivity: A Critical History of Social Media (Oxford Univ. Press, 2013).
  5. McLuhan, M. The Gutenberg Galaxy: The Making of Typographic Man (Univ. of Toronto Press, 1962).
  6. Farrell, H. The consequences of the internet for politics. Ann. Rev. Pol. Sci. 15, 35–52 (2012).
  7. Conover, M. D. et al. Political polarization on Twitter. Proc. Intl AAAI Conf. Web Soc. Media 133, 89–96 (2011).
  8. Bail, C. A. et al. Exposure to opposing views on social media can increase political polarization. Proc. Natl Acad. Sci. USA 115, 9216–9221 (2018).
  9. Martin, T. community2vec: vector representations of online communities encode semantic relationships. In Proc. 2nd Workshop on NLP and Computational Social Science 27–31 (2017).
  10. Garg, N., Schiebinger, L., Jurafsky, D. & Zou, J. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc. Natl Acad. Sci. USA 115, E3635–E3644 (2018).
  11. Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V. & Kalai, A. T. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Adv. Neural Inf. Process. Syst. 29, 4349–4357 (2016).
  12. Caliskan, A., Bryson, J. J. & Narayanan, A. Semantics derived automatically from language corpora contain human-like biases. Science 356, 183–186 (2017).
  13. Kozlowski, A. C., Taddy, M. & Evans, J. A. The geometry of culture: analyzing the meanings of class through word embeddings. Am. Soc. Rev. 84, 905–949 (2019).
  14. Shi, F., Shi, Y., Dokshin, F. A., Evans, J. A. & Macy, M. W. Millions of online book co-purchases reveal partisan differences in the consumption of science. Nat. Hum. Behav. 1, 0079 (2017).
  15. Del Vicario, M. et al. Echo chambers: emotional contagion and group polarization on Facebook. Sci. Rep. 6, 37825 (2016).
  16. Pariser, E. The Filter Bubble: What the Internet is Hiding from You (Penguin, 2011).
  17. Flaxman, S., Goel, S. & Rao, J. M. Filter bubbles, echo chambers, and online news consumption. Public Opin. Q. 80, 298–320 (2016).
  18. Bakshy, E., Messing, S. & Adamic, L. A. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 1130–1132 (2015).
  19. DiMaggio, P., Evans, J. & Bryson, B. Have Americans' social attitudes become more polarized? Am. J. Sociol. 102, 690–755 (1996).
  20. Barberá, P., Jost, J. T., Nagler, J., Tucker, J. A. & Bonneau, R. Tweeting from left to right: is online political communication more than an echo chamber? Psychol. Sci. 26, 1531–1542 (2015).
  21. Adamic, L. A. & Glance, N. The political blogosphere and the 2004 US election: divided they blog. In Proc. 3rd International Workshop on Link Discovery 36–43 (2005).
  22. An Examination of the 2016 Electorate, Based on Validated Voters https://www.pewresearch.org/politics/2018/08/09/an-examination-of-the-2016-electorate-based-on-validated-voters/ (Pew Research Center, 2018).
  23. Hawley, G. Making Sense of the Alt-Right (Columbia Univ. Press, 2017).
  24. Simmel, G. Conflict and the Web of Group Affiliations (Free Press, 1955).
  25. Breiger, R. L. The duality of persons and groups. Social Forces 53, 181–190 (1974).
  26. Bourdieu, P. Distinction: A Social Critique of the Judgement of Taste (Routledge, 1984).
  27. Crenshaw, K. W. On Intersectionality: Essential Writings (The New Press, 2017).
  28. Baumgartner, J., Zannettou, S., Keegan, B., Squire, M. & Blackburn, J. The Pushshift Reddit dataset. In Proc. International AAAI Conference on Web and Social Media 14, 830–839 (2020).
  29. Reddit privacy policy. Reddit https://www.redditinc.com/policies/privacy-policy (2021).
  30. Kumar, S., Hamilton, W. L., Leskovec, J. & Jurafsky, D. Community interaction and conflict on the web. In Proc. 2018 World Wide Web Conference 933–943 (2018).
  31. Waller, I. & Anderson, A. Generalists and specialists: using community embeddings to quantify activity diversity in online platforms. In Proc. 2019 World Wide Web Conference 1954–1964 (2019).
  32. Levy, O. & Goldberg, Y. Dependency-based word embeddings. In Proc. 52nd Annual Meeting of the Association for Computational Linguistics 2, 302–308 (2014).
  33. Levy, O. & Goldberg, Y. Neural word embedding as implicit matrix factorization. Adv. Neural Inf. Process. Syst. 27, 2177–2185 (2014).
  34. Schlechtweg, D., Oguz, C. & im Walde, S. S. Second-order co-occurrence sensitivity of skip-gram with negative sampling. Preprint at https://arxiv.org/abs/1906.02479 (2019).

Acknowledgements

This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada Foundation for Innovation (CFI) and the Ontario Research Fund (ORF).

Author information

Authors and Affiliations

  1. Department of Computer Science, University of Toronto, Toronto, Ontario, Canada

Isaac Waller & Ashton Anderson

Contributions

I.W. performed the computational analysis. A.A. and I.W. designed the research, analysed the results and wrote the paper.

Corresponding author

Correspondence to Ashton Anderson.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Kenneth Benoit, Kate Starbird and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Distribution of community scores.

Left: distributions of communities on the age, gender, partisan, and affluence dimensions. Right: the most extreme communities and words on those dimensions. Word scores are calculated by averaging community scores weighted by the number of occurrences of the word in the community in 2017. Community descriptions can be found in the glossary (Supplementary Table 1).
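The word-score calculation in this caption amounts to an occurrence-weighted average of community scores. A minimal sketch under stated assumptions follows: the community names, scores and counts are made-up placeholders, and `word_score` is a hypothetical helper rather than a function from the authors' repository.

```python
def word_score(community_scores, word_counts):
    """Occurrence-weighted mean of community scores for a single word.

    community_scores: dict mapping community -> score on one social dimension
    word_counts:      dict mapping community -> occurrences of the word there
    """
    total = sum(word_counts.values())
    if total == 0:
        raise ValueError("word does not occur in any scored community")
    weighted = sum(community_scores[c] * n for c, n in word_counts.items())
    return weighted / total


# Hypothetical data, for illustration only.
scores = {"community_a": -1.0, "community_b": 0.5, "community_c": 2.0}
counts = {"community_a": 10, "community_b": 30, "community_c": 10}
example = word_score(scores, counts)  # (-10 + 15 + 20) / 50
```

Communities in which the word is frequent pull the word's score towards their own position on the dimension.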

Extended Data Fig. 2 External validations of social dimensions.

Scatter plots of the external validations of the gender, partisan, and affluence axes. The gender scores for occupational communities are plotted against the percentage of women in that occupation from the 2018 American Community Survey. The partisan scores for city communities are plotted against the Republican vote differential for that metropolitan area in the 2016 presidential election. The affluence scores of city communities are plotted against the median household income for that metropolitan area from the 2016 US Census. The blue line is the best-fit linear regression for the data; the shaded area represents a 95% confidence interval for the regression estimated using a bootstrap. \(p\)-values for correlation coefficients computed using two-sided test of Pearson correlation assuming joint normality.
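The validation statistics named in this caption (a Pearson correlation, a best-fit line and a bootstrapped 95% band) can be sketched with the standard library alone. The caption does not specify the bootstrap procedure, so the sketch assumes case resampling with a percentile interval. The data are synthetic and the function names are hypothetical. The two-sided p-value is not computed here.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def bootstrap_band(xs, ys, x0, n_boot=2000, seed=0):
    """Percentile 95% interval for the fitted value at x0 (assumed procedure)."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:  # slope undefined for a degenerate resample
            continue
        slope, intercept = fit_line(bx, by)
        preds.append(slope * x0 + intercept)
    preds.sort()
    return preds[int(0.025 * len(preds))], preds[int(0.975 * len(preds)) - 1]


# Synthetic data: y = 2x + 1 plus small fixed perturbations.
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15, 0.1, -0.05]
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 + e for x, e in zip(xs, noise)]

r = pearson_r(xs, ys)
slope, intercept = fit_line(xs, ys)
lo, hi = bootstrap_band(xs, ys, x0=4.5)
```

Repeating `bootstrap_band` over a grid of `x0` values would trace out a shaded band like the one described.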

Extended Data Fig. 3 Further validations of social dimensions.

Clockwise from left: The gap between university and city communities on the age dimension. The distribution of university and city communities on the age dimension; age is strongly related to label (\(r=0.91\), two-sided \(p < {10}^{-58}\), \(n=150\), Cohen’s \(d=4.37\)). The distribution of left and right wing labelled communities on the partisan dimension; partisan is strongly related to label (\(r=0.92\), two-sided \(p < {10}^{-21}\), \(n=50\), Cohen’s \(d=4.89\)). The distribution of explicitly labelled left- and right-wing communities on the partisan-ness axis as compared to the general distribution; there is a large difference in their means (Cohen’s \(d=3.27\)). For violin plots, white dot represents median; box represents 25th to 75th percentile; whiskers represent 1.5 times the inter-quartile range; and density estimate (‘violin’) extends to the minima and maxima of the data. \(p\)-values for correlation coefficients computed using two-sided test of Pearson correlation assuming joint normality.
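Cohen's \(d\) appears several times in this caption as the effect size. The caption does not say which variant was used, so the sketch below assumes the common pooled-standard-deviation definition. The function name and the input values are illustrative only.

```python
import math


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / na, sum(group_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean_b - mean_a) / pooled_sd


# Illustrative scores for two labelled groups of communities.
d = cohens_d([1.0, 2.0, 3.0], [3.0, 4.0, 5.0])
```

Values of \(d\) above roughly 0.8 are conventionally called large, which puts the values reported in the caption (3.27 to 4.89) well beyond that range.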

Extended Data Fig. 4 Distributions of age, gender and partisan scores by cluster.

Distributions of raw age, gender and partisan scores, separated by cluster. Outlier communities that lie more than two standard deviations from the mean are annotated. Dashed lines represent the global mean on each dimension. Community descriptions can be found in the glossary (Supplementary Table 1).

Extended Data Fig. 5 Distributions of affluence, time, sociality and edgy scores by cluster.

Outlier communities that lie more than two standard deviations from the mean are annotated. Dashed lines represent the global mean on each dimension. Community descriptions can be found in the glossary (Supplementary Table 1).

Extended Data Fig. 6 Relationships between online social dimensions.

The relationships between the partisan dimension and (a) gender, (b) age, (c) partisan-ness. Every bar represents a bin of communities with partisan scores a given number of standard deviations from the mean, and the distribution illustrates the scores on the secondary dimension (e.g. gender in (a)). From left to right, the bars represent highly left-wing, leaning left-wing, center, leaning right-wing, highly right-wing communities. The leftmost and rightmost bars are annotated with the number of communities, and examples of the largest communities, in each group. The hex-plot in (c) illustrates the joint distribution of partisan and partisan-ness scores. Labels correspond to the categorizations used in the polarization analysis.

Extended Data Fig. 7 Polarization robustness checks.

(a) The partisan distribution of deleted and non-deleted comments in political communities. (b) The proportion of activity that took place in very left-wing (\(z < -3\)) and very right-wing (\(z > 3\)) communities over time. (c) Alternate version of Fig. 3a generated using a dataset in which the authorship of all comments was randomly shuffled. Each individual bin distribution is extremely similar to the overall activity distribution, showing that the overall activity distribution is a useful reference point for what bin distributions would look like if there were no tendency for users to comment in ideologically homogeneous communities. (d) Average distributions of political activity for authors of comments in the 25 largest political communities on Reddit (by number of comments). (e) Correlation of users’ average partisan scores over time. Each \(\left(x,y\right)\) cell represents the correlation between scores of a user in month \({t}_{x}\) and that same user in month \({t}_{y}\), for all users active in both time periods. A user is only considered active if they make at least \(10\) comments in a month. (f) The relationship between the proportion of users who polarize and the polarization threshold. The polarization threshold is the number of standard deviations a user must increase in polarization to be considered polarized. Three lines are plotted corresponding to three pairs of months: the pairs of months with the minimum (blue), maximum (orange), and median (green) proportion of users polarized when using a threshold of \(1\). A threshold of \(1\) is used in all other calculations. (g) The relationship between the proportion of users who polarize and the comment threshold. The comment threshold is the value used to filter inactive users from the calculation. Users must have at least \(x\) comments in each of the two months to be included in the calculation of the proportion of users who polarize. The same three month pairs are plotted as in part (f). There are minimal differences between different thresholds. A threshold of \(10\) is used in all other calculations.
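The two thresholds in panels (f) and (g) can be made concrete with a small sketch. It assumes that a user's polarization in a month is the absolute value of their mean partisan \(z\)-score. That definition is not stated in this caption and is an assumption here. The function name and data are hypothetical.

```python
def proportion_polarized(month_a, month_b, pol_threshold=1.0, min_comments=10):
    """Share of active users whose polarization rose by at least pol_threshold.

    month_a, month_b: dict mapping user -> list of partisan z-scores,
    one entry per comment made in that month.
    """

    def polarization(scores):
        # Assumed definition: absolute value of the user's mean z-score.
        return abs(sum(scores) / len(scores))

    eligible = [
        u for u in month_a
        if u in month_b
        and len(month_a[u]) >= min_comments
        and len(month_b[u]) >= min_comments
    ]
    if not eligible:
        return 0.0
    polarized = sum(
        1 for u in eligible
        if polarization(month_b[u]) - polarization(month_a[u]) >= pol_threshold
    )
    return polarized / len(eligible)


# Hypothetical users: u1 polarizes, u2 and u4 do not, u3 is too inactive.
before = {"u1": [0.2] * 10, "u2": [-0.5] * 10, "u3": [0.0] * 3, "u4": [2.0] * 10}
after = {"u1": [1.5] * 10, "u2": [-0.6] * 10, "u3": [0.1] * 3, "u4": [2.1] * 10}

share = proportion_polarized(before, after)
share_loose = proportion_polarized(before, after, min_comments=1)
```

Sweeping `pol_threshold` or `min_comments` over a range of values gives curves of the kind plotted in panels (f) and (g).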

Extended Data Fig. 8 Distribution of political activity by user group.

The distribution of political activity on Reddit over time by partisan score. Each bar represents one month of comment activity in political communities on Reddit, and is coloured according to the distribution of partisan scores of comments posted during the month (the partisan score of a comment is simply the partisan score of the community in which it was posted). The top plot includes all activity as in Fig. 3b, while the four following plots decompose this into the subsets of activity authored by particular groups of users. Users are classified, based on the average partisan score of their activity in the month 12 months prior, into left-wing (having a score at least one standard deviation to the left), right-wing (one standard deviation to the right), or center. Users with no political activity in the month 12 months prior use the label of the most recent month more than 12 months prior in which they had political activity; if they have never had political activity before, they fall into the new / newly political category (bottom).
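The user-grouping rule in this caption can be written as a short lookup. In the sketch below, months are represented as integer indices and scores as \(z\)-scores. Both representations, and the function names, are assumptions made for illustration.

```python
def label_from_score(z):
    """Map a mean partisan z-score to a group label using the 1 s.d. cut-offs."""
    if z <= -1.0:
        return "left-wing"
    if z >= 1.0:
        return "right-wing"
    return "center"


def classify_user(monthly_scores, current_month, lag=12):
    """Label a user from their most recent political activity at least `lag` months ago.

    monthly_scores: dict mapping month index -> mean partisan z-score,
    containing only months in which the user had political activity.
    """
    cutoff = current_month - lag
    prior_months = [m for m in monthly_scores if m <= cutoff]
    if not prior_months:
        return "new / newly political"
    # The month exactly `lag` months back if present, else the latest earlier one.
    return label_from_score(monthly_scores[max(prior_months)])


# Hypothetical users evaluated at month index 17.
label_a = classify_user({0: -1.5, 5: 1.2}, current_month=17)   # uses month 5
label_b = classify_user({0: -1.5, 10: 0.3}, current_month=17)  # falls back to month 0
label_c = classify_user({10: 0.3}, current_month=17)           # no activity by month 5
```

Taking the maximum over months at or before the cut-off covers both the exact 12-month lookup and the fallback to earlier activity in a single step.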

Extended Data Fig. 9 Additional measures of ideological asymmetry.

(a) Average polarization (absolute \(z\)-score) of activity in different ideological categories over time. (b) Volume of activity (number of comments) in different ideological categories over time. (c, d) Annual change in polarization in the two partisan activity categories, decomposed into the change attributable to new (\(\varDelta n\)) and existing (\(\varDelta e\)) users as done in Fig. 4.

Extended Data Fig. 10 Implicit polarization.

The relationship between explicitly partisan and implicitly partisan activity (left: left-wing activity; right: right-wing activity.) Of users who were first active in an explicitly partisan community at time \({m}_{E}\), the proportion of them who were first active in an implicitly partisan community at time \({m}_{I}\) is denoted by the colour in cell \(\left({m}_{E},{m}_{I}\right)\). The line graphs at the top show the total proportion of users who were active in implicitly partisan communities before they were active in an explicitly partisan community (i.e. the sum of each column below the diagonal back to 2005, or the total proportion of users for whom \({m}_{I} < {m}_{E}\)).
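The quantity shown in the line graphs, the share of users with \({m}_{I} < {m}_{E}\) for each value of \({m}_{E}\), can be sketched as follows. Months are again integer indices, and the function name and data are hypothetical.

```python
from collections import defaultdict


def implicit_first_share(first_explicit, first_implicit):
    """For each first-explicit month m_E, the share of users with m_I < m_E.

    first_explicit: dict mapping user -> first month active in an explicitly
                    partisan community
    first_implicit: dict mapping user -> first month active in an implicitly
                    partisan community (users may be absent)
    """
    totals = defaultdict(int)
    earlier = defaultdict(int)
    for user, m_e in first_explicit.items():
        totals[m_e] += 1
        m_i = first_implicit.get(user)
        if m_i is not None and m_i < m_e:
            earlier[m_e] += 1
    return {m_e: earlier[m_e] / totals[m_e] for m_e in totals}


# Hypothetical users: only "a" was implicitly partisan before being explicit.
shares = implicit_first_share(
    {"a": 5, "b": 5, "c": 7},
    {"a": 2, "b": 6, "c": 7},
)
```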

Extended Data Table 1 Social dimension seeds

Supplementary information

Supplementary Information

This file contains Supplementary Tables 1 and 2.

Reporting Summary

Peer Review File

Source data

Source Data Fig. 1

Source Data Fig. 2

Source Data Fig. 3

Source Data Fig. 4

Source Data Fig. 5

About this article

Cite this article

Waller, I., Anderson, A. Quantifying social organization and political polarization in online platforms. Nature 600, 264–268 (2021). https://doi.org/10.1038/s41586-021-04167-x

All data are available from the pushshift.io Reddit archive28 at http://files.pushshift.io/reddit/. Source data are provided with this paper. Reddit community embedding, social dimension vectors and community scores are available at https://github.com/CSSLab/social-dimensions.

All code is available at https://github.com/CSSLab/social-dimensions. Analyses were performed with Python v3.7, pandas v1.3.3 and Spark v3.0.

  1. Sunstein, C. #Republic: Divided Democracy in the Age of Social Media (Princeton Univ. Press, 2018).
  2. Iyengar, S. & Hahn, K. S. Red media, blue media: evidence of ideological selectivity in media use. *J. Commun.*59, 19–39 (2009).

Article Google Scholar 168. van Alstyne, M. & Brynjolfsson, E. Electronic communities: global villages or cyberbalkanization? In Proc. International Conference on Information Systems 5 https://aisel.aisnet.org/icis1996/5 (1996). 169. van Dijck, J. The Culture of Connectivity: A Critical History of Social Media (Oxford Univ. Press, 2013). 170. McLuhan, M. The Gutenberg Galaxy: The Making of Typographic Man (Univ. of Toronto Press, 1962). 171. Farrell, H. The consequences of the internet for politics. Ann. Rev. Pol. Sci. 15, 35–52 (2012).

Article Google Scholar 172. Conover, M. D. et al. Political polarization on Twitter. Proc. Intl AAAI Conf. Web Soc. Media133, 89–96 (2011).

Google Scholar 173. Bail, C. A. et al. Exposure to opposing views on social media can increase political polarization. Proc. Natl Acad. Sci. USA 115, 9216–9221 (2018).

Article CAS Google Scholar 174. Martin, T. community2vec: vector representations of online communities encode semantic relationships. In Proc. 2nd Workshop on NLP and Computational Social Science 27–31 (2017). 175. Garg, N., Schiebinger, L., Jurafsky, D. & Zou, J. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc. Natl Acad. Sci. USA115, E3635–E3644 (2018).

Article CAS Google Scholar 176. Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V. & Kalai, A. T. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Adv. Neural Inf. Process. Syst. 29, 4349–4357 (2016). 177. Caliskan, A., Bryson, J. J. & Narayanan, A. Semantics derived automatically from language corpora contain human-like biases. Science 356, 183–186 (2017).

Article CAS ADS Google Scholar 178. Kozlowski, A. C., Taddy, M. & Evans, J. A. The geometry of culture: analyzing the meanings of class through word embeddings. *Am. Soc. Rev.*84, 905–949 (2019).

Article Google Scholar 179. Shi, F., Shi, Y., Dokshin, F. A., Evans, J. A. & Macy, M. W. Millions of online book co-purchases reveal partisan differences in the consumption of science. *Nat. Hum. Behav.*1, 0079 (2017).

Article Google Scholar 180. Del Vicario, M. et al. Echo chambers: emotional contagion and group polarization on Facebook. *Sci. Rep.*6, 37825 (2016).

Article ADS Google Scholar 181. Pariser, E. The Filter Bubble: What the Internet is Hiding from You (Penguin, 2011). 182. Flaxman, S., Goel, S. & Rao, J. M. Filter bubbles, echo chambers, and online news consumption. *Public Opin. Q.*80, 298–320 (2016).

Article Google Scholar 183. Bakshy, E., Messing, S. & Adamic, L. A. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 1130–1132 (2015).

Article MathSciNet CAS ADS Google Scholar 184. DiMaggio, P., Evans, J. & Bryson, B. Have American’s social attitudes become more polarized? *Am. J. Sociol.*102, 690–755 (1996).

Article Google Scholar 185. Barberá, P., Jost, J. T., Nagler, J., Tucker, J. A. & Bonneau, R. Tweeting from left to right: is online political communication more than an echo chamber? *Psychol. Sci.*26, 1531–1542 (2015).

Article Google Scholar 186. Adamic, L. A. & Glance, N. The political blogosphere and the 2004 US election: divided they blog. In Proc. 3rd International Workshop on Link Discovery 36–43 (2005). 187. An Examination of the 2016 Electorate, Based on Validated Votershttps://www.pewresearch.org/politics/2018/08/09/an-examination-of-the-2016-electorate-based-on-validated-voters/ (Pew Research Center, 2018). 188. Hawley, G. Making Sense of the Alt-Right (Columbia Univ. Press, 2017). 189. Simmel, G. Conflict and the Web of Group Affiliations (Free Press, 1955). 190. Breiger, R. L. The duality of persons and groups. Social Forces 53, 181–190 (1974).

Article Google Scholar 191. Bourdieu, P. Distinction: A Social Critique of the Judgement of Taste (Routledge, 1984). 192. Crenshaw, K. W. On Intersectionality: Essential Writings (The New Press, 2017). 193. Baumgartner, J., Zannettou, S., Keegan, B., Squire, M. & Blackburn, J. The Pushshift Reddit dataset. In Proc. International AAAI Conference on Web and Social Media14, 830–839 (2020). 194. Reddit privacy policy Reddithttps://www.redditinc.com/policies/privacy-policy (2021). 195. Kumar, S., Hamilton, W. L., Leskovec, J. & Jurafsky, D. Community interaction and conflict on the web. In Proc. 2018 World Wide Web Conference 933–943 (2018). 196. Waller, I. & Anderson, A. Generalists and specialists: using community embeddings to quantify activity diversity in online platforms. In Proc. 2019 World Wide Web Conference 1954–1964 (2019). 197. Levy, O. & Goldberg, Y. Dependency-based word embeddings. In Proc. 52nd Annual Meeting of the Association for Computational Linguistics2, 302–308 (2014). 198. Levy, O. & Goldberg, Y. Neural word embedding as implicit matrix factorization. *Adv. Neural Inf. Process. Syst.*27, 2177–2185 (2014).

Google Scholar 199. Schlechtweg, D., Oguz, C. & im Walde, S. S., Second-order co-occurrence sensitivity of skip-gram with negative sampling. Preprint at https://arxiv.org/abs/1906.02479 (2019). Download references

This research was supported by the National Sciences and Engineering Research Council of Canada (NSERC), the Canada Foundation for Innovation (CFI) and the Ontario Research Fund (ORF).

Authors and Affiliations

  1. Department of Computer Science, University of Toronto, Toronto, Ontario, Canada

Isaac Waller & Ashton Anderson Authors205. Isaac WallerView author publicationsYou can also search for this author in PubMed Google Scholar 206. Ashton AndersonView author publicationsYou can also search for this author in PubMed Google Scholar

Contributions

I.W. performed the computational analysis. A.A. and I.W. designed the research, analysed the results and wrote the paper.

Corresponding author

Correspondence to Ashton Anderson.

  1. Department of Computer Science, University of Toronto, Toronto, Ontario, Canada

Isaac Waller & Ashton Anderson Authors213. Isaac WallerView author publicationsYou can also search for this author in PubMed Google Scholar 214. Ashton AndersonView author publicationsYou can also search for this author in PubMed Google Scholar

Contributions

I.W. performed the computational analysis. A.A. and I.W. designed the research, analysed the results and wrote the paper.

Corresponding author

Correspondence to Ashton Anderson.

Competing interests

The authors declare no competing interests.

The authors declare no competing interests.

Peer review informationNature thanks Kenneth Benoit, Kate Starbird and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended Data Fig. 1 Distribution of community scores.

Left: distributions of communities on the age, gender, partisan, and affluence dimensions. Right: the most extreme communities and words on those dimensions. Word scores are calculated by averaging community scores weighted by the number of occurrences of the word in the community in 2017. Community descriptions can be found in the glossary (Supplementary Table 1).

Extended Data Fig. 2 External validations of social dimensions.

Scatter plots of the external validations of the gender, partisan, and affluence axes. The gender scores for occupational communities are plotted against the percentage of women in that occupation from the 2018 American Community Survey. The partisan scores for city communities are plotted against the Republican vote differential for that metropolitan area in the 2016 presidential election. The affluence scores of city communities are plotted against the median household income for that metropolitan area from the 2016 US Census. The blue line is the best-fit linear regression for the data; the shaded area represents a 95% confidence interval for the regression estimated using a bootstrap. \(p\)-values for correlation coefficients computed using two-sided test of Pearson correlation assuming joint normality.

Extended Data Fig. 3 Further validations of social dimensions.

Clockwise from left: The gap between university and city communities on the age dimension. The distribution of university and city communities on the age dimension; age is strongly related to label (\(r=0.91\), two-sided \(p < {10}^{-58}\), \(n=150\), Cohen’s \(d=4.37\)). The distribution of left and right wing labelled communities on the partisan dimension; partisan is strongly related to label (\(r=0.92\), two-sided \(p < {10}^{-21}\), \(n=50\), Cohen’s \(d=4.89\)). The distribution of explicitly labelled left- and right-wing communities on the partisan-ness axis as compared to the general distribution; there is a large difference in their means (Cohen’s \(d=3.27\)). For violin plots, white dot represents median; box represents 25th to 75th percentile; whiskers represent 1.5 times the inter-quartile range; and density estimate (‘violin’) extends to the minima and maxima of the data. \(p\)-values for correlation coefficients computed using two-sided test of Pearson correlation assuming joint normality.

Extended Data Fig. 4 Distributions of age, gender and partisan scores by cluster.

Distributions of raw age, gender and partisan scores, separated by cluster. Outlier communities that lie more than two standard deviations from the mean are annotated. Dashed lines represent the global mean on each dimension. Community descriptions can be found in the glossary (Supplementary Table 1).

Extended Data Fig. 5 Distributions of affluence, time, sociality and edgy scores by cluster.

Outlier communities that lie more than two standard deviations from the mean are annotated. Dashed lines represent the global mean on each dimension. Community descriptions can be found in the glossary (Supplementary Table 1).

Extended Data Fig. 6 Relationships between online social dimensions.

The relationships between the partisan dimension and (a) gender, (b) age, (c) partisan-ness. Every bar represents a bin of communities with partisan scores a given number of standard deviations from the mean, and the distribution illustrates the scores on the secondary dimension (e.g. gender in (a)). From left to right, the bars represent highly left-wing, leaning left-wing, center, leaning right-wing, highly right-wing communities. The leftmost and rightmost bars are annotated with the number of communities, and examples of the largest communities, in each group. The hex-plot in (c) illustrates the joint distribution of partisan and partisan-ness scores. Labels correspond to the categorizations used in the polarization analysis.

Extended Data Fig. 7 Polarization robustness checks.

(a) The partisan distribution of deleted and non-deleted comments in political communities. (b) The proportion of activity that took place in very left-wing (\(z < -3\)) and very right-wing (\(z > 3\)) communities over time. (c) Alternate version of Fig. 3a generated using a dataset in which the authorship of all comments was randomly shuffled. Each individual bin distribution is extremely similar to the overall activity distribution, showing that the overall activity distribution is a useful reference point for what bin distributions would look like if there were no tendency for users to comment in ideologically homogeneous communities. (d) Average distributions of political activity for authors of comments in the 25 largest political communities on Reddit (by number of comments). (e) Correlation of users’ average partisan scores over time. Each \(\left(x,y\right)\) cell represents the correlation between scores of a user in month \({t}_{x}\) and that same user in month \({t}_{y}\), for all users active in both time periods. A user is only considered active if they make at least \(10\) comments in a month. (f) The relationship between the proportion of users who polarize and the polarization threshold. The polarization threshold is the number of standard deviations a user must increase in polarization to be considered polarized. Three lines are plotted corresponding to three pairs of months: the pairs of months with the minimum (blue), maximum (orange), and median (green) proportion of users polarized when using a threshold of \(1\). A threshold of \(1\) is used in all other calculations. (g) The relationship between the proportion of users who polarize and the comment threshold. The comment threshold is the value used to filter inactive users from the calculation. Users must have at least \(x\) comments in each of the two months to be included in the calculation of the proportion of users who polarize. The same three month pairs are plotted as in part (f). There are minimal differences between different thresholds. A threshold of \(10\) is used in all other calculations.
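The two thresholds in panels (f) and (g) can be illustrated with a short sketch. It assumes that a user's polarization in a month is the absolute value of their mean partisan \(z\)-score (the measure named in Extended Data Fig. 9a) and that a user polarizes when this value rises by at least the polarization threshold between the two months. The function name and the input layout are hypothetical.

```python
def proportion_polarized(month_a, month_b, polarization_threshold=1.0,
                         comment_threshold=10):
    """Fraction of eligible users whose polarization rose by at least the threshold.

    month_a and month_b map user id -> (mean partisan z-score, comment count).
    Assumption: polarization is the absolute z-score.
    """
    # Keep only users who meet the comment threshold in both months.
    eligible = [
        user for user in month_a.keys() & month_b.keys()
        if month_a[user][1] >= comment_threshold
        and month_b[user][1] >= comment_threshold
    ]
    if not eligible:
        return 0.0
    polarized = sum(
        1 for user in eligible
        if abs(month_b[user][0]) - abs(month_a[user][0]) >= polarization_threshold
    )
    return polarized / len(eligible)
```

Sweeping `polarization_threshold` with `comment_threshold` fixed corresponds to a curve in panel (f); sweeping `comment_threshold` instead corresponds to panel (g).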

Extended Data Fig. 8 Distribution of political activity by user group.

The distribution of political activity on Reddit over time by partisan score. Each bar represents one month of comment activity in political communities on Reddit, and is coloured according to the distribution of partisan scores of comments posted during the month (the partisan score of a comment is simply the partisan score of the community in which it was posted). The top plot includes all activity as in Fig. 3b, while the four following plots decompose this into the subsets of activity authored by particular groups of users. Users are classified, based on the average partisan score of their activity in the month 12 months prior, as left-wing (having a score at least one standard deviation to the left), right-wing (at least one standard deviation to the right), or center. Users with no political activity in the month 12 months prior use the label of the most recent month more than 12 months prior in which they had political activity; if they have never had political activity before, they fall into the new / newly political category (bottom).
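The classification rule in this caption, including the fallback for users with no activity exactly 12 months earlier, can be sketched as below. The data layout (integer month indices mapped to a mean partisan \(z\)-score) and the function name are assumptions; negative scores are taken to be left-wing, matching the sign convention in Extended Data Fig. 7b.

```python
def classify_user(monthly_scores, month, lag=12, cutoff=1.0):
    """Label a user for `month` from their earlier political activity.

    monthly_scores maps a month index (int) to the user's mean partisan
    z-score in that month; months without political activity are absent.
    """
    reference = month - lag
    # Use the month exactly `lag` months earlier if present; otherwise fall
    # back to the most recent month with political activity before it.
    candidates = [m for m in monthly_scores if m <= reference]
    if not candidates:
        return "new / newly political"
    score = monthly_scores[max(candidates)]
    if score <= -cutoff:
        return "left-wing"
    if score >= cutoff:
        return "right-wing"
    return "center"
```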

Extended Data Fig. 9 Additional measures of ideological asymmetry.

(a) Average polarization (absolute \(z\)-score) of activity in different ideological categories over time. (b) Volume of activity (number of comments) in different ideological categories over time. (c, d) Annual change in polarization in the two partisan activity categories, decomposed into the change attributable to new (\(\varDelta n\)) and existing (\(\varDelta e\)) users as done in Fig. 4.

Extended Data Fig. 10 Implicit polarization.

The relationship between explicitly partisan and implicitly partisan activity (left: left-wing activity; right: right-wing activity). Of users who were first active in an explicitly partisan community at time \({m}_{E}\), the proportion of them who were first active in an implicitly partisan community at time \({m}_{I}\) is denoted by the colour in cell \(\left({m}_{E},{m}_{I}\right)\). The line graphs at the top show the total proportion of users who were active in implicitly partisan communities before they were active in an explicitly partisan community (i.e. the sum of each column below the diagonal back to 2005, or the total proportion of users for whom \({m}_{I} < {m}_{E}\)).
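The quantity plotted in the line graphs, the share of users with \({m}_{I} < {m}_{E}\) for each \({m}_{E}\), can be sketched as follows. The input dictionaries and the function name are hypothetical, and months are represented as plain integers for simplicity.

```python
from collections import defaultdict


def implicit_first_by_month(first_explicit, first_implicit):
    """For each m_E, the share of users whose first implicit activity came earlier.

    Both arguments map user id -> month index of first activity in that
    kind of community; users absent from first_implicit never had any.
    """
    totals = defaultdict(int)
    earlier = defaultdict(int)
    for user, m_e in first_explicit.items():
        totals[m_e] += 1
        m_i = first_implicit.get(user)
        if m_i is not None and m_i < m_e:
            earlier[m_e] += 1
    return {m_e: earlier[m_e] / totals[m_e] for m_e in totals}
```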

Extended Data Table 1 Social dimension seeds

Left: distributions of communities on the age, gender, partisan, and affluence dimensions. Right: the most extreme communities and words on those dimensions. Word scores are calculated by averaging community scores weighted by the number of occurrences of the word in the community in 2017. Community descriptions can be found in the glossary (Supplementary Table 1).
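The word-score calculation described above is an occurrence-weighted mean, which can be written out as a short sketch. The function name, argument layout and community names in the usage note are illustrative assumptions.

```python
def word_score(community_scores, word_counts):
    """Occurrence-weighted mean of community scores for one word.

    community_scores maps community -> score on a social dimension;
    word_counts maps community -> occurrences of the word in that community.
    """
    total = sum(word_counts.values())
    if total == 0:
        raise ValueError("word does not occur in any community")
    return sum(community_scores[c] * n for c, n in word_counts.items()) / total
```

For example, a word seen once in a hypothetical community scoring 2.0 and three times in one scoring -1.0 receives a score of (2.0 - 3.0) / 4 = -0.25.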

Supplementary Information

This file contains Supplementary Tables 1 and 2.

Reporting Summary

Peer Review File

Source Data Fig. 1

Source Data Fig. 2

Source Data Fig. 3

Source Data Fig. 4

Source Data Fig. 5

Cite this article

Waller, I., Anderson, A. Quantifying social organization and political polarization in online platforms. Nature 600, 264–268 (2021). https://doi.org/10.1038/s41586-021-04167-x


This article is cited by

How individuals’ opinions influence society’s resistance to epidemics: an agent-based model approach

  • Geonsik Yu

  • Michael Garee

  • Yuehwern Yih

BMC Public Health (2024)

Gaining a better understanding of online polarization by approaching it as a dynamic process

  • Célina Treuillier

  • Sylvain Castagnos

  • Armelle Brun

Scientific Reports (2024)

In-party love spreads more efficiently than out-party hate in online communities

  • Samuel Martin-Gutierrez

  • José Manuel Robles Morales

  • Rosa María Benito

Scientific Reports (2024)

Social Media and Political Polarization: A Panel Study of 36 Countries from 2014 to 2020

  • Jia Lu

  • Meiqi Sun

  • Zikun Liu

Social Indicators Research (2024)

Polarized collaboration benefits knowledge production: empirical analyses of the mediating effect of co-production pattern in Wikipedia articles on climate change

  • Kunhao Yang

  • Mengyuan Fu

Journal of Computational Social Science (2024)
