Numerico, T. (2019). Social network e algoritmi di machine learning: problemi cognitivi e propagazione dei pregiudizi. Sistemi Intelligenti, 31(3), 469-493. https://doi.org/10.1422/95085
Social network e algoritmi di machine learning: problemi cognitivi e propagazione dei pregiudizi [Social networks and machine learning algorithms: cognitive problems and the propagation of prejudices]
Numerico, T.
2019-01-01
Abstract
The objective of this paper is to critically analyze the potentially discriminatory and biased effects of the spread of algorithmic techniques for the interpretation of human behavior, based on unsupervised machine learning methods trained on uncontrolled data produced by social network users. I introduce the general epistemological issue starting from an examination of the performance of two algorithms for textual analysis, GloVe and Word2vec, at present two of the most successful tools for textual analysis based on word embeddings. Such algorithms, trained on public domain textual databases, tend to associate words in ways that replicate gender and ethnic stereotypes, because they infer connections between words from the distribution of their distances in the training sets. The model of meaning that informs their judgments leads them to rely on a stereotyped and biased representation of social, gender and ethnic categories, a representation woven into the databases on which they are trained (Caliskan, Bryson and Narayanan 2017; Bolukbasi et al. 2016). These Natural Language Processing (NLP) techniques are just one example of how algorithms can be used to profile social network users with the aim of predicting their behaviors, suggesting their preferences, or nudging them toward believing or desiring something predetermined by the system. The process is based on the analysis of past behaviors, preferences, or actions in order to create groups, clusters or categories. This complex system is founded on two equivocal premises. The first is that the enormous growth of data allows the algorithms to work with such a quantity of information that distortive effects and potential mistakes become irrelevant for prediction. The second is that algorithmic methods interpret the meaning of the available data on users more efficiently than human beings do, and are consequently capable of capturing its cognitive value for the purpose of a trustworthy, univocal categorization and for predicting the probability of future events. Neither hypothesis is demonstrated or validated; they are merely asserted and rhetorically supported by the major players in the flourishing Big Data field. Moreover, the combination of dirty, outdated or uncontrolled data, of the rigidity of the inferential capacity of learning algorithms, and of the utilitarian orientation of their experimental design could produce socially dangerous interpretations and predictions in the social sciences. It is desirable to better understand the rules and criteria adopted in machine learning algorithms used for socially sensitive analysis, with special regard to data acquired from social networks, in order to guarantee fairness and the protection of minorities in judgments and predictions. It is also important to avoid secret methods in decision-making processes that affect people's lives and social justice, because evaluating algorithms only from their outputs is not enough, given the asymmetry of power, in terms of knowledge distribution, between those who control the data and those who do not.
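To make the mechanism described in the abstract concrete, the word associations and stereotyped analogies that embedding models learn from distributional proximity, the following minimal Python sketch (not part of the paper) probes a pretrained embedding model. It assumes the gensim library and the publicly available "glove-wiki-gigaword-100" vectors distributed through gensim-data; the specific query words are illustrative only.

# Illustrative sketch only (not from the paper): probing a pretrained
# word-embedding model for the stereotyped associations discussed above.
# Assumes gensim and the public "glove-wiki-gigaword-100" vectors.
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")  # pretrained GloVe vectors

# Embeddings encode "meaning" as proximity: nearest neighbours are simply
# the words whose vectors lie closest by cosine similarity.
print(model.most_similar("nurse", topn=5))

# Analogy completion in the style of Bolukbasi et al. (2016):
# "man : programmer = woman : ?" can yield stereotyped answers,
# because the training corpus contains those co-occurrence patterns.
print(model.most_similar(positive=["programmer", "woman"],
                         negative=["man"], topn=3))

# Direct similarity scores expose the kind of asymmetry the paper discusses.
print(model.similarity("woman", "nurse"), model.similarity("man", "nurse"))

A WEAT-style test (Caliskan, Bryson and Narayanan 2017) generalizes this kind of probe by comparing the average cosine similarity of two sets of target words (for example, male and female names) with two sets of attribute words (for example, career and family terms).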