With ChatGPT, is reading still useful?
Author: Li Zi, Ph.D. in Sociology of Technology, Postdoctoral Fellow, Department of Medical Humanities and Ethics, Columbia University
In the first half of this year, the arrival of ChatGPT revealed the potential of artificial intelligence and set off discussions of existential crisis in many industries. GPT can pass bar and engineering qualification exams, write college essays that earn passing grades, and even "understand" jokes. It can answer people's questions, organize language vividly, and imitate a variety of styles; combined with image-generation AI such as Midjourney, large language models let people with no artistic training "create" stunning images from just a few words.
At its core, ChatGPT is a large language model (LLM) combined with generative AI. "Large," as the name suggests, means the model is trained with machine-learning methods on an enormous number of tokens. "Generative" means that, during a conversation, it uses prediction to produce the most likely sequence of tokens.
For knowledge "processors" and "consumers," the capabilities of large language models plus generative AI are enormous. Massive token data, deep neural networks, and huge computing power effectively "flatten" the knowledge of the entire Internet and then "reassemble" it through human-computer interaction.
**In terms of computational logic, ChatGPT is equivalent to a more powerful search engine.** Ordinary search engines such as Google and Baidu "scrape" the information of the entire Internet with crawlers and rank it with complex algorithms. AI based on machine learning instead organizes that information predictively, following the logic of language. Knowledge processing becomes more convenient and faster, and consumption becomes more concise and clear, sometimes even too simple, creating opportunities for exam cheating.
In response, technological optimists argue that if machines can now generate such content, most humans no longer need to produce it themselves, just as search engines replaced library card catalogs and calculators replaced the abacus. Indeed, even if AI stays out of final decision-making, it can take over tasks that involve heavy repetitive writing or mechanical enumeration and sorting, delivering considerable productivity and assisting humans in the processing and consumption of knowledge.
So, is reading still useful? Can the staff of universities and research institutions call it a day?
What can machines "learn"?
Large language models and generative AI raise an unavoidable question for future knowledge "producers": what is knowledge? How can diverse, impartial, and authentic knowledge be produced?
The "learning" ability of artificial intelligence is astonishing. Today's large language models and AI applications are inseparable from machine learning as their foundation. "Learning" here essentially means training a predictive model on a large amount of data, balancing the accuracy of predictions against their generality. Such prediction is grounded in existing knowledge, and a language model's predictions are grounded in the connections among existing language. For example, given the input "braised," the machine predicts "beef"; with more inputs, such as location, people, and habits, it gives more specific predictions, such as "grandma's braised beef."
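The prediction described above can be sketched as a toy next-word predictor. Real language models use deep neural networks over billions of tokens rather than raw counts; the mini-corpus and words below are invented purely for illustration:

```python
from collections import Counter, defaultdict

# Invented mini-corpus standing in for training data.
corpus = [
    "grandma cooked braised beef",
    "grandma cooked braised pork",
    "dad cooked braised beef",
    "grandma cooked braised beef",
]

# Count, for each word, how often each following word appears.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1

def predict_next(word):
    """Return the continuation seen most often after `word` in the data."""
    return follows[word].most_common(1)[0][0]

print(predict_next("braised"))  # "beef": it follows "braised" 3 times out of 4
```

The point of the sketch is only that the "most likely continuation" comes entirely from patterns already present in the data.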
How is this prediction realized? Think of a familiar two-dimensional coordinate system: across a population, height and weight correspond roughly, so given a height, the machine predicts the average weight, a prediction drawn from existing data. Add another dimension, such as sex, and the space becomes three-dimensional: the predictions for men and women will differ. Continue this way and the dimensions of the data can be endless; a machine-learning model finds connections in a multi-dimensional space the human brain cannot picture, constantly adjusting the weights among dimensions, for example "how important" height is for predicting weight, tuned over a large number of data inputs.
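A minimal sketch of this dimension-adding idea, with made-up numbers: predict weight as the average of matching records, first on one dimension (height), then on two (height and sex):

```python
from statistics import mean

# Made-up records for illustration: (height_cm, sex, weight_kg).
people = [
    (170, "F", 60), (170, "F", 62),
    (170, "M", 70), (170, "M", 72),
]

def predict_weight(height, sex=None):
    """Average the weights of matching records.

    One dimension (height) gives a single average; adding a second
    dimension (sex) splits the data and sharpens the prediction.
    """
    matches = [w for h, s, w in people
               if h == height and (sex is None or s == sex)]
    return mean(matches)

print(predict_weight(170))       # average over everyone at 170 cm
print(predict_weight(170, "M"))  # average over men at 170 cm only
```

Real models learn continuous weights over many dimensions rather than averaging exact matches, but the logic of "more dimensions, finer predictions" is the same.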
**Therefore, machine-learning-based AI links data of many dimensions in a high-dimensional space. It can discover latent connections among data, and it can also "learn" linkages that are highly plausible yet do not exist in reality.** Applied to language models, AI can likewise learn different language styles and dig out the "essence," as well as the "problems," in existing text.
**The larger the data and the more mature the model, the greater its capacity for computation and mining.** With AI such as BERT and GPT, born in large institutions, many people believe the technology has reached an "inflection point," and it is not unreasonable that quantitative change produces qualitative change: good news for knowledge producers. However, large models also have inherent problems, and the larger the model, the more acute those problems become, especially regarding the diversity, fairness, and truthfulness of knowledge.
**How can true and unbiased knowledge be produced?**
New knowledge can emerge from new connections and new models of existing knowledge; this is true for humans and machines alike. But is existing knowledge sufficient? Is it diverse? Is it fair? If the foundation of existing knowledge is inadequate or even biased, new knowledge built on it will be biased too.
Since machine-learning AI entered large-scale use, scholars have continually exposed the biases inherent in these models: sexism, racism, unethical output, and so on. Developers apply patches and corrections, but most of the problems are hidden in the data-production and training process, and AI's bias both reflects and amplifies social prejudice.
In the era of large models, this problem may be buried even deeper: not every researcher or team can develop an AI model from scratch, and most large language and image models are fine-tuned from existing ones. The problems and biases of the base model therefore migrate into the many application models built on it. And the deeper a bias lies, the harder it is to remedy through fine-tuning and correction.
The predictive generation mode of existing language models can even amplify biases present in the data, producing an "overfitting" effect. For example, suppose statistics show that a certain disease occurs in a certain ethnic group about 60% of the time; if a language model is then asked to generate patient portraits, more than 90% of the generated descriptions may belong to that group.
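One hypothetical mechanism for this amplification can be sketched numerically: if generation "sharpens" the learned distribution, as low-temperature sampling does, a 60/40 split in the data becomes a far more lopsided split in the output. The temperature values below are illustrative, not taken from any real model:

```python
def sharpen(p, temperature):
    """Rescale a two-outcome distribution [p, 1 - p] at a given temperature.

    Temperature 1.0 reproduces the data's rate; lower temperatures push
    probability mass toward the more likely outcome.
    """
    a = p ** (1 / temperature)
    b = (1 - p) ** (1 / temperature)
    return a / (a + b)

p = 0.60  # the group's rate in the training data
print(round(sharpen(p, 1.0), 2))  # 0.6: faithful sampling keeps the rate
print(round(sharpen(p, 0.1), 2))  # ~0.98: sharpened sampling far exceeds 90%
```

The exact numbers depend on the sampling scheme; the point is that favoring the "most likely" output systematically overstates the majority pattern.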
Some AI training now adopts a "mutual combat" mode, the so-called generative adversarial network (GAN), in which two models continuously generate for and correct each other. This does improve training efficiency, but any small deviation is magnified in the "interaction." By the same principle, if knowledge producers who work closely with machines rely on this kind of "generation," biases from the model will be embedded in ever more new knowledge; that new knowledge is then absorbed as data, further reinforcing the model's bias. **Knowledge producers must remain vigilant throughout this process.**
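The feedback loop described here, model output absorbed back as training data, compounds across rounds. A toy simulation with an invented 5% per-round amplification factor shows how a small per-generation bias snowballs:

```python
def retrain(rate, amplification=1.05):
    """Hypothetical: each generate-then-retrain round overrepresents the
    majority group in the resulting data by a small factor."""
    return min(1.0, rate * amplification)

rate = 0.60  # the group's true share in the original data
for _ in range(10):  # ten rounds of generating data and retraining on it
    rate = retrain(rate)
print(round(rate, 2))  # after ten rounds the share approaches 1.0
```

The 5% figure is arbitrary; any persistent per-round deviation, however small, grows geometrically in such a loop.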
**What is new knowledge? Can AI "generation" count as new knowledge?**
What counts as new knowledge? If AI is to be fully used to produce knowledge, knowledge producers must consider this question from the standpoint of human-machine collaboration. Any information, and any knowledge humans acquire from the real world, must be "cleaned" and "formatted" into data. Beyond the data-quality issues above, the process of data generation also matters. In short: what is the question one wants to study? Into what kind of data is that question translated? How were those data produced, and do they fully and fairly represent what the knowledge producer wants to study?
This problem holds for "traditional" knowledge producers as well. Take history. Although history studies past events, no past event can be established with complete certainty. Scholars constantly seek new historical materials to enrich their understanding and to unearth perspectives and voices neglected in the past. Interestingly, current historiography often turns to large quantities of data, especially past economic, demographic, and climate records, and even relies on machine learning to bring new understandings and perspectives to history.
Likewise, relying on machine-generated insights and opinions may inflate the importance of certain data sources. Today's knowledge producers already lean too heavily on mainstream, online, electronic information, creating on top of things others have already "translated" into data. **In the AI era, the convenience and extensibility AI provides will make it even easier to overlook non-mainstream, experiential knowledge that has never been digitized, and thus to miss chances to form new viewpoints and perspectives.**
At a deeper level, new knowledge often arises from excavating new materials, from the collision of different viewpoints and perspectives, and from deconstructing and reassembling existing knowledge. The large language model offers many possibilities for displaying knowledge, but its internal logic and structure may run contrary to this mode of production.
Given how large language models are trained and how they generate output, higher-ranked, higher-probability content gains ever more weight, and the output grows ever more uniform. "AI-generated" has almost become an adjective for bland, repetitive filler that says nothing at all. For knowledge consumers, the "most likely" answers greatly lower the threshold of understanding; but for knowledge producers, they may become obstacles instead.
**Where should knowledge producers go in the new era?**
Perhaps many social-science researchers like me have run into this problem with ChatGPT: ask it to explain a concept and the answer is perfectly coherent; ask it for references, and it cites books an author never wrote and papers that were never published. The narrower and more specialized the field, the greater the chance of such "nonsense."
Returning to how AI works: this kind of "creation" actually mines "plausible" connections among words and sentences in massive data, yet those connections do not exist in reality; bluntly put, they merely "sound right." This new phenomenon is now called "hallucination." For knowledge producers, a vital skill is to use AI to mine the patterns and connections in the existing knowledge base while remaining vigilant against the machine's "hallucinations," distinguishing what exists from what is doubtful.
"Dialogue" with AI will also become a new skill. For most non-technical people (and even technical ones), today's AI remains a mysterious "black box." Learning to engage with machines more effectively from the lower or middle layers of the technology, and to understand and counter "hallucinations," will require knowledge producers and technical practitioners to work together.
For research into new knowledge, new perspectives, and new materials, each field's unique structures and interpretations remain essential for now. The predictions of large language models and generative AI still tend toward uniformity and repetition, and the more limited the training material, the more limited the capability. To combine machine and human strengths, one must start at the source of data production: train AI models on accurate, diverse, fair, and novel data, and establish a healthy model of human-computer interaction.
The advent of large language models and generative AI is only the beginning of the challenge for researchers. Rather than debating "replacement," we would do better to seek, under a more prudent gaze, the possibilities of mutual adaptation and development.