
AI stumbling blocks in content distribution

2017-07-25 11:34

Since the Internet went commercial, every platform, whether a news client, a video site or an e-commerce site, has by default cast itself as a conscientious breeder, pushing content (the feed) to users according to its own ideas.

These breeders are trained professionals, known in the trade as site editors, who set the agenda for users and select content to suit the tastes of the majority.

Later, when the editors could no longer keep up, machines were brought in to help. The simplest machine method is the "hot list", which sorts content by clicks or other metrics.

The biggest problem with the breeder model is that it does not know what the diners want, which has two significant consequences. First, diners are dissatisfied because users' individual needs go unmet. Second, the platform's own resources are wasted: a large amount of long-tail content never gets exposure, which adds to sunk costs.

Then someone discovered the benefits of the machine, which can recommend content based on user characteristics. Just as a skilled chef can cook to each diner's taste, a machine that is smart enough can, to a certain extent, meet the individual needs of every user. Isn't this the C2M (customer-to-manufacturer) of the content industry?

To be precise, this is C2M for content distribution: it addresses individual users and breaks away from mass communication and segmented communication. Is that enough to put every search engine and portal out of business?

This intelligent content C2M has deep historical roots. Today you are already standing at the edge of an era, watching AI light the fuse of the IoT. Next, you will find yourself swept irresistibly into an era of information explosion: an explosion of terminals, of information volume, of platforms...

On the information highway, the rules for the cars you drive and the roads you travel have all changed, and the familiar knowledge framework built on the breeder model is being overturned.

In this era, the breeder model has failed, and smart machines will become the biggest variable.

The first scenario that emerges is that humans produce content and machines distribute content.

The next scenario that emerges is that machines produce content and machines distribute content.

The content industry is about to face a C2M revolution. Will it work?

"Of course not, machines are stupid." If you think so, then unfortunately you are destined not to see tomorrow's sun.

"Of course it will." If you think so, then congratulations: you have fallen into a pit.

The real situation may surprise you.

1. The essence of content C2M is to move towards personalized communication

As an independent research field, recommender systems can be traced back to the collaborative filtering algorithms of the early 1990s. The middle period was dominated by traditional machine learning algorithms, such as the latent semantic (latent factor) models popularized by the Netflix Prize, and today the field has moved on to more complex deep learning models.

In recent years deep learning has advanced by leaps and bounds, making machine recommendation the star of the entire Internet. Driven by the new technology, personalized communication has become more feasible and is moving ever closer to one-to-one communication.

(1) Collaborative filtering gets off to a shaky start

According to the encyclopedia definition, collaborative filtering uses the preferences of a group of users with similar interests or shared experience to recommend information you may be interested in. Individuals respond to the information through a cooperative mechanism (such as ratings), and those responses are recorded so that the filtering can in turn help others select information.

The responses need not be limited to items of particular interest; records of items that are particularly uninteresting are just as important. Collaborative filtering showed excellent results and came to dominate the Internet industry.
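
The core idea, predicting one user's interest from the ratings of like-minded users, can be sketched in a few lines of Python. This is only an illustrative sketch: the toy `ratings` data, the function names and the choice of cosine similarity are assumptions made for the example, not details of any system mentioned here.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann": {"news_a": 5, "news_b": 4, "news_c": 1},
    "bob": {"news_a": 4, "news_b": 5, "news_d": 4},
    "cat": {"news_a": 1, "news_c": 5, "news_d": 2},
}

def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += sim
    # None means nobody comparable has rated the item yet.
    return num / den if den else None
```

With this toy data, `predict("ann", "news_d")` lands between bob's and cat's ratings but closer to bob's, because bob's past ratings resemble ann's more closely.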

Initially, collaborative filtering was applied to email filtering.

In 1992, Xerox scientists proposed the Tapestry system, the earliest application of collaborative filtering, designed mainly to deal with information overload at Xerox's Palo Alto Research Center. Staff there received large volumes of email every day with no way to filter and classify it, so the center built this experimental mail system to help.

Then, collaborative filtering ideas began to be applied to content recommendation.

In 1994, the GroupLens project team at the University of Minnesota built a news filtering system that helps readers filter the news they are interested in. After reading an item, the reader gives it a rating, and the system records the ratings for future reference, on the assumption that what interested a reader before will interest them again. Readers who do not want to reveal their identity can rate anonymously. GroupLens, the longest-standing content recommendation research team, went on to create the movie recommender MovieLens in 1997. Similar systems from the same period include the music recommender Ringo and the Video Recommender.

Later came another milestone: the e-commerce recommendation system.

In 1998, Amazon's Greg Linden and his colleagues filed a patent for item-to-item collaborative filtering, the classic algorithm that Amazon used in its early days and that was widely adopted for a time.
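
As a rough illustration of the item-to-item idea, where similarity is computed between items rather than between users, here is a minimal sketch. It is not Amazon's patented algorithm; the `baskets` data, the function names and the cosine-style normalization are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical purchase histories: customer -> set of items bought.
baskets = {
    "u1": {"book", "lamp", "pen"},
    "u2": {"book", "pen"},
    "u3": {"book", "lamp"},
    "u4": {"pen", "mug"},
}

def item_similarities(baskets):
    """Cosine-style similarity between items, based on shared buyers."""
    bought_by = defaultdict(int)   # item -> number of buyers
    together = defaultdict(int)    # (item_a, item_b) -> number of co-buyers
    for items in baskets.values():
        for item in items:
            bought_by[item] += 1
        for a, b in combinations(sorted(items), 2):
            together[(a, b)] += 1
    sims = defaultdict(dict)
    for (a, b), n in together.items():
        score = n / sqrt(bought_by[a] * bought_by[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims

def similar_items(item, sims, k=2):
    """Top-k items most similar to `item`, best first (ties broken by name)."""
    ranked = sorted(sims[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]
```

Because the similarity table depends only on items, it can be computed ahead of time and looked up when a customer views a product, which is one reason this style of algorithm suits large catalogues.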

Does collaborative filtering count as artificial intelligence? Technically, it falls within the scope of AI. It must be said, however, that collaborative filtering is a relatively weak algorithm: whether user-based or item-based, its recommendations often leave much to be desired.

How can a systematic methodology guide the continuous optimization of a recommender system? How can complex real-world factors be folded into the recommendations? Engineers were stumped for a long time, but a generous reward always brings out the brave, and eventually someone found a more flexible approach.

(2) Traditional machine learning begins to accelerate

In 2006, Netflix announced the Netflix Prize. Netflix was then a long-established online movie rental service, and the competition was set up to tackle the machine learning and data mining problem of predicting movie ratings. The organizers put up serious money: any individual or team that improved the accuracy of Cinematch, Netflix's recommendation system, by 10% would win US$1 million.

Netflix has disclosed some enormous figures on its blog, for example:

We have billions of user ratings, and the number grows by millions every day.

Our system records millions of plays every day, each with many features such as playback duration, time of day and device type.

Our users add millions of videos to their playlists every day.

Clearly, with data on this scale, user preferences across the whole platform can no longer be captured by classification schemes built by hand or by small systems.

A year after the competition began, the KorBell team won the first stage with an improvement of 8.43%. They put in more than 2,000 hours of work and blended 107 algorithms. The two most effective were matrix factorization (often loosely called SVD, singular value decomposition) and restricted Boltzmann machines (RBM).

Matrix factorization complements collaborative filtering. The core idea is to factor the very sparse user rating matrix R into two matrices, a user-feature matrix P and an item-feature matrix Q, learn these feature vectors from the known ratings, and use them to predict the unknown ones. Besides improving accuracy, the approach can absorb additional modeling elements, so more diverse information can be integrated and large amounts of data put to better use.
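
A minimal sketch of this factorization, fitted by stochastic gradient descent on the known ratings only. The toy data, the hyperparameters (`k`, `lr`, `reg`, `steps`) and the function names are assumptions for illustration; the Netflix Prize entries were far more elaborate.

```python
import random

# Hypothetical known ratings as (user index, item index, rating) triples.
KNOWN = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]

def factorize(ratings, n_users, n_items, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Fit R ~ P * Q^T by SGD, using only the observed cells of R."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item feature vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))
```

After `P, Q = factorize(KNOWN, 3, 3)`, calling `predict(P, Q, 0, 2)` fills in a cell that was never observed, which is exactly the prediction task described above.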

Matrix factorization has its shortcomings, however. Like collaborative filtering, it belongs to supervised learning; it is simple and crude and best suited to small systems. The problem for Internet giants is that, for a large recommender system, collaborative filtering and matrix factorization take a long time to compute. What to do?

So some engineers turned to unsupervised learning. Clustering, an unsupervised technique, in essence identifies groups of users and recommends the same content to everyone in a group. When there is enough data, clustering works best as a first step that narrows down the set of candidate neighbors for collaborative filtering.

The latent semantic model draws on cluster analysis. One of its major advantages is that it can both predict ratings and model text content, which greatly improves content-based recommendation.

Traditional analysis is inaccurate at two steps: labeling users and mapping labels to results. The age a user enters may not be true, for example, and not all teenagers like comics. The core of the latent semantic model is to go beyond such surface-level labels and use machine learning to mine the deeper latent associations in user behavior, making recommendations more accurate.

Spurred by the million-dollar Netflix Prize tournament, talent emerged from all over the world. The contest peaked in 2009 and became the most iconic event in the recommender systems field. It drew many professionals into recommender research and carried the technology from specialist circles into the commercial world, sparking heated discussion and gradually whetting the appetite of mainstream websites. Content-based, knowledge-based, hybrid and trust-network-based recommendation all entered a period of rapid development.

These recommendation engines differ from collaborative filtering. Content-based recommendation, for example, works from the content of the items rather than from users' ratings of them, and uses machine learning to derive the user's interests from examples described by content features. Content filtering relies mainly on natural language processing, artificial intelligence, probability and statistics, and machine learning.

Was it worth a million dollars? According to Netflix figures from 2016, the service had 65 million registered members who watched a total of 100 million hours of video per day, and Netflix says the recommendation system saves it US$1 billion a year.

(3) Deep learning brings “unmanned driving”

In recent years a major pain point has emerged for users. With smartphones everywhere, the flood of information and the small reading screen form a contradiction that is hard to resolve. Reading no longer happens at a computer screen; it has moved to fragmented mobile moments. Search engines fall short, human editors cannot keep up, and machine recommendation is not yet good enough. For large content platforms this shift is a matter of life and death: those that meet the need survive, and those that do not perish.

Faced with this problem, YouTube and Facebook have proposed a new answer: use deep learning to build smart machines. Over the past decade deep learning has made huge leaps, and it is especially well suited to very large volumes of data.

If manual content recommendation is a driver at the wheel, recommendation powered by deep learning is a driverless car. The technology uses user data to "perceive" user preferences. Such a recommender system can broadly be divided into a data layer, a trigger layer, a fusion and filtering layer, and a ranking layer. Once the data generated and stored in the data layer reaches the trigger layer, which produces the candidate set, the core recommendation task begins.

Take YouTube as an example. Its most recently published recommendation algorithm consists of two neural networks, one for candidate generation and one for ranking. First, taking the user's viewing history as input, the candidate generation network drastically reduces the number of videos under consideration, selecting a set of the most relevant videos from a huge library.

The candidates produced this way are those most relevant to the user, and the user's rating of each is then predicted. The goal of the candidate generation network is only to provide broad personalization via collaborative filtering. The ranking network's task is to analyze the candidates carefully and pick a small number of best choices: it scores each video with a designed objective function, using features describing the video and the user's behavior, and presents the highest-scoring videos to the user.
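
The two-stage funnel described above can be sketched as follows. Simple hand-written heuristics stand in for the two neural networks, so this shows only the shape of the pipeline, not YouTube's models; the catalogue, tags and scoring weights are all hypothetical.

```python
# Hypothetical catalogue: video id -> (topic tags, average watch fraction).
CATALOGUE = {
    "v1": ({"cooking", "pasta"}, 0.80),
    "v2": ({"cooking", "baking"}, 0.55),
    "v3": ({"football"}, 0.90),
    "v4": ({"pasta", "travel"}, 0.60),
    "v5": ({"baking"}, 0.95),
}

def _seen_tags(history):
    """All tags appearing in the videos the user has already watched."""
    return set().union(*(CATALOGUE[v][0] for v in history))

def generate_candidates(history, limit=3):
    """Stage 1: cheaply narrow the catalogue to unseen videos sharing tags with the history."""
    seen = _seen_tags(history)
    overlap = {
        vid: len(tags & seen)
        for vid, (tags, _) in CATALOGUE.items()
        if vid not in history and tags & seen
    }
    return sorted(overlap, key=lambda v: (-overlap[v], v))[:limit]

def rank(candidates, history, top_n=2):
    """Stage 2: score each candidate with a richer objective and keep the best."""
    seen = _seen_tags(history)

    def score(vid):
        tags, watch_fraction = CATALOGUE[vid]
        relevance = len(tags & seen) / len(tags)
        return 0.5 * relevance + 0.5 * watch_fraction  # assumed equal weights

    return sorted(candidates, key=lambda v: (-score(v), v))[:top_n]

def recommend(history):
    """Run candidate generation first, then ranking on the reduced set."""
    return rank(generate_candidates(history), history)
```

The point of the split is cost: the first stage touches the whole catalogue with a cheap test, so the more expensive scoring in the second stage runs on only a handful of videos.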

In this mode the machine takes over the platform completely. With continued training, the machine becomes smarter and smarter, its "IQ" in dealing with people gradually rises, and in a sense it will gradually take on the role of a watchdog.

2. Will the content industry be subverted by C2M?

The world is full of surprises. On the 11th, an automated teller machine (ATM) at a bank in Corpus Christi, Texas, spat out a note that read "Save me." The story spread quickly across the Chinese Internet and made the headlines of many websites.

Do you need to see the same article on N websites?

Such redundant information eats up your attention and your data allowance. It is like seeing instant noodle advertisements on whichever TV channel you turn to: it becomes hard to find what you want quickly in the mass of information.

How can the problem of redundant information be solved for users?

Many technical solutions have been tried without success. Personal portals were short-lived, RSS subscriptions never went mainstream, and cross-site tracking is not something that can be done openly. Only C2M can lead the way forward.

The C2M model can draw on the entire web, as Toutiao does, or be built on a giant platform, as Facebook does. Its core is extracting, ranking and delivering massive amounts of information to users according to their behavior, characteristics and needs. This is the secret to relieving the pain point.

But there are many skeptical voices too. Some argue that recommendation methods such as collaborative filtering tend to trap users in information cocoons, cannot recognize the reading context, and are slow and poor at timeliness. Toutiao-style models are often criticized for exactly these shortcomings. They also face challenges such as user interests that are hard to capture and the privacy and governance of user data.

Supporters and skeptics each stick to their position. Who is right? Two major opportunities lie ahead, but for now there are three mountains to climb.

1. The reasons for support are as follows:

① A thousand people, a thousand faces: every taste can be catered for

A personalized recommendation mechanism delivers information according to each user's preferences. Various algorithms analyze the user's past behavior, compare related users and related items to guess what the user may like, and then build and validate a candidate set. Users receive more accurate content, distribution differs from one person to the next, and content is matched precisely to users instead of being delivered one-size-fits-all in the traditional way.

② Find the needle in the haystack and improve efficiency

Personalized recommendation spares users from searching and sifting through massive amounts of information. It removes some useless information on the user's behalf, narrows the scope of the search, and improves reading efficiency.

③ Give users what they like and increase stickiness

Continuously recommending suitable content increases user stickiness. Algorithms help users quickly discover content they are interested in, and as soon as you finish one piece, related items are recommended, which improves the user experience and keeps users coming back.

④ Mine the long tail and break polarization

Personalized recommendation can use algorithms to surface long-tail content and counter the polarizing Matthew effect. If user A likes a piece of relatively unpopular long-tail content, and user B has the same or similar interests and behavior, the system can recommend that content to user B. Unpopular content gets more exposure, users discover more of the long tail, and the content ecosystem avoids splitting into two extremes.

⑤ Two-way communication, in-depth optimization

Personalized recommendation rests on in-depth analysis of, and interaction with, users, which improves the interactive experience. Traditional manual recommendation casts a wide net without carefully segmenting or screening users. Machine recommendation is based on user characteristics and habits: communication runs both ways, the user's behavior influences the next recommendation, and this feedback loop improves the interactive experience.

⑥ Classify content and refine operations

Personalized recommendation also helps the platform classify content, which supports fine-grained management and operations. The information age has produced a steady stream of new platforms and ever richer forms of content, while the display area on a user's phone is limited. Personalized recommendation lets operators classify content more effectively for different customers, which supports refined operations.

2. The main objections are:

① Drawing a prison on the ground: thinking gets fenced in

A personalized news experience can easily narrow the mind. Recommendations are derived from the user's historical data and behavior and from similar users or similar items, so to some extent they lock the user's interests into a closed loop. While filtering information for the user, the system also blocks out a great deal of it. Because recommended content is gathered from, and determined by, your existing interests, you are not exposed to "new" things, cannot develop new interests, and may become increasingly narrow in outlook.

② How can machines understand a fickle human heart?

Machine recommendation cannot detect changes in need that come with changes in reading context, cannot sense why the user is reading, and cannot match the complexity of human emotion. For example, at a given moment we may follow a topic only because everyone is discussing it, which does not mean we are interested in similar things.

③ Taste goes offline, and good is hard to tell from bad

Personalized recommendation also struggles with content quality. In the past it was hard enough for editors to judge the quality of an article; today, machine recommendation easily ignores the quality dimension altogether. Inaccurate algorithms produce headline feeds of uneven quality: a machine may rate a worthless article highly or bury a truly valuable one. It can measure an article's value only from external data and currently has no means of judging value from the substance of the content.

④ Slow, and always half a beat behind

Personalized recommendation over massive data is time-consuming and poor at timeliness. News, for example, is time-sensitive and must be refreshed constantly. Analyzing users' historical behavior and comparing similar users takes a long time, so recommendations are hard to produce right away. Methods such as collaborative filtering also suffer from the cold-start problem: when a user is new and no mature history exists, it takes a long time to collect enough click logs to generate recommendations.

⑤ Shared hot topics and the convergence of individuals

Users are not all alike, yet collaborative filtering does not account for individual differences between users. For example, we observed that entertainment news keeps being recommended to most users even when they never click on entertainment stories. The reason is that entertainment news is popular in general, so a user's "neighbors" always supply enough clicks on entertainment stories for them to be recommended.

3. Where are the opportunities in the future?

Future opportunities rest on two driving forces: the industry's commercial interest in the long-tail gold mine, and the pull of users' strong demand for personalization.

① The long-tail gold mine

Personalized recommendation can help users discover more high-quality long-tail content and raise the platform's commercial value. Typically, a platform's users access only about 10% of its content, the popular part. A great deal of niche, unpopular content lies buried in the database and is not easily discovered. This is what we call long-tail content.

According to long-tail theory, which is driven by cost and efficiency, when the venues and channels for storing, distributing and displaying goods are broad enough, when production costs fall so sharply that individuals can produce goods, and when selling costs fall sharply as well, almost any product, however low its demand once seemed, will find a buyer as long as it is offered. Personalized recommendation can use user-based collaborative filtering to spread the long-tail content that niche users like, tapping the long tail fully and turning it into a gold mine.

② An urgent need of the times

Times have changed. After 20 years of development, the Internet has become the mobile Internet, and it is now about to absorb AI and enter the IoT era. Terminals and information are expanding explosively, and users will find it harder and harder to locate what they need in the mass of data. Traditional search engines are no longer up to the task. The most representative services of the past, the Yahoo directory and the Google search engine, have reached a dead end. If you want to learn about an unfamiliar field through a search engine, it is extremely inefficient!

To meet this urgent need, hope lies in personalized recommendation. The machine needs to understand the user as well as possible and, based on the user's data, proactively recommend information the user wants and needs. Although progress has been made over the past 20 years, it is only the first step of Tang Monk's journey to fetch the scriptures, and there is still a long way to go.

4. The three mountains that need to be overcome now

As it develops, personalized recommendation faces many problems, such as the difficulty of predicting user interests, the privacy of user data, and the difficulty of processing data, all of which pose serious challenges.

The first mountain, accuracy.

Users' interests shift constantly under many influences, an unavoidable challenge for personalized recommendation. The foundation of a personalized recommender system is user interest modeling, whose quality directly determines the quality of the recommendations. But interests can be affected at any moment by social factors, context and environment. This constant change makes it hard to predict a user's future inclinations from past data and undermines the accuracy of the results.

The second mountain, privacy.

For recommendations built on user data, protecting user privacy is a major problem. Traditional content recommender systems mine users' page-visit records to learn their habits, and then filter information on the server side according to user needs, in order to provide recommendation and spam-filtering services. Delivering more accurate recommendations while still protecting privacy is a big challenge.

The third mountain, values.

Beyond the two mountains above, there is a further issue that deserves attention. Today's machine recommendation has no "three views" (worldview, outlook on life and values) and no aesthetic sense. Operating in the Chinese-speaking world, it is bound to meet considerable challenges, for well-known reasons.

Traffic fraud and cheating are obvious examples. One netizen told the author: "I often see online course videos showing tens or hundreds of thousands of learners, numbers so big they make you question reality. So I ran a test: refreshing the page added three to the count, and dozens were added as soon as a new course went up. It was instantly clear. I also tested live video streaming in the middle of the night with the camera pointed at a wall. Ten minutes into the stream, the fan count kept climbing, and when one real fan arrived it jumped again. Cheating feels good for a moment, but it leaves you uneasy."

Some companies have placed large, highly vertical ad campaigns on smart recommendation apps. Some were genuinely effective; others were too obviously fake: an article whose read count shot past 10,000 in an instant brought in less traffic than one with just over 1,000 reads on the company's own account. Whether the data can be taken seriously depends on whether the people using it are serious.

Going forward, the important questions are how personalized recommendation will keep innovating in technology and governance, and whether bringing in artificial intelligence can fix its many existing problems and produce recommendations that satisfy users better.

3. The technical routes being developed by giants

In fact, however strong the support or the doubt, personalized recommendation has already won over countless giants.

In today's market, old and new technologies each hold their own ground. New deep learning technology is rising fast and aggressively, while old-school technology keeps being optimized to guard against being blindsided. The contest between old and new is a hot topic right now, and these are the two main routes that will shape future development.

(1) The old school: traditional recommendation technology can improve itself

1. Google News: a playbook under constant optimization

Google News is an online news portal that aggregates reports from thousands of sources (grouping similar stories together) and presents them to signed-in users in a personalized way. Given the huge number of articles and users and the required response time, a purely memory-based approach is not workable and a scalable algorithm is needed, so Google News combines model-based and memory-based techniques.

Google News' playbook is still grounded in collaborative filtering, combining model-based and memory-based techniques for personalized recommendation. As described in the book "Recommender Systems", the model-based part relies on two clustering techniques:

① Probabilistic latent semantic indexing (PLSI): a "second-generation" probabilistic technique for collaborative filtering. To identify clusters of like-minded users and related items, it introduces hidden variables that correspond to a finite set of states for each user-item pair, which accommodates users who are interested in several topics at once.

② MinHash: places two users in the same cluster (hash bucket) according to the overlap between the sets of items they have viewed. To make the hashing scalable, a special nearest-neighbor search method is used, and Google's own MapReduce distributes the computation across several clusters of machines.
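
A small sketch of the MinHash idea, assuming seeded MD5 hashes as a stand-in for random permutations. The function names, the number of hash functions and the bucket width are illustrative assumptions, and the MapReduce distribution step is left out.

```python
import hashlib

def _hash(seed, item):
    """Deterministic hash of an item under a given seed (stands in for one random permutation)."""
    digest = hashlib.md5(f"{seed}:{item}".encode()).hexdigest()
    return int(digest, 16)

def minhash_signature(items, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over a non-empty item set."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of positions where two signatures agree; an estimate of set overlap."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

def bucket_key(signature, width=2):
    """Cluster id: users whose first `width` min-hashes coincide share a bucket."""
    return tuple(signature[:width])
```

Two users with identical click histories always get the same signature and therefore the same bucket, while users with little overlap rarely collide; a larger `width` makes buckets stricter.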

The memory-based method mainly analyzes "co-visitation" (companion views): two articles are co-visited when the same user views both within a predefined period. To make predictions, the system walks through the active user's recent history and fetches the neighboring articles from memory. At run time, the overall recommendation score of each candidate in a preset set is a linear combination of the scores from the three methods (MinHash, PLSI and co-visitation), and the recommendations are output according to that value.
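
The co-visitation count and the final linear blend might look like the sketch below. The session data, the normalization and the blend weights are hypothetical; the source states only that the three scores are combined linearly.

```python
from collections import defaultdict
from itertools import combinations

def covisitation_counts(sessions):
    """Count how often two articles were viewed by the same user within one time window."""
    counts = defaultdict(lambda: defaultdict(int))
    for articles in sessions:
        for a, b in combinations(set(articles), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def covisit_score(candidate, recent_history, counts):
    """Share of the history's co-visits that involve the candidate article."""
    total = sum(counts.get(h, {}).get(candidate, 0) for h in recent_history)
    norm = sum(sum(counts.get(h, {}).values()) for h in recent_history)
    return total / norm if norm else 0.0

def blended_score(scores, weights):
    """Linear combination of per-method scores (e.g. MinHash, PLSI, co-visitation)."""
    return sum(weights[name] * scores[name] for name in weights)
```

For example, with assumed weights `{"minhash": 0.3, "plsi": 0.3, "covisit": 0.4}`, each candidate's three scores collapse into one number that can be sorted to produce the final list.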

2. A system developed by LinkedIn for four scenarios

LinkedIn delivers personalized recommendations mainly through Browsemap, a collaborative filtering platform it designed and built in-house. Browsemap is a general-purpose platform implementing item-based collaborative filtering, and it supports recommendations for every kind of entity on LinkedIn, including job seekers, job postings, companies, social groups (such as schools) and search terms. To add collaborative filtering for a new entity type, a developer only needs to do a few simple things: hook up the relevant behavior logs, write a Browsemap DSL configuration file, and tune the relevant expiration parameters.

The paper notes that on LinkedIn the Browsemap platform is most commonly used in four recommendation scenarios: recommending companies to job seekers, recommending similar companies, recommending similar resumes, and recommending search terms.

① Recommending companies to job seekers: item-based collaborative filtering in Browsemap computes the similarity between a user and potential companies and yields related-company features; these are then analyzed together with user and company content features (the user's location and work experience; the company's products and descriptions) to produce the final preference score.

② Recommending similar companies: this differs from recommending companies to job seekers in two ways. First, content-feature similarity becomes similarity between company profiles. Second, the Browsemap is built from several kinds of user behavior.

③ Recommendation of similar resumes (users): This recommendation is built from browsing behavior on company detail pages and from user profile features. In addition, attributes of similar resumes are used to fill in attributes missing from a resume, yielding a virtual resume for the user.

④ Search term recommendation provides four correlation methods: First, collaborative filtering: time and space factors are added when calculating the correlation between search terms; second, based on the click-through rate of the search results of recommended search terms; third, based on the overlap between search terms; fourth, based on the click-through rate of recommended search terms. However, the experimental results show that the results of collaborative filtering are the best, even better than the results of combining these four methods.
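The item-based collaborative filtering that underlies these scenarios can be illustrated with a small sketch. It is not Browsemap itself, whose DSL and internals are not described here: the function names, the cosine similarity over viewer sets, and the toy company names are all assumptions for illustration.

```python
import math
from collections import defaultdict


def item_similarities(views):
    """Cosine similarity between items based on which users viewed them.

    `views` maps a user id to the set of item ids that user viewed.
    """
    viewers = defaultdict(set)
    for user, items in views.items():
        for item in items:
            viewers[item].add(user)

    sims = defaultdict(dict)
    items = sorted(viewers)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            shared = len(viewers[a] & viewers[b])
            if shared:
                sim = shared / math.sqrt(len(viewers[a]) * len(viewers[b]))
                sims[a][b] = sim
                sims[b][a] = sim
    return sims


def recommend(user, views, sims, top_n=3):
    """Score unseen items by their summed similarity to items the user viewed."""
    seen = views[user]
    scores = defaultdict(float)
    for item in seen:
        for other, sim in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += sim
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Toy browse log: which (hypothetical) company pages each user viewed.
views = {
    "u1": {"acme", "globex"},
    "u2": {"acme", "globex", "initech"},
    "u3": {"acme"},
}
sims = item_similarities(views)
suggestions = recommend("u3", views, sims)
```

Because the similarity table depends only on behavior logs, the same code applies to any entity type, which is consistent with the article's point that a new entity mainly needs its logs connected.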

3. The three stages of Toutiao

As a popular personalized recommendation product in China, Toutiao has gone through three technical stages:

In the early stage, non-personalized recommendations were mainly used, focusing on hot article recommendations and new article recommendations. At this stage, the granularity of describing users and news was relatively coarse, and recommendation algorithms were not used on a large scale.

In the middle stage, personalized recommendation algorithms became the mainstay, chiefly collaborative filtering and content-based recommendation. The technical ideas behind collaborative filtering are the same as those introduced earlier. The content-based method first describes each news item with tags, and then uses the user's positive feedback (such as clicks, reading time, sharing, favorites, comments, etc.) and negative feedback (such as marking an item "not interested") to link the user to news tags, which forms the basis for statistical modeling.
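A minimal sketch of the content-based idea, under stated assumptions: the feedback weights, tag names and function names below are hypothetical and chosen only to show how positive and negative feedback can be accumulated into a per-tag user profile. Toutiao's actual statistical model is not described in this article.

```python
from collections import defaultdict

# Hypothetical feedback weights: positive signals raise tag affinity,
# explicit negative feedback lowers it.
FEEDBACK_WEIGHTS = {
    "click": 1.0,
    "share": 2.0,
    "favorite": 2.0,
    "comment": 1.5,
    "not_interested": -3.0,
}


def build_profile(events, article_tags):
    """Accumulate per-tag affinity from (article_id, feedback_type) events."""
    profile = defaultdict(float)
    for article_id, feedback in events:
        weight = FEEDBACK_WEIGHTS.get(feedback, 0.0)
        for tag in article_tags.get(article_id, ()):
            profile[tag] += weight
    return dict(profile)


def score_article(profile, tags):
    """Average affinity of the article's tags; unknown tags contribute zero."""
    if not tags:
        return 0.0
    return sum(profile.get(tag, 0.0) for tag in tags) / len(tags)


# Toy data: the user engaged with a football story and rejected a finance one.
article_tags = {"a1": ["sports", "football"], "a2": ["finance"]}
events = [("a1", "click"), ("a1", "share"), ("a2", "not_interested")]
profile = build_profile(events, article_tags)
sports_score = score_article(profile, ["sports"])
finance_score = score_article(profile, ["finance"])
```

A new article can then be ranked from its tags alone, before any other user has interacted with it.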

At the current stage, large-scale real-time machine learning algorithms are the mainstay; the number of features reaches the hundreds of billions, and the model can be updated within minutes. The architecture has two layers: the retrieval layer, whose multiple retrieval branches pull out candidate news items the user may be interested in; and the scoring layer, which uses real-time learning to model and score candidates based on three major categories of features: user, news and environment. It is worth noting that the final ranking is not based entirely on model scores. Some specific business logic is combined with the scores before the final ranking is returned to the user.
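The two-layer flow plus a business rule can be sketched as below. Everything here is an illustrative assumption: the hand-written linear score stands in for a learned real-time model, the retrieval branches are plain functions, and the "at most N items per topic" rule is just one example of business logic applied after scoring.

```python
def retrieve(user, branches, per_branch=2):
    """Retrieval layer: merge candidates from several branches, dropping duplicates."""
    seen, candidates = set(), []
    for branch in branches:
        for article in branch(user)[:per_branch]:
            if article["id"] not in seen:
                seen.add(article["id"])
                candidates.append(article)
    return candidates


def model_score(user, article, context):
    """Scoring layer stand-in: linear score over user, news and environment features."""
    interest = user["interests"].get(article["topic"], 0.0)   # user feature
    freshness = 1.0 / (1.0 + article["age_hours"])            # news feature
    wifi_bonus = 0.1 if context["on_wifi"] and article["is_video"] else 0.0  # environment
    return interest + 0.5 * freshness + wifi_bonus


def rank(user, branches, context, max_per_topic=1):
    """Sort by model score, then apply a business rule capping items per topic."""
    candidates = retrieve(user, branches)
    ordered = sorted(candidates, key=lambda a: model_score(user, a, context), reverse=True)
    result, per_topic = [], {}
    for article in ordered:
        count = per_topic.get(article["topic"], 0)
        if count < max_per_topic:
            result.append(article["id"])
            per_topic[article["topic"]] = count + 1
    return result


# Toy data: two hypothetical retrieval branches with one overlapping article.
n1 = {"id": "n1", "topic": "tech", "age_hours": 1.0, "is_video": False}
n2 = {"id": "n2", "topic": "tech", "age_hours": 3.0, "is_video": False}
n3 = {"id": "n3", "topic": "sports", "age_hours": 0.0, "is_video": True}
user = {"interests": {"tech": 1.0, "sports": 0.2}}
branches = [lambda u: [n1, n2], lambda u: [n2, n3]]
context = {"on_wifi": True}
feed = rank(user, branches, context)
```

Here the second tech article scores higher than the sports video but is dropped by the topic cap, which shows how the final order can differ from pure model scoring.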

Why is Toutiao successful? Many people will say it is because Toutiao's personalized recommendation technology is good, but this is not the whole story. Toutiao's personalized recommendation has itself gone through a complex evolution: from manual recommendation to machine recommendation, and then to continuous iteration and repeated validation of algorithms and technology, becoming steadily more refined.

(2) The new school: deep learning as the wise choice

New technology mainly refers to personalized recommendation systems that use deep learning.

Deep learning is a method in machine learning based on representation learning of data. An observation (such as an image) can be represented in a variety of ways, as a vector of intensity values for each pixel, or more abstractly as a sequence of edges, a region of a specific shape, etc. Tasks (e.g., face recognition or facial expression recognition) are easier to learn from examples using certain representations. The benefit of deep learning is to replace manual feature acquisition with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction.

When conventional recommendation algorithms can no longer analyze and process large volumes of data in a timely manner or make accurate recommendations for individual users, companies with the necessary technical capability begin to use deep learning to address the pain points of analyzing and recommending massive amounts of content. We take YouTube and Facebook, which adopted deep learning relatively early, as examples.

1. YouTube’s neural network

YouTube's recommendation system is one of the largest and most complex recommendation systems in the world. YouTube has more than one billion users worldwide, and hours of video are uploaded every second. The video "corpus" inventory is growing day by day, which requires a recommendation system to continuously recommend videos of interest to users in a timely and accurate manner.

Compared with other commercial recommendation systems, the YouTube recommendation system faces three main challenges:

① Scale. Most existing recommendation algorithms that work on smaller problems cannot handle the massive volume of videos at YouTube's scale.

② Freshness. YouTube's video "corpus" is not only huge, but also receives a constant stream of newly uploaded videos. The recommendation system must analyze and model newly uploaded content in a timely manner, while balancing existing videos against new ones.

③ Noise. Because user behavior is sparse and many influencing factors are unobservable, user behavior is inherently difficult to predict from history.

To address these problems, the YouTube recommendation system has turned to deep learning. It is built on TensorFlow, the second-generation machine learning system developed by Google Brain, which brings flexibility to development and testing.

The YouTube recommendation system mainly consists of two deep neural networks: the first neural network is used to generate a list of candidate videos; the second neural network is used to score and rank the input video list in order to recommend top-ranked videos to users.

Candidate generation relies on collaborative filtering to produce a broad list of personalized candidates for each user. The ranking network takes the list produced by the candidate generation network and discriminates among the candidates more finely, with the aim of a higher recommendation hit rate. Using a rich set of features that describe the video and the user, the ranking network scores each video according to a defined objective function. The highest-scoring videos are recommended to the user.

It is YouTube's massive volume of video that creates the need for deep learning, which effectively compensates for collaborative filtering's slow data processing.

2. A big step forward for Facebook

Facebook has been using its NewsFeed feature to implement personalized recommendations for nearly 10 years. NewsFeed (the information stream) launched in September 2006, together with MiniFeed (personal updates). NewsFeed is a system that automatically aggregates and generates a stream of content. It decides by itself which news, updates, and events we read. Its coverage, the accuracy of its information push, and its influence far exceed our imagination. It can be said that NewsFeed is a big step forward for Facebook in artificial intelligence.

How does Facebook use deep learning to evaluate content and users?

First, for text, Facebook uses natural language processing to scan the "statuses" and "blogs" each person posts, in order to "truly understand the semantics of the text" and not merely to rate them. During scanning, the system automatically identifies content with "overly sensational headlines" or that is "overly commercial", and such content is increasingly rare in NewsFeed.

Second, for translation, Facebook engineers have built a dedicated deep learning platform that analyzes and translates text written in more than 100 languages every day. For example, when a friend posts an update in German, NewsFeed displays it in English to an American friend, creating a digital environment that overcomes language barriers and lets everyone connect.

Third, in terms of identifying objects, Facebook is also using deep learning technology to identify objects in photos and videos. Not only that, it can also further explore who may be interested in these photos, or which users these photos are associated with, so as to recommend them to target users.

(3) Dilemma of deep learning

Is deep learning invincible against every opponent?

At least for now, deep learning is effective mainly on relatively "shallow" intelligence problems such as speech and images, and is less effective on problems such as language understanding and reasoning. Future deep neural networks may solve these problems more "intelligently", but they are not there yet.

The research and application of deep learning in the field of recommendation systems is still in its early stages. Even though deep learning is considered able to solve collaborative filtering's cold-start and slow data processing problems, it has hidden difficulties of its own.

First, the cost is too high. Data is critical to the further development and application of deep learning, but over-reliance on large labeled datasets is also one of its limitations. Data collection is costly, and labeling costs have begun to rise, making deep learning expensive. Many smaller companies with less data, even if they have the ability to use deep learning to improve personalized recommendations, still face the awkward situation of having no data to support it.

Second, is there a way to reduce costs? Yes, but it is difficult to achieve. Deep learning can be supervised or unsupervised. Unlabeled data is cheap to obtain in large quantities, and unsupervised learning costs less than supervised learning because it does not require labeling. At present, however, supervised learning is generally used, and most recommendation models built on it have difficulty fully avoiding the existing problems and improving recommendation quality. Meanwhile, deep learning's current ability to learn from unlabeled data is seriously insufficient, so the application of deep learning in recommendation systems is still in its early stages.

The two major forces, old and new, compete with each other, push each other forward, and also blend together. Traditional recommendation technology keeps improving under the impact of deep learning. Deep learning keeps innovating with strong momentum to catch up with traditional recommendation technology, but it also faces development dilemmas. It is in this process of self-development and innovation across multiple platforms that the boundary between old and new is becoming increasingly blurred and the two increasingly integrated. Even companies that insist on perfecting traditional recommendation technology are gradually moving into deep learning, while the new school, with its more mature deep learning, has not completely abandoned old-school techniques. So, which school will be king in the future?

Four. Who will win in the future?

Content C2M is essentially an insight into and prediction of people's hearts. The battle between technology and people's hearts is not settled overnight. The fundamental characteristic of human thought is "consciousness", which is the ability of individuals to understand the psychological states of themselves and others, including emotions, intentions, expectations, thoughts and beliefs, and to use this information to predict and explain the behavior of others.

However, there is a serious problem in the current field of artificial intelligence: people misunderstand the working mechanism of deep learning models and overestimate the capabilities of network models.

Through deep learning, we can train a model that generates text descriptions from image content. This process is often seen as the machine "understanding" both the image and the text it generates. Yet when a slight change to an image causes the model to produce absurd captions, the result comes as a surprise: the model has broken down. The machine can spot a cat, but it still cannot identify everything about the cat.

Looking back at history, it is not difficult to find that the goal that technology has always pursued is not so much to let machines replace humans, but to create smart machines to improve efficiency. The development of collaborative filtering technology is an obvious example.

In recent years, Internet giants have been extremely keen on building "smart machines", again for the sake of efficiency. According to estimates from Microsoft Research, about 30% of page views on Amazon's website come from the recommendation system; Netflix's chief product officer has said that more than 80% of movie viewing comes from the recommendation system, and that the system is worth as much as one billion US dollars per year; according to figures disclosed by Alibaba, the total transaction volume directly guided by recommendations on that day in 2013 was 5.68 billion yuan. Toutiao has built its core business on a recommendation engine and is one of the companies that attaches the most importance to recommendation technology today...

In the development of content C2M, although deep learning has many shortcomings, it is highly likely that deep learning will dominate the future. We see the old and new schools, representing traditional recommendation technology and deep learning, pushing each other forward and merging. Among the top 20 global traffic platforms, many companies, such as Google News and LinkedIn, still use collaborative filtering, but some of them are also preparing to use, or already using, deep learning and other technologies to address their own shortcomings. Pioneers such as YouTube and Facebook have begun to enjoy the dividends of deep learning.

From the breeder model to smart machines, C2M in the content industry has become a trend, and the day of disruption is not far away.

We can believe that although there are still some constraints in deep learning, with the strong development of AI technology and industry, technical bottlenecks will eventually be broken through.

What calls for vigilance is that once C2M has crossed the two mountains of accuracy and privacy and humans have gained new power through AI, the desires and ambitions of those who wield it should also be kept in check to some extent. In particular, questions of values will become more and more important.