Extracting Innovative Buyers by Scoring Using Innovator Theory

For companies that want to sell a high volume of products, it is important to identify innovative buyers to help with product marketing efforts. The purpose of this paper is to construct a model that determines whether users are innovative buyers from their purchase histories at physical stores and from access logs from an online-to-offline (O2O) site. Innovative buyers are users who influence other users’ product purchases, also known in innovator theory as innovators and early adopters. They purchase products quickly and visit physical stores such as supermarkets and convenience stores; in other words, they are known to have a highly cosmopolite nature. To extract innovative buyers, we estimated the speed of user product purchases and their cosmopolite natures, and we refer to the resulting estimation index as the innovator score. We then verified this method with socioeconomic status points, personality points, and communication points (SPC points), using consciousness data and profile data collected from a panel on an O2O site. The results showed that innovative buyers could be extracted using this new method, and that its accuracy was higher than that of traditional methods, which measure only the time from product sale start to user purchase.


RESEARCH BACKGROUND
In marketing consumer products with a short life cycle, such as foods, it is important for products to effectively penetrate into a market. Saito (1994) conducted a survey of the market penetration of Rao, released in the early 1990s, and found that the market penetration with innovators and early adopters steadily increased through the first 24 weeks. These demographics served as market leaders, so that the penetration of late markets, such as late majority and laggards, increased from the 17th week. Note that Kirin Ichiban Shibori, Kirin Fine Pilsner, and Suntory Jias were all released at the same time; Kirin Ichiban Shibori penetrated to innovators and early adopters and is still a standard product, but the others have withdrawn from the market. From this, we conclude that products that penetrate the market with innovators and early adopters are more likely to penetrate other market segments. In the future, it is expected that data regarding these demographics will be analyzed and used effectively by companies that want to leverage the data in their marketing.
In addition, CyberAgent and Digital InFact (2019) reported that the market size of stores utilizing digital advertising (O2O advertising) will reach 258.6 billion yen in 2024, approximately 6.4 times that of 2019. Because of this, it is important for companies to develop a method to capture innovators and early adopters from buyers (O2O-type users) who go between online and offline shopping, such as web sites and physical stores.
In innovator theory, Rogers (2003) found a way to specify innovators and early adopters. His innovator theory showed that users can be classified into five groups, according to their level of early adoption of innovation: innovators, early adopters, early majority, late majority, and laggards. There are differences in lifestyle and personality in each group.
The purpose of this study is to use the innovator theory advanced by Rogers (2003) to extract innovative buyers using purchase histories and access logs from an O2O site where online and offline users gather. By utilizing this information about innovative buyers who are interested in a product, marketing activities for a new product can be more effectively developed and products can quickly penetrate into the market.
This Journal is licensed under a Creative Commons Attribution 4.0 International License
The objective of this paper is to propose a method of extracting innovative buyers, focusing on differences in behavior data and consciousness data for users who are active shoppers in physical stores and O2O sites. In this paper, Chapter 2 explains traditional innovative buyers extraction methods and the purpose of this research. Chapter 3 explains our proposed new extraction method. Chapters 4 and 5 explore the estimate of a proposed method using actual data, results, and considerations. Finally, Chapter 6 offers a conclusion.

TRADITIONAL METHODS AND PURPOSE OF THIS RESEARCH
Previous research demonstrated a method of analyzing buyers' consciousness data and using a questionnaire as a method of extracting innovators and early adopters. For example, Saito (1994) prepared 700 variables for the purpose of constructing a psychological scale, categorized user value criteria from questionnaires, and produced groups that corresponded to the classifications of innovator theory.
On the other hand, more recent extraction methods take into account the propagation rate and order of product adoption between users and items for the purpose of utilizing it for marketing (Bass, 1969; Ichikawa et al., 2012; Ishikawa et al., 2007; Kawamae et al., 2009; Mahajan and Muller, 1998; Mahajan et al., 1995; Mahajan et al., 1990; Menjo and Yoshikawa, 2008; Muller et al., 2009; Peres et al., 2010; Rusmevichientong et al., 2004; Song et al., 2007; Song et al., 2006). Many studies have attempted to extract users who adopt products earlier as innovative buyers. For example, Song et al. (2006) carried out an experiment on information network flow created from past purchase histories and discovered a method for predicting future purchases. Song et al. (2007) also found a method for identifying opinion leaders in innovator theory using the InfluenceRank algorithm. Rusmevichientong et al. (2004) extracted users' innovation level by focusing on the relative order of users who adopted the same products on Amazon.com. As a real-world example, Mahajan et al. (1990) conducted a study using data on eleven kinds of durable consumer products, and found a way to extract users who adopted quickly as innovative buyers. Mahajan and Muller (1998) also proposed a diffusion modeling framework for companies to identify conditions that target the majority of innovator theory.
These studies showed that there is no need for user-intensive input such as consciousness analysis using a questionnaire. Although there are studies that analyze the behavior data of users who make purchases only online or only at physical stores, there is no study examining the O2O market, which is expected to grow rapidly in the future. The purpose of this research is to construct an extraction method that can be applied to O2O-type users without requiring user-intensive input. Rogers (2003) showed that socioeconomic status variables, personality variables, and communication variables differ depending on users' innovation adoption behavior. This suggests that O2O-type users, who have different online and offline behavior properties, may also have different socioeconomic status variables, personality variables, and communication variables. This study considers whether it is possible to determine whether users are innovative buyers by using differences in behavior data collected from purchase histories at physical stores and access logs from an O2O site. Therefore, to identify innovators, characteristic hypotheses were set for socioeconomic status variables, personality variables, and communication variables (Figure 1). We hypothesize that there are no differences in socioeconomic status variables between innovative and non-innovative buyers (Hypothesis 1). We also hypothesize that there are differences in personality variables and in communication variables between the two groups (Hypothesis 2 and Hypothesis 3).

Overview of the Model
From Hypothesis 3, it can be determined whether a certain user is an innovative buyer by comprehensively estimating the speed of the user's product purchase and their cosmopolite nature. By definition, cosmopolite nature is the degree to which a person is oriented toward an external social system. Generally speaking, cosmopolitan innovators travel extensively and are involved in matters that exceed regional system boundaries. For example, Ryan and Gross (1950) showed that hybrid corn innovators in Iowa traveled to the state capital of Des Moines more than the average farmer. Furthermore, Rogers (2003) showed that innovative physicians who adopted new drugs participated more frequently in specialist meetings in different regions than those who did not. The same is true of innovators among O2O users. If we assume that they go to physical stores such as supermarkets and convenience stores more than the average user, then we can add cosmopolite nature to the formula when extracting innovative buyers.
Thus, we developed a model for extracting and verifying the innovativeness of O2O users. The model extracts innovative buyers by estimating the speed of user product purchases and their cosmopolite natures, and it is verified by applying the hypotheses on socioeconomic status, personality, and communication. The extraction model makes it possible to extract the innovation level of a user by using an innovator score that can be calculated from purchase histories (users who purchased a certain product at a physical store) and access logs from an O2O site. To test our hypotheses, we use SPC points calculated from socioeconomic status variables, personality variables, and communication variables. SPC points are used to verify whether there is a difference between innovative buyers and non-innovative buyers. To compare the extraction accuracy of a new method and a traditional method, SPC points for users classified by a traditional method are also calculated. When extracting users' innovation level, we estimate the cosmopolite natures of users in addition to the speed of user product purchases. Therefore, the time elapsed until the purchase of an object product is used to estimate the speed of product purchase, and the number of a user's participation in the campaign conducted on an O2O site (defined in this paper as "number of CP participation") 1 is used to estimate cosmopolite natures. These innovator scores are calculated for each user.
The date and time when the user purchases an object product is defined as PD, and a CP start date and time is defined as SD. In addition, the number of CP participation, which is the number of visits to physical stores by users, is defined as CP. The difference between PD and SD is defined as elapsed time, and users are clustered based on the magnitude relationship between elapsed time and CP.
1 This number measures a user's campaign participation concerned with products other than the object product (in other words, the number of visits to physical stores).
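For user set J = {1, 2, ⋯, n}, if the elapsed times PD_j − SD are arranged in ascending order and s(j) denotes the user in the j-th position, then Eq. (1) is obtained.
PD_{s(1)} − SD ≤ PD_{s(2)} − SD ≤ ⋯ ≤ PD_{s(n)} − SD (1)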
The score PSS of the elapsed time of a user is defined as Eq. (2).
If the numbers of CP participation CP_j are arranged in ascending order and c(j) denotes the user in the j-th position, then Eq. (3) is obtained.
CP_{c(1)} ≤ CP_{c(2)} ≤ ⋯ ≤ CP_{c(n)} (3)
The score CPS of the user's number of CP participation is defined as Eq. (4).
Using the average μ_TS and the standard deviation σ_TS of innovator scores, we can classify user set J into five clusters in Eq. (6) through Eq. (10) (TS_I: Innovators, TS_EA: Early Adopters, TS_EM: Early Majority, TS_LM: Late Majority, and TS_L: Laggards).
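The ordering and clustering steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation used in this study. The definitions of PSS, CPS, and the innovator score are not reproduced in this text, so the sketch takes innovator scores as given input. The cut points in `classify_by_score` (μ_TS + 2σ_TS, μ_TS + σ_TS, μ_TS, and μ_TS − σ_TS, borrowed from the adopter categories of innovator theory, with a higher score treated as more innovative) are an assumption standing in for Eq. (6) through Eq. (10), as is the use of the population standard deviation. All function names and sample values are hypothetical.

```python
from datetime import datetime
from statistics import mean, pstdev


def elapsed_hours(purchase_dt, start_dt):
    """Elapsed time PD - SD, expressed in hours."""
    return (purchase_dt - start_dt).total_seconds() / 3600.0


def ascending_order(values):
    """Return user indices sorted by value, ascending (the role of s(j) or c(j)).

    Ties keep their original relative order; the tie rule of the paper
    is not reproduced here.
    """
    return sorted(range(len(values)), key=lambda j: values[j])


def classify_by_score(scores):
    """Assign each user to one of five clusters from innovator scores.

    ASSUMPTION: a higher score means a more innovative user, and the cluster
    boundaries are mu + 2*sigma, mu + sigma, mu, and mu - sigma.
    """
    mu, sigma = mean(scores), pstdev(scores)
    labels = []
    for ts in scores:
        if ts >= mu + 2 * sigma:
            labels.append("I")    # innovators
        elif ts >= mu + sigma:
            labels.append("EA")   # early adopters
        elif ts >= mu:
            labels.append("EM")   # early majority
        elif ts >= mu - sigma:
            labels.append("LM")   # late majority
        else:
            labels.append("L")    # laggards
    return labels


# Hypothetical sample data: one CP start time and four users.
sd = datetime(2018, 11, 1, 0, 0)
pd_times = [
    datetime(2018, 11, 2, 6, 0),
    datetime(2018, 11, 1, 5, 0),
    datetime(2018, 11, 4, 0, 0),
    datetime(2018, 11, 1, 12, 0),
]
cp_counts = [2, 9, 0, 4]

elapsed = [elapsed_hours(t, sd) for t in pd_times]  # [30.0, 5.0, 72.0, 12.0]
s = ascending_order(elapsed)     # [1, 3, 0, 2]
c = ascending_order(cp_counts)   # [2, 0, 3, 1]

# Hypothetical innovator scores for twelve users.
ts = [10.0, 8.0, 6.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 4.0, 2.0, 0.0]
labels = classify_by_score(ts)
```

With these made-up scores, one user falls into the innovator cluster, one into early adopters, seven into the early majority, one into the late majority, and two into laggards.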

Hypotheses Verification with SPC Points
Under the proposed method (hereinafter referred to as a "new method"), we hypothesize that a highly innovative user will have higher personality and communication variables. However, we hypothesize that higher innovation does not increase socioeconomic status variables, and that there is no difference between innovative and non-innovative buyers. The reason for this is that the object products are consumer products sold in supermarkets and convenience stores, and there is a high possibility that they can be purchased without regard to a user's socioeconomic status. Innovator theory holds that socioeconomic status and innovation are positively correlated, because high economic status is needed to address the uncertainties and risks of adopting expensive innovations. However, many consumer products, such as foods, are inexpensive (costing about several hundred yen). Furthermore, as Rogers (2003) points out, innovation cannot be measured solely by economic factors, and there are many innovators who do not have high economic status.
2 However, when PD_{s(j)} − SD = PD_{s(j+1)} − SD as in Eq. (1),
In this study, the minimum point of each estimation item was set to 0 and the maximum point was set to 4, and points were allocated to response options according to the number and contents of the options. Table 1 was then constructed to calculate SPC points for each user. For example, if SA is a single-response-type question item that corresponds to the estimation item of age, then based on Hypothesis 1-1 in Figure 1 we assign 0 to users in their 20s, 1 to those in their 30s, 2 to those in their 40s, 3 to those in their 50s, and 4 to those in their 60s or older, so that the point width is distributed uniformly.
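The point allocation can be illustrated with a short Python sketch. It is only an illustration under assumptions: the function names are hypothetical, and the direction of the ordering for each item depends on the corresponding hypothesis in Figure 1. The second helper applies the same 0-to-4 scale to multiple-response items, which are discussed next; it assumes that the options other than "Other" share the 4 points equally and that a user's points are the sum over the options the user selected.

```python
def uniform_points(ordered_options, max_point=4.0):
    """Map ordered single-response options to points spaced evenly on [0, max_point]."""
    if len(ordered_options) < 2:
        raise ValueError("need at least two ordered options")
    step = max_point / (len(ordered_options) - 1)
    return {option: i * step for i, option in enumerate(ordered_options)}


def multi_response_points(options, selected, max_point=4.0, other_label="Other"):
    """Score a multiple-response item.

    ASSUMPTION: every option except `other_label` carries the same value,
    the values sum to `max_point`, and `other_label` is worth 0.
    """
    scored = [option for option in options if option != other_label]
    if not scored:
        raise ValueError("need at least one option besides 'Other'")
    value = max_point / len(scored)
    return sum(value for option in selected if option in scored)


# Age item from the example in the text.
age_points = uniform_points(["20s", "30s", "40s", "50s", "60s or older"])
# {'20s': 0.0, '30s': 1.0, '40s': 2.0, '50s': 3.0, '60s or older': 4.0}

# Hypothetical multiple-response item with four scored options plus "Other".
channels = ["TV", "Web", "Store", "Friends", "Other"]
cb_points = multi_response_points(channels, selected=["Web", "Friends", "Other"])
# 2.0
```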
Additionally, if CB is a multiple-response-type question item that corresponds to the estimation item of new product cognition information, each option is assigned the same point value based on Hypothesis 3-8, and the total points of all the options are set to 4. The "Other" option was set to 0, because points may vary greatly depending on the contents of a user's response.
3. The features of users are mapped using socioeconomic status points, personality points, and communication points calculated in step 2. As Figure 3 shows, the similarity relationship is expressed by analyzing the distance between users (Okada et al., 2001).
The distance UD_jk between the point representing user j and the point representing user k is defined as Eq. (14), where x_jt is the coordinate of user j in dimension t, and r is the number of dimensions. Then, as Table 2 shows, the distances between n users are calculated. Table 2 contains one distance for each pair of users, that is, nC2 = n × (n − 1)/2 distances. Because UD_jk = UD_kj, the table shows only the lower triangular part of the n × n distance matrix, excluding the main diagonal elements.
4. The distance between users calculated in step 3 is aggregated for each cluster, and the similarity relationship between clusters is expressed. The average distance CD_A between point sets representing users in cluster A is defined as Eq. (15), where n_A is the number of users in cluster A. The average distance CD_AB between point sets representing users in cluster A and point sets representing users in cluster B is defined as Eq. (16).
Then, as Table 3 shows, the average distances between five clusters are calculated. Table 3 contains one average distance for each combination of the 5 clusters, including each cluster paired with itself (that is, 15 average distances). Because CD_{EA,I} = CD_{I,EA}, the table shows only the lower triangular part of the 5 × 5 distance matrix, including the main diagonal elements.
5. Hypotheses are verified using the map diagram created in step 3. For example, when users classified as innovators are compared with users classified as laggards, the hypotheses predict that the figures for personality and communication in the map are higher for innovators. In addition, average distances within a cluster and between clusters are compared using average distances calculated in step 4. For example, when users classified as innovators are compared with users classified as laggards, the hypotheses predict that the average user distance within innovators is smaller than the average user distance between innovators and laggards.
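Steps 3 and 4 can be sketched in Python as follows. The sketch rests on assumptions, because Eq. (14) through Eq. (16) are not reproduced in this text: the user distance is taken to be the Euclidean distance over the r dimensions, the within-cluster average is taken over all n_A × (n_A − 1)/2 unordered pairs of users, and the between-cluster average is taken over all n_A × n_B pairs. The function names and the SPC points are hypothetical.

```python
from itertools import combinations
from math import dist


def within_cluster_distance(points):
    """Average distance over all unordered pairs of users in one cluster.

    ASSUMPTION: Euclidean distance between SPC point vectors.
    """
    pairs = list(combinations(points, 2))
    if not pairs:
        raise ValueError("need at least two users in the cluster")
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def between_cluster_distance(points_a, points_b):
    """Average distance over every (user in A, user in B) pair."""
    if not points_a or not points_b:
        raise ValueError("both clusters must be non-empty")
    total = sum(dist(p, q) for p in points_a for q in points_b)
    return total / (len(points_a) * len(points_b))


# Hypothetical (socioeconomic status, personality, communication) points.
innovators = [(2.0, 4.0, 4.0), (2.0, 3.0, 4.0)]
laggards = [(2.0, 1.0, 1.0), (2.0, 0.0, 1.0)]

cd_i = within_cluster_distance(innovators)               # 1.0
cd_i_l = between_cluster_distance(innovators, laggards)
```

In this made-up example the two clusters share the same socioeconomic status points and differ only in personality and communication, so the distance between innovators and laggards exceeds the distance within innovators, which is the pattern the hypotheses predict.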

Comparison with Traditional Method
We verify whether innovative buyers can be extracted by the speed of user product purchases used in a traditional method, and compare it with a new method.
1. Clustering is performed based on the magnitude relationship of elapsed time, representing the difference between a user's object product purchase date PD and a CP start date SD. For user set J = {1, 2, ⋯, n}, if the elapsed times PD_j − SD are arranged in ascending order and s(j) denotes the user in the j-th position, then Eq. (17) is obtained. Using the average μ_{PD−SD} and the standard deviation σ_{PD−SD} of elapsed time data, together with a coefficient w (w ≥ 0) 4 for adjustment during innovator extraction, the user set is classified into five clusters in Eq. (18).
2. As in Table 1, points x_S of socioeconomic status, points x_P of personality, and points x_C of communication are assigned to each user.
3. Map users based on the calculated socioeconomic status points, personality points, and communication points. In addition, as Figure 4 shows, users of cluster A in a traditional method, users of cluster A in a new method, and users of cluster A that coincide with a traditional method and a new method are mapped.
4. As Table 4 shows, the average SPC points are calculated for each cluster, classified by a traditional method and a new method.
5. Comparing a traditional method with a new method using the map diagrams created in steps 3 and 4 and the average SPC points shows a more effective innovative buyers extraction method.
Figure 4: Comparison of classification results for each method
4 The reason for using coefficient w is that when the standard deviation is too large relative to the average of the elapsed time data, the value of μ_{PD−SD} − 2wσ_{PD−SD} becomes negative, and there is a high possibility that innovators cannot be extracted successfully.
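The role of the adjustment coefficient in step 1 can be illustrated with a small Python sketch. It is a sketch under assumptions rather than the classification rule of this study. Only the innovator boundary μ_{PD−SD} − 2wσ_{PD−SD} is named in the text. The remaining boundaries (μ − wσ, μ, and μ + wσ) and the use of the population standard deviation are assumptions modeled on the adopter categories of innovator theory. The function name and the elapsed times are hypothetical.

```python
from statistics import mean, pstdev


def classify_by_elapsed_time(elapsed, w=1.0):
    """Cluster users by elapsed time only; a shorter time is more innovative.

    ASSUMPTION: boundaries at mu - 2*w*sigma, mu - w*sigma, mu, mu + w*sigma.
    """
    if w < 0:
        raise ValueError("w must be non-negative")
    mu, sigma = mean(elapsed), pstdev(elapsed)
    labels = []
    for e in elapsed:
        if e < mu - 2 * w * sigma:
            labels.append("I")    # innovators
        elif e < mu - w * sigma:
            labels.append("EA")   # early adopters
        elif e < mu:
            labels.append("EM")   # early majority
        elif e < mu + w * sigma:
            labels.append("LM")   # late majority
        else:
            labels.append("L")    # laggards
    return labels


# Hypothetical, strongly skewed elapsed times (for example, in days).
elapsed = [1.0, 2.0, 3.0, 4.0, 50.0]

unadjusted = classify_by_elapsed_time(elapsed, w=1.0)
# ['EM', 'EM', 'EM', 'EM', 'L']: mu - 2*sigma is negative, so no innovators

adjusted = classify_by_elapsed_time(elapsed, w=0.25)
# ['I', 'I', 'EA', 'EA', 'L']
```

With w = 1 the innovator boundary falls below zero for this skewed sample, so no user can qualify as an innovator. Shrinking w moves the boundary back into the range of observed elapsed times.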

Summary of Estimation
We verified whether innovative buyers in Figure 1 can be extracted using a new method. In addition, we verified whether innovative buyers can be similarly extracted by speed of user product purchases used in a traditional method, and compared it with the new method.

Estimation Data
We used behavior data and questionnaire data collected by DO HOUSE Inc. Specifically, the behavior data consist of records such as product name, user ID, date and time of purchase, and number of CP participation for an object product purchased by users visiting an O2O site and physical stores. The data collection period was November 2018, and the number of users analyzed was 17,450. Furthermore, in March 2019 we extracted some users corresponding to innovators, early adopters, early majority, late majority, and laggards, and obtained questionnaire data for 2,161 users. The application object was beverage brands that included new products among consumer products and had a comparably large number of purchasers. Table 5 summarizes the data on innovator scores and SPC points.

Classification of Object Users by Innovator Scores
Innovator scores were calculated and innovative buyers were extracted based on the magnitude relationship. The distribution of innovator scores is shown in Figure 5.
Using a new method, the top users with high innovator scores could be extracted as innovative buyers. As Figure 5 shows, the distribution has a shape close to the normal distribution shown in innovator theory. In addition, Figure 6 shows that the cumulative number of purchasers also has a shape close to the S-shaped curve shown in innovator theory.

Hypotheses Verification (SPC Points)
Hypotheses are verified for each cluster for the object users.

Verification Method 1: Mapping Object Users with SPC Points
Users were mapped by calculating SPC points. The results are shown in Figure 7 (only innovators and laggards are plotted).
Maps are divided into three patterns: socioeconomic status and personality, socioeconomic status and communication, and personality and communication. The thick line is the cluster of innovators, and the thin line is the cluster of laggards. Figure 7 shows that innovators have higher personality points and communication points than laggards. It is also clear that the difference in socioeconomic status between innovators and laggards is small.

Verification Method 2: Average Distance Analysis between Object Clusters Using SPC Points
Average distances between clusters were analyzed. The results are shown in Table 6. Average distance tables are divided into four patterns: socioeconomic status, personality, and communication; socioeconomic status and personality; socioeconomic status and communication; and personality and communication. Table 6 shows that the average distances are smaller among users within innovators, and tend to be greater between innovators and laggards.

Comparison with Traditional Method
Using the map diagram derived in step 3 of Chapter 4 and the average SPC points calculated in step 4, a new method and a traditional method were compared and estimated. Figure 8 shows the distribution of innovative buyers extracted by a traditional method before these comparative estimations.
As Figure 8 shows, the distribution is not the normal distribution shown in innovator theory, but has a shape close to a uniform distribution. In addition, Figure 9 shows that the cumulative number of buyers also has a shape close to a straight line instead of an S-shaped curve.
The comparison result between a new method and a traditional method is shown in Figure 10 and Table 7. The results of the map comparison when extracting innovators for each method are divided into three patterns: socioeconomic status and personality, socioeconomic status and communication, and personality and communication (Figure 10). The solid line is the cluster of innovators extracted by a new method, and the dotted line is the cluster of innovators extracted by a traditional method. Figure 10 and Table 7 show that, on most map diagrams and in the average SPC points, a new method of classifying users by innovator scores satisfies the conditions of each cluster with higher accuracy than a traditional method of classifying users based only on the speed of product purchase. We can therefore conclude that this new innovative buyers extraction method is effective.

CONSIDERATION
A new method of classifying users by innovator scores is overall more accurate than a traditional method of classifying only by speed of user product purchases. In fact, many of the clusters classified by a traditional method differ from the hypotheses set in advance and are not suitable for extracting innovative buyers, whereas a new method is more accurate in most clusters. However, in the average distances between clusters discussed in Chapter 4.4, the average distance among users within innovators is larger than the average distance between innovators and early adopters. This suggests that the small number of innovators in the sample has a large effect on average distances, and an effective way to reduce these average distances needs to be considered.
Classification by this new method estimates users based on only two items: speed of product purchases and cosmopolite natures. Since it does not consider other user behaviors in terms of communication and personality, it may not be suitable as an index value indicating innovation. A classification method using data suitable for measuring innovation is required. For example, Rogers (2003) pointed out that there is a positive correlation between user communication variables and innovation. For this reason, the accuracy may be improved by using a user's word-of-mouth as an estimation item. We leave this as a task for future work.

CONCLUSIONS
This paper proposes a method for extracting innovative buyers using innovator scores that can be calculated from users' purchase histories at physical stores and from access logs from an O2O site.
As a result of verifying the method using beverage brands as an application object, it was confirmed that a new method is more accurate than a traditional method using only the speed of product purchase. This new method is characterized by the ability to extract innovative buyers in a short period of time, using behavior data without requiring user-intensive input. For this reason, this new method makes it easy to analyze the behavior and consciousness of innovative buyers at large scale, both online and offline.
In the future, it is necessary to improve the accuracy by searching for more effective user estimation items for extraction. Furthermore, we will continue to analyze the behavior and consciousness of innovative buyers and consider ways to use them more effectively in marketing measures.