ECONOMIC ANALYSIS AND APPLICATION OF BIG DATA
Monique S.K. Wan
Department of Economics
Agenda
• Definition of Big Data
• Industrial Revolution 4.0
• Applications of Big Data
– Banking and risk management
– Marketing and recommendation system
– Government and smart city
• Implications of Big Data
– Business landscape and job nature
– Data as asset
– General Data Protection Regulation
– Role of economists
Definition of Big Data
“Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”
– Doug Laney of Gartner, Inc.
Every minute:
• 204 million emails
• 1,800,000 Facebook likes
• 72 hours of video uploaded to YouTube
Broader Definition
• Veracity
– Quality of the data
Broader Definition
• Valence
– How can big data items bond with each other?
• Measure the ratio of actually connected data items to the possible number of connections that could occur within the collection
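The valence ratio described on this slide can be sketched in a few lines. This is a minimal illustration that assumes connections are undirected pairs of items; the function name `valence` and the sample links are hypothetical, not taken from the slides.

```python
def valence(num_items, connections):
    """Ratio of observed connections to all possible pairwise connections."""
    possible = num_items * (num_items - 1) // 2  # undirected pairs
    if possible == 0:
        return 0.0
    return len(connections) / possible

# Hypothetical collection: 5 data items, 4 observed links between them.
links = {frozenset(pair) for pair in [(0, 1), (0, 2), (1, 2), (3, 4)]}
ratio = valence(5, links)
print(f"valence = {ratio:.2f}")
```

A ratio close to 1 means the collection is densely connected; a ratio close to 0 means most items are isolated from one another.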
Broader Definition
• Value
– How can big data benefit your organization?
• Requires a clear business strategy and data analytics tools
• Cross Industry Standard Process for Data mining (CRISP-DM; Shearer 2000)
Source: Provost and Fawcett (2013)
Industrial Revolution 4.0
Source: NetObjex Blog
How Big is Big?
• Company: Signet Bank
– A small proportion of customers actually account for more than 100% of a bank’s profit from credit card operations
• Business objective
– Model profitability
– Not just default probability, but also pricing
• Data understanding/ Data preparation
– Use of algorithms to learn the willingness-to-pay and charge-offs
– Acquire information via 45,000 scientific tests
– Increase in bad accounts (charge-offs): cost-and-benefit analysis
How Big is Big?
• Deployment/ Action
– Offer different customers different rates, terms, offers
• Business value
– A new spin-off in 1994 (Capital One)
• Richard Fairbank and Nigel Morris
– Information-based strategy
– Lowest default rate
– Among the largest credit card issuers
How Fast is Fast?
• Vehicle with 100 telemetry sensors that capture data to improve track performance
• Data science team
– Driver
– Team principal/race engineers/operations room analysts/trackside analysts
• Real-time data analysis
– Lap/split times
– Tire/brake temperature
– Air pressure
– Air flow
– Engine performance
– GPS information
How complex can it be?
• Business understanding
– Provide streaming movie and TV service
• One-third of peak-time Internet traffic in US
• 65M members in over 50 countries
• 100M hours of TV shows and movies a day
– Build recommendation engines
• Data understanding / Data preparation
– Unstructured data
• 80,000 features (costly!): cost-and-benefit analysis
– Existing data = {Customer ID, movie ID, ratings and the date the movie was watched,…}
– New data = {Time spent on selecting movies, how often playback is stopped, tags of the movies,…}
• Modeling
– Measure similarity among products
– Measure similarity among customers
– Predictive analytics make recommendations
• Improve user experience
• Induce customers to consume more
• Additional business value
– Product Innovation
• Develop new business as content creator
– Outbid HBO for House of Cards, directed by David Fincher and starring Kevin Spacey
Valuable?
Banking & Risk Management
• Business question
– What is the likelihood of default for this loan applicant?
• Data collection
– Past loan records: Applicant’s profiles and outcome (default or not default)
– The profile of this loan applicant
• Modeling (supervised models)
– Classification tree
– Logistic regression
– Support vector machines
– Neural networks
Decision Tree For Classification
• Business decision
– Should we write off the loan?
• Model
– Write off the loan if
• Not employed; and
• Remaining loan balance is huge; and
• Older than 45
Source: Provost and Fawcett (2013)
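The three-condition rule on this slide can be written as a small function. This is only a sketch: the slide does not say what counts as a "huge" balance, so the `huge_balance` threshold below is a hypothetical parameter, as are the `Borrower` class and its field names.

```python
from dataclasses import dataclass

@dataclass
class Borrower:
    employed: bool
    balance: float  # remaining loan balance
    age: int

def should_write_off(b, huge_balance=50_000.0):
    """Write off only if all three conditions of the tree path hold.

    huge_balance is an assumed cutoff; the slide gives no number.
    """
    return (not b.employed) and b.balance >= huge_balance and b.age > 45

print(should_write_off(Borrower(employed=False, balance=80_000, age=50)))
print(should_write_off(Borrower(employed=True, balance=80_000, age=50)))
```

Each condition corresponds to one split on the path from the root of the tree to the write-off leaf, so a borrower who fails any single test ends up in a different leaf.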
Decision Tree
• Measurement of Purity
– Error rate
• Definition: $E = 1 - \max(p_+, p_-)$
• Example: $E = 1 - \frac{9}{10} = 0.1$
– Gini index
• A measure of total variance across the 2 classes
• Definition: $G = p_+(1 - p_+) + p_-(1 - p_-)$
• Example: $G = \frac{9}{10} \cdot \frac{1}{10} + \frac{1}{10} \cdot \frac{9}{10} = 0.18$
– Entropy
• Definition: $D = -p_+ \log_2 p_+ - p_- \log_2 p_-$
• Example: $D = -\frac{9}{10} \log_2 \frac{9}{10} - \frac{1}{10} \log_2 \frac{1}{10}$
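The three purity measures can be computed directly from the share of positive cases in a node. The sketch below uses the slide's example of a 9/10 versus 1/10 split; the function names are illustrative choices.

```python
import math

def error_rate(p_pos):
    """E = 1 - max(p+, p-)."""
    return 1.0 - max(p_pos, 1.0 - p_pos)

def gini(p_pos):
    """G = p+(1 - p+) + p-(1 - p-)."""
    p_neg = 1.0 - p_pos
    return p_pos * (1.0 - p_pos) + p_neg * (1.0 - p_neg)

def entropy(p_pos):
    """D = -p+ log2 p+ - p- log2 p-, treating 0 * log2(0) as 0."""
    total = 0.0
    for p in (p_pos, 1.0 - p_pos):
        if p > 0.0:
            total -= p * math.log2(p)
    return total

p = 9 / 10
print(round(error_rate(p), 3), round(gini(p), 3), round(entropy(p), 3))
```

For this node the entropy works out to roughly 0.469 bits. All three measures are zero for a perfectly pure node and reach their maximum at a 50/50 split.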
Logistic Regression
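Target variable
$y_i = \begin{cases} 1 & \text{Write-off} \\ 0 & \text{Not write-off} \end{cases}$
Feature variables
• $\mathit{Age}_i$
• $\mathit{Balance}_i$
Model: $\operatorname{logit}(y_i) = -60 + \mathit{Age}_i + 1.5\,\mathit{Balance}_i$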
Source: Provost and Fawcett (2013)
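To see what the fitted equation implies, the logit score can be converted into a probability with the logistic function. This sketch takes the coefficients from the slide; the units of Balance are not stated there, so the sample inputs are purely illustrative, and the function name is hypothetical.

```python
import math

def write_off_probability(age, balance):
    """Logistic transform of the slide's score: -60 + Age + 1.5 * Balance."""
    score = -60.0 + age + 1.5 * balance
    return 1.0 / (1.0 + math.exp(-score))

# Illustrative inputs only; Balance is in whatever units the model was fitted on.
print(write_off_probability(30, 20))  # score = 0, on the decision boundary
print(write_off_probability(45, 20))  # score = 15, well inside the write-off region
print(write_off_probability(30, 10))  # score = -15, well outside it
```

The decision boundary is the line where the score equals zero, that is, Age = 60 − 1.5 × Balance; points on it receive a probability of exactly 0.5.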
Forecasting Recession
Target variable
$y_t = \begin{cases} 1 & \text{Recession} \\ 0 & \text{Otherwise} \end{cases}$
Feature variable
$\text{Yield curve}_t = \text{LR interest rate} - \text{SR interest rate}$
Marketing Strategy
• Domain Knowledge
– Cost = Contact Hours × Wage Rate = $25
– Product Price = $200
– Product Cost = $100
– Respond (R) Benefit = $200 − $100 − $25 = $75
– No Respond (NR) Benefit = −$25
• Expected benefit of targeting
$EV = p_R \times v_R + p_{NR} \times v_{NR}$
$EV = p_R \times \$75 + (1 - p_R) \times (-\$25) > 0 \;\Rightarrow\; p_R > 0.25$
• Business action
– Target the customers if their estimated probability of responding to the marketing plan >25%
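The break-even probability follows from setting the expected value to zero. The sketch below reproduces the slide's arithmetic with its own numbers (price 200, product cost 100, contact cost 25); the function names are illustrative.

```python
def expected_value(p_respond, v_respond, v_no_respond):
    """EV = p_R * v_R + (1 - p_R) * v_NR."""
    return p_respond * v_respond + (1.0 - p_respond) * v_no_respond

def break_even_probability(v_respond, v_no_respond):
    """Solve p * v_R + (1 - p) * v_NR = 0 for p."""
    return -v_no_respond / (v_respond - v_no_respond)

price, product_cost, contact_cost = 200.0, 100.0, 25.0
v_r = price - product_cost - contact_cost  # benefit if the customer responds
v_nr = -contact_cost                       # loss if the customer does not respond

threshold = break_even_probability(v_r, v_nr)
print(threshold)                       # break-even response probability
print(expected_value(0.5, v_r, v_nr))  # expected benefit at p_R = 0.5
```

Any customer whose estimated response probability is above the threshold has a positive expected benefit, which is why the targeting rule depends only on the ratio of the contact cost to the total swing between the two outcomes.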
Data Analytics
Model Prediction (Probability Score) | Action | Class
0.87 | Contact | Respond
0.84 | Contact | Respond
0.76 | Contact | Respond
0.65 | Contact | Not Respond
0.61 | Contact | Respond
0.54 | Contact | Respond
0.47 | Contact | Not Respond
0.35 | Contact | Not Respond
0.24 | Do Not Contact | Respond
⋮ | ⋮ | ⋮
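Applying the 25% rule to the rows shown in the table gives a feel for how the targeting decision plays out. This sketch uses only the nine visible rows (the table is truncated on the slide) and the payoffs of $75 per response and −$25 per non-response from the marketing strategy slide.

```python
# (probability score, actually responded) for the rows visible in the table
rows = [
    (0.87, True), (0.84, True), (0.76, True), (0.65, False), (0.61, True),
    (0.54, True), (0.47, False), (0.35, False), (0.24, True),
]

THRESHOLD = 0.25                  # break-even response probability
V_RESPOND, V_NO_RESPOND = 75, -25  # payoffs in dollars

contacted = [(score, responded) for score, responded in rows if score > THRESHOLD]
profit = sum(V_RESPOND if responded else V_NO_RESPOND for _, responded in contacted)
missed = [score for score, responded in rows if score <= THRESHOLD and responded]

print(len(contacted), profit, missed)
```

The last visible row illustrates the cost of a threshold rule: a customer scored at 0.24 is not contacted even though that customer would have responded.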
Recommendation System
• Personalization
– Predict user’s interests and recommend products/services
• Spotify, YouTube, Netflix
• Data types
– Characteristic information
• Users: User’s background information, Preferences
• Items: Categories, Specific features
– User-item interactions
• Purchase histories, User ratings, Search engine queries, Browsing sequences
Recommendation System
• Content-based system
– Based on user’s profile features and item features
• New products falling outside the past “interests” will not be recommended
• New customers with not much information may be ignored
• Collaborative filtering system
– Utilize user-item interaction
• If one user likes item A and another user likes both A and B, then B will be recommended to the first user
– Identify clusters of users and items
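The collaborative filtering idea on this slide can be sketched with nothing more than sets of liked items. This is a toy user-based variant that assumes Jaccard overlap as the similarity measure, which is one common choice and not something the slides specify; the user and item names are hypothetical.

```python
def jaccard(a, b):
    """Share of items two users have in common, out of all items either likes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, likes):
    """Rank items the user has not seen by the similarity of the users who like them."""
    own = likes[user]
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        similarity = jaccard(own, items)
        if similarity == 0.0:
            continue  # users with no overlap contribute nothing
        for item in items - own:
            scores[item] = scores.get(item, 0.0) + similarity
    return sorted(scores, key=lambda item: (-scores[item], item))

likes = {"u1": {"A"}, "u2": {"A", "B"}, "u3": {"C"}}
print(recommend("u1", likes))
```

Here the first user shares item A with the second user, so B is recommended, while C is not because its only fan has nothing in common with the first user.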
Types of Recommendation System
Figure: content-based filtering vs. collaborative filtering
Source: https://www.youtube.com/watch?v=Eeg1DEeWUjA
Government & Smart City
• Technology
– More information with higher accuracy and faster information flow
• Applications
– Transportation: Bus scheduling/smart parking
– Banking: Cashless and contactless payments
– Waste/ Drainage management system: Robotics and assistive technologies
– Energy conservation: Data-empowered urban environments
– Healthcare: Technology-enabled homes
Government & Smart City
• Chakravorti and Chaturvedi (2017)
– Citizens/People Components
• Inclusivity, environment and quality of life, state of talent and the human condition, talent development
– Economy Components
• Global connectedness, economic robustness, entrepreneurial ecosystem, innovation capacity
– Institutions Components
• Freedoms offline and online, trust, safety and security, public services
Business Landscape & Job Nature
Business Models
• New business/ business models
– Data consulting companies
– Cloud storage
– Apps developers
– Automation
– Sharing economy
• Traditional business models
– Entertainment companies
– Media
– Intermediaries
Job Nature
• Job creation
– Data analysts
– Computer scientists
– Decision makers with strong domain knowledge
• Job displacement
– Old-fashioned sales representatives
– Jobs with routine tasks
Cloud and Data as Asset
Alibaba-Tencent struggle disrupts, invigorates China internet
Source: https://www.bnext.com.tw/article/44785/shun-feng-with-alibaba-china-most-strong-logistics-logistics-platform-of-decorum
Source: http://fortune.com/longform/alibaba-tencent-china-internet/
Produce and Manage Data Assets
“Infonomics is the concept that information is, or should be, an actual enterprise asset.”
– Doug Laney
General Data Protection Regulation
• 7 Principles
– Purpose limitation
– Data minimization
– Accuracy
– Storage limitation
– Lawfulness, fairness and transparency
• Requiring the consent of subjects for data processing
– Integrity and confidentiality
• Anonymizing collected data to protect privacy
– Accountability
• Safely handling the transfer of data across borders
• Providing data breach notifications
Role of Economists
• Macroeconomists
• Microeconomists
• Econometricians
James Heckman, Daniel McFadden, John Nash, Daniel Kahneman, Robert Lucas, Robert Solow
Jobs for Economists
• Examples of tech companies that have hired PhD Economists
Pat Bajari, Hal Varian, Susan Athey, Jonathan Hall
Role of Economists in Tech Firms
• Empirical Industrial Organization (Athey and Luca, 2019; Shum, 2016)
– Estimate demand function and market power
– Design online advertising strategies and estimate returns to advertising
– Design review and reputation systems and analyze the effect of reviews
– Evaluate acquisitions, exclusive deals and strategy
– Promote incentives in marketplaces
Demand Function
• Industrial organization
– More about supply-side (firm-side)
– How much market power do firms have?
• Market power
– Markup: $\frac{p - mc}{p}$
– Marginal cost (𝑚𝑐) is unobserved!
• Observation: High price (𝑝) in an industry
– High market power?
– High 𝑚𝑐?
Demand Function
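• Monopoly
$\max_p \; p\,q(p) - C(q(p))$
• First-order condition
$q(p) + p\,q'(p) = C'(q(p))\,q'(p)$
• Optimal price
$\frac{p^* - mc(q(p^*))}{p^*} = -\frac{q(p^*)}{q'(p^*)} \cdot \frac{1}{p^*}$
– The right-hand side is the negative inverse of the price elasticity of demand, $q'(p^*)\,p^* / q(p^*)$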
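The optimal-price condition on this slide can be checked numerically. The sketch below assumes a linear demand curve $q(p) = a - bp$ and a constant marginal cost $c$, neither of which comes from the slides; the parameter values are made up for illustration.

```python
def monopoly_price(a, b, c):
    """Profit-maximizing price for q(p) = a - b*p with constant marginal cost c.

    From the first-order condition (a - b*p) - b*(p - c) = 0.
    """
    return (a / b + c) / 2.0

a, b, c = 100.0, 2.0, 10.0  # hypothetical demand intercept, slope and marginal cost

p_star = monopoly_price(a, b, c)
q_star = a - b * p_star
markup = (p_star - c) / p_star      # left-hand side of the optimal-price condition
elasticity = -b * p_star / q_star   # q'(p) * p / q evaluated at p_star

print(p_star, q_star)
print(round(markup, 4), round(-1.0 / elasticity, 4))
```

The two numbers on the last line coincide, which is the point of the slide: if demand can be estimated, the elasticity reveals the markup even though marginal cost itself is not observed.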
Traditional Approach to Demand Estimation
• Consumer demand as a utility maximization problem
$\max_{x_1, x_2} U(x_1, x_2)$ s.t. $p_1 x_1 + p_2 x_2 = M$
– $p_1$ and $p_2$ are prices of good 1 and good 2
– $M$ is income
• Solution
– $x_1^*(p_1, p_2, M)$ and $x_2^*(p_1, p_2, M)$
• Question
– Do we really need $x_1^*$ of good 1 and $x_2^*$ of good 2? → Discrete choice
Discrete Choice Modeling
• Product nature/Consumption pattern
– Many alternatives, too many parameters
• Automobile, airlines, cereals, toothpaste
– Consumers only choose one of the available options (discrete choice)
• Random utility framework (McFadden, 1978, 1981)
$\max_{j, z} U_i(x_j, z)$ subject to $p_j + p_z z = y_i$
– $x_j$ is the $j$th product with price $p_j$
– $z$ is the other product with price $p_z$
– $y_i$ is the income of the $i$th individual
Discrete Choice Modeling
• Indirect utility function
– $U_{ij}^* = V_{ij}(p_j, p_z, y_i) + \varepsilon_{ij}$
– $V_{ij}(p_j, p_z, y_i)$ is a function that depends on observed variables
– $\varepsilon_{ij}$ is a utility shock that is not observed by others
• Consumer preference
– Choose product $j$ when $U_{ij}^* > U_{ik}^*$ for $k \neq j$
• Objective
– Estimate the choice probabilities for all goods
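One standard way to turn this framework into choice probabilities is the multinomial logit model. The slides do not state a distribution for the utility shocks, so the sketch below assumes they are independent type I extreme value draws, under which the probability of choosing product $j$ is $\exp(V_{ij}) / \sum_k \exp(V_{ik})$. The utility values are hypothetical.

```python
import math

def choice_probabilities(observed_utilities):
    """Logit choice probabilities: exp(V_j) / sum_k exp(V_k).

    Assumes i.i.d. type I extreme value utility shocks (an assumption, not
    something stated on the slides). Subtracting the maximum before
    exponentiating avoids overflow without changing the result.
    """
    largest = max(observed_utilities)
    weights = [math.exp(v - largest) for v in observed_utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical observed utilities V_ij of one consumer for three products.
v = [1.0, 0.5, 0.0]
probs = choice_probabilities(v)
print([round(p, 3) for p in probs])
```

Products with higher observed utility are chosen more often, but never with certainty, because the unobserved shock can always tip the comparison the other way.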
Economists at Amazon: Making an Impact
References
• Athey S. (2017). Beyond prediction: Using big data for policy problems. Science 355: 483-485.
• Athey S. and Luca M. (2019). Economists (and economics) in tech companies. Journal of Economic Perspectives 33(1): 209-230.
• Varian H.R. (2014). Big data: New tricks for econometrics. Journal of Economic Perspectives 28(2): 3-28.
• Rudebusch G.D. and Williams J. (2009). Forecasting recessions: The puzzle of the enduring power of the yield curve. Journal of Business & Economic Statistics 27(4): 492-503.
• Chakravorti B. and Chaturvedi R.S. (2017). The “smart society” of the future doesn’t look like science fiction. Harvard Business Review.
• Provost F. and Fawcett T. (2013). Data Science for Business. 1st Ed. O’Reilly Media.
• Marr B. (2016). Big Data in Practice: How 45 Successful Companies Used Big Data Analytics to Deliver Extraordinary Results. 1st Ed. Wiley.
• Shum M. (2016). Econometric Models for Industrial Organization. World Scientific Lecture Notes in Economics Vol. 3. World Scientific.