ECONOMIC ANALYSIS AND APPLICATION OF BIG DATA

Academic year: 2022

(1)

ECONOMIC ANALYSIS AND APPLICATION OF BIG DATA

Monique S.K. Wan

Department of Economics

(2)

Agenda

• Definition of Big Data

• Industrial Revolution 4.0

• Applications of Big Data

– Banking and risk management

– Marketing and recommendation system

– Government and smart city

• Implications of Big Data

– Business landscape and job nature

– Data as asset

– General Data Protection Regulation

– Role of economists

(3)

Definition of Big Data

“Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”

– Doug Laney of Gartner, Inc.

(4)

Every minute:

• 204 million emails

• 1,800,000 Facebook likes

• 72 hours of video uploaded to YouTube

(5)

Broader Definition

• Veracity (the 4th V)

– Quality of the data

(6)

Broader Definition

• Valence (the 5th V)

– How well do big data items connect with each other?

• Measured as the ratio of actual connections between data items to the number of possible connections within the collection
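One way to make the valence ratio concrete is to treat data items as nodes and their links as undirected edges, so that valence becomes the density of that graph. The sketch below assumes exactly that reading (undirected links, no self-links); the function name `valence` and the example items are hypothetical.

```python
def valence(items, links):
    """Ratio of actual links to possible links among a collection of data items.

    Assumes links are undirected and ignores self-links, so n items allow
    n * (n - 1) / 2 possible links.
    """
    item_set = set(items)
    n = len(item_set)
    possible = n * (n - 1) // 2
    if possible == 0:
        return 0.0
    # Count each undirected link once, keeping only links between known, distinct items.
    actual = {
        frozenset(link)
        for link in links
        if len(set(link)) == 2 and set(link) <= item_set
    }
    return len(actual) / possible


# Hypothetical example: 4 items allow 6 links; 3 distinct links exist.
items = ["customer", "order", "product", "review"]
links = [("customer", "order"), ("order", "product"),
         ("product", "review"), ("order", "customer")]  # last one repeats the first
print(valence(items, links))  # 0.5
```

A valence near 1 means almost every item is linked to every other; a value near 0 means the collection is sparse and mostly unconnected.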

(7)

Broader Definition

• Value (the 6th V)

– How can big data benefit your organization?

• Requires a clear business strategy and data analytics tools

• Cross Industry Standard Process for Data Mining (CRISP-DM; Shearer 2000)

Source: Provost and Fawcett (2013)

(8)

Industrial Revolution 4.0

Source: NetObjex Blog

(9)

How Big is Big?

• Company: Signet Bank

– A small proportion of customers actually accounts for more than 100% of the bank’s profit from credit card operations

• Business objective

– Model profitability

– Not just default probability, but also pricing

• Data understanding / Data preparation

– Use algorithms to learn willingness-to-pay and charge-offs

– Acquire information via 45,000 scientific tests

– Increase in bad accounts (charge-offs) → Cost-and-benefit analysis
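The cost-and-benefit logic above (pricing and charge-offs, not default probability alone) can be sketched as an expected-profit calculation per account. This is a hypothetical two-outcome illustration: the function names, default probabilities, interest revenues and charge-off amount are all assumptions, not Signet Bank or Capital One figures.

```python
def expected_profit(p_default, interest_revenue, charge_off):
    """Expected profit of one credit-card account in a two-outcome model.

    With probability (1 - p_default) the customer pays and the bank earns
    interest_revenue; with probability p_default the account is charged off
    and the bank loses charge_off.
    """
    return (1 - p_default) * interest_revenue - p_default * charge_off


def break_even_default_rate(interest_revenue, charge_off):
    """Default probability at which the account's expected profit is exactly zero."""
    return interest_revenue / (interest_revenue + charge_off)


# Hypothetical terms offered to the same applicant: the higher rate earns more
# per paying account but is assumed to come with a higher default probability.
offers = {
    "low_rate": expected_profit(p_default=0.02, interest_revenue=150.0, charge_off=2000.0),
    "high_rate": expected_profit(p_default=0.05, interest_revenue=300.0, charge_off=2000.0),
}
best = max(offers, key=offers.get)
print(best, round(offers[best], 2))
```

Under these made-up numbers the riskier, higher-priced offer is more profitable, which is the point of modeling profitability rather than default alone: a higher default rate can be worth accepting if pricing more than covers the extra charge-offs.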

(10)

How Big is Big?

• Deployment/ Action

– Offer different customers different rates, terms, offers

• Business value

– A new spin-off in 1994 (Capital One)

• Richard Fairbank and Nigel Morris

– Information-based strategy

– Lowest default rate

– One of the largest credit card issuers

(11)

How Fast is Fast?

• Vehicle with 100 telemetry sensors that capture data to improve track performance

• Data science team

– Driver

– Team principal/ race engineers/ operations room analysts/ trackside analysts

• Real-time data analysis

– Lap/split times

– Tire/brake temperature

– Air pressure

– Air flow

– Engine performance

– GPS information

(12)

How complex can it be?

• Business understanding

– Provide streaming movie and TV service

• One-third of peak-time Internet traffic in the US

• 65M members in over 50 countries

• 100M hours of TV shows and movies a day

– Build recommendation engines

• Data understanding / Data preparation

– Unstructured data

• 80,000 features (costly!!!) → Cost-and-benefit analysis

– Existing data = {Customer ID, movie ID, ratings and the date the movie was watched,…}

– New data = {Time spent on selecting movies, how often playback is stopped, tags of the movies,…}

(13)

• Modeling

– Measure similarity among products

– Measure similarity among customers

– Use predictive analytics to make recommendations

• Improve user experience

• Induce customers to consume more

• Additional business value

– Product Innovation

• Develop new business as content creator

– Outbid HBO: House of Cards, directed by David Fincher and starring Kevin Spacey

Valuable?

(14)

Banking & Risk Management

• Business question

– What is the likelihood of default for this loan applicant?

• Data collection

– Past loan records: applicants’ profiles and outcomes (default or not default)

– The profile of this loan applicant

• Modeling (supervised models)

– Classification tree

– Logistic regression

– Support vector machines

– Neural networks

(15)

Decision Tree For Classification

• Business decision

– Should we write off the loan?

• Model

– Write off the loan if

• Not employed; and

• Remaining loan balance is huge; and

• Older than 45

Source: Provost and Fawcett (2013)

(16)

Decision Tree

• Measurement of Purity

– Error rate

• Definition: $E = 1 - \max(p_+, p_-)$

• Example: $E = 1 - \frac{9}{10} = 0.1$

– Gini index

• A measure of total variance across the 2 classes

• Definition: $G = p_+(1 - p_+) + p_-(1 - p_-)$

• Example: $G = \frac{9}{10}\cdot\frac{1}{10} + \frac{1}{10}\cdot\frac{9}{10} = 0.18$

– Entropy

• Definition: $D = -p_+ \log_2 p_+ - p_- \log_2 p_-$

• Example: $D = -\frac{9}{10}\log_2\frac{9}{10} - \frac{1}{10}\log_2\frac{1}{10} \approx 0.469$
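The three purity measures can be computed directly for a two-class node. A minimal sketch, assuming `p_pos` is the share of the positive class (so $p_- = 1 - p_+$); the function names are hypothetical, and the last line reproduces the 9-out-of-10 example.

```python
import math


def error_rate(p_pos):
    """E = 1 - max(p+, p-)."""
    return 1 - max(p_pos, 1 - p_pos)


def gini(p_pos):
    """G = p+ (1 - p+) + p- (1 - p-)."""
    p_neg = 1 - p_pos
    return p_pos * (1 - p_pos) + p_neg * (1 - p_neg)


def entropy(p_pos):
    """D = -p+ log2 p+ - p- log2 p-, taking 0 * log2(0) as 0."""
    return -sum(p * math.log2(p) for p in (p_pos, 1 - p_pos) if p > 0)


p = 9 / 10
print(round(error_rate(p), 4), round(gini(p), 4), round(entropy(p), 4))  # 0.1 0.18 0.469
```

All three measures are 0 for a pure node and largest at a 50/50 split; a tree-growing algorithm picks the split that most reduces the chosen measure.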

(17)

Logistic Regression

Target variable

$$y_i = \begin{cases} 1 & \text{Write-off} \\ 0 & \text{Not write-off} \end{cases}$$

Feature variables

• $Age_i$

• $Balance_i$

$$\operatorname{logit} \Pr(y_i = 1) = -60 + Age_i + 1.5\, Balance_i$$

Source: Provost and Fawcett (2013)
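The fitted model can be turned into a write-off probability with the logistic function, $p_i = 1/(1 + e^{-\text{logit}})$. A minimal sketch, assuming the coefficients shown above and that Age and Balance are in whatever units the model was fitted on (the slide does not state the units of Balance); the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for very negative z
    return e / (1.0 + e)


def write_off_probability(age, balance):
    """P(write-off) implied by logit = -60 + Age + 1.5 * Balance (slide coefficients)."""
    return sigmoid(-60 + age + 1.5 * balance)


for age, balance in [(30, 5), (45, 10), (50, 20)]:
    print(age, balance, round(write_off_probability(age, balance), 4))
```

The 0.5 probability contour is the straight line Age + 1.5 × Balance = 60, which is why logistic regression gives a linear decision boundary in the (Age, Balance) plane.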

(18)

Forecasting Recession

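Target variable

$$y_t = \begin{cases} 1 & \text{Recession} \\ 0 & \text{Otherwise} \end{cases}$$

Feature variable

$$\text{Yield curve}_t = \text{LR interest rate}_t - \text{SR interest rate}_t$$

where LR and SR denote the long-term and short-term interest rates.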
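A common specification in the yield-curve literature is a probit model, $\Pr(y_t = 1) = \Phi(\alpha + \beta \cdot \text{spread}_t)$. The sketch below assumes that form; the coefficients are made-up placeholders, not estimates from Rudebusch and Williams or any other study, and in applied work the recession indicator is often dated some quarters after the spread it is predicted from.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def recession_probability(spread, alpha, beta):
    """Probit model: P(recession) = Phi(alpha + beta * spread), spread = LR rate - SR rate."""
    return normal_cdf(alpha + beta * spread)


# Hypothetical coefficients chosen only for illustration. A negative beta means
# a flatter or inverted yield curve (smaller or negative spread) raises the probability.
ALPHA, BETA = -0.5, -0.6

for spread in (2.0, 0.0, -1.0):  # percentage points
    print(spread, round(recession_probability(spread, ALPHA, BETA), 3))
```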

(19)

Marketing Strategy

• Domain Knowledge

– Cost = Contact Hours × Wage Rate = $25

– Product Price = $200

– Product Cost = $100

– Respond (R) → Benefit = $200 − $100 − $25 = $75

– No Response (NR) → Benefit = −$25

• Expected benefit of targeting

$$EV = p_R \times v_R + p_{NR} \times v_{NR}$$

$$EV = p_R \times \$75 + (1 - p_R) \times (-\$25) > 0 \;\Rightarrow\; p_R > 0.25$$

• Business action

– Target customers whose estimated probability of responding to the marketing plan exceeds 25%
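The break-even rule can be checked in a few lines. A minimal sketch using the payoffs above ($75 if the customer responds, −$25 if not); the function names are hypothetical.

```python
def expected_value(p_respond, v_respond=75.0, v_no_respond=-25.0):
    """EV of contacting one customer: pR * vR + (1 - pR) * vNR."""
    return p_respond * v_respond + (1 - p_respond) * v_no_respond


def break_even_probability(v_respond=75.0, v_no_respond=-25.0):
    """Solve pR * vR + (1 - pR) * vNR = 0 for pR, i.e. pR = -vNR / (vR - vNR)."""
    return -v_no_respond / (v_respond - v_no_respond)


print(break_even_probability())  # 0.25
print(expected_value(0.5))       # 25.0
```

Changing the wage rate or product margin shifts the payoffs and therefore the threshold, without re-estimating the response model.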

(20)

Data Analytics

Model prediction (probability score) | Action | Class

0.87 | Contact | Respond

0.84 | Contact | Respond

0.76 | Contact | Respond

0.65 | Contact | Not Respond

0.61 | Contact | Respond

0.54 | Contact | Respond

0.47 | Contact | Not Respond

0.35 | Contact | Not Respond

0.24 | Do Not Contact | Respond
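Applying the 25% rule to the scored list above and valuing each outcome with the marketing payoffs gives the realized benefit of the campaign. A sketch, assuming the same $75 / −$25 payoffs apply to this list (the table itself does not state them); the variable names are hypothetical.

```python
# (probability score, actual class) for each customer in the table above
scored = [
    (0.87, "Respond"), (0.84, "Respond"), (0.76, "Respond"),
    (0.65, "Not Respond"), (0.61, "Respond"), (0.54, "Respond"),
    (0.47, "Not Respond"), (0.35, "Not Respond"), (0.24, "Respond"),
]
THRESHOLD = 0.25                              # break-even response probability
PAYOFF = {"Respond": 75, "Not Respond": -25}  # assumed per-customer benefit in $

contacted = [cls for score, cls in scored if score > THRESHOLD]
realized_benefit = sum(PAYOFF[cls] for cls in contacted)
missed_responders = sum(1 for score, cls in scored
                        if score <= THRESHOLD and cls == "Respond")

print(len(contacted), realized_benefit, missed_responders)  # 8 300 1
```

The customer scored 0.24 would have responded but is not contacted: a threshold rule maximizes expected benefit, not the realized benefit of every individual decision.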

(21)

Recommendation System

• Personalization

– Predict a user’s interests and recommend products/services

• Spotify, YouTube, Netflix

• Data types

– Characteristic information

• Users: User’s background information, Preferences

• Items: Categories, Specific features

– User-item interactions

• Purchase histories, User ratings, Search engine queries, Browsing sequences

(22)

Recommendation System

• Content-based system

– Based on user’s profile features and item features

• New products falling outside the past “interests” will not be recommended

• New customers with little information may be ignored

• Collaborative filtering system

– Utilize user-item interaction

• If one user likes item A, and another user likes both A and B, then B will be recommended to the first user

– Identify clusters of users and items
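Collaborative filtering can be sketched as item-based filtering with cosine similarity on binary "likes" data, which is one common variant among several (user-based filtering and matrix factorization are others). The users, items and function names below are hypothetical; the example reproduces the rule above: user1 likes A, user2 likes A and B, so B is recommended to user1.

```python
import math
from collections import defaultdict

# Hypothetical binary interaction data: which items each user liked.
likes = {
    "user1": {"A"},
    "user2": {"A", "B"},
    "user3": {"B", "C"},
}


def users_by_item(likes):
    """Invert user -> items into item -> set of users who liked it."""
    index = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            index[item].add(user)
    return index


def cosine(users_a, users_b):
    """Cosine similarity of two binary item vectors stored as sets of users."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / math.sqrt(len(users_a) * len(users_b))


def recommend(user, likes, top_n=3):
    """Score unseen items by their summed similarity to the items the user already likes."""
    index = users_by_item(likes)
    seen = likes[user]
    scores = {
        item: sum(cosine(fans, index[liked]) for liked in seen)
        for item, fans in index.items()
        if item not in seen
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, score in ranked[:top_n] if score > 0]


print(recommend("user1", likes))  # ['B']
```

Unlike the content-based approach, no item features are needed: similarity comes entirely from who interacted with what, which is also why brand-new items with no interactions cannot be scored.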

(23)

Types of Recommendation System

Content-Based | Collaborative Filtering

Source: https://www.youtube.com/watch?v=Eeg1DEeWUjA

(24)

Government & Smart City

• Technology

– More information with higher accuracy and faster information flow

• Applications

– Transportation: Bus scheduling/ smart parking

– Banking: Cashless and contactless payments

– Waste/ Drainage management system: Robotics and assistive technologies

– Energy conservation: Data-empowered urban environments

– Healthcare: Technology-enabled homes

(25)

Government & Smart City

• Chakravorti and Chaturvedi (2017)

– Citizens/People Components

• Inclusivity, environment and quality of life, state of talent and the human condition, talent development

– Economy Components

• Global connectedness, economic robustness, entrepreneurial ecosystem, innovation capacity.

– Institutions Components

• Freedoms offline and online, trust, safety and security, public services

(26)
(27)

Business Landscape & Job Nature

Business Models

• New business/ business models

– Data consulting companies

– Cloud storage

– App developers

– Automation

– Sharing economy

• Traditional business models

– Entertainment companies

– Media

– Intermediaries

Job Nature

• Job creation

– Data analysts

– Computer scientists

– Decision makers with strong domain knowledge

• Job displacement

– Old-fashioned sales representatives

– Jobs with routine tasks

(28)

Cloud and Data as Asset

Alibaba-Tencent struggle disrupts, invigorates China internet

Source: https://www.bnext.com.tw/article/44785/shun-feng-with-alibaba-china-most-strong-logistics-logistics-platform-of-decorum

Source: http://fortune.com/longform/alibaba-tencent-china-internet/

(29)

Produce and Manage Data Assets

“Infonomics is the concept that information is, or should be, an actual enterprise asset.”

– Doug Laney

(30)

General Data Protection Regulation

• 7 Principles

– Purpose limitation

– Data minimization

– Accuracy

– Storage limitation

– Lawfulness, fairness and transparency

• Requiring a lawful basis for data processing, such as the data subject’s consent

– Integrity and confidentiality

• Anonymizing collected data to protect privacy

– Accountability

• Safely handling the transfer of data across borders

• Providing data breach notifications

(31)

Role of Economists

• Macroeconomists

• Microeconomists

• Econometricians

James Heckman, Daniel McFadden, John Nash, Daniel Kahneman, Robert Lucas, Robert Solow

(32)

Jobs for Economists

• Examples of PhD economists hired by tech companies

Pat Bajari, Hal Varian, Susan Athey, Jonathan Hall

(33)

Role of Economists in Tech Firms

• Empirical Industrial Organization (Athey and Luca, 2019; Shum, 2016)

– Estimate demand function and market power

– Design online advertising strategies and estimate returns to advertising

– Design review and reputation systems and analyze the effect of reviews

– Evaluate acquisitions, exclusive deals and strategy

– Promote incentives in marketplaces

(34)

Demand Function

• Industrial organization

– Focuses more on the supply side (firm side)

– How much market power do firms have?

• Market power

– Markup: $\frac{p - mc}{p}$

– Marginal cost (𝑚𝑐) is unobserved!

• Observation: High price (𝑝) in an industry

– High market power?

– High 𝑚𝑐?

(35)

Demand Function

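• Monopoly

$$\max_{p}\; p\,q(p) - C\big(q(p)\big)$$

• First-order condition

$$q(p) + p\,q'(p) = C'\big(q(p)\big)\,q'(p)$$

• Optimal price

$$\frac{p - mc\big(q(p)\big)}{p} = -\frac{q(p)}{q'(p)}\cdot\frac{1}{p} = -\frac{1}{\varepsilon}, \qquad \varepsilon = \frac{q'(p)\,p}{q(p)} \text{ (price elasticity of demand)}$$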
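The inverse elasticity rule can be checked numerically, and read backwards it addresses the earlier problem that marginal cost is unobserved: an estimated elasticity and an observed price imply $mc = p\,(1 + 1/\varepsilon)$. A sketch assuming a hypothetical linear demand $q(p) = a - bp$ and constant marginal cost $c$ (so $C(q) = cq$); the parameter values are made up.

```python
# Hypothetical demand q(p) = A - B * p and cost C(q) = C_MC * q.
A, B, C_MC = 100.0, 2.0, 10.0


def demand(p):
    return A - B * p


def profit(p):
    """Monopoly profit p * q(p) - C(q(p))."""
    q = demand(p)
    return p * q - C_MC * q


# Maximize profit by grid search over prices 0.00 .. 50.00 (demand reaches 0 at p = 50).
prices = [i / 100 for i in range(5001)]
p_star = max(prices, key=profit)

q_star = demand(p_star)
elasticity = -B * p_star / q_star   # epsilon = q'(p) * p / q(p), with q'(p) = -B
markup = (p_star - C_MC) / p_star   # (p - mc) / p

# Inverse elasticity rule: markup = -1 / epsilon
print(p_star, round(markup, 4), round(-1 / elasticity, 4))

# Read backwards: observed price + estimated elasticity recover the unobserved mc.
implied_mc = p_star * (1 + 1 / elasticity)
print(round(implied_mc, 6))
```

This is why demand estimation matters in industrial organization: the elasticity is what separates "high price because of market power" from "high price because of high marginal cost".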

(36)

Traditional Approach to Demand Estimation

• Consumer demand as a utility maximization problem

$$\max_{x_1, x_2} U(x_1, x_2) \quad \text{s.t.} \quad p_1 x_1 + p_2 x_2 = M$$

– $p_1$ and $p_2$ are the prices of good 1 and good 2

– $M$ is income

• Solution

– $x_1(p_1, p_2, M)$ and $x_2(p_1, p_2, M)$

• Question

– Do we really need to choose a quantity $x_1$ of good 1 and $x_2$ of good 2? → Discrete choice
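For a concrete case of the continuous problem, take a Cobb-Douglas utility $U = x_1^{\alpha} x_2^{1-\alpha}$ (an assumed functional form, not one from the slide), whose demand functions are $x_1 = \alpha M / p_1$ and $x_2 = (1-\alpha) M / p_2$. The sketch compares that closed form with a brute-force search along the budget line; the function names and numbers are hypothetical.

```python
def cobb_douglas_demand(p1, p2, m, alpha=0.5):
    """Closed-form demands maximizing x1**alpha * x2**(1 - alpha) s.t. p1*x1 + p2*x2 = m."""
    return alpha * m / p1, (1 - alpha) * m / p2


def brute_force_x1(p1, p2, m, alpha=0.5, steps=10_000):
    """Search x1 on a grid along the budget line, with x2 = (m - p1*x1) / p2."""
    def utility(x1):
        x2 = (m - p1 * x1) / p2
        return x1 ** alpha * x2 ** (1 - alpha)
    grid = [i * (m / p1) / steps for i in range(1, steps)]  # interior points only
    return max(grid, key=utility)


x1, x2 = cobb_douglas_demand(p1=2.0, p2=4.0, m=100.0)
print(x1, x2, 2.0 * x1 + 4.0 * x2)                       # 25.0 12.5 100.0
print(round(brute_force_x1(p1=2.0, p2=4.0, m=100.0), 2))  # 25.0
```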

(37)

Discrete Choice Modeling

• Product nature/ consumption pattern

– Many alternatives, too many parameters

• Automobiles, airlines, cereals, toothpaste

– Consumers only choose one of the available options (discrete choice)

• Random utility framework (McFadden, 1978, 1981)

$$\max_{j,\,z}\; U_i(x_j, z) \quad \text{subject to} \quad p_j + p_z z = y_i$$

– $x_j$ is the $j$th product, with price $p_j$

– $z$ is the quantity of other goods, with price $p_z$

– $y_i$ is the income of the $i$th individual

(38)

Discrete Choice Modeling

• Indirect utility function

– $U_{ij} = V_{ij}(p_j, p_z, y_i) + \varepsilon_{ij}$

– $V_{ij}(p_j, p_z, y_i)$ is a function that depends on observed variables

– $\varepsilon_{ij}$ is a utility shock known to the consumer but not observed by the researcher

• Consumer preference

– Choose product $j$ when $U_{ij} > U_{ik}$ for all $k \neq j$

• Objective

– Estimate the choice probabilities for all goods
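If the unobserved shocks $\varepsilon_{ij}$ are assumed to be independent draws from a type I extreme value (Gumbel) distribution, the choice probabilities take McFadden's logit form $P_{ij} = e^{V_{ij}} / \sum_k e^{V_{ik}}$. The sketch below computes that formula and checks it against a simulation of utility-maximizing consumers. The Gumbel assumption, the linear form $V_{ij} = \beta_j - \alpha p_j$ and all parameter values are illustrative assumptions, not part of the slide.

```python
import math
import random


def logit_probabilities(v):
    """P_j = exp(V_j) / sum_k exp(V_k); subtracting max(v) avoids overflow."""
    top = max(v)
    weights = [math.exp(vj - top) for vj in v]
    total = sum(weights)
    return [w / total for w in weights]


def gumbel(rng):
    """One draw from the standard type I extreme value (Gumbel) distribution."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, where log is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_shares(v, n_consumers=100_000, seed=0):
    """Each simulated consumer picks the product with the highest U_j = V_j + eps_j."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_consumers):
        utilities = [vj + gumbel(rng) for vj in v]
        counts[utilities.index(max(utilities))] += 1
    return [c / n_consumers for c in counts]


# Hypothetical systematic utility V_j = beta_j - alpha * p_j for three products.
alpha = 0.5
beta = [1.0, 1.0, 0.5]
prices = [2.0, 3.0, 1.0]
v = [b - alpha * p for b, p in zip(beta, prices)]

probs = logit_probabilities(v)
shares = simulate_shares(v)
print([round(p, 3) for p in probs])
print([round(s, 3) for s in shares])
```

In estimation the direction is reversed: observed choices or market shares are used to recover $\beta_j$ and $\alpha$, and $\alpha$ then gives the price elasticities needed for the markup analysis.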

(39)

Economists at Amazon: Making an Impact

(40)

References

Athey S. (2017). Beyond prediction: Using big data for policy problems. Science 355: 483-485.

Athey S. and Luca M. (2019). Economists (and economics) in tech companies. Journal of Economic Perspectives 33(1): 209-230.

Varian H.R. (2014). Big data: New tricks for econometrics. Journal of Economic Perspectives 28(2): 3-28.

Rudebusch G.D. and Williams J.C. (2009). Forecasting recessions: The puzzle of the enduring power of the yield curve. Journal of Business & Economic Statistics 27(4): 492-503.

Chakravorti B. and Chaturvedi R.S. (2017). The “smart society” of the future doesn’t look like science fiction. Harvard Business Review.

Provost F. and Fawcett T. (2013). Data Science for Business. 1st Ed. O’Reilly Media.

Marr B. (2016). Big Data in Practice: How 45 Successful Companies Used Big Data Analytics to Deliver Extraordinary Results. 1st Ed. Wiley.

Shum M. (2016). Econometric Models for Industrial Organization. World Scientific Lecture Notes in Economics Vol. 3. World Scientific.
