The Impact and Challenges of Artificial Intelligence on Science and Technology Development (陳力俊)

(1)

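The Impact and Challenges of Artificial Intelligence on Science and Technology Development

陳力俊

Department of Materials Science and Engineering, National Tsing Hua University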

(2)

Concept Clarification

S&T on AI vs. AI on S&T

S&T vs. Industry

智慧 (wisdom) vs. 智能 (intelligent capability)

(Big, small) data and (big, small) science

AI vs. Automation

(3)

AI in need

Massive, Complex

Efficient search

Machine (deep) learning from data

New Discovery, Hypothesis

(4)

AI in need

Particle physics (1 Higgs boson in 10⁹ events)

Astronomy (2×10¹² galaxies × 10⁸ stars = 2×10²⁰ stars)

Genome (2×10⁹ base pairs in DNA; 20,000–25,000 distinct protein-coding genes)

(5)

AI in need

Brain and neuroscience (10¹¹ neurons × 7×10³ synaptic connections)

Pharmaceutical compounds (p53 protein, 109,052 papers since 1981)

Meteorology (45×10¹⁵ bytes in archive)
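As a quick check of the scales quoted on the "AI in need" slides, the products can be worked out with exact integers. This is only an arithmetic sketch that uses the slides' own round figures; the variable names are illustrative.

```python
# Exact-integer arithmetic on the order-of-magnitude figures quoted on the slides.
galaxies = 2 * 10**12            # galaxies (slide figure)
stars_per_galaxy = 10**8         # stars per galaxy (slide figure)
total_stars = galaxies * stars_per_galaxy

neurons = 10**11                 # neurons in the brain (slide figure)
synapses_per_neuron = 7 * 10**3  # synaptic connections per neuron (slide figure)
total_synapses = neurons * synapses_per_neuron

print(f"stars:    {total_stars:.0e}")
print(f"synapses: {total_synapses:.0e}")
```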

(6)

AI in need

Machine learning (computational statistics, mathematical optimization)

Data mining (unsupervised learning)

Data analytics, informatics

(7)

Nano Structures and Dynamics Lab., MSE, NTHU

Data Explosion: running out of SI metric system

Tera (10¹²), Peta (10¹⁵), Exa (10¹⁸); Zetta (10²¹), Yotta (10²⁴)

All books written: 480 TB

All words spoken: 5 EB

UC Berkeley School of Information (2003)
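To make the prefixes concrete, the two estimates above can be converted to bytes and compared. This is a minimal sketch; the helper `to_bytes` and the single-letter prefix keys are illustrative names, not part of the original slide.

```python
# SI prefixes named on the slide, as powers of ten.
SI_PREFIXES = {"T": 10**12, "P": 10**15, "E": 10**18, "Z": 10**21, "Y": 10**24}

def to_bytes(value, prefix):
    """Convert a value such as 480 TB into bytes."""
    return value * SI_PREFIXES[prefix]

all_books = to_bytes(480, "T")   # all books written: 480 TB
all_speech = to_bytes(5, "E")    # all words spoken: 5 EB

# How many times larger the spoken-word estimate is than the books estimate.
ratio = all_speech / all_books
print(f"books:  {all_books} bytes")
print(f"speech: {all_speech} bytes")
print(f"speech / books = {ratio:.0f}")
```

On these figures the spoken-word estimate is roughly ten thousand times the size of the written-book estimate.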

(8)

Digitization of Everything

Digitization is the work of turning all kinds of information and media (text, sounds, photos, video, data from instruments and sensors, and so on) into the ones and zeroes that are the native language of computers and their kin.

(9)

Nature 528, 18–19 (2015)

(10)

http://hlaoo1980.blogspot.tw/2013/11/higgs-boson-physicists-share-2013.html

(11)

Large Hadron Collider, CERN

(12)

Large Hadron Collider (LHC)

• Compact Muon Solenoid (CMS)

• A Toroidal LHC ApparatuS (ATLAS)

• A Large Ion Collider Experiment (ALICE)

• Large Hadron Collider beauty (LHCb)

(13)

Discovery of Higgs bosons

ATLAS and CMS are designed to discover new particles: several × 10⁸ collisions/s, 10⁻³ events

Upgrade: 20-fold in 2025

1 Higgs boson in 10⁹ events
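The two rates on this slide can be put side by side with a little arithmetic. In the sketch below, 5 × 10⁸ collisions/s is an assumed stand-in for "several × 10⁸", and treating every collision as one event is a simplification made only for illustration.

```python
# ASSUMPTION: 5e8 is an illustrative stand-in for "several x 10^8 collisions/s".
collisions_per_second = 5 * 10**8
# Slide figure: 1 Higgs boson in 10^9 events (each collision counted as one event).
events_per_higgs = 10**9

seconds_per_higgs = events_per_higgs / collisions_per_second
higgs_per_day = 24 * 3600 / seconds_per_higgs

print(f"about one Higgs boson every {seconds_per_higgs:.1f} s")
print(f"about {higgs_per_day:.0f} Higgs bosons per day of continuous running")
```

The point is the ratio rather than the exact numbers: the signal is about nine orders of magnitude rarer than the background, which is why automated event selection is needed.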

(14)

(15)

(16)

Large Hadron Collider (LHC)

• 27 km in circumference

• 7–8 TeV (2010), 13 TeV (2015)

• > 10,000 scientists

• Large Hadron Collider beauty (LHCb)

(17)

Large Hadron Collider (LHC)

• World record holder of computing grid

• Tens of petabytes (10¹⁵ bytes) per year

• Near light speed

• 25 ns pulse interval
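The 25 ns pulse interval translates directly into a repetition rate. A minimal arithmetic sketch, using integers to avoid floating-point rounding (the variable names are illustrative):

```python
NS_PER_SECOND = 10**9
pulse_interval_ns = 25  # slide figure

# Number of pulses per second is the reciprocal of the interval.
pulses_per_second = NS_PER_SECOND // pulse_interval_ns
pulse_rate_mhz = pulses_per_second / 10**6

print(f"{pulses_per_second} pulses per second ({pulse_rate_mhz:.0f} MHz)")
```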

(18)

LHC signal hints at cracks in physics' standard model

LHCb: finds known particles so they can be studied in detail, looking for subtle asymmetries between particles and their antimatter counterparts.

Within two weeks of the energy upgrade, the detector had ‘rediscovered’ a particle called the J/Ψ meson, first found in 1974 by two separate US experiments and later deemed worthy of a Nobel prize.

(19)

(20)

(21)

https://www.theverge.com/2017/12/14/16777394/google-nasa-ai-machine-learning-planets-astronomy

(22)

Kepler Space Telescope: Exoplanet Hunter

https://en.wikipedia.org/wiki/Kepler_(spacecraft)

(23)

(24)

(25)

https://www.space.com/24827-kepler-space-telescope-exoplanet-bonanza-explained-infographic.html

(26)

https://www.space.com/24827-kepler-space-telescope-exoplanet-bonanza-explained-infographic.html

(27)

(28)

(29)

(30)

https://twitter.com/kathykmy/status/842409063912677376

(31)

https://medium.com/data-collective/bringing-zymergen-to-scale-enabling-engineering-biology-with-ai-robotics-d998a2b0cc1

(32)

(33)

DeepVariant

• Finds mutations in genomes

• Converts strands of DNA letters into images that computers can recognize

• The network is trained on DNA snippets that had been aligned with a reference genome, and whose mutations were known
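The idea of turning aligned DNA reads into an image can be illustrated with a toy encoding. The sketch below is not DeepVariant's actual encoding; the grey levels, the two-channel pixel layout, and the example reads are all assumptions chosen for illustration.

```python
# Hypothetical grey-level encoding of bases; 0.0 is reserved for "no read here".
BASE_TO_INTENSITY = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}

def pileup_image(reference, reads):
    """Build one image row per aligned read.

    reads is a list of (start_position, sequence) pairs aligned to reference.
    Each pixel is (base_intensity, mismatch_flag); uncovered pixels are (0.0, 0).
    """
    image = []
    for start, sequence in reads:
        row = [(0.0, 0)] * len(reference)
        for offset, base in enumerate(sequence):
            pos = start + offset
            if 0 <= pos < len(reference):
                mismatch = int(base != reference[pos])
                row[pos] = (BASE_TO_INTENSITY[base], mismatch)
        image.append(row)
    return image

def mismatch_counts(image):
    """Count, per reference position, how many reads disagree with the reference."""
    return [sum(pixel[1] for pixel in column) for column in zip(*image)]

# Toy data: two of the three reads carry a T where the reference has an A (position 4).
reference = "ACGTAC"
reads = [(0, "ACGT"), (2, "GTTC"), (1, "CGTTC")]
image = pileup_image(reference, reads)
counts = mismatch_counts(image)
print(counts)
```

A column where several reads disagree with the reference shows up as a vertical stripe in the image, which is the kind of visual pattern an image-recognition network can be trained to classify.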

(34)

DeepVariant

• Turns high-throughput sequencing readouts into a picture of a full genome

• Automatically identifies small insertion and deletion mutations and single-base-pair mutations in sequencing data

(35)

PrecisionFDA Truth Challenge, a contest run by the FDA to promote more accurate genetic sequencing.

(36)

CARL ZIMMER, JUNE 4, 2014, New York Times

(37)

An AI-Driven Genomics Company Is Turning to Drugs - MIT Technology Review

(38)

(39)

Single-nucleotide polymorphism (SNP) and insertion/deletion (INDEL) calling

Next Generation Sequencing (NGS)

(40)

(41)

(42)

(43)

Scientific Publications

• > 50 M papers in total, with 1 M new papers annually (one every 30 sec)

• > 70,000 papers on p53

• > 240,000 papers mentioning the 500+ known human kinases in their abstracts

• At 10 papers/day: 70 years to go
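The rates on this slide follow from simple arithmetic. The sketch below assumes that the "70 years" figure refers to reading the roughly 240,000 kinase papers at 10 papers per day; that pairing is an interpretation, not something the slide states.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# 1 M new papers per year -> average interval between papers.
papers_per_year = 1_000_000
seconds_per_paper = SECONDS_PER_YEAR / papers_per_year

# ASSUMPTION: the reading-time estimate applies to the ~240,000 kinase papers.
kinase_papers = 240_000
papers_read_per_day = 10
years_to_read = kinase_papers / papers_read_per_day / 365

print(f"one new paper every {seconds_per_paper:.1f} s")
print(f"{years_to_read:.1f} years to read the kinase literature")
```

Both results are consistent with the rounded figures on the slide (about 30 seconds and about 70 years).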

(44)

(45)

Knowledge Integration Toolkit (KnIT)

• mines the information contained in the scientific literature

• represents it explicitly in a queryable network

• reasons upon these data to generate novel and experimentally testable hypotheses

(46)

Knowledge Integration Toolkit (KnIT)

• entity detection

• neighbor-text feature analysis

• graph-based diffusion of information to identify potential new properties of entities strongly implied by existing relationships (new protein kinases that phosphorylate the protein tumor suppressor p53)
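Graph-based diffusion of information can be illustrated on a toy network. The sketch below is a generic label-propagation scheme, not KnIT's published algorithm; the kinase names K1 to K5, the edge weights (standing in for text similarity), and the parameters `alpha` and `steps` are all hypothetical.

```python
def diffuse(graph, seeds, alpha=0.5, steps=20):
    """Spread a 'known p53 kinase' score from seed nodes over a weighted graph.

    graph maps node -> {neighbor: similarity_weight}. At every step each node
    keeps (1 - alpha) of its seed label and takes alpha of the weighted
    average score of its neighbors.
    """
    scores = {node: (1.0 if node in seeds else 0.0) for node in graph}
    for _ in range(steps):
        updated = {}
        for node, neighbors in graph.items():
            total_weight = sum(neighbors.values())
            spread = 0.0
            if total_weight > 0:
                spread = sum(w * scores[nb] for nb, w in neighbors.items()) / total_weight
            seed_label = 1.0 if node in seeds else 0.0
            updated[node] = (1 - alpha) * seed_label + alpha * spread
        scores = updated
    return scores

# Hypothetical similarity network between five kinases (symmetric weights).
graph = {
    "K1": {"K2": 0.9, "K3": 0.1},
    "K2": {"K1": 0.9, "K4": 0.8},
    "K3": {"K1": 0.1, "K5": 0.2},
    "K4": {"K2": 0.8},
    "K5": {"K3": 0.2},
}
known_p53_kinases = {"K1", "K2"}  # hypothetical seed set

scores = diffuse(graph, known_p53_kinases)
candidates = [n for n in graph if n not in known_p53_kinases]
ranked = sorted(candidates, key=lambda n: scores[n], reverse=True)
print(ranked)
```

Candidates that sit close to known p53 kinases in the similarity network end up with the highest scores, which makes them the first hypotheses to test in the laboratory.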

(47)

Knowledge Integration Toolkit (KnIT)

• Retrospective analysis

• Verification by laboratory experiments

• Establishes proof of principle for automated hypothesis generation and discovery based on text mining of the scientific literature

(48)

https://scholar.harvard.edu/files/alacoste/files/p1877-spangler.pdf

(49)

https://scholar.harvard.edu/files/alacoste/files

(50)

https://scholar.harvard.edu/files/alacoste/files/p1877-spangler.pdf

(51)

(52)

Materials Science & Engineering

Bandwagon

Materials Genome Initiative

Local Efforts

Future Outlook

(53)


Bandwagons in MSE

Laser annealing

High Tc Superconductors

Carbon Nanotubes (174,473)

Nanomaterials

Graphene (134,811)

2D Semiconductors

(54)

(55)

https://phys.org/news/2016-08-unraveling-crystal-high-temperature-superconductor.html

(56)

https://www.slideshare.net/SREESANGH/carbon-nanotubes-sreesangh-p-ghosh

(57)

https://www.researchgate.net/figure/TEM-micrographs-of-carbon-nanotubes-consisting-a-five-b-two-c-seven-graphitic_fig1_221914388

(58)

HRTEM image ACF processed images

Amorphous -Ge (20 nm), as-deposited

(59)

https://www.youtube.com/watch?v=xJ32NfAyjpY

(60)

(61)

(62)

(63)

(64)

(65)

(66)

(67)

(68)

http://www.sim-flanders.be/event/round-table-data-mining-materials

(69)


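Materials Digital Technology (MGI+AI)

Combining computer simulation and AI machine learning to accelerate industrial innovation and R&D

張志祥/張哲銘

Material and Chemical Research Laboratories, Industrial Technology Research Institute (ITRI)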

(70)

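Case studies of leading companies in materials digital technology

MGI/AI development cases

Ford Motor (USA)
• Integrated Computational Materials Engineering (ICME) virtual aluminum casting method
• Developed lightweight engine alloy materials
• Schedule shortened by 25%, costs reduced by US$100 million

QuesTek (USA)
• Multiscale simulation-based design, MGI R&D model
• Developed Ferrium C61 high-strength alloy for gears
• UTS increased by 39%, weight reduced by >20%

General Electric (USA)
• Computational thermodynamics plus material property models and databases
• Developed GTD262 high-temperature alloy for turbines
• Phase stability >1000 °C, R&D spending reduced by 80%

Boeing (USA)
• Multiscale simulation-based design, MGI R&D model
• Time to commercialize new materials cut in half (12 years → 6 years)

Samsung (South Korea)
• First-principles calculation of material properties (MGI)
• Neural-network machine learning (AI)
• Developed electrolytes for high-voltage lithium-ion batteries
• Cycle efficiency raised from 64.3% to 80.8%

Toyota Central R&D Labs (Japan)
• Announced in 2017 an investment of US$35 million (NT$1.06 billion) to use AI to search for battery materials and catalysts for electric and hydrogen-powered vehicles
• Artificial intelligence, genetic algorithms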

(71)

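Boeing sharply shortens its R&D timeline through MGI

• As aircraft complexity increases, Boeing faces rising R&D costs and delivery lead times. The 767, 777, and 787 took 4, 5, and 7 years, respectively, from order to delivery. The main bottleneck is the sharply increasing cost and time of materials property evaluation, design-value development, and component testing.

• Adopting the MGI R&D model shortened the path from research to commercialization of new materials from more than 12 years to 6 years. Boeing built multiscale simulation-based design capability "from atoms to aircraft", paired with experimental testing and validation, to sustain its competitive advantage.

Launch order: 1978; delivered: 1982

Order to delivery: 4 years, 5 years, 7 years

• Reduced time and cost
• Increased design space
• Increased performance

Computation from atoms to aircraft: linking modeling & simulation with experiments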

(72)

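Samsung Electronics (South Korea): adopting MGI/AI to develop lithium-ion battery electrolytes

• Samsung Electronics adopted an MGI/AI approach to develop electrolytes for next-generation high-voltage lithium-ion batteries. High-throughput computation over nearly one million data entries screened out a few dozen promising molecules, greatly reducing materials development time and cost.

• Through rapid screening and the selection of suitable additives, cycle efficiency rose from 64.3% to 80.8%.

• Samsung Electronics also applied machine-learning (ML) data analysis and prediction, reaching ~80% accuracy.

Screening funnel (high-throughput computation, key-parameter screening) for anode-type high-voltage electrolytes:

Original database of organic molecules: ~1,000,000 entries
Step I: 40,148 entries
Step II: 8,733 entries
Step III: 315 entries
Final step: ~20 entries, followed by experimental validation

Screening criteria: redox potential, lithium-salt solubility, thermodynamic stability, chemical reactivity

Example molecule: Quinoxaline

Cycle efficiency improvement: 64.3% → 80.8%

Key issues (electrolyte)
• Low operating voltage → ~4 V
• High electrochemical activity → SEI formation
• Low stability → safety
• …

Key parameter: electrochemical window (redox potential difference)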
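The selectivity of each screening step can be read off the funnel counts. The sketch below only does arithmetic on the slide's figures; the first and last counts are approximate on the slide ("~"), so the rates are indicative, and the function name `pass_rates` is illustrative.

```python
# Funnel counts from the slide (first and last values are approximate).
funnel = [
    ("original database", 1_000_000),
    ("Step I", 40_148),
    ("Step II", 8_733),
    ("Step III", 315),
    ("Final step", 20),
]

def pass_rates(stages):
    """Fraction of candidates that survive each step, relative to the previous one."""
    return [
        (name, count / previous_count)
        for (_, previous_count), (name, count) in zip(stages, stages[1:])
    ]

rates = pass_rates(funnel)
overall = funnel[-1][1] / funnel[0][1]

for name, rate in rates:
    print(f"{name}: {rate:.2%} of the previous stage")
print(f"overall: {overall:.4%} of the original database")
```

Only about 2 in 100,000 molecules reach experimental validation, which is where the savings in laboratory time come from.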

(73)

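Comparison of machine-learning application domains

Materials prediction / process optimization
• Goals: material property prediction; process parameter analysis
• Data acquisition: data can be obtained from experiments, simulation, and characterization; small data volume (10¹ to 10³)
• Feature selection: features are selected accurately using expert (domain) knowledge; highly relevant features reduce the amount of data required

E-commerce / finance
• Goals: analysis of user habits; market forecasting
• Data acquisition: user information is collected through websites; data accumulate continuously; large data volume (10⁶ to 10⁸)
• Feature selection: features are noisy and hard to select accurately; deep learning is typically used to find suitable features automatically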

(74)

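Application directions for AI machine learning in the materials industry

Rapid design of new materials
• Genetic-algorithm-assisted molecular design
• Coupling with first-principles materials calculations
• Accelerated construction of material property databases

Rapid prediction and analysis of device characteristics
• Device reliability analysis
• Device lifetime prediction
• Product property prediction
• Virtual metrology

Rapid interpretation of images and spectra
• Spectral similarity score (SS) calculation
• Rapid spectrum interpretation
• MS/UV/FTIR/NMR
• Defect image recognition

Rapid optimization of material formulations and processes
• Prediction of formulation ratios
• Rapid optimization of process parameters
• Global optimization of process parameters
• Ranking of factor weights
• IoT production data analysis

Text mining for materials
• Materials science literature search
• Collection and extraction of data from the literature
• Large-scale patent reading
• Collection of industry information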

(75)

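Machine learning accelerates the construction of a new-materials property database: binary ceramics as an example

• Building a property database for binary ceramic materials (about 500+ combinations in total) requires first-principles theoretical calculations.
• A Young's modulus prediction model was built from 128 sets of theoretical data combined with the random forest (RF) method.
• With this prediction model (accuracy R² > 0.90), the properties of the remaining compositions can be obtained within 1 day.
• Relying entirely on first-principles calculations would take at least half a year.
• When data are scarce, the accuracy of the domain knowledge brought in becomes even more important.

Workflow: obtain the material crystal structure → optimize the crystal structure (to obtain input features) → predict Young's modulus and thermal expansion coefficient → rapidly build the database

The properties of the remaining 400 ceramic materials can be obtained within 1 day; once the prediction model reaches R² > 0.90, database construction can be accelerated.

Key factors influencing Young's modulus
• Formation energy
• Cation radius
• Cation valence
• Electronegativity

Validation of the Young's modulus prediction: R² = 0.940

Building the learning model: keys to improving accuracy
1. Materials domain knowledge
2. Crystal structure optimization
3. Understanding the characteristics of the data
4. Selecting a suitable algorithm
5. Optimizing algorithm parameters (grid search)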
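Two ingredients named on this slide, the R² accuracy score and parameter optimization by grid search, can be sketched in a few lines of standard-library Python. To keep the example self-contained, a simple k-nearest-neighbour regressor stands in for the random forest, and the data are synthetic; neither reflects the actual ITRI model or data set. The function names and the parameter grid are illustrative.

```python
import itertools

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def knn_predict(train, query, k, power):
    """Inverse-distance-weighted k-nearest-neighbour regression on one feature."""
    nearest = sorted(train, key=lambda xy: abs(xy[0] - query))[:k]
    weights = [1.0 / (abs(x - query) ** power + 1e-9) for x, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def leave_one_out_r2(data, k, power):
    """Predict every sample from all the others and score the predictions."""
    predictions = [
        knn_predict(data[:i] + data[i + 1:], x, k, power)
        for i, (x, _) in enumerate(data)
    ]
    return r_squared([y for _, y in data], predictions)

def grid_search(data, grid):
    """Try every parameter combination and keep the best-scoring one."""
    best_score, best_params = None, None
    for k, power in itertools.product(grid["k"], grid["power"]):
        score = leave_one_out_r2(data, k, power)
        if best_score is None or score > best_score:
            best_score, best_params = score, {"k": k, "power": power}
    return best_score, best_params

# Synthetic stand-in data: one feature x and a property y = 3x + 10.
data = [(float(x), 3.0 * x + 10.0) for x in range(10)]
best_score, best_params = grid_search(data, {"k": [1, 2, 3], "power": [1, 2]})
print(best_params, round(best_score, 3))
```

With only a handful of samples, leave-one-out scoring is a natural choice because it uses every data point for both fitting and validation.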

(76)

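Machine-learning case: process parameter design, with the chemical resistance of ceramic materials as an example

• Taiwan's plumbing hardware industry cluster is located mainly in Changhua and Miaoli, with an industry value of NT$60 billion, of which ceramic valve cartridges account for about NT$8 billion.
• Without tools for formulation design and process-condition optimization, manufacturers cannot meet the chemical-resistance specifications required of industrial-grade valve discs.
• Machine learning on small-sample experimental data (25 sets) was used to build prediction models for acid-corrosion and alkali-corrosion weight loss, greatly reducing the number of follow-up experiments and shortening product development time by ≥50%.
• The prediction model helps domestic manufacturers move their products quickly from consumer grade to industrial grade, raising product value 20-fold.

Parameters affecting the chemical resistance of ceramic materials (input features)
• Alumina ratio
• Flux formulation
• Flux ratio
• Powder granulation
• Dry-pressing pressure
• Sintering temperature
• …

Prediction accuracy (R²) by number of experimental results used for learning and validation
• 12 sets: acid-corrosion weight loss (%) 0.89; alkali-corrosion weight loss (%) 0.50
• 18 sets: acid-corrosion weight loss (%) rises to 0.97; alkali-corrosion weight loss (%) rises to 0.95
• 25 sets: acid-corrosion weight loss (%) 0.95; alkali-corrosion weight loss (%) 0.93

Workflow: manufacturer specifies the chemical-resistance requirement → rapid prediction of material ratios and process conditions → experimental verification and fine-tuning. This helps plumbing-hardware valve-disc manufacturers quickly obtain industrial-grade chemically resistant products and raises output value 20-fold.

Once the prediction model is built, it can quickly assist manufacturers with formulation and process parameter design.

Narrow formulation range + expert knowledge + random forest + outside testing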
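"Outside testing" means scoring the model on experiments it never saw during training. The sketch below shows the idea with a hold-out split on 25 synthetic samples. A straight-line least-squares fit stands in for the random forest, and the sintering temperatures and weight-loss values are made up for illustration; they are not the ITRI data.

```python
import random

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def outside_test(points, n_test, seed=0):
    """Hold out n_test samples, fit on the rest, and return R² on the held-out set."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    test, train = shuffled[:n_test], shuffled[n_test:]
    slope, intercept = fit_line(train)
    predictions = [slope * x + intercept for x, _ in test]
    return r_squared([y for _, y in test], predictions)

# Synthetic stand-in for 25 experiments: weight loss falls as temperature rises.
points = []
for i in range(25):
    temperature = 1400 + 10 * i                  # hypothetical sintering temperature, °C
    noise = 0.002 if i % 2 == 0 else -0.002      # small deterministic scatter
    weight_loss = 5.0 - 0.002 * (temperature - 1400) + noise
    points.append((temperature, weight_loss))

score = outside_test(points, n_test=5)
print(f"outside-testing R² = {score:.3f}")
```

Reporting R² on held-out experiments, rather than on the training data, guards against overestimating accuracy when the data set is this small.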

(77)

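Machine-learning case: predicting formulation properties, with the thermal conductivity of resin formulations as an example

• High-thermal-conductivity resin formulations (epoxy resin, amine hardener) are mostly developed by experimental trial and error, which is time-consuming and laborious.
• A random forest (RF) model for predicting the thermal conductivity of resin formulations was built from 53 existing formulations.
• In the future, new structures can be designed through materials simulation and combined with the prediction model to predict the thermal conductivity of a formulation quickly, greatly reducing experimental cost and development time.
• Outside-testing R²: 0.88

Pure-material properties (input features), for both the hardener and the epoxy resin
• Specific heat capacity
• Density
• Thermal conductivity
• HOMO/LUMO

Workflow: design of new material structures → simulation of material properties → prediction of formulation thermal conductivity (hard to obtain from materials simulation) → experimental validation

Once the prediction model is built, formulation development is accelerated.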

(78)

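Machine-learning case: product property prediction, with rapid prediction for heterogeneous multilayer PCB structures as an example

• Taiwan's PCB output reached NT$565.6 billion in 2016. If a PCB deforms or warps after lamination, plating, and etching once the circuit's electrical design is complete, later process steps such as drill alignment and machine loading and unloading become difficult. The conventional mechanical simulation workflow takes too long, so the factory cannot assess in real time the deformation caused by the material properties of each PCB layer.
• We generated 800 data sets by mechanical simulation (FEA) and combined them with an artificial neural network (ANN) to build a customized rapid-prediction (10 to 30 min) model and tool. It helps the factory estimate thermal warpage before the process starts and decide whether to adjust the materials or the copper plating pattern layout.

Thermal warpage of multilayer PCBs: caused by CTE differences between materials and copper-coverage differences between layers

✗ In practice: the relationship between process and warpage cannot be predicted
• FEA simulation: takes months
• Customized AI tool: takes only minutes to hours

Parameters of the heterogeneous multilayer structure
• Average copper coverage of each layer
• Material composition: glass cloth, plated copper, resin, build-up layer, solder mask

Process flow: inner-layer board → build-up process (repeated): lamination, developing, etching, lamination, developing, etching / laser drilling, plating, solder mask

Customized prediction tool
• Real-time prediction of thermal warpage
• Suggested solution: adjust the copper coverage of each layer to reduce thermal warpage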

(79)

(80)

(81)

(82)

(83)

(84)

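FY 107 (2018), Department of Engineering and Technologies, Ministry of Science and Technology

"Smart Biomimetic Materials and Digital Design Platform"

Special research program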

(85)

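5. Implementation planning and promotion approach

• Starting from biomimetic materials research, bring in the digital genome platform and apply it to development in key industries: (1) energy-saving / energy-storage / environmental application technologies; (2) function-assisting and medical device applications (biotechnology and pharmaceuticals); (3) key optoelectronic / communication component technologies; (4) lightweight, high-strength structural components

B-1. Integrated application development of biomimetic materials and digital genome technology

B-2. Genome engineering, advanced process technology, and analysis and characterization technology for biomimetic materials

• Starting from biomimetic materials design and processing, bring in the simulation, computation, and AI capabilities of the materials digital genome platform: build a biomimetic materials database and provide predictive capability for materials selection, design, and analysis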

(86)

Future Outlook

Ubiquitous: literature search, publication and grant review

Opportunity in research and applications: new queries

AI Education

Concerted efforts

(87)

Epistemology

Polanyi’s Paradox: ‘We can know more than we can tell.’

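Laozi (老子): "The Way that can be spoken of is not the constant Way; the name that can be named is not the constant name." (「道可道，非常道；名可名，非常名。」)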
