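National Science Council (Executive Yuan) Research Project Final Report
Research on Silicon Intellectual Property (IP) for System-on-Chip Thermal Management Architecture Design
Project type: Individual project
Project number: NSC92-2215-E-009-042-
Project period: January 1, 2003 to July 31, 2003 (ROC year 92)
Executing institution: Department of Communication Engineering, National Chiao Tung University
Principal investigator: 闕河鳴 (Herming Chiueh)
Project participants: 張祐誠, 何錫錡, 王志軒, 張智閔, 劉明治, 詹謹鴻, 吳書豪, 林璟輝
Report type: Concise report
Disposition: This project report is open to public inquiry
Republic of China, October 24, 2003 (ROC year 92)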
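Research on Silicon Intellectual Property (IP) for System-on-Chip Thermal Management Architecture Design
Project number: 92-2215-E-009-042-
Project period: January 1, 2003 to July 31, 2003
Principal investigator: 闕河鳴 (Herming Chiueh), Department of Communication Engineering, National Chiao Tung University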
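I. Abstract (Chinese)
With the development of very-large-scale integration (VLSI), circuit density and system clock rates have steadily increased, giving rise to local overheating. Local overheating causes localized variations on the chip, such as clock desynchronization and circuit parameter mismatch, which can in turn crash the entire system. For modern microprocessors, system-on-chip (SoC) designs, and mixed-signal integrated circuits, this effect has become a major constraint on system design. Thermal-effect analysis and thermal management mechanisms for SoCs have therefore become a research topic well worth pursuing in the SoC era. For these reasons, this project develops key components and circuit designs for SoC thermal management, aiming to provide a complete thermal management system for today's SoC design platforms. In this year of the project we completed the thermal management architecture, the SoC interface definition and design, and the temperature sensor design. These designs are being fabricated in succession through the Chip Implementation Center (CIC), and some of the results have been published in international journals and conferences.
Keywords: system-on-chip design, thermal effects, thermal management, integrated circuit design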
Abstract
Increases in circuit density and clock speed in modern VLSI designs have brought thermal
issues into the spotlight of high-speed integrated circuit design. Local overheating in one spot
of a high-density circuit, such as a CPU or high-speed mixed-signal circuit, can cause a whole
system to crash. Clock synchronization problems, parameter mismatching and other
coefficient changes due to temperature gradients generated by uneven heat-up of on-chip
circuitry are the major reasons for system failure. The early stage of this project completely
characterized the local heat-up problem in system-on-chip designs. The impact of temperature
gradients on circuit behavior is evaluated. A systematic solution to thermal management is
proposed. Instead of worst-case thermal management used in conventional systems, this
design targets nominal power dissipation and requires the system to actively manage its
thermal activity, including monitoring thermal activity and reacting to specified conditions
through the control of cooling mechanisms to ensure operation within specification. This work
includes the design and implementation of the circuits and architectures of several building blocks
for SoC thermal management, namely the thermal management architecture, a system management
bus interface for SoC platform design, and an on-chip temperature sensor for deep-submicron
technology. A thermal management intellectual property (IP) block is proposed and integrated
into modern SoC CAD flows. The success of this project offers an opportunity for modern
system-on-chip designs to incorporate thermal management techniques to enhance system
stability and performance. This design yields intricate control and optimal management with
little system overhead and minimal hardware requirements, and it provides the flexibility to
support different management algorithms.
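II. Background and Objectives
With the development of VLSI, circuit density and system clock rates have steadily increased, giving rise to local overheating. This causes localized variations on the chip, such as clock desynchronization and circuit parameter mismatch, which can in turn crash the entire system. For modern microprocessors, SoC designs, and mixed-signal integrated circuits, this effect has become a major constraint on system design. Thermal-effect analysis and thermal management mechanisms for SoCs have therefore become a research topic well worth pursuing in the SoC era.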
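In the early stage of this research, we carried out a complete analysis of on-chip thermal effects and their impact on circuit parameters, then proposed a systematic thermal management methodology for the SoC design flow and realized it as a subsystem architecture on the chip. Next, this research integrates the result with the interfaces of current computer systems and SoC designs (such as the system management bus, SMBus), turning it into a prototype IP. Designers of SoCs and computer systems can then easily integrate this IP into their final designs within their own design flows.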
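This design focuses on using the fewest possible system resources (circuit complexity, layout area, I/O pin requirements, and added power consumption) to detect system temperature and temperature differences in real time, and to respond effectively to real-time thermal events (local overheating or an excessive local temperature difference). Unlike conventional minimal thermal management (an emergency shutdown to protect the system), this research enables SoC designers to manage system temperature and temperature differences effectively with minimal system resources, thereby improving system stability and performance.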
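The project is carried out in two major parts. The first part is the design and implementation of the thermal management architecture and the SoC interface: based on the analysis above, we completed the thermal management architecture design, integrated it with the selected system management bus into a soft IP, and implemented this soft IP on its own test chip to verify its functionality. The second part is a temperature sensor for deep-submicron processes. A complete thermal management system needs on-chip temperature sensors to operate effectively, and temperature sensor circuits are difficult to realize in processes below 0.6 µm; this project therefore designs temperature sensor circuits for a 0.25 µm process as a hard IP. By combining the results of the first and second parts, this project can provide a complete thermal management system for today's SoC designs.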
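Section III of this report discusses the research methods and results, and Section IV presents the conclusions and discussion. The appendix contains the journal paper this project published on the thermal management system; the implementations are being taped out one after another, and the results will be compiled and published in other international journals or conferences.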
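III. Research Methods and Results
As described in Section II, the research methods and results are divided into two parts: the thermal management system with the system management bus, and the temperature sensor. This section introduces each topic in turn and closes with the integrated circuit designs of both parts.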
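(1) Thermal Management System
The thermal management system of this design is shown in Fig. 1, and its architecture is described in the published journal paper in the appendix. The main effort of this project went into improving the compatibility of the thermal management system through a standard interface, the System Management Bus (SMBus), which is widely used today in system, power, and thermal management devices. SMBus consists of two signals, the unidirectional SMBCLK and the bidirectional SMBDATA (Fig. 2), over which data can be both read and written; this reduces the number of external pins and the internal connections of the thermal management system. The SMBus master and slave devices were built according to the state diagrams in Fig. 3 and Fig. 4. To keep the thermal management system itself from overheating and being damaged by a rapid temperature rise, a simple and forward-looking design was adopted, and the power distribution was kept uniform so that no local hot spots occur.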
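The two-wire framing that the SMBus master and slave state machines implement can be illustrated by the bit sequence of a single transaction. The Python sketch below lists the bus events of an SMBus Write Byte transaction as described in the SMBus specification (START, 7-bit slave address with the write bit, command byte, data byte, STOP, with an acknowledge from the slave after each byte). It is a behavioral illustration only, not the master/slave state diagrams of this design; the function names and the example address, command, and data values are hypothetical.

```python
from typing import List


def byte_bits(value: int) -> List[int]:
    """Return the 8 bits of a byte, MSB first (SMBus/I2C bit order)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("byte out of range")
    return [(value >> i) & 1 for i in range(7, -1, -1)]


def write_byte_frame(addr7: int, command: int, data: int) -> List[str]:
    """Symbolic event list for an SMBus Write Byte transaction.

    Every byte is followed by an ACK slot driven by the slave.
    """
    if not 0 <= addr7 <= 0x7F:
        raise ValueError("SMBus slave addresses are 7 bits")
    events = ["START"]
    # Address phase: 7-bit address shifted left, R/W bit = 0 (write).
    events += [str(b) for b in byte_bits((addr7 << 1) | 0)]
    events.append("ACK")
    # Command (register pointer) byte, then the data byte.
    for byte in (command, data):
        events += [str(b) for b in byte_bits(byte)]
        events.append("ACK")
    events.append("STOP")
    return events


if __name__ == "__main__":
    # Hypothetical slave address, command code, and data value.
    print(" ".join(write_byte_frame(addr7=0x4C, command=0x01, data=0x55)))
```

For address 0x4C the first transmitted byte is 0x98 (the address shifted left with R/W = 0); a Read Byte transaction would instead insert a repeated START and a second address byte with R/W = 1 after the command byte.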
Fig. 1. Architecture of the SoC thermal management system (blocks: Host Interface; Programmable Unit with Threshold Registers (0~7), Offset Threshold Registers (0~1), and Configuration Registers (0~2); Watchdog Unit with Threshold Temperature Monitor and Offset Temperature Monitor; Temperature Acquire Unit with Temperature Sensor Interfaces (0) to (3); Power Control/Active Cooling Unit with Multi-level Controllers (0) to (3) and Power Controller driving fans; Output and Interrupt Generator; Processor)
Fig. 3. SMBus master device state diagram
Fig. 4. SMBus slave device state diagram
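Based on the above, this standard interface was integrated into the thermal management system through a cell-based design flow using TSMC 0.25 µm CMOS technology. Fig. 5 shows the module partitioning of the system, and Table 1 lists the number of lines of code in each module. The design uses four multi-level controllers, one TMU, and one SMBus block containing one slave device and one master device to form the complete thermal management system (Fig. 6). After full verification and simulation, the design became a soft IP that can easily be integrated with other systems and achieves the expected functionality.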
Module partitioning (hierarchy levels, top to bottom): top.v; TM_SMBslave.v and SMBmaster.v; mlc.v, tmu.v, and SMBslave.v
Table 1. Lines of code per module
Module name      Instances  Lines of code  Function
top.v            1          55             System integration
TM_SMBslave.v    1          70             SMBus slave and TMU integration
SMBmaster.v      1          262            SMBus master device
SMBslave.v       1          208            SMBus slave device
tmu.v            1          470            Thermal Management Unit (TMU)
mlc.v            4          20             Multi-level control
Fig. 6. System integration of the thermal management system (top-level interconnection of tmu, SMBslave, SMBmaster, and four multi-level controllers (MLC); signals include SENSOR0[7:0] to SENSOR3[7:0], FAN0[7:0] to FAN3[7:0], SMBCLK, SMBDATIN, and SMBDATOUT with their master-side *_M counterparts, INTR, INTR_OFFS, CLK, and RESET)
(2) Temperature Sensor
This part consists of three components, a BiCMOS PTAT (proportional to absolute temperature) circuit, a MOS PTAT circuit, and an oversampling sigma-delta modulator, described below.
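a. BiCMOS PTAT (Proportional to Absolute Temperature)
Fig. 7. BiCMOS PTAT schematic
Fig. 8. BiCMOS PTAT with analog output interface
Fig. 7 shows a PTAT architecture. Transistors M3 to M9 form an operational amplifier; assuming this op amp is ideal, VD2 = VD1, so the current through Q2 is proportional to absolute temperature. In the output stage of Fig. 8, M47 and M2 form a current mirror that copies the PTAT current to the output for measurement. In addition, to provide a reference current for the following oversampling sigma-delta modulator, we combine the negative temperature coefficient of VEB with the positive temperature coefficient of the PTAT current, sizing M43 and M44 so that the output current has a zero temperature coefficient.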
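The first-order principle behind the PTAT current and the zero-temperature-coefficient reference can be checked numerically. The sketch assumes the classic arrangement in which the op amp forces the two bipolar branches to equal voltages, so that ΔVBE = (kT/q) ln N appears across a resistor, and a linearized VBE with a slope of about -2 mV/K. The density ratio N, both resistor values, and VBE at 300 K are illustrative assumptions, not values from this design, and the M43/M44 sizing is represented only by a scalar weight alpha.

```python
import math

K_OVER_Q = 1.380649e-23 / 1.602176634e-19   # Boltzmann constant / electron charge, V/K


def ptat_current(temp_k: float, density_ratio: float, r_ohm: float) -> float:
    """PTAT current I = (kT/q) * ln(N) / R (Delta-VBE forced across R)."""
    return K_OVER_Q * temp_k * math.log(density_ratio) / r_ohm


def vbe(temp_k: float, vbe_300k: float = 0.65, tc: float = -2e-3) -> float:
    """Linearized VBE with an assumed slope of about -2 mV/K (illustrative)."""
    return vbe_300k + tc * (temp_k - 300.0)


def zero_tc_weight(density_ratio: float, r_ptat: float, r_vbe: float,
                   tc: float = -2e-3) -> float:
    """Weight alpha such that alpha*I_PTAT + VBE/r_vbe has zero slope in T."""
    ptat_slope = K_OVER_Q * math.log(density_ratio) / r_ptat   # A/K, positive
    ctat_slope = tc / r_vbe                                     # A/K, negative
    return -ctat_slope / ptat_slope


def reference_current(temp_k: float, alpha: float, density_ratio: float,
                      r_ptat: float, r_vbe: float) -> float:
    """Weighted sum of PTAT and CTAT currents (stand-in for the M43/M44 weighting)."""
    return alpha * ptat_current(temp_k, density_ratio, r_ptat) + vbe(temp_k) / r_vbe


if __name__ == "__main__":
    N, R_PTAT, R_VBE = 8.0, 10e3, 50e3          # hypothetical design values
    alpha = zero_tc_weight(N, R_PTAT, R_VBE)
    for t in (273.15, 300.0, 373.15, 423.15):   # 0 C to 150 C
        print(f"{t - 273.15:6.1f} C  I_PTAT = {ptat_current(t, N, R_PTAT) * 1e6:6.3f} uA"
              f"  I_REF = {reference_current(t, alpha, N, R_PTAT, R_VBE) * 1e6:6.3f} uA")
```

Because both terms are linear in T in this idealized model, choosing alpha to cancel the slopes makes the reference exactly flat; in silicon, the curvature of VBE leaves a small residual variation.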
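Fig. 9. PTAT current output
Fig. 10. Temperature error measured from the PTAT current
Fig. 9 shows that the PTAT current is highly linear with temperature, and Fig. 10 shows that the temperature error measured from the PTAT current stays within 1 °C. Fig. 11 shows that the reference current varies only from 24.798 µA to 24.828 µA, a spread of 0.03 µA (about 0.12%).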
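b. MOS PTAT
Fig. 12. MOS PTAT (I) schematic
Fig. 13. MOS PTAT (II) schematic
Fig. 12 shows a MOS PTAT architecture. M1 and M2 operate in the weak-inversion region, where their I-V characteristics are exponential; M3 to M8 form an ORA (operational transresistance amplifier) that fixes the ratio of the currents flowing through M1 and M2. As a result, the voltage across the resistor is proportional to absolute temperature. Fig. 13 replaces the resistor of Fig. 12 with M9 to M11, which greatly reduces the chip area.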
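The temperature dependence follows from the weak-inversion (subthreshold) drain-current law. The derivation below is the standard first-order textbook result, assuming that M1 and M2 share the same threshold voltage, slope factor n, and process constant I_0, operate with V_DS above a few V_T, and that the resistor carries the difference of their gate-source voltages; these are assumptions about the topology of Fig. 12, not details stated in this report.

```latex
I_D \approx I_0 \,\frac{W}{L}\, \exp\!\left(\frac{V_{GS} - V_{TH}}{n V_T}\right),
\qquad V_T = \frac{kT}{q}

\Delta V_{GS} = V_{GS1} - V_{GS2}
  = n \,\frac{kT}{q}\, \ln\!\left(\frac{I_1\,(W/L)_2}{I_2\,(W/L)_1}\right)

V_{PTAT} = \Delta V_{GS} \propto T \quad \text{when } I_1/I_2 \text{ is held fixed}
```

Holding the current ratio constant, which is the role the amplifier plays here, leaves T as the only variable in the logarithm's prefactor, so the resistor voltage is proportional to absolute temperature.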
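Fig. 14. VPTAT vs. VDD (R-based)
Fig. 15. VPTAT vs. VDD (all-MOS)
Figs. 14 and 15 show the simulated VPTAT versus VDD for the circuits of Fig. 12 and Fig. 13, respectively. The operating voltage can be reduced to 1.2 V, and the PSRR (power supply rejection ratio) exceeds 50 dB.
Fig. 16 shows the simulated VPTAT versus temperature for the circuits of Fig. 12 and Fig. 13. The resistor-based and all-MOS architectures give nearly identical results, and both produce a voltage proportional to temperature.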
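c. Oversampling Sigma-Delta Modulator
The oversampling sigma-delta architecture starts from an ADC of inherently low resolution. Through a closed loop and a sampling frequency far higher than twice the input bandwidth, the noise within the signal band is suppressed, raising the signal-to-noise ratio (SNR) enough to meet the specifications of a high-resolution ADC. Because this architecture is far less sensitive to circuit and process variations than other architectures, it is widely used in low-frequency, high-resolution ADCs. A modulator used in a temperature sensor must operate correctly over a very wide temperature range and must also dissipate as little power as possible so that it does not become a heat source itself. In general, 8 to 10 bits of resolution at an operating speed below 1 MHz is sufficient, over a temperature range of roughly 0 °C to 150 °C. To meet these conditions, first-order sigma-delta modulation is one feasible approach; its architecture is shown in Fig. 17, and the modified circuit is shown in Fig. 18.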
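The feedback loop of Fig. 17 can be illustrated with a behavioral model. The Python sketch below is an idealized discrete-time first-order modulator with a 1-bit quantizer (the comparator) and a ±1 feedback DAC, driven by a DC input as a stand-in for a slowly varying temperature signal. The simple averaging decimator is an assumption for illustration, and the model ignores op-amp gain, offset, and the modifications of Fig. 18.

```python
from typing import List


def first_order_sdm(x: float, n_samples: int) -> List[int]:
    """Ideal first-order sigma-delta modulator with a 1-bit quantizer.

    x is a DC input normalized to [-1, 1]; the feedback DAC outputs +/-1.
    Returns the output bitstream as 0/1 values.
    """
    if not -1.0 <= x <= 1.0:
        raise ValueError("input must lie within the DAC range [-1, 1]")
    integrator = 0.0
    feedback = 0.0
    bits = []
    for _ in range(n_samples):
        integrator += x - feedback            # integrate input minus feedback
        bit = 1 if integrator >= 0 else 0     # 1-bit quantizer (comparator)
        feedback = 1.0 if bit else -1.0       # 1-bit DAC in the feedback path
        bits.append(bit)
    return bits


def decimate_mean(bits: List[int]) -> float:
    """Simplest decimator: average the bitstream and map it back to [-1, 1]."""
    return 2.0 * sum(bits) / len(bits) - 1.0


if __name__ == "__main__":
    for n in (64, 256, 4096):
        print(n, decimate_mean(first_order_sdm(0.3, n)))
```

Because the integrator state stays bounded (within ±2 for inputs in [-1, 1]), the averaged bitstream tracks a DC input to within 3/N after N samples, which is why longer observation windows, that is, higher oversampling, give finer resolution.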
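Fig. 17. First-order sigma-delta modulator
Fig. 18. Modified first-order sigma-delta modulator
The sub-circuits of the overall architecture fall into four parts: the operational amplifier, the discrete-time comparator, the offset cancellation circuit, and the sampling circuit. The key design issues are the operating frequency of the sampling circuit, the filtering capability of the integrator, and the resolution of the feedback circuit. Building a working modulator therefore requires first deriving achievable specifications at the system level and then designing the sub-circuits to meet them. Fig. 19 shows the schematic of the folded-cascode operational amplifier. Fig. 20 shows the schematic of the discrete-time comparator, which uses a latch formed by positive feedback to raise its operating speed and a clock to switch between its operating modes.
Fig. 19. Folded-cascode operational amplifier
Fig. 20. Discrete-time comparator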
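A sigma-delta modulator converts a low-frequency analog input signal into a high-rate digital signal. Half the ratio between the two rates, that is, the sampling frequency divided by twice the input signal bandwidth, is the oversampling ratio (OSR). Fig. 21 shows the simulation result for an OSR of 64, and Fig. 22 shows the simulation result for an OSR of 256.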
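To relate the oversampling ratio to the 8 to 10 bit target stated above, the standard textbook estimate of the ideal peak SQNR of an L-th order modulator with a full-scale sinusoidal input can be evaluated. This is an idealized bound that ignores thermal noise, op-amp nonidealities, and the actual decimation filter; it is not a result from the simulations in Figs. 21 and 22.

```python
import math


def sdm_peak_sqnr_db(order: int, osr: float, quantizer_bits: int = 1) -> float:
    """Ideal peak SQNR (dB) of an L-th order sigma-delta modulator.

    Textbook estimate for a full-scale sine input and an N-bit quantizer:
    6.02N + 1.76 + 10*log10((2L+1)/pi^(2L)) + (20L+10)*log10(OSR).
    """
    L = order
    return (6.02 * quantizer_bits + 1.76
            + 10 * math.log10((2 * L + 1) / math.pi ** (2 * L))
            + (20 * L + 10) * math.log10(osr))


def enob(sqnr_db: float) -> float:
    """Effective number of bits corresponding to an SQNR in dB."""
    return (sqnr_db - 1.76) / 6.02


if __name__ == "__main__":
    for osr in (64, 128, 256):
        s = sdm_peak_sqnr_db(order=1, osr=osr)
        print(f"OSR={osr:3d}: SQNR ~ {s:5.1f} dB, ENOB ~ {enob(s):4.1f} bits")
```

For a first-order loop with a 1-bit quantizer this gives about 56.8 dB (roughly 9.1 effective bits) at OSR = 64, 65.8 dB at 128, and 74.9 dB at 256; each doubling of the OSR adds about 9 dB, or 1.5 bits.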
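Fig. 21. Simulation result for an oversampling ratio of 64
Fig. 22. Simulation result for an oversampling ratio of 128
(3) Complete Layouts of the Thermal Management System and Temperature Sensor
Fig. 24. Temperature sensor layout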
Table 2. Thermal management system specifications
Thermal management operating frequency: 100 MHz
SMBus slave and master operating frequency: 500 kHz
Multi-level controller operating frequency: 10 kHz
Technology: TSMC 0.25 µm mixed-signal (1P5M) CMOS
Power consumption: 10 mW
Transistor/gate count: 152340.484375 / 17.28 ≈ 8816
Chip area (µm²): 1535 × 1535
Pins: 104 total
  DC power: 21 (core power)
  AC power: 11 (pad power)
  System signals: 72
    (1) TM and SMBslave: 39 inputs, 7 outputs
    (2) SMBmaster: 13 inputs, 13 outputs
Table 3. Temperature sensor specifications
Operating frequency: 50 MHz
Temperature error: 1 °C
Power consumption: 3.3 mW
Chip area (µm²): 1000 × 900
Pins: 34 total
  DC power: 8
  AC power: 6
  Output: 20
Package type: 40 S/B
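IV. Conclusions and Discussion
This project has successfully completed all of its planned work items. Of the research results, one paper has been published in an international journal and one has been submitted to an international conference, and the participants have completed two master's theses; the remaining results are being prepared for submission to international conferences and journals. The published journal paper is included in the appendix.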
V. References
1. Herming Chiueh, Jeffrey Draper, and John Choma, Jr., “A Dynamic Thermal Management Circuit for System-on-Chip Designs,” Analog Integrated Circuits and Signal Processing, vol. 36, pp. 175–181, 2003.
2. 張佑誠, “Design and Implementation of Interface Circuits for Thermal Management Systems,” Master's thesis, Department of Communication Engineering, National Chiao Tung University, Hsin-Chu, Taiwan, 2003.
3. 何錫錡, “A Fully Integrated Multi-Level Controller for System-on-Chip Thermal Management Designs,” Master's thesis, Department of Communication Engineering, National Chiao Tung University, Hsin-Chu, Taiwan, 2003.
© 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.
A Dynamic Thermal Management Circuit for System-On-Chip Designs
∗HERMING CHIUEH,1,†JEFFREY DRAPER2 AND JOHN CHOMA, JR.2
1Department of Communication Engineering, National Chiao Tung University, HsinChu 30050, Taiwan
2Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA
Received April 27, 2002; Revised December 12, 2002; Accepted January 25, 2003
Abstract. A novel fully integrated dynamic thermal management circuit for system-on-chip design is proposed. Instead of worst-case thermal management used in conventional systems, this design yields continual monitoring of thermal activity and reacts to specified conditions. With the above system, we are able to incorporate on-chip power/speed modulation and integrated multi-stage fan controllers, which allows us to achieve nominal power dissipation and ensure operation within specification. Both architecture and circuitry are optimized for modern system-on-chip designs. This design yields intricate control and optimal management with little system overhead and minimum hardware requirements, and it provides the flexibility to support different thermal management algorithms.
Key Words: thermal management, system-on-chip, VLSI system design
1. Introduction
Increases in circuit density and clock speed in modern VLSI designs have brought thermal issues into the spotlight of high-speed integrated circuit design. Local overheating [1] in one spot of a high-density circuit, such as CPUs and high-speed mixed-signal circuits, can cause a whole system to crash due to resulting clock synchronization problems, parameter mismatches, or other coefficient changes due to the uneven heat-up on a single chip [2].
Passive heat dissipation mechanisms, such as heat sinks and fans, are widely used in system design. Recently, advanced computer systems and circuit designs have incorporated active mechanisms to detect and properly handle an over-heating event [3]. Such a capability guarantees the system will operate within a certain package temperature specification to avoid failure. The ACPI (Advanced Configuration and Power Interface) standard is an example specification for active power and thermal management in personal computer systems [4,5]. However, the ACPI standard is quite limited, as it simply supports extra control to turn on or off a cooling mechanism and shift the alert level that is fed back to the system.
∗This research was supported by DARPA contract F30602-98-2-0180, USA and by National Science Council grant NSC-92-2215-E-009-042-, Taiwan.
†Author to whom correspondence should be addressed. Tel.: +886-35-712121 ext. 54597, +996-918422677, Fax: +886-35-5710116.
As die size and power density increase in this system-on-chip (SoC) era, the management of package temperature is no longer sufficient to solve the problem. Uneven heat-up and temperature offset on chip [1,6,7] have become a major factor that limits system performance. A good example is the recent Intel recall of Pentium III 1.13 GHz CPUs [8–10]. Recent research has focused on predicting on-chip temperature offset [1] and on electro-thermal simulations [7,11,12] that provide thermal distribution information to circuit simulation to achieve more accurate circuit behavior predictions. Such research is valuable for circuit design, but post-fabrication approaches to addressing on-chip temperature offsets are also needed as die size and power density increase. Without such an approach, some circuit behavior becomes unacceptable, which makes management and control of on-chip temperature offset as important as the reduction of package temperature.
In this research, we propose a dynamic thermal management circuit to provide a watchdog for system-on-chip designs. This circuit is optimized in architecture and circuit implementation to fit system-on-chip designs. The following items describe the technical justification of the thermal management design for SoC that we take into consideration. First, since an on-chip monitoring mechanism is included, complicated electro-thermal or numerical electro-thermal simulation [7,11,12] can be omitted. However, an analytical model [1,6,7] providing sufficient information like temperature range and quality guidelines for circuit designers is beneficial. Second, architecture and circuit implementation will be constrained to be compatible with the system's process (most likely to be a digital process), and minimum extra system resources should be used. Therefore, an interrupt-based system will be implemented, and reprogramming to provide flexibility and simplify the architecture will be supported. Finally, with respect to system integration, a complex cooling system that requires extra processing steps is not chosen, although this proposed system has the potential to cooperate with such novel micro-machining cooling methods [13]. Instead, a pure digital design for a fan controller is attractive if the circuit block is small enough. Target systems with power management can take advantage of such a cooling mechanism when combined with thermal management systems.
Given the above considerations, a circuit based on our previous research [14–16] with significant architecture enhancements is proposed. Those enhancements are described as follows. First, the number of temperature sensors has been increased to fit the need of more complex systems to monitor the temperature in several locations throughout the system. Second, the updated architecture provides simultaneous monitoring of multiple temperature sensors instead of the previous approach of single-sensor monitoring at a time. Third, circuits to monitor temperature offset between sensors and thresholds for interrupts that provide alerts other than package temperature have been added. Fourth, the threshold values have been expanded to have upper and lower limits for each sensor in order to achieve a more robust monitoring capability. Last, we have integrated a multi-channel, multi-stage fan controller, which we developed as an active cooling mechanism for maintaining a consistent package temperature.
These enhancements are aimed at solving thermal problems specific to SoC designs. The proposed thermal management subcomponents are encapsulated into a single IP block to foster use by the SoC market, in which the IP-based design approach has become very popular. The resulting discrete IP block facilitates verification of the architecture and thermal management algorithm through small low-cost test prototypes without compromising the applicability of the approach to SoC designs.
In Section 2, the function and architecture of the Thermal Management Circuit are described. In Section 3, the implementation plan of this system is addressed. A summary and conclusion follow in Section 4.
2. Function and Architecture
2.1. Architecture
The architecture of the thermal management circuitry is divided into two portions: the thermal management circuit blocks and the system integration blocks. The former represent the designed thermal management system, and the latter represent the interface to the target system. The designed thermal management system could be applied to different SoC designs. However, the system integration portion is modified to fit different target systems as well as prototyping implementations. The block diagram of the dynamic thermal management circuit is shown in Fig. 1. The thermal management circuit blocks are the white boxes with shadows; the gray boxes represent the system integration blocks. The function of every block is described in Section 2.1.1 and Section 2.1.2.
Fig. 1. Block diagram of the dynamic thermal management circuit (blocks: Temperature Acquisition Unit; Programmable Watchdog Unit; Programmable Unit: Registers and Masks; Output and Interrupt Generator; Active Cooling Unit: Fan Controllers; Active Cooling Unit: System Speed Controller; Temperature Sensors; CPU/System)
2.1.1. Thermal Management Circuit Blocks
• Temperature Acquisition Unit: This unit is simply an interface to acquire temperature from sensors. This circuit could be very different when applied to different temperature sensors. In our prototype system, a commercial temperature sensor with one-bit serial output will be used. The major function of this circuit is to convert and latch the temperature input to parallel digital values. In our prototype design, four sensors with 16-bit precision are supported. Sensor placement will be optimized by application of the developed analytical model [1,6] to compensate for the temperature offset between the heat source and sensor. Thus, the same analytical model will also predict the maximum reading error with respect to the highest junction temperature. These offsets will be processed in the Temperature Acquisition Unit in order to provide a complete thermal analysis of the target to the thermal management system.
• Programmable Unit: This unit contains 8 threshold registers to program the high and low threshold values for each temperature sensor. Two threshold registers for upper and lower bounds are provided for offset temperatures between different sensors. With these threshold values, the watchdog unit can generate interrupts for desired situations. Three fan-speed registers provide the setup for the integrated fan controllers. Interrupt mask and offset mask registers indicate which interrupts should be enabled and which set of temperature sensors should be included for offset temperature monitoring. Finally, decoding circuitry and necessary configuration registers provide the communication signals between the processor and other circuit blocks.
• Watchdog Unit: This unit contains two monitoring circuits: the threshold monitor for each temperature sensor, and the offset temperature monitor. Both circuits are designed to minimize circuit area while providing sufficient speed to compare the sensors provided in the system.
• Output and Interrupt Generator: This unit provides data outputs that are read by the system CPU, like temperature value, offset temperature value, and interrupt types.
• Active Cooling Unit-Fan Controller: There are two active cooling units: the integrated fan controller and the system speed controller provided by the system. The integrated fan controller circuit is based on our previous pure-digital fully integrated design [15].
2.1.2. System Integration Blocks
• Temperature Sensors: Many kinds of temperature sensors can be used in this design. Our previous on-chip temperature sensor design is one option for system-on-chip design. For the prototype, we are using a commercial part with a system management bus interface [17].
• CPU/System: For pure system-on-chip design, the thermal management circuitry should be directly mapped into a CPU special-purpose register and interrupt space. In this prototype design, a memory-mapped approach will be implemented to emulate the proposed architecture. This approach also supports a flexible off-chip hardware and software platform for testing the circuit.
• Active Cooling Unit-System Speed Controller: For complete dynamic thermal management systems, the processor should be able to use the offset temperature data to tune the speed of different execution units to maintain the offset temperature within specification. Tradeoffs for slowing down some execution units are necessary in a critical temperature situation to prevent system failure. The mechanisms provided in the SoC implementation or processors should cooperate with this circuit to provide the function of managing the offset temperature.
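The register set of the Programmable Unit and the two comparisons performed by the Watchdog Unit (Section 2.1.1) can be summarized in a behavioral sketch. The Python below is a hypothetical software model, not the circuits of Figs. 3 and 4: it assumes the offset temperature is the spread (maximum minus minimum) of the sensors selected by the offset mask, and it invents an interrupt-mask bit layout purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WatchdogConfig:
    """Hypothetical register image for a 4-sensor watchdog (field names are illustrative)."""
    high: List[int]               # per-sensor high thresholds
    low: List[int]                # per-sensor low thresholds
    offset_upper: int             # upper bound on the offset temperature
    offset_lower: int             # lower bound on the offset temperature
    offset_mask: int = 0b1111     # bit i set -> sensor i is included in offset monitoring
    interrupt_mask: int = 0x3FF   # bit set -> interrupt enabled (assumed bit layout)


def enabled(mask: int, bit: int) -> bool:
    """True when the given bit of a mask register is set."""
    return ((mask >> bit) & 1) == 1


def watchdog_scan(temps: List[int], cfg: WatchdogConfig) -> List[str]:
    """One scan of both monitors; returns the enabled interrupt sources."""
    events = []
    # Threshold monitor: bit 2*i = high alarm of sensor i, bit 2*i+1 = low alarm.
    for i, t in enumerate(temps):
        if t > cfg.high[i] and enabled(cfg.interrupt_mask, 2 * i):
            events.append(f"HIGH{i}")
        if t < cfg.low[i] and enabled(cfg.interrupt_mask, 2 * i + 1):
            events.append(f"LOW{i}")
    # Offset monitor: spread (max - min) over the sensors selected by offset_mask.
    selected = [t for i, t in enumerate(temps) if enabled(cfg.offset_mask, i)]
    if len(selected) >= 2:
        spread = max(selected) - min(selected)
        if spread > cfg.offset_upper and enabled(cfg.interrupt_mask, 8):
            events.append("OFFSET_HIGH")
        if spread < cfg.offset_lower and enabled(cfg.interrupt_mask, 9):
            events.append("OFFSET_LOW")
    return events


if __name__ == "__main__":
    cfg = WatchdogConfig(high=[85] * 4, low=[0] * 4, offset_upper=10, offset_lower=0)
    print(watchdog_scan([40, 42, 90, 41], cfg))
```

Scanning the sensors in a loop loosely mirrors the use of a single comparator shared across serially read sensors; clearing a bit in the offset mask removes that sensor from the offset check without affecting its own threshold alarms.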
2.2. Operating Modes
The operation of the thermal management system can be divided into three modes from the point of view of the processor. They are the programming, data acquisition, and interrupt modes. Each mode requires different timing and data order definitions, which will be implemented in the programmable unit of the system. The basic functionality of each mode is described below.
• Programming Mode: This mode provides the function to program the threshold registers for temperature sensors and offset temperatures, mask registers for interrupt and offset temperature monitoring, and fan stage assignments for the integrated fan controllers. To conserve address space, the multiple temperature sensor registers will be mapped to the same address, with the configuration register contents specifying which set is actually being accessed.
• Data Acquisition Mode: This mode provides the capability for the processor to read data and status from the thermal management circuit in a polling fashion. Information like current temperatures, offset temperatures, setups, and interrupt status can be acquired by the processor as often as it wishes to flexibly support different thermal management algorithms.
• Interrupt Mode: Interrupts are provided for desig-nated alert conditions, and interrupt type information is also provided when the interrupt service routine reads the interrupt type register.
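The address-sharing scheme of the Programming Mode, in which several sensor registers sit behind one address and the configuration register chooses the set, can be modeled as follows. The register names, addresses, and bank-select encoding are hypothetical; only the idea of a configuration-selected bank comes from the text.

```python
class BankedRegisterFile:
    """Hypothetical memory-mapped register file with shared threshold addresses.

    The CONFIG value (modulo the sensor count) selects which sensor's
    registers the shared THRESH_HIGH / THRESH_LOW / TEMP addresses refer to.
    """
    CONFIG, THRESH_HIGH, THRESH_LOW, TEMP = 0x0, 0x1, 0x2, 0x3   # illustrative addresses

    def __init__(self, n_sensors: int = 4) -> None:
        self.config = 0
        self.high = [0] * n_sensors
        self.low = [0] * n_sensors
        self.temps = [0] * n_sensors   # would be filled by the acquisition unit

    def _bank(self) -> int:
        return self.config % len(self.high)

    def write(self, addr: int, value: int) -> None:
        """Programming mode: processor writes through the shared addresses."""
        if addr == self.CONFIG:
            self.config = value
        elif addr == self.THRESH_HIGH:
            self.high[self._bank()] = value
        elif addr == self.THRESH_LOW:
            self.low[self._bank()] = value
        else:
            raise ValueError(f"address {addr:#x} is not writable")

    def read(self, addr: int) -> int:
        """Data acquisition mode: polled reads through the same addresses."""
        if addr == self.CONFIG:
            return self.config
        if addr == self.THRESH_HIGH:
            return self.high[self._bank()]
        if addr == self.THRESH_LOW:
            return self.low[self._bank()]
        if addr == self.TEMP:
            return self.temps[self._bank()]
        raise ValueError(f"address {addr:#x} is not readable")


if __name__ == "__main__":
    rf = BankedRegisterFile()
    # Program a different high threshold per sensor through one address.
    for sensor, limit in enumerate([80, 85, 90, 95]):
        rf.write(BankedRegisterFile.CONFIG, sensor)
        rf.write(BankedRegisterFile.THRESH_HIGH, limit)
    rf.write(BankedRegisterFile.CONFIG, 2)
    print(rf.read(BankedRegisterFile.THRESH_HIGH))   # sensor 2's threshold
```

The trade-off is one extra configuration write per bank switch in exchange for a register window whose size does not grow with the number of sensors.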
2.3. System Integration
The thermal management circuit architecture for an SoC design is proposed in the previous sections. However, to prove the validity of the complete architecture, some attention to detail must be given to the system integration blocks (gray boxes in Fig. 1). The technical decision and justification for these blocks are given here with a detailed implementation following in Section 3.
To reduce circuit complexity and die size in this area-constrained prototype chip, an off-the-shelf processor with conventional interface signals will be used for the prototype of the SoC design. In our previous design, a PowerPC interface was implemented [14], but in this prototype, a simpler memory-mapped interface will be implemented for more flexible hardware/software support. The basic function and architecture of the thermal management system are not affected since only system integration portions of the proposed design have been modified due to the prototyping limitations as discussed in Section 2.
Instead of using on-chip temperature sensors from previous research [19], we will use external commercial parts for monitoring temperatures. Although on-chip sensors provide direct temperature readout without constraining the data transfer protocol due to pin limitations, as the number of sensors increases, the die size limit makes using on-chip sensors impractical for this prototype. Furthermore, the interface to the external sensors requires very few pins, and the sensors are not the focus of this prototype.
3. Prototype Implementation
A prototype implementation of the proposed design is presented in this section. Due to the limitations of an area-constrained prototype TinyChip [18] and the cost of integrating the proposed IP into a complete SoC design, the prototype thermal management system is implemented separately from the processor (computer system). In Fig. 2, a detailed block diagram of a prototype design is presented. Block diagrams of the offset temperature monitor and threshold temperature monitor are shown in Figs. 3 and 4. Both designs achieve minimum area with sufficient speed to respond to system temperature changes.
As shown in Figs. 2–4, the proposed SoC IP is implemented in a discrete fashion while adhering to the IP-based design methodology. Using this approach, the proposed thermal management architecture is verified using the external temperature sensors and cooling mechanisms; such parts and processors are often treated as “hard” IPs in modern SoC design flows. Once the architecture and management algorithms are verified, this design can be easily integrated into IP-based platform design flows. The following remarks address the compatibility of the prototype implementation with the proposed architecture:
• The SoC computer system will be replaced with a hybrid design consisting of a commercial processor and a prototype TinyChip. Since the proposed design requires a special register and address mapping in the processor, a bus interface circuit between the Thermal Management Unit and processor is implemented to replace the special register and address mapping. The signal assignment between bus interface and Thermal Management Unit will accurately reflect the proposed design.
• Since the target temperature reading is on the processor, a matching hybrid temperature sensor part for the processor is used to replace the on-chip one. This situation introduces an extra System Management Bus Interface [17] between the Thermal Management Unit and the temperature sensor used. This mechanism provides the ability to measure the target processor's temperature and does not impede the concept of the proposed design, since the on-chip temperature sensor is implemented in our previous research [19], matching the qualification defined in Section 2.
• Even with the added blocks and replacement parts needed for an initial prototype implementation, the signal assignment and design specification is still valid for SoC design. The prototype implementation is simply used to verify the architecture as described in Section 2.
• In both designs of the offset temperature monitor and threshold temperature monitor, the speed of the temperature sensors and speed of the programmable unit have been defined to use a single comparator circuit to monitor multiple sensors using serial I/O; thus, an implementation of more than 4 channels in this design can be done with very little extra circuitry.
Fig. 2. Detailed block diagram of thermal management system. (Blocks: Host Interface; Programmable Unit with Threshold Registers (0~7), Offset Threshold Registers (0~1), and Configuration Registers (0~2); Watchdog Unit with Threshold Temperature Monitor and Offset Temperature Monitor; Temperature Acquire Unit with Temperature Sensor Interfaces (0) to (3); Power Control/Active Cooling Unit with Multi-level Controllers (0) to (3) and Power Controller; Output and Interrupt Generator; Processor)
Fig. 3. Offset temperature monitor.
Fig. 4. Threshold temperature monitor.
With this sample prototype design, the proposed thermal management system for SoC design can be easily verified. Also, different approaches for a thermal management system can be easily implemented with the proposed architecture, since this system provides flexible ways for systems to read the temperature, set the threshold value for interrupt generation, and measure temperature values from different sensors. This design can be used to implement, but is not limited to, the ACPI protocol. For instance, the temperature threshold can be set to any number of values to represent any number of critical situations. Fuzzy logic control and other algorithms requiring more levels of alerts can be applied. Also, with the capability of actively acquiring temperature measures at any time, the CPU can verify a desired temperature response when it executes a cooling action. With this feedback, actions like increasing or reducing fan speed and clock rates can be applied for more complex management algorithms.
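As an illustration of using more alert levels than ACPI's on/off control, the following hypothetical helper maps a temperature reading to one of several cooling levels, which a multi-level fan controller could translate into fan stages. The threshold values and duty cycles are assumptions chosen for the example, not parameters of the prototype.

```python
import bisect
from typing import Sequence

# Hypothetical fan duty cycle (%) for cooling levels 0..3.
FAN_DUTY_PERCENT = (0, 40, 70, 100)


def alert_level(temp: float, thresholds: Sequence[float]) -> int:
    """Number of thresholds the temperature has reached (at or above).

    thresholds must be in ascending order; the result is 0..len(thresholds).
    """
    if any(a > b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be in ascending order")
    return bisect.bisect_right(thresholds, temp)


def fan_duty(temp: float, thresholds: Sequence[float] = (50.0, 70.0, 85.0)) -> int:
    """Map a temperature to a fan stage; there is one more level than thresholds."""
    if len(thresholds) + 1 != len(FAN_DUTY_PERCENT):
        raise ValueError("need exactly one duty entry per level")
    return FAN_DUTY_PERCENT[alert_level(temp, thresholds)]


if __name__ == "__main__":
    for t in (40.0, 55.0, 72.0, 90.0):
        print(t, alert_level(t, (50.0, 70.0, 85.0)), fan_duty(t))
```

Adding a level only requires appending a threshold and a duty entry, which reflects the point above that any number of critical situations can be represented; the CPU can then read the temperature back after changing the stage to confirm the cooling action took effect.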
4. Conclusion
A novel fully integrated dynamic thermal management circuit for system-on-chip design has been described. The architecture and design detail with its justification, as well as the final system integration for a complete thermal management system for SoC design, was presented. The innovative temperature offset monitoring provides a mechanism for system-on-chip designs to monitor the temperature offset across the system and enhance stability. With proper handling of this information, the system not only prevents failure but also enhances performance by controlling each subcomponent's operation speed with feedback from thermal information. With minimum overhead in chip area and system resources, this design provides intricate control and optimal thermal management on chip, upon which a complete dynamic thermal management system for modern computer designs can be implemented.
References
1. H. Chiueh, J. Draper, L. Luh, and J. Choma Jr., “A thermal eval-uation of integrated circuits: On chip offset temperature mea-surement and modeling,” in Proc. 2nd Internationl Workshop
on Design of Mixed-Mode Integrated Circuits and Applications,
1998, pp. 109–113.
2. V. Szekely, M. Rencz, and B. Courtois, “Thermal testing meth-ods to increase system reliability,” in Proc. 13th IEEE
SEMI-THERM Symposium, 1997, pp. 210–217.
3. J. Draper, J. Block, J. Koller, and C. Steele, “Thermal man-agement in embedded systems using MEMS,” in Proc. Lecture
Notes in Computer Science 1388 (IPPS/SPDP’98 Workshops Proceedings), 1998, pp. 900–901.
4. Compaq, Intel, Microsoft, Phoenix, and Toshiba, “Advanced configuration and power interface specification,” July 27, 2000.
5. J. Steele, “ACPI thermal sensing and control in the PC,” in Proc.
Wescon 98, 1998, pp. 169–182.
6. H. Chiueh, J. Draper, L. Luh, and J. Choma Jr., “A novel model for on-chip heat dissipation,” in Proc. The 1998 IEEE
Asia-Pacific Conference on Circuits and Systems, 1998, pp. 779–782.
7. C.-H. Tsai and S.-M. Kang, “Substrate thermal model reduction for efficient transient electrothermal simulation,” in Proc. 2000
Southwest Sympoium on Mixed-Signal Designs, 2000, pp. 185–
190.
8. I. Fried, “Glitch prompts Intel to recall 1.13-GHz Pentiums,” http://news.cnet.com.
9. I. Fried, “Hardware sites help Intel isolate chip problem,” http://news.cnet.com.
10. S. Musil, “The week in review: Intel hits a speed bump,” http://news.cnet.com.
11. J. W. Sofia, “Analysis of thermal transient data with synthesized dynamic models for semiconductor devices,” in Proc. 10th IEEE
SEMI-THERM, 1994, pp. 78–85.
12. V. Szekely, A. Poppe, A. Pahi, A. Csendes, G. Hajas, and M. Rencz, “Electro-thermal and logic-thermal simulation of VLSI designs.” IEEE Transactions on VLSI Systems 5, pp. 258–269, 1997.
13. Goodson, Santiago, T. W. Kenny, Carruthers, and Towe, “Elec-trokinetic micro coolers,” presented at International
Intercon-nect Technology Conference, San Francisco, CA, 2000.
14. H. Chiueh, J. Draper, and J. Choma, Jr., “A programmable ther-mal management interface circuit for powerPC systems,” in
Proc. 6th International Workshop on Thermal Investigation of ICs and Systems, 2000.
15. H. Chiueh, L. Luh, J. Draper, and J. Choma Jr., “A novel fully integrated fan controller for advanced computer systems,” in Proc. Southwest Symposium on Mixed-Signal Design, 2000, pp. 191–194.
16. H. Chiueh, J. Draper, and J. Choma Jr., “Implementation of a temperature monitor interface circuit for PowerPC systems,” in Proc. 43rd Midwest Symposium on Circuits and Systems, 2000.
17. SBS Implementers Forum, “System management bus (SMBus) specification,” August 3, 2000.
18. MOSIS. http://www.mosis.com.
19. L. Luh, J. Choma Jr., J. Draper, and H. Chiueh, “A high-speed CMOS on-chip temperature sensor,” in Proc. European Solid-State Circuits Conference (ESSCIRC’99), 1999, pp. 290–293.
Herming Chiueh received the B.S. degree from the Department of Electrophysics, National Chiao Tung University, Hsin-Chu, Taiwan, in 1992, and the M.S. and Ph.D. degrees from the Department of Electrical Engineering, University of Southern California, Los Angeles, U.S., in 1994 and 2002, respectively. From 1996 to 2002, he was with the Information Sciences Institute, University of Southern California, Marina del Rey, California, U.S. He is currently an Assistant Professor in the Department of Communication Engineering, School of Electrical Engineering and Computer Science, National Chiao Tung University, Hsin-Chu, Taiwan. His research interests include system-on-chip design methodology, thermal management for VLSI, and power-aware integrated circuits.
Jeffrey Draper holds a joint appointment as a Research Assistant Professor in the Department of Electrical Engineering at the University of Southern California and a Project Leader in the Computational Sciences Division at the USC Information Sciences Institute. Dr. Draper has led the VLSI effort on several large projects in the past 5 years and most recently directed the development of a 55-million-transistor processing-in-memory (PIM) chip. Dr. Draper received his Ph.D. in Computer Engineering from the University of Texas at Austin. His research interests are PIM architectures, VLSI, thermal management, parallel computer architectures, and interconnection networks.
John Choma earned his B.S., M.S., and Ph.D. degrees in electrical engineering from the University of Pittsburgh in 1963, 1965, and 1969, respectively. He is Professor of Electrical Engineering at the University of Southern California, where he teaches undergraduate and graduate courses in electrical circuit theory and analog integrated electronics. Prof. Choma consults in the areas of broadband analog and high-speed digital integrated circuit analysis, design, and modeling.
Prior to joining the USC faculty in 1980, Prof. Choma was a senior staff design engineer in the TRW Microelectronics Center in Redondo Beach, California. His earlier positions include a technical staff position at Hewlett-Packard Company in Santa Clara, California, Senior Lecturer in the Graduate Division of the Department of Electrical Engineering of the California Institute of Technology, lectureships at the University of Santa Clara and the University of California at Los Angeles, and a faculty appointment at the University of Pennsylvania.
Prof. Choma, the author or co-author of some 135 journal and conference papers and the presenter of more than sixty invited short courses, seminars, and tutorials, is the 1994 recipient of the Prize Paper Award from the IEEE Microwave Theory and Techniques Society. He is the author of a Wiley Interscience text on electrical network theory and a forthcoming text on integrated circuit design for communication system applications. Prof. Choma has contributed several chapters to five edited electronic circuit texts, and he was an area editor of the IEEE/CRC Press Handbook of Circuits and Filters.
Prof. Choma has served the IEEE Circuits and Systems Society as a member of its Board of Governors, its Vice President for Administration, and its President. He has been an Associate Editor and Editor-in-Chief of the IEEE Transactions on Circuits and Systems, Part II. He is an Associate Editor of the Journal of Analog Integrated Circuits and Signal Processing and a former Regional Editor of the Journal of Circuits, Systems, and Computers.
A Fellow of the IEEE, Prof. Choma has been awarded the IEEE Millennium Medal and has received three awards from the IEEE Circuits and Systems Society, namely the Golden Jubilee Award, the 1999 Education Award, and the 2000 Meritorious Service Award. He is also the recipient of several local and national teaching awards. Prof. Choma is a Distinguished Lecturer in the IEEE Circuits and Systems Society.