This paper proposes a finer-grained memory protection mechanism that works at cache-line granularity. It resides on the L1 cache miss path, so it neither slows down the processor pipeline nor consumes power every cycle.
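To make the "miss path only" idea concrete, here is a rough sketch, not the paper's actual design. The class `ProtectedL1`, the 64-byte line size, and the permission table format are all assumptions made for illustration. The point is only that the protection table is consulted when a line is filled, so hits pay nothing extra.

```python
LINE_BYTES = 64  # assumption: 64-byte cache lines


class ProtectedL1:
    """Toy L1 model (hypothetical) that checks protection only on a miss."""

    def __init__(self, line_perms):
        self.line_perms = line_perms  # line number -> allowed ops, e.g. {"r", "w"}
        self.cached = {}              # line number -> permissions copied in at fill time
        self.protection_checks = 0    # how often the protection table was consulted

    def access(self, addr, op):
        line = addr // LINE_BYTES
        if line not in self.cached:
            # L1 miss: the only place the protection table is read.
            self.protection_checks += 1
            self.cached[line] = self.line_perms.get(line, set())
        # L1 hit (or just-filled line): reuse the permissions stored with the line.
        return op in self.cached[line]


cache = ProtectedL1({0: {"r", "w"}, 1: {"r"}})
results = [
    cache.access(0, "w"),   # miss on line 0, write allowed
    cache.access(8, "r"),   # hit on line 0, no table lookup
    cache.access(64, "w"),  # miss on line 1, write denied
    cache.access(70, "r"),  # hit on line 1, read allowed
]
```

Four accesses trigger only two table lookups here, which is the property the note attributes to placing the check on the miss path.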
Wednesday, January 25, 2012
Tuesday, January 24, 2012
ISCA'11 : The Role of Optics in Future High Radix Switch Design
This paper paints a bleak future for electrical switches and proposes using optical switches instead.
We should reread it in detail when we need optical SerDes.
ISCA'11: Dark Silicon and the End of Multicore Scaling
This paper paints a bleak future for multicore scaling: it argues that the approach will hit its limit within 9 years because of the power and utilization walls.
ISCA'11 : SpecTLB: A Mechanism for Speculative Address Translation
This paper proposes walking the page table in parallel while predicting the address translation result by interpolation, so that the translation latency can be hidden.
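A minimal sketch of how such a prediction could work, under assumptions of my own rather than details from the paper: pages are 4 KiB, a region of 512 pages tends to be physically contiguous, and the predictor remembers one known mapping per region. `SpeculativeTranslator` and its fields are hypothetical names. The dictionary lookup stands in for the real page walk, which always has the final say.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


class SpeculativeTranslator:
    """Toy predictor (hypothetical): guess a mapping from a nearby known one."""

    def __init__(self, page_table, region_pages=512):
        self.page_table = page_table      # vpn -> ppn, stands in for the page walk
        self.region_pages = region_pages  # assumption: contiguity within a region
        self.anchors = {}                 # region index -> (known vpn, known ppn)

    def predict(self, vpn):
        anchor = self.anchors.get(vpn // self.region_pages)
        if anchor is None:
            return None
        known_vpn, known_ppn = anchor
        # Interpolate: assume the same offset holds in physical memory.
        return known_ppn + (vpn - known_vpn)

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        guess = self.predict(vpn)         # available immediately
        actual = self.page_table[vpn]     # authoritative walk, verifies the guess
        self.anchors[vpn // self.region_pages] = (vpn, actual)
        return (actual << PAGE_SHIFT) | offset, guess == actual


table = {0x100 + i: 0x8000 + i for i in range(4)}
table[0x104] = 0x9999  # a page that breaks the contiguous pattern
mmu = SpeculativeTranslator(table)
first = mmu.translate(0x100000)   # no anchor yet, so no correct guess
second = mmu.translate(0x101ABC)  # neighbor page, guess matches the walk
third = mmu.translate(0x104000)   # pattern broken, guess is wrong
```

When the guess matches, work that started speculatively on the guessed address can be kept; when it does not, that work would have to be discarded and redone with the walked result.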
ISCA'11 : A Case for Globally Shared-Medium On-Chip Interconnect
This paper presents a transmission-line link design implemented in standard CMOS, running at 26.4 Gb/s. It is very impressive, and we may need to refer to it later.
But the differential wires look similar to what I have seen before in SerDes reference clocks; can I use them?
Wednesday, January 18, 2012
ISCA'11 : Releasing Efficient Beta Cores to Market Early
This paper is very interesting: it runs a simple, slow but correct core together with a complex, fast but buggy core.
They check each other; on a mismatch, the simple core is invoked.
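The check-and-fall-back idea can be sketched roughly as follows. This is a software analogy under my own assumptions, not the paper's hardware mechanism: `simple_core`, `fast_core`, and `checked_execute` are hypothetical names, the "bug" is planted by hand, and the simple core is used as the reference for every operation.

```python
def simple_core(a, b):
    """Slow but trusted reference: multiply by repeated addition."""
    total = 0
    for _ in range(abs(b)):
        total += a
    return total if b >= 0 else -total


def fast_core(a, b):
    """Fast but deliberately buggy: wrong when b == 7 (stand-in for a design bug)."""
    if b == 7:
        return a * b + 1
    return a * b


def checked_execute(a, b, stats):
    """Run both cores; trust the simple one whenever the results disagree."""
    fast = fast_core(a, b)
    reference = simple_core(a, b)
    if fast != reference:
        stats["fallbacks"] += 1
        return reference
    return fast


stats = {"fallbacks": 0}
ok = checked_execute(3, 4, stats)      # cores agree
fixed = checked_execute(3, 7, stats)   # mismatch, simple core's result is used
```

The result stays correct as long as the simple core is, and the fallback counter shows how often the buggy path was actually hit.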
ISCA'11 : FlexBulk: Intelligently Forming Atomic Blocks in Blocked-Execution Multiprocessors to Minimize Squashes
A blocked-execution processor continuously runs atomic blocks of instructions, also called chunks. Larger chunks may lead to frequent contention and lost performance.
This paper proposes an automatic algorithm to remove the contention.
Tuesday, January 17, 2012
ISCA'11 : OUTRIDER: Efficient Memory Latency Tolerance with Decoupled Strands
This paper uses the compiler to separate the instruction stream into several strands: some access memory, while others consume the loaded values. This tolerates long memory latency without the large hardware overhead of out-of-order (OOO) execution.
ISCA'11: Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks
This paper proposes dynamically detecting memory that is accessed by only one core and excluding it from coherence tracking.
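One way to picture the detection is the sketch below. It is an illustration under my own assumptions, not the paper's implementation: tracking is per page, a page counts as private until a second core touches it, and `PrivatePageTracker` is a hypothetical name.

```python
class PrivatePageTracker:
    """Toy classifier (hypothetical): pages are private until a second core touches them."""

    def __init__(self):
        self.owner = {}             # page -> first core that touched it
        self.shared = set()         # pages touched by more than one core
        self.directory_lookups = 0  # coherence work actually performed

    def access(self, core, page):
        first = self.owner.setdefault(page, core)
        if first != core:
            self.shared.add(page)   # a second core appeared: page becomes shared
        if page in self.shared:
            self.directory_lookups += 1
            return "coherent"
        return "private"            # no directory entry needed


tracker = PrivatePageTracker()
trace = [
    tracker.access(0, 10),  # first touch by core 0
    tracker.access(0, 10),  # same core again
    tracker.access(1, 10),  # second core arrives, page turns shared
    tracker.access(0, 10),  # shared from now on
    tracker.access(1, 11),  # a different page, private to core 1
]
```

Only two of the five accesses reach the directory in this trace, which is the kind of saving that lets a directory cache of the same size cover more of the truly shared data.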
ISCA'11 : FabScalar: Composing Synthesizable RTL Designs of Arbitrary Cores within a Canonical Superscalar Template
This paper proposes generating superscalar processors from a template, composing pipeline stages of different widths and depths.
ISCA'11: CRIB: Consolidated Rename, Issue, and Bypass
Conventional high-performance processors use complex logic structures for register renaming, instruction scheduling, and similar jobs, only to make efficient use of heavily pipelined ALUs and memory ports. This leads to huge dynamic power consumption.
This paper proposes a processor built from many simple computation components, called CRIBs, in which computation happens in place instead of being scheduled by complex logic.