Sunday, December 25, 2011

MICRO'11: Parallel Application Memory Scheduling

This paper proposes using the runtime system and the compiler to identify the most contended lock and the set of threads that may hold it, so that these threads can be prioritized when accessing memory.

A similar approach is also proposed for barriers.
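
The lock-based idea can be sketched roughly as follows. This is only an illustration, not the paper's mechanism: the function names, the `wait_events` log, and the `may_hold` map (standing in for what the compiler and runtime system would report) are all hypothetical.

```python
from collections import Counter

def most_contended_lock(wait_events):
    """wait_events: (thread_id, lock_id) pairs, one per time a thread had to wait."""
    counts = Counter(lock for _thread, lock in wait_events)
    return counts.most_common(1)[0][0]

def prioritize_requests(requests, favored_threads):
    """requests: (thread_id, address) pairs in arrival order.

    Requests from favored threads move to the front. sorted() is stable,
    so arrival order is preserved within each group.
    """
    return sorted(requests, key=lambda req: req[0] not in favored_threads)

# Hypothetical inputs: threads 1, 2 and 4 waited on lock "A", thread 3 on "B".
wait_events = [(1, "A"), (2, "A"), (3, "B"), (4, "A")]
# Assumed to come from the compiler/runtime: which threads may hold each lock.
may_hold = {"A": {0}, "B": {5}}

hot_lock = most_contended_lock(wait_events)
requests = [(3, 0x10), (0, 0x20), (5, 0x30), (0, 0x40)]
ordered = prioritize_requests(requests, may_hold[hot_lock])
```

Here the memory requests of thread 0, which may hold the most contended lock, are served before the others.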


Saturday, December 24, 2011

FMCAD'11: Effective Word-Level Interpolation for Software Verification

This paper proposes a novel approach to generating interpolants for bit-vector (BV) formulas using a layered structure with four layers: bit-blasting, EUF, equality substitution, and linear arithmetic.



FMCAD'11: IC3: Where Monolithic and Incremental Meet

The authors combine their incremental approach with the older monolithic approach, which over-approximates reachability over longer and longer state sequences.


Friday, December 23, 2011

ASPLOS'10: Conservation Cores: Reducing the Energy of Mature Computations

This paper tackles the dark silicon problem again. It argues that as silicon process technology advances, the number of devices doubles every generation, while the power budget, the supply voltage, and the threshold voltage all stay unchanged.

As a result, the percentage of devices that can run at full speed shrinks from one generation to the next. The paper therefore proposes synthesizing specialized cores to run hot code segments.
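
The arithmetic behind this argument can be sketched as below. It is a back-of-the-envelope model, not taken from the paper: it assumes that the power drawn by one device running at full speed stays the same across generations, and the function name `active_fraction` is made up for the illustration.

```python
def active_fraction(generations):
    """Fraction of devices that can run at full speed after some process generations."""
    devices = 2 ** generations   # device count doubles every generation
    power_budget = 1.0           # fixed budget, normalized to generation 0
    power_per_device = 1.0       # assumption: full-speed power per device is unchanged
    return min(1.0, power_budget / (devices * power_per_device))

for gen in range(4):
    print(gen, active_fraction(gen))
```

Under this assumption the usable fraction halves every generation: 1, 1/2, 1/4, 1/8, and so on.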


HPCA'10: Aérgia: Exploiting Packet Latency Slack in On-Chip Networks

This paper proposes using slack, the number of cycles by which a packet can be delayed without affecting NoC performance, to decide which of the contending packets should be forwarded.

The paper also proposes approximating the slack by correlating it with i) the number of preceding L2 cache miss packets, ii) whether the packet is itself an L2 miss, and iii) the hop count of the packet.
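
A minimal sketch of slack-based arbitration is given below. The three fields mirror the indicators listed above, but the weighted-sum estimator and its weights, including their signs, are illustrative placeholders and not the estimator from the paper. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    name: str
    pred_l2_misses: int   # i) number of preceding L2 cache miss packets
    is_l2_miss: bool      # ii) whether this packet is itself an L2 miss
    hops: int             # iii) hop count of this packet

def estimate_slack(p, w_pred=1.0, w_miss=-1.0, w_hops=-1.0):
    """Placeholder estimator: weights and signs are assumptions, tune as needed."""
    return w_pred * p.pred_l2_misses + w_miss * int(p.is_l2_miss) + w_hops * p.hops

def arbitrate(contenders, slack_fn=estimate_slack):
    """Forward the contending packet with the least estimated slack."""
    return min(contenders, key=slack_fn)

a = Packet("a", pred_l2_misses=3, is_l2_miss=False, hops=1)
b = Packet("b", pred_l2_misses=0, is_l2_miss=True, hops=2)
winner = arbitrate([a, b])
```

With these placeholder weights, packet `b` has the smaller estimated slack and is forwarded first. Passing a different `slack_fn` swaps in another estimator without touching the arbiter.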




Thursday, December 22, 2011

HPCA'03: Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors

It describes a very interesting technique: when the instruction window is blocked by a long-latency instruction, such as a memory access, the processor checkpoints its current state and enters runahead mode, in which it executes the following instructions without committing their results.

When the long-latency instruction completes, the processor restores the checkpointed state and re-executes the following instructions.

As a result, the data referenced by the following instructions can be fetched into the cache before it is needed.
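
The effect can be illustrated with a toy model. This is a sketch under strong assumptions, not the paper's design: the program is a list of independent load addresses, the cache is an unbounded set, the checkpoint is just the program counter, and `window` (a made-up parameter) is the number of following loads that fit into one miss latency.

```python
def count_stalls(loads, runahead, window=4):
    """Return how many loads stall on a cache miss."""
    cache = set()
    stalls = 0
    pc = 0
    while pc < len(loads):
        addr = loads[pc]
        if addr not in cache:
            stalls += 1
            if runahead:
                checkpoint = pc                  # checkpoint the current state
                pc += 1
                # Runahead mode: execute the following loads without committing;
                # the only lasting effect is that their data reaches the cache.
                while pc < len(loads) and pc <= checkpoint + window:
                    cache.add(loads[pc])
                    pc += 1
                pc = checkpoint                  # restore the checkpointed state
            cache.add(addr)                      # the blocking miss returns
        pc += 1                                  # commit and move on
    return stalls

loads = [10, 20, 30, 40, 50, 60]
print(count_stalls(loads, runahead=False))
print(count_stalls(loads, runahead=True))
```

In this model, six independent loads stall six times without runahead but only twice with it, because the first miss warms the cache for the next four loads.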

Wednesday, December 14, 2011

OSDI'10: FlexSC: Flexible System Call Scheduling with Exception-Less System Calls

Current synchronous system calls can evict a large amount of cache and TLB state, so frequent system calls hurt performance.

This paper proposes using an agent to hold all system call requests while the calling process or thread is halted. When a certain number of calls are waiting, or when all processes/threads are halted, the system switches to kernel mode to process all the requests.
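
The batching policy described above can be sketched as follows. This is a user-level toy, not the FlexSC implementation: the `SyscallAgent` class, its fields, and the `handler` callback standing in for the kernel are all hypothetical.

```python
class SyscallAgent:
    """Collects system call requests and serves them in batches."""

    def __init__(self, batch_size, num_threads, handler):
        self.batch_size = batch_size
        self.num_threads = num_threads
        self.handler = handler       # stand-in for the kernel-side work
        self.pending = []            # (thread_id, request) pairs
        self.results = {}
        self.mode_switches = 0

    def submit(self, thread_id, request):
        """Queue a request; the calling thread is considered halted until served."""
        self.pending.append((thread_id, request))
        halted = {tid for tid, _ in self.pending}
        if len(self.pending) >= self.batch_size or len(halted) == self.num_threads:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        self.mode_switches += 1      # one switch to kernel mode for the whole batch
        for thread_id, request in self.pending:
            self.results[thread_id] = self.handler(request)
        self.pending.clear()

agent = SyscallAgent(batch_size=3, num_threads=4, handler=str.upper)
for tid, req in enumerate(["read", "write", "open"]):
    agent.submit(tid, req)
print(agent.mode_switches, agent.results)
```

Three requests are served with a single mode switch instead of three, which is the source of the cache and TLB savings in this simplified picture.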


Friday, December 2, 2011

MICRO'11: QsCores: Trading Dark Silicon for Scalable Energy Efficiency with Quasi-Specific Cores

This paper analyzes benchmark programs to find hot fragments that are executed frequently, and then identifies the flow graphs they have in common. It then uses these commonalities to synthesize new specialized cores and integrates them with a general-purpose processor.


MICRO'11: Manager-Client Pairing: A Framework for Implementing Coherence Hierarchies

This paper proposes a hierarchical framework for cache coherence protocols, so that a protocol can be designed in a multi-layer way.

This approach thus avoids designing a single-layer global protocol that is neither scalable nor power efficient.

MICRO'11: PACMan: Prefetch-Aware Cache Management for High Performance Caching

This paper proposes a novel cache management scheme that is aware of prefetching and handles prefetch requests differently from normal data requests.
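
One way a cache could treat prefetches differently is sketched below. The policy shown is an assumed example and is not claimed to be the paper's: demand fills go to the most recently used end of an LRU list, prefetch fills go to the least recently used end, and prefetch hits do not refresh a line. The class name `PrefetchAwareLRU` is made up.

```python
from collections import OrderedDict

class PrefetchAwareLRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # first key = next victim, last key = most recent

    def access(self, addr, is_prefetch=False):
        """Return True on a hit, False on a miss (the line is then filled)."""
        if addr in self.lines:
            if not is_prefetch:
                self.lines.move_to_end(addr)          # demand hit: refresh the line
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)            # evict the next victim
        self.lines[addr] = True
        if is_prefetch:
            self.lines.move_to_end(addr, last=False)  # prefetch fill: first to go
        return False

cache = PrefetchAwareLRU(capacity=2)
cache.access(1)
cache.access(2)
cache.access(3, is_prefetch=True)
cache.access(4, is_prefetch=True)
print(list(cache.lines))
```

In this run, the second prefetch evicts the first prefetched line rather than the demand-fetched line 2, so a stream of useless prefetches does less damage to the demand working set.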


Thursday, December 1, 2011

MICRO'11: Idempotent Processor Architecture

This paper proposes using the compiler to construct idempotent regions: sequences of instructions that can be re-executed without producing different results.

It also proposes a hardware structure that can execute such code. In this way, mispredictions and exceptions can be handled cheaply.
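
The notion of an idempotent region can be illustrated with a small check. This is a simplification, not the compiler analysis from the paper: regions are modeled as Python functions over a dictionary that stands in for memory, only a full re-run is tested (not a restart after partial execution), and all names are hypothetical.

```python
import copy

def is_idempotent_on(region, state):
    """Compare running a region once against running it twice from the same state."""
    once = copy.deepcopy(state)
    region(once)
    twice = copy.deepcopy(state)
    region(twice)
    region(twice)
    return once == twice

def increment_in_place(mem):
    mem["x"] = mem["x"] + 1   # overwrites its own input: a re-run sees a new x

def increment_to_fresh(mem):
    mem["y"] = mem["x"] + 1   # input x is never overwritten: a re-run is harmless

print(is_idempotent_on(increment_in_place, {"x": 1}))
print(is_idempotent_on(increment_to_fresh, {"x": 1}))
```

The first region is not idempotent because it clobbers the value it reads, while the second can be restarted from its beginning without changing the outcome.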