shengyushen's academic research: February 2012

Sunday, February 19, 2012

CCS'03: Randomized instruction set emulation to disrupt binary code injection attacks

This paper is very similar to the previous one, except that it stores the original program on disk and randomizes it while loading it into memory.

Its key is also very long, possibly as long as the program itself.

CCS'03: Countering Code-Injection Attacks With Instruction-Set Randomization

This paper proposes an approach to countering code-injection attacks. It stores a key for de-randomizing the randomized program. When the OS schedules the program to run, it loads this key into a write-only register in the CPU, and the CPU de-randomizes the incoming instruction stream before actually executing it.
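A minimal sketch of the idea, assuming XOR with a repeating key as the randomizing transform (a simplification; the function names, the key value, and the byte strings are all made up for illustration). Legitimate code is stored randomized and comes out correct after de-randomization, while injected bytes that were never randomized come out scrambled.

```python
def randomize(code: bytes, key: bytes) -> bytes:
    """XOR every code byte with the repeating key (hypothetical transform)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(code))

# XOR is its own inverse, so the CPU-side step is the same operation.
derandomize = randomize

KEY = bytes([0x5A, 0xA5, 0x3C, 0xC3])            # stands in for the per-process key register
program = bytes([0x55, 0x48, 0x89, 0xE5, 0xC3])  # legitimate code, arbitrary example bytes
stored = randomize(program, KEY)                 # what actually sits in memory
injected = bytes([0x90, 0x90, 0xCC, 0xC3])       # attacker bytes, written without the key

fetched_ok = derandomize(stored, KEY)            # equals `program`
fetched_bad = derandomize(injected, KEY)         # scrambled, not what the attacker wrote
```

The attacker would need the key to pre-randomize the payload, which is why the key register is write-only.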

Saturday, February 18, 2012

ASPLOS'10: Speculative Parallelization Using Software Multi-threaded Transactions

This paper proposes a software transactional memory system that can maintain atomicity across multiple threads.

ASPLOS'10: COMPASS: A Programmable Data Prefetcher Using Idle GPU Shaders

This paper proposes using the GPU as a programmable unit that runs a GPU program to help prefetch data for the CPU.

ASPLOS'10: Dynamically Replicated Memory: Building Reliable Systems from Nanoscale Resistive Memories

This paper proposes adding another level of page table that maps physical addresses to real addresses in PCM. This mapping can map a physical page to two PCM pages that have different stuck-at fault locations. In this way, the faulty PCM pages can still be used instead of being discarded.
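A rough sketch of the pairing idea, assuming each faulty page is described by the set of byte offsets that are stuck. The greedy pairing policy and all names here are my own illustration, not the paper's algorithm. Two pages can back one physical page when their fault sets do not overlap, and a read picks whichever copy is healthy at the requested offset.

```python
from __future__ import annotations

def compatible(faults_a: set[int], faults_b: set[int]) -> bool:
    """Two faulty pages can be paired if no offset is bad in both."""
    return faults_a.isdisjoint(faults_b)

def pair_pages(faulty: dict[int, set[int]]) -> list[tuple[int, int]]:
    """Greedily pair faulty PCM pages whose stuck-at offsets are disjoint."""
    pairs, unpaired = [], []
    for page, faults in faulty.items():
        for i, other in enumerate(unpaired):
            if compatible(faults, faulty[other]):
                pairs.append((other, page))
                del unpaired[i]
                break
        else:
            unpaired.append(page)  # no partner yet, keep it waiting
    return pairs

def read_byte(pair, offset, memory, faulty):
    """Read from the primary copy unless that offset is stuck there."""
    primary, backup = pair
    source = backup if offset in faulty[primary] else primary
    return memory[source][offset]

# Hypothetical fault map: PCM page number -> stuck-at byte offsets.
faulty = {10: {0, 5}, 11: {5, 9}, 12: {1, 2}}
pairs = pair_pages(faulty)
```

Here pages 10 and 11 both fail at offset 5, so they cannot be paired, while 10 and 12 can.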

ASPLOS'10: Inter-Core Cooperative TLB Prefetchers for Chip Multiprocessors

This paper proposes a mechanism that exploits the relationship between the TLBs of different cores in a CMP.

ASPLOS'10: Dynamic Filtering: Multi-Purpose Architecture Support for Language Runtime Systems

This paper proposes a hardware mechanism and instructions to detect the rare cases that are checked for repeatedly in STM and GC for certain address patterns.

Sunday, February 12, 2012

ASPLOS'10: Flexible Architectural Support for Fine-Grain Scheduling

Fine-grain scheduling, which schedules small threads of only a few thousand instructions, requires exchanging a large amount of information across multiple levels of cache and memory. This costs about 100 cycles per scheduling operation, which is unacceptable for such small threads. On the other hand, a purely hardware scheduler is not flexible enough to support multiple scheduling algorithms. So this paper proposes a flexible message-exchange mechanism that a software scheduler can use to avoid crossing multiple levels of cache while preserving flexibility.

ISCA'10: Necromancer: Enhancing System Throughput by Animating Dead Cores

In manufacturing test, any core that fails is disabled, but such cores are mostly correct and have only minor defects. So this paper uses them to run ahead and feed a simpler core with useful hints, such as branch targets and cache prefetches, which leads to better performance on the simpler core.

ISCA'10: Elastic Cooperative Caching: An Autonomous Dynamically Adaptive Memory Hierarchy for Chip Multiprocessors

This paper proposes an interesting cache design in which every cache is divided into a private part and a shared part. The private part serves only the attached core, while the shared part can be allocated to other cores, effectively giving those cores a much larger private cache.

ISCA'10: Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance

This paper proposes dividing the SIMD lanes of a warp into two separate sets when the lanes diverge because of different branch outcomes or memory latencies. These sets can then be executed as interleaved threads.
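To make the splitting step concrete, here is a small sketch for the branch case only, assuming each set of lanes is tracked by a program counter and an active-lane bitmask. `WarpSplit` and `subdivide_on_branch` are hypothetical names of mine, and the scheduling and re-convergence logic is left out.

```python
from dataclasses import dataclass

@dataclass
class WarpSplit:
    pc: int    # next instruction for this set of lanes
    mask: int  # bit i set means lane i belongs to this split

def subdivide_on_branch(split, taken_mask, target_pc, fallthrough_pc):
    """Split a warp into taken and not-taken lane sets on a divergent branch."""
    taken = split.mask & taken_mask
    not_taken = split.mask & ~taken_mask
    if taken == 0:       # no divergence: every lane falls through
        return [WarpSplit(fallthrough_pc, not_taken)]
    if not_taken == 0:   # no divergence: every lane takes the branch
        return [WarpSplit(target_pc, taken)]
    # Divergence: two independently schedulable splits
    return [WarpSplit(target_pc, taken), WarpSplit(fallthrough_pc, not_taken)]

warp = WarpSplit(pc=0, mask=0b1111)  # 4-lane warp, all lanes active
splits = subdivide_on_branch(warp, taken_mask=0b0101, target_pc=8, fallthrough_pc=4)
```

A scheduler could then interleave the two splits like separate threads, so one can make progress while the other waits.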

ISCA'10: A Case for FAME: FPGA Architecture Model Execution

This paper surveys many types of FPGA emulation systems.

Direct FAME maps the target RTL directly onto the FPGA; this is the approach we currently use.
Decoupled FAME uses several host cycles to simulate a single cycle of a complex device, such as a multi-port register file.
Multithreaded FAME uses a single pipeline and many sets of state to simulate many copies of a device, such as cores.
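The multithreaded style can be pictured with a toy software model, assuming a trivial made-up "core" whose whole state is a `pc` and an accumulator. One shared step function plays the role of the single host pipeline, and each target core only contributes its own state set.

```python
def step_core(state):
    """One target cycle of a toy core: accumulate the pc, then advance it."""
    return {"pc": state["pc"] + 1, "acc": state["acc"] + state["pc"]}

def run_multithreaded(states, target_cycles):
    """Time-multiplex one host pipeline over all target cores, round-robin."""
    host_cycles = 0
    for _ in range(target_cycles):
        for i in range(len(states)):
            states[i] = step_core(states[i])  # same logic, different state set
            host_cycles += 1                  # one host cycle per core per target cycle
    return host_cycles

states = [{"pc": 0, "acc": 0}, {"pc": 10, "acc": 0}]
host_cycles = run_multithreaded(states, target_cycles=3)
```

In this model the cost shows up as host cycles: simulating N cores for one target cycle takes N host cycles, in exchange for instantiating the core logic only once.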

ISCA'10: Translation Caching: Skip, Don’t Walk (the Page Table)

Every TLB miss may require several DRAM accesses, one for each page table level.

MMU caches store those page table entries, much as a data cache stores data.

There are many variants, such as a unified one that stores entries of all levels in one cache, or a split one that stores the entries of each level in a separate cache. Another distinction is between a page table cache, which is indexed by the physical address of the entries, and a translation cache, which stores the translation result and is therefore indexed by virtual address.
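A small model of the "skip" idea, assuming a 4-level table with 9 index bits per level and nested dicts standing in for page-table pages. `Walker`, `tcache`, and the helper names are my own. The translation cache is keyed by virtual-address index prefixes, so a hit on a long prefix lets the walk start near the leaf instead of at the root.

```python
LEVELS = 4  # assumed depth, as in a 4-level x86-64-style table
BITS = 9    # assumed index bits per level

def split_vpn(vpn):
    """Split a virtual page number into per-level indices, top level first."""
    mask = (1 << BITS) - 1
    return [(vpn >> (BITS * (LEVELS - 1 - lvl))) & mask for lvl in range(LEVELS)]

def join_vpn(idx):
    """Inverse of split_vpn, handy for building test addresses."""
    vpn = 0
    for i in idx:
        vpn = (vpn << BITS) | i
    return vpn

class Walker:
    def __init__(self, root):
        self.root = root       # nested dicts stand in for page-table pages
        self.tcache = {}       # index prefix (virtual) -> next-level table
        self.mem_accesses = 0  # counts simulated DRAM accesses

    def walk(self, vpn):
        idx = split_vpn(vpn)
        node, start = self.root, 0
        # Look for the longest cached prefix so upper levels can be skipped.
        for n in range(LEVELS - 1, 0, -1):
            hit = self.tcache.get(tuple(idx[:n]))
            if hit is not None:
                node, start = hit, n
                break
        for lvl in range(start, LEVELS):
            node = node[idx[lvl]]  # one memory access per remaining level
            self.mem_accesses += 1
            if lvl < LEVELS - 1:
                self.tcache[tuple(idx[:lvl + 1])] = node
        return node                # physical frame number

root = {1: {2: {3: {4: 100, 5: 101}}}}
walker = Walker(root)
frame_a = walker.walk(join_vpn([1, 2, 3, 4]))
after_first = walker.mem_accesses
frame_b = walker.walk(join_vpn([1, 2, 3, 5]))
after_second = walker.mem_accesses
```

The first walk touches all four levels. The second shares the upper three indices, so it hits the cache and needs only the leaf access.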
