This paper is very similar to the previous one, except that it stores the original program on disk and randomizes it while loading it into memory.
Its key is very long, possibly as long as the program itself.
Sunday, February 19, 2012
CCS'03: Countering Code-Injection Attacks With Instruction-Set Randomization
This paper proposes an approach to counter code-injection attacks. It stores a key for de-randomizing the randomized program. When the OS schedules the program to run, it loads this key into a write-only register in the CPU, and the CPU de-randomizes the incoming instruction stream before actually processing it.
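To make the de-randomization step concrete, here is a minimal sketch that assumes the randomization is a plain XOR with a repeating key. The function names, the fixed key, and the byte values are illustrative assumptions, not the paper's actual encoding or hardware interface.

```python
def randomize(code: bytes, key: bytes) -> bytes:
    """XOR every code byte with the (repeating) key. Assumed scheme, for illustration."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(code))

# XOR is its own inverse, so de-randomizing is the same operation.
derandomize = randomize

# Hypothetical key that the OS would load into the write-only register.
key = bytes([0x5A, 0xA5, 0x3C, 0xC3])

# A legitimate program is randomized ahead of time with the key.
program = bytes([0x90, 0x90, 0x31, 0xC0, 0xC3])
stored = randomize(program, key)

# At fetch time the CPU de-randomizes and recovers the original instructions.
executed = derandomize(stored, key)

# Injected code was never randomized, so de-randomizing it yields garbage.
injected = bytes([0xCC] * 5)
garbled = derandomize(injected, key)
```

Here `executed` equals `program`, while `garbled` differs from `injected`, which is why code injected by an attacker who does not know the key is unlikely to run as intended.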
Saturday, February 18, 2012
ASPLOS'10: Speculative Parallelization Using Software Multi-threaded Transactions
This paper proposes a software transactional memory system that can maintain atomicity across multiple threads.
ASPLOS'10: COMPASS: A Programmable Data Prefetcher Using Idle GPU Shaders
This paper proposes using the GPU as a programmable unit that runs a GPU program to help prefetch data for the CPU.
ASPLOS'10: Dynamically Replicated Memory: Building Reliable Systems from Nanoscale Resistive Memories
This paper proposes adding another level of page table that maps physical addresses to real addresses in PCM. This mapping can map a physical page to two PCM pages that have different stuck-at fault locations. In this way, faulty PCM pages can still be used instead of being discarded.
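The pairing idea can be sketched as follows. This is only a toy model: the greedy pairing, the byte-granularity fault sets, and all names (`pair_pages`, `read_byte`, `fault_map`) are assumptions for illustration, not the paper's algorithm.

```python
def compatible(faults_a: set, faults_b: set) -> bool:
    """Two pages can back each other up if no offset is faulty in both."""
    return faults_a.isdisjoint(faults_b)

def pair_pages(fault_map: dict) -> list:
    """Greedily pair faulty pages whose fault locations do not overlap."""
    unpaired = list(fault_map)
    pairs = []
    while unpaired:
        a = unpaired.pop(0)
        for b in unpaired:
            if compatible(fault_map[a], fault_map[b]):
                pairs.append((a, b))
                unpaired.remove(b)
                break  # stop iterating right after mutating the list
    return pairs

def read_byte(pair, offset, fault_map, pcm):
    """Read from the primary page unless that offset is stuck there."""
    primary, backup = pair
    source = backup if offset in fault_map[primary] else primary
    return pcm[source][offset]

# Hypothetical fault map: PCM page number -> set of faulty byte offsets.
fault_map = {0: {1, 5}, 1: {5}, 2: {2, 3}}
pairs = pair_pages(fault_map)
```

With this fault map, pages 0 and 1 both fail at offset 5 and cannot be paired, so page 0 is paired with page 2 and page 1 stays unpaired.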
ASPLOS'10: Inter-Core Cooperative TLB Prefetchers for Chip Multiprocessors
This paper proposes a mechanism that exploits the relationship between the TLBs of different cores in a CMP.
ASPLOS'10: Dynamic Filtering: Multi-Purpose Architecture Support for Language Runtime Systems
This paper proposes a hardware mechanism and instructions to detect the rare cases that are checked repeatedly in STM and GC for certain address patterns.
Sunday, February 12, 2012
ASPLOS'10: Flexible Architectural Support for Fine-Grain Scheduling
Fine-grain scheduling, which schedules small threads of only thousands of instructions, requires exchanging a large amount of information across multiple levels of cache and memory. This costs about 100 cycles per scheduling operation, which is unacceptable for such small threads. On the other hand, a purely hardware scheduler is not flexible enough to support multiple scheduling algorithms. So this paper proposes a flexible message exchange mechanism that a software scheduler can use to avoid crossing multiple levels of cache while preserving flexibility.
ISCA'10: Necromancer: Enhancing System Throughput by Animating Dead Cores
In manufacturing test, any core that fails is disabled, but such cores are mostly correct, with only minor defects. So this paper uses them to run ahead and guide a simpler core with useful hints, such as branch destinations and cache prefetches, which leads to better performance on the simpler core.
ISCA'10: Elastic Cooperative Caching: An Autonomous Dynamically Adaptive Memory Hierarchy for Chip Multiprocessors
This paper proposes an interesting cache design in which every cache is divided into a private part and a shared part. The private part is only for the attached core, while the shared part can be allocated to other cores, giving those cores a much larger effective private cache.
ISCA'10: Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance
This paper proposes dividing the SIMD lanes into two separate sets when the lanes diverge because of differences in branch outcome or memory latency. These sets can then be executed as interleaved threads.
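A minimal sketch of the split on branch divergence, assuming a warp is just a list of active lane indices and each lane has a known branch outcome. The function name and data layout are hypothetical and ignore the memory-divergence case and re-convergence.

```python
def subdivide(active_lanes, takes_branch):
    """Split a warp's active lanes into up to two sets by branch outcome.

    active_lanes: list of lane indices currently active in the warp.
    takes_branch: per-lane boolean, indexed by lane number.
    Returns the non-empty sets, which a scheduler could then interleave.
    """
    taken = [lane for lane in active_lanes if takes_branch[lane]]
    not_taken = [lane for lane in active_lanes if not takes_branch[lane]]
    return [lanes for lanes in (taken, not_taken) if lanes]

# Lanes 0 and 2 take the branch, lanes 1 and 3 fall through: two sets.
diverged = subdivide([0, 1, 2, 3], [True, False, True, False])

# All lanes agree, so the warp is not subdivided.
uniform = subdivide([0, 1], [True, True])
```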
ISCA'10: A Case for FAME: FPGA Architecture Model Execution
This paper surveys many types of FPGA emulation systems.
Direct FAME maps the RTL directly onto the FPGA, which is what we currently use.
Decoupled FAME uses many cycles to simulate a single cycle of a complex device, such as a multi-port register file.
Multithreaded FAME uses a single pipeline and many sets of state to simulate many copies of a component, such as cores.
ISCA'10: Translation Caching: Skip, Don’t Walk (the Page Table)
Every TLB miss may require several DRAM accesses, one for each page table level.
MMU caches store those page table entries, much like a data cache.
There are many variants, such as a unified cache that stores entries of all levels in one cache, or a split cache that stores the entries of each level in a separate cache. Another distinction is between a page table cache, which is indexed by the physical address of the entries (that is, by where to find them), and a translation cache, which stores translation results and is therefore indexed by virtual address.
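The benefit of a translation cache indexed by virtual address can be sketched with a toy page walker. The 4-level layout with 9-bit indices and 4 KB pages, the nested-dict page table, and the names `map_page`, `walk`, and `tcache` are assumptions for illustration, not the paper's design.

```python
LEVELS = 4        # assumed 4-level page table
BITS = 9          # assumed 9 index bits per level
PAGE_SHIFT = 12   # assumed 4 KB pages

def indices(vaddr):
    """Split a virtual address into one index per page table level."""
    vpn = vaddr >> PAGE_SHIFT
    mask = (1 << BITS) - 1
    return [(vpn >> (BITS * (LEVELS - 1 - i))) & mask for i in range(LEVELS)]

def map_page(root, vaddr, frame):
    """Install a mapping in a page table modeled as nested dicts."""
    idx = indices(vaddr)
    node = root
    for i in idx[:-1]:
        node = node.setdefault(i, {})
    node[idx[-1]] = frame

def walk(root, vaddr, tcache):
    """Return (frame, page table accesses), skipping levels cached in tcache."""
    idx = indices(vaddr)
    node, start = root, 0
    # Look for the longest virtual-address prefix already translated.
    for n in range(LEVELS - 1, 0, -1):
        hit = tcache.get(tuple(idx[:n]))
        if hit is not None:
            node, start = hit, n
            break
    accesses = 0
    for level in range(start, LEVELS):
        node = node[idx[level]]   # one memory access per remaining level
        accesses += 1
        if level < LEVELS - 1:
            tcache[tuple(idx[:level + 1])] = node
    return node, accesses

root, tcache = {}, {}
map_page(root, 0x7F0000001000, 42)
map_page(root, 0x7F0000002000, 43)
first = walk(root, 0x7F0000001000, tcache)
second = walk(root, 0x7F0000002000, tcache)
```

The first walk touches all four levels. The second address shares the upper three indices, so the walk skips straight to the last level and needs only one access.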