The hypervisor can still access these VM's private information be calling this novel hardware.
2011年11月30日星期三
MICRO'11: Architectural Support for Secure Virtualization under a Vulnerable Hypervisor
To prevent the hypervisor from reading the guest VM's memory, this paper proposes hardware mechanism to hide the VM's status and memory owner mapping table from the hypervisor.
MICRO'11: Towards the Ideal On-chip Fabric for 1-to-Many and Many-to-1 Communication
This paper proposes a NOC that support 1-M broadcast/multicasr and M-1 aggregation.
For 1-M, it can balance the usage of X and Y links.It also use matrix-like crossbar with tri-state driver to easy the layout and reduce power. It also reduce the traditional 3 stages pipeline to 1 cycle by sending the advance request before sending flit, such that the next router can perform arbitration before the flit actually arrive.
To aggregate the ACK messages, every ACK message arrive first will become the master at that router, and then each arriving ACK will be merged with the master by adding its ACK number to master's. After a predefined time, no matter whether the master had aggregate all ACKs, it will be send to upper stream router.
2011年11月27日星期日
CAV'11: Linear Completeness Thresholds for Bounded Model Checking
This paper find a set of LTL formulas that contains only F operator, is cliquey. That is, such formulas's corresponding Buchi automata's each SCC have bidirection edge between every two edges.
So when produced with a kripke structure M with recurence diameter rd, the product machine's diameter is a linear function of rd.
This paper also give an algorithm to computer an even tighter approximation of the product machine's diameter.
2011年11月25日星期五
CAV'11: Interactive Synthesis of Code Snippets
This paper improves PLDI'05 paper "Jungloid mining: helping to navigate the API jungle" by supporting parametric type that can lead to multiple ground type.
CAV'11: A Specialization Calculus for Pruning Disjunctive Predicates to Support Verification
The manipulation of seperation logics typically requires the unfolding of disjunctive predicates which may lead to expensive proof search.
This paper proposes to rule out invalid instances of predicates before unfolding, by propagating hidden properties to prune infeasible disjuncts.
2011年11月24日星期四
CAV'10: Smoothing a Program Soundly and Robustly
Based on previous work in "Smooth interpretation" PLDI'10, this paper propose three correctness requirements that make sure smoothing approach can be correctly applied.
1 soundness: general smoothing problem is undecidable, so an approximation is needed, which computes an k-dimension interval <{l_1,u_i},...,{l_k,u_k}> for each input. The soundness requirement is that the output ideal smooth function should be in this interval.
2 robustness: the resulting smooth function must have derivations of all order.
3 β-robustness: partial derivative is bounded by a function linear in x and β.A large number of numerical search routines work by sampling the input space under the expectation that the derivative around each sample point is well defined.
PLDI'10: Smooth interpretation
In digital signal processing, there is Gaussian smoothing technology that can remove the noise of a signal by computing the convolution of the signal and Gaussian function.
This paper proposes to do the same on a simple imperative program that contains while loop, if statement, assign statement and Boolean/floating expression. This simple syntax seems somewhat like fortran.
This paper first perform symbolic simulation on this program to obtain its denotational semantics. And then take the convolution of this semantics and N(x,β), a joint density function of k independent normal variables, each with mean 0 and standard deviation β.
The resulting convolution will always be smooth, and the higher β, the smoother the convolution.
Thus, this smoothed program can be analysized by numerical optimization techniques. This paper uses technology in finding optimal parameter value of PID controller in physical-cyber system.
CAV'11: Quantitative Synthesis for Concurrent Programs
This paper construct the product machine of the environment, the scheduler, the performance model and the partial program. The partial program contains some unspecified predicates that need to be inferred by the algorithm.
And then it is reduced to 2-player game with probabilistic transitions.
The most interesting thing to me is the probabilistic transition that computes a distribution on the next state set.
2011年11月23日星期三
CAV'11: Resolution Proofs and Skolem Functions in QBF Evaluation and Applications
This paper first mentions some concepts that are not familiar to me:
1 Universal reduction : all universal qualified variables at the inner most layer can be removed
2 Q-resolution : first apply normal resolution to existential variables and then apply universal reduction, repeat this process again and again
3 Any QBF can be converted to skolem normal form that contains only two level of quantifiers, first existential and then universal. In this conversion, every existential variable can be replaced with a function F_x, that refers only to inner layer of universal variables. These functions are then existential quantified before those original universal variables.
This paper then describe an algorithm similar to craig interpolant, to transform the resolution proof of the QBF into the skolem functions.
2011年11月22日星期二
MICRO'10: Register Cache System not for Latency Reduction Purpose
This paper shows some astonished facts about the huge register files in wide superscalar processors:
1 The reg file is almost the same size and latency of the L1 cache
2 the reg file requires 2 or even 3 cycles to access
3 deeper pipeline for reg file increase the penalty of the speculation and also result in result shortage that leads to pipeline stall
4 deeper pipeline for reg file also leads to deep bypass network with large number of mux and long wire.
reg cache can store the most used reg file content, and leads to reduced cycle time and cycle number and power.
MICRO'10: Combating Aging with the Colt Duty Cycle Equalizer
In modern CMOS process, the uneven usage of devices will make some heavily used devices approaching their failue very fast.
For example, adders are often used to add a small constant to a variable, so some devices in the higher bits of this adder are always in a constant state.
At the same time, data cache often contains more 0s than 1s.
So this paper proposes to use these devices in a more even way.
For data path components, it proposes to flip it into complementary mode instantly, in which all data are represented in 1's complementary.
For data cache, it proposes to instantly flip it into complementary mode, in which all data are stored in their 1's complementary.
For cache that suffers from uneven usage, it use LFSR to hash the index pointer, to prevent some address from heavily used, while other are not used.
For data components that suffer from uneven usage of left and right operand, it propose to swap these operand instnatly.
2011年11月21日星期一
MICRO'10:Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?
This paper shows that for current processors, the clock and control consume 24% power, data supply, such as data cache, consume 28% power, and instruction supply consume 42% power, the real computation only consume 6%. So over 90% power is overhead.
So it seems that combining general purpose cpu, fpga, asic and gpu in a chip is a good solution. But this paper does not give a clear conclusion yet, only bunch of figure that show trend.
MICRO'10:Voltage Smoothing: Characterizing and Mitigating Voltage Noise in Production Processors via Software-Guided Thread Scheduling
This paper shows that the actions taken by the cores, such as TLB miss, branch mispredication, clock gating, will cause supply voltage swing. But actual swing is often smaller than the swing margin of the designer. So the processors are normally run in a too conservative mode, which leads to too low voltage, and therefore too slow speed.
So many papers propose to run the processors with smaller margin, and use some mechanism to recover when errors occur.
This paper proposes to measure the relationship between the voltage swing and the job run on the processor, and extrapolante this relation to wider range. It also proposes a software scheduler that help smooth the voltage swing by balancing the load on cores.
2011年11月19日星期六
MICRO'10 : Probabilistic Distance-based Arbitration: Providing Equality of Service for Many-core CMPs
In a NOC that connect multiple PE, if the router is round-robin, then a PE that is far from its communication destination can be treated very unfair in getting bandwidth, while a PE nearer can get more bandwidth.
This paper proposes use the traveling distance of a package to control the arbitrator in a router, such that fairness can be guaranteed.
MICRO'10:Fractal Coherence: Scalably Verifiable Cache Coherence
This paper proposes a cache coherence protocol whose shape is fractal, which is a special case of induction.
For example, if the whole system is a tree, then for a tree of height n, this paper's approach expand one of its leaf to a small tree of height 2, then this new tree is of height n+1.
Then along the border of this newly expanded small tree, this paper tries to prove that both side can observe the other side just the SAME as the old tree.
Here, SAME doesn't just mean one to one state corresponding, instead it means one side in the new tree can have a temporal mapping of its state sequence to a single state in the old tree.
2011年11月18日星期五
MICRO'10: Virtual Snooping: Filtering Snoops in Virtualized Multi-cores
In a multi-core system running multiple Virtual machines, each VM runs on a relatively fixed set of cores, so this paper proposes a mechanism that perform snoopy based cache coherent within VM boundary.
For those rare cases that cache coherent may cross VM boundary, for example, communication between VMs and hypervisor, VM migration to new core, this mechanism provides some techniques to deal with them.
MICRO'10: Adaptive Flow Control for Robust Performance and Energy
This paper proposes an NOC, whose individual router can switch between backpressured and backpressureless mode. The advantage of the former mode is its high throughput, while the latter don't use buffer in low through put application, which leads to low power overhead. The backless mode just drop contention flit or route it ramdomlly.
When two adjacent routers are in different mode, the backpressureless router will change back to backpressure mode, such that it can communicate correctly with its neighbor.
2011年11月17日星期四
MICRO'10: ScalableBulk: Scalable Cache Coherence for Atomic Blocks in a Lazy Environment
Some new architecture support execution of an atomic block of instruction(chunk), and commit their result at the end of the trunk.
When multiple processors commit at the same time, it is desirable that it is scalable.
This paper's approach:
(i) need no centralized structure,
(ii) communicate only with the relevant directories, and
(iii) allow the concurrent commit of chunks that use the same directory, as long as their addresses do not overlap.
2011年11月4日星期五
MICRO10 : Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks
Based on the observation that multiple flit in a package normally traverse through the same input-output pair in a router, so it is not necessary to arbitrate for every flit. So this paper propose to remember the last arbitrate result and use it for the next flit.
MICRO10: Erasing Core Boundaries for Robust and Configurable Performance
Dynamically merging computation components and pipeline states to form a processor with wider data-path or a group of simple processors with narrow data-path.
This approach can also remove defective component to increase reliability.
MICRO10: Parichute: Generalized Turbocode-Based Error Correction for Near-Threshold Caches
To reduce power consumption, the cache is driven by very low voltage that is near the threshold. But this may significantly increase the possibility of error. so this paper proposes to detect and correct these error with turbo code.
MICRO10:Minimal Multi-Threading: Finding and Removing Redundant Instructions in Multi-Threaded Processors
Detecting instruction flow similarity between multiple thread, and only split them when they expose different computation.
订阅:
博文 (Atom)