Maximus
February 2nd, 2011, 06:31
hello,
roaming the net i've found this:
http://www.agner.org/optimize/#manuals
worth a look, especially for the updated instruction latency/throughput tables: very useful information when speed-optimizing a critical code snippet.
(...not sure if it is legal to download them and put them in the collaborative library, but surely interesting stuff).
(! Sandy Bridge measurements too? cool. Too bad there aren't Bulldozer measurements yet)
-----
interesting: on Sandy Bridge, INC, DEC, NEG and NOT take 3 µops with a memory operand and 1 with a register operand, whereas ADD/SUB take 2 and 1. It looks like Intel kept the old approach of adding 1 µop to 'fix' the flags for INC/DEC, but only for the memory form, inc [mem] (on the older P4, INC took 2 µops: 1 for the add and 1 for the flag fix).
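To make the difference concrete, here is a rough sketch in Python that just adds up the µop counts quoted above for a made-up instruction sequence. The table holds only the numbers from this post (not copied from the manuals), and the names UOPS and total_uops plus the example sequences are hypothetical, for illustration only.

```python
# Micro-op counts on Sandy Bridge as quoted in the post above.
# Assumption: these are the poster's figures, not an authoritative table.
UOPS = {
    ("inc", "reg"): 1,
    ("inc", "mem"): 3,
    ("add", "reg"): 1,
    ("add", "mem"): 2,
}


def total_uops(sequence):
    """Sum the micro-op counts for a list of (mnemonic, operand_kind) pairs."""
    return sum(UOPS[op] for op in sequence)


# Hypothetical code snippet: bump a counter in memory four times,
# once with INC [mem] and once with ADD [mem], 1.
with_inc = [("inc", "mem")] * 4
with_add = [("add", "mem")] * 4

print(total_uops(with_inc))  # 12
print(total_uops(with_add))  # 8
```

By these numbers, replacing inc [mem] with add [mem], 1 would save one µop per instruction; whether that shows up as a real speedup depends on the surrounding code, so measure it.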