-O2/-O3
-tp {athlonxp | k8-32 | k8-64 | px | p5 | p6 | p7}
-tp {p1 | p2 | p5 | p6 | p7} (p6 = Pentium 3; p7 = Pentium 4, P4 Xeon and Prescott)
-arch - for example, -arch pn4 optimizes for the Intel Pentium 4 processor
-MSafepointer - local/global/argument pointers are safe
-fno-alias - assume that pointer aliasing won't happen in the program
-fno-fnalias - assume no aliasing within functions, but don't assume aliasing can't happen between calls
-ansi-alias, -assume dummy_aliases
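Why the no-alias flags are unsafe when pointers really do overlap can be illustrated with a small model. The sketch below is only an analogy in Python, not compiler output: two offsets into one list stand in for two pointers, and the function names are made up for this example. One function copies in source order; the other loads everything before storing, which is the kind of reordering an optimizer may perform once it has been told the regions never overlap.

```python
def copy_in_source_order(buf, dst, src, n):
    """Copy n elements one at a time, exactly as the loop is written."""
    for i in range(n):
        buf[dst + i] = buf[src + i]
    return buf


def copy_assuming_no_alias(buf, dst, src, n):
    """Load every source element first, then store them all.

    This mimics a reordering that is only valid when the source and
    destination regions do not overlap.
    """
    loaded = buf[src:src + n]      # all loads happen up front
    buf[dst:dst + n] = loaded      # then all stores
    return buf


# Overlapping regions: the two strategies disagree.
overlap_a = copy_in_source_order([1, 2, 3, 4, 5], dst=1, src=0, n=4)
overlap_b = copy_assuming_no_alias([1, 2, 3, 4, 5], dst=1, src=0, n=4)
print(overlap_a)  # [1, 1, 1, 1, 1]
print(overlap_b)  # [1, 1, 2, 3, 4]

# Disjoint regions: both strategies agree, so the assumption is harmless.
disjoint_a = copy_in_source_order([1, 2, 3, 0, 0, 0], dst=3, src=0, n=3)
disjoint_b = copy_assuming_no_alias([1, 2, 3, 0, 0, 0], dst=3, src=0, n=3)
print(disjoint_a == disjoint_b)  # True
```

If a program contains the overlapping case, asserting "no aliasing" to the compiler can silently change its results.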
-Mscalarsse - do all scalar floating-point ops with SSE/SSE2
-Mvect=sse - try to use SSE/SSE2 "packed" instructions to speed up vectorizable loops.
-Mneginfo={concur|loop} - display why loops are not being vectorized or parallelized
-x {K | W | N | B | P} (K = P3, W = P4, N = P4 with enhancements, B = Pentium M (mobile), P = Prescott/Nocona)
-vec_report[n] - Intel equivalent of -Mneginfo. n varies from 0-5 depending on information desired. -vec_report1 is the default, and displays vectorized loops. -vec_report2 displays both vectorized and non-vectorized loops.
See also the -par_report[n] flag under 'Auto-parallelizing'
-Mipa=fast - Enable inter-procedural optimizations
-ip - Enable inter-procedural optimizations
-ipo - Enable multifile inter-procedural optimizations
-pc {32 | 64 | 80} - sets precision of ops on FP stack
-Kieee - sets strict IEEE FP compliance
-pc {32 | 64 | 80} (same as PGI)
-mp - Intel equivalent of PGI's -Kieee. Floating point performance decreases with this option. Note, mp has an entirely different meaning in PGI! (see OpenMP).
-mp1 - improved floating point precision with reduced performance, but less of an impact than -mp
-fp_port, -prec_div
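The effect of working at a narrower floating-point precision can be sketched without a compiler. The example below is an illustration only: it does not set the FP stack precision the way -pc does, it merely rounds each intermediate result to a 32-bit float to show how results drift from the 64-bit ones. The helper names are hypothetical.

```python
import struct


def to_single(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_double(value, count):
    """Accumulate in ordinary 64-bit arithmetic."""
    total = 0.0
    for _ in range(count):
        total += value
    return total


def sum_single(value, count):
    """Same sum, but rounded to single precision after every step."""
    value = to_single(value)
    total = 0.0
    for _ in range(count):
        total = to_single(total + value)
    return total


print(sum_double(0.1, 10))
print(sum_single(0.1, 10))
```

The two printed sums differ in their trailing digits, which is the kind of discrepancy the precision flags above trade against speed.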
-Mprof - enable function or line-level profiling
-p - compiles and links for function profiling with gprof
-Munroll
-unroll[n] - Unroll loops up to n iterations. -unroll0 disables all loop unrolling. Plain -unroll makes the compiler use its default heuristics.
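For readers unfamiliar with the transformation, here is a hand-written sketch of unrolling by a factor of 4. It is a model of what the unroll flags ask the compiler to do automatically, written in Python for readability; the function names and the factor are choices made for this example.

```python
def sum_rolled(values):
    """Plain loop: one addition and one loop test per element."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    """Same sum with the loop body repeated 4 times per iteration."""
    total = 0
    n = len(values)
    main_end = n - (n % 4)         # largest multiple of 4 not exceeding n
    i = 0
    while i < main_end:            # unrolled main loop
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    for j in range(main_end, n):   # remainder loop for leftover elements
        total += values[j]
    return total


data = list(range(10))
print(sum_rolled(data))         # 45
print(sum_unrolled_by_4(data))  # 45
```

The unrolled version performs fewer loop tests per element, at the cost of larger code and a remainder loop.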
-Minline= - inlines functions and subroutines
-ip - does interprocedural optimizations within 1 file, including inline function expansion.
-ip_no_inlining - when used with -ip or -ipo, disables full and partial inlining
-Mconcur - tells compiler to attempt to auto-parallelize loops for SMP systems
-mp - handle OpenMP/SGI directives & pragmas. Note, mp has an entirely different meaning in Intel! (see 'Floating Point Precision').
-Minfo - compile-time optimization/parallelization messages
-openmp - Intel equivalent of -mp
-parallel - Probably similar to -Mconcur. Enables the auto-parallelizer to generate multithreaded code for loops that can be safely executed in parallel. Requires -O2 or -O3.
-par_report[n] - displays controllable amounts of details describing which loops were/were not auto-parallelized
-Mcache_align - align unconstrained objects that are >= 16 bytes to a cache line boundary. Note: comment at ClusterWorld was that this flag is important for Opteron performance
-align - see the documentation for the many ways to use this option. The simplest way is -align all
-byteswapio - swaps big-endian to little-endian and vice versa when reading bytes via input/output of unformatted Fortran data files
-convert along with the F_UFMTENDIAN environment variable. There are many permutations of the -convert option; see the documentation for details
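What the endian-conversion flags have to deal with can be sketched as follows. The example assumes a record layout of a 4-byte length marker, the payload, and a repeated 4-byte length marker; that layout is common for sequential unformatted Fortran files but is compiler-dependent, so treat it as an assumption. The function names are hypothetical.

```python
import struct


def write_record(values, byteorder):
    """Pack doubles as one record: length marker, payload, length marker.

    byteorder is ">" for big-endian or "<" for little-endian.
    """
    payload = struct.pack(f"{byteorder}{len(values)}d", *values)
    marker = struct.pack(f"{byteorder}i", len(payload))
    return marker + payload + marker


def record_length(raw, byteorder):
    """Interpret the leading 4-byte marker in the given byte order."""
    return struct.unpack_from(f"{byteorder}i", raw, 0)[0]


def read_record(raw, byteorder):
    """Decode the payload of doubles that follows the leading marker."""
    length = record_length(raw, byteorder)
    return struct.unpack_from(f"{byteorder}{length // 8}d", raw, 4)


raw = write_record([1.5, -2.0], ">")   # record written on a big-endian machine
print(record_length(raw, ">"))         # 16
print(record_length(raw, "<"))         # 268435456
print(read_record(raw, ">"))           # (1.5, -2.0)
```

Read in the wrong byte order, even the record length is nonsense, which is why the runtime must swap bytes on every unformatted read or write when files cross architectures.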
-mcmodel=medium - needed to use > 2GB on Opteron.
-Msecond_underscore, -g77libs
-f77rtl, -intconstant
-S
-fast (includes -O2 -tp <target> -Munroll -Mnoframe -Mlre)
-fastsse - includes -fast -Mvect=sse -Mscalarsse -Mcache_align
-fast (includes -O3 -ipo -static)