This page is mainly a starting point; consult the respective PGI and Intel user's guides for details.

General optimization flags (-O2/-O3)

PGI: -O2/-O3
Intel: (same)
Does: Scheduling, register allocation and global optimizations

Target Processor Optimizations (-tp)

PGI:
  • -tp {athlonxp | k8-32 | k8-64 | px | p5 | p6 | p7}

Intel:

  • very similar, save for the AMD options: -tp {p1 | p2 | p5 | p6 | p7}
    (p6 = Pentium 3; p7 = Pentium 4, P4 Xeon and Prescott)
  • see also the -arch option: -arch pn4 optimizes for the Intel Pentium 4 processor

Pointer Aliasing

PGI:
  • -Msafeptr - assume local/global/argument pointers are safe (i.e. do not alias)

Intel:

  • -fno-alias - assume that pointer aliasing won't happen in the program
  • -fno-fnalias - assume no aliasing within functions, but don't assume aliasing can't happen between calls
  • see also -ansi-alias, -assume dummy_aliases
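
To illustrate why the aliasing flags above matter, here is a minimal C sketch (the function name is hypothetical and not taken from either compiler's documentation). Without an aliasing guarantee the compiler must assume that out may overlap a or b, which limits reordering and vectorization. The C99 restrict qualifier makes, per pointer, roughly the kind of promise the flags make for a whole compilation unit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: element-wise sum of two arrays.
   `restrict` promises the compiler that out, a and b never overlap,
   so loads from a and b need not be repeated after stores to out. */
void add_arrays(size_t n, double *restrict out,
                const double *restrict a, const double *restrict b)
{
    assert(out != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

If the promise is false (the arrays really do overlap), the result is undefined, which is also the risk of using the no-alias flags on code that aliases.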

Vectorizer

PGI:
  • -Mscalarsse - do all scalar floating-point ops with SSE/SSE2
  • -Mvect=sse - try to use SSE/SSE2 "packed" instructions to speed up vectorizable loops.
  • -Mneginfo={concur|loop} - display why loops are not being vectorized or parallelized

Intel:

  • -x {K | W | N | B | P} (K = P3, W = P4, N = P4 with enhancements, B = Pentium M (mobile), P = Prescott/Nocona)
  • -vec_report[n] - Intel equivalent of -Mneginfo. n varies from 0-5 depending on information desired. -vec_report1 is the default, and displays vectorized loops. -vec_report2 displays both vectorized and non-vectorized loops.
  • see also the -par_report[n] flag under 'Auto-parallelizing'
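
As a rough illustration of what the vectorizer reports distinguish, the following C sketch (hypothetical function names) contrasts a loop with independent iterations against one with a loop-carried dependence. The first is the kind of loop a vectorizer can usually turn into packed SSE code; the second would normally be reported as not vectorized. Actual behavior depends on the compiler version and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: a typical candidate for packed SSE code. */
void saxpy(size_t n, float alpha, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
}

/* Each iteration reads the result of the previous one (a loop-carried
   dependence), so the iterations cannot simply run side by side. */
void prefix_sum(size_t n, float *v)
{
    assert(v != NULL);
    for (size_t i = 1; i < n; ++i)
        v[i] += v[i - 1];
}
```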

Inter-Procedural Optimizations

PGI:
  • -Mipa=fast - Enable inter-procedural optimizations

Intel:

  • -ip - Enable inter-procedural optimizations
  • -ipo - Enable multifile inter-procedural optimizations

Floating Point Precision

PGI:
  • -pc {32 | 64 | 80} - sets precision of ops on FP stack
  • -Kieee - sets strict IEEE FP compliance

Intel:

  • -pc {32 | 64 | 80} (same as PGI)
  • -mp - Intel equivalent of PGI's -Kieee. Floating-point performance decreases with this option. Note: -mp has an entirely different meaning in PGI! (see 'Auto-parallelizing / OpenMP').
  • -mp1 - improved floating point precision with reduced performance, but less of an impact than -mp
  • see also: -fp_port, -prec_div

Profiling

PGI:
  • -Mprof - enable function or line-level profiling

Intel:

  • -p - compiles and links for function profiling with gprof

Loop Unrolling

PGI:
  • -Munroll

Intel:

  • -unroll[n] - unroll loops up to n times. -unroll0 disables all loop unrolling. Plain -unroll lets the compiler use its default heuristics.
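
For readers unfamiliar with the transformation, this is a hand-written C sketch of what unrolling by a factor of 4 looks like (the function name is hypothetical; the compilers do this internally and may pick a different factor). The loop body is repeated four times per iteration to reduce loop overhead, and a remainder loop handles the leftover elements.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of an array, manually unrolled by 4. The additions happen in the
   same order as in the plain loop, so the result is unchanged. */
double sum_unrolled4(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double s = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += v[i];
        s += v[i + 1];
        s += v[i + 2];
        s += v[i + 3];
    }
    for (; i < n; ++i)   /* remainder: at most 3 elements */
        s += v[i];
    return s;
}
```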

Inlining

PGI:
  • -Minline= - inlines functions and subroutines

Intel:

  • -ip - does interprocedural optimizations within a single file, including inline function expansion.
  • -ip_no_inlining - when used with -ip or -ipo, disables full and partial inlining

Auto-parallelizing / OpenMP

PGI:
  • -Mconcur - tells compiler to attempt to auto-parallelize loops for SMP systems
  • -mp - handle OpenMP/SGI directives & pragmas. Note, mp has an entirely different meaning in Intel! (see 'Floating Point Precision').
  • -Minfo - compile-time optimization/parallelization messages

Intel:

  • -openmp - Intel equivalent of -mp
  • -parallel - Probably similar to -Mconcur. Enables the auto-parallelizer to generate multithreaded code for loops that can be safely executed in parallel. Requires -O2 or -O3.
  • -par_report[n] - displays controllable amounts of details describing which loops were/were not auto-parallelized
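
As a small example of the directives that the OpenMP flags above enable, here is a C sketch of a dot product with an OpenMP reduction (the function name is hypothetical). When the OpenMP flag is not given, the pragma is ignored and the loop simply runs serially, so the same source builds either way.

```c
#include <assert.h>

/* Dot product; with OpenMP enabled the iterations are split across
   threads and the per-thread partial sums are combined at the end. */
double dot(const double *a, const double *b, int n)
{
    assert(n >= 0);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

A loop like this, with independent iterations and a simple reduction, is also the sort of loop an auto-parallelizer may handle without any directive.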

Cache Alignment

PGI:
  • -Mcache_align - align unconstrained objects that are >= 16 bytes to a cache line boundary. Note: a comment at ClusterWorld suggested that this flag is important for Opteron performance

Intel:

  • -align - see the documentation for the many ways to use this option. The simplest way is -align all

Byte Swapping

PGI:
  • -byteswapio - swaps between big-endian and little-endian byte order on input/output of unformatted Fortran data files

Intel:

  • -convert along with the F_UFMTENDIAN environment variable. There are many permutations of the -convert option; see the documentation for details
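
Conceptually, the conversion these options perform amounts to reversing the bytes of each multi-byte value. The C sketch below (hypothetical function names) shows this for 32-bit words. It is only an illustration of the idea, not how either Fortran runtime implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit value. */
uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* Convert a whole buffer of 32-bit words in place. */
void swap32_array(uint32_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        v[i] = swap32(v[i]);
}
```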

Memory Models

PGI:
  • -mcmodel=medium - needed to use > 2GB on Opteron.

Intel:

  • ???

Fortran Compatibility

PGI:
  • -Msecond_underscore , -g77libs

Intel:

  • -f77rtl, -intconstant

See both PGI and Intel docs for details

Assembly dump only

PGI: -Mkeepasm
Intel: -S

Combined Optimized flags (-fast)

PGI:
  • -fast (includes -O2 -tp <target> -Munroll -Mnoframe -Mlre)
  • -fastsse - includes -fast -Mvect=sse -Mscalarsse -Mcache_align

Intel:

  • -fast (includes -O3 -ipo -static)

See also...

-- MattWalsh - 23 Apr 2004

-- MattWalsh - 13 Mar 2006

Topic revision: r1 - 13 Mar 2006 - MattWalsh
 