1
0
Commit Graph

16849 Commits

Author SHA1 Message Date
Linus Torvalds
b88af99487 Merge tag 'pm-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "These fix two issues in the Operating Performance Points (OPP)
  framework, one cpufreq driver issue, one problem related to the tasks
  freezer and a few build-related issues in the cpupower utility.

  Specifics:

   - Fix tasks freezer deadlock in de_thread() that occurs if one of its
     sub-threads has been frozen already (Chanho Min).

   - Avoid registering a platform device by the ti-cpufreq driver on
     platforms that cannot use it (Dave Gerlach).

   - Fix a mistake in the ti-opp-supply operating performance points
     (OPP) driver that caused an incorrect reference voltage to be used
     and make it adjust the minimum voltage dynamically to avoid hangs
     or crashes in some cases (Keerthy).

   - Fix issues related to compiler flags in the cpupower utility and
     correct a linking problem in it by renaming a file with a duplicate
     name (Jiri Olsa, Konstantin Khlebnikov)"

* tag 'pm-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  exec: make de_thread() freezable
  cpufreq: ti-cpufreq: Only register platform_device when supported
  opp: ti-opp-supply: Correct the supply in _get_optimal_vdd_voltage call
  opp: ti-opp-supply: Dynamically update u_volt_min
  tools cpupower: Override CFLAGS assignments
  tools cpupower debug: Allow to use outside build flags
  tools/power/cpupower: fix compilation with STATIC=true
2018-11-23 10:52:57 -08:00
Vlad Dumitrescu
f11216b242 bpf: add skb->tstamp r/w access from tc clsact and cg skb progs
This could be used to rate limit egress traffic in concert with a qdisc
which supports Earliest Departure Time, such as FQ.

Write access from cg skb progs only with CAP_SYS_ADMIN, since the value
will be used by downstream qdiscs. It might make sense to relax this.

Changes v1 -> v2:
  - allow access from cg skb, write only with CAP_SYS_ADMIN

Signed-off-by: Vlad Dumitrescu <vladum@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-22 15:47:28 -08:00
Paolo Abeni
1d0795ea9c selftests: explicitly require kernel features needed by udpgro tests
commit 3327a9c463 ("selftests: add functionals test for UDP GRO")
make use of ipv6 NAT, but such a feature is not currently implied by
selftests. Since the 'ip[6]tables' commands may actually create nft rules,
depending on the specific user-space version, let's pull both NF and
NFT nat modules plus the needed deps.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: 3327a9c463 ("selftests: add functionals test for UDP GRO")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22 11:35:28 -08:00
Colin Ian King
ab85b01434 tools/bpf: fix spelling mistake "memeory" -> "memory"
The CHECK message contains a spelling mistake, fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-22 12:15:54 +01:00
Greg Kroah-Hartman
7c0bc65c84 Merge tag 'iio-for-4.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-testing
Jonathan writes:

First set of new device support, features and cleanups for IIO in the 4.21 cycle

Along with the headline feature of 5 new drivers, we have the
substantial addition of auxilliary sensor support on the lsm6sdx
parts for ST.  There has also been a good set of staging cleanup
in this period with more underway.

An ever increasing number of devices supported with just a new
ID which is a good sign that at least some manufacturers are
continuing to stabilise their interfaces.

New device support,
* ad7124
  - New driver supporting Analog Devices' ad7124-4 and ad7124-8 parts
    with the inevitable DT binding.
* ad7949
  - New driver supporting Analog Devices' ad7949, AD7682 and AD7689 ADCs.
* rm3100
  - New driver supporting PNIs RM3100 magnometer with bindings and
    vendor prefix.
* ti-dac7311
  - New driver supporting DAC7311, DAC6311 and DAC5311 TI DACs, with
    DT bindings.
* vcnl5035
  - New driver supporting the light sensor part of the VCNL4035, with
    DT bindings

Features,
* bindings
  - Add a generic ADC channel binding as we keep reinventing this
    wheel.
* adc128s052
  - Add IDs for additional pin compatible parts.
  - Add APCI ID seen on E3940 UP squared boards.
* ad_sigma_delta
  - Allow for custom data register overiding default.
* kxcjk1013
  - Add KIOX0009 ACPI ID as seen on the Acer One 10.
* lsm6dsx
  - Rework leading to...
  - External sensor support using the built in I2C master.
  - Initial support for a slave lis2mdl magnetometer.
* meson-saradc
  - Add temperature sensor support and bindings.
* st_magn
  - New ID for lsm9dsl_magn with bindings
  - New ID for lis3de accelerometer
* tpl0102
  - Add supprot for IIO_AVAIL_RANGE to report the range available
    from this device to userspace and in kernel users.

Cleanups and minor fixes
* tools
  - Allow outside specification of CFLAGS
* ad2s90
  - Handle and spi_read error.
  - Handle spi_setup failure
  - Drop a pointless assignment.
  - Prevent a potentail race by moving device registration to after
    all other setup.
  - Add missing scale attribute.
  - Add a sanity check on channel type before trying to read it.
* ad2s1210
  - Move to modern gpio descriptors.
  - Drop a gpioin flag which made no sense as far as we can tell.
  - Add dt table (bindings doc to follow when this is ready for
    moving out of staging).
* ad5933
  - Drop camel-case naming of ext_clk_hz.
  - White space fixes.
* ad7150
  - Local variable to shorten overly long line.
  - Alignment and line break fixes.
* ad7280a
  - Handle an error path that was previously ignored.
  - Use crc8.h to build the crc table replacing custom code.
  - Avoid unecessary cast.
  - Power down the device if an error happens in probe
  - Use devm routines to simplify probe and remove.
* ad7606
  - Alignment fixes.
* ad7780
  - This worked as long as by coincidence an uninitialized value
    was 0.  Lets not rely on that.
  - Ensure gain update is only used with the ad778x chips that
    actually support it.
  - Tidy up pattern mask generation.
  - Read regulator when scale is requested (which should be infrequent)
    as it might have changed from initialization.
* ad7816
  - Move to modern gpio descriptors
  - Don't use a busy_pin for ad7818 as there isn't one.
  - Ensure RD/WR and CONVST pins are outputs (previously they
    were brought up as inputs which doesn't seem to make any sense)
  - DT id table.
* adc128s052
  - SPDX
* adt7316
  - Alignment fix.
  - Fix data reading.  When using I2C the driver never actually
    used the value read.  This has been broken a very long time
    hence no rush to fix it now + the driver is undergoing a lot
    of cleanup.
  - Sanity check that the i2c read didn't fail to actually read
    anything.
* dpot-dac
  - Mark a switch full through with slightly different text so that
    gcc doesn't warn on it.
* gyro-adc
  - Fix a wrong file in the MAINTAINERS entry and add binding doc to the
    listed files.
* ina2xx
  - Add some early returns to clarify error paths in switch.
* lsm6dsx
  - MAINTAINERS entry.
* max11100
  - SPDX
* max9611
  - SPDX
* mcp4131
  - use of_device_get_match_data in preference to spi_get_device_id
    approach.
* rcar-adc
  - SPDX
* sc27xx
  - Add ADC conversion timeout support to avoid possible fault.
* ssp_sensors
  - Don't free managed resources manually.
* st-magn
  - Add a comment to avoid future confusion over when to use -magn
    postfix (on multi chip in package parts)
  - Add BDU register for LIS3MDL where it seems to have been missed.
* st-sensors
  - Minor spelling, grammar etc fixes.
* tpl0102
  - Use a pointer rather than an index of an array to improve conciseness.

* tag 'iio-for-4.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (80 commits)
  Staging: iio: adt7316: Add an extra check for 'ret' equals to 0
  Staging: iio: adt7316: Fix i2c data reading, set the data field
  dt-bindings: iio: adc: Add docs for ad7124
  iio: adc: Add ad7124 support
  dt-bindings: iio: adc: Add common ADCs properties to a separate file
  iio: ad_sigma_delta: Allow to provide custom data register address
  staging: iio: ad7816: Add device tree table.
  iio: imu: st_lsm6dsx: add entry in MAINTAINERS file
  iio: potentiometer: mcp4131: use of_device_get_match_data()
  staging: iio: adc: ad7280a: use devm_* APIs
  staging: iio: adc: ad7280a: power down the device on error in probe
  dt-bindings: iio: imu: st_lsm6dsx: add support to i2c pullup resistors
  iio: imu: st_lsm6dsx: add hw FIFO support to i2c controller
  iio: imu: st_lsm6dsx: add st_lsm6dsx_push_tagged_data routine
  iio: imu: st_lsm6dsx: add i2c embedded controller support
  iio: imu: st_lsm6dsx: introduce st_lsm6dsx_sensor_set_enable routine
  iio: imu: st_lsm6dsx: introduce ST_LSM6DSX_ID_EXT sensor ids
  iio: imu: st_lsm6dsx: remove static from st_lsm6dsx_set_watermark
  iio: imu: st_lsm6dsx: reload trimming parameter at bootstrap
  iio: imu: st_lsm6dsx: introduce locked read/write utility routines
  ...
2018-11-22 09:39:45 +01:00
Kan Liang
f4a0742b3c perf pmu: Move *_cpuid_str() weak functions to header.c
The weak functions, strcmp_cpuid_str() and get_cpuid_str(), are defined
in pmu.c.

Most of the cpuid related functions, including *_cpuid_str()'s
declaration and platform specific definition, are in header.c/h.

To make the declaration and definition of all cpuid related functions in
a consistent place, move the weak functions to header.c.

There is no functional change.

Suggested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: http://lkml.kernel.org/r/20181121164939.13482-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 22:39:59 -03:00
Eric Saint-Etienne
1e6285699b perf symbols: Fix slowness due to -ffunction-section
Perf can take minutes to parse an image when -ffunction-section is used.
This is especially true with the kernel image when it is compiled this
way, which is the arm64 default since the patcheset "Enable deadcode
elimination at link time".

Perf organize maps using a rbtree. Whenever perf finds a new symbols, it
first searches this rbtree for the map it belongs to, by strcmp()'aring
section names.  When it finds the map with the right name, it uses it to
add the symbol. With a usual image there aren't so many maps but when
using -ffunction-section there's basically one map per function.  With
the kernel image that's north of 40,000 maps. For most symbols perf has
to parses the entire rbtree to eventually create a new map and add it.
Consequently perf spends most of the time browsing a rbtree that keeps
getting larger.

This performance fix introduces a secondary rbtree that indexes maps
based on the section name.

Signed-off-by: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reviewed-by: David Aldridge <david.aldridge@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1542822679-25591-1-git-send-email-eric.saint.etienne@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 22:39:59 -03:00
Jiri Olsa
dd1d0044dd perf jvmti: Separate jvmti cmlr check
The Compiled Method Load Record (cmlr) is JDK specific interface to
access JVM stack info. This makes the jvmti agent code not compile under
another jdk, which does not support that.

Separating jvmti cmlr check into special feature check, and adding
HAVE_JVMTI_CMLR macro to indicate that.

Mark cmlr code in jvmti/libjvmti.c with HAVE_JVMTI_CMLR, so we can
compile it on system without cmlr support.

This change makes the jvmti compile with java-1.8.0-ibm package. It's
without the line numbers support, but the rest works.

Adding NO_JVMTI_CMLR compile variable for testing.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Gustavo Luiz Duarte <gduarte@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20181121154341.21521-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 22:39:58 -03:00
Kan Liang
ecd94f1be3 perf vendor events: Add JSON metrics for Cascadelake server
Add JSON metrics (based on event list v1) for Cascadelake server

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/3ab97c73-c197-8555-1a35-b54636e667e6@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 22:39:58 -03:00
Kan Liang
3b54411a44 perf vendor events: Add stepping in CPUID string for x86
The perf tools cannot find the proper event list for the Cascadelake
server.  Because the Cascadelake server and the Skylake server have the
same CPU model number, which are used by the perf tools to find the
event list.

The stepping for Skylake server is up to 4.

The stepping for Cascadelake server starts from 5.

The stepping can be used to distinguish between them.

The stepping is added in get_cpuid_str().

The stepping information for Skylake server is updated in mapfile.csv.

A x86 specific strcmp_cpuid_cmp() function is added to handle two CPUID
formats in mapfile.csv, "vendor-family-model-stepping" and
"vendor-family-model":

- If a cpuid-regular-expression from the mapfile.csv using the new
  stepping format, a cpuid-string generated on the machine must include
  stepping. Otherwise, it is a mismatch.

- If the cpuid-regular-expression using the old non-stepping format,
  the stepping in the cpuid-string will be ignored.

The script, using environment string "PERF_CPUID" without stepping on
Skylake server, will be broken. If so, users must fix their scripts.

Committer notes:

Fixed this build error on centos:6 and debian:7:

  arch/x86/util/header.c: In function 'is_full_cpuid':
  arch/x86/util/header.c:82:39: error: declaration of 'cpuid' shadows a global declaration [-Werror=shadow]
  arch/x86/util/header.c:12:1: error: shadowed declaration is here [-Werror=shadow]
  arch/x86/util/header.c: In function 'strcmp_cpuid_str':
  arch/x86/util/header.c:98:56: error: declaration of 'cpuid' shadows a global declaration [-Werror=shadow]
  arch/x86/util/header.c:12:1: error: shadowed declaration is here [-Werror=shadow]
  cc1: all warnings being treated as errors

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20181114212416.15665-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 22:39:57 -03:00
Ravi Bangoria
eb08d00605 perf stat: Use perf_evsel__is_clocki() for clock events
We already have function to check if a given event is either
SW_CPU_CLOCK or SW_TASK_CLOCK. Utilize it.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: yuzhoujian@didichuxing.com
Link: http://lkml.kernel.org/r/20181115095533.16930-1-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 22:39:57 -03:00
Ben Hutchings
11a64a05dc perf pmu: Suppress potential format-truncation warning
Depending on which functions are inlined in util/pmu.c, the snprintf()
calls in perf_pmu__parse_{scale,unit,per_pkg,snapshot}() might trigger a
warning:

  util/pmu.c: In function 'pmu_aliases':
  util/pmu.c:178:31: error: '%s' directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=]
    snprintf(path, PATH_MAX, "%s/%s.unit", dir, name);
                               ^~

I found this when trying to build perf from Linux 3.16 with gcc 8.
However I can reproduce the problem in mainline if I force
__perf_pmu__new_alias() to be inlined.

Suppress this by using scnprintf() as has been done elsewhere in perf.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20181111184524.fux4taownc6ndbx6@decadent.org.uk
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 22:39:56 -03:00
Pu Wen
4787eff3fa perf tools: Add Hygon Dhyana support
The tool perf is useful for the performance analysis on the Hygon Dhyana
platform. But right now there is no Hygon support for it to analyze the
KVM guest os data. So add Hygon Dhyana support to it by checking vendor
string to share the code path of AMD.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1542008451-31735-1-git-send-email-puwen@hygon.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 22:39:56 -03:00
Davidlohr Bueso
231457ec70 perf bench: Add epoll_ctl(2) benchmark
Benchmark the various operations allowed for epoll_ctl(2).  The idea is
to concurrently stress a single epoll instance doing add/mod/del
operations.

Committer testing:

  # perf bench epoll ctl
  # Running 'epoll/ctl' benchmark:
  Run summary [PID 20344]: 4 threads doing epoll_ctl ops 64 file-descriptors for 8 secs.

  [thread  0] fdmap: 0x21a46b0 ... 0x21a47ac [ add: 1680960 ops; mod: 1680960 ops; del: 1680960 ops ]
  [thread  1] fdmap: 0x21a4960 ... 0x21a4a5c [ add: 1685440 ops; mod: 1685440 ops; del: 1685440 ops ]
  [thread  2] fdmap: 0x21a4c10 ... 0x21a4d0c [ add: 1674368 ops; mod: 1674368 ops; del: 1674368 ops ]
  [thread  3] fdmap: 0x21a4ec0 ... 0x21a4fbc [ add: 1677568 ops; mod: 1677568 ops; del: 1677568 ops ]

  Averaged 1679584 ADD operations (+- 0.14%)
  Averaged 1679584 MOD operations (+- 0.14%)
  Averaged 1679584 DEL operations (+- 0.14%)
  #

Lets measure those calls with 'perf trace' to get a glympse at what this
benchmark is doing in terms of syscalls:

  # perf trace -m32768 -s perf bench epoll ctl
  # Running 'epoll/ctl' benchmark:
  Run summary [PID 20405]: 4 threads doing epoll_ctl ops 64 file-descriptors for 8 secs.

  [thread  0] fdmap: 0x21764e0 ... 0x21765dc [ add: 1100480 ops; mod: 1100480 ops; del: 1100480 ops ]
  [thread  1] fdmap: 0x2176790 ... 0x217688c [ add: 1250176 ops; mod: 1250176 ops; del: 1250176 ops ]
  [thread  2] fdmap: 0x2176a40 ... 0x2176b3c [ add: 1022464 ops; mod: 1022464 ops; del: 1022464 ops ]
  [thread  3] fdmap: 0x2176cf0 ... 0x2176dec [ add: 705472 ops; mod: 705472 ops; del: 705472 ops ]

  Averaged 1019648 ADD operations (+- 11.27%)
  Averaged 1019648 MOD operations (+- 11.27%)
  Averaged 1019648 DEL operations (+- 11.27%)

  Summary of events:

  epoll-ctl (20405), 1264 events, 0.0%

   syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- --------- ---------     ------
   eventfd2             256     9.514     0.001     0.037     5.243     68.00%
   clone                  4     1.245     0.204     0.311     0.531     24.13%
   mprotect              66     0.345     0.002     0.005     0.021      7.43%
   openat                45     0.313     0.004     0.007     0.073     21.93%
   mmap                  88     0.302     0.002     0.003     0.013      5.02%
   futex                  4     0.160     0.002     0.040     0.140     83.43%
   sched_setaffinity      4     0.124     0.005     0.031     0.070     49.39%
   read                  44     0.103     0.001     0.002     0.013     15.54%
   fstat                 40     0.052     0.001     0.001     0.003      5.43%
   close                 39     0.039     0.001     0.001     0.001      1.48%
   stat                   9     0.034     0.003     0.004     0.006      7.30%
   access                 3     0.023     0.007     0.008     0.008      4.25%
   open                   2     0.021     0.008     0.011     0.013     22.60%
   getdents               4     0.019     0.001     0.005     0.009     37.15%
   write                  2     0.013     0.004     0.007     0.009     38.48%
   munmap                 1     0.010     0.010     0.010     0.010      0.00%
   brk                    3     0.006     0.001     0.002     0.003     26.34%
   rt_sigprocmask         2     0.004     0.001     0.002     0.003     43.95%
   rt_sigaction           3     0.004     0.001     0.001     0.002     16.07%
   prlimit64              3     0.004     0.001     0.001     0.001      5.39%
   prctl                  1     0.003     0.003     0.003     0.003      0.00%
   epoll_create           1     0.003     0.003     0.003     0.003      0.00%
   lseek                  2     0.002     0.001     0.001     0.001     11.42%
   sched_getaffinity        1     0.002     0.002     0.002     0.002      0.00%
   arch_prctl             1     0.002     0.002     0.002     0.002      0.00%
   set_tid_address        1     0.001     0.001     0.001     0.001      0.00%
   getpid                 1     0.001     0.001     0.001     0.001      0.00%
   set_robust_list        1     0.001     0.001     0.001     0.001      0.00%
   execve                 1     0.000     0.000     0.000     0.000      0.00%

 epoll-ctl (20406), 1245480 events, 14.6%

   syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- --------- ---------     ------
   epoll_ctl         619511  1034.927     0.001     0.002     6.691      0.67%
   nanosleep           3226   616.114     0.006     0.191    10.376      7.57%
   futex                  2    11.336     0.002     5.668    11.334     99.97%
   set_robust_list        1     0.001     0.001     0.001     0.001      0.00%
   clone                  1     0.000     0.000     0.000     0.000      0.00%

 epoll-ctl (20407), 1243151 events, 14.5%

   syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- --------- ---------     ------
   epoll_ctl         618350  1042.181     0.001     0.002     2.512      0.40%
   nanosleep           3220   366.261     0.012     0.114    18.162      9.59%
   futex                  4     5.463     0.001     1.366     5.427     99.12%
   set_robust_list        1     0.002     0.002     0.002     0.002      0.00%

 epoll-ctl (20408), 1801690 events, 21.1%

   syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- --------- ---------     ------
   epoll_ctl         896174  1540.581     0.001     0.002     6.987      0.74%
   nanosleep           4667   783.393     0.006     0.168    10.419      7.10%
   futex                  2     4.682     0.002     2.341     4.681     99.93%
   set_robust_list        1     0.002     0.002     0.002     0.002      0.00%
   clone                  1     0.000     0.000     0.000     0.000      0.00%

 epoll-ctl (20409), 4254890 events, 49.8%

   syscall            calls    total       min       avg       max      stddev
                               (msec)    (msec)    (msec)    (msec)        (%)
   --------------- -------- --------- --------- --------- ---------     ------
   epoll_ctl        2116416  3768.097     0.001     0.002     9.956      0.41%
   nanosleep          11023  1141.778     0.006     0.104     9.447      4.95%
   futex                  3     0.037     0.002     0.012     0.029     70.50%
   set_robust_list        1     0.008     0.008     0.008     0.008      0.00%
   madvise                1     0.005     0.005     0.005     0.005      0.00%
   clone                  1     0.000     0.000     0.000     0.000      0.00%
  #

Committer notes:

Fix build on fedora:24-x-ARC-uClibc, debian:experimental-x-mips,
debian:experimental-x-mipsel, ubuntu:16.04-x-arm and ubuntu:16.04-x-powerpc

    CC       /tmp/build/perf/bench/epoll-ctl.o
  bench/epoll-ctl.c: In function 'init_fdmaps':
  bench/epoll-ctl.c:214:16: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
    for (i = 0; i < nfds; i+=inc) {
                  ^
  bench/epoll-ctl.c: In function 'bench_epoll_ctl':
  bench/epoll-ctl.c:377:16: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
    for (i = 0; i < nthreads; i++) {
                  ^
  bench/epoll-ctl.c:388:16: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
    for (i = 0; i < nthreads; i++) {
                  ^
  cc1: all warnings being treated as errors

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/20181106152226.20883-3-dave@stgolabs.net
[ Use inttypes.h to print rlim_t fields, fixing the build on Alpine Linux / musl libc ]
[ Check if eventfd() is available, i.e. if HAVE_EVENTFD is defined ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 22:39:55 -03:00
Davidlohr Bueso
121dd9ea01 perf bench: Add epoll parallel epoll_wait benchmark
This program benchmarks concurrent epoll_wait(2) for file descriptors
that are monitored with with EPOLLIN along various semantics, by a
single epoll instance. Such conditions can be found when using
single/combined or multiple queuing when load balancing.

Each thread has a number of private, nonblocking file descriptors,
referred to as fdmap. A writer thread will constantly be writing to the
fdmaps of all threads, minimizing each threads's chances of epoll_wait
not finding any ready read events and blocking as this is not what we
want to stress. Full details in the start of the C file.

Committer testing:

  # perf bench
  Usage:
	perf bench [<common options>] <collection> <benchmark> [<options>]

        # List of all available benchmark collections:

         sched: Scheduler and IPC benchmarks
           mem: Memory access benchmarks
          numa: NUMA scheduling and MM benchmarks
         futex: Futex stressing benchmarks
         epoll: Epoll stressing benchmarks
           all: All benchmarks

  # perf bench epoll

        # List of available benchmarks for collection 'epoll':

          wait: Benchmark epoll concurrent epoll_waits
           all: Run all futex benchmarks

  # perf bench epoll wait
  # Running 'epoll/wait' benchmark:
  Run summary [PID 19295]: 3 threads monitoring on 64 file-descriptors for 8 secs.

  [thread  0] fdmap: 0xdaa650 ... 0xdaa74c [ 328241 ops/sec ]
  [thread  1] fdmap: 0xdaa900 ... 0xdaa9fc [ 351695 ops/sec ]
  [thread  2] fdmap: 0xdaabb0 ... 0xdaacac [ 381423 ops/sec ]

  Averaged 353786 operations/sec (+- 4.35%), total secs = 8
  #

Committer notes:

Fix the build on debian:experimental-x-mips, debian:experimental-x-mipsel
and others:

    CC       /tmp/build/perf/bench/epoll-wait.o
  bench/epoll-wait.c: In function 'writerfn':
  bench/epoll-wait.c:399:12: error: format '%ld' expects argument of type 'long int', but argument 2 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
    printinfo("exiting writer-thread (total full-loops: %ld)\n", iter);
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~
  bench/epoll-wait.c:86:31: note: in definition of macro 'printinfo'
    do { if (__verbose) { printf(fmt, ## arg); fflush(stdout); } } while (0)
                                 ^~~
  cc1: all warnings being treated as errors

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com> <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/20181106152226.20883-2-dave@stgolabs.net
Link: http://lkml.kernel.org/r/20181106182349.thdkpvshkna5vd7o@linux-r8p5>
[ Applied above fixup as per Davidlohr's request ]
[ Use inttypes.h to print rlim_t fields, fixing the build on Alpine Linux / musl libc ]
[ Check if eventfd() is available, i.e. if HAVE_EVENTFD is defined ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 22:38:47 -03:00
Arnaldo Carvalho de Melo
11c6cbe706 tools build feature: Check if eventfd() is available
A new 'perf bench epoll' will use this, and to disable it for older
systems, add a feature test for this API.

This is just a simple program that if successfully compiled, means that
the feature is present, at least at the library level, in a build that
sets the output directory to /tmp/build/perf (using O=/tmp/build/perf),
we end up with:

  $ ls -la /tmp/build/perf/feature/test-eventfd*
  -rwxrwxr-x. 1 acme acme 8176 Nov 21 15:58 /tmp/build/perf/feature/test-eventfd.bin
  -rw-rw-r--. 1 acme acme  588 Nov 21 15:58 /tmp/build/perf/feature/test-eventfd.d
  -rw-rw-r--. 1 acme acme    0 Nov 21 15:58 /tmp/build/perf/feature/test-eventfd.make.output
  $ ldd /tmp/build/perf/feature/test-eventfd.bin
	  linux-vdso.so.1 (0x00007fff3bf3f000)
	  libc.so.6 => /lib64/libc.so.6 (0x00007fa984061000)
	  /lib64/ld-linux-x86-64.so.2 (0x00007fa984417000)
  $ grep eventfd -A 2 -B 2 /tmp/build/perf/FEATURE-DUMP
  feature-dwarf=1
  feature-dwarf_getlocations=1
  feature-eventfd=1
  feature-fortify-source=1
  feature-sync-compare-and-swap=1
  $

The main thing here is that in the end we'll have -DHAVE_EVENTFD in
CFLAGS, and then the 'perf bench' entry needing that API can be
selectively pruned.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-wkeldwob7dpx6jvtuzl8164k@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 22:25:44 -03:00
Ido Schimmel
c39c56a8c8 selftests: forwarding: vxlan_bridge_1d: Add learning test
Add a test which checks that the VxLAN driver can learn FDB entries and
that these entries are correctly deleted and aged-out.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21 17:10:31 -08:00
Ido Schimmel
dbd4485a69 selftests: mlxsw: Consider VxLAN learning enabled as valid
The test currently expects that a configuration which includes a VxLAN
device with learning enabled to fail.

Previous patches enabled VxLAN learning in mlxsw, so change the test
accordingly.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21 17:10:31 -08:00
Jakub Kicinski
dde7011a82 tools: bpftool: fix potential NULL pointer dereference in do_load
This patch fixes a possible null pointer dereference in
do_load, detected by the semantic patch deref_null.cocci,
with the following warning:

./tools/bpf/bpftool/prog.c:1021:23-25: ERROR: map_replace is NULL but dereferenced.

The following code has potential null pointer references:
881             map_replace = reallocarray(map_replace, old_map_fds + 1,
882                                        sizeof(*map_replace));
883             if (!map_replace) {
884                     p_err("mem alloc failed");
885                     goto err_free_reuse_maps;
886             }

...
1019 err_free_reuse_maps:
1020         for (i = 0; i < old_map_fds; i++)
1021                 close(map_replace[i].fd);
1022         free(map_replace);

Fixes: 3ff5a4dc5d ("tools: bpftool: allow reuse of maps with bpftool prog load")
Co-developed-by: Wen Yang <wen.yang99@zte.com.cn>
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-22 00:45:51 +01:00
Nikita V. Shirokov
b1957c92eb bpf: adding tests for map_in_map helpber in libbpf
adding test/example of bpf_map__set_inner_map_fd usage

Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-21 23:33:22 +01:00
Nikita V. Shirokov
addb9fc90f bpf: adding support for map in map in libbpf
idea is pretty simple. for specified map (pointed by struct bpf_map)
we would provide descriptor of already loaded map, which is going to be
used as a prototype for inner map. proposed workflow:
1) open bpf's object (bpf_object__open)
2) create bpf's map which is going to be used as a prototype
3) find (by name) map-in-map which you want to load and update w/
descriptor of inner map w/ a new helper from this patch
4) load bpf program w/ bpf_object__load

Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-21 23:33:21 +01:00
Stanislav Fomichev
5b32a23e1d bpf: libbpf: don't specify prog name if kernel doesn't support it
Use recently added capability check.

See commit 23499442c3 ("bpf: libbpf: retry map creation without
the name") for rationale.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-21 23:26:14 +01:00
Stanislav Fomichev
94cb310cfa bpf: libbpf: remove map name retry from bpf_create_map_xattr
Instead, check for a newly created caps.name bpf_object capability.
If kernel doesn't support names, don't specify the attribute.

See commit 23499442c3 ("bpf: libbpf: retry map creation without
the name") for rationale.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-21 23:26:04 +01:00
Stanislav Fomichev
47eff61777 bpf, libbpf: introduce bpf_object__probe_caps to test BPF capabilities
It currently only checks whether kernel supports map/prog names.
This capability check will be used in the next two commits to
skip setting prog/map names.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-21 23:25:33 +01:00
Stanislav Fomichev
8c4905b995 libbpf: make sure bpf headers are c++ include-able
Wrap headers in extern "C", to turn off C++ mangling.
This simplifies including libbpf in c++ and linking against it.

v2 changes:
* do the same for btf.h

v3 changes:
* test_libbpf.cpp to test for possible future c++ breakages

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-21 23:15:41 +01:00
Yonghong Song
462c124c59 bpf: fix a libbpf loader issue
Commit 2993e0515b ("tools/bpf: add support to read .BTF.ext sections")
added support to read .BTF.ext sections from an object file, create
and pass prog_btf_fd and func_info to the kernel.

The program btf_fd (prog->btf_fd) is initialized to be -1 to please
zclose so we do not need special handling dur prog close.
Passing -1 to the kernel, however, will cause loading error.
Passing btf_fd 0 to the kernel if prog->btf_fd is invalid
fixed the problem.

Fixes: 2993e0515b ("tools/bpf: add support to read .BTF.ext sections")
Reported-by: Andrey Ignatov <rdna@fb.com>
Reported-by: Emre Cantimur <haydum@fb.com>
Tested-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-21 22:22:17 +01:00
Davidlohr Bueso
d47d77c3f0 perf bench: Move HAVE_PTHREAD_ATTR_SETAFFINITY_NP into bench.h
Both futex and epoll need this call, and can cause build failure on
systems that don't have it pthread_attr_setaffinity_np().

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/20181109210719.pr7ohayuwqmfp2wl@linux-r8p5
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:32 -03:00
Milian Wolff
9add8fe8e6 perf script: Share code and output format for uregs and iregs output
The iregs output was missing the newline at end as well as the leading
ABI output. This made it hard to compare the iregs and uregs values.
Instead, use a single function to output the register values and use it
for both, iregs and uregs, to ensure the output is consistent.

Before:

  perf  7049 [-01]  1343.354347:          1 cycles:ppp:
        ffffffffa7bc21ce perf_event_exec+0x18e (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
        ffffffffa7c7ead3 setup_new_exec+0xf3 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
        ffffffffa7cd7be5 load_elf_binary+0x395 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
        ffffffffa7c7e540 search_binary_handler+0x80 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
        ffffffffa7c7f1aa __do_execve_file.isra.13+0x58a (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
        ffffffffa7c7f561 do_execve+0x21 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
        ffffffffa7c7f596 __x64_sys_execve+0x26 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
        ffffffffa7a041cb do_syscall_64+0x5b (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
        ffffffffa840008c entry_SYSCALL_64+0x7c (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    AX:0x80000000    BX:0x0    CX:0x0    DX:0x7    SI:0xf    DI:0x286    BP:0xffff95bc8213a460    SP:0xffffacbf0ba97d18    IP:0xffffffffa7bc21cd FLAGS:0x28e    CS:0x10    SS:0x18    R8:0x2    R9:0x21440   R10:0x33816fb3b8c   R11:0x1   R12:0xffff95bc8213a460   R13:0xffff95bc8213a400   R14:0xffff95bc8213a400   R15:0x1  ABI:2    AX:0xffffffffffffffda    BX:0xffffffffffffffff    CX:0x7f84ad85798b    DX:0x560209699d50    SI:0x7ffe2c7a6820    DI:0x7ffe2c7a8c9b    BP:0x7ffe2c7a20d0    SP:0x7ffe2c7a2058    IP:0x7f84ad85798b FLAGS:0x206    CS:0x33    SS:0x2b    R8:0x7ffe2c7a2030    R9:0x7f84ae55f010   R10:0x8   R11:0x206   R12:0xffffffffffffffff   R13:0xffffffffffffffff   R14:0xffffffffffffffff   R15:0xffffffffffffffff

  perf  7049 [-01]  1343.354363:          1 cycles:ppp:
        ...

After:

  perf  7049 [-01]  1343.354347:          1 cycles:ppp:
        ffffffffa7bc21ce perf_event_exec+0x18e (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
        ffffffffa7c7ead3 setup_new_exec+0xf3 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
        ffffffffa7cd7be5 load_elf_binary+0x395 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
        ffffffffa7c7e540 search_binary_handler+0x80 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
        ffffffffa7c7f1aa __do_execve_file.isra.13+0x58a (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
        ffffffffa7c7f561 do_execve+0x21 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
        ffffffffa7c7f596 __x64_sys_execve+0x26 (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
        ffffffffa7a041cb do_syscall_64+0x5b (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
        ffffffffa840008c entry_SYSCALL_64+0x7c (/lib/modules/4.20.0-rc1perf-devel-05115-gc0bc98f76e39-dirty/build/vmlinux)
    ABI:2    AX:0x80000000    BX:0x0    CX:0x0    DX:0x7    SI:0xf    DI:0x286    BP:0xffff95bc8213a460    SP:0xffffacbf0ba97d18    IP:0xffffffffa7bc21cd FLAGS:0x28e    CS:0x10    SS:0x18    R8:0x2    R9:0x21440   R10:0x33816fb3b8c   R11:0x1   R12:0xffff95bc8213a460   R13:0xffff95bc8213a400   R14:0xffff95bc8213a400   R15:0x1
    ABI:2    AX:0xffffffffffffffda    BX:0xffffffffffffffff    CX:0x7f84ad85798b    DX:0x560209699d50    SI:0x7ffe2c7a6820    DI:0x7ffe2c7a8c9b    BP:0x7ffe2c7a20d0    SP:0x7ffe2c7a2058    IP:0x7f84ad85798b FLAGS:0x206    CS:0x33    SS:0x2b    R8:0x7ffe2c7a2030    R9:0x7f84ae55f010   R10:0x8   R11:0x206   R12:0xffffffffffffffff   R13:0xffffffffffffffff   R14:0xffffffffffffffff   R15:0xffffffffffffffff

  perf  7049 [-01]  1343.354363:          1 cycles:ppp:
        ...

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20181107223437.9071-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:32 -03:00
Arnaldo Carvalho de Melo
0f7c2de5dd perf bpf: Reduce the hardcoded .max_entries for pid_maps
While working on augmented syscalls I got into this error:

  # trace -vv --filter-pids 2469,1663 -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
  <SNIP>
  libbpf: map 0 is "__augmented_syscalls__"
  libbpf: map 1 is "__bpf_stdout__"
  libbpf: map 2 is "pids_filtered"
  libbpf: map 3 is "syscalls"
  libbpf: collecting relocating info for: '.text'
  libbpf: relo for 13 value 84 name 133
  libbpf: relocation: insn_idx=3
  libbpf: relocation: find map 3 (pids_filtered) for insn 3
  libbpf: collecting relocating info for: 'raw_syscalls:sys_enter'
  libbpf: relo for 8 value 0 name 0
  libbpf: relocation: insn_idx=1
  libbpf: relo for 8 value 0 name 0
  libbpf: relocation: insn_idx=3
  libbpf: relo for 9 value 28 name 178
  libbpf: relocation: insn_idx=36
  libbpf: relocation: find map 1 (__augmented_syscalls__) for insn 36
  libbpf: collecting relocating info for: 'raw_syscalls:sys_exit'
  libbpf: relo for 8 value 0 name 0
  libbpf: relocation: insn_idx=0
  libbpf: relo for 8 value 0 name 0
  libbpf: relocation: insn_idx=2
  bpf: config program 'raw_syscalls:sys_enter'
  bpf: config program 'raw_syscalls:sys_exit'
  libbpf: create map __bpf_stdout__: fd=3
  libbpf: create map __augmented_syscalls__: fd=4
  libbpf: create map syscalls: fd=5
  libbpf: create map pids_filtered: fd=6
  libbpf: added 13 insn from .text to prog raw_syscalls:sys_enter
  libbpf: added 13 insn from .text to prog raw_syscalls:sys_exit
  libbpf: load bpf program failed: Operation not permitted
  libbpf: failed to load program 'raw_syscalls:sys_exit'
  libbpf: failed to load object 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
  bpf: load objects failed: err=-4009: (Incorrect kernel version)
  event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
                       \___ Failed to load program for unknown reason

  (add -v to see detail)
  Run 'perf list' for a list of valid events

   Usage: perf trace [<options>] [<command>]
      or: perf trace [<options>] -- <command> [<options>]
      or: perf trace record [<options>] [<command>]
      or: perf trace record [<options>] -- <command> [<options>]

      -e, --event <event>   event/syscall selector. use 'perf list' to list available events

If I then try to use strace (perf trace'ing 'perf trace' needs some more work
before its possible) to get a bit more info I get:

  # strace -e bpf trace --filter-pids 2469,1663 -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
  bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_PERF_EVENT_ARRAY, key_size=4, value_size=4, max_entries=4, map_flags=0, inner_map_fd=0, map_name="__bpf_stdout__", map_ifindex=0}, 72) = 3
  bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_PERF_EVENT_ARRAY, key_size=4, value_size=4, max_entries=4, map_flags=0, inner_map_fd=0, map_name="__augmented_sys", map_ifindex=0}, 72) = 4
  bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=1, max_entries=500, map_flags=0, inner_map_fd=0, map_name="syscalls", map_ifindex=0}, 72) = 5
  bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=1, max_entries=512, map_flags=0, inner_map_fd=0, map_name="pids_filtered", map_ifindex=0}, 72) = 6
  bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_TRACEPOINT, insn_cnt=57, insns=0x1223f50, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(4, 18, 10), prog_flags=0, prog_name="sys_enter", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS}, 72) = 7
  bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_TRACEPOINT, insn_cnt=18, insns=0x1224120, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(4, 18, 10), prog_flags=0, prog_name="sys_exit", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS}, 72) = -1 EPERM (Operation not permitted)
  bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_TRACEPOINT, insn_cnt=18, insns=0x1224120, license="GPL", log_level=1, log_size=262144, log_buf="", kern_version=KERNEL_VERSION(4, 18, 10), prog_flags=0, prog_name="sys_exit", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS}, 72) = -1 EPERM (Operation not permitted)
  bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=18, insns=0x1224120, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(4, 18, 10), prog_flags=0, prog_name="sys_exit", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS}, 72) = -1 EPERM (Operation not permitted)
  event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
                       \___ Failed to load program for unknown reason
  <SNIP similar output as without 'strace'>
  #

I managed to create the maps, etc, but then installing the "sys_exit" hook into
the "raw_syscalls:sys_exit" tracepoint somehow gets -EPERMed...

I then go and try reducing the size of this new table:

  +++ b/tools/perf/examples/bpf/augmented_raw_syscalls.c
  @@ -47,6 +47,17 @@ struct augmented_filename {
   #define SYS_OPEN 2
   #define SYS_OPENAT 257

  +struct syscall {
  +       bool    filtered;
  +};
  +
  +struct bpf_map SEC("maps") syscalls = {
  +       .type        = BPF_MAP_TYPE_ARRAY,
  +       .key_size    = sizeof(int),
  +       .value_size  = sizeof(struct syscall),
  +       .max_entries = 500,
  +};

And after reducing that .max_entries a tad, it works. So yeah, the "unknown
reason" should be related to the number of bytes all this is taking, reduce the
default for pid_map()s so that we can have a "syscalls" map with enough slots
for all syscalls in most arches. And take notes about this error message,
improve it :-)

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lkml.kernel.org/n/tip-yjzhak8asumz9e9hts2dgplp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:32 -03:00
Milian Wolff
b07d16f7e9 perf script: Add newline after uregs output
This change makes it much easier to easily distinguish between
consecutive samples by keeping the empty line between them, like we see
when we do not enable uregs output.

Before:

  cpp-inlining 28298 [-01] 54837.342780:    3068085 cycles:pp:
              7ffff7c96709 __hypot_finite+0xa9 (/usr/lib/libm-2.28.so)
              ...
   ABI:2    AX:0x0    BX:0x40f56cf6    CX:0x294a3ae7    ...
  cpp-inlining 28298 [-01] 54837.344493:    2881929 cycles:pp:
              7ffff7c96696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
              ...
   ABI:2    AX:0x40d440c7    BX:0x40d440c7    CX:0x4d45e5da    ...

After:

  cpp-inlining 28298 [-01] 54837.342780:    3068085 cycles:pp:
              7ffff7c96709 __hypot_finite+0xa9 (/usr/lib/libm-2.28.so)
              ...
   ABI:2    AX:0x0    BX:0x40f56cf6    CX:0x294a3ae7    ...

  cpp-inlining 28298 [-01] 54837.344493:    2881929 cycles:pp:
              7ffff7c96696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
              ...
   ABI:2    AX:0x40d440c7    BX:0x40d440c7    CX:0x4d45e5da    ...

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20181107093705.16346-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:31 -03:00
Arnaldo Carvalho de Melo
4aa792de0b Revert "perf augmented_syscalls: Drop 'write', 'poll' for testing without self pid filter"
Now that we have the "filtered_pids" logic in place, no need to do this
rough filter to avoid the feedback loop from 'perf trace's own syscalls,
revert it.

This reverts commit 7ed71f124284359676b6496ae7db724fee9da753.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-88vh02cnkam0vv5f9vp02o3h@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:31 -03:00
Arnaldo Carvalho de Melo
e312747b49 perf augmented_syscalls: Remove example hardcoded set of filtered pids
Now that 'perf trace' fills in that "filtered_pids" BPF map, remove the
set of filtered pids used as an example to test that feature.

That feature works like this:

Starting a system wide 'strace' like 'perf trace' augmented session we
noticed that lots of events take place for a pid, which ends up being
the feedback loop of perf trace's syscalls being processed by the
'gnome-terminal' process:

  # perf trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c
     0.391 ( 0.002 ms): gnome-terminal/2469 read(fd: 17</dev/ptmx>, buf: 0x564b79f750bc, count: 8176) = 453
     0.394 ( 0.001 ms): gnome-terminal/2469 read(fd: 17</dev/ptmx>, buf: 0x564b79f75280, count: 7724) = -1 EAGAIN Resource temporarily unavailable
     0.438 ( 0.001 ms): gnome-terminal/2469 read(fd: 4<anon_inode:[eventfd]>, buf: 0x7fffc696aeb0, count: 16) = 8
     0.519 ( 0.001 ms): gnome-terminal/2469 read(fd: 17</dev/ptmx>, buf: 0x564b79f75280, count: 7724) = 114
     0.522 ( 0.001 ms): gnome-terminal/2469 read(fd: 17</dev/ptmx>, buf: 0x564b79f752f1, count: 7611) = -1 EAGAIN Resource temporarily unavailable
  ^C

So we can use --filter-pids to get rid of that one, and in this case what is
being used to implement that functionality is that "filtered_pids" BPF map that
the tools/perf/examples/bpf/augmented_raw_syscalls.c created and that 'perf trace'
bpf loader noticed and created a "struct bpf_map" associated that then got populated
by 'perf trace':

  # perf trace --filter-pids 2469 -e tools/perf/examples/bpf/augmented_raw_syscalls.c
     0.020 ( 0.002 ms): gnome-shell/1663 epoll_pwait(epfd: 12<anon_inode:[eventpoll]>, events: 0x7ffd8f3ef960, maxevents: 32, sigsetsize: 8) = 1
     0.025 ( 0.002 ms): gnome-shell/1663 read(fd: 24</dev/input/event4>, buf: 0x560c01bb8240, count: 8112) = 48
     0.029 ( 0.001 ms): gnome-shell/1663 read(fd: 24</dev/input/event4>, buf: 0x560c01bb8258, count: 8088) = -1 EAGAIN Resource temporarily unavailable
     0.032 ( 0.001 ms): gnome-shell/1663 read(fd: 24</dev/input/event4>, buf: 0x560c01bb8240, count: 8112) = -1 EAGAIN Resource temporarily unavailable
     0.040 ( 0.003 ms): gnome-shell/1663 recvmsg(fd: 46<socket:[35893]>, msg: 0x7ffd8f3ef950) = -1 EAGAIN Resource temporarily unavailable
    21.529 ( 0.002 ms): gnome-shell/1663 epoll_pwait(epfd: 5<anon_inode:[eventpoll]>, events: 0x7ffd8f3ef960, maxevents: 32, sigsetsize: 8) = 1
    21.533 ( 0.004 ms): gnome-shell/1663 recvmsg(fd: 82<socket:[42826]>, msg: 0x7ffd8f3ef7b0, flags: DONTWAIT|CMSG_CLOEXEC) = 236
    21.581 ( 0.006 ms): gnome-shell/1663 ioctl(fd: 8</dev/dri/card0>, cmd: DRM_I915_GEM_BUSY, arg: 0x7ffd8f3ef060) = 0
    21.605 ( 0.020 ms): gnome-shell/1663 ioctl(fd: 8</dev/dri/card0>, cmd: DRM_I915_GEM_CREATE, arg: 0x7ffd8f3eeea0) = 0
    21.626 ( 0.119 ms): gnome-shell/1663 ioctl(fd: 8</dev/dri/card0>, cmd: DRM_I915_GEM_SET_DOMAIN, arg: 0x7ffd8f3eee94) = 0
    21.746 ( 0.081 ms): gnome-shell/1663 ioctl(fd: 8</dev/dri/card0>, cmd: DRM_I915_GEM_PWRITE, arg: 0x7ffd8f3eeea0) = 0
  ^C

Oops, yet another gnome process that is involved with the output that
'perf trace' generates, lets filter that out too:

  # perf trace --filter-pids 2469,1663 -e tools/perf/examples/bpf/augmented_raw_syscalls.c
         ? (         ): wpa_supplicant/1366  ... [continued]: select()) = 0 Timeout
     0.006 ( 0.002 ms): wpa_supplicant/1366 clock_gettime(which_clock: BOOTTIME, tp: 0x7fffe5b1e430) = 0
     0.011 ( 0.001 ms): wpa_supplicant/1366 clock_gettime(which_clock: BOOTTIME, tp: 0x7fffe5b1e3e0) = 0
     0.014 ( 0.001 ms): wpa_supplicant/1366 clock_gettime(which_clock: BOOTTIME, tp: 0x7fffe5b1e430) = 0
         ? (         ): gmain/1791  ... [continued]: poll()) = 0 Timeout
     0.017 (         ): wpa_supplicant/1366 select(n: 6, inp: 0x55646fed3ad0, outp: 0x55646fed3b60, exp: 0x55646fed3bf0, tvp: 0x7fffe5b1e4a0) ...
   157.879 ( 0.019 ms): gmain/1791 inotify_add_watch(fd: 8<anon_inode:inotify>, pathname: , mask: 16789454) = -1 ENOENT No such file or directory
         ? (         ): cupsd/1001  ... [continued]: epoll_pwait()) = 0
         ? (         ): gsd-color/1908  ... [continued]: poll()) = 0 Timeout
   499.615 (         ): cupsd/1001 epoll_pwait(epfd: 4<anon_inode:[eventpoll]>, events: 0x557a21166500, maxevents: 4096, timeout: 1000, sigsetsize: 8) ...
   586.593 ( 0.004 ms): gsd-color/1908 recvmsg(fd: 3<socket:[38074]>, msg: 0x7ffdef34e800) = -1 EAGAIN Resource temporarily unavailable
         ? (         ): fwupd/2230  ... [continued]: poll()) = 0 Timeout
         ? (         ): rtkit-daemon/906  ... [continued]: poll()) = 0 Timeout
         ? (         ): rtkit-daemon/907  ... [continued]: poll()) = 1
   724.603 ( 0.007 ms): rtkit-daemon/907 read(fd: 6<anon_inode:[eventfd]>, buf: 0x7f05ff768d08, count: 8) = 8
         ? (         ): ssh/5461  ... [continued]: select()) = 1
   810.431 ( 0.002 ms): ssh/5461 clock_gettime(which_clock: BOOTTIME, tp: 0x7ffd7f39f870) = 0
   ^C

Several syscall exit events for syscalls in flight when 'perf trace' started, etc. Saner :-)

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-c3tu5yg204p5mvr9kvwew07n@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:31 -03:00
Arnaldo Carvalho de Melo
a9964c432b perf trace: Fill in BPF "filtered_pids" map when present
This makes the augmented_syscalls support the --filter-pids and
auto-filtered feedback loop pids just like when working without BPF,
i.e. with just raw_syscalls:sys_{enter,exit} and tracepoint filters.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-zc5n453sxxm0tz1zfwwelyti@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:31 -03:00
Arnaldo Carvalho de Melo
744fafc787 perf trace: See if there is a map named "filtered_pids"
Lookup for the first map named "filtered_pids" and, if augmenting
syscalls, i.e. if a BPF event is present and the
"__augmented_syscalls__" is present, then fill in that map with the pids
to filter, be it feedback loop ones (perf trace's pid, its father if it
is "sshd", more auto-filtered in the future) or the ones explicitely
stated in the tool command line via --filter-pids.

The code to actually fill in the map comes next.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-rhzytmw7qpe6lqyjxi1ded9t@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:31 -03:00
Arnaldo Carvalho de Melo
6a0b3abad9 perf trace: Add "_from_option" suffix to trace__set_filter()
As we'll need that name for a new function to set filters for both
tracepoints and BPF maps for filtering pids.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mdkck6hf3fnd21rz2766280q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:31 -03:00
Arnaldo Carvalho de Melo
7ad92a3371 perf evlist: Rename perf_evlist__set_filter* to perf_evlist__set_tp_filter*
To better reflect that this is a tracepoint filter, as opposed, for
instance to map based BPF filters.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-9138svli6ddcphrr3ymy9oy3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:31 -03:00
Arnaldo Carvalho de Melo
ed9a77ba77 perf augmented_syscalls: Use pid_filter
Just to test filtering a bunch of pids, now its time to go and get that
hooked up in 'perf trace', right after we load the bpf program, if we
find a "pids_filtered" map defined, we'll populate it with the filtered
pids.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-1i9s27wqqdhafk3fappow84x@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:31 -03:00
Arnaldo Carvalho de Melo
77ecb64050 perf augmented_syscalls: Drop 'write', 'poll' for testing without self pid filter
When testing system wide tracing without filtering the syscalls called
by 'perf trace' itself we get into a feedback loop, drop for now those
two syscalls, that are the ones that 'perf trace' does in its loop for
writing the syscalls it intercepts, to help with testing till we get
that filtering in place.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-rkbu536af66dbsfx51sr8yof@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:31 -03:00
Arnaldo Carvalho de Melo
8008aab096 perf bpf: Add simple pid_filter class accessible to BPF proggies
Will be used in the augmented_raw_syscalls.c to implement 'perf trace
--filter-pids'.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-9sybmz4vchlbpqwx2am13h9e@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:31 -03:00
Arnaldo Carvalho de Melo
382b55dbef perf bpf: Add defines for map insertion/lookup
Starting with a helper for a basic pid_map(), a hash using a pid as a
key.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-gdwvq53wltvq6b3g5tdmh0cw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:31 -03:00
Arnaldo Carvalho de Melo
66067538e0 perf augmented_syscalls: Remove needless linux/socket.h include
Leftover from when we started augmented_raw_syscalls.c from
tools/perf/examples/bpf/augmented_syscalls.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: e58a0322dbac ("perf examples bpf: Start augmenting raw_syscalls:sys_{start,exit}")
Link: https://lkml.kernel.org/n/tip-pmts9ls2skh8n3zisb4txudd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:31 -03:00
Arnaldo Carvalho de Melo
55f127b431 perf augmented_syscalls: Filter on a hard coded pid
Just to show where we'll hook pid based filters, and what we use to
obtain the current pid, using a BPF getpid() equivalent.

Now we need to remove that hardcoded PID with a BPF hash map, so that we
start by filtering 'perf trace's own PID, implement the --filter-pid
functionality, etc.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-oshrcgcekiyhd0whwisxfvtv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:31 -03:00
Arnaldo Carvalho de Melo
1475d35c4a perf bpf: Add unistd.h to the headers accessible to bpf proggies
Start with a getpid() function wrapping BPF_FUNC_get_current_pid_tgid,
idea is to mimic the system headers.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-zo8hv22onidep7tm785dzxfk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-21 12:00:30 -03:00
Ingo Molnar
b1a9d7b019 Merge tag 'perf-urgent-for-mingo-4.20-20181121' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes:

- Update kernel ABI headers, one of them lead to a small change in
  the ioctl 'cmd' beautifier in 'perf trace' to support the new ISO7816
  commands. (Arnaldo Carvalho de Melo)

- Restore proper cwd on return from mnt namespace (Jiri Olsa)

- Add feature check for the get_current_dir_name() function used in the
  namespace fix from Jiri, that is not available in systems such as
  Alpine Linux, which uses the  musl libc (Arnaldo Carvalho de Melo)

- Fix crash in 'perf record' when synthesizing the unit for events such
  as 'cpu-clock' (Jiri Olsa)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-21 15:57:21 +01:00
Rafael J. Wysocki
0db699f747 Merge tag 'linux-cpupower-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux
Pull cpupower utility updates for 4.20-rc4 from Shuah Khan:

"This cpupower update for Linux 4.20-rc4 consists of compile fixes to allow
 use of outside build flags and override of CFLAGS from Jiri Olsa, and fix
 to compilation with STATIC=true from Konstantin Khlebnikov."

* tag 'linux-cpupower-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
  tools cpupower: Override CFLAGS assignments
  tools cpupower debug: Allow to use outside build flags
  tools/power/cpupower: fix compilation with STATIC=true
2018-11-21 13:33:06 +01:00
Yonghong Song
254471e57a tools/bpf: bpftool: add support for func types
This patch added support to print function signature
if btf func_info is available. Note that ksym
now uses function name instead of prog_name as
prog_name has a limit of 16 bytes including
ending '\0'.

The following is a sample output for selftests
test_btf with file test_btf_haskv.o for translated insns
and jited insns respectively.

  $ bpftool prog dump xlated id 1
  int _dummy_tracepoint(struct dummy_tracepoint_args * arg):
     0: (85) call pc+2#bpf_prog_2dcecc18072623fc_test_long_fname_1
     1: (b7) r0 = 0
     2: (95) exit
  int test_long_fname_1(struct dummy_tracepoint_args * arg):
     3: (85) call pc+1#bpf_prog_89d64e4abf0f0126_test_long_fname_2
     4: (95) exit
  int test_long_fname_2(struct dummy_tracepoint_args * arg):
     5: (b7) r2 = 0
     6: (63) *(u32 *)(r10 -4) = r2
     7: (79) r1 = *(u64 *)(r1 +8)
     ...
     22: (07) r1 += 1
     23: (63) *(u32 *)(r0 +4) = r1
     24: (95) exit

  $ bpftool prog dump jited id 1
  int _dummy_tracepoint(struct dummy_tracepoint_args * arg):
  bpf_prog_b07ccb89267cf242__dummy_tracepoint:
     0:   push   %rbp
     1:   mov    %rsp,%rbp
    ......
    3c:   add    $0x28,%rbp
    40:   leaveq
    41:   retq

  int test_long_fname_1(struct dummy_tracepoint_args * arg):
  bpf_prog_2dcecc18072623fc_test_long_fname_1:
     0:   push   %rbp
     1:   mov    %rsp,%rbp
    ......
    3a:   add    $0x28,%rbp
    3e:   leaveq
    3f:   retq

  int test_long_fname_2(struct dummy_tracepoint_args * arg):
  bpf_prog_89d64e4abf0f0126_test_long_fname_2:
     0:   push   %rbp
     1:   mov    %rsp,%rbp
    ......
    80:   add    $0x28,%rbp
    84:   leaveq
    85:   retq

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20 10:54:39 -08:00
Yonghong Song
999d82cbc0 tools/bpf: enhance test_btf file testing to test func info
Change the bpf programs test_btf_haskv.c and test_btf_nokv.c to
have two sections, and enhance test_btf.c test_file feature
to test btf func_info returned by the kernel.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20 10:54:39 -08:00
Yonghong Song
d7f5b5e051 tools/bpf: refactor to implement btf_get_from_id() in lib/bpf
The function get_btf() is implemented in tools/bpf/bpftool/map.c
to get a btf structure given a map_info. This patch
refactored this function to be function btf_get_from_id()
in tools/lib/bpf so that it can be used later.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20 10:54:39 -08:00
Yonghong Song
9ce6ae22c8 tools/bpf: do not use pahole if clang/llvm can generate BTF sections
Add additional checks in tools/testing/selftests/bpf and
samples/bpf such that if clang/llvm compiler can generate
BTF sections, do not use pahole.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20 10:54:39 -08:00
Yonghong Song
2993e0515b tools/bpf: add support to read .BTF.ext sections
The .BTF section is already available to encode types.
These types can be used for map
pretty print. The whole .BTF will be passed to the
kernel as well for which kernel can verify and return
to the user space for pretty print etc.

The llvm patch at https://reviews.llvm.org/D53736
will generate .BTF section and one more section .BTF.ext.
The .BTF.ext section encodes function type
information and line information. Note that
this patch set only supports function type info.
The functionality is implemented in libbpf.

The .BTF section can be directly loaded into the
kernel, and the .BTF.ext section cannot. The loader
may need to do some relocation and merging,
similar to merging multiple code sections, before
loading into the kernel.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20 10:54:39 -08:00