Development Tools Microconference Notes
Welcome to Linux Plumbers Conference 2015
The structure will be short introductions to an issue or topic followed by a discussion with the audience.
A limit of 3 slides per presentation is enforced to ensure focus and allocate enough time for discussions.
Please use this etherpad to take notes. Microconf leaders will be giving a TWO MINUTE summary of their microconference during the Friday afternoon closing session.
Please remember there is no video this year, so your notes are the only record of your microconference.
Miniconf leaders: Please remember to take note of the approximate number of attendees in your session(s).
SCHEDULE
- Linux Kernel Debugging Tools Overview
- Shuah Khan, Samsung
- Timing issues
- prints not effective for this, because the prints change the timing
- tools to see system state
- magic sysrq
- can give you state on system hangs
- debugging tools
- gdb
- kdb
- dynamic probes
- kmemcheck
- kernel address sanitizer
- ARM64 lags some other architectures, but Linaro has backports...generally similar capabilities across all architectures
- kernel debug interfaces
- debug config options-- compile time
- debug APIs
- dynamic debug-- control via sysfs: /sys/kernel/debug/dynamic
- tracepoints
- used for tracing for debug, event reporting, performance accounting
- enable at runtime
- git bisect - isolate a bad commit
- config options
- lock debugging
- debug lockups/hangs
- read-copy update (rcu) debugging
- memory debugging
- debug modules
- e.g. test_firmware, test_bpf
- debug tools (info under ./Documentation)
- new "thread sanitizer" from team that created kernel address sanitizer -- finds data races
- comment: need more tools like Trinity system call fuzzer to expose bugs
- debugging techniques: http://lwn.net/images/pdf/LDD3/ch04.pdf
- http://elinux.org/Debugging_Portal
- There was a question about IOMMU problems found using IOMMU tracing; see http://blogs.s-osg.org/how-can-iommu-event-tracing-help-you/ for more information.
· Speaker: Shuah Khan from Samsung.
· Abstract: https://linuxplumbersconf.org/2015/ocw/proposals/2643
· This is a 20 minute talk
· Debugging methods
o Redirecting console
· Net console has limited use
o Enable early printk right away to get messages from early boot sequence
o Assuming system boots up, but you want to see if system state is good, can
poke with lots of tools
· Lspci, lsusb to look at devices
o Panics:
· Oops messages are helpful
o System hangs - SysRq key helps you get out of the state to see debug info
o Races/timing:
· Note what device, what usage
o e.g. Video application tuning into TV channel & see streaming errors
· Are there changes in device driver?
· Are there new features added?
· What are the conditions?
· Debugging tools
o How do you look into oops message?
· Use GDB to investigate
o KDB needs to be configured
· She doesn't use it so much. More GDB
· Low level debugger that can poke at addresses directly
· KGDB sits on top of KDB to do more complex things
o Kernel address sanitizer
· There was a talk yesterday on it
· It's a new tool added to 4.0 kernel that will find memory errors
· Will find things like memory use after free
· Do you do ARM & x86?
o Work on both
o ARM-64 kernel was available before KGDB
· Linaro has done some back-ports so that the major platforms do have
back-ends in the kernel
· Kernel Debug interfaces
o Tends to use in-kernel debug interfaces when debugging drivers
· e.g. for DMA will enable DMA debug
o Kernel has a lot of kernel debug config options
· Grep on your config file & see which options are turned on by default
· Downside is that these are compile-time so may need to re-compile to
have them
§ Your race condition might not reproduce if you re-compile
o If you're adding a new driver, you want to enable debug options up front
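The "grep on your config file" step can be sketched in Python. This is a minimal sketch, assuming the usual kernel .config line format (CONFIG_FOO=y, or "# CONFIG_FOO is not set"); the function name enabled_debug_options and the sample text are hypothetical and not from the talk.

```python
import re

def enabled_debug_options(config_text):
    """Return the names of CONFIG_*DEBUG* options set to y or m."""
    pattern = re.compile(r"^(CONFIG_\w*DEBUG\w*)=([ym])$")
    found = []
    for line in config_text.splitlines():
        match = pattern.match(line.strip())
        if match:
            found.append(match.group(1))
    return found

# Hypothetical sample; a real run would read the .config of the kernel build.
SAMPLE_CONFIG = """\
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SPINLOCK is not set
CONFIG_DMA_API_DEBUG=y
CONFIG_PRINTK=y
"""

print(enabled_debug_options(SAMPLE_CONFIG))
```

Options reported as "is not set" are the ones that would need a recompile to turn on.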
o Dynamic debug can be enabled selectively to trigger print message or debug
message for a specific module
· She does this on her x86 system with the Intel MEI driver when she needs to
debug it
· Good is that you can trigger on per-module basis or specific debug
message, if you are anticipating a specific event code or init code
· Remember to make it persistent if you want it to persist across
reboots. Default is not persistent.
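Per-module or per-message selection can be sketched as a small Python helper that builds a dynamic debug query. This is a sketch under assumptions: debugfs is mounted at /sys/kernel/debug, the control file is the dynamic_debug/control file described in the kernel documentation, and the module name mei plus the helper names dyndbg_command and apply_dyndbg are illustrative only.

```python
# Assumed location of the control file (debugfs mounted at /sys/kernel/debug).
CONTROL_FILE = "/sys/kernel/debug/dynamic_debug/control"

def dyndbg_command(flags="+p", module=None, file=None, func=None, line=None):
    """Build one dynamic debug query such as 'module mei +p'."""
    parts = []
    for keyword, value in (("module", module), ("file", file),
                           ("func", func), ("line", line)):
        if value is not None:
            parts.append(f"{keyword} {value}")
    parts.append(flags)  # +p enables printing, -p disables it
    return " ".join(parts)

def apply_dyndbg(command, control_file=CONTROL_FILE):
    """Write the query to the control file (needs root and CONFIG_DYNAMIC_DEBUG)."""
    with open(control_file, "w") as handle:
        handle.write(command + "\n")

print(dyndbg_command(module="mei"))
print(dyndbg_command(flags="-p", func="mei_probe"))  # hypothetical function name
```

A setting written this way is lost on reboot. One way to persist it, assuming the documented dyndbg= parameter, is to put the same query on the kernel command line or in a modprobe configuration file.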
o Tracepoints
· Even with dynamic debug, you have the code already in there
· Advantage of tracepoints is that a disabled tracepoint costs almost nothing
at runtime. Enabling one works by patching a branch in the code.
· However, don't have as wide trace points support across the kernel
· She will use this heavily when debugging IOMMU type issues & likes to
use it when it is available
· Can enable/disable at runtime, so don't need to recompile or install
new image.
· Can do things like find out whenever execution moves from host to VM & from
VM to host
· Can trigger on a particular event
· This helps to give more granularity instead of just sifting through a
ton of messages
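Runtime enabling of a single event can be sketched as follows. This is a sketch only: it assumes the tracing directory is at /sys/kernel/debug/tracing (on some systems it is /sys/kernel/tracing), and the iommu/map event is used as an illustration. The helper names are hypothetical.

```python
import posixpath

# Assumed tracing directory; adjust if tracefs is mounted elsewhere.
TRACING_ROOT = "/sys/kernel/debug/tracing"

def event_enable_path(subsystem, event=None, root=TRACING_ROOT):
    """Path of the 'enable' file for one event, or for a whole subsystem."""
    parts = [root, "events", subsystem]
    if event is not None:
        parts.append(event)
    parts.append("enable")
    return posixpath.join(*parts)

def set_event(subsystem, event=None, on=True, root=TRACING_ROOT):
    """Turn an event on or off at runtime (needs root); no rebuild required."""
    with open(event_enable_path(subsystem, event, root), "w") as handle:
        handle.write("1" if on else "0")

print(event_enable_path("iommu", "map"))
print(event_enable_path("iommu"))
```

Enabling one event rather than a whole subsystem is what gives the finer granularity mentioned above.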
· Git bisect to isolate bad commit
o She uses this a lot
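The idea behind bisection is a binary search over the commit history. The sketch below is not git's implementation; it assumes a linear list of commits, a known-good first commit, a known-bad last commit, and a hypothetical is_bad test function supplied by the caller.

```python
def first_bad_commit(commits, is_bad):
    """Find the first bad commit by binary search.

    commits is ordered oldest to newest; commits[0] is assumed good and
    commits[-1] is assumed bad. Returns (commit, number_of_tests_run).
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return commits[bad], steps

# Hypothetical history of 100 commits where commit 37 introduced the bug.
history = [f"c{i}" for i in range(100)]
culprit, steps = first_bad_commit(history, lambda c: int(c[1:]) >= 37)
print(culprit, steps)
```

The number of builds and tests grows with the logarithm of the history length, which is why bisecting stays practical even across thousands of commits.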
· Kernel debug configuration options
o Favourite: Lock debugging when starting on new work to get a baseline for
what issues already exist
o Some of these are needed for kmemcheck to work
· Complaint that enabling some debug makes life worse
o Sometimes debug code has bugs too…
o Make sure your code runs both with & without a certain option enabled
· Kmemcheck
o She gives some options that you need
· Kernel Address Sanitizer
o There are other sanitizers too: Thread sanitizer, user space tools (are
quite mature), now moving towards compile-time instrumentation
- All We Like Sheep: Copy/Paste as a Principled Engineering Tool
- Mike Godfrey, University of Waterloo
- code cloning - making copies of code (within a program?)
- types
- token for token is identical
- literals/identifiers are different
- gaps, sections may be different
- problem: code is copied, but changes may not be reflected in all copies
- is code cloning a good idea?
- no, a bad idea
- sloppy design
- inconsistent maintenance
- sometimes
- forking- sometimes makes sense
- hardware variations, e.g. Linux SCSI drivers
- templating
- post-hoc customizing
- study: Apache, Gnumeric ... sometimes useful
· Abstract: https://linuxplumbersconf.org/2015/ocw/proposals/2673
· Speaker: Mike Godfrey, professor at University of Waterloo (Software analytics)
· Doesn't write code, but grinds it up & analyzes it
· All we like sheep - from Handel's Messiah. Sheep = Dolly
· What are software clones & clone detection? (audience game)
o Example 1: Why are there 2 different copies of the startup routine in
different files?
o Example 2: This is a more reasonable case of nearly identical conversion
routines, but they are unlikely to vary from each other & they are all in the
same file
o Example 3: const pointer vs. pointer to const fixed in 1 place, but not
everywhere ("Inconsistent maintenance")
· What's a clone:
o The tools have different assumptions - depends on the tolerances &
algorithms. Thus need a particular definition of similarity.
· Bellon's taxonomy
o Lists out the different types of similarity
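The first two kinds of similarity from the outline (token-for-token identical, and identical except for literals and identifiers) can be told apart with a small token-based check. This is a simplified sketch, not one of the tools discussed in the talk: the tokenizer is a rough regex, the keyword list is partial, and clones with gaps are not handled.

```python
import re

# Partial keyword list, enough for the illustration.
C_KEYWORDS = {"if", "else", "for", "while", "return", "int", "char",
              "void", "const", "struct", "static"}
TOKEN = re.compile(r'[A-Za-z_]\w*|\d+|"[^"]*"|\S')
IDENT = re.compile(r"[A-Za-z_]\w*")

def tokens(code):
    """Split code into tokens, ignoring whitespace and layout."""
    return TOKEN.findall(code)

def normalized(code):
    """Replace identifiers with ID and literals with LIT; keep the rest."""
    out = []
    for tok in tokens(code):
        if tok in C_KEYWORDS:
            out.append(tok)
        elif IDENT.fullmatch(tok):
            out.append("ID")
        elif tok[0].isdigit() or tok[0] == '"':
            out.append("LIT")
        else:
            out.append(tok)
    return out

def clone_type(a, b):
    """Return 1 for identical tokens, 2 if only names/literals differ, else None."""
    if tokens(a) == tokens(b):
        return 1
    if normalized(a) == normalized(b):
        return 2
    return None

print(clone_type("int sum = a + 1;", "int   sum = a + 1;"))
print(clone_type("int sum = a + 1;", "int total = b + 42;"))
print(clone_type("int sum = a + 1;", "return a;"))
```

How much normalization to apply is exactly the tolerance question raised above: stricter matching misses renamed copies, looser matching reports more coincidental similarity.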
· Code clone detection methods
o Either you have a stream or a graph
o Problem is that as you become more language dependent, it gets more
complicated
o So, there are some other methods
· Lightweight semantics: Look at callers & what get called only to see
what overlap you get. Then do a more precise hand-analysis.
· Hashes are good for exact matches, but sometimes you need some
flexibility
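The "lightweight semantics" idea, comparing only what each function calls, can be sketched with a set-overlap score. This is an illustrative sketch under assumptions: calls are found with a rough regex rather than a parser, the Jaccard index is one possible overlap measure (the talk did not name one), and the two driver snippets are hypothetical.

```python
import re

CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
NOT_CALLS = {"if", "for", "while", "switch", "return", "sizeof"}

def callees(body):
    """Names that look like function calls in a C function body."""
    return {name for name in CALL.findall(body) if name not in NOT_CALLS}

def jaccard(a, b):
    """Overlap of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical probe routines from two similar drivers.
PROBE_A = "if (!dev) return -1; res = request_region(base, len); init_hw(dev); register_host(dev);"
PROBE_B = "res = request_region(io, n); reset_chip(d); init_hw(d); register_host(d);"

score = jaccard(callees(PROBE_A), callees(PROBE_B))
print(sorted(callees(PROBE_A)), sorted(callees(PROBE_B)), score)
```

Pairs with a high score are only candidates; as noted above, a more precise hand analysis follows.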
· When is cloning a good idea?
o There are big authorities that say it's always bad
o Myth: Code cloning is bad in the long run
· Down sides
§ Code bloated
§ Ossifies
§ Inconsistent maintenance (fixing the code in all the copied places)
· Fix it by refactoring to the good design.
· XP's rule of 3: (XP = Extreme Programming) If you do something 3
times, you've discovered an abstraction & should spend some time on it
· Is inconsistent maintenance really a problem?
§ Things are changed inconsistently 1/2 the time, many times they intend to
diverge
· Is cloned code buggier?
§ Most bugs have little to do with clones. Cloned code was typically less buggy
· Linux SCSI driver cloning example
§ Does cloning predict compatible bus type dependencies?
· 10% for random
· 50% for vendor
· Even higher for cloned
· Is cloning considered harmful?
o Paper: "Cloning considered harmful considered harmful"
· When you de-clone, you may lose some information
· Sometimes you have specific design goals in mind when you do clones
· They came up with about 10 patterns in 3 groups
§ Forking: Want to do it similarly, but somewhere else & don't want a ton of
ifdefs (e.g. Linux SCSI drivers)
· Works well when commonalities of end solutions are unclear (e.g.
platform variation) & the cloning is obvious and well documented
§ Templating: Similar, but not similar enough to make 1 copy with parametrized
differences
§ Post-hoc customizing: their "misc" bucket
· If you don't own the code
· e.g. "clone & own" from Microsoft Research
· Cloning is often useful in the long run if you understand the
technical benefits
-
- Coccinelle
- goal: help developers scan and transform a large legacy C code base
- applications
- bug finding
- bug fixing
- code modernization - see repetitive code, want to introduce new API, e.g. devm APIs
- code metrics - how often is function used in different contexts
- semantic patches
· Abstract: https://linuxplumbersconf.org/2015/ocw/proposals/2703
· Speaker: Julia Lawall, Inria
· There will be a 2-hr tutorial as part of KVM forum & also posted to
the web
· Bug fixing: Humans are error prone. You want a tool to fix it
everywhere.
· Many tools exist, but have different focus (e.g. using the most exact
technology). Coccinelle wants to be accessible to C developers
· Example of !E & C: The search is run on the Linux kernel regularly
nowadays as part of 0-day testing, since this is a common mistake
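Why !E & C is a mistake: in C the ! operator binds tighter than &, so the expression means (!E) & C rather than !(E & C). The Python sketch below is only a crude textual approximation of such a search, not Coccinelle and not its semantic patch language. The regex, the helper name find_suspects, the sample lines, and the suggested !(E & C) rewrite are all assumptions for illustration.

```python
import re

# '!' applied directly to a name or member access, then a single '&'
# (not '&&') and an upper-case constant-like name.
SUSPECT = re.compile(r"!\s*([A-Za-z_][\w.\->]*)\s*&(?!&)\s*([A-Z_][A-Z0-9_]*)")

def find_suspects(source):
    """Return (line_number, suggested_expression) for each suspicious line."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = SUSPECT.search(line)
        if match:
            expr, const = match.groups()
            hits.append((number, f"!({expr} & {const})"))
    return hits

# Hypothetical C lines: only the first has the precedence problem.
SAMPLE = """\
if (!dev->flags & IFF_UP)
if (!(flags & IFF_UP))
if (!ready && DONE)
"""

print(find_suspects(SAMPLE))
```

A text-level search like this knows nothing about types or macros, which is one reason a tool that works on the parsed code is preferable for a code base the size of the kernel.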
· Example: memzero_explicit
o You don't want the compiler to get rid of the memset because you don't want
someone to be able to see the sensitive information
o But, you don't want to apply your fix to all memsets in the kernel
o We only want to handle the case of a local variable that's an array, with
memset at the end of a function
o "..." will tell Coccinelle that you have something before & after, then the
"..." is the shortest distance between them
o "..." will follow through all execution paths (i.e. both if & else branches)
o What about false positives?
· In the example ev is the alias to buf, so we don't need to transform
the memset
· Coccinelle focuses on the syntax (the name) and doesn't realize it's
used in a different way. So, we can refine the semantic patch by saying there
isn't another expression referring to the array in some other way
· The cast in the example is not required
· Comment: Seems very similar to clang but clang doesn't support the
kernel sources yet
· More advanced example: new feature of Coccinelle 1.0.2 - removing
assignments from function calls
o Audience thinks it's generally frowned upon
o @S: recording the statement
o @p: recording the position
o Next rule inherits some rules from the above
o Down side is that it solves the problem, but not necessarily in the same order
o Down side is that after a transformation, all of the positions become
invalid
o Coccinelle was able to add braces in an if statement that became >1 statement
long
· Idea is you should be able to take an existing patch and then
generalize it to use over your whole code base
o You should still ensure that it did what you intended
· Q: Once you use a Coccinelle script, do you inspect the result manually?
o A: Coccinelle just generates a patch for you & then you can apply the
patch to your code. There is an in-place option, but inspecting the patch
first is recommended
· Q: What's being run in the 0-day tests?
o A: Some go straight to dev, some get sent to her & she approves & fwds
o Audience comment: Should be able to search on kbuild mailing list
-
- Using clang static analyzer with the Linux kernel code
· Abstract: https://linuxplumbersconf.org/2015/ocw/sessions/3231
· Speaker: Behan Webster, LLVMLinux project lead
· Using it with the kernel differs from using it with user space code
· The static analyzer doesn't "understand" the code b/c it would then need
a big chunk of code from the compiler
· Clang & LLVM are tool-ized (in a library), so you can build other
tools from them. Not all compilers are like this.
· Clang static analyzer only looks at one .o file at a time
· See slide for clang-analyzer website link
· Clang static analyzer is part of the clang compiler itself
· You can turn on & off certain checks. There are a lot of checkers out
there, but many (about 75%) are user-space-specific.
o Unix API: Linux is not exactly like Unix, but they kept it enabled b/c you
can get some interesting info from this checker
· Q: When it makes assumptions, is there any record of those
assumptions?
o A: Yes
· Some of the checkers are not bounded in time - could run forever.
Usually 2-4x longer compile time.
· Every tool has its strengths & weaknesses, so don't use this instead
of another tool. Use the tools together.
· The original output is not readable, so you use scan-build to get a
directory with HTML files that you can go through interactively
· Difficulties w/ the kernel
o Can't run scan-build make w/ kernel b/c doesn't compile with clang
o Can't use released clang (3.6, 3.7) - must use mainline tip clang
o Ccc-analyzer issue: not being tested w/ Linux kernel and the kernel
contains a lot of corner cases
· Demo
o The instructions don't work if followed verbatim. Maybe in a week b/c need
to upstream a patch.
o Vexpress test environment will build a QEMU test environment
o Path length: # of steps it thinks it takes to get to the error
o There are some parse error problems that are hidden in the slide - it was
working last Plumbers
· Clang & LLVM are written in C++, so the checkers are written in C++
· Q: How do you compare clang with sparse or smatch?
o A: Sparse happened b/c we didn't have a static analysis tool at the time.
It pre-dates Coccinelle. It was trying to be like checkpatch & find some
common problems. It's very simplistic. Coccinelle has a more complete
understanding of how the code fits together.
o A: Don't know much about Smatch. Smatch can find logic issues & accesses
outside of buffers. It might be built on top of sparse…
· Q: You can put your checkers on that page. Is there a way to dump a
simplified model of your code (like a graph)?
o A: Not with clang static analyzer, but there are other tools e.g. Creduce
· Q: What is LLVM?
o A: It's a Toolchain Toolkit, a modular set of libraries for building tools.
It's also a toolchain suite built from the aforementioned libraries. It
includes clang. GCC uses GPL, which doesn't work for many business models. GCC
was also architected to not easily be libtoolized. LLVM has a BSD-like license
so you can build your tools around it, including proprietary ones.
o A (Costa): Some things you can also do in GCC, but with GCC it's more
difficult. LLVM is a simpler code base to hack on & it is easier to get code into
LLVM - CI model with reverting once you have submitted a few patches
successfully.