Graphics, Modesetting & Wayland Microconference Notes




Welcome to Linux Plumbers Conference 2015



The structure will be short introductions to an issue or topic followed by a discussion with the audience.


A limit of 3 slides per presentation is enforced to ensure focus and allocate enough time for discussions.



Please use this etherpad to take notes. Microconf leaders will be giving a TWO MINUTE summary of their microconference during the Friday afternoon closing session.



Please remember there is no video this year, so your notes are the only record of your microconference.



Miniconf leaders: Please remember to take note of the approximate number of attendees in your session(s).



SCHEDULE



Atomic modesetting status and next steps
Live sources and sinks
ION feature upstreaming
HWComposer and Wayland
Wayland status update



Atomic modesetting status (Daniel Vetter, Rob Clark)
---------------------------------------------------------------------

Upstream and enabled by default since 4.3

Async commits
  - complicated to implement in generic code at the moment
  - implemented in each driver - Intel (WIP - Maarten Lankhorst), Tegra (mainline), Exynos (WIP - Gustavo Padovan), msm (mainline)

Plan for generic fence (in & out) handling:
    Should async flips/modesets use fences as default return?
    - <Mostly people nodding>
    - Using a different flag on the modeset or the flip (to specify event or fence return)
        - Q: do you ever want both a fence and an event back?
            - A: Depends on what the userspace use case wants
                - Also need open-source userspace
                    - Comment: all Nexus devices have an open-source hwcomposer
                        - but we have to wait for them to ship, which is usually a bit later
            - Want to make sure it's possible; there will probably be cases in the future that need more info than a fence point
                
    - vblank and IRQ handling is a bit mixed up, needs a cleanup
        - deferring fence for vblank cleanups may take too long
        - Paulo Zanoni working on this, but very complicated
        - complicated by legacy UMS drivers
    - per plane fences
    - events need extending anyway, e.g. for CRTC ID
      - someone (nvidia?) reminded us that we may need events in addition, to carry timestamps back to userspace (related to SGI_video_sync/SGI_swap_control/OML_sync_control?)
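
The flag discussion above (event return, fence return, or both) can be sketched as a small model. This is a sketch only: the flag names, `FakeCrtc` and `CommitResult` are hypothetical and are not the DRM uAPI; they only show how separate flags would let userspace ask for either or both.

```python
from dataclasses import dataclass
from enum import IntFlag
from typing import Optional


class CommitFlags(IntFlag):
    # Hypothetical flag names, for illustration only (not real DRM flags).
    WANT_EVENT = 0x1
    WANT_OUT_FENCE = 0x2


@dataclass
class CommitResult:
    event_queued: bool           # a completion event will be delivered later
    out_fence: Optional[int]     # stand-in for a fence handle, or None


class FakeCrtc:
    """Toy CRTC that hands out increasing fence numbers per commit."""

    def __init__(self):
        self._next_fence = 1

    def commit(self, flags):
        fence = None
        if flags & CommitFlags.WANT_OUT_FENCE:
            fence = self._next_fence
            self._next_fence += 1
        return CommitResult(bool(flags & CommitFlags.WANT_EVENT), fence)
```

Because the two requests are independent bits, a caller that wants both a fence and an event simply sets both flags, which keeps the "do you ever want both?" question open for userspace to decide.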

Writeback (live sources + sinks) still missing
      - covered by Laurent later
      - permission issues (arbitrary users capturing screen content)
      - WIP patches exist from Qualcomm to pipe through to V4L2
      Why writeback just one plane?
        - Not sure of the usefulness yet, open to suggestions
        - Mali display processor can do it, but haven't found a use for it
         - One use case is to see if the frame is frozen, need 60 frames per sec
             - compare different frames to make sure it's not frozen
         - some hardware has scaling and colourspace conversion done in a different unit that can't display but looks like a display controller with one plane
         - Exynos GSC is odd, and probably shouldn't have been merged upstream in kms
               - possibly been broken for 6-7 releases now
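
The frozen-frame use case mentioned above could look roughly like this in userspace, assuming written-back frames arrive as raw byte buffers. The function names and the 60-frame threshold (one second at 60 frames per sec) are assumptions for illustration, not anything proposed in the session.

```python
import hashlib


def longest_identical_run(frames):
    """Length of the longest run of consecutive byte-identical frames."""
    longest = 0
    current = 0
    previous = None
    for frame in frames:
        digest = hashlib.sha256(frame).digest()
        if digest == previous:
            current += 1
        else:
            current = 1
            previous = digest
        longest = max(longest, current)
    return longest


def looks_frozen(frames, max_repeats=60):
    """True if the same frame content repeats max_repeats times in a row."""
    return longest_identical_run(frames) >= max_repeats
```

Hashing keeps only one digest in memory rather than a whole previous frame. A legitimately static screen also triggers this check, so a real monitor would need some known-changing content to compare against.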

Using properties to extend atomic
  - colour manager implementation (WIP - Kausal - Intel)
  - alpha blending (e.g. premultiplied or not) and Z-order properties
  - buffer compression
  - all of these require open-source userspace to prove the concept
    - open HWComposer implementations do help this
      - Android is the most sophisticated use case, but it's also the furthest away from mainline
           - chrome and wayland are getting closer
               - wayland has some naive implementations, since doing better needs knowledge of the details of the tradeoffs of actual hardware
      - two things would help the most to get android using kms:
             - sync support, which all vendors use; the ability to return fences is critical
             - custom pixel formats for compression
             - complexity would still be concerning, but probably could be done
             
           - problem is none of the open userspace stacks use fences
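
As a small worked example of why the "premultiplied or not" alpha property listed above matters, the two standard blend equations are shown here on normalised RGB tuples. The function names are made up for this sketch; it is not tied to any proposed KMS property.

```python
def premultiply(src, alpha):
    """Convert straight-alpha colour to premultiplied colour."""
    return tuple(alpha * s for s in src)


def blend_straight(src, alpha, dst):
    """out = alpha * src + (1 - alpha) * dst"""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))


def blend_premultiplied(src_premul, alpha, dst):
    """out = src + (1 - alpha) * dst, where src already includes alpha."""
    return tuple(s + (1.0 - alpha) * d for s, d in zip(src_premul, dst))
```

Both paths give the same result when the data matches the equation. Feeding straight-alpha data into the premultiplied equation over-brightens the source, which is why the display hardware needs to be told which interpretation a plane uses.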
Further ecosystem/community work
  - need real open HWComposer implementation
  - clean up and extend docs
    - jazzed-up kerneldoc landing for 4.3/4.4
    - https://01.org/linuxgraphics/gfx-docs/drm/ regenerated nightly
    - please add manpages to libdrm!
  - more drivers converted to atomic
    - currently i915, tegra, msm, rcar-du (renesas) mostly landed; exynos queued; rockchip WIP
  - intel-gpu-tools testsuite for compliance-style checking
    - currently i915-specific but being ported to be generic
    - http://cgit.freedesktop.org/xorg/app/intel-gpu-tools


Live sources and sinks in KMS (Laurent Pinchart)
----------------------------------------------------------------

WIP patchset available for live sources (e.g. streaming media devices)
  - http://lists.freedesktop.org/archives/dri-devel/2015-April/081200.html
  - new 'live source' object created referring to a single stream, ioctls available to enumerate sources
  - new flag to create KMS framebuffer objects from live source object
  - then use framebuffer to display on plane
  - current patchset can use V4L2 sources, but not intrinsically tied to V4L
  - enumeration and creation of live source objects may require platform-specific knowledge
  - cross-process difficulties for passing identifiers

Security: same process needs to 'own' both ends of live sources and sinks, i.e. V4L2 and KMS
    - producer should be in control of the pipeline, e.g. for writeback need to request V4L2 sink node from KMS
    - initiating externally introduces security issues

Timestamp co-ordination
    - true live source means that timestamps are unlikely to be meaningful
    - anyone co-ordinating timestamps will require userspace mediation, e.g. compositor, so back to frame-based rather than live model
    - frame lateness: no support in V4L2 API for this; need feedback from kernel/hw
    - expose latency for hardware pipeline?
    - partly solved by KMS atomic API
    - mismatch between frame-based model (post-hoc feedback to calculate per-frame latency after the fact) and streaming model (fixed latency calculated up front)
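
The mismatch between the two models noted above can be made concrete with a small sketch. The timestamps, microsecond units and function names are assumptions for illustration; neither V4L2 nor KMS exposes these helpers.

```python
def per_frame_latency_us(frames):
    """Frame-based model: latency is only known after each frame was shown.

    `frames` is a list of (queued_us, presented_us) timestamp pairs.
    """
    return [presented - queued for queued, presented in frames]


def late_frames(frames, deadline_us):
    """Indices of frames whose measured latency exceeded the deadline."""
    latencies = per_frame_latency_us(frames)
    return [i for i, lat in enumerate(latencies) if lat > deadline_us]


def fixed_pipeline_latency_us(stage_latencies_us):
    """Streaming model: one latency figure, summed over the stages up front."""
    return sum(stage_latencies_us)
```

The frame-based functions need feedback per frame, which is the part the notes say is missing from the V4L2 API, while the streaming figure needs the hardware pipeline to expose its latency before any frame flows.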


HWComposer and Weston (Anand Balagopalakrishnan)
-------------------------------------------------------------------------

- targeted at automotive, with heavy usage of hardware planes (10-12 layers per app)
- specific automotive use cases, e.g. requirements on the time from start-up to showing the camera
- EGLImage insufficient for latency
- co-ordination between different processes, repeatedly dropping and claiming DRM master
  - e.g. multi-CRTC with one head per application
  - did have infrastructure for handling this upstream, but was dropped due to lack of interest
    - commits 9c7060f7e3b09837621f93bd8666cf4cfac45001 and 3fdefa399e4644399ce3e74e65a75122d52dba6a
  - some usecases potentially candidates for live sources, e.g. MCU co-processor controlling camera
- port of gralloc to sit on top of GEM rather than ION
- HWComposer uses this to create ideal HW setup for non-GPU composition
  - HWC considerations: pixel format (e.g. YUV colourspace conversion), layer size, frame rate, z-order, transparency (inc. global vs. per-pixel)
- current integration with Weston does not support mixing 2D/KMS/plane composition and 3D/GPU composition
  - new compositor-*.c backend implementation
  - explicit vs. implicit fencing/sync - just use explicit w/ atomic modesetting
  - add support for GPU composition as well, then release source
  - is there sufficient interest in this? yes
- HWComposer API in generic userspace
  - uses dmabufs rather than ION/etc specific
  - still requires gralloc-specific flags
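
A minimal sketch of the kind of decision a HWComposer makes from the considerations listed above (pixel format, layer size, z-order), with GPU composition as the fallback. The `Plane` and `Layer` types and the greedy strategy are assumptions made for this example, not the implementation presented in the session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plane:
    name: str
    formats: frozenset
    max_width: int
    max_height: int


@dataclass(frozen=True)
class Layer:
    name: str
    fmt: str
    width: int
    height: int
    zpos: int


def assign_layers(layers, planes):
    """Greedily give each layer (lowest z first) the first free plane that fits.

    Returns (layer name -> plane name, names of layers left for the GPU).
    """
    free = list(planes)
    assignment = {}
    gpu_fallback = []
    for layer in sorted(layers, key=lambda l: l.zpos):
        match = next(
            (p for p in free
             if layer.fmt in p.formats
             and layer.width <= p.max_width
             and layer.height <= p.max_height),
            None,
        )
        if match is None:
            gpu_fallback.append(layer.name)
        else:
            free.remove(match)
            assignment[layer.name] = match.name
    return assignment, gpu_fallback
```

This ignores frame rate, transparency, and the z-order interaction between GPU-composited layers and hardware planes, all of which a real implementation has to handle.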


ION feature upstreaming (Sumit Semwal)
------------------------------------------------------

- ion is a buffer allocator, etc
- features to upstream
  - allocation
    - what does ion not do well? userspace required to understand constraints of all devices; works relatively well for static platform configurations, not good for upstream
    - what would be better? deal with constraints without requiring userspace to know the entire layout
    - need to define and share/expose device constraints in the kernel, use these constraints to choose allocator
    - two ways to do this: migrate pages, or delay allocation until all devices attached
      - transparent page migration (when a further constraint is exposed) would make it possible to skip deferring attachments
        - caching/physical pinning makes this difficult to implement: not feasible
      - delayed allocation difficult for android: each application may only deal with one device in the chain
  - sharing: done with dmabuf
  - ion abuses dma sync apis without creating a mapping
    - could perhaps replace uncached with cma? maybe, but corner cases fall down
    - 'ion stops trying to do anything special with coherency'
- cenalloc: a centralised device constraint sharing and allocator framework
  - constraint sharing seems to be stalled upstream
  - allocator depends on constraint sharing
- constraint sharing
  - is this a good idea?
  - can it co-exist with ION? should definitely be able to
  - possibly build up smaller / more targeted api first, especially wrt coherency and caching
    - changing the DMA API is quite costly
- allocation
  - upstream status of ION unknown; will see in Android MC
    - moving out of staging should have open userspace
  - more users for the ION interface
  - patches to add CMA-backed ION heap & associate with separate struct device
  - ION API is essentially internal/non-frozen for Android; gralloc is the fully-baked/exposed API
    - much more difficult for non-external trees, still hard to break kernel/userspace ABI
    - but some external users call ION directly
  - ION DT API difficult and still unresolved
    - massive rathole into 'what is DT' ...
  - potential integration with Benjamin Gaignard's secure memory allocation framework (patches on dri-devel etc)
- coherency
  - issues with DMA API
    - same device can have multiple views (wrt coherency) into the same buffer
    - assumes x86 is always fully coherent - this isn't actually true at all
    - simple sync API would be really nice
      - but vastly complicated by GPU usecases and devices with separate/per-context IOMMUs
    - V4L2 might require interface changes to correct attach/map order w/ dmabuf API
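
The constraint-sharing and delayed-allocation idea from the allocation notes above can be sketched as follows. The constraint fields, the allocator names and the `DeferredBuffer` class are all hypothetical; this is not the cenalloc or dma-buf API, only an illustration of merging per-device constraints before choosing an allocator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    contiguous: bool = False
    max_address_bits: int = 64
    alignment: int = 4096  # assumed to be a power of two


def merge(a, b):
    """Combine two devices' constraints into the strictest common set."""
    return Constraints(
        contiguous=a.contiguous or b.contiguous,
        max_address_bits=min(a.max_address_bits, b.max_address_bits),
        # max() is only correct because alignments are powers of two here
        alignment=max(a.alignment, b.alignment),
    )


def choose_allocator(c):
    """Pick a (hypothetical) allocator name from the merged constraints."""
    return "cma" if c.contiguous else "system"


class DeferredBuffer:
    """Collects constraints on attach; picks the allocator on first use."""

    def __init__(self, size):
        self.size = size
        self.constraints = Constraints()
        self.allocator = None

    def attach(self, device_constraints):
        if self.allocator is not None:
            raise RuntimeError("backing storage already allocated")
        self.constraints = merge(self.constraints, device_constraints)

    def allocate(self):
        if self.allocator is None:
            self.allocator = choose_allocator(self.constraints)
        return self.allocator
```

The `RuntimeError` on a late attach is exactly the case the notes flag as hard for Android, where one application may only see one device in the chain; page migration would be the alternative to refusing.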


Wayland status (Bryce Harrington)
---------------------------------------------
 
  - Wayland/Weston 1.9
    - core fairly stable/mature - been very quiet for a few releases now
    - few bugs remaining before release can stabilise
      - race with single buffer attached to multiple surfaces; some ideas being kicked around but need to be careful as it may break existing expectations
      - either disallow single buffers being committed to multiple surfaces or refcount buffer releases
    - EGL vs. damage
      - surface vs. buffer co-ordinates - need to adjust for surface/buffer transforms (rotation, scaling), but EGL doesn't know about those
      - need new protocol to specify damage in buffer co-ords
      - currently all implementations just declare the entire buffer to be damaged
    - another race with wl_display object when using separate event queues
      - need atomicity of creating callback objects and setting event queue
      - potential ahead-of-time proxy wrapper
      - still unresolved, RFCs on wayland-devel@
    - pointer-lock / relative pointer support
      - required for games (particularly FPS)
      - RFC on wayland-devel@
  - xdg-shell
    - protocol for desktop functionality (akin to EWMH/NetWM)
    - needs wider adoption by separate desktop environments (not just GNOME)
  - Weston 1.9
    - wl_scaler protocol
      - allow compositor to perform scaling
    - presentation timing feedback
      - allows clients to find out exactly when buffers were displayed on-screen
        - implemented and merged
      - ahead-of-time buffer queueing split out
    - atomic modesetting support
      - patches are WIP (about half merged) on wayland-devel@
    - generic dmabuf protocol
      - initial protocol just merged, with support for EGL and KMS import
    - libweston
      - allow external programs/users to make use of Weston compositor implementation
      - RFC patches on list, ABI not stable but initial work underway; possibly exposed for 1.10
  - release timeline
    - alpha released earlier this week
    - beta coming ~early Sep
    - RCs and release late Sep
  - review queue
    - currently review takes too long; a lot falls between the cracks
      - working to fix this
  - future
    - stabilising generic xdg-shell
    - screensaver inhibition (e.g. for media players)
    - XWayland enhancements - needs a maintainer ...
  - does weston support RDP/VNC?
    - yes
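
For the race with a single buffer attached to multiple surfaces (under Wayland/Weston 1.9 above), the "refcount buffer releases" option could work roughly as sketched here. The class and method names are invented for this example and do not correspond to libwayland or Weston code.

```python
class SharedBuffer:
    """Tracks which surfaces still use a buffer; releases once when none do."""

    def __init__(self, name):
        self.name = name
        self._users = set()
        self.release_events = 0  # how many release events were sent

    def commit(self, surface_id):
        """A surface committed this buffer as its content."""
        self._users.add(surface_id)

    def compositor_done(self, surface_id):
        """The compositor no longer needs the buffer for this surface."""
        if surface_id not in self._users:
            return
        self._users.remove(surface_id)
        if not self._users:
            self.release_events += 1
```

With this scheme the client sees a single release only after every surface is finished with the buffer, which changes when existing clients get their release, the compatibility concern raised in the notes.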