
Using LLVM for seamless interop with Clang languages

by Vihan Bhargava
  • Compilers
  • Objective-C
  • iOS
  • Clang

LLVM is the magic glue that will let us easily implement interop.

Recently I’ve been working on a project, VSL, which is a bare-metal, high-level, compiled language. The problem with new languages, particularly obscure ones made in someone’s spare time, is that they lack a community or ecosystem. By contrast, you have languages such as C, C++, Swift, and Objective-C, which have passionate communities and libraries.


#How can this be fixed?

Interoperability is the approach VSL has chosen. There are a couple of languages that have native interoperability with other languages; the first ones that come to mind are D and Swift. D takes more of an ABI-compatibility approach, while Swift uses the same runtime as the language it interfaces with. How can two distinctly different languages communicate? We can thank LLVM for this.

#Approach

Before I start, here is a quick look at VSL syntax. Something like this (VSL):

public class UIViewController {
    public func viewDidLoad() external(symbol_name);
}

declares a class UIViewController with a method viewDidLoad. The body of viewDidLoad, however, is ‘external’, so its behavior is provided by some other function (named symbol_name) that we’ll link to.


There are a couple of things we’ll note that should work:

  • Dynamic Dispatch
  • alloc/dealloc overrides
  • super interaction
  • Subclassing & Implementation
  • Type bridging e.g. NSString to a VSL String (I won’t discuss this but it’s relatively trivial to add)

All of these need to work in both languages. Now the basic idea behind using LLVM is to generate the appropriate calling code by bridging the ABI and calling-convention differences between the two languages. Let’s try declaring a function that allows a C-conformant language to call the Objective-C UIView#addSubview method:

#include <UIKit/UIKit.h>

extern "C" {
    __attribute__((always_inline))
    void UIView_addSubview(UIView* uiView, UIView* view) {
        [uiView addSubview: view];
    }
}

Compiling this (with the vsl-objc-bindgen tool) we get:

(with optimizations to remove dead code)

This way we let Clang do all the heavy lifting and get us our calling code (the #0 refers to the alwaysinline attribute and a few others). With our calling code, we see that Clang has generated the messaging to the Objective-C functions. Now let’s write a VSL class that implements this:

public class UIView: OpaquePointer {
    public func addSubview(view: UIView) external(UIView_addSubview)
}

func main() {
    let view: UIView = UIView()
    let subview: UIView = UIView()
    view.addSubview(subview)
}

VSL here generates an external method for the subview as:

declare void @UIView_addSubview({}*, {}*) local_unnamed_addr

define i32 @main(i32, i8** nocapture readnone) local_unnamed_addr {
entry:
  %2 = tail call i8* @malloc(i64 0)
  %3 = bitcast i8* %2 to {}*
  tail call void @UIView_addSubview({}* %3, {}* %3)
  ret i32 0
}

Now once we link all of these together we get code that calls the Objective-C method!

We’ll run the following build commands:

$ vsl-objc-bindgen compile ios iphonesimulator UIView.addSubview.m -F UIKit
$ vsl build UIView.addSubview.vsl -S -o UIView.addSubview.vsl.ll -T x86_64-apple-ios11.3.0
$ llvm-link artifacts/vsl-x86_64-apple-ios11.3.bc UIView.addSubview.vsl.ll -o=- | opt -o=- -O2 | llvm-dis
define i32 @main(i32, i8** nocapture readnone) local_unnamed_addr #0 {
entry:
  %2 = tail call i8* @malloc(i64 0)
  %3 = tail call i8* @malloc(i64 0)
  %4 = bitcast i8* %3 to {}*
  %5 = load i8*, i8** @OBJC_SELECTOR_REFERENCES_, align 8, !invariant.load !9
  tail call void bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to void (i8*, i8*, {}*)*)(i8* %2, i8* %5, {}* %4)
  ret i32 0
}

Looking at this you’ll notice we’re calling @malloc(i64 0), which gives us no valid Objective-C object (the result may be a null pointer or a zero-sized allocation), so this code will not behave correctly and may crash. That is undesirable, so we now need to interface our initializer. We can do that in a similar way, this time calling [[UIView alloc] initWithFrame: cgRect]. For reference, CGRect is 32 bytes in size; so that we don’t need to bridge the CGRect struct yet, we’ll treat it as an opaque Pointer<UInt32> in VSL. Let’s write the code for this:

extern "C" {
    __attribute__((always_inline))
    UIView* UIView_initWithFrame(CGRect frame) {
        return [[UIView alloc] initWithFrame: frame];
    }
}

Note: CGRect is a ‘struct’ passed by value in the source, but in the generated LLVM IR it is passed as a %struct.CGRect* with the byval attribute.

And again compiling this we get:

define %0* @UIView_initWithFrame(%struct.CGRect* byval align 8) #0 {
  %2 = alloca %struct.CGRect, align 8
  %3 = load %struct._class_t*, %struct._class_t** @"OBJC_CLASSLIST_REFERENCES_$_", align 8
  %4 = load i8*, i8** @OBJC_SELECTOR_REFERENCES_, align 8, !invariant.load !9
  %5 = bitcast %struct._class_t* %3 to i8*
  %6 = call i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %5, i8* %4)
  %7 = bitcast i8* %6 to %0*
  %8 = bitcast %struct.CGRect* %2 to i8*
  %9 = bitcast %struct.CGRect* %0 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %8, i8* align 8 %9, i64 32, i1 false)
  %10 = load i8*, i8** @OBJC_SELECTOR_REFERENCES_.2, align 8, !invariant.load !9
  %11 = bitcast %0* %7 to i8*
  %12 = call i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*, %struct.CGRect*)*)(i8* %11, i8* %10, %struct.CGRect* byval align 8 %2)
  %13 = bitcast i8* %12 to %0*
  ret %0* %13
}

We’ll create an external call in VSL:

public typealias CGFloat = Double
public typealias CGRect = Pointer<UInt32>

public func CGRectMake(x: CGFloat, y: CGFloat, width: CGFloat, height: CGFloat) -> CGRect external(CGRectMake)

public class UIView {
    public init(with frame: CGRect) external(UIView_initWithFrame)

    // ...
}

Now we’ll call the initializer along with our previous addSubview:

func main() {
    let cgRect = CGRectMake(x: 0, y: 0, width: 0, height: 0)

    let view = UIView(frame: cgRect)
    let subview = UIView(frame: cgRect)
    view.addSubview(subview)
}

(We could create proper bindings for CGRect, but right now we’ll just create one with all fields set to zero.) Compiling this again, we now get:

declare {}* @UIView_initWithFrame({}*) local_unnamed_addr

declare void @UIView_addSubview({}*, {}*) local_unnamed_addr

declare {}* @CGRectMake(i64, i64, i64, i64) local_unnamed_addr

define i32 @main(i32, i8** nocapture readnone) local_unnamed_addr {
entry:
  %2 = tail call {}* @CGRectMake(i64 0, i64 0, i64 0, i64 0)
  %3 = tail call {}* @UIView_initWithFrame({}* %2)
  %4 = tail call {}* @UIView_initWithFrame({}* %2)
  tail call void @UIView_addSubview({}* %3, {}* %4)
  ret i32 0
}

Joining this with the previous LLVM results we can finally get a full UIView example:

; Function Attrs: alwaysinline ssp uwtable
define %1* @UIView_initWithFrame(%struct.CGRect* byval nocapture readonly align 8) local_unnamed_addr #0 {
  %2 = load i8*, i8** bitcast (%struct._class_t** @"OBJC_CLASSLIST_REFERENCES_$_" to i8**), align 8
  %3 = load i8*, i8** @OBJC_SELECTOR_REFERENCES_, align 8, !invariant.load !9
  %4 = tail call i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %2, i8* %3)
  %5 = load i8*, i8** @OBJC_SELECTOR_REFERENCES_.2, align 8, !invariant.load !9
  %6 = call i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*, %struct.CGRect*)*)(i8* %4, i8* %5, %struct.CGRect* byval nonnull align 8 %0)
  %7 = bitcast i8* %6 to %1*
  ret %1* %7
}

define i32 @main(i32, i8** nocapture readnone) local_unnamed_addr {
entry:
  %2 = tail call {}* @CGRectMake(i64 0, i64 0, i64 0, i64 0)
  %3 = tail call {}* bitcast (%1* (%struct.CGRect*)* @UIView_initWithFrame to {}* ({}*)*)({}* %2)
  %4 = tail call {}* bitcast (%1* (%struct.CGRect*)* @UIView_initWithFrame to {}* ({}*)*)({}* %2)
  tail call void @UIView_addSubview({}* %3, {}* %4)
  ret i32 0
}

So as you can see above, we have been able to interop between VSL and Objective-C without any runtime overhead. The above example doesn’t do memory management, but with a little work you can link deinitializers to [object release] calls. Additionally, I haven’t discussed how to implement super and/or dynamic dispatch, but this is the core of what happens. For those, things get a little more complex.

#Dynamic Object Orientation

I’m going to discuss how to solve dynamic dispatch first because it lends itself to implementing super and subclassing support. You might be thinking we should attempt static dispatch; however, you’re either going to:

  1. be writing in your target language and therefore you won’t have overhead
  2. be writing in the source language and therefore it doesn’t have knowledge of your subclass

An example is the presentViewController method. It’ll take a UIViewController not a subclass so we should just bind it. The problem is Objective-C and C++ can have rather complex implementations of things such as dynamic dispatch and vtable calls which is why we need to write some code to avoid going through that mess.

Let’s start with a basic example:

@dynamic(static)
public class UIViewController: OpaquePointer { /* bridged implementation */ }

public class MyViewController: UIViewController {
    public let myField: String = "hello"
    public func viewDidLoad() {
        print("yass view loaded")
    }
}

func presentMyView(from controller: UIViewController) {
    let myViewController = MyViewController(nibName: OpaquePointer.null, bundle: OpaquePointer.null)
    controller.presentViewController(myViewController, animated: true, completion: OpaquePointer.null)
}

The dynamic(static) annotation may seem like an oxymoron, but it says that the class uses static inheritance. This will be important later on.

So in order to do this we will need to create an abstraction layer that lets us handle dynamic dispatch in both VSL and Objective-C. The structure for MyViewController along with its vtable would therefore look like this (pseudo-C):

struct UIViewController.vtable {
    void(*viewDidLoad)();
} __attribute__((__aligned__(8)));

struct MyViewController.type {
    struct UIViewController.vtable *id; // May not be needed
    VSLString *myField;
}

Using this we can create an ‘intermediate’ UIViewController that can both 1) store the VSL fields 2) be able to map to a clang type. Let’s see how a VSL to clang mapping would work:

@interface VSLViewController : UIViewController
- (instancetype)initFromData:(void*)data;
@end

struct UIViewController_vtable {
    void(*viewDidLoad)(UIViewController*);
} __attribute__((__aligned__(8)));

@interface VSLViewController () {
    void* vslData;
}

@end

@implementation VSLViewController

- (instancetype)initFromData:(void*)data {
    if ((self = [super init])) {
        vslData = data;
    }
    return self;
}

- (void)viewDidLoad {
    [super viewDidLoad];
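    // vslData is a void*, so it cannot be dereferenced directly. The vtable
    // pointer is the first member of the data, so read it via a pointer-to-pointer.
    (*(struct UIViewController_vtable **)vslData)->viewDidLoad(self);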
}

@end

Here we create the intermediate ‘VSLViewController’, in which we manage a vtable together with the data. We keep the data as a void* since we don’t know exactly what type it is at compile time. What we do know, however, is that the vtable pointer is the first member of the data struct.


Now let’s see how a Clang to VSL mapping looks. Let’s introduce an extension of MyViewController.type with an ‘owner’:

struct MyViewController.type { // From before
    struct UIViewController.metatype metatype;
    VSLString *myField;
}

struct UIViewController.metatype {
    struct UIViewController.vtable *id;
    UIViewController* owner;
} __attribute__((packed, aligned(8)));

This is still compatible with the earlier layout, but now we have an owner which represents the UIViewController. We’ll then want to ensure this is in sync with the VSL component:

@implementation VSLViewController

-(instancetype)initFromData:(void*)data {
    if ((self = [super init])) {
        vslData = data;
        ((struct UIViewController_metatype*) vslData)->owner = self;
    }
    return self;
}

@end

…and this should be all the wrapping that is needed, with the owner pointer at myViewController[8] providing the interface for Clang. This is where @dynamic(static) comes in: in VSL it specifies that we’ll use the object’s own owner field as the initial super. We use __attribute__((packed, aligned(8))) so that the location of owner is consistent.

To summarize, here’s a diagram of what is happening:

memory and method resolution diagram, leftwards is lower memory space

#VSL

An implementation of this (along with dynamic dispatch, subclassing, super, etc. support) can be found in vsl-objc-bindgen (I’ve used the tool’s compile function earlier). You can run it using:

$ vsl-objc-bindgen gen ios iphonesimulator $(xcrun --sdk iphonesimulator --show-sdk-path)/System/Library/Frameworks/UIKit.framework/Headers/UIKit.h UIKit.m UIKit.vsl -i "<UIKit/UIKit.h>" -s UIView
$ vsl-objc-bindgen compile ios iphonesimulator UIKit.*.m

and it’ll generate an LLVM BC file with bindings. While this targets VSL, you can use the same approach with the Clang API to iterate over a given framework and generate code in your language of choice.