Episode No: 106

Beyond Coding: How Airtrain.ai Revolutionizes AI Integration for Developers

Emmanuel Turlay

CEO & Co-founder @Airtrain.ai

Listen & Subscribe

hyperengage.io/podcast

Ep#106: Beyond Coding: How Airtrain.ai Revolutionizes AI Integration for Developers ft. Emmanuel Turlay (CEO & Co-founder, Airtrain.ai)

Episode Summary

In this episode, Adil welcomes Emmanuel Turlay, CEO and co-founder of Airtrain.ai, a company that simplifies AI and machine-learning workflows around large language models, with a focus on no-code solutions. Turlay, with a background in particle physics and experience at notable tech firms like Instacart and Cruise, shares insights into the transformative power of AI in development and problem solving. His journey from conducting research at CERN to leading a startup underscores the importance of data management and scalable software practices. Airtrain.ai aims to democratize access to AI technologies, enabling data scientists and ML engineers to harness large language models more efficiently for innovative solutions. The conversation highlights Turlay's vision for AI's role in future software development and his commitment to making cutting-edge AI more accessible and practical for a broad range of applications.
Key Takeaways
Emmanuel's background in particle physics research and experience shipping software at scale (2:20)
Key factors in doing machine learning at scale (3:03)
The importance of data quality (3:50)
Comparing Tesla's self-driving approach to companies like Cruise and Waymo (5:37)
Capabilities of large language models like GPT-4 for coding and problem solving (10:21)
Why smaller specialized fine-tuned models may be better than giant universal models (12:02)
Airtrain's products and target customers (15:02)
No-code users building on AI models (15:52)
Measuring product usage through fine-tuning jobs, inference API calls, and more (23:38)
Upcoming plans around dataset management and optimization (28:27)

Transcript

[00:00:03] Adil Saleh: Hey, greetings, everybody. This is Adil, your host. We're at almost 110 episodes now, and most recently we've started inviting products that are focused on developer experience, more AI-heavy products, because they're doing so much to change the world, change the way we work, and change the way engineers approach problem solving. Machine learning and data science have evolved so much with these large language models and generative AI in the recent past. So I'm thrilled to welcome Emmanuel today. He's the CEO and co-founder at Airtrain.ai, which I believe is also a YC-backed company, and they're doing some heavy lifting on machine learning and AI for no-code large-language-model infrastructure. So thank you very much, Emmanuel, for taking the time today.

00:01:04 Exploring Emmanuel Turlay's Journey in Building Data Science Tools

[00:01:04] Emmanuel Turlay: Of course, my pleasure. Thank you for having me.

[00:01:07] Adil Saleh: Love that. Man, I'm quite intrigued by your journey. I'm sure building a product like Airtrain takes a lot of technical skill at a practical level, not just an academic level. You have a PhD in your field, and you're a subject-matter expert with real domain experience. How did you see yourself as a software engineer back in the years, starting out with companies operating at scale? You've worked with teams doing infrastructure engineering at scale, specifically machine learning and data science. How did you evolve, and how do you see yourself as a different founder today, having all of that experience?

[00:01:50] Emmanuel Turlay: For sure. So my PhD is actually not in computer science. I did a PhD in particle physics. I used to do physics research at CERN, the big physics lab in Switzerland with the Large Hadron Collider. But the skills I learned doing my PhD there are directly applicable to machine learning, because even before machine learning was a thing, around the year 2010 roughly, we were already dealing with large amounts of data at CERN: all the collision data we were recording. We had to process that data at scale, so we had our own sort of academic cloud even before AWS was a thing. I learned really good practices around shipping software at scale and dealing with data: labeling data, cataloging data, processing it, and so on. That was my key background.

Then when I joined the tech industry, around 2014, I joined Instacart. Instacart is a large grocery delivery company here in the US, and they were in hyper-growth mode: more and more orders every month, more and more customers. In order to optimize the business, we had to measure everything we were doing: all the deliveries, the time it took to pick certain items, the time it took to drive from the store to the customer's house, and so on. Processing that data was also pretty massive. And then the next step was to work at Cruise, which is a self-driving car company here in San Francisco.
And you can imagine the volume of data generated by a self-driving car on the road, between the image sensors, the lidar sensors, the metadata around the drives, and sometimes even sound clips: that volume of data is pretty huge. So this really shaped my understanding of what it takes to do machine learning at scale. We always underestimate it, but ML is mostly a data problem. Like all things in machine learning and AI, it's a data problem. If you don't have a high-quality dataset, your model is not going to be that good. You can play with the hyperparameters as much as you want, you can train for as long as you want with the best GPUs; if your initial training data is not good, you're not going to get good results. Knowing how to sift through large amounts of data, remove the problematic instances, and extract the trends in the data: that's really the most important part of data science and machine learning.

So basically, my whole career I've been iterating on tools to make this as easy as possible for engineers who don't necessarily have the skills to do this kind of large-scale software work. That's what I'm doing now in my entrepreneurial journey. I'm trying to develop tools that make it much easier for data scientists and ML engineers to iterate on their tasks, so they can get to results and insights faster without having to learn how to use Kubernetes, run jobs in the cloud, access GPUs, or package a Docker image. That's my inspiration: making the technical things easier for people who care more about the insights they get from data.

00:05:01 Discussion on Tesla's Efficiency in Autonomous Driving

[00:05:01] Adil Saleh: Okay, I have a limited understanding of how this works from a technical standpoint, but I would definitely learn from you. I have one question, since you mentioned car manufacturing. Why is Tesla not so efficient when it comes to autonomous driving, as opposed to just electric vehicles? There are so many Chinese manufacturers competing big time in the market. Apart from all the regulations, why are Teslas not as autonomous as a human would want, in terms of safety and all that?

[00:05:37] Emmanuel Turlay: Yeah. So there are two major differences between the self-driving capability of a Tesla and what companies like Waymo, Cruise, and the Chinese ones you mentioned are doing. The first difference is that Teslas only use cameras; they don't use lidar. Cameras are great for shapes and colors, but they don't really give you a lot of information about depth, the distance of objects. Lidar is a rotating laser that sends a laser signal and measures the time the signal takes to come back, so you can actually measure the distance of objects (the distance is simply the speed of light times the round-trip time, divided by two). If you only use cameras to estimate the distance of objects, your error is pretty high, and so it's pretty hard to get a car to drive itself that way. That's the first factor. The second factor is that companies like Cruise and Waymo operate within a very specific geographical area.
They are basically robo-taxis, not individual cars. You don't buy a car from those companies; you book a ride the way you would with an Uber or a Lyft. Those cars operate within a particular domain: certain streets in San Francisco, or Austin, Texas, and other cities. That means the companies can have very detailed maps of those streets, and they can also remove certain streets that are too dangerous or too risky because of potholes or construction sites. So they can diminish the risk of failures thanks to this deep knowledge of the domain area. Tesla cars claim to be able to drive you almost anywhere. You could say, drive me to the top of a mountain here in the Bay Area, or drive me to Tahoe, which is like four hours away, and the car is supposedly able to do that. However, Tesla obviously does not have very high-definition maps of all the details along those routes, and there are many obstacles; you may encounter weather events, there's snow sometimes in Tahoe, and so on. Because they take this different approach, camera-only and within an arbitrary geographical area, their failure rates are much higher. It's a difference in approach, and I'm not sure which one is best, but I think it's a bit safer to start in a confined domain with the best sensors, including lidar, and then move toward personal car ownership, where you'll eventually be able to ask your car to go almost anywhere.

00:07:55 Discussion on Large Language Models and their Impact on Problem Solving in Engineering

[00:07:55] Adil Saleh: Amazing, thank you very much for clarifying. I knew they weren't as accurate, but of course there are reasons behind it, and as an engineer you understand what goes on behind the scenes. My second question is about large language models. We got to meet a lot of these products early on, when ChatGPT came out, and then GPT-4. Everybody was so excited to find a way to ride the wave and get generative AI inside their product somehow, even when they were very early, pre-product-market fit. And now the trend is that companies realize they can't achieve the efficiency they need: they have to do a lot of prompt engineering, they have to build models on top that overlap ChatGPT's dataset, and they have to treat it as an augmented source rather than as a native source. Why is that happening? How do you see it: why are the two or three big LLMs out there not able to achieve that level of efficiency in problem solving when it comes to computer science? I'm sure they're quite beneficial for content writing, script writing, ads, all of these marketing tools that are doing pretty well. But what about engineering?

[00:09:22] Emmanuel Turlay: Are you asking why they can't do better at coding?

[00:09:26] Adil Saleh: Yeah. Why can't all these generative AI models and LLMs
achieve that level of efficiency when it comes to problem solving in computer science, data science, machine learning, all of these engineering problems?

[00:09:39] Emmanuel Turlay: So actually, the most recent models, including GPT-4 and also Code Llama 70B, which came out a couple of weeks ago, are pretty impressive in their level of problem solving. They're able not only to code a function if you lay out the specs, but also to architect an entire application, including the code base. There are a few products out there where you basically prompt the product: build me a chat application between two users, the communication should be real time, and so on. The model will lay out the specs, lay out the structure of the repository and of the code, and then actually start writing the code itself. Of course, you still need a little bit of glue between the different aspects, and you need a human to review every step, because they're not 100% accurate. But still, the capabilities of those models are pretty impressive to me.

I can see a future in the next few years, maybe within five years, where software engineers will mostly be reviewing code written by AI, at least for the usual tasks. For example, as a software engineer, many times in your career you have to do all the boilerplate work: writing the baseline skeleton of your app, writing some API endpoints, writing some authentication. All those basic things we have to rewrite every few weeks or months will be taken care of by AI, and humans will review code written by AI, review specs, write the specs, and spend their time on more complex problems that the AI cannot solve. Those models are going to become essentially like interns or junior engineers doing the boilerplate work, and humans will spend their time on higher-level problems and high-level architecture. So it's definitely a big shift, and I think those models are pretty impressive already.

That being said, I don't think all companies are going to be training their own models in the future. I think there's going to be a handful of big providers, like OpenAI, Perplexity, Anthropic, Google, Meta, and so on. And companies are going to have to build more specialized models, for example per language. If you're a company that writes in a certain language, like Go or Rust or C++, maybe you want an AI copilot that is extremely good at that particular area. Or if you're a fintech company, you need a copilot that knows all the specificities of writing code for fintech, all the regulation aspects, and so on. So there are going to be a lot more specialized models in the future. Right now we have those large models that can do almost anything very well, but I think in the future we'll have smaller models, and this is kind of the bet we're taking with Airtrain, my company.
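To make the "smaller fine-tuned models" bet concrete, here is a minimal sketch of what fine-tuning a small open model can look like with off-the-shelf tooling. This is a generic illustration, not Airtrain's implementation: it assumes the Hugging Face transformers, peft, and datasets libraries, and the model name, train.jsonl file, and hyperparameters are all placeholders.

```python
# Minimal LoRA fine-tuning sketch (illustrative only, not Airtrain's implementation).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE = "mistralai/Mistral-7B-v0.1"  # placeholder: any small open base model

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # this tokenizer has no pad token by default
model = AutoModelForCausalLM.from_pretrained(BASE)

# Attach low-rank adapters: only a tiny fraction of the weights is trained,
# which is what makes fine-tuning a 7B model cheap compared to full training.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

# train.jsonl is a placeholder: one {"text": ...} example per line, i.e. the
# curated task-specific dataset the conversation keeps coming back to.
data = load_dataset("json", data_files="train.jsonl", split="train")
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=2,
                           num_train_epochs=1, learning_rate=2e-4),
    train_dataset=data,
    # mlm=False means plain next-token (causal LM) training: labels = inputs
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()

model.save_pretrained("out/adapter")  # the saved adapter is small, not a full model copy
```

The appeal of this style of fine-tuning is that the trained artifact is a small adapter rather than a full copy of the model, which keeps training and serving cheap; automating exactly this kind of loop is what no-code products in this space aim to do.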
00:12:25 Discussion on Airtrain and Language Models with Emmanuel Turlay

[00:12:25] Emmanuel Turlay: We're betting that at some point people are going to move from those large models to smaller fine-tuned models that are cheaper and faster to run, so they can have more on-prem models instead of relying on cloud-based models all the time.

[00:12:39] Adil Saleh: Yeah, and they can ship fast. As an engineer, you're always trading against time with these models. So what is your favorite one? There are quite a few out there. What is your best LLM?

[00:12:55] Emmanuel Turlay: So I might be biased because I'm French, but I really like the Mistral models. Mistral AI is a company that emerged last October, when they released their first model, a 7B model. It was an open-source model, and they released the Mistral Medium model a couple of months ago. Those models are much smaller than GPT-4, but they're almost on par in performance. You can fine-tune the 7B very easily. Mistral Medium is not open source, so it's not easy to fine-tune, but the performance is pretty impressive. I'm very happy to see companies still doing all this amazing work open source, because OpenAI, despite the name, is not open source and their models are not open source, which means they can't be reviewed and benchmarked by independent organizations. We always have to trust OpenAI's word when it comes to privacy and safety and so on. So I really like that those open-source models are out there, whether it's Llama 2 from Meta or Mistral 7B and the others. Mistral is actually the one we use the most at Airtrain for fine-tuning, because it's really very performant. So yeah, that's one of my favorites.

[00:14:09] Adil Saleh: Amazing, thank you very much for explaining in detail why that's your favorite. I'm sure a lot of our listeners are technical co-founders and founders building these products. So now, getting on to Airtrain: where did you get this idea? Of course, you had the domain experience. Why did you build this platform, and what kind of customer segments are you targeting right now? Is this category still there? Is it growing, or is it still in the making?

[00:14:39] Emmanuel Turlay: Yeah, for sure. Just to go back a little bit on my journey: my last job before starting this company was at Cruise, the self-driving car company, where I was a senior staff engineer in charge of machine learning infrastructure. I led a few projects to help machine learning engineers build models faster, train and iterate on models, and retrain them on a weekly or biweekly basis with new data coming from the road. So I had a lot of expertise in orchestration of training pipelines. When I started my own company, my first product was an open-source orchestration framework for machine learning, for developing training pipelines. That's what we tried to sell for about a year. The name was not Airtrain at the time; it was called Sematic. And we decided to pivot last summer to apply this technology to large language models.
So we basically wanted to apply this kind of batch, offline compute resource to language models. We focused on evaluation at scale: batch evaluation, where you evaluate a model on an entire evaluation dataset, like hundreds of thousands of rows, and also fine-tuning. Those are the two big compute workloads that have to happen offline, and we set out to address them. We built Airtrain to make it extremely easy, even for people without technical skills (there's no code involved at all), to evaluate various models on their test datasets. If you have a test dataset of a few thousand rows, you can upload it to Airtrain and parameterize a number of different models, for example Mistral, OpenAI models, Llama 2, and so on. You can then evaluate the output of those models according to your own criteria. If you care about, for example, creativity, or toxicity and safety, you can see which of those models performs best, and then fine-tune those models on your dataset. We also have a free playground that people can use to quickly prompt all the models at the same time: you can set up three, four, five models side by side, prompt them all together, and see which one gives the best answer. So that's another product we have.

The idea is to help people move away from GPT-4 and all those big commercial models, because they're very expensive, and to help people save money by moving to smaller models. We're going after product teams that are building products on top of AI models. Unlike our previous product, which was really targeted at machine learning engineers, this product is targeted at developers, not necessarily ML developers, just developers building on top of AI models. Everybody uses GPT-4 to prototype because it's the best model, but as soon as people want to scale their app, it's not really affordable to build a large-scale application on top of GPT-4; it's so expensive. So we're trying to help them transition to smaller fine-tuned models so they don't spend as much. This is our target audience, and it's a segment that is really growing.

[00:17:49] Adil Saleh: Oh yes, now I understand that. It's really, really growing. The big giants building on top of ChatGPT and all of these language models can keep doing it because they can afford it.

00:18:05 Airtrain AI: A Conversation with Emmanuel Turlay

[00:18:05] Adil Saleh: They have the capital and revenue and all of that. But these startups are kind of struggling. We also run a B2B SaaS company. Just the other day, we were trying to get in touch with a founder who was part of our community, a YC-backed founder. We were using their platform for HubSpot and Salesforce integrations, and that product shut down. So now we're trying to make sure we not only optimize cost, but also that whatever we build doesn't leave us with technical debt a year or two later; whatever we're doing today has to be scalable for years to come. The same goes for this: are we going to have AI capabilities as well?
Should we be thinking at all about using ChatGPT and all these other language models that are pretty expensive at scale? Even if we're funded, we'd prefer not to do that. And I'm sure that after this episode goes live, my CTO will look at your product too, if he doesn't already know what you guys are doing; there's a strong chance he does. I appreciate that you've been so thorough. So what kind of customer segments are you targeting? Product managers, of course. And what kind of products: anything generative, on the content side and the writing side of things? It can go either way, I assume.

[00:19:30] Emmanuel Turlay: Yeah, it can go either way. It's basically anybody who has built an application on top of one of those large models and is thinking about pushing their application a bit further in terms of scale, but is kind of afraid of their bills getting too high, because GPT-4 and GPT-3.5 are pretty expensive. We're targeting those companies. If those companies also have requirements in terms of privacy, we can definitely help them there, because we can help them run a model on premise, inside their infrastructure, so their users' data never leaves their VPC and is never used for training by those large companies. On-premise deployment is one of our areas of expertise, because that's what we used to do at Cruise and with our initial product. So yeah, that's the segment.

The good thing about this industry is that it's much wider and broader than the ML industry was before AI, so there are many more customers out there. One interesting thing I've seen happening in the last 12 months is that even non-technical people, like product managers, people with a high-level understanding of the technical aspects of software, can almost build applications with no-code tools now. They can hook up a model with a RAG pipeline, for example, all without any code, which means there are even more people than just software engineers who can use those tools. I think the no-code segment is going to grow more and more in the next few years.

[00:20:59] Adil Saleh: Absolutely. Just last year we got to meet a lot of founders with a product background, or a background in revenue teams, who are finding the right co-founders, building platforms, and now penetrating pretty big. So at Airtrain, what does your GTM look like? What kind of customers have you acquired so far? Talking about users: you have a free plan, and then a plan that helps people fine-tune and so on. Is that sort of a high-touch model, or is it self-serve?

[00:21:35] Emmanuel Turlay: Yeah, so it's self-serve. We do have self-serve products; we have a few hundred users using them. We also have a high-touch, white-glove service for certain customers that we are piloting with.
So, you know, we have a handful of pilots running with companies ranging from early stage to unicorn-sized, where we take more of the white-glove approach, because we're trying to refine the fine-tuning approach. As I mentioned earlier, your fine-tuned model is only as good as your input dataset. If you want a high-quality model, you need to iterate quite a bit on the dataset, and we're trying to make those tools as automated as possible so that it becomes more and more self-serve. In terms of go-to-market motion, we do outreach and so on, and we go to conferences. But we also have a YouTube channel that we launched a few months back. If people are interested in following us and watching tutorials, AI news, and so on, they can find Airtrain AI on YouTube and subscribe. I'm the host, and I post videos on a weekly basis. That's one way we get a lot of traffic to the app as well.

00:22:44 Interview with Emmanuel Turlay on his AI startup Airtrain

[00:22:44] Adil Saleh: Yeah, and you also have a community that's pretty engaged.

[00:22:47] Emmanuel Turlay: Yeah. From our old product, the Sematic orchestration product, we have a Discord community of a few hundred people. For Airtrain, we also have a Slack. It's not very active at this time because we're pretty busy building out the product, but we're always welcoming new people and happy to hear feedback and questions. Even if they're not paying customers, we're always happy to answer questions and help people in their AI journey.

[00:23:12] Adil Saleh: Wonderful. I was kind of curious about your journey from a data standpoint. How do you sit on top of those data points? Let's talk about these high-growth free accounts that you want to expand into the fine-tuning product once they get bigger; over time, you're trying to grow that motion. How do you track the action events they're performing inside the platform? Is it the number of integrations they've done? I would really love to know.

[00:23:54] Emmanuel Turlay: So we can measure what our users are doing on the platform. We can even measure which models are the most popular in our playground, and see which ones we could deprecate, for example, because they're not used anymore. That's one way we measure the actions taken by our users. Another way is our inference API, where we serve the fine-tuned models that we prepare for our users. Once they've successfully fine-tuned a model, they can query it through our API, and we charge per token. So we also measure that traffic: the size of the input, the size of the output, and the latency of the requests, so we can optimize it and benchmark it against industry standards. Those are the two ways we measure. We also, of course, measure the funnel: how people get to our website and into our app. We try to figure out how many people come from our YouTube channel, from search, from ads that we run, or from posts on Hacker News or other forums like Reddit.
We try to attribute that origin as much as possible, so we know which channels work best for us and can invest more in those.

[00:25:04] Adil Saleh: It seems like the API call is your success metric, where you charge per token, per call. Right?

[00:25:13] Emmanuel Turlay: Yeah, that's right. We also charge for fine-tuning, although at this time it's still free within a limit, because we're still integrating with billing. But our main source of revenue is definitely the API, because when you're fine-tuning a model, you may run maybe 5 or 10 jobs across different experiments, but once the model is ready, you don't fine-tune anymore. Once your model is being served, your app traffic basically translates to API traffic for us, and that's where we mostly collect revenue. So that's the part we need to make sure is optimized and fast, doesn't fail, and has uptime as high as possible, and where we can serve those models efficiently. Serving those models requires specialized hardware, like GPUs, which are very expensive, so we need to utilize those GPUs as efficiently as possible so we don't burn too much cash on them. That's also a big part of the engineering work.

[00:26:14] Adil Saleh: Got it. That's part and parcel of a product like this. You have to optimize the cost on your end: you're serving free users, and for the people who fine-tune, you need to charge a price that's comparable to the market but also covers your cost with some margin. Thinking of the price points on the fine-tuned model, for an average SaaS with about 100 customers that wants to fine-tune on its data points, how much would it cost on average, for our audience?

[00:26:46] Emmanuel Turlay: Yeah. So typically a fine-tuning job, depending on the size of the input dataset, is on the order of $5 to $10 per job. So it's very cheap. You may have to run a few jobs to iterate on the results and the output, but overall it's not going to break the bank. After that, we charge per million tokens, and a million tokens is roughly a million words. The price point depends on the model, but we're definitely competitive with the other providers out there. We're not trying to start a price war; that's not our goal, because of course others can potentially go much cheaper. We're just trying to give people a good experience and make sure they get value from the platform. We're an early-stage startup; we have revenue, but we're not trying to get profitable at this time. We're just trying to make sure our customers are satisfied and keep coming back. So we're not going after a price war with our competitors here.

00:27:46 Exploring the Future of AI and Conversational Agents

[00:27:46] Adil Saleh: Yeah, absolutely. I love that. So, one last question; we're pretty much on time. I appreciate it; we're two minutes past, and I was absolutely into this conversation. One last thing.
What is the 10x feature that you have today inside your platform, or that you're going to build in the future?

[00:28:03] Emmanuel Turlay: Yeah. So we're about to build more tooling around dataset management. Historically, even before the AI wave, the most important thing in machine learning has been what's called the data loop: iterating on data, getting the raw data, filtering it, and so on, and then getting it ready for training. It's both the most important part and the most underserved, because data is no longer a sexy product area. People want to build models and build impressive things, but the data side is usually a bit more tedious, so fewer companies go after it. We're hoping to build dataset management, dataset cleaning, and optimization tools so people can quickly identify the bad, low-quality examples in their training datasets and filter them out to increase the quality of their datasets, and also find the holes in the dataset. If there's missing data for a particular type of usage, they can use AI to generate synthetic data; that's actually a pretty common practice now. We'd like to help our users build a dataset that has a flat distribution across all the use cases they want to serve, so the fine-tuned model is of the highest possible quality. We're going to be building that in the next few weeks.

But in terms of what's really popular right now in our product, it's our playground. It's free to use, so all your listeners can sign up today. They can compare, say, Gemini with GPT-4 and Mistral, all side by side with the same prompt: you send one prompt and all those models respond at the same time. It's pretty entertaining, because you can see how the small models do. Sometimes you can give them riddles where the big models fail but the small models do well.

[00:29:53] Adil Saleh: I'm going to try that first thing in the morning.

[00:29:56] Emmanuel Turlay: Do sign up, it's really easy. It's a playground, and I think we support maybe 12 or 14 models at this time, so you can really put them all side by side: all the Llama models, all the Mistral models, Gemini, and so on. It's pretty entertaining for people to get a sense of how those different models perform.

[00:30:15] Adil Saleh: Love that, very smart. The first thing you mentioned, that you're trying to enable your customers to fine-tune their datasets better and faster: that's your success metric. That's very smart of you, Manu. I really appreciate your time today; it was a really powerful conversation. Thank you very much for helping me understand these technologies and for being my teacher. You're such an educator. Thank you very much.

[00:30:40] Emmanuel Turlay: Well, thank you so much for having me. This was a pleasure.

[00:30:43] Adil Saleh: Love that. Have a good rest of the day.
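The dataset-management loop Turlay describes above (filter out low-quality examples, find the holes in coverage, backfill with synthetic data) can be illustrated in a few lines. This is a minimal sketch, assuming a JSONL dataset with hypothetical prompt, response, and use_case columns; the quality heuristics are illustrative stand-ins, not Airtrain's actual tooling.

```python
# Sketch of a dataset-quality pass like the one described above (illustrative only;
# column names and quality heuristics are hypothetical, not Airtrain's).
import pandas as pd

df = pd.read_json("train.jsonl", lines=True)  # assumed columns: prompt, response, use_case

# 1. Filter out low-quality examples with simple heuristics.
too_short = df["response"].str.len() < 20      # trivially short answers add little signal
duplicated = df.duplicated(subset=["prompt"])  # repeated prompts skew the distribution
clean = df[~(too_short | duplicated)]

# 2. Find "holes": use cases underrepresented relative to a flat distribution.
counts = clean["use_case"].value_counts()
target = len(clean) / clean["use_case"].nunique()  # per-use-case count if perfectly flat
holes = counts[counts < 0.5 * target]
print("Underrepresented use cases (candidates for synthetic data):")
print(holes)
```

The use cases flagged here are exactly the ones a team would target with synthetic data generation before fine-tuning, so the final dataset approaches the flat distribution across use cases that Turlay mentions.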
