Some projects using the Poppy platform will need speech recognition and/or text-to-speech techniques.
This topic aims to list the possibilities. It is a wiki: everyone can contribute and edit THIS first post.
Also, users who have tested some of them are welcome to provide feedback.
I don’t think Julius was a good idea. Did you really look at it? It’s designed around Japanese, in depth, and I think that makes it a bad choice. English syntax and Japanese syntax are very different. If you read about it, they already know it’s a fussy monster and that the problem of recognizing English is not solved.
It’s just due to the single-board processors: this thing is built around them, so it has to retain zero dependencies on Windows. Single-board computers like the Raspberry Pi and Banana Pi really don’t run Windows. I’m looking at all of the different operating system versions that are required, and DROID has the only real visual recognition suites that work on Linux. It looks like you’ll have to write some part of the AI in Python, or it won’t consolidate all of these sensors and motors, let alone microphones and speakers.
This is one of the lists; I have to go through this huge list of web sites to find the software. The repository really has almost everything. You still have to go all the way through the system requirements before you really pick. Do you really want to worry about your robot not talking because Google only wants to link input and output and never give you a true speech-to-text engine? They want you tied to their servers. If Google is hacked, or there’s a power outage in your area, do you want your robot to wind up speechless? That’s the real problem: if everything from the microphone hits a dead end when it is sent to Google, and Google’s not home, your robot is technically deaf and not responding vocally. I found a few good ones, but then somehow they have to be ported to the DROID operating system, where there is a visual recognition suite for the single-board computer like the one already in its head.
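To make the "Google's not home" concern concrete, here is a minimal Python sketch of one possible answer: try the engines in order and fall back to a local one when the online one is unreachable. Everything named here (`recognize_with_fallback`, `cloud_engine`, `local_engine`) is a hypothetical stand-in for illustration, not the API of any real library; a real setup would plug an actual online service and an actual offline recognizer into the same slots.

```python
from typing import Callable, Sequence

# Hypothetical type: an engine takes raw audio bytes and returns text.
Recognizer = Callable[[bytes], str]


def recognize_with_fallback(audio: bytes, engines: Sequence[Recognizer]) -> str:
    """Try each engine in order and return the first transcript obtained."""
    errors = []
    for engine in engines:
        try:
            return engine(audio)
        except Exception as exc:  # e.g. network down, server unreachable
            name = getattr(engine, "__name__", repr(engine))
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all speech engines failed: " + "; ".join(errors))


# Stand-ins for illustration only (assumptions, not real APIs).
def cloud_engine(audio: bytes) -> str:
    raise ConnectionError("server unreachable")


def local_engine(audio: bytes) -> str:
    return "hello poppy"


# The cloud engine fails, so the local engine answers instead.
transcript = recognize_with_fallback(b"\x00\x01", [cloud_engine, local_engine])
```

With this ordering the robot only goes silent if every engine in the list fails, and the error message then says which ones were tried.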
It generates text sentences describing images using deep learning techniques, and it is open source. You can try it directly using the web interface.
Could we imagine using it with the Poppy camera and text-to-speech software, so that the robot can speak about what it sees in front of it? As the source code is available, one can also imagine using it in a “developmental” way, in an online learning fashion, but that is another story.
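As a rough idea of how the pieces could be chained, here is a minimal Python sketch of a camera → caption → speech pipeline. The three callables are assumptions: `grab_frame` stands for whatever gives access to the Poppy camera, `caption_image` for the image description software, and `speak` for the chosen text-to-speech engine. None of them is a real API here, they are just placeholders so the flow can be run and tested.

```python
from typing import Callable, List


def describe_scene(
    grab_frame: Callable[[], bytes],
    caption_image: Callable[[bytes], str],
    speak: Callable[[str], None],
) -> str:
    """Grab one image, turn it into a sentence, and say the sentence aloud."""
    frame = grab_frame()
    sentence = caption_image(frame)
    speak(sentence)
    return sentence


# Fake components for illustration only (assumptions, not real APIs).
spoken: List[str] = []


def fake_grab_frame() -> bytes:
    return b"fake-image-bytes"


def fake_caption(image: bytes) -> str:
    return f"an image of {len(image)} bytes"


# "Speaking" is simulated here by recording the sentence in a list.
sentence = describe_scene(fake_grab_frame, fake_caption, spoken.append)
```

Keeping the three parts as plain functions means any of them can be swapped (another camera, another captioner, another TTS) without touching the rest.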
Wow, this software is a genius!!! It can even guess things that are true and that no normal human could guess. I ran an experiment, submitting a photo with several Poppy creatures and other robots. Here is the result:
Now, why am I saying the software is a genius for saying that these robots are lamps?
Because a very early experiment we made in the Flowers team was to build a set of robotic lamp-like creatures, which we called “Flowers Fields”:
It was the design which led us to the Ergo-Robot design, and the Ergo-Robot experiment in turn allowed us to learn a lot about creating DIY robots. And this learning has been very useful for the Poppy Humanoid …
As I am working on the voice from the Voxygen company, which is not open source… I would be very interested in implementing this topic on the Poppy we have here in Grenoble.
I will keep you informed after our conference, Paris le web.
We were really impressed by the quality and speed of the French speech recognition and TTS in the Cherry project (https://www.youtube.com/watch?v=URB1kDDScfM). Do you know if they are open source and if they are the two mentioned in the wiki (Julius and Mary TTS)?
Thanks for the enthusiasm regarding our project, I really appreciate it.
The video was made using gTTS (see our wiki here).
You can also have a look at the code here. But you’ll find a specific link for gTTS in the wiki.
Thanks for this information and your great work on Cherry.
I have added CMU Sphinx, which you have used, to the original post of this thread.
In the video, do you know if it was CMU Sphinx or the Google API for the speech recognition?
I support at 200% the efforts you have described in the Cherry wiki to switch to an open source solution for both TTS & SR instead of using Google solutions or other non-open-source solutions, for these reasons:
Google is very malicious not to release its TTS & SR solution under an open source licence. Using it in the cloud will be terrible once robots become more widespread: if a lot of them use the Google TTS & SR solutions, Google will be able to scan and monetize most of the conversations between robots and humans (and the NSA and co will also enjoy it);
All the robots built with the online Google TTS & SR solution will not work in countries that protect their web economy and technological independence, such as China for example, where most Google solutions are blocked;
All the robots using the online Google TTS & SR will be subject to USA government laws, which will have the power to block all the countries that are under embargo, as was done for Syria or Cuba with some American MOOC platforms, SourceForge or Google Code.
For open source French TTS & SR, it seems that we have an important problem: the lack of high-quality open source solutions (at least it seems to be the case; I hope I’m wrong and there is a high-quality open source solution that we haven’t discovered yet). It is really a handicap for popular, collaborative and/or open source innovation and for French-speaking users, and it will get worse as robots become more widespread. French speakers will be handicapped by the lack of speaking or understanding robots, or their robots will be more expensive, or less functional, or dependent on Google or other GAFAM companies, compared with speakers of other languages that have high-quality open source TTS & SR. The Cherry project seems to be an excellent example of the problem.
Do you also think that this problem will have important repercussions for the innovation and dependence of French-speaking countries? Or is it not so important?
If yes, what can we do to avoid this problem?
Contact the leaders of the projects that have used public funding or public research, explain the stakes to them, invite them to open-source their French TTS & SR solutions, and find with them ways to make a release under an open source licence interesting for them? (Mbrola, Voxygen?)
Try to set up crowdfunding / pooling of resources between the public projects, public research and open source projects that could benefit from TTS & SR, to improve existing open source solutions such as Mary TTS?
Contact some organizations of the Francophonie?
Other ideas?
Yes, certainly. Let’s see, when they publish the list of remote (offloaded) STT solutions, whether there are open source solutions other than Julius and CMU Sphinx; it could be interesting to add them here.
I am posting their text about this here for the other readers of this thread: "We want to be transparent with our supporters, but at the same time we need to keep some details of our project private until negotiations are complete and contracts are signed. Remember that Kickstarter is not a store. It is a place to build support and access capital to complete amazing projects. Our platform is still more than 10 months away from release and we have a lot of details to iron out before we ship.
We are currently evaluating several STT application interfaces (APIs). Our software is designed to use multiple APIs simultaneously. Partially this is to improve performance, but it is also to prevent getting locked into a single technology or vendor. When we’ve selected and executed agreements with our upstream STT providers we will communicate our selection to end users.
We will also remain open to adding STT vendors in the future or bringing this portion of our technology in-house.
To preserve end user privacy we are looking at several mechanisms to randomize STT query destinations, mask IP addresses and conceal other personally identifiable information.
Mycroft is open source so users who don’t like our STT or AI selection can always deploy their own STT or AI back end."
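For readers wondering what "multiple APIs" and "randomize STT query destinations" could look like in practice, here is a minimal Python sketch. It is only my own illustration under stated assumptions, not Mycroft's actual implementation: the backend names and functions are hypothetical, and each query is simply sent to the backends in a random order until one answers.

```python
import random
from typing import Callable, Dict, Optional, Tuple

# Hypothetical type: a backend takes raw audio bytes and returns text.
SttBackend = Callable[[bytes], str]


def transcribe_randomized(
    audio: bytes,
    backends: Dict[str, SttBackend],
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Try the backends in random order; return (backend name, transcript)."""
    rng = rng or random.Random()
    names = list(backends)
    rng.shuffle(names)  # a different destination order for each query
    for name in names:
        try:
            return name, backends[name](audio)
        except Exception:
            continue  # this backend is unavailable, try the next one
    raise RuntimeError("no STT backend available")


# Stand-ins for illustration only (assumptions, not real services).
def broken_backend(audio: bytes) -> str:
    raise ConnectionError("service down")


def working_backend(audio: bytes) -> str:
    return "bonjour"


backends = {
    "vendor_a": broken_backend,
    "vendor_b": working_backend,
    "self_hosted": working_backend,
}
chosen, text = transcribe_randomized(b"\x00", backends, random.Random(0))
```

Because the backends sit behind one common function signature, replacing a vendor with a self-hosted engine is just a change to the dictionary.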
This article http://www.objetconnecte.com/rentree-moocs/ seems to mean that there will be a new session of this MOOC.
If that’s right, it may be an opportunity to contact the authors to think about how some of the activities given to the students could enhance the use and common knowledge of open source SR and TTS? (Example of a collaborative activity: take part in a collaborative benchmark of the existing open source TTS solutions in different languages, using for example a common text (containing ? ! : numbers, acronyms…) and putting the resulting audio files in a wiki?)
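To give an idea of how such a benchmark could be organised, here is a minimal Python sketch: the same common text is passed to each engine and one audio file per engine is written, ready to be uploaded to a wiki. The engine functions, the file naming scheme and the sample text are all assumptions made for illustration; real engines would be wrapped behind the same "text in, audio bytes out" signature.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical type: a synthesizer takes text and returns audio bytes.
Synthesizer = Callable[[str], bytes]


def run_tts_benchmark(
    text: str, engines: Dict[str, Synthesizer], out_dir: Path, lang: str
) -> List[Path]:
    """Synthesize the same text with every engine, one file per engine."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, synthesize in sorted(engines.items()):
        target = out_dir / f"{name}_{lang}.wav"  # assumed naming scheme
        target.write_bytes(synthesize(text))
        written.append(target)
    return written


# Fake engines for illustration only: they return dummy bytes, not real audio.
fake_engines = {
    "engine_a": lambda text: b"A" + text.encode("utf-8"),
    "engine_b": lambda text: b"B" + text.encode("utf-8"),
}

# Sample text with punctuation, a number and an acronym (an assumption).
common_text = "Bonjour ! Il est 14 h 30 : le TGV part-il ?"

with tempfile.TemporaryDirectory() as tmp:
    paths = run_tts_benchmark(common_text, fake_engines, Path(tmp), "fr")
    file_names = [p.name for p in paths]
    all_non_empty = all(p.stat().st_size > 0 for p in paths)
```

Using one shared text and a predictable file name per engine and language makes the resulting files easy to compare side by side.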
I saw that someone had already suggested it, but the comment seems to have gone unnoticed in the crowd. Why not use Jasper? http://jasperproject.github.io/documentation/ , I plan to try it, I am still waiting for the delivery of my Rasp B+ – – Idleman (author of the article): Because Jasper uses pocketsphinx, which recognizes only very few words in French, which is a nightmare to compile, and whose performance is relatively mediocre :).
http://www.tensorflow.org/ is now under the Apache licence. Will we be able to have quality speech recognition in French without sending the data to Google, and/or to have a local installation of the SR that can be used without an internet connection?
Thanks for the link. I do not understand it very well. In the video, they talk a lot about speech recognition, but in the examples I do not see anything about speech recognition.
I have to read it in more detail.
But +1 for the fact that it is all in Python.
Which TTS or speech recognition could run on the Odroid? I was wondering whether MaryTTS for instance could run easily on such a small processor. @Laura , @Sophie or @Maximilien, which systems have you tried? Why did you finally choose gTTS?