
Re: [Emacspeak] TTS Server Implementation Questions



The problem I was looking at is that when reading a long buffer, if I stop
reading, the cursor is still where I started.

On Tue, 09 Apr 2024 14:42:53 -0400,
Victor Tsaran wrote:
> 
> I guess the question stands: what user-facing problem are we trying to
> solve?
> 
> 
> On Tue, Apr 9, 2024 at 3:14 AM Parham Doustdar <emacspeak@xxxxxxxxxxxxx>
> wrote:
> 
> > That's true, Emacspeak doesn't currently "read" from the speech server
> > process as far as I've seen, it only "writes" to it. Fixing that isn't
> > impossible, but definitely time consuming.
> > The other concrete issue is that last time I checked, console screen
> > readers read all the text in one chunk. They don't use the audio CSS
> > (forgive me if I don't use the correct name here) that Emacspeak has, which
> > requires you to play audio icons, speak text at different pitches, and
> > insert pauses. All of this means that you have to do extra heavy lifting to really
> > track the index, because the index you get back from the TTS engine isn't
> > simply a position in the buffer -- it is just the position in the current
> > chunk of text it has recently received.
> > So that's why I'm curious if we really think it's worth it. It could be,
> > or not, I'm not opinionated, but I'm also realizing that in our community,
> > we don't really have a good mechanism to discuss and decide on things like
> > this.
> >
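
For illustration, here is a minimal Python sketch (made-up names, not the
actual Emacspeak code) of the bookkeeping Parham describes above: record
the buffer position at which each queued chunk starts, so a chunk-relative
index reported by the engine can be mapped back to a buffer position.

    # Sketch: translate a chunk-relative index into a buffer position.
    from dataclasses import dataclass

    @dataclass
    class QueuedChunk:
        buffer_start: int  # buffer position where this chunk's text begins
        text: str

    class IndexTracker:
        def __init__(self):
            self.chunks = []           # chunks in the order they were queued
            self.last_position = None  # last buffer position actually spoken

        def queue(self, buffer_start, text):
            self.chunks.append(QueuedChunk(buffer_start, text))

        def on_index(self, chunk_number, offset_in_chunk):
            """The engine reports an offset relative to the chunk it most
            recently received, not a position in the buffer."""
            chunk = self.chunks[chunk_number]
            self.last_position = chunk.buffer_start + offset_in_chunk
            return self.last_position

    # Two chunks taken from buffer positions 100 and 160; the engine then
    # reports offset 12 within chunk 1.
    tracker = IndexTracker()
    tracker.queue(100, "First sentence of the buffer.")
    tracker.queue(160, "Second sentence, spoken later.")
    print(tracker.on_index(1, 12))  # -> 172
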
> > On Tue, Apr 9, 2024 at 8:35 AM Tim Cross <theophilusx@xxxxxxxxx> wrote:
> >
> >>
> >> You are overlooking one critical component which explains why adding
> >> indexing support is a non-trivial exercise which would require a complete
> >> redesign of the existing TTS interface model.
> >>
> >> For indexing information to be of any use, it has to be fed back into the
> >> client and used by the client. For example, tell the client to
> >> update/move the cursor to the last position spoken.
> >>
> >> There is absolutely no support for this data to be fed back into the
> >> current system. The current TTS interface has data flowing in only one
> >> direction: from Emacs to Emacspeak, from Emacspeak to the TTS server,
> >> and from the TTS server to the TTS synthesizer. There is no existing
> >> mechanism to feed information (i.e. index positions) back from the TTS
> >> engine to emacs. While getting this information from the TTS engine into
> >> the TTS server is probably reasonably easy, there is no existing channel
> >> to feed that information up into Emacspeak.
> >>
> >> Not only would it be necessary to define and implement a whole new model
> >> to incorporate this feedback, while still working with TTS engines
> >> which do not provide indexing information, you would also likely need
> >> to implement some sort of multi-cursor speech tracking so that the
> >> system can track cursor positions in different buffers.
> >>
> >> The reason this sort of functionality seems easy in systems like Speakup
> >> or speech-dispatcher is because those systems were designed with this
> >> functionality in mind. It is incorporated into the base design and into
> >> the various communication protocols that design implements. Adding this
> >> functionality is not something which can just be 'tacked on'.
> >>
> >> The good news of course is that being open source, anyone can go ahead
> >> and define a new interface model and add indexing capability. However,
> >> it may be worth considering that it has taken 30 years of development to
> >> get the current model to where it is, so I think you can expect a
> >> pretty steep climb initially!
> >>
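
To make the missing return channel concrete, here is a rough Python sketch
of a hypothetical server loop (an invented line protocol, not the current
one) that writes index events back on stdout; the other half, an Emacs-side
process filter that consumes those events, is exactly the piece Tim says
does not exist today.

    # Hypothetical server loop: read commands on stdin, report progress
    # on stdout so a client could move its cursor.
    import sys

    def speak(text, start_position):
        """Pretend to speak; emit an index event before each word."""
        offset = 0
        for word in text.split():
            # A real server would emit this when the engine's callback fires.
            print(f"INDEX {start_position + offset}", flush=True)
            offset += len(word) + 1

    def main():
        for line in sys.stdin:
            parts = line.strip().split(" ", 2)
            if not parts or not parts[0]:
                continue
            if parts[0] == "speak" and len(parts) == 3:
                speak(parts[2], int(parts[1]))  # speak <buffer-pos> <text>
            elif parts[0] == "exit":
                break

    if __name__ == "__main__":
        main()

In this made-up protocol a client would send "speak 100 hello world" and
read back "INDEX 100" and "INDEX 106".
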
> >> John Covici <covici@xxxxxxxxxxxxxx> writes:
> >>
> >> > It's a lot simpler -- indexing is supposed to simply arrange things so
> >> > that when you are reading a buffer and you stop reading, the cursor
> >> > will be at or near the point where you stopped.  Speakup has had this
> >> > for a long time, and that is why I use it on Linux, but it's only good
> >> > for the virtual console.  Now speech-dispatcher has indexing built in,
> >> > so if you connect to that and use one of the supported synthesizers,
> >> > indexing works correctly and I don't see any performance hit.  I think
> >> > all the client has to do is connect to speech-dispatcher, but check me
> >> > on this.
> >> >
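
The pattern John describes can be sketched generically in Python (made-up
names below, not the speech-dispatcher API): the client embeds a named mark
per line before handing the text over, and whatever mark the synthesizer
last reported tells the client which line to move the cursor to.

    # Generic index-mark sketch: one mark per line, mapped back on stop.
    def add_line_marks(lines):
        """Return (marked_text, mark_table) pairing mark names with line numbers."""
        marks = {}
        pieces = []
        for number, line in enumerate(lines, start=1):
            name = f"line-{number}"
            marks[name] = number
            pieces.append(f"<mark name='{name}'/>{line}")
        return " ".join(pieces), marks

    lines = ["First line.", "Second line.", "Third line."]
    text, marks = add_line_marks(lines)

    def on_mark_reached(name):
        # A real client would reposition the cursor here.
        print("stopped near line", marks[name])

    on_mark_reached("line-2")  # -> stopped near line 2
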
> >> > On Mon, 08 Apr 2024 08:25:27 -0400,
> >> > Robert Melton wrote:
> >> >>
> >> >> Is indexing supposed to be per reading block, or global?  Is the
> >> >> idea that you can be reading a buffer, go to another buffer, read
> >> >> some of it, then come back and continue?  I.e., an index per
> >> >> "reading block"?
> >> >>
> >> >> Assuming it is global for simplicity, it is still a heavy lift to
> >> >> implement on Mac and Windows.
> >> >>
> >> >> They do not natively report back as words are spoken; you can get
> >> >> this behavior at an "utterance" level by installing hooks and
> >> >> callbacks and tracking those. With that, you would also need to keep
> >> >> copies of the future utterances, even if they were already queued
> >> >> with the TTS.
> >> >>
> >> >> Considered from the point of view of an index per reading block, you
> >> >> then need to find ways to identify each block and its position,
> >> >> index them, and continue reading.
> >> >>
> >> >> Sounds neat, but at least for my servers, right now, the juice isn't
> >> >> worth the squeeze. I am still trying to get basic features like
> >> >> pitch multipliers working on Windows via wave mangling, hehe.
> >> >>
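
A small Python sketch of the utterance-level tracking Robert describes
(hypothetical code, not swiftmac or the Windows server): keep a copy of
each queued utterance so that a coarse "utterance finished" callback can
still be turned into an approximate position.

    # Hypothetical utterance bookkeeping for platforms that only report
    # per-utterance completion, not per-word progress.
    from collections import deque

    class UtteranceQueue:
        def __init__(self):
            self.pending = deque()     # (position, text) still to be spoken
            self.last_finished = None  # position of the last finished utterance

        def enqueue(self, position, text):
            self.pending.append((position, text))
            # ...hand text to the platform TTS here...

        def on_utterance_finished(self):
            """Platform hook: fires once per utterance, not per word."""
            if self.pending:
                self.last_finished, _ = self.pending.popleft()
            return self.last_finished

    queue = UtteranceQueue()
    queue.enqueue(100, "First utterance.")
    queue.enqueue(130, "Second utterance.")
    queue.on_utterance_finished()
    print(queue.on_utterance_finished())  # -> 130
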
> >> >> > On Apr 8, 2024, at 05:20, Parham Doustdar <parham90@xxxxxxxxx> wrote:
> >> >> >
> >> >> > I understand. My question isn't whether it's possible, though, or
> >> >> > how difficult it would be, or the steps we'd have to take to
> >> >> > implement it.
> >> >> > My question is more about whether the use cases we have today make
> >> >> > it worth reconsidering. All the other questions we can apply the
> >> >> > wisdom of the community to solve, if we were convinced the effort
> >> >> > would be worth it.
> >> >> > For me, the way I've gotten around this is to use the
> >> >> > next/previous-paragraph commands. The chunks are small enough that
> >> >> > I can "zoom in" if I want, and yet large enough that I don't have
> >> >> > to constantly hit next-line.
> >> >> > Sent from my iPhone
> >> >> >
> >> >> >> On 8 Apr 2024, at 11:13, Tim Cross <theophilusx@xxxxxxxxx> wrote:
> >> >> >>
> >> >> >> This is extremely unlikely to be implemented. It is non-trivial and
> >> >> >> would require a significant redesign of the whole interface and
> >> >> >> model of operation. It isn't as simple as just getting index
> >> >> >> information from the TTS servers which support it. That information
> >> >> >> then has to be fed back to Emacs through some mechanism which
> >> >> >> currently does not exist, and that would result in a far more
> >> >> >> complicated interface/model.
> >> >> >>
> >> >> >> As Raman said, the decision not to have this was not simply an
> >> >> >> oversight or due to lack of time. It was a conscious design
> >> >> >> decision. What you're asking for isn't simply an enhancement; it is
> >> >> >> a complete redesign of the TTS interface model.
> >> >> >>
> >> >> >> "Parham Doustdar" (via emacspeak Mailing List)
> >> >> >> <emacspeak@xxxxxxxxxxxxx> writes:
> >> >> >>
> >> >> >>> I agree. I'm not sure which TTS engines support it. Maybe, just
> >> >> >>> like notification streams are supported in some servers, we can
> >> >> >>> implement this feature for the engines that support it?
> >> >> >>> Sent from my iPhone
> >> >> >>>
> >> >> >>>> On 8 Apr 2024, at 10:24, John Covici <emacspeak@xxxxxxxxxxxxx> wrote:
> >> >> >>>>
> >> >> >>>> I know this might be controversial, but indexing would be very
> >> >> >>>> useful to me.  Sometimes I read long buffers, and when I stop the
> >> >> >>>> reading, the cursor is still where I started, so there is no real
> >> >> >>>> way to do this adequately.  I would not mind if it were accurate
> >> >> >>>> only to the line rather than to individual words, but it would
> >> >> >>>> make Emacspeak a lot nicer for me.
> >> >> >>>>
> >> >> >>>>> On Fri, 05 Apr 2024 15:39:15 -0400,
> >> >> >>>>> "T.V Raman" (via emacspeak Mailing List) wrote:
> >> >> >>>>>
> >> >> >>>>> Note that the other primary benefit of tts_sync_state as a
> >> >> >>>>> single call is that it ensures atomicity, i.e. all of the state
> >> >> >>>>> gets set in one shot from the perspective of the elisp layer, so
> >> >> >>>>> you hopefully never get TTS that has its state partially set.
> >> >> >>>>>
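
A tiny sketch of that atomicity point (hypothetical server code; the real
tts_sync_state argument list varies by server, so the fields below are
illustrative): because everything arrives in one call, the engine is never
observed with, say, the new rate but the old punctuation mode.

    # Hypothetical: apply all speech state in a single call.
    class EngineState:
        def __init__(self):
            self.punctuation = "some"
            self.capitalize = False
            self.split_caps = False
            self.rate = 225

        def sync_state(self, punctuation, capitalize, split_caps, rate):
            """Set everything in one shot, in the spirit of tts_sync_state."""
            self.punctuation = punctuation
            self.capitalize = capitalize
            self.split_caps = split_caps
            self.rate = rate

    state = EngineState()
    state.sync_state("all", True, True, 300)
    print(vars(state))  # all fields updated together
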
> >> >> >>>>> Robert Melton writes:
> >> >> >>>>>> On threading: it is all concurrent, with lots of fun
> >> >> >>>>>> protecting the state.
> >> >> >>>>>>
> >> >> >>>>>> On language and voice, I was thinking of them as a tree,
> >> >> >>>>>> language/voice, as this is how Windows and macOS seem to
> >> >> >>>>>> provide them.
> >> >> >>>>>>
> >> >> >>>>>> ----
> >> >> >>>>>>
> >> >> >>>>>> Oh, one last thing. Should TTS server implementations return
> >> >> >>>>>> a \n after a command is complete, or is just returning nothing
> >> >> >>>>>> acceptable?
> >> >> >>>>>>
> >> >> >>>>>>
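
Robert's "it is all concurrent" remark can be made concrete with a small
sketch (hypothetical code): even a server that "doesn't use threads"
typically has at least the command-reading loop plus the engine's playback
or callback thread touching shared state, so that state needs a lock.

    # Hypothetical: shared state guarded by a lock, with a worker thread
    # standing in for the engine's playback/callback thread.
    import queue
    import threading
    import time

    class Server:
        def __init__(self):
            self.lock = threading.Lock()
            self.rate = 225
            self.jobs = queue.Queue()
            threading.Thread(target=self.worker, daemon=True).start()

        def set_rate(self, rate):   # called from the command loop
            with self.lock:
                self.rate = rate

        def speak(self, text):      # called from the command loop
            self.jobs.put(text)

        def worker(self):           # stands in for the TTS callback thread
            while True:
                text = self.jobs.get()
                with self.lock:
                    rate = self.rate
                print(f"speaking at rate {rate}: {text}")

    server = Server()
    server.speak("hello")
    server.set_rate(300)
    server.speak("world")
    time.sleep(0.1)  # let the toy worker drain the queue before exiting
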
> >> >> >>>>>>> On Apr 5, 2024, at 14:01, T.V Raman <raman@xxxxxxxxxx> wrote:
> >> >> >>>>>>>
> >> >> >>>>>>> And do spend some time thinking about atomicity and
> >> >> >>>>>>> multithreaded systems, e.g. ask yourself the question "how
> >> >> >>>>>>> many threads of execution are active at any given time".
> >> >> >>>>>>> Hint: the answer isn't as simple as "just one, because my
> >> >> >>>>>>> server doesn't use threads".
> >> >> >>>>>>>
> >> >> >>>>>>>> Raman--
> >> >> >>>>>>>>
> >> >> >>>>>>>> Thanks so much, that clarifies a bunch. A few questions on the
> >> >> >>>>>>>> language / voice support.
> >> >> >>>>>>>>
> >> >> >>>>>>>> Does the TTS server maintain an internal list and switch
> >> >> >>>>>>>> through it, or does it send the list to the lisp layer in a
> >> >> >>>>>>>> way I have missed?
> >> >> >>>>>>>>
> >> >> >>>>>>>> Would it be useful to have a similar feature for voices,
> >> >> >>>>>>>> where first you pick the right language, then you pick the
> >> >> >>>>>>>> preferred voice, and then maybe it is stored in a defcustom
> >> >> >>>>>>>> and sent next time as
> >> >> >>>>>>>> (set_lang lang:voice t)
> >> >> >>>>>>>>
> >> >> >>>>>>>>
> >> >> >>>>>>>>> On Apr 5, 2024, at 13:10, T.V Raman <raman@xxxxxxxxxx> wrote:
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> If your TTS supports more than one language, the TTS API
> >> >> >>>>>>>>> exposes these as a list; these calls loop through the list
> >> >> >>>>>>>>> (dectalk, espeak, outloud)
> >> >> >>>>>>>>
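
For what it's worth, here is a sketch of the language/voice-tree idea
(hypothetical handler, not the real set_lang implementation): the server
keeps languages mapped to their voices and accepts an optional "lang:voice"
argument of the kind Robert suggests.

    # Hypothetical set_lang handler accepting "lang" or "lang:voice".
    VOICES = {
        "en": ["Alex", "Samantha"],
        "de": ["Anna"],
    }

    current = {"lang": "en", "voice": "Alex"}

    def set_lang(spec, announce=False):
        lang, _, voice = spec.partition(":")
        if lang not in VOICES:
            return False
        current["lang"] = lang
        # Fall back to the first voice if none was given or it is unknown.
        current["voice"] = voice if voice in VOICES[lang] else VOICES[lang][0]
        if announce:
            print(f"language {current['lang']}, voice {current['voice']}")
        return True

    set_lang("de", announce=True)           # -> language de, voice Anna
    set_lang("en:Samantha", announce=True)  # -> language en, voice Samantha
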
> >> >> >>>>>>>> --
> >> >> >>>>>>>> Robert "robertmeta" Melton
> >> >> >>>>>>>> lists@xxxxxxxxxxxxxxxx
> >> >> >>>>>>>>
> >> >> >>>>>>>
> >> >> >>>>>>
> >> >> >>>>>> --
> >> >> >>>>>> Robert "robertmeta" Melton
> >> >> >>>>>> lists@xxxxxxxxxxxxxxxx
> >> >> >>>>>
> >> >> >>>>
> >> >> >>>> --
> >> >> >>>> Your life is like a penny.  You're going to lose it.  The
> >> >> >>>> question is: How do you spend it?
> >> >> >>>>
> >> >> >>>>       John Covici wb2una
> >> >> >>>>       covici@xxxxxxxxxxxxxx
> >> >> >>>
> >> >>
> >> >> --
> >> >> Robert "robertmeta" Melton
> >> >> lists@xxxxxxxxxxxxxxxx
> >> >>
> >> >>
> >>
> >
> 
> 
> -- 
> 
> --- --- --- ---
> Find my music on
> Youtube: http://www.youtube.com/c/victortsaran
> Spotify: https://open.spotify.com/artist/605ZF2JPei9KqgbXBqYA16
> Band Camp: http://victortsaran.bandcamp.com

-- 
Your life is like a penny.  You're going to lose it.  The question is:
How do you spend it?

         John Covici wb2una
         covici@xxxxxxxxxxxxxx

