?

Log in

No account? Create an account

Silnith’s Lair

Apr. 18th, 2018

06:10 am - Using formal parsing methods on natural languages.

I've been thinking about this for years. I really love compilers, so much that I took the class on it multiple times in school. Possibly my favorite aspect of it is the formal theory built around parsing. The concept of a right-recursive descent parser is beautiful, both elegant and effective. For computer languages, it is unambiguously the best way to parse them for compilation.

But for years I've been thinking about how the concept should be useful for parsing natural languages, too. The basic idea is that you build a grammar for the language, then take a sentence of the language and parse it according to that grammar. The parser first tokenizes the sentence into a string of tokens, then tokens are matched to grammar rules to identify what grammatical construct they represent. Based on that you build a syntax tree representing what the sentence (for lack of a better word) means. The beauty here is that the parser does not try to guess what syntactical construct is coming next. It simply looks at what is there, and matches that against all possible legal grammatical constructs and picks the one that matches.

The core problem, of course, is ambiguity. In a computer language, ambiguity is unacceptable and hence disallowed. A grammar must be unambiguous for a parser to be built that can recognize it. For natural languages, ambiguity is not just allowed, it is prevalent. A lot of effective communication relies on ambiguity, and it is required for a lot of artwork and humor. Therefore in order to parse a natural language, the parser must be able to handle ambiguity gracefully.

In a recursive-descent parser, ambiguity manifests in the form of shift-reduce conflicts and reduce-reduce conflicts. Basically when applying a grammar, the parser finds more than one possible match in the grammar, and it does not know what action to take. In more precise terms, it finds multiple matching grammar rules and must decide (somehow) which one to apply. Again, in computer languages, ambiguity is not acceptable and so no grammar is allowed where such a conflict can happen. In theory. In practice it is a lot of painful work to come up with completely unambiguous grammars so most parser generators allow some ambiguous constructs and rules provided that all resulting conflicts are resolved somehow. A simple example would be, For all shift-reduce conflicts, shift.

What if, instead of statically choosing one grammar rule to use for parsing, we non-deterministically check all possible parses? In the same fashion as a deterministic finite state automata versus a non-deterministic finite state automata, we could create a non-deterministic right-recursive descent parser. In the case of conflict, instead of only applying one rule, we apply all rules. If no grammatical rules match, then of course nothing is applied.

In this way we can do something much more analogous to what people do in their heads when reading; we keep track of multiple possible interpretations of what they are reading and resolve down to one when they reach the end of a sentence. (Oftentimes a human will need to go back and re-read a sentence that is ambiguous to try a different parsing of it. A computer with vastly more short-term memory could do it concurrently.)

There is a piece missing from this. A key part of writing a parser is writing the tokenizer. Grammars are written in terms of streams of tokens. In a computer language, a sequence of characters can be identified and mapped to a type of token before the grammar rules are applied. That is why most languages have reserved words and rules about how numbers must be formatted and the like. In a natural language, different words representing different parts of speech can have the same spelling. For example, the word tastes could be a plural noun or a present-tense third-person singular verb. Leading could be an adjective or a noun. Lead could be a present-tense first-person singular verb, a past-tense first-person singular verb, either of those in third-person form, a noun, or an adjective. Since natural language grammar rules depend on these parts of speech, there is ambiguity in the part of the parser normally separated out as the lexer. One natural solution to this would be to move the identification of part of speech out of the lexer and into the grammar itself. This would solve the ambiguity problem by reducing it to a known problem (an ambiguous grammar) and applying the solution we already have. A major drawback is that it would drastically increase the size of the grammar itself. (Insert mathematician joke here.) Another solution would be to extend the non-determinism into the lexer. Instead of the lexer passing a single definitive token type to the parser, it could pass an ambiguous token that contains a set of all possible interpretations. The grammar parser would then need to try each of these possibilities. This would keep the grammar smaller (and theoretically more pure) but would complicate the implementation of both the parser and lexer, and may require storing more state at runtime.

Non-deterministic algorithms are well-known for being problematic to implement in reality. That is why we have a formal method for converting an NFA into a DFA, so we can actually run it in a reasonable time using a reasonable amount of memory. Since at the time of writing I know of no formal method for converting a non-deterministic LR(n) parser into a deterministic LR(n) parser, we would need to fall back on holding multiple parser states in memory simultaneously. I believe this will not be a problem in practice. When it comes to compiling a computer program, most languages have grammatical rules akin to this: program := statement (semicolon statement)*. Natural languages have very few of these constructs and they are used infrequently. Most natural language sentences are relatively quite short. Even if applied to entire paragraphs at a time, the size of the input is minuscule compared to the size of input most compilers receive on a daily basis. Even exponential algorithms can be executed if the input size is small enough. The parser can prune non-matching states as it executes, so in most cases where the amount of ambiguity is small the actual runtime states kept in memory will also be small.

For the lexer-based approach to assigning parts of speech to each word, even though the size of the problem may at first seem daunting, I believe it is not an issue. We can imagine a database of every word in a natural language that contains mappings of every verb tense conjugation or noun declension. In practice this would be slightly larger than a normal dictionary. In practical terms this is small enough to be considered a trivial problem for any standard database. For efficiently looking up a word by one of its spelling variants, imagine this database as a table where each row is a word and each column is its conjugation or declension. Now imagine creating a database index for each column. At this point I can wave my hands and invoke standard database theory and claim that the issue is solved. In practice this will involve tries or B-trees or some such, since it is a known problem with known solutions I do not need to go into detail.

I'll write more later. Right now Whiny Cat is whining at me to go back to bed, so I'll go nap for a few more hours.

Current Mood: sleepysleepy

Feb. 21st, 2018

02:57 am - Can somebody help me, please?

I just watched The Opposition with Jordan Klepper, and he had on two students from the Parkland, Florida shooting. They were both well-spoken and clear, but Jordan was doing a comedic interview instead of a real interview so he never actually asked them what they were trying to achieve beyond do something. (Do something is what saddled us with the gate-raping TSA and similar atrocities. Do something virtually always leads to disaster.)

Does anybody have a link or reference where either of these young women outline what it is they want to happen? I managed to find their Twitter accounts, but not much useful information.

Delaney Tarr
Carly Novell

Carly referenced a Florida bill HB231.

Jaclyn Corin references SB 1434.

I suppose I'll have to dig through the Florida state legislature website tomorrow to try and see what these bills do. I really want to know how any of these proposed laws would have stopped Nikolas Cruz.

Feb. 19th, 2018

01:22 am - Let's talk about gun control.

Seriously. We should talk about this.

Please post a comment outlining legislation you want passed. Include an explanation of how that legislation would have prevented at least one of the recent school shootings. I'll provide some starting points.

Those are just school shootings. For non-school shootings, here are a couple starters:

As the students of the Florida shooting are demanding, We need to do something! So let's do something. Propose a new law that could have prevented at least one of these events.

Jul. 20th, 2017

11:35 pm - Russian adoptions.

Trump Jr. says his meeting with a lawyer for the Russian government only talked about adoptions. He thinks this somehow exonerates him because he wasn’t colluding with a foreign government representative. Here’s what needs to be repeated as often and loudly as possible.

When the Russian government talks about Americans adopting Russian orphans, what they mean is Lift US sanctions against Putin and his murdering associates.

This isn’t immediately obvious, which is why Putin’s agents are doing it. But have no doubt at all, whenever a Russian agent says adoptions, they mean lift US sanctions and nothing else. So what Trump Jr. is actually saying is, I met with a lawyer for the Russian government after the lawyer promised me illegal help with our campaign, and when we got to the meeting, the Russian government lawyer immediately started talking about lifting US sanctions on Putin’s friends.

Read that a few more times. This is blatant admission of criminal activity. There is no "if", "but", or "maybe". It is illegal. This is offering aid and comfort to an enemy of the United States, which is defined in the Constitution as treason.

Just to lay it all out as repeatedly as possible, this is the sequence. There was a Russian named Sergei Magnitsky who exposed a massive tax-theft scam by Putin’s associates. (Massive as in hundreds of millions of dollars equivalent.) Sergei Magnitsky was key in the investigation into Putin’s associates when he was murdered in jail pending trial.

In response to this obvious corruption and criminal activity, the US passed the Magnitsky Act in 2012 to freeze the American assets of Russian officials guilty of gross abuse of human rights (like murder).

In retaliation for the Magnitsky Act, Putin froze all the adoptions of Russian orphans by American parents, and told the Americans that the freeze would remain in place until they convinced Congress to repeal the Magnitsky Act. So Putin is responsible for adoptions being halted. The US is not responsible for that happening. And Putin has been very clear that it is inextricably linked to the sanctions.

So every time you hear Trump Jr., or a Republican talking head, or even Trump himself say that Putin or Russian government lawyers or Russian diplomats are only talking about adoptions, remember that they are really saying the Russians are trying to convince them to lift US sanctions against criminal Russian officials.

Trump himself just admitted to yet another undisclosed conversation with Putin, and he said it was centered around adoptions. That means that, once again, Trump is admitting to collusion with Putin against American interests. It is really not possible to overstate just how significant and damaging to America this all is.

Jun. 23rd, 2017

12:51 am - GOP "healthcare" bill

The GOP should name its healthcare bill KTP.

Kill The Poor

The modern Republican party wants poor people to die in agony. They really do. If you're a multi-millionaire, you get subsidized by everybody else's tax money. If you're poor, you die in agony from preventable illnesses.

How did the political party of my grandfather turn from a reasonable and rational collection of ideas into a fascist mouthpiece for billionaire propaganda?

Dec. 19th, 2016

12:09 am - message to the Electoral College

This is what I ended up mailing to the members of the Electoral College last week. Sadly, nearly 200,000 other people also mailed to them, so the chance my message was read is virtually zero.

This is an attempt to summarize the clear rational argument for why the Electoral College should reject Donald Trump on December 19th, and instead elect Bernie Sanders.

  1. President Trump is an existential threat to the United States of America.
    1. His Cabinet is filled with white nationalists. ( NYTimes DemographicsPro Mother Jones )
      1. "White nationalists" is what neo-Nazis named themselves to avoid the stigma of "Nazi". ( CNN Vanity Fair )
      2. They have already put forward the idea of re-introducing internment camps like we put Japanese-American citizens in during World War 2. ( Business Insider Reuters )
        1. President Reagan apologized for those internment camps, to bring them back is to dishonor his memory. ( NPR Smithsonian )
      3. There has been a surge in hate crimes since Trump's victory. This surge will become a pattern once Trump takes office. ( Forbes Slate )
    2. He plans to cut all funding for NASA's climate change research. ( The Guardian Deutsche Welle )
      1. The current warming projections depend on massive investments in carbon-capture technology. ( International Energy Agency Global CCS Institute The Guardian Springer )
        1. Carbon capture technology does not exist yet. ( The Verge Carbon Brief MIT Technology Review MIT Technology Review )
      2. Without NASA's research, we will see warming of at least double what is agreed to in the Paris accords. ( EPA The White House )
        1. The Paris accords have already been adopted by other major polluters such as China. ( Scientific American NY Times )
        2. The US is the largest polluter, without our action all other nations' actions are largely useless. ( The Atlantic )
    3. His Cabinet is filled with the very corrupt people he promised to eliminate from Washington. ( NYTimes Washington Post The Intercept )
    4. He will have complete control over the most powerful spy agencies in the world.
      1. The NSA can literally record all text and voice communication across the country. ( Wired Schneier on Security The Guardian CNET Vice )
        1. This information can end up in the hands of the FBI, the IRS, and local police. ( Reuters Techdirt Techdirt Extreme Tech )
        2. The potential for blackmail and other forms of control is limitless. ( Business Insider The Telegraph kieran healy NY Mag EFF The Intercept WashingtonsBlog Reuters Huffington Post )
      2. We have already built the nationwide apparatus for totalitarianism. The only thing we are missing is a strong leader to come into power and use it for ill purposes. ( University of Chicago Press Daily Kos McClatchy DC Wired )
      3. Donald Trump has a history of petty and vindictive behavior. ( )
  2. President-Elect Trump does not have a mandate.
    1. He lost the popular vote. ( Inquisitr CS Monitor USA Today IBT )
    2. Polling during the primary process showed he would have lost to Bernie Sanders by a landslide. ( Huffington Post Real Clear Politics )
      1. Those same polls predicted a very close race between Clinton and Trump, which is exactly what played out in the general election.
  3. The Electoral College is not legally bound by the popular vote. ( Politico )
    1. If it were, George W. Bush would not have been allowed into office his first term, he lost the popular vote but won the electoral vote.
  4. The Electoral College was originally intended to be a final government check on blind public sentiment.
    1. Alexander Hamilton wrote in the Federalist Papers that the Electors were meant to be men of conscience and wisdom who would select an appropriate leader without being swayed by the passions of capricious public sentiment. ( Wikipedia )
  5. We cannot elect Donald Trump.
    1. His stated positions and actions match the textbook definition of fascism. ( The New Yorker Salon Slate Newsweek )
      1. Our grandparents died by the millions fighting fascism in Europe. To allow it to take hold in the US is to spit on their memories.
    2. Russia ran a concerted campaign to support Donald Trump. ( Associated Press USA Today CNN )
      1. Eighteen different national intelligence agencies have all said Russia was openly and deliberately trying to sway the American election for Trump. ( Chicago Tribune NY Times FactCheck.org )
      2. When Trump's victory was announced, the Russian parliament broke into a raucous ovation. ( The Independent Public Radio International The Daily Beast )
      3. Russia spent millions on propaganda efforts to smear Clinton.
        1. The claims that Clinton would start war with Russia originated in Russian-spread propaganda. ( Fortune Forbes The Gateway Pundit Foreign Policy Russia Insider )
      4. Russia hacked every one of Trump's opponents' campaigns, but did not hack Trump's campaign. ( Raw Story NBC News Wired )
      5. Allowing Trump to gain office is to allow Russia to succeed in their plan to subvert American democracy.
  6. We cannot elect Hillary Clinton. ( Salon Daily News Bin )
    1. Clinton's disapproval ratings are second only to Trump's. ( ABC News Gallup )
    2. A huge swath of the population voted specifically to deny Clinton the Presidency. ( Politics USA Hillary Men )
  7. The Electoral College must elect Bernie Sanders.
    1. The people have expressed a clear anger with the establishment parties.
      1. Both parties had "outsider" challengers that received huge public support.
      2. Electing any of the establishment Republican leadership will anger the majority of voters.
        1. Kasich is not a choice the public will accept.
        2. This anger will be bi-partisan.
    2. Bernie Sanders has bi-partisan appeal. ( Observer )
      1. He is a choice the public will accept.
    3. Bernie Sanders ran on a platform of economic populism that was very similar to Donald Trump's.
      1. There was a lot of overlap between those who supported Sanders and those who supported Trump. ( US News The Daily Beast The Atlantic Washington Post )
    4. Bernie Sanders' campaign was undercut by underhanded tactics by the DNC. ( Time Politifact Politico Washington Post )
      1. The DNC scheduled the fewest debates in recent history. ( Huffington Post Daily Caller )
      2. The DNC scheduled debates for low-viewership nights. ( Washington Post Politico National Review )
    5. Sanders is not a Democrat.
      1. Electing Sanders does not validate the DNC. It is a scenario the DNC specifically tried to prevent.
    6. Sanders is not endorsed by Vladimir Putin and the KGB.
      1. Electing Sanders is a strong rebuke of Russia's attempts to control America's democratic process. ( NY Times )

Oct. 23rd, 2016

08:47 pm - The Fourth Amendment

Yaay. My little history note actually got moderated up on Slashdot.

https://yro.slashdot.org/comments.pl?sid=9803457&cid=53132839

Jun. 25th, 2016

06:33 am - Previous post hidden

In an unsurprising development, it seems nobody who read my last post actually captured the message I was trying to convey. Consequently, I am pulling it. The intended viewers were people who watched The Daily Show, The Nightly Show, and Full Frontal regularly and had seen the last few episodes about gun control and the Orlando shooting.

Context really is everything. Without that context, the post made no sense. It gave the impression my political views were the opposite of what they actually are, and rather than try to explain all the discrepancies I am just going to pull the thing and forget it.

Current Mood: depresseddepressed

Jun. 14th, 2016

07:24 am - Serious question of the day

Come up with gun control legislation that would have prevented the assassination of Martin Luther King Jr. without disarming everybody in the Civil Rights Movement.

Note that all the major figures of the movement were investigated by the FBI, so that cannot be used as a metric for denying access to firearms.

Please, I would really like to see what solutions people can create.

May. 16th, 2016

12:31 am - Beware the bison!

And in today’s “You sweet, sweet dumb-ass” story, a father and son visiting Yellowstone put a bison calf in their SUV and drove it to a ranger station because they thought the calf was freezing to death.

I admire the duo for their compassion, but I call them morons for not understanding how well bison hide and fur insulates, and for thinking that separating a calf from its mother is a good idea. I'm also a tad surprised the mother did not destroy the SUV. Next time, just try to bring a ranger with you to the calf.

Here is the Fark thread.

Navigate: (Previous 10 Entries)