Soft Sciences Vs. Hard Sciences

Post by **teo123** » Tue Feb 26, 2019 7:02 am

brimstoneSalad wrote: Once in a blue moon you find a relic to confirm things.

We've already been over this in this thread. Let me paraphrase what I said earlier, which you didn't respond to.
In both natural and social sciences, confirmations of theories are often accidental.
When Hrozny discovered something that turned out to confirm Saussure's Laryngeal Theory, he was looking for something completely unrelated.
And, as I've read in Stephen Hawking's popular-science book The Theory of Everything, the Big Bang Theory was confirmed just as accidentally. The people who built the equipment that detected the background radiation weren't looking for it; they were looking for a way to communicate efficiently with satellites in orbit. Like Hrozny, they weren't even aware of the Big Bang Theory, and at first they assumed that the background radiation was an illusion produced by faulty measurements.

brimstoneSalad wrote: We should accept what experts in a field tell us unless something else in a more rigorous, scientifically harder, field contradicts that.

And that's a very hard thing to do properly. Physics is certainly more reliable than sociology, but it's also a lot harder to understand properly. Remember when I thought that airplanes contradicted basic physics and, because of that, refused to take the sociological evidence that airplanes exist seriously.
And right now, I believe I've found linguistic evidence that, contrary to what mainstream history says, the Illyrian language didn't die out in antiquity but survived until at least the 12th century (when the Third Slavic Palatalization had ceased to operate in Croatian, yet Havlik's law still operated, as can be seen in the toponym "Kalnik"). Am I justified in believing that? Nobody knows.

Post by **brimstoneSalad** » Tue Feb 26, 2019 3:23 pm

teo123 wrote: ↑Tue Feb 26, 2019 7:02 am In both natural and social sciences, confirmations of theories are often accidental.

It's fine for confirmation to be accidental sometimes, but the point in hard sciences is that it doesn't need to be.
If you wanted to confirm the Big Bang by detecting background radiation deliberately, you could do it.

You can't always rely on it even being possible to confirm linguistic theories, because the needed information could have been lost.

Most of the time in hard sciences, the practice follows a specific methodology in which an experiment is done so that its results support or disprove a hypothesis. That just isn't reliable in softer sciences.

teo123 wrote: ↑Tue Feb 26, 2019 7:02 am And that's a very hard thing to do properly. Physics is certainly more reliable than sociology, but it's also a lot harder to properly understand. Think of when I thought that airplanes contradicted basic physics, and I refused to take the sociological evidence that airplanes exist seriously, because of that.

You could also have just asked a physicist whether airplanes contradict basic physics. And ultimately, I explained to you why they don't.

teo123 wrote: ↑Tue Feb 26, 2019 7:02 am And, right now, I believe I've found linguistic evidence that, contrary to what the mainstream history says, the Illyrian language didn't die out in antiquity, but that it existed until at least the 12th century (when the Third Slavic Palatalization ceased to operate in Croatian, yet the Havlik's Law still operated, as can be seen in the toponym "Kalnik"). Am I justified to believe that? Nobody knows.

It's a softer science, so believing that is not as unreasonable as believing that airplanes are impossible. The impetus to believe mainstream linguistic theories is weaker because the evidence supporting them is weaker.

It's hard to quantify, though. Is believing airplanes are impossible a hundred times less reasonable? A thousand?
Within hard sciences, we can quantify it based on the p-value of the evidence you're rejecting... not so in soft sciences.
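To make "quantify it based on the p-value" concrete: in a hard-science setup the null hypothesis and the test are fixed before the data come in, and the p-value is just a tail probability under that null. A minimal Python sketch, assuming a hypothetical coin-tossing experiment (the function name and the numbers are illustrative, not from this thread):

```python
from math import comb


def binomial_tail_p_value(k: int, n: int, p: float = 0.5) -> float:
    """One-sided exact binomial p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical experiment, planned in advance: is a coin biased towards heads?
# Null hypothesis: the coin is fair (p = 0.5). Observed: 9 heads in 10 tosses.
p_value = binomial_tail_p_value(9, 10)
print(f"p = {p_value:.4f}")  # p = 0.0107, i.e. 11/1024
```

Someone who keeps insisting the coin is fair is betting on an outcome a fair coin produces only about once in 93 runs of this experiment. The number only means that because k, n and p were chosen before the tosses.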

Post by **teo123** » Wed Feb 27, 2019 10:33 am

brimstoneSalad wrote: If you wanted to confirm the Big Bang by detecting background radiation deliberately, you could do it.

Yes, but at the time that theory was first proposed, that couldn't have been done, and it was not known whether it would ever be possible.
You could say, "Of course it was expected with a high degree of certainty that it would be possible to detect that background radiation at some point in the near future," but history is full of examples that suggest otherwise.
Nikola Tesla, for example, suggested, at the beginning of the 20th century, that electrical computers would be impossible.
The Wright brothers were convinced that an airplane flying over the Atlantic would be impossible.
And I am sure you can find more such examples easily...
The parallels between the Big Bang Theory and the Laryngeal Theory are striking. Both make very specific and precise predictions, neither prediction was testable at the time and it was unclear whether it would ever be, and both were confirmed accidentally half a century or a century later. If you accept that the Big Bang Theory is a valid scientific theory, then you should also accept that the Laryngeal Theory is one, even though you don't need to accept that my interpretation of the Croatian toponyms is.

brimstoneSalad wrote: the practice follows specific methodology where an experiment is done for results to support or disprove a hypothesis.

I am not at all convinced that's necessary. If I understood you correctly, you implied that the reason the English language can be learned and studied scientifically is that it's a living language, right?
Well, guess what, it's also possible to learn a language that's been dead for millennia, and have a conversation in it. I know that, because I've done that myself:
https://www.textkit.com/greek-latin-for ... 88#p202188 (my nickname there is FlatAssembler).
If your epistemological philosophy can't deal with the fact that it's possible to learn Latin even though no method can generate new data about it, I'm afraid you cannot explain anything. And communicating in long-dead languages was not an exception; it was the norm less than two centuries ago. You know: Latin, Classical Chinese, Sanskrit...

brimstoneSalad wrote: Within hard sciences we can quantify it based on the p value you're rejecting

Well, in the paper I wrote a few days ago, I calculated a few more p-values. Using data from Wiktionary, I calculated that the p-value of the pattern that Indo-European languages tend to have the same phoneme at the beginning of the words meaning "two", "ten", "tooth" and "house" is around 1%. I was a bit surprised by that; I expected it to be much lower, but that's what the math points to. I also again included the calculation of the p-value of the k-r pattern I've noticed in Croatian hydronyms; it's around 1/10,000. So, by that logic, it's more unreasonable to reject my theory than to reject mainstream linguistics, right?
Some of the patterns I use to support my theories admittedly have weak p-values. The p-value of the correspondence I noticed between Proto-Indo-European 's' and Proto-Austronesian 'q' is around 6%.
Still, I am not sure p-values are what matters here.

Post by **brimstoneSalad** » Thu Feb 28, 2019 2:55 am

teo123 wrote: ↑Wed Feb 27, 2019 10:33 am Yes, but at the time that theory was first proposed, that couldn't have been done, and it was not known if it would ever be possible.

We already covered this with geocentrism. When it was first proposed, and without any empirical evidence, it would have been reasonable to doubt the *model*.

Again, I also talked about string theory as a model that fails to make any testable predictions. Totally reasonable to doubt it. String theory is arguably not good science (depending on what you expect from it), but it's not a very hard science right now. String theory, in that sense, is much more like the modeling done in linguistics.

I mentioned earlier how broad scientific categories are not homogeneous with respect to the amount of evidence and testability.
If you draw a circle around a broad category, there are inevitably going to be parts of "linguistics" that are harder than parts of "physics" (like string theory; a lot of physicists are a little embarrassed by how much attention it gets). You'll always be able to cherry-pick exceptions because these are broad categories with a lot of people working in them, people who have different mindsets and standards of scientific rigor.

You can see some fields like psychology which have dramatic differences internally, with some researchers who are very rigorous, and other speculators who just make up ad hoc hypotheses and rake in research funding with exaggerated claims and even (sometimes) outright fraud.

We can, however, look at typical examples, and particularly matters of established consensus, to get a sense of how rigorous the field is as a whole.
We can also look at, as I've explained, how viable experimental falsification is for the field. Psychology, for example, is harder to make rigorous because of all the variables that are nearly impossible to control for, but convincing robots might change that.
Linguistics, likewise, might be revolutionized once we can really simulate the human brain on computers and see how language works in a way we have never been able to before.

Being a soft science isn't being doomed to *always* be a soft science, and it doesn't mean everything in the field is equally soft... just that most of it is, and it's going to take some work to get beyond that.

teo123 wrote: ↑Wed Feb 27, 2019 10:33 am The parallels between the Big Bang Theory and the Laryngeal Theory are striking. Both of them make very specific and precise predictions, both of those predictions were not testable at the time and it was unclear if they would ever be testable, and both of them were confirmed accidentally half a century or a century later. If you accept that the Big Bang Theory is a valid scientific theory, then you should also accept that the Laryngeal Theory is a valid scientific theory, even though you don't need to accept that my interpretation of the Croatian toponyms is one.

The Laryngeal Theory may be one of the hardest parts of linguistics. Cosmology is a softer side of physics.
However, the Big Bang theory has not merely been confirmed by an accidental observation. The microwave background radiation has also been measured much more precisely since then.

Elemental abundance in old star systems (and the relative ages of star systems etc.) also confirms the Big Bang.

Beyond that, the Big Bang theory also stands alone as a plausible explanation. The Steady State theory has not been able to explain expansion (which is the reason the Big Bang theory came about).

Nothing about accidental confirmation happening is an issue; it is when you can ONLY confirm things by dumb luck that you have a problem.
There's no reason the microwave background radiation could not ultimately have been tested. But when it comes to linguistics, confirming evidence could be lost in time.

Hard science benefits from happenstance sometimes, but it is not at the mercy of happenstance.
Sometimes it takes a billion-dollar machine that hasn't been built yet, but there's no worry that the needed information has simply been lost.

teo123 wrote: ↑Wed Feb 27, 2019 10:33 am If I understood you correctly, you implied that the reason English language can be learned and studied scientifically is because it's a living language, right?

No.

teo123 wrote: ↑Wed Feb 27, 2019 10:33 am Well, guess what, it's also possible to learn a language that's been dead for millennia, and have a conversation in it. I know that, because I've done that myself:

Those languages survived in scriptures that people continually read, and scholars have long learned and used them academically.

teo123 wrote: ↑Wed Feb 27, 2019 10:33 am If your epistemological philosophy can't deal with the fact that it's possible to learn Latin even though no method can generate new data about it,

You can learn Klingon and make up new data about it, as long as everybody speaking it agrees.
Language becomes true by consensus of its speakers, which makes it very different from external truths.

teo123 wrote: ↑Wed Feb 27, 2019 10:33 am I've calculated, using the data from Wiktionary, that the p-value of the pattern that the Indo-European languages tend to have the same phoneme at the beginning of the words meaning "two", "ten", "tooth" and "house" was around 1%. I was a bit surprised by that, I expected it to be much lower, but that's what the math points to. I again included the calculation of the p-value of that k-r-pattern in the Croatian hydronyms I've noticed, it's around 1/10'000. So, by that logic, it's more unreasonable to reject my theory than to reject mainstream linguistics, right?

I have not read your paper, but if you've done more rigorous work, then yes, it could be more unreasonable to reject your theory.
However, as I am not invested in linguistics and have not taken the time to study and evaluate your work, for me it is more reasonable to believe mainstream as the default.

However, if you keep it up, you may revolutionize linguistics and turn it into a harder science. I will have to wait for your peers to recognize that and for it to become the new consensus, though.

Post by **teo123** » Sat Mar 02, 2019 6:13 am

brimstoneSalad wrote: The Laryngeal Theory may be one of the hardest parts of linguistics.

The hardest part of linguistics is probably phonetics. You just can't deny that modern speech recognition software, based on phonetics, works much better than chance.

brimstoneSalad wrote: Cosmology is a softer side of physics.

Well, I feel much more certain that the Earth is round than that subatomic particles really work the way modern physics tells us they do.

brimstoneSalad wrote: but there's no worry that the needed information has simply been lost

Maybe not. Had the Big Bang happened much earlier than it did, we wouldn't be able to observe distant galaxies at all; they would be too far away. Similarly, the information about the evolution of the first living cells (the hypothetical RNA world...) has also been lost to time.

brimstoneSalad wrote: Language becomes true by consensus of its speakers, which make it very different from external truths.

This is like saying that things have value because people agree they have value, and that therefore economics can't be a real science.

brimstoneSalad wrote: I have not read your paper

The first version of the paper I submitted was rejected, allegedly for being unclear. I doubt it would really be unclear to an actual PhD linguist (which none of that journal's reviewers are). I think they are scrutinizing the style too much and the methodology too little, and that, if you ask me, is a very bad thing. I'll post the paper once I've revised it (it's also hard to find time for that, given how much time I have to spend learning physics at the university). For now, most of the points are on my web page, in English (the paper itself is written in Croatian):
http://flatassembler.000webhostapp.com/toponyms.html

Post by **teo123** » Wed Mar 06, 2019 2:10 am

OK, for those of you interested in this discussion, there is an explanation for why @Red's comments are so nonsensical. Namely, (s)he appears to be ignorant of fifth-grade linguistics and to think that sound laws are some sort of legal laws rather than scientific laws. Seriously, see here.

Though I am also having real-life problems with people who are convinced they know something about linguistics but actually don't. Namely, the editor of the Pozega Ethnological Journal, to which I submitted my text, spent more than a week writing a two-page commentary on why my paper was rejected... without addressing the arguments I presented in the text at all. His basic arguments appear to be that, because he doesn't understand my arguments, he suspects that most readers won't be able to understand them either, and that I am supposedly not using a style typical of historical research.
This would be funny if it weren't real. As it is, it's insulting. I mean, of course he doesn't understand most of my arguments; he is neither a linguist nor a mathematician. And trying to make science appear understandable to people outside the field is bad for many reasons. First, it's time-consuming. Second, it inevitably leads to a loss of precision. Third, it makes people think they understand things they don't actually understand.
And of course I am not writing in the style typical of historical research: I am writing about historical linguistics, not so much about history itself. The editor says this is a sign that I am not doing real science, and the irony of that statement is immense. A real science would eagerly welcome an attempt to mathematically model the Croatian toponyms, especially by somebody who has been studying them for years. A legitimate scientific journal would scrutinize the method I am using, not my style.
It's hard for me to fully understand what's going on. Mathematical modeling is everywhere in linguistics. Things in phonetics aren't even taken seriously today without mathematical modeling. Even the probably-illegitimate fields of linguistic research, such as phonosemantics, use mathematical models. Yet if you try to publish a paper that attempts to mathematically model Croatian toponyms, it gets rejected because of that. My papers were accepted back when I didn't fully understand the field and was basically just regurgitating what I'd read elsewhere, but now that I finally know enough about the field to think about it for myself, my papers get rejected.
If the truth is on my side, it will come to light sooner or later, but this is really annoying.
My mathematics professor has also read my paper and has suggested that I publish part of it in a mathematics journal, but then many of my arguments (those based solely on historical linguistics) would not be published. It's hard to decide what to do now. I've put so much effort into researching this, and I don't have much time now (and won't in the near future) to look for a journal that will publish my findings. For God's sake, they are getting, for free, a text written by somebody who has been studying the subject for years, and they are refusing to publish it.

Post by **teo123** » Wed Mar 06, 2019 2:12 pm

Also, @brimstoneSalad, I'd like you to explain your "ten red apples" analogy in a little more detail. If you took a random sample of 10 apples and all of them happened to be red, the right inference from that would be that all or almost all apples are indeed red, right? So too, if there were only ten apples left in the entire world and all of them happened to be red, and we knew nothing else about apples, the right inference would be that all or almost all apples are red. Yes, the p-value isn't going to be great (the sample is very small, and there are many properties other than color we could look at), but that would still be the right inference. So too, if there were only 100 apples in the world, 90 of them red and 10 of them green, and all the green apples had split roots, why wouldn't the right inference be that split roots cause apples to be green?
So too, if all the surviving words with Proto-Slavic yers in modern Slavic languages obey Havlik's law, the right inference would be that all or almost all words with Proto-Slavic yers did obey Havlik's law. That is, unless we assume there is some unknown mechanism that let only the words obeying Havlik's law survive, a "possibility" that Occam's razor would eliminate immediately.

Post by **brimstoneSalad** » Thu Mar 07, 2019 5:35 am

teo123 wrote: ↑Sat Mar 02, 2019 6:13 am
The Laryngeal Theory may be one of the hardest parts of linguistics.
The hardest part of linguistics is probably phonetics. You just can't deny that modern speech recognition software, based on phonetics, works much better than chance.

One of. Yes, good point on phonetics. Broadly, the fact that words are made up of a set of distinct sounds is also logically necessary in terms of information theory, so empirical observation and logical necessity agree there.

teo123 wrote: ↑Sat Mar 02, 2019 6:13 am Well, I feel much more certain that the Earth is round than that subatomic particles really work the way modern physics tells us they do.

Perhaps you should not: there's a bias whereby people have misplaced confidence in things they think they understand better. That's another topic, though.

teo123 wrote: ↑Sat Mar 02, 2019 6:13 am
but there's no worry that the needed information has simply been lost
Maybe not. Had the Big Bang happened much earlier than it did, we wouldn't be able to observe distant galaxies at all, they would be too far.

You mean unobservable due to redshift or what?

teo123 wrote: ↑Sat Mar 02, 2019 6:13 am Similarly, the information about the evolution of the first living cells (hypothetical RNA world...) has also been lost to time.

Very true, which makes fields of study like abiogenesis more speculative.
We can discover ways in which life can arise, but it may be impossible to ever know for sure how it did arise... unless we figure out there's only one possible way.

You can absolutely compare those things to the loss of information in linguistics, and when we veer into that kind of unfalsifiable speculation in any science, it can soften things.

teo123 wrote: ↑Sat Mar 02, 2019 6:13 am This is like saying that things have value because people agree they have value, and that therefore economics can't be a real science.

Like I said before, there's a more mathematical side which looks at ideal markets and rational agents given a central goal, and then there's the psychology side.

Not sure what your point is.

Post by **brimstoneSalad** » Thu Mar 07, 2019 6:03 am

teo123 wrote: ↑Wed Mar 06, 2019 2:12 pm If you took a random sample of 10 apples and all of them happened to be red, the right inference from that would be that all or almost all apples are indeed red, right?

Ah, not really. If you really know the sample is truly random (and that's hard to achieve) you can get a very weak sense of probability from that relative to different distributions... but that's only if you already had in mind to check if they're red or green specifically.

I'm not sure how I can easily explain why finding correlations in an extant data set isn't the same without creating a detailed data set and showing you.
I already talked about that xkcd comic that showed the principle...

But let's put that aside and say you are really just taking a random sample to find out something pre-determined. That's a terrible sample:

What are the odds of taking ten red samples from an infinite pool of 50%-50% red and green?
Actually not that low.
But even higher from a pool with mostly red (but by no means almost all red).

If there are even just 75% red apples, you get 0.75^10 ≈ 0.056, which is almost a 6% chance.
I don't know what makes a law, but that doesn't sound like one.
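The arithmetic above is just p^10 for a pool in which a fraction p of apples is red, assuming independent draws from an effectively infinite pool. A small Python sketch (the 90% row is an extra illustrative value, not from the post):

```python
def prob_all_red(p_red: float, n: int = 10) -> float:
    """Chance that n independent draws are all red when a fraction p_red of the pool is red."""
    return p_red ** n


for p in (0.5, 0.75, 0.9):
    print(f"{p:.0%} red pool: P(10 red in a row) = {prob_all_red(p):.4f}")
```

This prints 0.0010, 0.0563 and 0.3487. Even a pool that is only 75% red produces ten reds in a row more than 5% of the time, so a clean sample of ten says little about "all or almost all".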

And the odds are skewed much more if it's not actually a random sample... like if it was biased towards apples you have access to, or towards ones that didn't rot or get eaten, since you're looking at old apples.

teo123 wrote: ↑Wed Mar 06, 2019 2:12 pm So too, if there were only ten apples left in the entire world and all of them happened to be red, and we knew nothing else about apples, the right inference from that would be that all or almost all apples are red.

Not at all.
That sample is likely not very random at all.
You can't say very much about what color all apples *were*.

You could if you had additional information about how different color apples rot (e.g. at the same speed), or production and demand (e.g. no production issues relative to red:green).

teo123 wrote: ↑Wed Mar 06, 2019 2:12 pm why wouldn't it be the right inference that the split roots cause apples to be green?

Correlation isn't causation. Like I said before, it's also a correlation you didn't plan to look for before you got the data set. That's an issue. Of course there are going to be random and meaningless correlations to find in there, like shapes in clouds.

teo123 wrote: ↑Wed Mar 06, 2019 2:12 pm That is, unless we assume there is some unknown mechanism that made only the words that obeyed the Havlik's law survive, a "possibility" that the Occam's Razor would eliminate immediately.

I think it's pretty well understood that there are cultural preferences that are pretty much arbitrary.

The law might be more credible if there wasn't an exception.
It's like having a green apple in the mix of a small sample. That shakes things up quite a bit. In a sample of ten, it multiplies by ten the probability of getting such a mix out of a situation that's actually 50-50. Smaller samples are even worse.
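The factor of ten comes from the binomial distribution: if one green apple is allowed, there are ten positions it could occupy in a sample of ten. A Python sketch under the same assumptions (independent draws, a pool that really is 50-50):

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k green apples in n independent draws when P(green) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


all_red = binom_pmf(0, 10, 0.5)    # ten red, no green: 1/1024
one_green = binom_pmf(1, 10, 0.5)  # nine red, one green: 10/1024
print(all_red, one_green, one_green / all_red)  # 0.0009765625 0.009765625 10.0
```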

...It would be more credible still if it were theorized before any of the samples were taken.
Only figuring out what you're looking for after you gain access to the data isn't legit.

Post by **brimstoneSalad** » Fri Mar 08, 2019 6:32 am

@teo123

To explain the apple thing more... here are the results from a sample of ten hypothetical apples, for a few qualities they have:

5 are red and 5 are green
4 have split stems, 6 have unsplit stems
2 have bent stems, 8 have straight ones
5 have speckles, 5 are smooth
1 is tall (oval), 9 are round

What conclusions do you think you can draw from that?
Do you think you can say that most apples are round, and most (but fewer than round) have straight stems?

Well, you'd be wrong. I'll tell you what conclusions you can draw from that: NONE!

Here's the data set I drew from:
0100101110 (for red vs green color)
1101001110 (for stem split)
1101101111 (for stem bend)
0111100010 (for speckles)
0111111111 (for ovalness vs roundness)

...It was a randomly generated set.
Go here and click a few times: https://onlinerandomtools.com/generate- ... ry-numbers
It doesn't take long to realize how heterogeneous the results are.
This is just like the xkcd comic on jelly beans: https://xkcd.com/882/
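The counts above can be checked mechanically from the five bit strings. It is just as mechanical to do what a data dredger would do: compare every pair of properties after the fact and crown the best match. A Python sketch; which bit value stands for which color or shape is arbitrary here, and only counts and agreement between rows are measured:

```python
from itertools import combinations

# The five rows of random bits quoted in the post, one row per property.
# Which bit value means "red", "split", etc. is arbitrary; only counts and
# agreement between rows are measured.
rows = {
    "color": "0100101110",
    "stem split": "1101001110",
    "stem bend": "1101101111",
    "speckles": "0111100010",
    "ovalness": "0111111111",
}


def ones(bits: str) -> int:
    """How many of the ten apples have a 1 for this property."""
    return bits.count("1")


def agreement(a: str, b: str) -> int:
    """How many apples have the same bit value for two properties."""
    return sum(x == y for x, y in zip(a, b))


for name, bits in rows.items():
    print(f"{name:10s}: {ones(bits)} ones, {len(bits) - ones(bits)} zeros")

# Data dredging: compare every pair of properties after the fact and keep the best match.
best = max(combinations(rows, 2), key=lambda pair: agreement(rows[pair[0]], rows[pair[1]]))
print("best-matching pair:", best, agreement(rows[best[0]], rows[best[1]]), "of 10")
```

The winner is the stem-split/stem-bend pair, which agree on 8 of the 10 apples even though every bit came from a random generator.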

When you look at a random data set, you WILL find random correlations somewhere.
If you don't know *ahead* of time what you're testing, working backwards will always give you some result, but it will also be a worthless result because in random noise there will be seemingly strong correlations that actually mean nothing. You can always form an ad-hoc narrative to explain random noise, but the very process you're using is what makes it not meaningful.
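The jelly-bean comic can be put in numbers. If each of 20 independent tests on pure noise has a 5% false-positive rate, the chance that at least one of them comes out "significant" is 1 - 0.95^20, roughly 64%. A Python sketch that computes this exactly and by simulation, using the standard fact that under a true null hypothesis a p-value is uniformly distributed (the trial count and seed are arbitrary choices):

```python
import random

ALPHA = 0.05  # significance threshold used in the comic
N_TESTS = 20  # one test per jelly-bean color


def family_wise_error_exact(alpha: float, n_tests: int) -> float:
    """P(at least one false positive) across n independent tests of true null hypotheses."""
    return 1 - (1 - alpha) ** n_tests


def family_wise_error_simulated(alpha: float, n_tests: int,
                                trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under a true null hypothesis each p-value is uniform on [0, 1).
        p_values = [rng.random() for _ in range(n_tests)]
        if min(p_values) < alpha:
            hits += 1
    return hits / trials


exact = family_wise_error_exact(ALPHA, N_TESTS)
simulated = family_wise_error_simulated(ALPHA, N_TESTS)
print(f"exact: {exact:.3f}, simulated: {simulated:.3f}")
```

The exact value is about 0.642, and the simulated one should land close to it. Deciding on the hypothesis before looking at the data is what keeps you from counting that 64% as a discovery.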

Do you understand now? Or does this need more explanation?
